* Recovering RAID Volumes from 6 Disks
From: Amit Biswas @ 2016-07-19 16:29 UTC
To: linux-raid
Greetings!
Our backup server was acting up, and the issue turned out to be the
drives (all of them) :( I could use some guidance, or a verdict.
It has a total of 6 drives: sda, sdb, sdc, sdd, sde and sdf. From the
superblock info (attached), there is a RAID 1 volume and a RAID 10
volume. The problem is that all the disks are part of both RAID volumes
(according to the superblocks).
I am currently booted into an Ubuntu live disk shell.
RAID 1: sde2(0) sdb2(1) sdc2(s) sdd2(s) sdf2(s)
-> 2 devices
-> contains the boot partition. I was able to mount this.
RAID 10: sdc3(2) sdd3(3) sde3(4) sdf3(5)
-> 6 devices
-> currently inactive
Again, attached are the superblocks and dmesg output.
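The per-array role listing above can be rebuilt mechanically from saved `mdadm --examine` text. What follows is only a minimal sketch, not a recovery tool: the function names `parse_examine` and `members_by_array` are made up for illustration, and the sample is an abbreviated excerpt of the dump in this post. It assumes each device section starts with a `/dev/...:` line followed by `Field : value` lines.

```python
import re
from collections import defaultdict

# Abbreviated excerpt of the superblock dump in this post (most fields omitted).
SAMPLE = """\
/dev/sdb2:
Name : Vlab-backup:0
Raid Level : raid1
Device Role : Active device 1
/dev/sdc2:
Name : Vlab-backup:0
Raid Level : raid1
Device Role : spare
/dev/sdc3:
Name : Vlab-backup:1
Raid Level : raid10
Device Role : Active device 2
"""


def parse_examine(text):
    """Return {device: {field: value}} from concatenated `mdadm --examine` text."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"/dev/\S+:", line):
            # A new device section starts; drop the trailing colon.
            current = devices.setdefault(line[:-1], {})
        elif current is not None and " : " in line:
            # Split on the first " : " only, so timestamps stay intact.
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return devices


def members_by_array(devices):
    """Group (device, role) pairs under (array name, RAID level)."""
    arrays = defaultdict(list)
    for dev, fields in devices.items():
        key = (fields.get("Name"), fields.get("Raid Level"))
        arrays[key].append((dev, fields.get("Device Role")))
    return dict(arrays)


if __name__ == "__main__":
    for (name, level), members in members_by_array(parse_examine(SAMPLE)).items():
        print(name, level, members)
```

Grouping on the array name and level makes it easy to see that the `...2` partitions belong to one array and the `...3` partitions to the other, which is why every disk appears in both.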
/dev/sdb2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 356f81a0:69057dd3:8f19639c:26b37454
Name : Vlab-backup:0
Creation Time : Wed May 28 20:44:31 2014
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 976384 (476.83 MiB 499.91 MB)
Array Size : 488128 (476.77 MiB 499.84 MB)
Used Dev Size : 976256 (476.77 MiB 499.84 MB)
Data Offset : 512 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ab7e7ad4:0bcf824d:3976a240:063016cf
Update Time : Wed Jul 6 23:02:21 2016
Checksum : ead06f71 - correct
Events : 142
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
/dev/sdc2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 356f81a0:69057dd3:8f19639c:26b37454
Name : Vlab-backup:0
Creation Time : Wed May 28 20:44:31 2014
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 976384 (476.83 MiB 499.91 MB)
Array Size : 488128 (476.77 MiB 499.84 MB)
Used Dev Size : 976256 (476.77 MiB 499.84 MB)
Data Offset : 512 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 86665b3f:664bd08b:65195b2a:bd8345b1
Update Time : Wed Jul 6 23:02:21 2016
Checksum : 5fe6ca8b - correct
Events : 142
Device Role : spare
Array State : AA ('A' == active, '.' == missing)
/dev/sdd2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 356f81a0:69057dd3:8f19639c:26b37454
Name : Vlab-backup:0
Creation Time : Wed May 28 20:44:31 2014
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 976384 (476.83 MiB 499.91 MB)
Array Size : 488128 (476.77 MiB 499.84 MB)
Used Dev Size : 976256 (476.77 MiB 499.84 MB)
Data Offset : 512 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 5ad0350e:5f2b9fad:c8cac4fa:aa5524ef
Update Time : Wed Jul 6 23:02:21 2016
Checksum : 5ed897aa - correct
Events : 142
Device Role : spare
Array State : AA ('A' == active, '.' == missing)
/dev/sde2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 356f81a0:69057dd3:8f19639c:26b37454
Name : Vlab-backup:0
Creation Time : Wed May 28 20:44:31 2014
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 976384 (476.83 MiB 499.91 MB)
Array Size : 488128 (476.77 MiB 499.84 MB)
Used Dev Size : 976256 (476.77 MiB 499.84 MB)
Data Offset : 512 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3e6b7c97:3feb9132:235a5a6e:836e4f35
Update Time : Wed Jul 6 23:02:21 2016
Checksum : 26d29aa2 - correct
Events : 142
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing)
/dev/sdf2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 356f81a0:69057dd3:8f19639c:26b37454
Name : Vlab-backup:0
Creation Time : Wed May 28 20:44:31 2014
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 976384 (476.83 MiB 499.91 MB)
Array Size : 488128 (476.77 MiB 499.84 MB)
Used Dev Size : 976256 (476.77 MiB 499.84 MB)
Data Offset : 512 sectors
Super Offset : 8 sectors
State : clean
Device UUID : b0b62b5e:56fbc4c5:735250c4:029d3ee2
Update Time : Wed Jul 6 23:02:21 2016
Checksum : 839a1cfc - correct
Events : 142
Device Role : spare
Array State : AA ('A' == active, '.' == missing)
/dev/sdc3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3
Name : Vlab-backup:1
Creation Time : Wed May 28 20:44:56 2014
Raid Level : raid10
Raid Devices : 6
Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)
Array Size : 2928419328 (2792.76 GiB 2998.70 GB)
Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 0b23c18b:3202f588:9b7b5636:49f80c36
Update Time : Wed Jul 6 22:29:24 2016
Checksum : 34d1544e - correct
Events : 9149455
Layout : near=2
Chunk Size : 512K
Device Role : Active device 2
Array State : .AAAAA ('A' == active, '.' == missing)
/dev/sdd3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3
Name : Vlab-backup:1
Creation Time : Wed May 28 20:44:56 2014
Raid Level : raid10
Raid Devices : 6
Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)
Array Size : 2928419328 (2792.76 GiB 2998.70 GB)
Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 2fd7193f:502b657f:708dc952:7a0a7309
Update Time : Wed Jul 6 22:29:24 2016
Checksum : ce735592 - correct
Events : 9149455
Layout : near=2
Chunk Size : 512K
Device Role : Active device 3
Array State : .AAAAA ('A' == active, '.' == missing)
/dev/sde3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3
Name : Vlab-backup:1
Creation Time : Wed May 28 20:44:56 2014
Raid Level : raid10
Raid Devices : 6
Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)
Array Size : 2928419328 (2792.76 GiB 2998.70 GB)
Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 43d64bab:059237da:63729452:2ce3beda
Update Time : Wed Jul 6 22:29:24 2016
Checksum : 668e7903 - correct
Events : 9149455
Layout : near=2
Chunk Size : 512K
Device Role : Active device 4
Array State : .AAAAA ('A' == active, '.' == missing)
/dev/sdf3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 3d825ac3:a2e5a336:554fa8d8:542297a3
Name : Vlab-backup:1
Creation Time : Wed May 28 20:44:56 2014
Raid Level : raid10
Raid Devices : 6
Avail Dev Size : 1952280576 (930.92 GiB 999.57 GB)
Array Size : 2928419328 (2792.76 GiB 2998.70 GB)
Used Dev Size : 1952279552 (930.92 GiB 999.57 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c243087b:605b0938:f6b1e13b:11cf83ad
Update Time : Wed Jul 6 22:29:24 2016
Checksum : 50334dad - correct
Events : 9149455
Layout : near=2
Chunk Size : 512K
Device Role : Active device 5
Array State : .AAAAA ('A' == active, '.' == missing)
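As a sanity check on the raid10 superblocks, the reported Array Size can be recomputed from Raid Devices, Used Dev Size and the near=2 layout. This is a rough sketch: the helper name `raid10_array_size_kib` is hypothetical, and it assumes Used Dev Size is in 512-byte sectors and Array Size is in KiB, which is consistent with the GiB figures printed next to them.

```python
def raid10_array_size_kib(raid_devices, used_dev_sectors, copies=2):
    """Usable array size in KiB.

    Assumes 512-byte sectors (2 sectors per KiB) and that every block
    is stored `copies` times across the member devices.
    """
    return raid_devices * used_dev_sectors // 2 // copies


if __name__ == "__main__":
    size_kib = raid10_array_size_kib(raid_devices=6, used_dev_sectors=1952279552)
    print(size_kib, "KiB")
    print(round(size_kib / 2**20, 2), "GiB")
```

With the values from the dump (6 devices, 1952279552 sectors each, two copies) the result agrees with the Array Size of 2928419328 shown in each raid10 superblock.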
[Thu Jul 7 15:53:41 2016] ata3: SATA link up 3.0 Gbps (SStatus 123
SControl 300)
[Thu Jul 7 15:53:41 2016] ata3.00: ATA-8: ST1000DM003-1CH162, CC46,
max UDMA/133
[Thu Jul 7 15:53:41 2016] ata3.00: 1953525168 sectors, multi 0: LBA48
NCQ (depth 31/32), AA
[Thu Jul 7 15:53:41 2016] ata3.00: configured for UDMA/133
[Thu Jul 7 15:53:41 2016] scsi 2:0:0:0: Direct-Access ATA
ST1000DM003-1CH1 CC46 PQ: 0 ANSI: 5
[Thu Jul 7 15:53:41 2016] sd 2:0:0:0: [sda] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)
[Thu Jul 7 15:53:41 2016] sd 2:0:0:0: [sda] 4096-byte physical blocks
[Thu Jul 7 15:53:41 2016] sd 2:0:0:0: Attached scsi generic sg2 type 0
[Thu Jul 7 15:53:41 2016] sd 2:0:0:0: [sda] Write Protect is off
[Thu Jul 7 15:53:41 2016] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[Thu Jul 7 15:53:41 2016] sd 2:0:0:0: [sda] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
[Thu Jul 7 15:53:41 2016] scsi 3:0:0:0: Direct-Access ATA
ST1000DM003-9YN1 CC9C PQ: 0 ANSI: 5
[Thu Jul 7 15:53:41 2016] sd 3:0:0:0: [sdb] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)
[Thu Jul 7 15:53:41 2016] sd 3:0:0:0: [sdb] 4096-byte physical blocks
[Thu Jul 7 15:53:41 2016] sd 3:0:0:0: Attached scsi generic sg3 type 0
[Thu Jul 7 15:53:41 2016] sd 3:0:0:0: [sdb] Write Protect is off
[Thu Jul 7 15:53:41 2016] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[Thu Jul 7 15:53:41 2016] sd 3:0:0:0: [sdb] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
[Thu Jul 7 15:53:41 2016] scsi 4:0:0:0: Direct-Access ATA
ST1000NM0011 SN03 PQ: 0 ANSI: 5
[Thu Jul 7 15:53:41 2016] sd 4:0:0:0: [sdc] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)
[Thu Jul 7 15:53:41 2016] sd 4:0:0:0: Attached scsi generic sg4 type 0
[Thu Jul 7 15:53:41 2016] sd 4:0:0:0: [sdc] Write Protect is off
[Thu Jul 7 15:53:41 2016] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[Thu Jul 7 15:53:41 2016] sd 4:0:0:0: [sdc] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
[Thu Jul 7 15:53:41 2016] scsi 5:0:0:0: Direct-Access ATA
ST1000NM0011 SN03 PQ: 0 ANSI: 5
[Thu Jul 7 15:53:41 2016] sd 5:0:0:0: [sdd] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)
[Thu Jul 7 15:53:41 2016] sd 5:0:0:0: Attached scsi generic sg5 type 0
[Thu Jul 7 15:53:41 2016] sd 5:0:0:0: [sdd] Write Protect is off
[Thu Jul 7 15:53:41 2016] sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[Thu Jul 7 15:53:41 2016] sd 5:0:0:0: [sdd] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
[Thu Jul 7 15:53:41 2016] scsi 6:0:0:0: Direct-Access ATA
ST1000DM003-1CH1 CC46 PQ: 0 ANSI: 5
[Thu Jul 7 15:53:41 2016] sd 6:0:0:0: [sde] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)
[Thu Jul 7 15:53:41 2016] sd 6:0:0:0: [sde] 4096-byte physical blocks
[Thu Jul 7 15:53:41 2016] sd 6:0:0:0: Attached scsi generic sg6 type 0
[Thu Jul 7 15:53:41 2016] sd 6:0:0:0: [sde] Write Protect is off
[Thu Jul 7 15:53:41 2016] sd 6:0:0:0: [sde] Mode Sense: 00 3a 00 00
[Thu Jul 7 15:53:41 2016] sd 6:0:0:0: [sde] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
[Thu Jul 7 15:53:41 2016] scsi 7:0:0:0: Direct-Access ATA
ST1000DM003-9YN1 CC9C PQ: 0 ANSI: 5
[Thu Jul 7 15:53:41 2016] sd 7:0:0:0: [sdf] 1953525168 512-byte
logical blocks: (1.00 TB/931 GiB)
[Thu Jul 7 15:53:41 2016] sd 7:0:0:0: [sdf] 4096-byte physical blocks
[Thu Jul 7 15:53:41 2016] sd 7:0:0:0: Attached scsi generic sg7 type 0
[Thu Jul 7 15:53:41 2016] sd 7:0:0:0: [sdf] Write Protect is off
[Thu Jul 7 15:53:41 2016] sd 7:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[Thu Jul 7 15:53:41 2016] sd 7:0:0:0: [sdf] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
[Thu Jul 7 15:53:41 2016] random: nonblocking pool is initialized
[Thu Jul 7 15:53:41 2016] sdd: sdd1 sdd2 sdd3
[Thu Jul 7 15:53:41 2016] sd 5:0:0:0: [sdd] Attached SCSI disk
[Thu Jul 7 15:53:41 2016] sdc: sdc1 sdc2 sdc3
[Thu Jul 7 15:53:41 2016] sd 4:0:0:0: [sdc] Attached SCSI disk
[Thu Jul 7 15:53:41 2016] sdb: sdb1 sdb2 sdb3
[Thu Jul 7 15:53:41 2016] sd 3:0:0:0: [sdb] Attached SCSI disk
[Thu Jul 7 15:53:41 2016] sdf: sdf1 sdf2 sdf3
[Thu Jul 7 15:53:41 2016] sd 7:0:0:0: [sdf] Attached SCSI disk
[Thu Jul 7 15:53:41 2016] sde: sde1 sde2 sde3
[Thu Jul 7 15:53:41 2016] sd 6:0:0:0: [sde] Attached SCSI disk
[Thu Jul 7 15:54:02 2016] ata3.00: qc timeout (cmd 0x47)
[Thu Jul 7 15:54:02 2016] ata3.00: READ LOG DMA EXT failed, trying unqueued
[Thu Jul 7 15:54:02 2016] ata3: failed to read log page 10h (errno=-5)
[Thu Jul 7 15:54:02 2016] ata3.00: exception Emask 0x1 SAct
0x10000000 SErr 0x0 action 0x6 frozen
[Thu Jul 7 15:54:02 2016] ata3.00: irq_stat 0x40000008
[Thu Jul 7 15:54:02 2016] ata3.00: failed command: READ FPDMA QUEUED
[Thu Jul 7 15:54:02 2016] ata3.00: cmd
60/08:e0:a8:6d:70/00:00:74:00:00/40 tag 28 ncq 4096 in
[Thu Jul 7 15:54:02 2016] res
40/00:e0:a8:6d:70/00:00:74:00:00/40 Emask 0x1 (device error)
[Thu Jul 7 15:54:02 2016] ata3.00: status: { DRDY }
[Thu Jul 7 15:54:02 2016] ata3: hard resetting link
[Thu Jul 7 15:54:05 2016] ata3: SATA link up 3.0 Gbps (SStatus 123
SControl 300)
[Thu Jul 7 15:54:05 2016] ata3.00: configured for UDMA/133
[Thu Jul 7 15:54:05 2016] ata3: EH complete
[Thu Jul 7 15:54:06 2016] ata4.00: exception Emask 0x0 SAct 0x8 SErr
0x0 action 0x0
[Thu Jul 7 15:54:06 2016] ata4.00: irq_stat 0x40000008
[Thu Jul 7 15:54:06 2016] ata4.00: failed command: READ FPDMA QUEUED
[Thu Jul 7 15:54:06 2016] ata4.00: cmd
60/08:18:08:f8:0e/00:00:00:00:00/40 tag 3 ncq 4096 in
[Thu Jul 7 15:54:06 2016] res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>
[Thu Jul 7 15:54:06 2016] ata4.00: status: { DRDY ERR }
[Thu Jul 7 15:54:06 2016] ata4.00: error: { UNC }
[Thu Jul 7 15:54:06 2016] ata4.00: configured for UDMA/133
[Thu Jul 7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Jul 7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 Sense Key : Medium
Error [current] [descriptor]
[Thu Jul 7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 Add. Sense:
Unrecovered read error - auto reallocate failed
[Thu Jul 7 15:54:06 2016] sd 3:0:0:0: [sdb] tag#3 CDB: Read(10) 28 00
00 0e f8 08 00 00 08 00
[Thu Jul 7 15:54:06 2016] blk_update_request: I/O error, dev sdb, sector 981001
[Thu Jul 7 15:54:06 2016] ata4: EH complete
[Thu Jul 7 15:54:07 2016] ata4.00: exception Emask 0x0 SAct 0x400
SErr 0x0 action 0x0
[Thu Jul 7 15:54:07 2016] ata4.00: irq_stat 0x40000008
[Thu Jul 7 15:54:07 2016] ata4.00: failed command: READ FPDMA QUEUED
[Thu Jul 7 15:54:07 2016] ata4.00: cmd
60/08:50:08:f8:0e/00:00:00:00:00/40 tag 10 ncq 4096 in
[Thu Jul 7 15:54:07 2016] res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>
[Thu Jul 7 15:54:07 2016] ata4.00: status: { DRDY ERR }
[Thu Jul 7 15:54:07 2016] ata4.00: error: { UNC }
[Thu Jul 7 15:54:07 2016] ata4.00: configured for UDMA/133
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 Sense Key : Medium
Error [current] [descriptor]
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 Add. Sense:
Unrecovered read error - auto reallocate failed
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#10 CDB: Read(10) 28
00 00 0e f8 08 00 00 08 00
[Thu Jul 7 15:54:07 2016] blk_update_request: I/O error, dev sdb, sector 981001
[Thu Jul 7 15:54:07 2016] Buffer I/O error on dev sdb3, logical block
1, async page read
[Thu Jul 7 15:54:07 2016] ata4: EH complete
[Thu Jul 7 15:54:07 2016] ata4.00: exception Emask 0x0 SAct 0x2 SErr
0x0 action 0x0
[Thu Jul 7 15:54:07 2016] ata4.00: irq_stat 0x40000008
[Thu Jul 7 15:54:07 2016] ata4.00: failed command: READ FPDMA QUEUED
[Thu Jul 7 15:54:07 2016] ata4.00: cmd
60/08:08:08:f8:0e/00:00:00:00:00/40 tag 1 ncq 4096 in
[Thu Jul 7 15:54:07 2016] res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>
[Thu Jul 7 15:54:07 2016] ata4.00: status: { DRDY ERR }
[Thu Jul 7 15:54:07 2016] ata4.00: error: { UNC }
[Thu Jul 7 15:54:07 2016] ata4.00: configured for UDMA/133
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 Sense Key : Medium
Error [current] [descriptor]
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 Add. Sense:
Unrecovered read error - auto reallocate failed
[Thu Jul 7 15:54:07 2016] sd 3:0:0:0: [sdb] tag#1 CDB: Read(10) 28 00
00 0e f8 08 00 00 08 00
[Thu Jul 7 15:54:07 2016] blk_update_request: I/O error, dev sdb, sector 981001
[Thu Jul 7 15:54:07 2016] Buffer I/O error on dev sdb3, logical block
1, async page read
[Thu Jul 7 15:54:07 2016] ata4: EH complete
[Thu Jul 7 15:54:08 2016] ata4.00: exception Emask 0x0 SAct 0x20 SErr
0x0 action 0x0
[Thu Jul 7 15:54:08 2016] ata4.00: irq_stat 0x40000008
[Thu Jul 7 15:54:08 2016] ata4.00: failed command: READ FPDMA QUEUED
[Thu Jul 7 15:54:08 2016] ata4.00: cmd
60/08:28:08:f8:0e/00:00:00:00:00/40 tag 5 ncq 4096 in
[Thu Jul 7 15:54:08 2016] res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>
[Thu Jul 7 15:54:08 2016] ata4.00: status: { DRDY ERR }
[Thu Jul 7 15:54:08 2016] ata4.00: error: { UNC }
[Thu Jul 7 15:54:08 2016] ata4.00: configured for UDMA/133
[Thu Jul 7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Jul 7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 Sense Key : Medium
Error [current] [descriptor]
[Thu Jul 7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 Add. Sense:
Unrecovered read error - auto reallocate failed
[Thu Jul 7 15:54:08 2016] sd 3:0:0:0: [sdb] tag#5 CDB: Read(10) 28 00
00 0e f8 08 00 00 08 00
[Thu Jul 7 15:54:08 2016] blk_update_request: I/O error, dev sdb, sector 981001
[Thu Jul 7 15:54:08 2016] Buffer I/O error on dev sdb3, logical block
1, async page read
[Thu Jul 7 15:54:08 2016] ata4: EH complete
[Thu Jul 7 15:54:09 2016] ata4.00: exception Emask 0x0 SAct 0x4000000
SErr 0x0 action 0x0
[Thu Jul 7 15:54:09 2016] ata4.00: irq_stat 0x40000008
[Thu Jul 7 15:54:09 2016] ata4.00: failed command: READ FPDMA QUEUED
[Thu Jul 7 15:54:09 2016] ata4.00: cmd
60/08:d0:08:f8:0e/00:00:00:00:00/40 tag 26 ncq 4096 in
[Thu Jul 7 15:54:09 2016] res
41/40:08:09:f8:0e/00:00:00:00:00/00 Emask 0x409 (media error) <F>
[Thu Jul 7 15:54:09 2016] ata4.00: status: { DRDY ERR }
[Thu Jul 7 15:54:09 2016] ata4.00: error: { UNC }
[Thu Jul 7 15:54:09 2016] ata4.00: configured for UDMA/133
[Thu Jul 7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 FAILED Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Jul 7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 Sense Key : Medium
Error [current] [descriptor]
[Thu Jul 7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 Add. Sense:
Unrecovered read error - auto reallocate failed
[Thu Jul 7 15:54:09 2016] sd 3:0:0:0: [sdb] tag#26 CDB: Read(10) 28
00 00 0e f8 08 00 00 08 00
[Thu Jul 7 15:54:09 2016] blk_update_request: I/O error, dev sdb, sector 981001
[Thu Jul 7 15:54:09 2016] Buffer I/O error on dev sdb3, logical block
1, async page read
[Thu Jul 7 15:54:09 2016] ata4: EH complete
[Thu Jul 7 15:54:09 2016] ata4.00: exception Emask 0x0 SAct
0x40000000 SErr 0x0 action 0x0
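The repeated `CDB: Read(10)` lines in the log above can be decoded to see which blocks the kernel was asking for. This is a small sketch assuming the standard 10-byte READ(10) layout (opcode 0x28, big-endian LBA in bytes 2 to 5, big-endian transfer length in bytes 7 to 8); `decode_read10` is an illustrative name, not an existing tool.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as hex bytes; return (lba, block_count)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return lba, blocks


if __name__ == "__main__":
    lba, blocks = decode_read10("28 00 00 0e f8 08 00 00 08 00")
    print(f"read of {blocks} blocks starting at LBA {lba}")
```

For the CDB logged here the request covers 8 sectors starting at LBA 981000, so the failing sector 981001 reported by `blk_update_request` falls inside that one 4096-byte read.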
Much appreciated,
Amit Biswas
* Re: Recovering RAID Volumes from 6 Disks
From: Wols Lists @ 2016-07-19 19:57 UTC
To: Amit Biswas, linux-raid
Another bit of useful information - can you post the output of smartctl
on all your drives?
smartctl -x /dev/sd[a,b,c...]
Seeing as the drives are Seagate 1TB drives, I suspect they do support
ERC and timeout mismatch is not the problem, but this will tell us.
I'll let others chime in with recovery info, but this information will
definitely help them.
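Once the smartctl text has been saved, the ERC question can be answered by looking for the "SCT Error Recovery Control supported." line in the "SCT capabilities" section. The sketch below assumes that line is the indicator; the helper name `drives_without_erc` is made up, and the two sample strings are shortened from the sdb and sdc reports posted later in this thread. On a live system, `smartctl -l scterc /dev/sdX` queries the setting directly.

```python
ERC_LINE = "SCT Error Recovery Control supported"


def drives_without_erc(reports):
    """Return device names whose saved smartctl text does not advertise SCT ERC."""
    return sorted(dev for dev, text in reports.items() if ERC_LINE not in text)


# Shortened "SCT capabilities" sections from the reports in this thread.
REPORTS = {
    "/dev/sdb": "SCT capabilities: (0x3081) SCT Status supported.",
    "/dev/sdc": (
        "SCT capabilities: (0x10bd) SCT Status supported.\n"
        "SCT Error Recovery Control supported.\n"
        "SCT Feature Control supported.\n"
        "SCT Data Table supported."
    ),
}

if __name__ == "__main__":
    print(drives_without_erc(REPORTS))
```

Any drive returned by this check is a candidate for the timeout mismatch problem and deserves a closer look.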
Cheers,
Wol
On 19/07/16 17:29, Amit Biswas wrote:
> Greetings!
>
> Backup server was acting up and the issue was the drives (all of them)
> :( Could use some guidance or verdict.
>
> It has a total of 6 drives: sda,b,c,d,e,f. From the superblock info
> (attached), there is a raid 1, and a raid 10 volume. Problem is all
> the disks are part of both raid volumes (according to superblock).
>
> I am currently booted into an ubuntu live disk shell.
>
* Re: Recovering RAID Volumes from 6 Disks
From: Amit Biswas @ 2016-07-19 22:34 UTC
To: Wols Lists; +Cc: linux-raid
Here are the SMART reports for all six drives. Drive sda was not co-operating...
/dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: /2:0:0:0
Product:
User Capacity: 600,332,565,813,390,450 bytes [600 PB]
Logical block size: 774843950 bytes
>> Terminate command early due to bad response to IEC mode page
=== START OF READ SMART DATA SECTION ===
Error Counter logging not supported
Device does not support Self Test logging
/dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST1000DM003-9YN162
Serial Number: Z1D05TKG
LU WWN Device Id: 5 000c50 03ec23134
Firmware Version: CC9C
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 7 18:58:22 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 116) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3081) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always   -           205143226
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           152
  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always   -           7808
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always   -           4573741014
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always   -           29429
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           151
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always   -           0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           4 5 5
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   069   035   045    Old_age   Always   In_the_past 31 (0 101 31 30 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           149
193 Load_Cycle_Count        0x0032   087   087   000    Old_age   Always   -           26214
194 Temperature_Celsius     0x0022   031   065   000    Old_age   Always   -           31 (0 14 0 0 0)
197 Current_Pending_Sector  0x0012   080   080   000    Old_age   Always   -           3359
198 Offline_Uncorrectable   0x0010   080   080   000    Old_age   Offline  -           3359
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           15576h+39m+12.722s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline  -           42665052319107
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline  -           234087115868324
SMART Error Log Version: 1
ATA Error Count: 342 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 342 occurred at disk power-on lifetime: 29429 hours (1226 days + 5 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 09 f8 0e 00 Error: UNC at LBA = 0x000ef809 = 981001
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 08 f8 0e 40 00 02:57:13.845 READ FPDMA QUEUED
60 00 08 00 f8 0e 40 00 02:57:13.845 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:57:13.844 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:57:13.844 READ FPDMA QUEUED
b0 da 00 00 4f c2 00 00 02:49:00.670 SMART RETURN STATUS
Error 341 occurred at disk power-on lifetime: 29429 hours (1226 days + 5 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 09 f8 0e 00 Error: UNC at LBA = 0x000ef809 = 981001
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 08 f8 0e 40 00 02:57:13.845 READ FPDMA QUEUED
60 00 08 00 f8 0e 40 00 02:57:13.845 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:57:13.844 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:57:13.844 READ FPDMA QUEUED
b0 da 00 00 4f c2 00 00 02:49:00.670 SMART RETURN STATUS
Error 340 occurred at disk power-on lifetime: 29428 hours (1226 days + 4 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 09 f8 0e 00 Error: UNC at LBA = 0x000ef809 = 981001
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 08 f8 0e 40 00 01:36:54.470 READ FPDMA QUEUED
60 00 08 00 f8 0e 40 00 01:36:54.470 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 01:36:54.470 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 01:36:54.469 READ FPDMA QUEUED
60 00 08 08 10 00 40 00 01:36:12.430 READ FPDMA QUEUED
Error 339 occurred at disk power-on lifetime: 29428 hours (1226 days + 4 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 09 f8 0e 00 Error: UNC at LBA = 0x000ef809 = 981001
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 08 f8 0e 40 00 01:36:54.470 READ FPDMA QUEUED
60 00 08 00 f8 0e 40 00 01:36:54.470 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 01:36:54.470 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 01:36:54.469 READ FPDMA QUEUED
60 00 08 08 10 00 40 00 01:36:12.430 READ FPDMA QUEUED
Error 338 occurred at disk power-on lifetime: 29427 hours (1226 days + 3 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 09 f8 0e 00 Error: UNC at LBA = 0x000ef809 = 981001
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 08 f8 0e 40 00 01:28:40.063 READ FPDMA QUEUED
60 00 08 08 00 00 40 00 01:28:37.685 READ FPDMA QUEUED
60 00 08 08 08 00 40 00 01:28:37.685 READ FPDMA QUEUED
60 00 08 08 10 00 40 00 01:28:37.684 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 01:28:37.684 SET FEATURES [Enable SATA feature]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error  00%        1                -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
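To make the sdb numbers easier to compare with the other drives, the raw values of a few attributes can be checked for anything nonzero. This is a rough triage sketch, not a health verdict: the choice of attributes 5, 197, 198 (commonly associated with failing media) and 199 (commonly associated with cabling or link problems) is an assumption, and `flag_attributes` is an illustrative name. The raw values are copied by hand from the reports in this thread.

```python
# Attributes watched in this sketch (an assumption, not a complete list).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def flag_attributes(raw_values):
    """Return {attribute name: raw value} for watched attributes that are nonzero."""
    return {
        WATCHED[attr_id]: raw
        for attr_id, raw in raw_values.items()
        if attr_id in WATCHED and raw > 0
    }


# Raw values from the /dev/sdb report above.
SDB = {5: 7808, 197: 3359, 198: 3359, 199: 0}
# Raw values from the /dev/sdc report that follows.
SDC = {5: 0, 197: 0, 198: 0, 199: 66}

if __name__ == "__main__":
    print("sdb:", flag_attributes(SDB))
    print("sdc:", flag_attributes(SDC))
```

Note that sdb still reports an overall result of PASSED despite thousands of reallocated and pending sectors, so the raw counters are worth reading on their own.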
/dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Constellation ES (SATA 6Gb/s)
Device Model: ST1000NM0011
Serial Number: Z1N4DQG8
LU WWN Device Id: 5 000c50 064169d91
Firmware Version: SN03
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7202 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 7 18:58:31 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 149) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   069   064   044    Pre-fail  Always   -           10352612
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           21
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always   -           129759408
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always   -           17545
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           21
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always   -           3
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   069   062   045    Old_age   Always   -           31 (Min/Max 30/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           16
193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always   -           9383
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always   -           31 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   105   100   000    Old_age   Always   -           10352612
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always   -           66
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
/dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Constellation ES (SATA 6Gb/s)
Device Model: ST1000NM0011
Serial Number: Z1N4DQG8
LU WWN Device Id: 5 000c50 064169d91
Firmware Version: SN03
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7202 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 7 18:58:31 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 149) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   069   064   044    Pre-fail  Always       -       10352612
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       129759408
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       17545
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       3
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   062   045    Old_age   Always       -       31 (Min/Max 30/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always       -       9383
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   105   100   000    Old_age   Always       -       10352612
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       66
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
/dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Constellation ES (SATA 6Gb/s)
Device Model: ST1000NM0011
Serial Number: Z1N4DX3G
LU WWN Device Id: 5 000c50 06416d153
Firmware Version: SN03
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7202 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 7 18:58:38 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 609) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 153) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   078   063   044    Pre-fail  Always       -       59676424
  3 Spin_Up_Time            0x0003   096   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       64
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       202527202
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23267
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       60
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   032   045    Old_age   Always   In_the_past 33 (0 111 33 31 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       51
193 Load_Cycle_Count        0x0032   095   095   000    Old_age   Always       -       10161
194 Temperature_Celsius     0x0022   033   068   000    Old_age   Always       -       33 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   114   099   000    Old_age   Always       -       59676424
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%              478  -
# 2  Extended offline    Aborted by host                80%              462  -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
/dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST1000DM003-1CH162
Serial Number: S1D8EGH8
LU WWN Device Id: 5 000c50 05c135595
Firmware Version: CC46
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 7 18:58:46 2016 UTC
==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 115) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       203719328
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       122
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       156895542
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       28487
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       121
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   033   045    Old_age   Always   In_the_past 29 (0 200 29 27 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       119
193 Load_Cycle_Count        0x0032   095   095   000    Old_age   Always       -       11202
194 Temperature_Celsius     0x0022   029   067   000    Old_age   Always       -       29 (0 11 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       27611h+29m+28.145s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       10984310419
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       42457231761
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
/dev/sdf
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST1000DM003-9YN162
Serial Number: Z1D04N3L
LU WWN Device Id: 5 000c50 03633f4d6
Firmware Version: CC9C
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Jul 7 18:58:53 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 584) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 115) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3081) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   089   087   006    Pre-fail  Always       -       107847548
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       148
  5 Reallocated_Sector_Ct   0x0033   072   051   036    Pre-fail  Always       -       37616
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       235958313
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       30474
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       147
183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       587
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       13 13 13
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   034   045    Old_age   Always   In_the_past 31 (0 102 31 29 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       145
193 Load_Cycle_Count        0x0032   088   088   000    Old_age   Always       -       24193
194 Temperature_Celsius     0x0022   031   066   000    Old_age   Always       -       31 (0 13 0 0 0)
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       33664
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       33664
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       29478h+42m+45.934s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       35126025198006
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       233821549666301
SMART Error Log Version: 1
ATA Error Count: 561 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 561 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 12d+01:32:01.326 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 12d+01:32:01.325 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 12d+01:32:01.325 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 12d+01:32:01.325 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 12d+01:32:01.325 READ FPDMA QUEUED
Error 560 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 12d+01:31:16.334 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 12d+01:31:15.803 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 12d+01:31:13.683 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 12d+01:31:13.683 READ FPDMA QUEUED
61 00 08 ff ff ff 4f 00 12d+01:31:13.683 WRITE FPDMA QUEUED
Error 559 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 12d+01:31:10.402 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 12d+01:31:07.982 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 12d+01:31:07.982 READ FPDMA QUEUED
61 00 08 ff ff ff 4f 00 12d+01:31:07.982 WRITE FPDMA QUEUED
ef 10 02 00 00 00 a0 00 12d+01:31:07.922 SET FEATURES [Enable SATA feature]
Error 558 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 12d+01:31:04.755 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 12d+01:31:04.755 READ FPDMA QUEUED
61 00 08 ff ff ff 4f 00 12d+01:31:04.755 WRITE FPDMA QUEUED
ef 10 02 00 00 00 a0 00 12d+01:31:04.694 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 12d+01:31:04.694 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 557 occurred at disk power-on lifetime: 28893 hours (1203 days + 21 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 12d+01:31:01.457 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 12d+01:31:01.457 READ FPDMA QUEUED
61 00 08 ff ff ff 4f 00 12d+01:31:01.457 WRITE FPDMA QUEUED
ef 10 02 00 00 00 a0 00 12d+01:31:01.444 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 12d+01:31:01.444 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%                1  -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Amit Biswas
Lab Manager
CSE Department
NYU Tandon School Of Engineering
Office: 1-646-997-3023
On Tue, Jul 19, 2016 at 3:57 PM, Wols Lists <antlists@youngman.org.uk> wrote:
> Another bit of useful information - can you post the output of smartctl
> on all your drives?
>
> smartctl -x /dev/sd[a,b,c...]
>
> Seeing as the drives are Seagate 1TB drives, I suspect they do support
> ERC and timeout mismatch is not the problem, but this will tell us.
>
> I'll let others chime in with recovery info, but this information will
> definitely help them.
>
> Cheers,
> Wol
>
> On 19/07/16 17:29, Amit Biswas wrote:
>> Greetings!
>>
>> Backup server was acting up and the issue was the drives (all of them)
>> :( Could use some guidance or verdict.
>>
>> It has a total of 6 drives: sda,b,c,d,e,f. From the superblock info
>> (attached), there is a raid 1, and a raid 10 volume. Problem is all
>> the disks are part of both raid volumes (according to superblock).
>>
>> I am currently booted into an ubuntu live disk shell.
>>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Recovering RAID Volumes from 6 Disks
2016-07-19 22:34 ` Amit Biswas
@ 2016-07-20 14:31 ` Wols Lists
2016-07-20 14:53 ` Wols Lists
2016-07-20 15:07 ` Roman Mamedov
0 siblings, 2 replies; 9+ messages in thread
From: Wols Lists @ 2016-07-20 14:31 UTC (permalink / raw)
To: Amit Biswas; +Cc: linux-raid
Ummmmm ...
sdb, sde and sdf are Barracudas ... I know my 3TB Barracudas are vulnerable to
the timeout problem. It looks like the 1TB ones probably are as well ...
While you're waiting for someone else to chime in, read the following
... not the best reading ... about why your Barracudas are probably a
bad choice :-(
http://marc.info/?l=linux-raid&m=139050322510249&w=2
http://marc.info/?l=linux-raid&m=135863964624202&w=2
http://marc.info/?l=linux-raid&m=135811522817345&w=1
http://marc.info/?l=linux-raid&m=133761065622164&w=2
http://marc.info/?l=linux-raid&m=132477199207506
http://marc.info/?l=linux-raid&m=133665797115876&w=2
http://marc.info/?l=linux-raid&m=142487508806844&w=3
http://marc.info/?l=linux-raid&m=144535576302583&w=2
Have you got a spare Constellation lying around? If not, can you get a
proper raid drive - WD Red or Seagate NAS? Do a ddrescue to copy sda to
the replacement drive if you can. You don't want to use that to recover
the array if you can help it, but you might not have much choice, and at
least you'll have it to hand.
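
A rough sketch of that ddrescue step, assuming GNU ddrescue is installed. The replacement device name and the map-file path are placeholders, not names from this thread. The function only prints the two commands so the device names can be checked against lsblk before anything is run as root:

```shell
# Print (not run) a two-pass GNU ddrescue plan.
# $1 = failing source, $2 = replacement drive, $3 = map file (all caller-supplied).
ddrescue_plan() {
    src="$1"; dst="$2"; map="$3"
    # Pass 1: grab the easily readable areas, skipping the slow scraping phase (-n).
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas with direct disc access (-d) and 3 retry passes (-r3).
    echo "ddrescue -f -d -r3 $src $dst $map"
}
```

Something like "ddrescue_plan /dev/sda /dev/sdX /root/sda.map" would print the plan. Keeping the same map file for both passes is what lets the second pass resume where the first left off.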
And do NOT do this until the experts chime in and help, but hopefully
it's just a case of making sure all your arrays are stopped, running the
following script
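for x in /sys/block/sd[a-z] ; do
    # kernel command timeout in seconds, longer than the drive's own retries
    echo 180 > "$x/device/timeout"
done
# stripe_cache_size only exists on an assembled raid4/5/6 array, so guard it
f=/sys/block/md0/md/stripe_cache_size
[ -w "$f" ] && echo 4096 > "$f"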
on the barracudas and re-assembling the array(s). At which point,
backing up and replacing the barracudas should be extremely high on the
agenda! It's probably a good idea to go Raid-6 and get 2 or 3TB drives.
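
The script above treats every drive alike. A slightly more selective sketch follows, assuming "smartctl -c" prints the "SCT Error Recovery Control supported." line seen in the Constellation reports and omits it for the Barracudas. The function names and the optional sysfs-root argument are illustrative additions, not part of the original script:

```shell
# Succeeds if the smartctl capability text on stdin advertises SCT ERC.
has_erc() {
    grep -q 'SCT Error Recovery Control supported'
}

# Write a command timeout (seconds) for every sd? disk under a sysfs root.
# $1 = seconds, $2 = sysfs block directory (defaults to /sys/block).
set_disk_timeouts() {
    secs="$1"; root="${2:-/sys/block}"
    for dev in "$root"/sd[a-z]; do
        # Skip entries without a writable timeout file (or an unmatched glob).
        [ -w "$dev/device/timeout" ] || continue
        echo "$secs" > "$dev/device/timeout"
    done
}
```

Usage would be along the lines of "smartctl -c /dev/sde | has_erc || echo 'sde: no ERC'" to see which drives need the long timeout, then "set_disk_timeouts 180" as root. The setting does not survive a reboot, so it has to be reapplied on every boot.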
Cheers,
Wol
On 19/07/16 23:34, Amit Biswas wrote:
> Here are the smart reports for all six drives. drive sda was not co-operating...
>
> /dev/sda
>
> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-27-generic] (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Vendor: /2:0:0:0
> Product:
> User Capacity: 600,332,565,813,390,450 bytes [600 PB]
> Logical block size: 774843950 bytes
>>> Terminate command early due to bad response to IEC mode page
>
> === START OF READ SMART DATA SECTION ===
>
> Error Counter logging not supported
>
> Device does not support Self Test logging
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Recovering RAID Volumes from 6 Disks
2016-07-20 14:31 ` Wols Lists
@ 2016-07-20 14:53 ` Wols Lists
2016-07-20 15:07 ` Roman Mamedov
1 sibling, 0 replies; 9+ messages in thread
From: Wols Lists @ 2016-07-20 14:53 UTC (permalink / raw)
To: Amit Biswas; +Cc: linux-raid
Looking back at your first post, what I think has happened is that sda
has failed, and sdb has fallen foul of the timeout problem.
IFF I'm right, getting your array back shouldn't be too hard.
Cheers,
Wol
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Recovering RAID Volumes from 6 Disks
2016-07-20 14:31 ` Wols Lists
2016-07-20 14:53 ` Wols Lists
@ 2016-07-20 15:07 ` Roman Mamedov
2016-07-20 15:36 ` Wols Lists
1 sibling, 1 reply; 9+ messages in thread
From: Roman Mamedov @ 2016-07-20 15:07 UTC (permalink / raw)
To: Wols Lists; +Cc: Amit Biswas, linux-raid
On Wed, 20 Jul 2016 15:31:09 +0100
Wols Lists <antlists@youngman.org.uk> wrote:
> backing up and replacing the barracudas
Yeah especially the sdb and sdf ones, which are failing HARD right now.
5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 7808
197 Current_Pending_Sector 0x0012 080 080 000 Old_age Always - 3359
198 Offline_Uncorrectable 0x0010 080 080 000 Old_age Offline - 3359
5 Reallocated_Sector_Ct 0x0033 072 051 036 Pre-fail Always - 37616
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 587
197 Current_Pending_Sector 0x0012 001 001 000 Old_age Always - 33664
198 Offline_Uncorrectable 0x0010 001 001 000 Old_age Offline - 33664
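
For checking the remaining drives the same way, a small sketch that picks these attributes out of "smartctl -A" output. It assumes the one-line-per-attribute layout quoted above, with the raw value in the tenth column, and the function name is made up for illustration:

```shell
# Read `smartctl -A` output on stdin; print name and raw value for the
# sector-health attributes (IDs 5, 187, 197, 198) whose raw value is non-zero.
failing_attrs() {
    awk '($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) && ($10 + 0) > 0 { print $2, $10 }'
}
```

Run as "smartctl -A /dev/sdf | failing_attrs". A healthy drive prints nothing.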
--
With respect,
Roman
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Recovering RAID Volumes from 6 Disks
2016-07-20 15:07 ` Roman Mamedov
@ 2016-07-20 15:36 ` Wols Lists
2016-07-20 16:10 ` Wols Lists
0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-07-20 15:36 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Amit Biswas, linux-raid
On 20/07/16 16:07, Roman Mamedov wrote:
> On Wed, 20 Jul 2016 15:31:09 +0100 Wols Lists
> <antlists@youngman.org.uk> wrote:
>
>> backing up and replacing the barracudas
>
> Yeah especially the sdb and sdf ones, which are failing HARD right
> now.
>
> 5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 7808
> 197 Current_Pending_Sector 0x0012 080 080 000 Old_age Always - 3359
> 198 Offline_Uncorrectable 0x0010 080 080 000 Old_age Offline - 3359
>
> 5 Reallocated_Sector_Ct 0x0033 072 051 036 Pre-fail Always - 37616
> 187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 587
> 197 Current_Pending_Sector 0x0012 001 001 000 Old_age Always - 33664
> 198 Offline_Uncorrectable 0x0010 001 001 000 Old_age Offline - 33664
>
OUCH!
Okay, and I don't like recommending stuff because I'm not an expert,
but you have 6 x 1TB drives, raid-10. Does that give you 1.5TB of
usable space, or 3TB? Never mind. I'm going to recommend getting 4 x
3TB drives at about £100 each - not nice. But you only need one to
start with.
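
On the 1.5TB-or-3TB question, the arithmetic can be sketched as below. It assumes the array was built with mdadm's default raid10 layout (near, two copies of each block), which nothing in this thread confirms. The function name is illustrative:

```shell
# Usable capacity of an md raid10 array: drives * size / copies.
# $1 = number of drives, $2 = size per drive in GB, $3 = copies per block (default 2).
raid10_usable_gb() {
    echo $(( $1 * $2 / ${3:-2} ))
}

raid10_usable_gb 6 1000   # prints 3000
```

So under that assumption six 1TB drives give roughly 3TB usable. Three copies per block would give 2TB.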
Get that first 3TB drive. NOW. Physically replace sda in the machine,
and configure it as a single-drive mirror (mdadm --create --level=1
--raid-devices=2 with the new drive and "missing" as the second device).
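
A minimal sketch of that single-drive mirror step. In mdadm the member count is given with --raid-devices, and the keyword "missing" holds the empty second slot. The names /dev/md10 and /dev/sda1 are placeholders I made up, not taken from the thread, and the helper only prints the command so it can be checked before anyone runs it as root.

```shell
# Print (do not run) the command for a two-slot RAID1 with one slot empty.
degraded_mirror_cmd() {
    md_dev=$1    # new array device, placeholder such as /dev/md10
    member=$2    # partition on the new 3TB drive, placeholder such as /dev/sda1
    printf 'mdadm --create %s --level=1 --raid-devices=2 %s missing\n' \
        "$md_dev" "$member"
}

degraded_mirror_cmd /dev/md10 /dev/sda1
```

Creating the array with the slot already reserved means the second drive can later be added with a plain --add and will sync in as the other half.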
Boot your system, run that timeout script, and try to assemble your
array with --scan --assemble --force. That SHOULD be safe. Read up and
make certain - I accept no responsibility for your data ...
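
Roughly what that step looks like, as a sketch only. It assumes the timeout script is the usual one that raises the kernel's per-device command timeout to 180 seconds (an assumption on my part, check it against the script you were actually given), and the disk names are examples. The helper prints the commands rather than running them.

```shell
# Print the timeout writes for each listed disk, then the forced assembly.
recovery_prep_cmds() {
    for disk in "$@"; do
        # 180 seconds is the commonly suggested value; treat it as an assumption.
        printf 'echo 180 > /sys/block/%s/device/timeout\n' "$disk"
    done
    printf 'mdadm --assemble --scan --force\n'
}

recovery_prep_cmds sdb sdc sdd sde sdf
```

The timeout has to be set again after every boot, which is why it goes before the assemble each time.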
If that works, you can now mount your array(s). READ ONLY.
Now copy your data across to the new drive - use something like rsync
or cp and keep a log - there's a high probability you'll get read
errors, and you don't want this to crash the copy and leave it only
partly complete, and you also want to know what failed.
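
A sketch of the read-only mount and logged copy, assuming rsync is the tool chosen. The device, mount points and log path are all hypothetical. rsync normally carries on past files it cannot read and finishes with exit code 23 (partial transfer due to error), so the second helper turns the exit code into a hint about what to do next.

```shell
# Print the mount and copy commands; nothing is executed here.
copy_cmds() {
    src_md=$1; old_mnt=$2; new_mnt=$3; log=$4
    printf 'mount -o ro %s %s\n' "$src_md" "$old_mnt"
    printf 'rsync -a --log-file=%s %s/ %s/\n' "$log" "$old_mnt" "$new_mnt"
}

# Interpret the exit code rsync returned.
rsync_outcome() {
    case $1 in
        0)  echo 'complete' ;;
        23) echo 'partial: some files failed, check the log' ;;
        *)  echo "stopped: rsync exit code $1" ;;
    esac
}

copy_cmds /dev/md1 /mnt/old /mnt/new /root/rsync-copy.log
rsync_outcome 23
```

The log file is what tells you afterwards which files hit read errors and need restoring from somewhere else.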
You can now bring the system up on the new drive.
DUMP THE BARRACUDAS - ALL OF THEM. Two are failing, and the third one
is probably no better - it's not worth risking your data. The
Constellations are probably okay as backup drives - it's a couple of
quid for an enclosure to turn them into USB drives :-)
As soon as you can, get the other three 3TB drives. The first of these
is urgent - your system will be running on a degraded mirror and you
need to fix that asap. The second drive will convert your mirror to
raid5, and the last one will convert it to raid6.
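
To see what each of those stages buys, here is the usable-space arithmetic as a small sketch, assuming equal 3TB drives and the standard rules (a mirror holds one drive's worth, raid5 loses one drive to parity, raid6 loses two). The function name usable_tb is made up.

```shell
# Usable capacity in TB for a given level, drive count and per-drive size.
usable_tb() {
    level=$1; drives=$2; size_tb=$3
    case $level in
        raid1) echo "$size_tb" ;;
        raid5) echo $(( (drives - 1) * size_tb )) ;;
        raid6) echo $(( (drives - 2) * size_tb )) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

# The three stages described above, with 3TB drives.
for step in "raid1 2" "raid5 3" "raid6 4"; do
    set -- $step
    printf '%s with %s drives: %s TB usable\n' "$1" "$2" "$(usable_tb "$1" "$2" 3)"
done
```

So the raid5 step doubles the space to 6TB, and the raid6 step adds no space but lets the array survive two drive failures instead of one.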
NB - I can't remember - is your boot/system partition on these drives?
You're better off running that as a mirror regardless, so if so, split
the 3TB drives into a small boot/system partition and a large data
partition, raid6 the data as you get the drives, and raid1 the
boot/system across all four drives (install grub on all four, too).
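
A sketch of the "grub on all four" part, so the machine can still boot whichever drive dies. The drive names are examples only, and the helper just prints one grub-install line per drive.

```shell
# Print a grub-install command for every drive carrying a boot mirror member.
grub_install_cmds() {
    for disk in "$@"; do
        printf 'grub-install %s\n' "$disk"
    done
}

grub_install_cmds /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Note that grub goes on the whole drive (for example /dev/sda), not on the partition that holds the raid1 member.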
Cheers,
Wol
* Re: Recovering RAID Volumes from 6 Disks
From: Wols Lists @ 2016-07-20 16:10 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Amit Biswas, linux-raid
On 20/07/16 16:36, Wols Lists wrote:
> Get that first 3TB drive. NOW. Physically replace sda in the machine,
> and configure it as a single-drive mirror (mdadm --create --level=1
> --raid-devices=2 with the new drive and "missing" as the second device).
Just noticed your edu address. If Phil Turmel chimes in, he'll tell me
off for telling you to spend money :-)
If you are an impecunious student, and your data will fit on 2TB, then
beg, borrow or steal :-) a 2TB drive.
Use that as your backup, set up as a single-drive mirror as I said, and
then you can combine the two Constellations into a 2TB raid0 and add that
in as the second half of your mirror. That will at least give you a
working, safe raid system.
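
Sketched as commands, assuming the single-drive mirror already exists. Every device name here is a placeholder (/dev/md10 for the mirror, /dev/md20 for the new stripe, /dev/sdX1 and /dev/sdY1 for the two 1TB drives), and the helper only prints what would be run.

```shell
# Print the two steps: build the raid0, then add it to the degraded mirror.
raid0_half_cmds() {
    mirror=$1; stripe=$2; part_a=$3; part_b=$4
    printf 'mdadm --create %s --level=0 --raid-devices=2 %s %s\n' \
        "$stripe" "$part_a" "$part_b"
    printf 'mdadm %s --add %s\n' "$mirror" "$stripe"
}

raid0_half_cmds /dev/md10 /dev/md20 /dev/sdX1 /dev/sdY1
```

The stripe has to be at least as large as the mirror's existing member, otherwise the add will be refused.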
Cheers,
Wol
* Re: Recovering RAID Volumes from 6 Disks
From: Phil Turmel @ 2016-07-22 13:56 UTC (permalink / raw)
To: Wols Lists, Roman Mamedov; +Cc: Amit Biswas, linux-raid
On 07/20/2016 12:10 PM, Wols Lists wrote:
> On 20/07/16 16:36, Wols Lists wrote:
>> Get that first 3TB drive. NOW. Physically replace sda in the machine,
>> and configure it as a single-drive mirror (mdadm --create --level=1
>> --raid-devices=2 with the new drive and "missing" as the second device).
>
> Just noticed your edu address. If Phil Turmel chimes in, he'll tell me
> off for telling you to spend money :-)
No, not this time. :-)
I would also note that the raid1 composed of the six sd?2 partitions is
only operating as a two-copy mirror -- the other four devices are
spares. Whoever created that array added the extra drives but never
used --grow to set the number of mirrors to six.
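
For reference, the step that was skipped would have looked something like this sketch. /dev/md0 is an assumed name for the raid1 (a live disk may well number it differently), and the helper only prints the command. It shows what was missed, not something to run on the failing drives now.

```shell
# Print the command that raises the number of active mirrors in a raid1.
grow_mirror_cmd() {
    md_dev=$1; copies=$2
    printf 'mdadm --grow %s --raid-devices=%s\n' "$md_dev" "$copies"
}

grow_mirror_cmd /dev/md0 6
```

With the count raised, the spares would have been synced in as full copies instead of sitting idle.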
Amit, as soon as you get your arrays assembled with the --force option,
follow Wol's advice on rsync'ing to a new array. I wouldn't advise
trying to fix the UREs on your existing arrays since there are so many
that you'd certainly trip the 10/hr read error limit in MD.
Just copying data may also trip the read error rate limit, but you have
no choice. If it happens and kicks out any array devices, simply stop
the rsync, stop and re-assemble the array (with --force), and continue
with rsync.
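
A sketch of that stop, re-assemble and continue cycle, to be used once rsync itself has been stopped. The device, mount points and log path are hypothetical, and the array may come back under a different md name after assembly, so check /proc/mdstat before mounting. The helper prints the sequence rather than running it.

```shell
# Print the sequence for resuming the copy after a device was kicked out.
resume_copy_cmds() {
    md_dev=$1; old_mnt=$2; new_mnt=$3; log=$4
    printf 'umount %s\n' "$old_mnt"
    printf 'mdadm --stop %s\n' "$md_dev"
    printf 'mdadm --assemble --scan --force\n'
    printf 'mount -o ro %s %s\n' "$md_dev" "$old_mnt"
    printf 'rsync -a --log-file=%s %s/ %s/\n' "$log" "$old_mnt" "$new_mnt"
}

resume_copy_cmds /dev/md1 /mnt/old /mnt/new /root/rsync-copy.log
```

Re-running the same rsync command is cheap: with -a it skips files whose size and modification time already match on the destination, so only the remainder is copied.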
The Constellations can probably be recycled into spares or backup
devices after you copy your data. I didn't look closely at their stats.
Phil