* Help with failed RAID-5 -> 6 migration
@ 2013-06-08 3:02 Keith Phillips
2013-06-08 22:43 ` Phil Turmel
2013-06-08 23:02 ` Phil Turmel
0 siblings, 2 replies; 10+ messages in thread
From: Keith Phillips @ 2013-06-08 3:02 UTC
To: linux-raid
Hi,
I have a problem. I'm worried I may have borked my array :/
I've been running a 3x2TB RAID-5 array and I recently got another 2TB
drive, intending to bump it up to a 4x2TB RAID-6 array.
I stuck the new disk in and added it to the RAID array, as follows
("/files" is on a non-RAID disk):
    mdadm --manage /dev/md0 --add /dev/sda
    mdadm --grow /dev/md0 --raid-devices 4 --level 6 \
        --backup-file=/files/mdadm-backup
It seemed to work and the grow process started okay, reporting about 3
days to completion (at ~8MB/s) which seemed really slow, but I left it
anyway. Next morning, time to complete was several years and the
kernel had spat out a bunch of I/O errors (lost those logs, sorry).
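As a rough sanity check on that first estimate, here is a sketch that assumes the ETA is simply the size of one member device divided by the reported rate. The helper name `eta_hours` is made up for illustration, the byte count is the capacity of one 2 TB drive, and 8 MB/s is taken as 8,000,000 bytes per second:

```shell
# Hypothetical helper: whole hours needed to traverse one member device
# of the given size (bytes) at the given rate (bytes per second).
eta_hours() {
    echo $(( $1 / $2 / 3600 ))
}

# One 2 TB member at ~8 MB/s (decimal units assumed).
eta_hours 2000398934016 8000000
```

Under those assumptions this prints 69, i.e. just under three days, which is consistent with the initial report.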
I figured the new disk must be at fault, because I'd done an array
check recently and the others seemed okay. Hoping it might abort the
grow, I failed the new disk:
mdadm --manage /dev/md0 --fail /dev/sda
But mdadm kept reporting years to completion. So I rebooted.
Now I'd like to know - what state is my array in? If possible I'd like
to get back to a working 3 disk RAID-5 configuration while I test the
new disk and figure out what to do with it.
The backup-file doesn't exist, and the stats on the array are as follows:
--------------------------
cat /proc/mdstat:
--------------------------
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdd[1] sde[3] sdc[0] sda[4]
7814054240 blocks super 1.2
unused devices: <none>
--------------------------
mdadm --detail /dev/md0
--------------------------
/dev/md0:
Version : 1.2
Creation Time : Sun Jul 17 00:41:57 2011
Raid Level : raid6
Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Sat Jun 8 11:00:43 2013
State : active, degraded, Not Started
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric-6
Chunk Size : 512K
New Layout : left-symmetric
Name : muncher:0 (local to host muncher)
UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
Events : 50599
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 48 1 active sync /dev/sdd
3 8 64 2 active sync /dev/sde
4 8 0 3 spare rebuilding /dev/sda
--------------------------
Any advice greatly appreciated.
Cheers,
Keith
* Re: Help with failed RAID-5 -> 6 migration
From: Phil Turmel @ 2013-06-08 22:43 UTC
To: Keith Phillips; +Cc: linux-raid
On 06/07/2013 11:02 PM, Keith Phillips wrote:
* Re: Help with failed RAID-5 -> 6 migration
From: Phil Turmel @ 2013-06-08 23:02 UTC
To: Keith Phillips; +Cc: linux-raid
Whoops. A bit click-happy.
On 06/07/2013 11:02 PM, Keith Phillips wrote:
> Hi,
>
> I have a problem. I'm worried I may have borked my array :/
Not yet. But you do have problems.
> I've been running a 3x2TB RAID-5 array and I recently got another 2TB
> drive, intending to bump it up to a 4x2TB RAID-6 array.
>
> I stuck the new disk in and added it to the RAID array, as follows
> ("/files" is on a non-RAID disk):
> mdadm --manage /dev/md0 --add /dev/sda
> mdadm --grow /dev/md0 --raid-devices 4 --level 6
> --backup-file=/files/mdadm-backup
Good so far.
> It seemed to work and the grow process started okay, reporting about 3
> days to completion (at ~8MB/s) which seemed really slow, but I left it
> anyway. Next morning, time to complete was several years and the
> kernel had spat out a bunch of I/O errors (lost those logs, sorry).
That's unfortunate. I'm going to guess you'd still be getting errors if
the array was running. If you get more, please save them and report.
> I figured the new disk must be at fault, because I'd done an array
> check recently and the others seemed okay.
Please elaborate on your recent "check". What method did you use, and
did you get any I/O errors in your logs at that time?
(Your problem is extraordinarily unlikely to be the fault of your new
drive, since almost all traffic to it would be *writes*, and a failed
write will kick a drive out of an array immediately.)
> Hoping it might abort the
> grow, I failed the new disk:
> mdadm --manage /dev/md0 --fail /dev/sda
No, that won't (and didn't) abort the grow. Your array details show the
old and new layouts in progress.
> But mdadm kept reporting years to completion. So I rebooted.
>
> Now I'd like to know - what state is my array in? If possible I'd like
> to get back to a working 3 disk RAID-5 configuration while I test the
> new disk and figure out what to do with it.
Not sure yet. But unless the new drive is truly bad, there's no
significant difference in going forward vs. going back.
> The backup-file doesn't exist, and the stats on the array are as follows:
Losing the backup file may cause some data loss, regardless of
conversion direction.
[trim /]
> Any advice greatly appreciated.
More data is needed:
1) output of "mdadm -E /dev/sd[acde]"
2) output of "for x in /dev/sd[acde] ; do smartctl -x $x ; done"
3) trimmed output of "ls -l /dev/disk/by-id" showing serial number vs.
device name for the subject disks.
4) output of "for x in /sys/block/sd[acde]/device/timeout ; do echo $x $(< $x) ; done"
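The four requests above can be gathered in one pass so that nothing gets lost between commands. A minimal sketch, assuming `mdadm` and `smartctl` are installed and the function is run as root; the name `collect_report` and the output file name are made up for illustration:

```shell
# Hypothetical helper: save the requested diagnostics for the given
# devices into one file, continuing even if a single command fails.
collect_report() {
    out=$1
    shift
    {
        for dev in "$@"; do
            echo "=== mdadm -E $dev ==="
            mdadm -E "$dev" || true
            echo "=== smartctl -x $dev ==="
            smartctl -x "$dev" || true
            t="/sys/block/${dev##*/}/device/timeout"
            echo "=== $t ==="
            cat "$t" || true
        done
        echo "=== ls -l /dev/disk/by-id ==="
        ls -l /dev/disk/by-id || true
    } > "$out" 2>&1
}

# Example (as root): collect_report raid-report.txt /dev/sd[acde]
```

Redirecting both stdout and stderr into the file keeps any error messages next to the command that produced them.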
Meanwhile, report what you know about "error recovery control". If it
is "nothing", you may need to do some googling in this list's archives.
Suitable keywords would include: "scterc", "ure", "timeout", and "error
recovery".
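One quick way to see whether a drive offers error recovery control is to look for the warning that smartctl prints when it does not. A small sketch, assuming the wording matches the smartctl 5.41 output quoted later in this thread (other versions may phrase it differently); the function name `erc_status` is made up for illustration:

```shell
# Hypothetical helper: reads `smartctl -x` output on stdin and reports
# whether the "does not support SCT Error Recovery Control" warning appears.
erc_status() {
    if grep -q 'does not support SCT Error Recovery Control'; then
        echo "unsupported"
    else
        echo "no unsupported warning found"
    fi
}

# Example (as root): smartctl -x /dev/sda | erc_status
```

The second result only means the warning was absent; the full smartctl output should still be read to see what the drive actually reports.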
Phil
* Fwd: Help with failed RAID-5 -> 6 migration
From: Keith Phillips @ 2013-06-10 16:16 UTC
To: linux-raid
Apologies, Phil, if this is the second time you've got this now, but I
just realised I dropped the linux-raid group from the email.
I'm still looking at a degraded array that won't start, so any input
would be greatly appreciated.
---------- Forwarded message ----------
From: Keith Phillips <spootsy.ootsy@gmail.com>
Date: Sun, Jun 9, 2013 at 3:33 PM
Subject: Re: Help with failed RAID-5 -> 6 migration
To: Phil Turmel <philip@turmel.org>
Thanks for the response, Phil.
*snip*
> That's unfortunate. I'm going to guess you'd still be getting errors if
> the array was running. If you get more, please save them and report.
Entirely possible - if I can get the array started again I suppose
we'll see. All I can remember of it is an I/O error on something like
'/dev/md/0/8', with a big stack trace.
*snip*
> Please elaborate on your recent "check". What method did you use, and
> did you get any I/O errors in your logs at that time?
There was Ubuntu's default monthly "check of redundancy data" -
admittedly I hadn't looked at this to see what it actually does, but I
was assuming it would verify the parity data for each stripe. mdadm is
configured to email me on detection of errors.
Also, I installed the new drive a day prior to actually adding it to
the array, and for some reason when I powered the machine back on the
existing array started rebuilding itself (took about 6 hours and
finished happily - no errors reported anywhere). Not a deliberate
process, but I assumed (wrongly?) that one of those would've issued
some warnings/errors if there was a problem.
*snip*
> Not sure yet. But unless the new drive is truly bad, there's no
> significant difference in going forward vs. going back.
>
>> The backup-file doesn't exist, and the stats on the array are as follows:
>
> Losing the backup file may cause some data loss, regardless of
> conversion direction.
I'm okay with a bit of data loss - most of the data isn't critical.
It'd be a real hassle to lose it all, though.
*snip*
> Meanwhile, report what you know about "error recovery control". If it
> is "nothing", you may need to do some googling in this list's archives.
> Suitable keywords would include: "scterc", "ure", "timeout", and "error
> recovery".
>
> Phil
Prior to looking through this list yesterday: absolutely nothing. Now:
almost nothing :)
According to smartctl, none of my drives support it. Not surprising as
they're all "green" desktop versions. When buying them I wasn't aware
of this deficiency. By my limited understanding, lack of support just
means the drives are likely to drop out of the array unnecessarily,
correct? Maybe this was the cause of the unexpected rebuild after I
added the new drive...
*edited forward* Actually, on reflection that wouldn't be it, would
it? If the drive was dropped for not responding due to its lack of
scterc, I think I would have had to manually re-add it, which I didn't
do.
Requested info follows. FYI the new drive is now showing as
"/dev/sde" rather than "/dev/sda".
Also, while poking yesterday I noticed I was getting warnings of the
form "Device has wrong state in superblock but /dev/sde seems ok", so
I tried a forced assemble:
mdadm --assemble /dev/md0 --force
Looks like it updated some info in the superblocks (and yes, I forgot
to save the original output first!), but the array remains inactive. I
have now sworn off poking around by myself, because I've no idea what
to do from here.
Cheers,
Keith
----------------------------
mdadm -E /dev/sd[bcde]
----------------------------
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
Name : muncher:0 (local to host muncher)
Creation Time : Sun Jul 17 00:41:57 2011
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 7814051840 (3726.03 GiB 4000.79 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 5d37816b:d5fb16a0:7d6a6b10:31cd6ce1
Reshape pos'n : 28540928 (27.22 GiB 29.23 GB)
New Layout : left-symmetric
Update Time : Sat Jun 8 11:00:43 2013
Checksum : 761bc532 - correct
Events : 50599
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing)
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
Name : muncher:0 (local to host muncher)
Creation Time : Sun Jul 17 00:41:57 2011
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 7814051840 (3726.03 GiB 4000.79 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 283edca6:910be50c:1afca18d:4cd908a6
Reshape pos'n : 28540928 (27.22 GiB 29.23 GB)
New Layout : left-symmetric
Update Time : Sat Jun 8 11:00:43 2013
Checksum : 6018796d - correct
Events : 50599
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA. ('A' == active, '.' == missing)
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
Name : muncher:0 (local to host muncher)
Creation Time : Sun Jul 17 00:41:57 2011
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 7814051840 (3726.03 GiB 4000.79 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : f5494aad:07c9d06a:408628c7:39d7dfcf
Reshape pos'n : 28540928 (27.22 GiB 29.23 GB)
New Layout : left-symmetric
Update Time : Sat Jun 8 11:00:43 2013
Checksum : 27cfcac6 - correct
Events : 50599
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 2
Array State : AAA. ('A' == active, '.' == missing)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x6
Array UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
Name : muncher:0 (local to host muncher)
Creation Time : Sun Jul 17 00:41:57 2011
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
Array Size : 7814051840 (3726.03 GiB 4000.79 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Recovery Offset : 28540928 sectors
State : active
Device UUID : 49cc8e58:0547cc5b:9c47dd19:6e510c7d
Reshape pos'n : 28540928 (27.22 GiB 29.23 GB)
New Layout : left-symmetric
Update Time : Sat Jun 8 01:26:39 2013
Checksum : c5a30022 - correct
Events : 50598
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing)
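The per-device superblocks above are easier to compare when reduced to one line each. A sketch, assuming the field labels match the `mdadm -E` output shown here; the function name `summarize_superblocks` is made up for illustration:

```shell
# Hypothetical helper: reads concatenated `mdadm -E` output on stdin and
# prints one line per device: name, event count, reshape position.
summarize_superblocks() {
    awk '
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev); order[++n] = dev }
        $1 == "Events"  { events[dev] = $3 }
        $1 == "Reshape" { pos[dev] = $4 }
        END {
            for (i = 1; i <= n; i++)
                print order[i], events[order[i]], pos[order[i]]
        }
    '
}

# Example (as root): mdadm -E /dev/sd[bcde] | summarize_superblocks
```

For the output above this would show /dev/sde at event count 50598 while the other three devices are at 50599, with all four reporting the same reshape position.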
----------------------------
for x in /dev/sd[acde] ; do smartctl -x $x ; done
----------------------------
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.0.0-32-server] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda Green (Adv. Format)
Device Model: ST2000DL003-9VT166
Serial Number: 5YD4476E
LU WWN Device Id: 5 000c50 038e1b0af
Firmware Version: CC32
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun Jun 9 13:08:39 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 612) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30b7) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 110 100 006 - 25974592
3 Spin_Up_Time PO---- 093 093 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 7
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
7 Seek_Error_Rate POSR-- 100 253 030 - 31
9 Power_On_Hours -O--CK 083 083 000 - 15249
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 50
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 000 - 0
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 067 066 045 - 33 (Min/Max 33/34)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 28
193 Load_Cycle_Count -O--CK 100 100 000 - 49
194 Temperature_Celsius -O---K 033 040 000 - 33 (0 17 0 0)
195 Hardware_ECC_Recovered -O-RC- 015 015 000 - 25974592
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 24416889077783
241 Total_LBAs_Written ------ 100 253 000 - 2021756950
242 Total_LBAs_Read ------ 100 253 000 - 1114083404
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has 1 sectors [Log Directory]
SMART Log at address 0x01 has 1 sectors [Summary SMART error log]
SMART Log at address 0x02 has 5 sectors [Comprehensive SMART error log]
GP Log at address 0x03 has 5 sectors [Ext. Comprehensive SMART error log]
SMART Log at address 0x06 has 1 sectors [SMART self-test log]
GP Log at address 0x07 has 1 sectors [Extended self-test log]
SMART Log at address 0x09 has 1 sectors [Selective self-test log]
GP Log at address 0x10 has 1 sectors [NCQ Command Error]
GP Log at address 0x11 has 1 sectors [SATA Phy Event Counters]
GP Log at address 0x21 has 1 sectors [Write stream error log]
GP Log at address 0x22 has 1 sectors [Read stream error log]
GP/S Log at address 0x80 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x84 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x85 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x86 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x87 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x88 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x89 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8f has 16 sectors [Host vendor specific log]
GP/S Log at address 0x90 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x91 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x92 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x93 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x94 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x95 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x96 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x97 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x98 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x99 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9f has 16 sectors [Host vendor specific log]
GP/S Log at address 0xa1 has 20 sectors [Device vendor specific log]
GP Log at address 0xa2 has 2248 sectors [Device vendor specific log]
GP/S Log at address 0xa8 has 20 sectors [Device vendor specific log]
GP/S Log at address 0xa9 has 1 sectors [Device vendor specific log]
GP Log at address 0xab has 1 sectors [Device vendor specific log]
GP Log at address 0xb0 has 2819 sectors [Device vendor specific log]
GP Log at address 0xbd has 252 sectors [Device vendor specific log]
GP Log at address 0xbe has 65535 sectors [Device vendor specific log]
GP Log at address 0xbf has 65535 sectors [Device vendor specific log]
GP/S Log at address 0xc0 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has 1 sectors [SCT Data Transfer]
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: 31/35 Celsius
Lifetime Min/Max Temperature: 15/46 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 59 minutes
Min/Max recommended Temperature: 14/55 Celsius
Min/Max Temperature Limit: 10/60 Celsius
Temperature History Size (Index): 128 (60)
Index Estimated Time Temperature Celsius
61 2013-06-04 08:00 31 ************
... ..( 2 skipped). .. ************
64 2013-06-04 10:57 31 ************
65 2013-06-04 11:56 32 *************
66 2013-06-04 12:55 31 ************
67 2013-06-04 13:54 31 ************
68 2013-06-04 14:53 30 ***********
69 2013-06-04 15:52 30 ***********
70 2013-06-04 16:51 30 ***********
71 2013-06-04 17:50 29 **********
... ..( 3 skipped). .. **********
75 2013-06-04 21:46 29 **********
76 2013-06-04 22:45 28 *********
... ..( 3 skipped). .. *********
80 2013-06-05 02:41 28 *********
81 2013-06-05 03:40 29 **********
... ..( 7 skipped). .. **********
89 2013-06-05 11:32 29 **********
90 2013-06-05 12:31 30 ***********
91 2013-06-05 13:30 31 ************
... ..( 3 skipped). .. ************
95 2013-06-05 17:26 31 ************
96 2013-06-05 18:25 30 ***********
97 2013-06-05 19:24 29 **********
... ..( 2 skipped). .. **********
100 2013-06-05 22:21 29 **********
101 2013-06-05 23:20 28 *********
... ..( 2 skipped). .. *********
104 2013-06-06 02:17 28 *********
105 2013-06-06 03:16 29 **********
... ..( 5 skipped). .. **********
111 2013-06-06 09:10 29 **********
112 2013-06-06 10:09 ? -
113 2013-06-06 11:08 25 ******
114 2013-06-06 12:07 25 ******
115 2013-06-06 13:06 32 *************
116 2013-06-06 14:05 33 **************
... ..( 9 skipped). .. **************
126 2013-06-06 23:55 33 **************
127 2013-06-07 00:54 32 *************
0 2013-06-07 01:53 32 *************
1 2013-06-07 02:52 33 **************
... ..( 10 skipped). .. **************
12 2013-06-07 13:41 33 **************
13 2013-06-07 14:40 ? -
14 2013-06-07 15:39 32 *************
15 2013-06-07 16:38 32 *************
16 2013-06-07 17:37 33 **************
17 2013-06-07 18:36 34 ***************
18 2013-06-07 19:35 34 ***************
19 2013-06-07 20:34 34 ***************
20 2013-06-07 21:33 33 **************
21 2013-06-07 22:32 33 **************
22 2013-06-07 23:31 32 *************
... ..( 4 skipped). .. *************
27 2013-06-08 04:26 32 *************
28 2013-06-08 05:25 ? -
29 2013-06-08 06:24 32 *************
30 2013-06-08 07:23 32 *************
31 2013-06-08 08:22 ? -
32 2013-06-08 09:21 26 *******
33 2013-06-08 10:20 26 *******
34 2013-06-08 11:19 ? -
35 2013-06-08 12:18 31 ************
36 2013-06-08 13:17 31 ************
37 2013-06-08 14:16 33 **************
38 2013-06-08 15:15 33 **************
39 2013-06-08 16:14 34 ***************
... ..( 5 skipped). .. ***************
45 2013-06-08 22:08 34 ***************
46 2013-06-08 23:07 33 **************
47 2013-06-09 00:06 33 **************
48 2013-06-09 01:05 34 ***************
... ..( 2 skipped). .. ***************
51 2013-06-09 04:02 34 ***************
52 2013-06-09 05:01 33 **************
... ..( 7 skipped). .. **************
60 2013-06-09 12:53 33 **************
Warning: device does not support SCT Error Recovery Control command
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 9 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.0.0-32-server] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda Green (Adv. Format)
Device Model: ST2000DL003-9VT166
Serial Number: 5YD40GKJ
LU WWN Device Id: 5 000c50 038e29000
Firmware Version: CC32
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun Jun 9 13:08:39 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 623) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30b7) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 115 100 006 - 99752928
3 Spin_Up_Time PO---- 096 096 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 4
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
7 Seek_Error_Rate POSR-- 100 253 030 - 24
9 Power_On_Hours -O--CK 083 083 000 - 15185
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 47
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 000 - 0
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 066 066 045 - 34 (Min/Max 34/34)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 25
193 Load_Cycle_Count -O--CK 100 100 000 - 46
194 Temperature_Celsius -O---K 034 040 000 - 34 (0 16 0 0)
195 Hardware_ECC_Recovered -O-RC- 019 019 000 - 99752928
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 72765335928851
241 Total_LBAs_Written ------ 100 253 000 - 983226830
242 Total_LBAs_Read ------ 100 253 000 - 1540468804
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has 1 sectors [Log Directory]
SMART Log at address 0x01 has 1 sectors [Summary SMART error log]
SMART Log at address 0x02 has 5 sectors [Comprehensive SMART error log]
GP Log at address 0x03 has 5 sectors [Ext. Comprehensive SMART error log]
SMART Log at address 0x06 has 1 sectors [SMART self-test log]
GP Log at address 0x07 has 1 sectors [Extended self-test log]
SMART Log at address 0x09 has 1 sectors [Selective self-test log]
GP Log at address 0x10 has 1 sectors [NCQ Command Error]
GP Log at address 0x11 has 1 sectors [SATA Phy Event Counters]
GP Log at address 0x21 has 1 sectors [Write stream error log]
GP Log at address 0x22 has 1 sectors [Read stream error log]
GP/S Log at address 0x80 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x84 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x85 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x86 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x87 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x88 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x89 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8f has 16 sectors [Host vendor specific log]
GP/S Log at address 0x90 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x91 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x92 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x93 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x94 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x95 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x96 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x97 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x98 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x99 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9f has 16 sectors [Host vendor specific log]
GP/S Log at address 0xa1 has 20 sectors [Device vendor specific log]
GP Log at address 0xa2 has 2248 sectors [Device vendor specific log]
GP/S Log at address 0xa8 has 20 sectors [Device vendor specific log]
GP/S Log at address 0xa9 has 1 sectors [Device vendor specific log]
GP Log at address 0xab has 1 sectors [Device vendor specific log]
GP Log at address 0xb0 has 2819 sectors [Device vendor specific log]
GP Log at address 0xbd has 252 sectors [Device vendor specific log]
GP Log at address 0xbe has 65535 sectors [Device vendor specific log]
GP Log at address 0xbf has 65535 sectors [Device vendor specific log]
GP/S Log at address 0xc0 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has 1 sectors [SCT Data Transfer]
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: 31/35 Celsius
Lifetime Min/Max Temperature: 14/48 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 59 minutes
Min/Max recommended Temperature: 14/55 Celsius
Min/Max Temperature Limit: 10/60 Celsius
Temperature History Size (Index): 128 (51)
Index Estimated Time Temperature Celsius
52 2013-06-04 08:00 33 **************
... ..( 2 skipped). .. **************
55 2013-06-04 10:57 33 **************
56 2013-06-04 11:56 34 ***************
57 2013-06-04 12:55 34 ***************
58 2013-06-04 13:54 33 **************
59 2013-06-04 14:53 32 *************
60 2013-06-04 15:52 32 *************
61 2013-06-04 16:51 32 *************
62 2013-06-04 17:50 31 ************
... ..( 4 skipped). .. ************
67 2013-06-04 22:45 31 ************
68 2013-06-04 23:44 30 ***********
... ..( 2 skipped). .. ***********
71 2013-06-05 02:41 30 ***********
72 2013-06-05 03:40 31 ************
... ..( 7 skipped). .. ************
80 2013-06-05 11:32 31 ************
81 2013-06-05 12:31 33 **************
82 2013-06-05 13:30 34 ***************
83 2013-06-05 14:29 33 **************
... ..( 3 skipped). .. **************
87 2013-06-05 18:25 33 **************
88 2013-06-05 19:24 31 ************
89 2013-06-05 20:23 31 ************
90 2013-06-05 21:22 30 ***********
... ..( 4 skipped). .. ***********
95 2013-06-06 02:17 30 ***********
96 2013-06-06 03:16 31 ************
... ..( 5 skipped). .. ************
102 2013-06-06 09:10 31 ************
103 2013-06-06 10:09 ? -
104 2013-06-06 11:08 26 *******
105 2013-06-06 12:07 26 *******
106 2013-06-06 13:06 33 **************
107 2013-06-06 14:05 34 ***************
... ..( 8 skipped). .. ***************
116 2013-06-06 22:56 34 ***************
117 2013-06-06 23:55 33 **************
... ..( 2 skipped). .. **************
120 2013-06-07 02:52 33 **************
121 2013-06-07 03:51 34 ***************
... ..( 9 skipped). .. ***************
3 2013-06-07 13:41 34 ***************
4 2013-06-07 14:40 ? -
5 2013-06-07 15:39 33 **************
6 2013-06-07 16:38 33 **************
7 2013-06-07 17:37 34 ***************
8 2013-06-07 18:36 34 ***************
9 2013-06-07 19:35 35 ****************
10 2013-06-07 20:34 35 ****************
11 2013-06-07 21:33 34 ***************
12 2013-06-07 22:32 33 **************
... ..( 5 skipped). .. **************
18 2013-06-08 04:26 33 **************
19 2013-06-08 05:25 ? -
20 2013-06-08 06:24 33 **************
21 2013-06-08 07:23 33 **************
22 2013-06-08 08:22 ? -
23 2013-06-08 09:21 27 ********
24 2013-06-08 10:20 27 ********
25 2013-06-08 11:19 ? -
26 2013-06-08 12:18 31 ************
27 2013-06-08 13:17 31 ************
28 2013-06-08 14:16 33 **************
29 2013-06-08 15:15 34 ***************
30 2013-06-08 16:14 34 ***************
31 2013-06-08 17:13 35 ****************
... ..( 4 skipped). .. ****************
36 2013-06-08 22:08 35 ****************
37 2013-06-08 23:07 34 ***************
38 2013-06-09 00:06 34 ***************
39 2013-06-09 01:05 35 ****************
40 2013-06-09 02:04 34 ***************
... ..( 10 skipped). .. ***************
51 2013-06-09 12:53 34 ***************
Warning: device does not support SCT Error Recovery Control command
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 10 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.0.0-32-server] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda Green (Adv. Format)
Device Model: ST2000DL003-9VT166
Serial Number: 5YD46608
LU WWN Device Id: 5 000c50 038edda2f
Firmware Version: CC32
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun Jun 9 13:08:40 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 623) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30b7) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-- 116 100 006 - 104045136
3 Spin_Up_Time PO---- 093 093 000 - 0
4 Start_Stop_Count -O--CK 100 100 020 - 7
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
7 Seek_Error_Rate POSR-- 100 253 030 - 27
9 Power_On_Hours -O--CK 083 083 000 - 15269
10 Spin_Retry_Count PO--C- 100 100 097 - 0
12 Power_Cycle_Count -O--CK 100 100 020 - 50
183 Runtime_Bad_Block -O--CK 100 100 000 - 0
184 End-to-End_Error -O--CK 100 100 099 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
188 Command_Timeout -O--CK 100 100 000 - 1
189 High_Fly_Writes -O-RCK 100 100 000 - 0
190 Airflow_Temperature_Cel -O---K 066 066 045 - 34 (Min/Max 34/34)
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 28
193 Load_Cycle_Count -O--CK 100 100 000 - 49
194 Temperature_Celsius -O---K 034 040 000 - 34 (0 16 0 0)
195 Hardware_ECC_Recovered -O-RC- 020 020 000 - 104045136
197 Current_Pending_Sector -O--C- 100 100 000 - 0
198 Offline_Uncorrectable ----C- 100 100 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
240 Head_Flying_Hours ------ 100 253 000 - 52411485913107
241 Total_LBAs_Written ------ 100 253 000 - 2955990382
242 Total_LBAs_Read ------ 100 253 000 - 3371798023
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has 1 sectors [Log Directory]
SMART Log at address 0x01 has 1 sectors [Summary SMART error log]
SMART Log at address 0x02 has 5 sectors [Comprehensive SMART error log]
GP Log at address 0x03 has 5 sectors [Ext. Comprehensive SMART error log]
SMART Log at address 0x06 has 1 sectors [SMART self-test log]
GP Log at address 0x07 has 1 sectors [Extended self-test log]
SMART Log at address 0x09 has 1 sectors [Selective self-test log]
GP Log at address 0x10 has 1 sectors [NCQ Command Error]
GP Log at address 0x11 has 1 sectors [SATA Phy Event Counters]
GP Log at address 0x21 has 1 sectors [Write stream error log]
GP Log at address 0x22 has 1 sectors [Read stream error log]
GP/S Log at address 0x80 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x84 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x85 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x86 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x87 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x88 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x89 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8f has 16 sectors [Host vendor specific log]
GP/S Log at address 0x90 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x91 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x92 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x93 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x94 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x95 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x96 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x97 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x98 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x99 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9f has 16 sectors [Host vendor specific log]
GP/S Log at address 0xa1 has 20 sectors [Device vendor specific log]
GP Log at address 0xa2 has 2248 sectors [Device vendor specific log]
GP/S Log at address 0xa8 has 20 sectors [Device vendor specific log]
GP/S Log at address 0xa9 has 1 sectors [Device vendor specific log]
GP Log at address 0xab has 1 sectors [Device vendor specific log]
GP Log at address 0xb0 has 2819 sectors [Device vendor specific log]
GP Log at address 0xbd has 252 sectors [Device vendor specific log]
GP Log at address 0xbe has 65535 sectors [Device vendor specific log]
GP Log at address 0xbf has 65535 sectors [Device vendor specific log]
GP/S Log at address 0xc0 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has 1 sectors [SCT Data Transfer]
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 522 (0x020a)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: 31/35 Celsius
Lifetime Min/Max Temperature: 13/48 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 59 minutes
Min/Max recommended Temperature: 14/55 Celsius
Min/Max Temperature Limit: 10/60 Celsius
Temperature History Size (Index): 128 (60)
Index Estimated Time Temperature Celsius
61 2013-06-04 08:00 33 **************
62 2013-06-04 08:59 33 **************
63 2013-06-04 09:58 33 **************
64 2013-06-04 10:57 34 ***************
65 2013-06-04 11:56 35 ****************
66 2013-06-04 12:55 34 ***************
67 2013-06-04 13:54 33 **************
68 2013-06-04 14:53 32 *************
... ..( 3 skipped). .. *************
72 2013-06-04 18:49 32 *************
73 2013-06-04 19:48 31 ************
... ..( 15 skipped). .. ************
89 2013-06-05 11:32 31 ************
90 2013-06-05 12:31 33 **************
91 2013-06-05 13:30 34 ***************
... ..( 3 skipped). .. ***************
95 2013-06-05 17:26 34 ***************
96 2013-06-05 18:25 33 **************
97 2013-06-05 19:24 32 *************
98 2013-06-05 20:23 31 ************
... ..( 2 skipped). .. ************
101 2013-06-05 23:20 31 ************
102 2013-06-06 00:19 30 ***********
103 2013-06-06 01:18 30 ***********
104 2013-06-06 02:17 31 ************
... ..( 6 skipped). .. ************
111 2013-06-06 09:10 31 ************
112 2013-06-06 10:09 ? -
113 2013-06-06 11:08 26 *******
114 2013-06-06 12:07 26 *******
115 2013-06-06 13:06 33 **************
116 2013-06-06 14:05 34 ***************
... ..( 7 skipped). .. ***************
124 2013-06-06 21:57 34 ***************
125 2013-06-06 22:56 33 **************
... ..( 5 skipped). .. **************
3 2013-06-07 04:50 33 **************
4 2013-06-07 05:49 34 ***************
... ..( 2 skipped). .. ***************
7 2013-06-07 08:46 34 ***************
8 2013-06-07 09:45 33 **************
9 2013-06-07 10:44 34 ***************
10 2013-06-07 11:43 34 ***************
11 2013-06-07 12:42 34 ***************
12 2013-06-07 13:41 33 **************
13 2013-06-07 14:40 ? -
14 2013-06-07 15:39 33 **************
15 2013-06-07 16:38 33 **************
16 2013-06-07 17:37 33 **************
17 2013-06-07 18:36 34 ***************
18 2013-06-07 19:35 35 ****************
19 2013-06-07 20:34 34 ***************
20 2013-06-07 21:33 33 **************
... ..( 6 skipped). .. **************
27 2013-06-08 04:26 33 **************
28 2013-06-08 05:25 ? -
29 2013-06-08 06:24 33 **************
30 2013-06-08 07:23 33 **************
31 2013-06-08 08:22 ? -
32 2013-06-08 09:21 26 *******
33 2013-06-08 10:20 26 *******
34 2013-06-08 11:19 ? -
35 2013-06-08 12:18 31 ************
36 2013-06-08 13:17 31 ************
37 2013-06-08 14:16 33 **************
38 2013-06-08 15:15 34 ***************
39 2013-06-08 16:14 34 ***************
40 2013-06-08 17:13 35 ****************
... ..( 4 skipped). .. ****************
45 2013-06-08 22:08 35 ****************
46 2013-06-08 23:07 34 ***************
47 2013-06-09 00:06 34 ***************
48 2013-06-09 01:05 35 ****************
49 2013-06-09 02:04 34 ***************
... ..( 10 skipped). .. ***************
60 2013-06-09 12:53 34 ***************
Warning: device does not support SCT Error Recovery Control command
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x000a 2 10 Device-to-host register FISes sent due to a COMRESET
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.0.0-32-server] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: WDC WD20EZRX-00DC0B0
Serial Number: WD-WMC301671583
LU WWN Device Id: 5 0014ee 658a8e467
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 9
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Jun 9 13:08:40 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (25500) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 253 051 - 0
3 Spin_Up_Time POS--K 100 253 021 - 0
4 Start_Stop_Count -O--CK 100 100 000 - 5
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 100 253 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 65
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 5
192 Power-Off_Retract_Count -O--CK 200 200 000 - 1
193 Load_Cycle_Count -O--CK 200 200 000 - 24
194 Temperature_Celsius -O---K 118 117 000 - 29
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has 1 sectors [Log Directory]
SMART Log at address 0x01 has 1 sectors [Summary SMART error log]
SMART Log at address 0x02 has 5 sectors [Comprehensive SMART error log]
GP Log at address 0x03 has 6 sectors [Ext. Comprehensive SMART error log]
SMART Log at address 0x06 has 1 sectors [SMART self-test log]
GP Log at address 0x07 has 1 sectors [Extended self-test log]
SMART Log at address 0x09 has 1 sectors [Selective self-test log]
GP Log at address 0x10 has 1 sectors [NCQ Command Error]
GP Log at address 0x11 has 1 sectors [SATA Phy Event Counters]
GP/S Log at address 0x80 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x84 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x85 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x86 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x87 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x88 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x89 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x8f has 16 sectors [Host vendor specific log]
GP/S Log at address 0x90 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x91 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x92 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x93 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x94 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x95 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x96 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x97 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x98 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x99 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9a has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9b has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9c has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9d has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9e has 16 sectors [Host vendor specific log]
GP/S Log at address 0x9f has 16 sectors [Host vendor specific log]
GP/S Log at address 0xa0 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa1 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa2 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa3 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa4 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa5 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa6 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa7 has 16 sectors [Device vendor specific log]
GP/S Log at address 0xa8 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xa9 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xaa has 1 sectors [Device vendor specific log]
GP/S Log at address 0xab has 1 sectors [Device vendor specific log]
GP/S Log at address 0xac has 1 sectors [Device vendor specific log]
GP/S Log at address 0xad has 1 sectors [Device vendor specific log]
GP/S Log at address 0xae has 1 sectors [Device vendor specific log]
GP/S Log at address 0xaf has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb0 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb1 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb2 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb3 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb4 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb5 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb6 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xb7 has 1 sectors [Device vendor specific log]
GP/S Log at address 0xbd has 1 sectors [Device vendor specific log]
GP/S Log at address 0xc0 has 1 sectors [Device vendor specific log]
GP Log at address 0xc1 has 93 sectors [Device vendor specific log]
GP/S Log at address 0xe0 has 1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has 1 sectors [SCT Data Transfer]
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 29 Celsius
Power Cycle Min/Max Temperature: 27/30 Celsius
Lifetime Min/Max Temperature: 19/30 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (111)
Index Estimated Time Temperature Celsius
112 2013-06-09 05:11 29 **********
... ..( 91 skipped). .. **********
204 2013-06-09 06:43 29 **********
205 2013-06-09 06:44 28 *********
... ..(382 skipped). .. *********
110 2013-06-09 13:07 28 *********
111 2013-06-09 13:08 29 **********
Warning: device does not support SCT Error Recovery Control command
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 5 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 87852 Vendor specific
----------------------------
serial number vs. device name for the subject disks
----------------------------
ata-ST2000DL003-9VT166_5YD40GKJ -> ../../sdc
ata-ST2000DL003-9VT166_5YD4476E -> ../../sdb
ata-ST2000DL003-9VT166_5YD46608 -> ../../sdd
ata-WDC_WD20EZRX-00DC0B0_WD-WMC301671583 -> ../../sde
----------------------------
for x in /sys/block/sd[bcde]/device/timeout ; do echo $x $(< $x) ; done
----------------------------
/sys/block/sdb/device/timeout 30
/sys/block/sdc/device/timeout 30
/sys/block/sdd/device/timeout 30
/sys/block/sde/device/timeout 30
* Re: Fwd: Help with failed RAID-5 -> 6 migration
2013-06-10 16:16 ` Fwd: " Keith Phillips
@ 2013-06-10 19:35 ` Phil Turmel
2013-06-11 2:08 ` Keith Phillips
0 siblings, 1 reply; 10+ messages in thread
From: Phil Turmel @ 2013-06-10 19:35 UTC (permalink / raw)
To: Keith Phillips; +Cc: linux-raid
On 06/10/2013 12:16 PM, Keith Phillips wrote:
> Apologies, Phil, if this is the second time you've got this now, but I
> just realised I dropped the linux-raid group from the email.
It's ok. I was busy yesterday and today.
> I'm still looking at a degraded array that won't start, so any input
> would be greatly appreciated.
>
> ---------- Forwarded message ----------
> From: Keith Phillips <spootsy.ootsy@gmail.com>
> Date: Sun, Jun 9, 2013 at 3:33 PM
> Subject: Re: Help with failed RAID-5 -> 6 migration
> To: Phil Turmel <philip@turmel.org>
>
>
> Thanks for the response, Phil.
>
> *snip*
>
>> That's unfortunate. I'm going to guess you'd still be getting errors if
>> the array was running. If you get more, please save them and report.
>
> Entirely possible - if I can get the array started again I suppose
> we'll see. All I can remember of it is an I/O error on something like
> '/dev/md/0/8', with a big stack trace.
A big stack trace suggests other problems in your system. Not that you
don't have potential I/O error issues, but there might be a kernel problem.
Please show "uname -a" and "mdadm --version".
>> Please elaborate on your recent "check". What method did you use, and
>> did you get any I/O errors in your logs at that time?
>
> There was Ubuntu's default monthly "check of redundancy data" -
> admittedly I hadn't looked at this to see what it actually does, but I
> was assuming it would verify the parity data for each stripe. mdadm is
> configured to email me on detection of errors.
The key thing to look for is a nonzero mismatch count in sysfs for that
array. I'm not familiar with Ubuntu's script, so you might want to look
by hand at some future point.
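For reference, a minimal sketch of looking at that counter by hand. It assumes the standard md sysfs layout (/sys/block/mdN/md/mismatch_cnt); the function name and the SYSFS_ROOT override are made up for this illustration and are not part of mdadm or of Ubuntu's script.

```shell
#!/bin/sh
# Print the mismatch count for every md array found under sysfs.
# SYSFS_ROOT is a hypothetical override used only for testing; it defaults to /sys.
md_mismatch_report() {
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/md*/md/mismatch_cnt; do
        [ -r "$f" ] || continue          # glob matched nothing, or file unreadable
        dev=${f%/md/mismatch_cnt}        # strip the trailing path components
        dev=${dev##*/}                   # keep only the md device name, e.g. md0
        printf '%s mismatch_cnt=%s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling md_mismatch_report with no arguments prints one line per array. The counter reflects the most recent check or repair pass, so it only means something after one has completed; a pass can be started as root by writing "check" to /sys/block/md0/md/sync_action.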
> Also, I installed the new drive a day prior to actually adding it to
> the array, and for some reason when I powered the machine back on the
> existing array started rebuilding itself (took about 6 hours and
> finished happily - no errors reported anywhere). Not a deliberate
> process, but I assumed (wrongly?) that one of those would've issued
> some warnings/errors if there was a problem.
There have been some conflicts between various distro scripts and MD's
requirements at shutdown, opening the possibility of unsaved
superblocks. I believe these are all fixed in current kernels.
>> Not sure yet. But unless the new drive is truly bad, there's no
>> significant difference in going forward vs. going back.
>>
>>> The backup-file doesn't exist, and the stats on the array are as follows:
>>
>> Losing the backup file may cause some data loss, regardless of
>> conversion direction.
>
> I'm okay with a bit of data loss - most of the data isn't critical.
> It'd be a real hassle to lose it all, though.
The backup file holds only a stripe's worth of data that can't be
juggled in place. And it isn't always needed.
>> Meanwhile, report what you know about "error recovery control". If it
>> is "nothing", you may need to do some googling in this list's archives.
>> Suitable keywords would include: "scterc", "ure", "timeout", and "error
>> recovery".
>>
>> Phil
>
> Prior to looking through this list yesterday: absolutely nothing. Now:
> almost nothing :)
Well, it bites many people. From the smartctl data below, not you. Yet.
> According to smartctl, none of my drives support it. Not surprising as
> they're all "green" desktop versions. When buying them I wasn't aware
> of this deficiency. By my limited understanding, lack of support just
> means the drives are likely to drop out of the array unnecessarily,
> correct? Maybe this was the cause of the unexpected rebuild after I
> added the new drive...
>
> *edited forward* Actually, on reflection that wouldn't be it, would
> it? If the drive was dropped for not responding due to its lack of
> scterc, I think I would have had to manually re-add it, which I didn't
> do.
Drives are dropped immediately on write errors. Small numbers of read
errors are tolerated, and if correctable from redundancy, rewritten with
correct data. Consumer drives become unresponsive on a read error because
their aggressive error recovery algorithms can take a couple of
minutes. Linux doesn't wait that long by default, so MD's attempt to
correct the bad data hits an unresponsive drive. ==> write error.
Boom. Single read error has turned into an array-killing write error.
> Requested info follows. FYI the new drive is now showing as
> "/dev/sde" rather than "/dev/sda".
Ok. Adjust suggestions as appropriate.
> Also, while poking yesterday I noticed I was getting warnings of the
> form "Device has wrong state in superblock but /dev/sde seems ok", so
> I tried a forced assemble:
> mdadm --assemble /dev/md0 --force
>
> Looks like it updated some info in the superblocks (and yes, I forgot
> to save the original output first!), but the array remains inactive. I
> have now sworn off poking around by myself, because I've no idea what
> to do from here.
Please show /proc/mdstat again, along with "mdadm -D /dev/md0".
[trim /]
> for x in /sys/block/sd[bcde]/device/timeout ; do echo $x $(< $x) ; done
> ----------------------------
> /sys/block/sdb/device/timeout 30
> /sys/block/sdc/device/timeout 30
> /sys/block/sdd/device/timeout 30
> /sys/block/sde/device/timeout 30
Due to your green drives, you cannot leave these timeouts at 30 seconds.
I recommend 180 seconds:
for x in /sys/block/sd[bcde]/device/timeout ; do echo 180 >$x ; done
(You should do this ASAP. On the run is fine.)
You will need your system to do this at every boot. Most distros have
rc.local or a similar scripting mechanism you can use.
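A sketch of what could go into rc.local, assuming the member drives keep the names sdb through sde across reboots (the sda/sde renaming earlier in this thread shows they may not). set_drive_timeouts and the SYSFS_ROOT override are hypothetical names, not part of any distro tooling.

```shell
#!/bin/sh
# Raise the kernel's command timeout for each named drive.
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
set_drive_timeouts() {
    seconds=$1
    shift
    for dev in "$@"; do
        f="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"
        if [ -w "$f" ]; then
            echo "$seconds" > "$f"
            echo "$dev: timeout now $(cat "$f")"
        else
            # Missing drive or not running as root: report it and carry on.
            echo "$dev: cannot write $f" >&2
        fi
    done
}
```

Calling "set_drive_timeouts 180 sdb sdc sdd sde" as root has the same effect as the one-line loop above, and reports any drive it could not update.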
Phil
* Re: Fwd: Help with failed RAID-5 -> 6 migration
2013-06-10 19:35 ` Phil Turmel
@ 2013-06-11 2:08 ` Keith Phillips
2013-06-11 10:44 ` Phil Turmel
0 siblings, 1 reply; 10+ messages in thread
From: Keith Phillips @ 2013-06-11 2:08 UTC (permalink / raw)
To: Phil Turmel; +Cc: linux-raid
Hi Phil,
> A big stack trace suggests other problems in your system. Not that you
> don't have potential I/O error issues, but there might be a kernel problem.
>
> Please show "uname -a" and "mdadm --version".
These are the versions I currently have, which the migration was
attempted with. The array was originally constructed years ago,
probably with older kernel/mdadm versions:
Linux muncher 3.0.0-32-server #51-Ubuntu SMP Thu Mar 21 16:09:49 UTC
2013 x86_64 x86_64 x86_64 GNU/Linux
mdadm - v3.1.4 - 31st August 2010
> The key thing to look for is a nonzero mismatch count in sysfs for that
> array. I'm not familiar with Ubuntu's script, so you might want to look
> by hand at some future point.
I'll have a look in future. I do also have mdadm running daily via
cron with "--monitor --oneshot" - do you know if this checks the
"mismatch_cnt" file and reports errors?
>> Also, while poking yesterday I noticed I was getting warnings of the
>> form "Device has wrong state in superblock but /dev/sde seems ok", so
>> I tried a forced assemble:
>> mdadm --assemble /dev/md0 --force
>>
>> Looks like it updated some info in the superblocks (and yes, I forgot
>> to save the original output first!), but the array remains inactive. I
>> have now sworn off poking around by myself, because I've no idea what
>> to do from here.
>
> Please show /proc/mdstat again, along with "mdadm -D /dev/md0".
---------------------------
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sde[4] sdc[1] sdb[0] sdd[3]
7814054240 blocks super 1.2
unused devices: <none>
---------------------------
/dev/md0:
Version : 1.2
Creation Time : Sun Jul 17 00:41:57 2011
Raid Level : raid6
Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Sat Jun 8 11:00:43 2013
State : active, degraded, Not Started
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric-6
Chunk Size : 512K
New Layout : left-symmetric
Name : muncher:0 (local to host muncher)
UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
Events : 50599
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
3 8 48 2 active sync /dev/sdd
4 8 64 3 spare rebuilding /dev/sde
---------------------------
>> for x in /sys/block/sd[bcde]/device/timeout ; do echo $x $(< $x) ; done
>> ----------------------------
>> /sys/block/sdb/device/timeout 30
>> /sys/block/sdc/device/timeout 30
>> /sys/block/sdd/device/timeout 30
>> /sys/block/sde/device/timeout 30
>
> Due to your green drives, you cannot leave these timeouts at 30 seconds.
> I recommend 180 seconds:
>
> for x in /sys/block/sd[bcde]/device/timeout ; do echo 180 >$x ; done
>
> (You should do this ASAP. On the run is fine.)
>
> You will need your system to do this at every boot. Most distros have
> rc.local or a similar scripting mechanism you can use.
>
> Phil
Done - thanks for the tip.
* Re: Fwd: Help with failed RAID-5 -> 6 migration
2013-06-11 2:08 ` Keith Phillips
@ 2013-06-11 10:44 ` Phil Turmel
2013-06-11 12:42 ` Vanhorn, Mike
[not found] ` <CAASLJ=6eEVY6DeZ=+9Aw6yXmqNSc5mygqtD_8y+MaUid6B_TcQ@mail.gmail.com>
0 siblings, 2 replies; 10+ messages in thread
From: Phil Turmel @ 2013-06-11 10:44 UTC (permalink / raw)
To: Keith Phillips; +Cc: linux-raid
On 06/10/2013 10:08 PM, Keith Phillips wrote:
> Hi Phil,
>
>> A big stack trace suggests other problems in your system. Not that you
>> don't have potential I/O error issues, but there might be a kernel problem.
>>
>> Please show "uname -a" and "mdadm --version".
>
> These are the versions I currently have, which the migration was
> attempted with. The array was originally constructed years ago,
> probably with older kernel/mdadm versions:
>
> Linux muncher 3.0.0-32-server #51-Ubuntu SMP Thu Mar 21 16:09:49 UTC
> 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> mdadm - v3.1.4 - 31st August 2010
If the recommendations below don't help, consider using a modern liveCD
to complete the reshape. I use SystemRescueCD myself, but I'm sure
others would do fine, too.
>> The key thing to look for is a nonzero mismatch count in sysfs for that
>> array. I'm not familiar with Ubuntu's script, so you might want to look
>> by hand at some future point.
>
> I'll have a look in future. I do also have mdadm running daily via
> cron with "--monitor --oneshot" - do you know if this checks the
> "mismatch_cnt" file and reports errors?
I don't think so.
>>> Also, while poking yesterday I noticed I was getting warnings of the
>>> form "Device has wrong state in superblock but /dev/sde seems ok", so
>>> I tried a forced assemble:
>>> mdadm --assemble /dev/md0 --force
>>>
>>> Looks like it updated some info in the superblocks (and yes, I forgot
>>> to save the original output first!), but the array remains inactive. I
>>> have now sworn off poking around by myself, because I've no idea what
>>> to do from here.
>>
>> Please show /proc/mdstat again, along with "mdadm -D /dev/md0".
>
> ---------------------------
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : inactive sde[4] sdc[1] sdb[0] sdd[3]
> 7814054240 blocks super 1.2
>
> unused devices: <none>
> ---------------------------
> /dev/md0:
> Version : 1.2
> Creation Time : Sun Jul 17 00:41:57 2011
> Raid Level : raid6
> Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
> Raid Devices : 4
> Total Devices : 4
> Persistence : Superblock is persistent
>
> Update Time : Sat Jun 8 11:00:43 2013
> State : active, degraded, Not Started
> Active Devices : 3
> Working Devices : 4
> Failed Devices : 0
> Spare Devices : 1
>
> Layout : left-symmetric-6
> Chunk Size : 512K
>
> New Layout : left-symmetric
>
> Name : muncher:0 (local to host muncher)
> UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
> Events : 50599
>
> Number Major Minor RaidDevice State
> 0 8 16 0 active sync /dev/sdb
> 1 8 32 1 active sync /dev/sdc
> 3 8 48 2 active sync /dev/sdd
> 4 8 64 3 spare rebuilding /dev/sde
> ---------------------------
>
>>> for x in /sys/block/sd[bcde]/device/timeout ; do echo $x $(< $x) ; done
>>> ----------------------------
>>> /sys/block/sdb/device/timeout 30
>>> /sys/block/sdc/device/timeout 30
>>> /sys/block/sdd/device/timeout 30
>>> /sys/block/sde/device/timeout 30
>>
>> Due to your green drives, you cannot leave these timeouts at 30 seconds.
>> I recommend 180 seconds:
>>
>> for x in /sys/block/sd[bcde]/device/timeout ; do echo 180 >$x ; done
>>
>> (You should do this ASAP. On the run is fine.)
>>
>> You will need your system to do this at every boot. Most distros have
>> rc.local or a similar scripting mechanism you can use.
>>
>> Phil
>
> Done - thanks for the tip.
Given the above data, I believe you should be able to just do "mdadm
/dev/md0 --run" and watch it recover.
If it still gives you trouble, stop the array and reassemble with "-vv"
and show what it reports.
Also report any dmesg errors.
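The sequence above could be scripted roughly as follows. This is only a sketch: start_or_diagnose is a made-up name, and the MDADM variable is a hypothetical override that lets the steps be previewed (for example MDADM="echo mdadm") before running them for real as root.

```shell
#!/bin/sh
# MDADM may be overridden to preview the commands instead of running them.
MDADM=${MDADM:-mdadm}

# Usage: start_or_diagnose ARRAY MEMBER...
start_or_diagnose() {
    array=$1
    shift
    # First try simply starting the already-assembled array.
    if $MDADM "$array" --run; then
        return 0
    fi
    # That failed: stop the array and reassemble it verbosely so the
    # reason is visible in the output.
    $MDADM --stop "$array"
    $MDADM -vv --assemble "$array" "$@"
}
```

For this array the call would be "start_or_diagnose /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde", followed by something like "dmesg | tail -n 50" to capture any kernel messages.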
Phil
* Re: Help with failed RAID-5 -> 6 migration
2013-06-11 10:44 ` Phil Turmel
@ 2013-06-11 12:42 ` Vanhorn, Mike
[not found] ` <CAASLJ=6eEVY6DeZ=+9Aw6yXmqNSc5mygqtD_8y+MaUid6B_TcQ@mail.gmail.com>
1 sibling, 0 replies; 10+ messages in thread
From: Vanhorn, Mike @ 2013-06-11 12:42 UTC (permalink / raw)
To: Phil Turmel, Keith Phillips; +Cc: linux-raid
Using Keith Phillips' reported output from /proc/mdstat and mdadm
--detail, I have a question:
/proc/mdstat says that the array is "inactive":
>>---------------------------
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> md0 : inactive sde[4] sdc[1] sdb[0] sdd[3]
>> 7814054240 blocks super 1.2
>>
>> unused devices: <none>
>> ---------------------------
But mdadm --detail says
>> State : active, degraded, Not Started
and goes on to show that the array is rebuilding using the spare. So, how
can it be both "inactive" and "active", and be rebuilding but "Not
Started"?
I think this is just my un-clarity concerning what these terms mean.
Thanks!
---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/
* Re: Fwd: Help with failed RAID-5 -> 6 migration
[not found] ` <CAASLJ=6eEVY6DeZ=+9Aw6yXmqNSc5mygqtD_8y+MaUid6B_TcQ@mail.gmail.com>
@ 2013-06-12 14:51 ` Phil Turmel
[not found] ` <51B88AB2.5060303@turmel.org>
1 sibling, 0 replies; 10+ messages in thread
From: Phil Turmel @ 2013-06-12 14:51 UTC (permalink / raw)
To: Keith Phillips; +Cc: linux-raid
Sorry for the dupe, forgot the list:
On 06/11/2013 08:01 AM, Keith Phillips wrote:
[trim /]
> Assembling it with "mdadm -vv --assemble /dev/md0 /dev/sd[bcde]":
> -----------------------
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 3.
> mdadm:/dev/md0 has an active reshape - checking if critical section
> needs to be restored
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
> Possibly you needed to specify the --backup-file
^^^^^^^^^^^^^
You won't be able to assemble and run your array without a backup file.
You said you lost your original, so you will have to use a blank one
and tell mdadm to ignore the invalid file.
When reshaping, some scenarios need the backup file only on the first
stripe. Some only on the last stripe. And some need the backup file
for every stripe. That appears to be your situation. Note that when
needed for every stripe, the speed of the reshape will be limited by the
speed of the device holding the backup file.
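A sketch of the blank-file approach, assuming an mdadm build that accepts the --invalid-backup option alongside --backup-file. The function name and the MDADM override are hypothetical; the override only allows a preview such as MDADM="echo mdadm".

```shell
#!/bin/sh
# MDADM may be overridden to preview the command instead of running it.
MDADM=${MDADM:-mdadm}

# Usage: resume_reshape_without_backup ARRAY BACKUP_FILE MEMBER...
resume_reshape_without_backup() {
    array=$1
    backup=$2
    shift 2
    # Never clobber a file that has content: it might be a real backup.
    if [ -s "$backup" ]; then
        echo "$backup already has content; refusing to treat it as invalid" >&2
        return 1
    fi
    # Create an empty placeholder; mdadm is told below not to trust it.
    : > "$backup"
    $MDADM --assemble "$array" --backup-file="$backup" --invalid-backup "$@"
}
```

Keep the placeholder on a disk outside the array, as with the original backup file, for example "resume_reshape_without_backup /dev/md0 /files/mdadm-backup /dev/sd[bcde]".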
Phil
* Re: Fwd: Help with failed RAID-5 -> 6 migration
[not found] ` <CAASLJ=7=hnez3udgc4Voa_i7drZq_Y-8FkOgxt02_ROL5eD3qg@mail.gmail.com>
@ 2013-06-13 14:09 ` Phil Turmel
0 siblings, 0 replies; 10+ messages in thread
From: Phil Turmel @ 2013-06-13 14:09 UTC (permalink / raw)
To: Keith Phillips; +Cc: linux-raid
On 06/13/2013 09:58 AM, Keith Phillips wrote:
>> You won't be able to assemble and run your array without a backup file.
>> You said you lost your original, so you will have to use a blank one
>> and tell mdadm to ignore the invalid file.
>
> Ah, didn't realise this was an option. After a brief googling it seems
> my version of mdadm pre-dated the "--invalid-backup" option.
>
> Cloned the git repo and built a newer version, and re-assembled with
> an empty "--backup-file" and the "--invalid-backup" option. Now it's
> chugging along happily again - at 20% and counting now, no errors in
> sight!
Good to hear. :-)
> Will do an ext4 fsck once it's finished the grow. Are there any tips
> for determining what data I trashed by losing the backup-file? Or is
> it just a case of trying to access stuff and seeing what's broken?
You have the reshape position where the process stopped in the original
mdadm -E reports. Use that to find the inodes that own the affected
sectors, then map those inodes back to file names.
A quick google came up with:
http://smartmontools.sourceforge.net/badblockhowto.html
You'll have to reinterpret that to use the sector offsets in your array
rather than the sector offsets reported by smartctl.
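The arithmetic involved can be sketched as below, assuming the ext4 filesystem sits directly on /dev/md0 (no partition table or LVM in between) and that the offset is expressed in 512-byte sectors. If the figure mdadm reports is in other units, convert it first. sector_to_fs_block is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a 512-byte sector offset within the array into a filesystem
# block number. The block size defaults to 4096 bytes; the real value
# can be read from "tune2fs -l /dev/md0".
sector_to_fs_block() {
    sector=$1
    block_size=${2:-4096}
    echo $(( sector * 512 / block_size ))
}
```

The resulting block number can be handed to debugfs, for example 'debugfs -R "icheck BLOCK" /dev/md0' to get the owning inode, then 'debugfs -R "ncheck INODE" /dev/md0' to get the file name.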
> Thanks so much for the help, Phil :)
You're welcome.
Phil