* RAID6 array lost a disk, can someone decode the error?
@ 2009-11-11 5:37 Guy Watkins
2009-11-11 5:52 ` Majed B.
From: Guy Watkins @ 2009-11-11 5:37 UTC
To: 'LinuxRaid'
I have two 4-disk RAID6 arrays that lose a disk sometimes, maybe once every
one to three months. As far as I can tell I don't have disks with unreadable
blocks. The RAID1 arrays also lose disks sometimes. I have the 4 disks on
1 controller, from lspci:
00:0e.0 Mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
I thought the RAID6 logic corrected single block errors? Maybe not on a
write? And I think this is a write because of "super_written"?
The array is a RAID6 but the errors say RAID5?
When I remove and add the disks back in they rebuild just fine.
Anyway, does anyone understand what this error really is? Is it bad disks?
Bad cable? Bad controller? Bad sunspots? :)
I did see that a SMART test had failed at about the same time. I also read
that some disks or controllers can't handle SMART tests. Could that be it?
I don't run SMART tests very often, so I know the earlier failures were
not caused by a SMART test. Maybe I am running the tests wrong? I
used this command: "smartctl --test=long /dev/sda"
All info I think might be needed:
The disks are all Seagate ST3320620AS (320 GB disks).
# uname -a
Linux linux.watkins-home.com 2.6.27.35-170.2.94.fc10.i686 #1 SMP Thu Oct 1 14:58:51 EDT 2009 i686 i686 i386 GNU/Linux
# rpm -qa mdadm
mdadm-2.6.9-1.fc10.i386
From /var/log/messages-20091108
Nov 1 21:48:29 linux kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x180203 action 0x6 frozen
Nov 1 21:48:29 linux kernel: ata4: SError: { RecovData RecovComm Persist 10B8B Dispar }
Nov 1 21:48:29 linux kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Nov 1 21:48:29 linux kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Nov 1 21:48:29 linux kernel: ata4.00: status: { DRDY }
Nov 1 21:48:29 linux kernel: ata4: hard resetting link
Nov 1 21:48:31 linux kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Nov 1 21:48:31 linux kernel: ata4.00: configured for UDMA/133
Nov 1 21:48:31 linux kernel: ata4.00: device reported invalid CHS sector 0
Nov 1 21:48:31 linux kernel: ata4: EH complete
Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
Nov 1 21:48:31 linux kernel: end_request: I/O error, dev sdb, sector 34089705
Nov 1 21:48:31 linux kernel: md: super_written gets error=-5, uptodate=0
Nov 1 21:48:31 linux kernel: raid5: Disk failure on sdb2, disabling device.
Nov 1 21:48:31 linux kernel: raid5: Operation continuing on 3 devices.
Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write Protect is off
Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 1 21:48:31 linux kernel: RAID5 conf printout:
Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
Nov 1 21:48:31 linux kernel: disk 0, o:0, dev:sdb2
Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
Nov 1 21:48:31 linux kernel: RAID5 conf printout:
Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
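
For anyone trying to decode the SErr value in the log above, here is a minimal sketch in Python. It assumes the bit names follow the abbreviations the kernel prints inside the `SError: { ... }` braces. The `SERROR_BITS` table and the `decode_serror` helper are illustrative names, not part of any tool, and the table may not be exhaustive.

```python
# Bit positions and short names as commonly printed by the Linux libata
# driver for the SATA SError register. Illustrative table, possibly partial.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


# Value taken from the "SErr 0x180203" field in the log.
print(decode_serror(0x180203))
```

With the value from the log, the decoded names match the list the kernel printed. The 10B8B and Dispar bits describe link-level encoding errors, which seems more consistent with a cable, connector, or controller problem than with a bad sector, though the log alone does not prove it.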
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
264960 blocks [4/4] [UUUU]
bitmap: 0/33 pages [0KB], 4KB chunk
md4 : active raid6 sdd4[0] sda4[3] sdc4[2] sdb4[4](F)
586853888 blocks level 6, 256k chunk, algorithm 2 [4/3] [U_UU]
bitmap: 70/140 pages [280KB], 1024KB chunk
md2 : active raid1 sdb3[0] sda3[1]
2096384 blocks [2/2] [UU]
bitmap: 0/128 pages [0KB], 8KB chunk
md1 : active raid1 sdd3[0] sdc3[1]
2096384 blocks [2/2] [UU]
bitmap: 0/128 pages [0KB], 8KB chunk
md3 : active raid6 sdb2[4](F) sdd2[1] sda2[3] sdc2[2]
33559552 blocks level 6, 256k chunk, algorithm 2 [4/3] [_UUU]
bitmap: 119/129 pages [476KB], 64KB chunk
unused devices: <none>
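
A quick way to pull the failed members out of /proc/mdstat output like the above is sketched below in Python. It assumes the device-line layout shown here, where a failed member carries an `(F)` suffix. `failed_members` is a hypothetical helper, and the sample text is copied from this message rather than read from a live system.

```python
import re

# Device lines copied from the /proc/mdstat output in this message.
MDSTAT = """\
md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
md4 : active raid6 sdd4[0] sda4[3] sdc4[2] sdb4[4](F)
md2 : active raid1 sdb3[0] sda3[1]
md1 : active raid1 sdd3[0] sdc3[1]
md3 : active raid6 sdb2[4](F) sdd2[1] sda2[3] sdc2[2]
"""


def failed_members(mdstat_text):
    """Map each array name to the member devices flagged (F)."""
    result = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"^(md\d+) : ", line)
        if not match:
            continue  # not an array header line
        failed = re.findall(r"(\w+)\[\d+\]\(F\)", line)
        if failed:
            result[match.group(1)] = failed
    return result


print(failed_members(MDSTAT))
```

On this sample it reports sdb4 in md4 and sdb2 in md3, so both degraded RAID6 arrays lost a partition on the same physical disk.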
# smartctl -a /dev/sda
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10 family
Device Model: ST3320620AS
Serial Number: 3QF08NDL
Firmware Version: 3.AAD
User Capacity: 320,072,933,376 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Tue Nov 10 23:57:28 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection:
Enabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has
ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off
support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 115) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   097   006    Pre-fail  Always   -           77830969
  3 Spin_Up_Time            0x0003   094   090   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           83
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always   -           150227385
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23919
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           116
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   061   046   045    Old_age   Always   -           39 (Lifetime Min/Max 37/43)
194 Temperature_Celsius     0x0022   039   054   000    Old_age   Always   -           39 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   065   054   000    Old_age   Always   -           102168431
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours)
LBA_of_first_error
# 1 Extended offline Completed without error 00% 23730
-
# 2 Extended offline Completed without error 00% 22581
-
# 3 Short offline Completed without error 00% 22577
-
# 4 Extended offline Completed without error 00% 17267
-
# 5 Short offline Completed without error 00% 17259
-
# 6 Extended offline Completed without error 00% 384
-
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
# smartctl -a /dev/sdb
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10 family
Device Model: ST3320620AS
Serial Number: 3QF08SKR
Firmware Version: 3.AAD
User Capacity: 320,072,933,376 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Nov 11 00:03:14 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection:
Enabled.
Self-test execution status: ( 37) The self-test routine was
interrupted
by the host with a hard or soft
reset.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off
support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 115) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   091   006    Pre-fail  Always   -           136981744
  3 Spin_Up_Time            0x0003   099   090   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           104
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           1
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always   -           257877357
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23916
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           157
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   059   049   045    Old_age   Always   -           41 (Lifetime Min/Max 38/43)
194 Temperature_Celsius     0x0022   041   051   000    Old_age   Always   -           41 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   063   054   000    Old_age   Always   -           160751697
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours)
LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 50% 23726
-
# 2 Extended offline Completed without error 00% 22580
-
# 3 Short offline Completed without error 00% 22577
-
# 4 Extended offline Completed without error 00% 17267
-
# 5 Short offline Completed without error 00% 17260
-
# 6 Extended offline Completed without error 00% 384
-
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
# smartctl -a /dev/sdc
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10 family
Device Model: ST3320620AS
Serial Number: 3QF08V24
Firmware Version: 3.AAD
User Capacity: 320,072,933,376 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Nov 11 00:03:36 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection:
Enabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has
ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off
support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 115) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   090   006    Pre-fail  Always   -           221110249
  3 Spin_Up_Time            0x0003   094   090   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           94
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always   -           138219006
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23917
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           130
187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always   -           18
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (Lifetime Min/Max 39/45)
194 Temperature_Celsius     0x0022   041   056   000    Old_age   Always   -           41 (0 22 0 0)
195 Hardware_ECC_Recovered  0x001a   066   057   000    Old_age   Always   -           145841009
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 18 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 e1 7e 09 e0 00 00:16:26.026 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:16:26.022 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:16:26.022 SET FEATURES [Set transfer
mode]
ec 00 00 00 00 00 a0 00 00:16:26.019 IDENTIFY DEVICE
25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
Error 17 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
mode]
ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
Error 16 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
mode]
ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
Error 15 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
mode]
ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
Error 14 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 e1 7e 09 e0 00 00:16:17.672 READ DMA EXT
ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
mode]
ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
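
The "UNC at LBA" figure in each entry above can be reproduced from the register dump. The sketch below, in Python, assumes the usual 28-bit task-file layout (SN, CL, CH, plus the low nibble of DH). `lba28_from_registers` is an illustrative name, not a smartctl function.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble
    of DH holds bits 24-27 (the high nibble of DH carries mode flags).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers from the sdc error entries: SN=63 CL=81 CH=09 DH=e0
lba = lba28_from_registers(0x63, 0x81, 0x09, 0xE0)
print(hex(lba), lba)
```

This yields the same LBA that smartctl printed for sdc (622947). All five logged errors on that disk point at the same sector, and all date from power-on hour 5380, long before the current power-on hours count.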
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours)
LBA_of_first_error
# 1 Extended offline Completed without error 00% 23728
-
# 2 Extended offline Completed without error 00% 22579
-
# 3 Short offline Completed without error 00% 22576
-
# 4 Extended offline Completed without error 00% 17265
-
# 5 Short offline Completed without error 00% 17257
-
# 6 Extended offline Completed without error 00% 384
-
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
# smartctl -a /dev/sdd
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.10 family
Device Model: ST3320620AS
Serial Number: 3QF08WDP
Firmware Version: 3.AAD
User Capacity: 320,072,933,376 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Nov 11 00:04:04 2009 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection:
Enabled.
Self-test execution status: ( 0) The previous self-test routine
completed
without error or no self-test has
ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off
support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 115) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   110   090   006    Pre-fail  Always   -           25809154
  3 Spin_Up_Time            0x0003   098   090   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           516
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always   -           192909989
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23896
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           777
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   061   050   045    Old_age   Always   -           39 (Lifetime Min/Max 36/42)
194 Temperature_Celsius     0x0022   039   050   000    Old_age   Always   -           39 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   064   055   000    Old_age   Always   -           81546876
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
SMART Error Log Version: 1
ATA Error Count: 6 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 6 occurred at disk power-on lifetime: 10007 hours (416 days + 23
hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
mode]
27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
Error 5 occurred at disk power-on lifetime: 10007 hours (416 days + 23
hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
mode]
27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
Error 4 occurred at disk power-on lifetime: 10007 hours (416 days + 23
hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
mode]
27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
Error 3 occurred at disk power-on lifetime: 10007 hours (416 days + 23
hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
mode]
27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
Error 2 occurred at disk power-on lifetime: 10007 hours (416 days + 23
hours)
When the command that caused the error occurred, the device was active or
idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
mode]
27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours)
LBA_of_first_error
# 1 Extended offline Completed without error 00% 23707
-
# 2 Extended offline Completed without error 00% 22559
-
# 3 Short offline Completed without error 00% 22555
-
# 4 Extended offline Completed without error 00% 17248
-
# 5 Short offline Completed without error 00% 17241
-
# 6 Short offline Completed without error 00% 17241
-
# 7 Extended offline Completed without error 00% 384
-
# 8 Short offline Completed without error 00% 381
-
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Thanks,
Guy
* Re: RAID6 array lost a disk, can someone decode the error?
2009-11-11 5:37 RAID6 array lost a disk, can someone decode the error? Guy Watkins
@ 2009-11-11 5:52 ` Majed B.
2009-11-11 7:01 ` Guy Watkins
2009-11-11 7:08 ` Thomas Fjellstrom
From: Majed B. @ 2009-11-11 5:52 UTC
To: Guy Watkins; +Cc: LinuxRaid
You seem to have very high numbers in Hardware_ECC_Recovered and
Raw_Read_Error_Rate. I suggest you replace your cables.
You don't have bad sectors, which is good.
Are you using the controller for RAID, or just as a way to connect your disks?
I've had similar link-reset problems, though not write-related. It turned
out one of the disks had a bad PCB.
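
RAW_VALUE encoding is vendor-specific, so one hedged cross-check is to compare the normalized VALUE and WORST columns against THRESH, which is the comparison behind smartctl's WHEN_FAILED column. The Python sketch below assumes each attribute row has been joined onto one line. `failing_attributes` is a hypothetical helper, and the sample rows are copied from the sdc output in the original message.

```python
def failing_attributes(rows):
    """Return names of attributes whose VALUE or WORST is at or below THRESH.

    Each row is one joined line from the smartctl attribute table:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
    """
    flagged = []
    for row in rows:
        fields = row.split()
        name = fields[1]
        value, worst, thresh = (int(f) for f in fields[3:6])
        # Skip attributes with a zero threshold: valid normalized values
        # never drop to 0, so these cannot trip.
        if thresh > 0 and (value <= thresh or worst <= thresh):
            flagged.append(name)
    return flagged


# Three rows from the sdc table in the original message.
SDC_ROWS = [
    "1 Raw_Read_Error_Rate 0x000f 119 090 006 Pre-fail Always - 221110249",
    "190 Airflow_Temperature_Cel 0x0022 059 044 045 Old_age Always In_the_past 41",
    "195 Hardware_ECC_Recovered 0x001a 066 057 000 Old_age Always - 145841009",
]
print(failing_attributes(SDC_ROWS))
```

On these rows only Airflow_Temperature_Cel is flagged, matching the In_the_past marker in the table. Raw_Read_Error_Rate and Hardware_ECC_Recovered stay well above their thresholds despite the large raw numbers.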
On Wed, Nov 11, 2009 at 8:37 AM, Guy Watkins <guy@watkins-home.com> wrote:
> I have two 4-disk RAID6 arrays that lose a disk sometimes, maybe once every
> one to three months. As far as I can tell I don't have disks with unreadable
> blocks. The RAID1 arrays also lose disks sometimes. I have the 4 disks on
> 1 controller, from lspci:
> 00:0e.0 Mass storage controller: Promise Technology, Inc. PDC20318 (SATA150
> TX4) (rev 02)
>
> I thought the RAID6 logic corrected single block errors? Maybe not on a
> write? And I think this is a write because of "super_written"?
>
> The array is a RAID6 but the errors say RAID5?
>
> When I remove and add the disks back in they rebuild just fine.
>
> Anyway, does anyone understand what this error really is? Is it bad disks?
> Bad cable? Bad controller? Bad sunspots? :)
>
> I did see that a SMART test had failed at about the same time. I also read
> that some disks or controllers can't handle SMART tests. Could that be it?
> I don't run SMART tests very often, so I know the earlier failures were
> not caused by a SMART test. Maybe I am running the tests wrong? I
> used this command: "smartctl --test=long /dev/sda"
>
> All info I think might be needed:
>
> The disks are all Seagate ST3320620AS (320 GB disks).
>
> # uname -a
> Linux linux.watkins-home.com 2.6.27.35-170.2.94.fc10.i686 #1 SMP Thu Oct 1
> 14:58:51 EDT 2009 i686 i686 i386 GNU/Linux
>
> # rpm -qa mdadm
> mdadm-2.6.9-1.fc10.i386
>
> From /var/log/messages-20091108
> Nov 1 21:48:29 linux kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x180203 action 0x6 frozen
> Nov 1 21:48:29 linux kernel: ata4: SError: { RecovData RecovComm Persist 10B8B Dispar }
> Nov 1 21:48:29 linux kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> Nov 1 21:48:29 linux kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
> Nov 1 21:48:29 linux kernel: ata4.00: status: { DRDY }
> Nov 1 21:48:29 linux kernel: ata4: hard resetting link
> Nov 1 21:48:31 linux kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Nov 1 21:48:31 linux kernel: ata4.00: configured for UDMA/133
> Nov 1 21:48:31 linux kernel: ata4.00: device reported invalid CHS sector 0
> Nov 1 21:48:31 linux kernel: ata4: EH complete
> Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
> Nov 1 21:48:31 linux kernel: end_request: I/O error, dev sdb, sector 34089705
> Nov 1 21:48:31 linux kernel: md: super_written gets error=-5, uptodate=0
> Nov 1 21:48:31 linux kernel: raid5: Disk failure on sdb2, disabling device.
> Nov 1 21:48:31 linux kernel: raid5: Operation continuing on 3 devices.
> Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write Protect is off
> Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Nov 1 21:48:31 linux kernel: RAID5 conf printout:
> Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
> Nov 1 21:48:31 linux kernel: disk 0, o:0, dev:sdb2
> Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
> Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
> Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
> Nov 1 21:48:31 linux kernel: RAID5 conf printout:
> Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
> Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
> Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
> Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
>
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [raid1]
> md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
> 264960 blocks [4/4] [UUUU]
> bitmap: 0/33 pages [0KB], 4KB chunk
>
> md4 : active raid6 sdd4[0] sda4[3] sdc4[2] sdb4[4](F)
> 586853888 blocks level 6, 256k chunk, algorithm 2 [4/3] [U_UU]
> bitmap: 70/140 pages [280KB], 1024KB chunk
>
> md2 : active raid1 sdb3[0] sda3[1]
> 2096384 blocks [2/2] [UU]
> bitmap: 0/128 pages [0KB], 8KB chunk
>
> md1 : active raid1 sdd3[0] sdc3[1]
> 2096384 blocks [2/2] [UU]
> bitmap: 0/128 pages [0KB], 8KB chunk
>
> md3 : active raid6 sdb2[4](F) sdd2[1] sda2[3] sdc2[2]
> 33559552 blocks level 6, 256k chunk, algorithm 2 [4/3] [_UUU]
> bitmap: 119/129 pages [476KB], 64KB chunk
>
> unused devices: <none>
>
> # smartctl -a /dev/sda
> smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
> Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family: Seagate Barracuda 7200.10 family
> Device Model: ST3320620AS
> Serial Number: 3QF08NDL
> Firmware Version: 3.AAD
> User Capacity: 320,072,933,376 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 7
> ATA Standard is: Exact ATA specification draft version not indicated
> Local Time is: Tue Nov 10 23:57:28 2009 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x82) Offline data collection activity
> was completed without error.
> Auto Offline Data Collection:
> Enabled.
> Self-test execution status: ( 0) The previous self-test routine
> completed
> without error or no self-test has
> ever
> been run.
> Total time to complete Offline
> data collection: ( 430) seconds.
> Offline data collection
> capabilities: (0x5b) SMART execute Offline immediate.
> Auto Offline data collection on/off
> support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 1) minutes.
> Extended self-test routine
> recommended polling time: ( 115) minutes.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000f 114 097 006 Pre-fail Always - 77830969
> 3 Spin_Up_Time 0x0003 094 090 000 Pre-fail Always - 0
> 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 83
> 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
> 7 Seek_Error_Rate 0x000f 081 060 030 Pre-fail Always - 150227385
> 9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 23919
> 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 116
> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
> 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
> 190 Airflow_Temperature_Cel 0x0022 061 046 045 Old_age Always - 39 (Lifetime Min/Max 37/43)
> 194 Temperature_Celsius 0x0022 039 054 000 Old_age Always - 39 (0 21 0 0)
> 195 Hardware_ECC_Recovered 0x001a 065 054 000 Old_age Always - 102168431
> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
> 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
> 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed without error 00% 23730 -
> # 2 Extended offline Completed without error 00% 22581 -
> # 3 Short offline Completed without error 00% 22577 -
> # 4 Extended offline Completed without error 00% 17267 -
> # 5 Short offline Completed without error 00% 17259 -
> # 6 Extended offline Completed without error 00% 384 -
>
> SMART Selective self-test log data structure revision number 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> # smartctl -a /dev/sdb
> smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
> Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family: Seagate Barracuda 7200.10 family
> Device Model: ST3320620AS
> Serial Number: 3QF08SKR
> Firmware Version: 3.AAD
> User Capacity: 320,072,933,376 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 7
> ATA Standard is: Exact ATA specification draft version not indicated
> Local Time is: Wed Nov 11 00:03:14 2009 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x82) Offline data collection activity
> was completed without error.
> Auto Offline Data Collection:
> Enabled.
> Self-test execution status: ( 37) The self-test routine was
> interrupted
> by the host with a hard or soft
> reset.
> Total time to complete Offline
> data collection: ( 430) seconds.
> Offline data collection
> capabilities: (0x5b) SMART execute Offline immediate.
> Auto Offline data collection on/off
> support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 1) minutes.
> Extended self-test routine
> recommended polling time: ( 115) minutes.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000f 111 091 006 Pre-fail Always - 136981744
> 3 Spin_Up_Time 0x0003 099 090 000 Pre-fail Always - 0
> 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 104
> 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 1
> 7 Seek_Error_Rate 0x000f 084 060 030 Pre-fail Always - 257877357
> 9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 23916
> 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 157
> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
> 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
> 190 Airflow_Temperature_Cel 0x0022 059 049 045 Old_age Always - 41 (Lifetime Min/Max 38/43)
> 194 Temperature_Celsius 0x0022 041 051 000 Old_age Always - 41 (0 21 0 0)
> 195 Hardware_ECC_Recovered 0x001a 063 054 000 Old_age Always - 160751697
> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
> 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
> 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Interrupted (host reset) 50% 23726 -
> # 2 Extended offline Completed without error 00% 22580 -
> # 3 Short offline Completed without error 00% 22577 -
> # 4 Extended offline Completed without error 00% 17267 -
> # 5 Short offline Completed without error 00% 17260 -
> # 6 Extended offline Completed without error 00% 384 -
>
> SMART Selective self-test log data structure revision number 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> # smartctl -a /dev/sdc
> smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
> Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family: Seagate Barracuda 7200.10 family
> Device Model: ST3320620AS
> Serial Number: 3QF08V24
> Firmware Version: 3.AAD
> User Capacity: 320,072,933,376 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 7
> ATA Standard is: Exact ATA specification draft version not indicated
> Local Time is: Wed Nov 11 00:03:36 2009 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> See vendor-specific Attribute list for marginal Attributes.
>
> General SMART Values:
> Offline data collection status: (0x82) Offline data collection activity
> was completed without error.
> Auto Offline Data Collection:
> Enabled.
> Self-test execution status: ( 0) The previous self-test routine
> completed
> without error or no self-test has
> ever
> been run.
> Total time to complete Offline
> data collection: ( 430) seconds.
> Offline data collection
> capabilities: (0x5b) SMART execute Offline immediate.
> Auto Offline data collection on/off
> support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 1) minutes.
> Extended self-test routine
> recommended polling time: ( 115) minutes.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000f 119 090 006 Pre-fail Always - 221110249
> 3 Spin_Up_Time 0x0003 094 090 000 Pre-fail Always - 0
> 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 94
> 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
> 7 Seek_Error_Rate 0x000f 081 060 030 Pre-fail Always - 138219006
> 9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 23917
> 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 130
> 187 Reported_Uncorrect 0x0032 082 082 000 Old_age Always - 18
> 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
> 190 Airflow_Temperature_Cel 0x0022 059 044 045 Old_age Always In_the_past 41 (Lifetime Min/Max 39/45)
> 194 Temperature_Celsius 0x0022 041 056 000 Old_age Always - 41 (0 22 0 0)
> 195 Hardware_ECC_Recovered 0x001a 066 057 000 Old_age Always - 145841009
> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
> 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
> 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
>
> SMART Error Log Version: 1
> ATA Error Count: 18 (device log contains only the most recent five errors)
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 18 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 e1 7e 09 e0 00 00:16:26.026 READ DMA EXT
> ec 00 00 00 00 00 a0 00 00:16:26.022 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:16:26.022 SET FEATURES [Set transfer
> mode]
> ec 00 00 00 00 00 a0 00 00:16:26.019 IDENTIFY DEVICE
> 25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
>
> Error 17 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
> ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> mode]
> ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> 25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
>
> Error 16 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
> ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> mode]
> ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
>
> Error 15 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
> ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> mode]
> ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
>
> Error 14 occurred at disk power-on lifetime: 5380 hours (224 days + 4 hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 e1 7e 09 e0 00 00:16:17.672 READ DMA EXT
> ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> mode]
> ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed without error 00% 23728 -
> # 2 Extended offline Completed without error 00% 22579 -
> # 3 Short offline Completed without error 00% 22576 -
> # 4 Extended offline Completed without error 00% 17265 -
> # 5 Short offline Completed without error 00% 17257 -
> # 6 Extended offline Completed without error 00% 384 -
>
> SMART Selective self-test log data structure revision number 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> # smartctl -a /dev/sdd
> smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
> Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family: Seagate Barracuda 7200.10 family
> Device Model: ST3320620AS
> Serial Number: 3QF08WDP
> Firmware Version: 3.AAD
> User Capacity: 320,072,933,376 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 7
> ATA Standard is: Exact ATA specification draft version not indicated
> Local Time is: Wed Nov 11 00:04:04 2009 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x82) Offline data collection activity
> was completed without error.
> Auto Offline Data Collection:
> Enabled.
> Self-test execution status: ( 0) The previous self-test routine
> completed
> without error or no self-test has
> ever
> been run.
> Total time to complete Offline
> data collection: ( 430) seconds.
> Offline data collection
> capabilities: (0x5b) SMART execute Offline immediate.
> Auto Offline data collection on/off
> support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> No Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 1) minutes.
> Extended self-test routine
> recommended polling time: ( 115) minutes.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000f 110 090 006 Pre-fail Always - 25809154
> 3 Spin_Up_Time 0x0003 098 090 000 Pre-fail Always - 0
> 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 516
> 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
> 7 Seek_Error_Rate 0x000f 082 060 030 Pre-fail Always - 192909989
> 9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 23896
> 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 777
> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
> 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
> 190 Airflow_Temperature_Cel 0x0022 061 050 045 Old_age Always - 39 (Lifetime Min/Max 36/42)
> 194 Temperature_Celsius 0x0022 039 050 000 Old_age Always - 39 (0 20 0 0)
> 195 Hardware_ECC_Recovered 0x001a 064 055 000 Old_age Always - 81546876
> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
> 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
> 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
> 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
>
> SMART Error Log Version: 1
> ATA Error Count: 6 (device log contains only the most recent five errors)
> CR = Command Register [HEX]
> FR = Features Register [HEX]
> SC = Sector Count Register [HEX]
> SN = Sector Number Register [HEX]
> CL = Cylinder Low Register [HEX]
> CH = Cylinder High Register [HEX]
> DH = Device/Head Register [HEX]
> DC = Device Command Register [HEX]
> ER = Error register [HEX]
> ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 6 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
> 27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
> mode]
> 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
>
> Error 5 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
> 27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
> mode]
> 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
>
> Error 4 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
> 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
> mode]
> 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
>
> Error 3 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
> 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
> mode]
> 27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
>
> Error 2 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> hours)
> When the command that caused the error occurred, the device was active or
> idle.
>
> After command completion occurred, registers were:
> ER ST SC SN CL CH DH
> -- -- -- -- -- -- --
> 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
>
> Commands leading to the command that caused the error were:
> CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> -- -- -- -- -- -- -- -- ---------------- --------------------
> 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
> 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
> ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
> ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
> mode]
> 27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed without error 00% 23707 -
> # 2 Extended offline Completed without error 00% 22559 -
> # 3 Short offline Completed without error 00% 22555 -
> # 4 Extended offline Completed without error 00% 17248 -
> # 5 Short offline Completed without error 00% 17241 -
> # 6 Short offline Completed without error 00% 17241 -
> # 7 Extended offline Completed without error 00% 384 -
> # 8 Short offline Completed without error 00% 381 -
>
> SMART Selective self-test log data structure revision number 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> Thanks,
> Guy
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Majed B.
* RE: RAID6 array lost a disk, can someone decode the error?
2009-11-11 5:52 ` Majed B.
@ 2009-11-11 7:01 ` Guy Watkins
2009-11-11 7:08 ` Thomas Fjellstrom
1 sibling, 0 replies; 4+ messages in thread
From: Guy Watkins @ 2009-11-11 7:01 UTC (permalink / raw)
To: 'Majed B.'; +Cc: 'LinuxRaid'
} -----Original Message-----
} From: Majed B. [mailto:majedb@gmail.com]
} Sent: Wednesday, November 11, 2009 12:52 AM
} To: Guy Watkins
} Cc: LinuxRaid
} Subject: Re: RAID6 array lost a disk, can someone decode the error?
}
} You seem to have very high numbers in Hardware_ECC_Recovered and
} Raw_Read_Error_Rate. I suggest you replace your cables.
I thought I just did not understand those fields. They seemed high/bad to
me too, but I do not have any other disks to compare to. You think all 4
cables could be bad? They are the same, but no idea what brand.
OK, any recommended vendor for new cables?
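
For what it's worth, smartctl counts an attribute as failed when its normalized VALUE drops to or below THRESH; the RAW_VALUE column is vendor-specific. On Seagate drives of this generation the raw numbers for Raw_Read_Error_Rate, Seek_Error_Rate and Hardware_ECC_Recovered are widely reported to be packed counters rather than plain error counts, so large raw values alone may not mean much. The sketch below is only an illustration of that comparison: the function names are made up, the sample rows are copied from the sdb and sdc output in this thread, and it assumes each attribute row has been joined onto one line.

```python
import re

# One "smartctl -A" attribute row, joined onto a single line.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)

# Sample rows taken from this thread (first three from sdb, last one from sdc).
SAMPLE = """\
1 Raw_Read_Error_Rate 0x000f 111 091 006 Pre-fail Always - 136981744
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 1
195 Hardware_ECC_Recovered 0x001a 063 054 000 Old_age Always - 160751697
190 Airflow_Temperature_Cel 0x0022 059 044 045 Old_age Always In_the_past 41 (Lifetime Min/Max 39/45)
"""


def parse_attributes(text):
    """Return one dict per attribute row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        rows.append({
            "id": int(m.group("id")),
            "name": m.group("name"),
            "value": int(m.group("value")),
            "worst": int(m.group("worst")),
            "thresh": int(m.group("thresh")),
            "raw": m.group("raw"),
        })
    return rows


def failing_now(rows):
    """Attributes whose normalized VALUE is at or below a non-zero THRESH."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


def failed_in_past(rows):
    """Attributes whose WORST value has ever reached a non-zero THRESH."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["worst"] <= r["thresh"]]


if __name__ == "__main__":
    rows = parse_attributes(SAMPLE)
    print("failing now:   ", failing_now(rows))
    print("failed in past:", failed_in_past(rows))
```

Run against the sample rows, nothing is failing at the moment and only the sdc airflow temperature ever reached its threshold, which matches the In_the_past flag smartctl printed.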
} You don't have bad sectors, which is good.
}
} Are you using the controller for RAID or just as a way to connect your
} disks?
JBOD. I did not know that controller had RAID. :)
} I've had similar link-reset problems, but not write-related. Turns
} out one of the disks had a bad PCB.
}
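
On the link-reset side, the SErr value in the quoted kernel log (0x180203) can be decoded by hand. The sketch below is a rough illustration: the flag names are the ones the kernel printed in this thread plus the other names libata uses as far as I recall them, and the bit positions follow the SATA SError register layout from memory, so treat the table as an assumption to check against the libata source.

```python
# Bit position -> short flag name, as printed in the kernel's "SError: { ... }" line.
# Assumed from the SATA SError register layout; verify against libata before relying on it.
SERR_BITS = {
    0: "RecovData",    # recovered data integrity error
    1: "RecovComm",    # recovered communication error
    8: "UnrecovData",  # unrecovered data integrity error
    9: "Persist",      # persistent communication or data integrity error
    10: "Proto",       # protocol error
    11: "HostInt",     # host internal error
    16: "PHYRdyChg",   # PHY ready state changed
    17: "PHYInt",      # PHY internal error
    18: "CommWake",    # COMWAKE detected
    19: "10B8B",       # 10b-to-8b decode error
    20: "Dispar",      # disparity error
    21: "BadCRC",      # CRC error
    22: "Handshk",     # handshake error
    23: "LinkSeq",     # link sequence error
    24: "TrStaTrns",   # transport state transition error
    25: "UnrecFIS",    # unrecognized FIS type
    26: "DevExch",     # device exchanged
}


def decode_serror(value):
    """Return the names of all flags set in an SError register value."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_serror(0x180203))
```

For 0x180203 this gives the same five flags the kernel listed: RecovData, RecovComm, Persist, 10B8B and Dispar. The last two are errors in decoding symbols on the SATA link itself, which usually points toward the cable, connector, power or interface electronics rather than the platters, though it does not say which of those is at fault.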
} On Wed, Nov 11, 2009 at 8:37 AM, Guy Watkins <guy@watkins-home.com> wrote:
} > I have 2 4-disk RAID6 arrays that lose a disk sometimes. Maybe once every
} > month or 3. As far as I can tell I don't have disks that have un-readable
} > blocks. The RAID1 arrays also lose disks sometimes. I have the 4 disks on
} > 1 controller, from lspci:
} > 00:0e.0 Mass storage controller: Promise Technology, Inc. PDC20318 (SATA150
} > TX4) (rev 02)
} >
} > I thought the RAID6 logic corrected single block errors? Maybe not on a
} > write? And I think this is a write because of "super_written"?
} >
} > The array is a RAID6 but the errors say RAID5?
} >
} > When I remove and add the disks back in they rebuild just fine.
} >
} > Anyway, does anyone understand what this error really is? Is it bad disks?
} > Bad cable? Bad controller? Bad sunspots? :)
} >
} > I did see that a SMART test had failed at about the same time. I also read
} > that some disks or controllers can't handle SMART tests. Could that be it?
} > I don't run SMART tests very often, so I know the other failures from the
} > past were not caused by a SMART test. Maybe I am doing the tests wrong? I
} > used this command: "smartctl --test=long /dev/sda"
} >
} > All info I think might be needed:
} >
} > The disks are all Seagate ST3320620AS (320 GB disks).
} >
} > # uname -a
} > Linux linux.watkins-home.com 2.6.27.35-170.2.94.fc10.i686 #1 SMP Thu Oct 1
} > 14:58:51 EDT 2009 i686 i686 i386 GNU/Linux
} >
} > # rpm -qa mdadm
} > mdadm-2.6.9-1.fc10.i386
} >
} > From /var/log/messages-20091108
} > Nov 1 21:48:29 linux kernel: ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x180203 action 0x6 frozen
} > Nov 1 21:48:29 linux kernel: ata4: SError: { RecovData RecovComm Persist 10B8B Dispar }
} > Nov 1 21:48:29 linux kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
} > Nov 1 21:48:29 linux kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
} > Nov 1 21:48:29 linux kernel: ata4.00: status: { DRDY }
} > Nov 1 21:48:29 linux kernel: ata4: hard resetting link
} > Nov 1 21:48:31 linux kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
} > Nov 1 21:48:31 linux kernel: ata4.00: configured for UDMA/133
} > Nov 1 21:48:31 linux kernel: ata4.00: device reported invalid CHS sector 0
} > Nov 1 21:48:31 linux kernel: ata4: EH complete
} > Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
} > Nov 1 21:48:31 linux kernel: end_request: I/O error, dev sdb, sector 34089705
} > Nov 1 21:48:31 linux kernel: md: super_written gets error=-5, uptodate=0
} > Nov 1 21:48:31 linux kernel: raid5: Disk failure on sdb2, disabling device.
} > Nov 1 21:48:31 linux kernel: raid5: Operation continuing on 3 devices.
} > Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write Protect is off
} > Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
} > Nov 1 21:48:31 linux kernel: RAID5 conf printout:
} > Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
} > Nov 1 21:48:31 linux kernel: disk 0, o:0, dev:sdb2
} > Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
} > Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
} > Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
} > Nov 1 21:48:31 linux kernel: RAID5 conf printout:
} > Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
} > Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
} > Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
} > Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
} >
} > # cat /proc/mdstat
} > Personalities : [raid6] [raid5] [raid4] [raid1]
} > md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
} > 264960 blocks [4/4] [UUUU]
} > bitmap: 0/33 pages [0KB], 4KB chunk
} >
} > md4 : active raid6 sdd4[0] sda4[3] sdc4[2] sdb4[4](F)
} > 586853888 blocks level 6, 256k chunk, algorithm 2 [4/3] [U_UU]
} > bitmap: 70/140 pages [280KB], 1024KB chunk
} >
} > md2 : active raid1 sdb3[0] sda3[1]
} > 2096384 blocks [2/2] [UU]
} > bitmap: 0/128 pages [0KB], 8KB chunk
} >
} > md1 : active raid1 sdd3[0] sdc3[1]
} > 2096384 blocks [2/2] [UU]
} > bitmap: 0/128 pages [0KB], 8KB chunk
} >
} > md3 : active raid6 sdb2[4](F) sdd2[1] sda2[3] sdc2[2]
} > 33559552 blocks level 6, 256k chunk, algorithm 2 [4/3] [_UUU]
} > bitmap: 119/129 pages [476KB], 64KB chunk
} >
} > unused devices: <none>
} >
} > # smartctl -a /dev/sda
} > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
} > Allen
} > Home page is http://smartmontools.sourceforge.net/
} >
} > === START OF INFORMATION SECTION ===
} > Model Family: Seagate Barracuda 7200.10 family
} > Device Model: ST3320620AS
} > Serial Number: 3QF08NDL
} > Firmware Version: 3.AAD
} > User Capacity: 320,072,933,376 bytes
} > Device is: In smartctl database [for details use: -P show]
} > ATA Version is: 7
} > ATA Standard is: Exact ATA specification draft version not indicated
} > Local Time is: Tue Nov 10 23:57:28 2009 EST
} > SMART support is: Available - device has SMART capability.
} > SMART support is: Enabled
} >
} > === START OF READ SMART DATA SECTION ===
} > SMART overall-health self-assessment test result: PASSED
} >
} > General SMART Values:
} > Offline data collection status: (0x82) Offline data collection activity
} > was completed without error.
} > Auto Offline Data Collection:
} > Enabled.
} > Self-test execution status: ( 0) The previous self-test routine
} > completed
} > without error or no self-test has
} > ever
} > been run.
} > Total time to complete Offline
} > data collection: ( 430) seconds.
} > Offline data collection
} > capabilities: (0x5b) SMART execute Offline immediate.
} > Auto Offline data collection
} on/off
} > support.
} > Suspend Offline collection upon
} new
} > command.
} > Offline surface scan supported.
} > Self-test supported.
} > No Conveyance Self-test
} supported.
} > Selective Self-test supported.
} > SMART capabilities: (0x0003) Saves SMART data before entering
} > power-saving mode.
} > Supports SMART auto save timer.
} > Error logging capability: (0x01) Error logging supported.
} > General Purpose Logging
} supported.
} > Short self-test routine
} > recommended polling time: ( 1) minutes.
} > Extended self-test routine
} > recommended polling time: ( 115) minutes.
} >
} > SMART Attributes Data Structure revision number: 10
} > Vendor Specific SMART Attributes with Thresholds:
} > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
} UPDATED
} > WHEN_FAILED RAW_VALUE
} > 1 Raw_Read_Error_Rate 0x000f 114 097 006 Pre-fail Always
} > - 77830969
} > 3 Spin_Up_Time 0x0003 094 090 000 Pre-fail Always
} > - 0
} > 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always
} > - 83
} > 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always
} > - 0
} > 7 Seek_Error_Rate 0x000f 081 060 030 Pre-fail Always
} > - 150227385
} > 9 Power_On_Hours 0x0032 073 073 000 Old_age Always
} > - 23919
} > 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always
} > - 0
} > 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always
} > - 116
} > 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always
} > - 0
} > 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always
} > - 0
} > 190 Airflow_Temperature_Cel 0x0022 061 046 045 Old_age Always
} > - 39 (Lifetime Min/Max 37/43)
} > 194 Temperature_Celsius 0x0022 039 054 000 Old_age Always
} > - 39 (0 21 0 0)
} > 195 Hardware_ECC_Recovered 0x001a 065 054 000 Old_age Always
} > - 102168431
} > 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always
} > - 0
} > 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age
} Offline
} > - 0
} > 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always
} > - 0
} > 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age
} Offline
} > - 0
} > 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always
} > - 0
} >
} > SMART Error Log Version: 1
} > No Errors Logged
} >
} > SMART Self-test log structure revision number 1
} > Num Test_Description Status Remaining
} LifeTime(hours)
} > LBA_of_first_error
} > # 1 Extended offline Completed without error 00% 23730
} > -
} > # 2 Extended offline Completed without error 00% 22581
} > -
} > # 3 Short offline Completed without error 00% 22577
} > -
} > # 4 Extended offline Completed without error 00% 17267
} > -
} > # 5 Short offline Completed without error 00% 17259
} > -
} > # 6 Extended offline Completed without error 00% 384
} > -
} >
} > SMART Selective self-test log data structure revision number 1
} > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
} > 1 0 0 Not_testing
} > 2 0 0 Not_testing
} > 3 0 0 Not_testing
} > 4 0 0 Not_testing
} > 5 0 0 Not_testing
} > Selective self-test flags (0x0):
} > After scanning selected spans, do NOT read-scan remainder of disk.
} > If Selective self-test is pending on power-up, resume after 0 minute
} delay.
} >
} > # smartctl -a /dev/sdb
} > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
} > Allen
} > Home page is http://smartmontools.sourceforge.net/
} >
} > === START OF INFORMATION SECTION ===
} > Model Family: Seagate Barracuda 7200.10 family
} > Device Model: ST3320620AS
} > Serial Number: 3QF08SKR
} > Firmware Version: 3.AAD
} > User Capacity: 320,072,933,376 bytes
} > Device is: In smartctl database [for details use: -P show]
} > ATA Version is: 7
} > ATA Standard is: Exact ATA specification draft version not indicated
} > Local Time is: Wed Nov 11 00:03:14 2009 EST
} > SMART support is: Available - device has SMART capability.
} > SMART support is: Enabled
} >
} > === START OF READ SMART DATA SECTION ===
} > SMART overall-health self-assessment test result: PASSED
} >
} > General SMART Values:
} > Offline data collection status: (0x82) Offline data collection activity
} > was completed without error.
} > Auto Offline Data Collection:
} > Enabled.
} > Self-test execution status: ( 37) The self-test routine was
} > interrupted
} > by the host with a hard or soft
} > reset.
} > Total time to complete Offline
} > data collection: ( 430) seconds.
} > Offline data collection
} > capabilities: (0x5b) SMART execute Offline immediate.
} > Auto Offline data collection
} on/off
} > support.
} > Suspend Offline collection upon
} new
} > command.
} > Offline surface scan supported.
} > Self-test supported.
} > No Conveyance Self-test
} supported.
} > Selective Self-test supported.
} > SMART capabilities: (0x0003) Saves SMART data before entering
} > power-saving mode.
} > Supports SMART auto save timer.
} > Error logging capability: (0x01) Error logging supported.
} > General Purpose Logging
} supported.
} > Short self-test routine
} > recommended polling time: ( 1) minutes.
} > Extended self-test routine
} > recommended polling time: ( 115) minutes.
} >
} > SMART Attributes Data Structure revision number: 10
} > Vendor Specific SMART Attributes with Thresholds:
} > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
} UPDATED
} > WHEN_FAILED RAW_VALUE
} > 1 Raw_Read_Error_Rate 0x000f 111 091 006 Pre-fail Always
} > - 136981744
} > 3 Spin_Up_Time 0x0003 099 090 000 Pre-fail Always
} > - 0
} > 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always
} > - 104
} > 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always
} > - 1
} > 7 Seek_Error_Rate 0x000f 084 060 030 Pre-fail Always
} > - 257877357
} > 9 Power_On_Hours 0x0032 073 073 000 Old_age Always
} > - 23916
} > 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always
} > - 0
} > 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always
} > - 157
} > 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always
} > - 0
} > 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always
} > - 0
} > 190 Airflow_Temperature_Cel 0x0022 059 049 045 Old_age Always
} > - 41 (Lifetime Min/Max 38/43)
} > 194 Temperature_Celsius 0x0022 041 051 000 Old_age Always
} > - 41 (0 21 0 0)
} > 195 Hardware_ECC_Recovered 0x001a 063 054 000 Old_age Always
} > - 160751697
} > 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always
} > - 0
} > 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age
} Offline
} > - 0
} > 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always
} > - 0
} > 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age
} Offline
} > - 0
} > 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always
} > - 0
} >
} > SMART Error Log Version: 1
} > No Errors Logged
} >
} > SMART Self-test log structure revision number 1
} > Num Test_Description Status Remaining
} LifeTime(hours)
} > LBA_of_first_error
} > # 1 Extended offline Interrupted (host reset) 50% 23726
} > -
} > # 2 Extended offline Completed without error 00% 22580
} > -
} > # 3 Short offline Completed without error 00% 22577
} > -
} > # 4 Extended offline Completed without error 00% 17267
} > -
} > # 5 Short offline Completed without error 00% 17260
} > -
} > # 6 Extended offline Completed without error 00% 384
} > -
} >
} > SMART Selective self-test log data structure revision number 1
} > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
} > 1 0 0 Not_testing
} > 2 0 0 Not_testing
} > 3 0 0 Not_testing
} > 4 0 0 Not_testing
} > 5 0 0 Not_testing
} > Selective self-test flags (0x0):
} > After scanning selected spans, do NOT read-scan remainder of disk.
} > If Selective self-test is pending on power-up, resume after 0 minute
} delay.
} >
} > # smartctl -a /dev/sdc
} > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
} > Allen
} > Home page is http://smartmontools.sourceforge.net/
} >
} > === START OF INFORMATION SECTION ===
} > Model Family: Seagate Barracuda 7200.10 family
} > Device Model: ST3320620AS
} > Serial Number: 3QF08V24
} > Firmware Version: 3.AAD
} > User Capacity: 320,072,933,376 bytes
} > Device is: In smartctl database [for details use: -P show]
} > ATA Version is: 7
} > ATA Standard is: Exact ATA specification draft version not indicated
} > Local Time is: Wed Nov 11 00:03:36 2009 EST
} > SMART support is: Available - device has SMART capability.
} > SMART support is: Enabled
} >
} > === START OF READ SMART DATA SECTION ===
} > SMART overall-health self-assessment test result: PASSED
} > See vendor-specific Attribute list for marginal Attributes.
} >
} > General SMART Values:
} > Offline data collection status: (0x82) Offline data collection activity
} > was completed without error.
} > Auto Offline Data Collection:
} > Enabled.
} > Self-test execution status: ( 0) The previous self-test routine
} > completed
} > without error or no self-test has
} > ever
} > been run.
} > Total time to complete Offline
} > data collection: ( 430) seconds.
} > Offline data collection
} > capabilities: (0x5b) SMART execute Offline immediate.
} > Auto Offline data collection
} on/off
} > support.
} > Suspend Offline collection upon
} new
} > command.
} > Offline surface scan supported.
} > Self-test supported.
} > No Conveyance Self-test
} supported.
} > Selective Self-test supported.
} > SMART capabilities: (0x0003) Saves SMART data before entering
} > power-saving mode.
} > Supports SMART auto save timer.
} > Error logging capability: (0x01) Error logging supported.
} > General Purpose Logging
} supported.
} > Short self-test routine
} > recommended polling time: ( 1) minutes.
} > Extended self-test routine
} > recommended polling time: ( 115) minutes.
} >
} > SMART Attributes Data Structure revision number: 10
} > Vendor Specific SMART Attributes with Thresholds:
} > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
} UPDATED
} > WHEN_FAILED RAW_VALUE
} > 1 Raw_Read_Error_Rate 0x000f 119 090 006 Pre-fail Always
} > - 221110249
} > 3 Spin_Up_Time 0x0003 094 090 000 Pre-fail Always
} > - 0
} > 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always
} > - 94
} > 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always
} > - 0
} > 7 Seek_Error_Rate 0x000f 081 060 030 Pre-fail Always
} > - 138219006
} > 9 Power_On_Hours 0x0032 073 073 000 Old_age Always
} > - 23917
} > 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always
} > - 0
} > 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always
} > - 130
} > 187 Reported_Uncorrect 0x0032 082 082 000 Old_age Always
} > - 18
} > 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always
} > - 0
} > 190 Airflow_Temperature_Cel 0x0022 059 044 045 Old_age Always
} > In_the_past 41 (Lifetime Min/Max 39/45)
} > 194 Temperature_Celsius 0x0022 041 056 000 Old_age Always
} > - 41 (0 22 0 0)
} > 195 Hardware_ECC_Recovered 0x001a 066 057 000 Old_age Always
} > - 145841009
} > 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always
} > - 0
} > 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age
} Offline
} > - 0
} > 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always
} > - 0
} > 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age
} Offline
} > - 0
} > 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always
} > - 0
} >
} > SMART Error Log Version: 1
} > ATA Error Count: 18 (device log contains only the most recent five
} errors)
} > CR = Command Register [HEX]
} > FR = Features Register [HEX]
} > SC = Sector Count Register [HEX]
} > SN = Sector Number Register [HEX]
} > CL = Cylinder Low Register [HEX]
} > CH = Cylinder High Register [HEX]
} > DH = Device/Head Register [HEX]
} > DC = Device Command Register [HEX]
} > ER = Error register [HEX]
} > ST = Status register [HEX]
} > Powered_Up_Time is measured from power on, and printed as
} > DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
} > SS=sec, and sss=millisec. It "wraps" after 49.710 days.
} >
} > Error 18 occurred at disk power-on lifetime: 5380 hours (224 days + 4
} hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 e1 7e 09 e0 00 00:16:26.026 READ DMA EXT
} > ec 00 00 00 00 00 a0 00 00:16:26.022 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:16:26.022 SET FEATURES [Set transfer
} > mode]
} > ec 00 00 00 00 00 a0 00 00:16:26.019 IDENTIFY DEVICE
} > 25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
} >
} > Error 17 occurred at disk power-on lifetime: 5380 hours (224 days + 4
} hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
} > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
} > mode]
} > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
} > 25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
} >
} > Error 16 occurred at disk power-on lifetime: 5380 hours (224 days + 4
} hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
} > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
} > mode]
} > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
} > 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
} >
} > Error 15 occurred at disk power-on lifetime: 5380 hours (224 days + 4
} hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
} > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
} > mode]
} > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
} > 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
} >
} > Error 14 occurred at disk power-on lifetime: 5380 hours (224 days + 4
} hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 e1 7e 09 e0 00 00:16:17.672 READ DMA EXT
} > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
} > mode]
} > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
} > 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
} >
} > SMART Self-test log structure revision number 1
} > Num Test_Description Status Remaining
} LifeTime(hours)
} > LBA_of_first_error
} > # 1 Extended offline Completed without error 00% 23728
} > -
} > # 2 Extended offline Completed without error 00% 22579
} > -
} > # 3 Short offline Completed without error 00% 22576
} > -
} > # 4 Extended offline Completed without error 00% 17265
} > -
} > # 5 Short offline Completed without error 00% 17257
} > -
} > # 6 Extended offline Completed without error 00% 384
} > -
} >
} > SMART Selective self-test log data structure revision number 1
} > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
} > 1 0 0 Not_testing
} > 2 0 0 Not_testing
} > 3 0 0 Not_testing
} > 4 0 0 Not_testing
} > 5 0 0 Not_testing
} > Selective self-test flags (0x0):
} > After scanning selected spans, do NOT read-scan remainder of disk.
} > If Selective self-test is pending on power-up, resume after 0 minute
} delay.
} >
} > # smartctl -a /dev/sdd
} > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce
} > Allen
} > Home page is http://smartmontools.sourceforge.net/
} >
} > === START OF INFORMATION SECTION ===
} > Model Family: Seagate Barracuda 7200.10 family
} > Device Model: ST3320620AS
} > Serial Number: 3QF08WDP
} > Firmware Version: 3.AAD
} > User Capacity: 320,072,933,376 bytes
} > Device is: In smartctl database [for details use: -P show]
} > ATA Version is: 7
} > ATA Standard is: Exact ATA specification draft version not indicated
} > Local Time is: Wed Nov 11 00:04:04 2009 EST
} > SMART support is: Available - device has SMART capability.
} > SMART support is: Enabled
} >
} > === START OF READ SMART DATA SECTION ===
} > SMART overall-health self-assessment test result: PASSED
} >
} > General SMART Values:
} > Offline data collection status: (0x82) Offline data collection activity
} > was completed without error.
} > Auto Offline Data Collection:
} > Enabled.
} > Self-test execution status: ( 0) The previous self-test routine
} > completed
} > without error or no self-test has
} > ever
} > been run.
} > Total time to complete Offline
} > data collection: ( 430) seconds.
} > Offline data collection
} > capabilities: (0x5b) SMART execute Offline immediate.
} > Auto Offline data collection
} on/off
} > support.
} > Suspend Offline collection upon
} new
} > command.
} > Offline surface scan supported.
} > Self-test supported.
} > No Conveyance Self-test
} supported.
} > Selective Self-test supported.
} > SMART capabilities: (0x0003) Saves SMART data before entering
} > power-saving mode.
} > Supports SMART auto save timer.
} > Error logging capability: (0x01) Error logging supported.
} > General Purpose Logging
} supported.
} > Short self-test routine
} > recommended polling time: ( 1) minutes.
} > Extended self-test routine
} > recommended polling time: ( 115) minutes.
} >
} > SMART Attributes Data Structure revision number: 10
} > Vendor Specific SMART Attributes with Thresholds:
} > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
} UPDATED
} > WHEN_FAILED RAW_VALUE
} > 1 Raw_Read_Error_Rate 0x000f 110 090 006 Pre-fail Always
} > - 25809154
} > 3 Spin_Up_Time 0x0003 098 090 000 Pre-fail Always
} > - 0
} > 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always
} > - 516
} > 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always
} > - 0
} > 7 Seek_Error_Rate 0x000f 082 060 030 Pre-fail Always
} > - 192909989
} > 9 Power_On_Hours 0x0032 073 073 000 Old_age Always
} > - 23896
} > 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always
} > - 0
} > 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always
} > - 777
} > 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always
} > - 0
} > 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always
} > - 0
} > 190 Airflow_Temperature_Cel 0x0022 061 050 045 Old_age Always
} > - 39 (Lifetime Min/Max 36/42)
} > 194 Temperature_Celsius 0x0022 039 050 000 Old_age Always
} > - 39 (0 20 0 0)
} > 195 Hardware_ECC_Recovered 0x001a 064 055 000 Old_age Always
} > - 81546876
} > 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always
} > - 0
} > 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age
} Offline
} > - 0
} > 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always
} > - 0
} > 200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age
} Offline
} > - 0
} > 202 TA_Increase_Count 0x0032 100 253 000 Old_age Always
} > - 0
} >
} > SMART Error Log Version: 1
} > ATA Error Count: 6 (device log contains only the most recent five
} errors)
} > CR = Command Register [HEX]
} > FR = Features Register [HEX]
} > SC = Sector Count Register [HEX]
} > SN = Sector Number Register [HEX]
} > CL = Cylinder Low Register [HEX]
} > CH = Cylinder High Register [HEX]
} > DH = Device/Head Register [HEX]
} > DC = Device Command Register [HEX]
} > ER = Error register [HEX]
} > ST = Status register [HEX]
} > Powered_Up_Time is measured from power on, and printed as
} > DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
} > SS=sec, and sss=millisec. It "wraps" after 49.710 days.
} >
} > Error 6 occurred at disk power-on lifetime: 10007 hours (416 days + 23
} > hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
} > 27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
} > ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
} > mode]
} > 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
} >
} > Error 5 occurred at disk power-on lifetime: 10007 hours (416 days + 23
} > hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
} > 27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
} > ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
} > mode]
} > 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
} >
} > Error 4 occurred at disk power-on lifetime: 10007 hours (416 days + 23
} > hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
} > 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
} > ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
} > mode]
} > 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
} >
} > Error 3 occurred at disk power-on lifetime: 10007 hours (416 days + 23
} > hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
} > 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
} > ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
} > mode]
} > 27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
} >
} > Error 2 occurred at disk power-on lifetime: 10007 hours (416 days + 23
} > hours)
} > When the command that caused the error occurred, the device was active
} or
} > idle.
} >
} > After command completion occurred, registers were:
} > ER ST SC SN CL CH DH
} > -- -- -- -- -- -- --
} > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
} >
} > Commands leading to the command that caused the error were:
} > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
} > -- -- -- -- -- -- -- -- ---------------- --------------------
} > 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
} > 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
} > ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
} > ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
} > mode]
} > 27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
} >
} > SMART Self-test log structure revision number 1
} > Num Test_Description Status Remaining
} LifeTime(hours)
} > LBA_of_first_error
} > # 1 Extended offline Completed without error 00% 23707
} > -
} > # 2 Extended offline Completed without error 00% 22559
} > -
} > # 3 Short offline Completed without error 00% 22555
} > -
} > # 4 Extended offline Completed without error 00% 17248
} > -
} > # 5 Short offline Completed without error 00% 17241
} > -
} > # 6 Short offline Completed without error 00% 17241
} > -
} > # 7 Extended offline Completed without error 00% 384
} > -
} > # 8 Short offline Completed without error 00% 381
} > -
} >
} > SMART Selective self-test log data structure revision number 1
} > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
} > 1 0 0 Not_testing
} > 2 0 0 Not_testing
} > 3 0 0 Not_testing
} > 4 0 0 Not_testing
} > 5 0 0 Not_testing
} > Selective self-test flags (0x0):
} > After scanning selected spans, do NOT read-scan remainder of disk.
} > If Selective self-test is pending on power-up, resume after 0 minute
} delay.
} >
} > Thanks,
} > Guy
} >
} > --
} > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
} > the body of a message to majordomo@vger.kernel.org
} > More majordomo info at http://vger.kernel.org/majordomo-info.html
} >
}
}
}
} --
} Majed B.
* Re: RAID6 array lost a disk, can someone decode the error?
2009-11-11 5:52 ` Majed B.
2009-11-11 7:01 ` Guy Watkins
@ 2009-11-11 7:08 ` Thomas Fjellstrom
1 sibling, 0 replies; 4+ messages in thread
From: Thomas Fjellstrom @ 2009-11-11 7:08 UTC (permalink / raw)
To: Majed B.; +Cc: Guy Watkins, LinuxRaid
On Tue November 10 2009, you wrote:
> You seem to have very high numbers in Hardware_ECC_Recovered and
> Raw_Read_Error_Rate. I suggest you replace your cables.
That seems to be a standard reply for Seagate disks. ALL of mine, regardless
of the cables, return nonsense numbers for several SMART entries. I recall
people telling me it's just something Seagate drives do.
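
For what it's worth, one commonly repeated reading of those big raw numbers
(not, as far as I know, officially documented by Seagate) is that the 48-bit
raw value of Raw_Read_Error_Rate and Seek_Error_Rate packs an error count in
the upper 16 bits and an operation count in the lower 32 bits. A minimal
sketch under that assumption, using the raw values Guy posted; the helper
name split_seagate_raw is made up for illustration:

```python
from typing import Tuple


def split_seagate_raw(raw: int) -> Tuple[int, int]:
    """Split a 48-bit SMART raw value into (errors, operations).

    ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation
    count. This is a community interpretation, not a vendor-documented one.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Raw values copied from the smartctl output quoted in this thread.
samples = {
    "sda Raw_Read_Error_Rate": 77830969,
    "sda Seek_Error_Rate": 150227385,
    "sdb Raw_Read_Error_Rate": 136981744,
    "sdb Seek_Error_Rate": 257877357,
}

for name, raw in samples.items():
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: errors={errors} operations={operations}")
```

If that reading is right, all four values are below 2^32, so the error field
is zero and the large numbers are just operation counters.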
> You don't have bad sectors, which is good.
>
> Are you using the controller for RAID or just as a way to connect your
> disks?
>
> I've had similar link-reset problems, but not write-related. Turns
> out one of the disks had a bad PCB.
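
On the link reset itself: the kernel already prints the SError flags in the
log quoted below, but the mapping from the hex value is easy to check. A
minimal sketch, assuming the bit positions and short names as I remember them
from the Linux libata code; the table and the decode_serror helper are mine,
not part of any tool:

```python
# SATA SError register bits, with the short names the kernel log prints.
# ASSUMPTION: bit positions recalled from the Linux libata headers.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


# SErr value from the "exception Emask 0x10" line in Guy's log.
print(decode_serror(0x180203))
```

For 0x180203 this gives the same five flags the kernel printed. The 10B8B and
Dispar bits are link-level decode errors, which as far as I know usually point
toward the cable, connector, power, or controller port rather than the
platters. The command that failed, ea, is FLUSH CACHE EXT.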
>
> On Wed, Nov 11, 2009 at 8:37 AM, Guy Watkins <guy@watkins-home.com> wrote:
> > I have 2 4-disk RAID6 arrays that lose a disk sometimes. Maybe once
> > every month or 3. As far as I can tell I don't have disks that have
> > un-readable blocks. The RAID1 arrays also lose disks sometimes. I
> > have the 4 disks on 1 controller, from lspci:
> > 00:0e.0 Mass storage controller: Promise Technology, Inc. PDC20318
> > (SATA150 TX4) (rev 02)
> >
> > I thought the RAID6 logic corrected single block errors? Maybe not on
> > a write? And I think this is a write because of "super_written"?
> >
> > The array is a RAID6 but the errors say RAID5?
> >
> > When I remove and add the disks back in they rebuild just fine.
> >
> > Anyway, does anyone understand what this error really is? Is it bad
> > disks? Bad cable? Bad controller? Bad sunspots? :)
> >
> > I did see that a SMART test had failed at about the same time. I also
> > read that some disks or controllers can't handle SMART tests. Could
> > that be it? I don't run SMART tests very often, so I know the other
> > failures from the past were not caused by a SMART test. Maybe I am
> > doing the tests wrong? I used this command: "smartctl --test=long
> > /dev/sda"
> >
> > All info I think might be needed:
> >
> > The disks are all Seagate ST3320620AS (320 GB disks).
> >
> > # uname -a
> > Linux linux.watkins-home.com 2.6.27.35-170.2.94.fc10.i686 #1 SMP Thu
> > Oct 1 14:58:51 EDT 2009 i686 i686 i386 GNU/Linux
> >
> > # rpm -qa mdadm
> > mdadm-2.6.9-1.fc10.i386
> >
> > From /var/log/messages-20091108
> > Nov 1 21:48:29 linux kernel: ata4.00: exception Emask 0x10 SAct 0x0
> > SErr 0x180203 action 0x6 frozen
> > Nov 1 21:48:29 linux kernel: ata4: SError: { RecovData RecovComm
> > Persist 10B8B Dispar }
> > Nov 1 21:48:29 linux kernel: ata4.00: cmd
> > ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> > Nov 1 21:48:29 linux kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00
> > Emask 0x14 (ATA bus error)
> > Nov 1 21:48:29 linux kernel: ata4.00: status: { DRDY }
> > Nov 1 21:48:29 linux kernel: ata4: hard resetting link
> > Nov 1 21:48:31 linux kernel: ata4: SATA link up 1.5 Gbps (SStatus 113
> > SControl 300)
> > Nov 1 21:48:31 linux kernel: ata4.00: configured for UDMA/133
> > Nov 1 21:48:31 linux kernel: ata4.00: device reported invalid CHS
> > sector 0
> > Nov 1 21:48:31 linux kernel: ata4: EH complete
> > Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] 625142448 512-byte
> > hardware sectors (320073 MB)
> > Nov 1 21:48:31 linux kernel: end_request: I/O error, dev sdb, sector
> > 34089705
> > Nov 1 21:48:31 linux kernel: md: super_written gets error=-5,
> > uptodate=0
> > Nov 1 21:48:31 linux kernel: raid5: Disk failure on sdb2,
> > disabling device.
> > Nov 1 21:48:31 linux kernel: raid5: Operation continuing on 3 devices.
> > Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write Protect is off
> > Nov 1 21:48:31 linux kernel: sd 3:0:0:0: [sdb] Write cache: enabled,
> > read cache: enabled, doesn't support DPO or FUA
> > Nov 1 21:48:31 linux kernel: RAID5 conf printout:
> > Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
> > Nov 1 21:48:31 linux kernel: disk 0, o:0, dev:sdb2
> > Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
> > Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
> > Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
> > Nov 1 21:48:31 linux kernel: RAID5 conf printout:
> > Nov 1 21:48:31 linux kernel: --- rd:4 wd:3
> > Nov 1 21:48:31 linux kernel: disk 1, o:1, dev:sdd2
> > Nov 1 21:48:31 linux kernel: disk 2, o:1, dev:sdc2
> > Nov 1 21:48:31 linux kernel: disk 3, o:1, dev:sda2
> >
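The SError value in the log above (SErr 0x180203) can be decoded by hand. The sketch below is only an illustration: the bit table is transcribed from the names the Linux libata driver prints and should be checked against the kernel source, and the function name is hypothetical.

```python
# Bit positions in the SATA SError register, with the short names that
# libata prints. Transcribed for illustration; verify against the kernel.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}

def decode_serror(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]

# Value copied from the kernel log quoted above.
print(decode_serror(0x180203))
```

This yields the same five flags the kernel printed. 10B8B and Dispar are link-level decoding errors, which is consistent with (but does not prove) a problem on the SATA link itself, such as the cable, connector, port, or power, rather than bad media.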
> > # cat /proc/mdstat
> > Personalities : [raid6] [raid5] [raid4] [raid1]
> > md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
> > 264960 blocks [4/4] [UUUU]
> > bitmap: 0/33 pages [0KB], 4KB chunk
> >
> > md4 : active raid6 sdd4[0] sda4[3] sdc4[2] sdb4[4](F)
> > 586853888 blocks level 6, 256k chunk, algorithm 2 [4/3] [U_UU]
> > bitmap: 70/140 pages [280KB], 1024KB chunk
> >
> > md2 : active raid1 sdb3[0] sda3[1]
> > 2096384 blocks [2/2] [UU]
> > bitmap: 0/128 pages [0KB], 8KB chunk
> >
> > md1 : active raid1 sdd3[0] sdc3[1]
> > 2096384 blocks [2/2] [UU]
> > bitmap: 0/128 pages [0KB], 8KB chunk
> >
> > md3 : active raid6 sdb2[4](F) sdd2[1] sda2[3] sdc2[2]
> > 33559552 blocks level 6, 256k chunk, algorithm 2 [4/3] [_UUU]
> > bitmap: 119/129 pages [476KB], 64KB chunk
> >
> > unused devices: <none>
> >
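A degraded array can be picked out of /proc/mdstat mechanically. The following is a minimal sketch, assuming the two-line-per-array layout shown in the output above; the function name is hypothetical and the sample text is a trimmed copy of that output.

```python
import re

# Trimmed copy of the /proc/mdstat output quoted above (two arrays only).
MDSTAT = """\
md4 : active raid6 sdd4[0] sda4[3] sdc4[2] sdb4[4](F)
      586853888 blocks level 6, 256k chunk, algorithm 2 [4/3] [U_UU]
md2 : active raid1 sdb3[0] sda3[1]
      2096384 blocks [2/2] [UU]
"""

def degraded_arrays(text):
    """Return {array: (total, active, failed_members)} for degraded arrays."""
    result = {}
    current, failed = None, []
    for line in text.splitlines():
        header = re.match(r"^(md\d+) : ", line)
        if header:
            current = header.group(1)
            # Members flagged (F) have been marked faulty by md.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", line)
            continue
        # "[4/3]" means 4 devices expected, 3 currently active.
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            total, active = int(counts.group(1)), int(counts.group(2))
            if active < total:
                result[current] = (total, active, failed)
    return result

print(degraded_arrays(MDSTAT))
```

For the sample this reports md4 as running on 3 of 4 devices with sdb4 marked faulty, and leaves the healthy md2 out.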
> > # smartctl -a /dev/sda
> > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8
> > Bruce Allen
> > Home page is http://smartmontools.sourceforge.net/
> >
> > === START OF INFORMATION SECTION ===
> > Model Family: Seagate Barracuda 7200.10 family
> > Device Model: ST3320620AS
> > Serial Number: 3QF08NDL
> > Firmware Version: 3.AAD
> > User Capacity: 320,072,933,376 bytes
> > Device is: In smartctl database [for details use: -P show]
> > ATA Version is: 7
> > ATA Standard is: Exact ATA specification draft version not indicated
> > Local Time is: Tue Nov 10 23:57:28 2009 EST
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > General SMART Values:
> > Offline data collection status: (0x82) Offline data collection activity
> >   was completed without error.
> >   Auto Offline Data Collection: Enabled.
> > Self-test execution status: ( 0) The previous self-test routine completed
> >   without error or no self-test has ever been run.
> > Total time to complete Offline data collection: ( 430) seconds.
> > Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
> >   Auto Offline data collection on/off support.
> >   Suspend Offline collection upon new command.
> >   Offline surface scan supported.
> >   Self-test supported.
> >   No Conveyance Self-test supported.
> >   Selective Self-test supported.
> > SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
> >   Supports SMART auto save timer.
> > Error logging capability: (0x01) Error logging supported.
> >   General Purpose Logging supported.
> > Short self-test routine recommended polling time: ( 1) minutes.
> > Extended self-test routine recommended polling time: ( 115) minutes.
> >
> > SMART Attributes Data Structure revision number: 10
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   114   097   006    Pre-fail  Always   -           77830969
> >   3 Spin_Up_Time            0x0003   094   090   000    Pre-fail  Always   -           0
> >   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           83
> >   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
> >   7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always   -           150227385
> >   9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23919
> >  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
> >  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           116
> > 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
> > 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
> > 190 Airflow_Temperature_Cel 0x0022   061   046   045    Old_age   Always   -           39 (Lifetime Min/Max 37/43)
> > 194 Temperature_Celsius     0x0022   039   054   000    Old_age   Always   -           39 (0 21 0 0)
> > 195 Hardware_ECC_Recovered  0x001a   065   054   000    Old_age   Always   -           102168431
> > 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
> > 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
> > 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
> > 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
> > 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
> >
> > SMART Error Log Version: 1
> > No Errors Logged
> >
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Completed without error   00%        23730            -
> > # 2  Extended offline    Completed without error   00%        22581            -
> > # 3  Short offline       Completed without error   00%        22577            -
> > # 4  Extended offline    Completed without error   00%        17267            -
> > # 5  Short offline       Completed without error   00%        17259            -
> > # 6  Extended offline    Completed without error   00%        384              -
> >
> > SMART Selective self-test log data structure revision number 1
> > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> > 1 0 0 Not_testing
> > 2 0 0 Not_testing
> > 3 0 0 Not_testing
> > 4 0 0 Not_testing
> > 5 0 0 Not_testing
> > Selective self-test flags (0x0):
> > After scanning selected spans, do NOT read-scan remainder of disk.
> > If Selective self-test is pending on power-up, resume after 0 minute
> > delay.
> >
> > # smartctl -a /dev/sdb
> > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8
> > Bruce Allen
> > Home page is http://smartmontools.sourceforge.net/
> >
> > === START OF INFORMATION SECTION ===
> > Model Family: Seagate Barracuda 7200.10 family
> > Device Model: ST3320620AS
> > Serial Number: 3QF08SKR
> > Firmware Version: 3.AAD
> > User Capacity: 320,072,933,376 bytes
> > Device is: In smartctl database [for details use: -P show]
> > ATA Version is: 7
> > ATA Standard is: Exact ATA specification draft version not indicated
> > Local Time is: Wed Nov 11 00:03:14 2009 EST
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > General SMART Values:
> > Offline data collection status: (0x82) Offline data collection activity
> >   was completed without error.
> >   Auto Offline Data Collection: Enabled.
> > Self-test execution status: ( 37) The self-test routine was interrupted
> >   by the host with a hard or soft reset.
> > Total time to complete Offline data collection: ( 430) seconds.
> > Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
> >   Auto Offline data collection on/off support.
> >   Suspend Offline collection upon new command.
> >   Offline surface scan supported.
> >   Self-test supported.
> >   No Conveyance Self-test supported.
> >   Selective Self-test supported.
> > SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
> >   Supports SMART auto save timer.
> > Error logging capability: (0x01) Error logging supported.
> >   General Purpose Logging supported.
> > Short self-test routine recommended polling time: ( 1) minutes.
> > Extended self-test routine recommended polling time: ( 115) minutes.
> >
> > SMART Attributes Data Structure revision number: 10
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   111   091   006    Pre-fail  Always   -           136981744
> >   3 Spin_Up_Time            0x0003   099   090   000    Pre-fail  Always   -           0
> >   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           104
> >   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           1
> >   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always   -           257877357
> >   9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23916
> >  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
> >  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           157
> > 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
> > 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
> > 190 Airflow_Temperature_Cel 0x0022   059   049   045    Old_age   Always   -           41 (Lifetime Min/Max 38/43)
> > 194 Temperature_Celsius     0x0022   041   051   000    Old_age   Always   -           41 (0 21 0 0)
> > 195 Hardware_ECC_Recovered  0x001a   063   054   000    Old_age   Always   -           160751697
> > 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
> > 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
> > 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
> > 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
> > 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
> >
> > SMART Error Log Version: 1
> > No Errors Logged
> >
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Interrupted (host reset)  50%        23726            -
> > # 2  Extended offline    Completed without error   00%        22580            -
> > # 3  Short offline       Completed without error   00%        22577            -
> > # 4  Extended offline    Completed without error   00%        17267            -
> > # 5  Short offline       Completed without error   00%        17260            -
> > # 6  Extended offline    Completed without error   00%        384              -
> >
> > SMART Selective self-test log data structure revision number 1
> > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> > 1 0 0 Not_testing
> > 2 0 0 Not_testing
> > 3 0 0 Not_testing
> > 4 0 0 Not_testing
> > 5 0 0 Not_testing
> > Selective self-test flags (0x0):
> > After scanning selected spans, do NOT read-scan remainder of disk.
> > If Selective self-test is pending on power-up, resume after 0 minute
> > delay.
> >
> > # smartctl -a /dev/sdc
> > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8
> > Bruce Allen
> > Home page is http://smartmontools.sourceforge.net/
> >
> > === START OF INFORMATION SECTION ===
> > Model Family: Seagate Barracuda 7200.10 family
> > Device Model: ST3320620AS
> > Serial Number: 3QF08V24
> > Firmware Version: 3.AAD
> > User Capacity: 320,072,933,376 bytes
> > Device is: In smartctl database [for details use: -P show]
> > ATA Version is: 7
> > ATA Standard is: Exact ATA specification draft version not indicated
> > Local Time is: Wed Nov 11 00:03:36 2009 EST
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> > See vendor-specific Attribute list for marginal Attributes.
> >
> > General SMART Values:
> > Offline data collection status: (0x82) Offline data collection activity
> >   was completed without error.
> >   Auto Offline Data Collection: Enabled.
> > Self-test execution status: ( 0) The previous self-test routine completed
> >   without error or no self-test has ever been run.
> > Total time to complete Offline data collection: ( 430) seconds.
> > Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
> >   Auto Offline data collection on/off support.
> >   Suspend Offline collection upon new command.
> >   Offline surface scan supported.
> >   Self-test supported.
> >   No Conveyance Self-test supported.
> >   Selective Self-test supported.
> > SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
> >   Supports SMART auto save timer.
> > Error logging capability: (0x01) Error logging supported.
> >   General Purpose Logging supported.
> > Short self-test routine recommended polling time: ( 1) minutes.
> > Extended self-test routine recommended polling time: ( 115) minutes.
> >
> > SMART Attributes Data Structure revision number: 10
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   119   090   006    Pre-fail  Always   -           221110249
> >   3 Spin_Up_Time            0x0003   094   090   000    Pre-fail  Always   -           0
> >   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           94
> >   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
> >   7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always   -           138219006
> >   9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23917
> >  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
> >  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           130
> > 187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always   -           18
> > 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
> > 190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (Lifetime Min/Max 39/45)
> > 194 Temperature_Celsius     0x0022   041   056   000    Old_age   Always   -           41 (0 22 0 0)
> > 195 Hardware_ECC_Recovered  0x001a   066   057   000    Old_age   Always   -           145841009
> > 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
> > 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
> > 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
> > 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
> > 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
> >
> > SMART Error Log Version: 1
> > ATA Error Count: 18 (device log contains only the most recent five
> > errors)
> > CR = Command Register [HEX]
> > FR = Features Register [HEX]
> > SC = Sector Count Register [HEX]
> > SN = Sector Number Register [HEX]
> > CL = Cylinder Low Register [HEX]
> > CH = Cylinder High Register [HEX]
> > DH = Device/Head Register [HEX]
> > DC = Device Command Register [HEX]
> > ER = Error register [HEX]
> > ST = Status register [HEX]
> > Powered_Up_Time is measured from power on, and printed as
> > DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> > SS=sec, and sss=millisec. It "wraps" after 49.710 days.
> >
> > Error 18 occurred at disk power-on lifetime: 5380 hours (224 days + 4
> > hours)
> > When the command that caused the error occurred, the device was
> > active or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 e1 7e 09 e0 00 00:16:26.026 READ DMA EXT
> > ec 00 00 00 00 00 a0 00 00:16:26.022 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:16:26.022 SET FEATURES [Set transfer
> > mode]
> > ec 00 00 00 00 00 a0 00 00:16:26.019 IDENTIFY DEVICE
> > 25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
> >
> > Error 17 occurred at disk power-on lifetime: 5380 hours (224 days + 4
> > hours)
> > When the command that caused the error occurred, the device was
> > active or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
> > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> > mode]
> > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> > 25 00 00 e1 7e 09 e0 00 00:16:24.456 READ DMA EXT
> >
> > Error 16 occurred at disk power-on lifetime: 5380 hours (224 days + 4
> > hours)
> > When the command that caused the error occurred, the device was
> > active or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
> > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> > mode]
> > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> > 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
> >
> > Error 15 occurred at disk power-on lifetime: 5380 hours (224 days + 4
> > hours)
> > When the command that caused the error occurred, the device was
> > active or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 e1 7e 09 e0 00 00:16:21.313 READ DMA EXT
> > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> > mode]
> > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> > 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
> >
> > Error 14 occurred at disk power-on lifetime: 5380 hours (224 days + 4
> > hours)
> > When the command that caused the error occurred, the device was
> > active or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 63 81 09 e0 Error: UNC at LBA = 0x00098163 = 622947
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 e1 7e 09 e0 00 00:16:17.672 READ DMA EXT
> > ec 00 00 00 00 00 a0 00 00:16:19.753 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:16:19.749 SET FEATURES [Set transfer
> > mode]
> > ec 00 00 00 00 00 a0 00 00:16:19.749 IDENTIFY DEVICE
> > 25 00 00 e1 7e 09 e0 00 00:16:19.745 READ DMA EXT
> >
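The "UNC at LBA" figures in these error logs follow directly from the register dump on the same line. A minimal sketch, assuming the usual 28-bit task-file layout (low nibble of DH, then CH, CL, SN); the function name is hypothetical and the register values are copied from the smartctl error logs quoted in this message.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# sdc: registers "00 63 81 09 e0" -> SN=63 CL=81 CH=09 DH=e0
sdc_lba = lba28_from_registers(0x63, 0x81, 0x09, 0xE0)
# sdd: registers "00 a5 0d 4a e0" -> SN=a5 CL=0d CH=4a DH=e0
sdd_lba = lba28_from_registers(0xA5, 0x0D, 0x4A, 0xE0)

print(hex(sdc_lba), sdc_lba)
print(hex(sdd_lba), sdd_lba)
```

Both results match the LBAs smartctl reports (622947 on sdc, 4853157 on sdd). Note that these are old read errors, logged at 5380 and 10007 power-on hours, long before the current write failure on sdb.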
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Completed without error   00%        23728            -
> > # 2  Extended offline    Completed without error   00%        22579            -
> > # 3  Short offline       Completed without error   00%        22576            -
> > # 4  Extended offline    Completed without error   00%        17265            -
> > # 5  Short offline       Completed without error   00%        17257            -
> > # 6  Extended offline    Completed without error   00%        384              -
> >
> > SMART Selective self-test log data structure revision number 1
> > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> > 1 0 0 Not_testing
> > 2 0 0 Not_testing
> > 3 0 0 Not_testing
> > 4 0 0 Not_testing
> > 5 0 0 Not_testing
> > Selective self-test flags (0x0):
> > After scanning selected spans, do NOT read-scan remainder of disk.
> > If Selective self-test is pending on power-up, resume after 0 minute
> > delay.
> >
> > # smartctl -a /dev/sdd
> > smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8
> > Bruce Allen
> > Home page is http://smartmontools.sourceforge.net/
> >
> > === START OF INFORMATION SECTION ===
> > Model Family: Seagate Barracuda 7200.10 family
> > Device Model: ST3320620AS
> > Serial Number: 3QF08WDP
> > Firmware Version: 3.AAD
> > User Capacity: 320,072,933,376 bytes
> > Device is: In smartctl database [for details use: -P show]
> > ATA Version is: 7
> > ATA Standard is: Exact ATA specification draft version not indicated
> > Local Time is: Wed Nov 11 00:04:04 2009 EST
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > General SMART Values:
> > Offline data collection status: (0x82) Offline data collection activity
> >   was completed without error.
> >   Auto Offline Data Collection: Enabled.
> > Self-test execution status: ( 0) The previous self-test routine completed
> >   without error or no self-test has ever been run.
> > Total time to complete Offline data collection: ( 430) seconds.
> > Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
> >   Auto Offline data collection on/off support.
> >   Suspend Offline collection upon new command.
> >   Offline surface scan supported.
> >   Self-test supported.
> >   No Conveyance Self-test supported.
> >   Selective Self-test supported.
> > SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
> >   Supports SMART auto save timer.
> > Error logging capability: (0x01) Error logging supported.
> >   General Purpose Logging supported.
> > Short self-test routine recommended polling time: ( 1) minutes.
> > Extended self-test routine recommended polling time: ( 115) minutes.
> >
> > SMART Attributes Data Structure revision number: 10
> > Vendor Specific SMART Attributes with Thresholds:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> >   1 Raw_Read_Error_Rate     0x000f   110   090   006    Pre-fail  Always   -           25809154
> >   3 Spin_Up_Time            0x0003   098   090   000    Pre-fail  Always   -           0
> >   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           516
> >   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
> >   7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always   -           192909989
> >   9 Power_On_Hours          0x0032   073   073   000    Old_age   Always   -           23896
> >  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
> >  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           777
> > 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
> > 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
> > 190 Airflow_Temperature_Cel 0x0022   061   050   045    Old_age   Always   -           39 (Lifetime Min/Max 36/42)
> > 194 Temperature_Celsius     0x0022   039   050   000    Old_age   Always   -           39 (0 20 0 0)
> > 195 Hardware_ECC_Recovered  0x001a   064   055   000    Old_age   Always   -           81546876
> > 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
> > 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
> > 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
> > 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline  -           0
> > 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always   -           0
> >
> > SMART Error Log Version: 1
> > ATA Error Count: 6 (device log contains only the most recent five
> > errors)
> > CR = Command Register [HEX]
> > FR = Features Register [HEX]
> > SC = Sector Count Register [HEX]
> > SN = Sector Number Register [HEX]
> > CL = Cylinder Low Register [HEX]
> > CH = Cylinder High Register [HEX]
> > DH = Device/Head Register [HEX]
> > DC = Device Command Register [HEX]
> > ER = Error register [HEX]
> > ST = Status register [HEX]
> > Powered_Up_Time is measured from power on, and printed as
> > DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> > SS=sec, and sss=millisec. It "wraps" after 49.710 days.
> >
> > Error 6 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> > hours)
> > When the command that caused the error occurred, the device was active
> > or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
> > 27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
> > ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
> > mode]
> > 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
> >
> > Error 5 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> > hours)
> > When the command that caused the error occurred, the device was active
> > or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 8f 0c 4a e0 00 00:05:45.657 READ DMA EXT
> > 27 00 00 00 00 00 e0 00 00:05:45.654 READ NATIVE MAX ADDRESS EXT
> > ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
> > mode]
> > 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
> >
> > Error 4 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> > hours)
> > When the command that caused the error occurred, the device was active
> > or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
> > 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
> > ec 00 00 00 00 00 a0 00 00:05:43.727 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:05:43.660 SET FEATURES [Set transfer
> > mode]
> > 27 00 00 00 00 00 e0 00 00:05:43.658 READ NATIVE MAX ADDRESS EXT
> >
> > Error 3 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> > hours)
> > When the command that caused the error occurred, the device was active
> > or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
> > 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
> > ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
> > mode]
> > 27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
> >
> > Error 2 occurred at disk power-on lifetime: 10007 hours (416 days + 23
> > hours)
> > When the command that caused the error occurred, the device was active
> > or idle.
> >
> > After command completion occurred, registers were:
> > ER ST SC SN CL CH DH
> > -- -- -- -- -- -- --
> > 40 51 00 a5 0d 4a e0 Error: UNC at LBA = 0x004a0da5 = 4853157
> >
> > Commands leading to the command that caused the error were:
> > CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
> > -- -- -- -- -- -- -- -- ---------------- --------------------
> > 25 00 00 8f 0c 4a e0 00 00:05:39.547 READ DMA EXT
> > 27 00 00 00 00 00 e0 00 00:05:39.544 READ NATIVE MAX ADDRESS EXT
> > ec 00 00 00 00 00 a0 00 00:05:39.530 IDENTIFY DEVICE
> > ef 03 46 00 00 00 a0 00 00:05:39.475 SET FEATURES [Set transfer
> > mode]
> > 27 00 00 00 00 00 e0 00 00:05:39.472 READ NATIVE MAX ADDRESS EXT
> >
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Completed without error   00%        23707            -
> > # 2  Extended offline    Completed without error   00%        22559            -
> > # 3  Short offline       Completed without error   00%        22555            -
> > # 4  Extended offline    Completed without error   00%        17248            -
> > # 5  Short offline       Completed without error   00%        17241            -
> > # 6  Short offline       Completed without error   00%        17241            -
> > # 7  Extended offline    Completed without error   00%        384              -
> > # 8  Short offline       Completed without error   00%        381              -
> >
> > SMART Selective self-test log data structure revision number 1
> > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> > 1 0 0 Not_testing
> > 2 0 0 Not_testing
> > 3 0 0 Not_testing
> > 4 0 0 Not_testing
> > 5 0 0 Not_testing
> > Selective self-test flags (0x0):
> > After scanning selected spans, do NOT read-scan remainder of disk.
> > If Selective self-test is pending on power-up, resume after 0 minute
> > delay.
> >
> > Thanks,
> > Guy
>
--
Thomas Fjellstrom
tfjellstrom@shaw.ca