* Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
@ 2016-08-22 21:51 Ben Kamen
2016-08-22 23:06 ` Adam Goryachev
0 siblings, 1 reply; 25+ messages in thread
From: Ben Kamen @ 2016-08-22 21:51 UTC (permalink / raw)
To: linux-raid
Hey all. I'm looking at the RAID Wiki and need some help.
First Info:
I have a RAID5 with 4 members, /dev/sd[cdef]1. Last night, sdc1
reported a SMART error recommending drive replacement (after I had watched
sector errors pile up for about a week).
No problem: I shut down the drive, pulled it, and replaced it with a cold
spare. Started the rebuild (around midnight CDT).
At 5:43am, I got this message:
This is an automatically generated mail message from mdadm
running on quantum
A Fail event had been detected on md device /dev/md127.
It could be related to component device /dev/sde1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda2[0] sdb2[2]
511988 blocks super 1.0 [2/2] [UU]
md127 : active raid5 sdc1[4] sdf1[6] sde1[1](F) sdd1[5]
2930276352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [U_U_]
[===========>.........] recovery = 55.9% (546131076/976758784)
finish=381.6min speed=18805K/sec
bitmap: 4/8 pages [16KB], 65536KB chunk
md1 : active raid1 sda3[0] sdb3[2]
239489916 blocks super 1.1 [2/2] [UU]
bitmap: 2/2 pages [8KB], 65536KB chunk
md10 : active raid1 sda1[0] sdb1[2]
4193272 blocks super 1.1 [2/2] [UU]
unused devices: <none>
/dev/md127 is the one with issues.
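For anyone following along, the failed member can be picked out of that mdstat text mechanically. The following is only a minimal sketch: `failed_members` is a made-up helper name (not an mdadm feature), and it assumes the usual layout where each array line starts with the md name followed by " : ".

```shell
#!/bin/sh
# failed_members: read /proc/mdstat-style text on stdin and print
# "<array> <member>" for every member flagged (F).
# Hypothetical helper for illustration; not part of mdadm.
failed_members() {
    awk '
        # Array lines look like: md127 : active raid5 sdc1[4] sde1[1](F) ...
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    member = $i
                    sub(/\[.*/, "", member)   # drop the "[1](F)" suffix
                    print $1, member
                }
            }
        }
    '
}

# Typical use on a live system:
#   failed_members < /proc/mdstat
```

Fed the mdstat text from the mail above, this should single out sde1 on md127, which matches the device named in the Fail event.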
It looks like the SATA controller had issues. I couldn't see sde - so
I rebooted. (scold me later.)
All the drives are available. SMARTCTL tells me /dev/sde is happy as
can be (has a few bad sectors and is slated for replacement next, but
smart says drive is healthy).
I looked at the raid Wiki - and saved the mdadm --examine info. Of the
active members, the event count is off by 25 for happy vs unhappy
members.
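A small sketch of how the event counts can be lined up side by side from the saved --examine output. `event_counts` is a hypothetical helper name; it only assumes the "/dev/...:" header line and the "Events : N" line that mdadm prints for each superblock.

```shell
#!/bin/sh
# event_counts: read concatenated `mdadm --examine` output on stdin and
# print "<device> <events>" per member, so mismatches stand out.
# Hypothetical helper for illustration; not part of mdadm.
event_counts() {
    awk '
        # Each superblock dump starts with a line such as "/dev/sdc1:"
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # ...and contains a line such as "         Events : 12345"
        $1 == "Events" && $2 == ":" { print dev, $3 }
    '
}

# Typical use (run as root):
#   mdadm --examine /dev/sd[cdef]1 | event_counts
```

Members whose count lags behind the others are the ones md considers stale.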
But forcing the assembly claims
mdadm --assemble --force /dev/md127 /dev/sd[cdef]1
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping
mdadm: /dev/sdf1 is busy - skipping
mdadm: Found some drive for an array that is already active: /dev/md/:BigRAID
mdadm: giving up.
So before I mess up ANYTHING else...
What should I be doing?
(Should I be stopping the RAID? Right now it seems like it's running.)
Thanks,
-Ben
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-22 21:51 Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive) Ben Kamen
@ 2016-08-22 23:06 ` Adam Goryachev
2016-08-23 11:36 ` Wols Lists
[not found] ` <CADDTLRBf9NPO6OuF4a3b+xffZgeZRqHRG+pJdPmbc9-Jat0HVQ@mail.gmail.com>
0 siblings, 2 replies; 25+ messages in thread
From: Adam Goryachev @ 2016-08-22 23:06 UTC (permalink / raw)
To: Ben Kamen, linux-raid
On 23/08/16 07:51, Ben Kamen wrote:
> Hey all. I'm looking at the RAID Wiki and need some help.
>
> First Info:
>
> I have a RAID5 with 4 members, /dev/sd[cdef]1. Last night, sdc1
> reported a SMART error recommending drive replacement (after I had watched
> sector errors pile up for about a week).
>
> No problem: I shut down the drive, pulled it, and replaced it with a cold
> spare. Started the rebuild (around midnight CDT).
>
> At 5:43am, I got this message:
>
> This is an automatically generated mail message from mdadm
> running on quantum
>
> A Fail event had been detected on md device /dev/md127.
>
> It could be related to component device /dev/sde1.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md0 : active raid1 sda2[0] sdb2[2]
> 511988 blocks super 1.0 [2/2] [UU]
>
> md127 : active raid5 sdc1[4] sdf1[6] sde1[1](F) sdd1[5]
> 2930276352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [U_U_]
> [===========>.........] recovery = 55.9% (546131076/976758784)
> finish=381.6min speed=18805K/sec
> bitmap: 4/8 pages [16KB], 65536KB chunk
>
> md1 : active raid1 sda3[0] sdb3[2]
> 239489916 blocks super 1.1 [2/2] [UU]
> bitmap: 2/2 pages [8KB], 65536KB chunk
>
> md10 : active raid1 sda1[0] sdb1[2]
> 4193272 blocks super 1.1 [2/2] [UU]
>
> unused devices: <none>
>
> /dev/md127 is the one with issues.
>
> It looks like the SATA controller had issues. I couldn't see sde - so
> I rebooted. (scold me later.)
>
> All the drives are available. SMARTCTL tells me /dev/sde is happy as
> can be (has a few bad sectors and is slated for replacement next, but
> smart says drive is healthy).
>
> I looked at the raid Wiki - and saved the mdadm --examine info. Of the
> active members, the event count is off by 25 for happy vs unhappy
> members.
>
> But forcing the assembly claims
>
> mdadm --assemble --force /dev/md127 /dev/sd[cdef]1
> mdadm: /dev/sdc1 is busy - skipping
> mdadm: /dev/sdd1 is busy - skipping
> mdadm: /dev/sdf1 is busy - skipping
> mdadm: Found some drive for an array that is already active: /dev/md/:BigRAID
> mdadm: giving up.
>
> So before I mess up ANYTHING else...
>
> What should I be doing?
>
> (Should I be stopping the RAID? Right now it seems like it's running.)
>
> Thanks,
>
First step, if the raid is running, then do a backup.
Second step, read all about SCT/ERC, and almost certainly fix the issues
with your drives (either enable SCT/ERC on the drive or set the timeout
appropriately).
Third step, make sure your backup is up to date.
Fourth step, provide the current status of the RAID array: is it
resyncing, is the resync pending, is it finished, etc.
If it's finished, then don't replace the next drive in the same way, use
the replace method instead. That will keep redundancy in the array
during the replacement, and hopefully avoid this sort of issue.
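As a rough sketch of the replace method: the helper below only prints the two mdadm commands so they can be reviewed first. It assumes an mdadm and kernel new enough to support hot replace (mdadm 3.3 and Linux 3.3, as far as I know), and the helper name and the device names in the example are placeholders.

```shell
#!/bin/sh
# hot_replace_cmds: print (not run) the mdadm commands for replacing a
# member while it stays in the array. Hypothetical helper; arguments:
#   $1 = array, $2 = member to retire, $3 = new partition
hot_replace_cmds() {
    # 1. Add the new partition; on a non-degraded array it becomes a spare.
    printf 'mdadm %s --add %s\n' "$1" "$3"
    # 2. Ask md to copy the old member onto that spare; the old member is
    #    only marked faulty once the copy has completed.
    printf 'mdadm %s --replace %s --with %s\n' "$1" "$2" "$3"
}

# Example with made-up device names; review the output before running it:
#   hot_replace_cmds /dev/md127 /dev/sde1 /dev/sdg1
```

The point of doing it this way is that the array never drops to degraded during the copy, so a read error on another member can still be corrected from parity.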
Later, you might consider moving to RAID6 to add some additional
redundancy instead of using a cold spare.
I hope the above is helpful, but really we will need more information
about your drives before being able to make further suggestions. output
of lsdrv (google it), smartctl, mdadm --misc --detail /dev/md127 would
all be helpful.
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-22 23:06 ` Adam Goryachev
@ 2016-08-23 11:36 ` Wols Lists
2016-08-23 15:44 ` Ben
[not found] ` <CADDTLRBf9NPO6OuF4a3b+xffZgeZRqHRG+pJdPmbc9-Jat0HVQ@mail.gmail.com>
1 sibling, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-08-23 11:36 UTC (permalink / raw)
To: Adam Goryachev, Ben Kamen, linux-raid
On 23/08/16 00:06, Adam Goryachev wrote:
> I hope the above is helpful, but really we will need more information
> about your drives before being able to make further suggestions. output
> of lsdrv (google it), smartctl, mdadm --misc --detail /dev/md127 would
> all be helpful.
And while it's probably too late now, read up on mdadm --replace. If
you've got the spare slots, it's much better/safer than physically
pulling a dodgy disk and replacing it.
NB - get the data Adam asked for - and the output of "mdadm --examine
..." and "mdadm --detail ..." might well be useful (or might have been
included elsewhere).
Cheers,
Wol
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-23 11:36 ` Wols Lists
@ 2016-08-23 15:44 ` Ben
0 siblings, 0 replies; 25+ messages in thread
From: Ben @ 2016-08-23 15:44 UTC (permalink / raw)
To: linux-raid
On 8/23/2016 6:36 AM, Wols Lists wrote:
> On 23/08/16 00:06, Adam Goryachev wrote:
>
> And while it's probably too late now, read up on mdadm --replace. If
> you've got the spare slots, it's much better/safer than physically
> pulling a dodgy disk and replacing it.
>
> NB - get the data Adam asked for - and the output of "mdadm --examine
> ..." and "mdadm --detail ..." might well be useful (or might have been
> included elsewhere).
Hi there!
Thanks -- Adam mentioned it, and yeah, it's too late, but I have it for next time.
The vast bulk of the data on the array is duplicated to another NAS, so it's not the end of the world.
Adam helped me get the array back online so I can do some things with it (like grabbing some 'nice to have' files). It's staying reasonably in sync when it craps out...
So hopefully I'll have it resolved soon.
But I will probably switch to RAID6 somewhere down the road.
Thanks for the help,
-Ben
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
[not found] ` <933228e0-bce4-ffad-f48d-034bf89bc07f@websitemanagers.com.au>
@ 2016-08-26 1:20 ` Ben
2016-08-26 2:22 ` Phil Turmel
2016-08-26 18:07 ` Wols Lists
0 siblings, 2 replies; 25+ messages in thread
From: Ben @ 2016-08-26 1:20 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2646 bytes --]
As an update,
Adam's been helping me out (and I'm not used to hitting "reply-all" for mailing lists as pretty much all the ones I'm on set the "reply-to:")
I've turned on SCT/ERC for the drives... and the one that went bonkers during the rebuild (sde) would still have read issues during a rebuild.
SMART reports it's OK, but... (shrug) I ended up running ddrescue to the new replacement drive (sdc), which kept getting put back into spare status whenever the rebuilds failed.
So I just copied sde -> sdc, which went pretty much flawlessly (ddrescue completed without any final complaints).
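For reference, a sketch of what such a two-pass GNU ddrescue copy can look like. The exact invocation used here was not posted, so this is an assumption throughout: whole-disk device names, the mapfile path, the retry count and the `clone_member` wrapper are all illustrative. It defaults to printing the commands, because the destination gets overwritten.

```shell
#!/bin/sh
# Two-pass GNU ddrescue copy of a failing member onto a replacement.
# Hypothetical wrapper: with DRY_RUN=1 (the default here) the commands are
# only printed, because the destination device is overwritten.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

clone_member() {
    src=$1 dst=$2 map=$3
    # Pass 1: copy everything that reads cleanly, skip the scraping phase (-n).
    # -f is needed because the output is an existing block device.
    run ddrescue -f -n "$src" "$dst" "$map" || return 1
    # Pass 2: return to the bad areas and retry them up to 3 times (-r3).
    # The mapfile lets ddrescue resume instead of starting over.
    run ddrescue -f -r3 "$src" "$dst" "$map"
}

# Example (placeholder mapfile path), printed only unless DRY_RUN=0:
#   clone_member /dev/sde /dev/sdc /root/sde.map
```

The array has to be stopped while this runs, and source and destination must not be swapped.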
I also played with badblocks after doing my copy and could find bad blocks -- but apparently ddrescue had no issues.
So - I went back to:
* bringing up the array. No problems.
* adding ANOTHER new drive (that I ordered Sunday night), and it rebuilt fine.
* doing an fsck -n first, which reported no issues - so I did a regular fsck (without -y) and it never prompted me for anything.
My last step is to run rsync -n from my backup to see if it can find any differences between my last backup and the current data, i.e. any files with byte oddities.
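A sketch of that comparison, with one caveat worth spelling out: by default rsync decides whether a file changed from its size and modification time, so a sector of garbage inside an otherwise untouched file would go unnoticed. Adding -c makes it compare checksums instead, at the cost of reading every file on both sides. The function name and the paths are placeholders.

```shell
#!/bin/sh
# compare_with_backup: dry-run rsync from the backup to the live tree,
# comparing file contents by checksum rather than size and mtime.
# Hypothetical helper; RSYNC can be pointed at another command to
# preview the arguments without running rsync.
compare_with_backup() {
    backup=${1%/}
    live=${2%/}
    # -a archive mode, -n dry run (nothing is written),
    # -c compare by checksum, -i itemize each file that differs
    "${RSYNC:-rsync}" -a -n -c -i "$backup/" "$live/"
}

# Example with placeholder mount points:
#   compare_with_backup /mnt/backup /mnt/raid
```

Files changed legitimately since the last backup will show up too, so the list still needs a human eye.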
All this has me wondering whether those old bad sectors left some files with a sector of garbage in them.
Adam seems to think everything is fine -- so far, that seems to be the case.
A last few questions I have are:
The new drive I got was (supposed to be) the same model as the last Seagate I ordered, but SMART reports them differently. (see attached)
The question on the new drive is that it says it does offline collection... but with gsmartcontrol, I can't seem to turn it on.
This new drive also doesn't seem to support SCT/ERC the same way.
Again,
/dev/sdc - old new spare (bought after Seagate bought Samsung and discontinued the HD103SJ model)
/dev/sdd - original RAID member
/dev/sde - brand spanking new drive purchased Sunday.
/dev/sdf - original RAID member
I realize now one says: ST1000DM005 vs ST1000DM003 - Grrr!!!
So I'd like recommendations on whether I should get better matching drives (I can use these elsewhere) or it doesn't matter.
Can I mix/match this array with WD REDs? (and eventually retire all these HD103SJ drives) Do people even like these? They seem ok?
I read a lot of conflicting info on SCT/ERC online (well, TLER anyway) -- Adam likes it enabled. What say the rest of you?
And last -- any caveats as to upgrading this array to RAID6 from RAID5? Can I even do that while in place?
Thanks all, (especially Adam!)
-Ben
p.s. Check out some of the SMART parms on the /dev/sde. Head flying hours?? And they're not zero. Weird. :/ This drive kinda creeps me out.
[-- Attachment #2: RAID.smart-info.txt --]
[-- Type: text/plain, Size: 20353 bytes --]
[root@quantum ~]# smartctl -a /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: ST1000DM005 HD103SJ
Serial Number: S246JQ0D800949
LU WWN Device Id: 5 0000f0 080bb4909
Firmware Version: 1AJ10001
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Thu Aug 25 20:04:06 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 9120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 152) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0026 054 054 000 Old_age Always - 8630
3 Spin_Up_Time 0x0023 076 071 025 Pre-fail Always - 7526
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 133
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 14
191 G-Sense_Error_Rate 0x0022 252 252 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 063 000 Old_age Always - 30 (Min/Max 21/37)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 10
200 Multi_Zone_Error_Rate 0x002a 100 096 000 Old_age Always - 558
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 14
========================================================================================================================
[root@quantum ~]# smartctl -a /dev/sdd
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F3
Device Model: SAMSUNG HD103SJ
Serial Number: S246J9AB404176
LU WWN Device Id: 5 0024e9 204fbf695
Firmware Version: 1AJ10001
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Thu Aug 25 20:05:32 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 9180) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 153) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 195
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 073 070 025 Pre-fail Always - 8310
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 58
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 37763
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 75
191 G-Sense_Error_Rate 0x0022 252 252 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 062 000 Old_age Always - 31 (Min/Max 20/43)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 8
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 146
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 77
========================================================================================================================
[root@quantum ~]# smartctl -a /dev/sde
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model: ST1000DM003-1ER162
Serial Number: Z4YDLXWJ
LU WWN Device Id: 5 000c50 091877801
Firmware Version: CC45
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ACS-2 (unknown minor revision code: 0x001f)
Local Time is: Thu Aug 25 20:06:33 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 80) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 105) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x1085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 108 100 006 Pre-fail Always - 18255632
3 Spin_Up_Time 0x0003 100 100 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 2
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 100 253 030 Pre-fail Always - 269743
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 9
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 2
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 099 099 000 Old_age Always - 1
190 Airflow_Temperature_Cel 0x0022 071 068 045 Old_age Always - 29 (Min/Max 26/32)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 1
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 21
194 Temperature_Celsius 0x0022 029 040 000 Old_age Always - 29 (0 25 0 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 109964047679495
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 3907074414
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 5102115
========================================================================================================================
[root@quantum ~]# smartctl -a /dev/sdf
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F3
Device Model: SAMSUNG HD103SJ
Serial Number: S246J9AB404174
LU WWN Device Id: 5 0024e9 204fbf676
Firmware Version: 1AJ10001
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Thu Aug 25 20:07:19 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 9360) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 156) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 353
2 Throughput_Performance 0x0026 055 055 000 Old_age Always - 8559
3 Spin_Up_Time 0x0023 073 069 025 Pre-fail Always - 8389
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 74
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 43724
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 92
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 063 000 Old_age Always - 30 (Min/Max 15/40)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 91
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 229
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 100
========================================================================================================================
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-26 1:20 ` Ben
@ 2016-08-26 2:22 ` Phil Turmel
2016-08-26 2:54 ` Benjammin2068
2016-08-26 18:07 ` Wols Lists
1 sibling, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2016-08-26 2:22 UTC (permalink / raw)
To: Ben, linux-raid
On 08/25/2016 09:20 PM, Ben wrote:
> I read a lot of conflicting info on SCT/ERC online (well, TLER anyway)
> -- Adam likes it enabled. What say the rest of you?
Adam is correct, and it's not a matter of "like". You either must have
it enabled, or you *must* apply the kernel driver timeout work-around
(180 seconds) for that drive. Failure to do so results in crashed arrays.
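The either/or above can be sketched as a small boot-time script. This is an illustration under stated assumptions, not a canonical fix: the helper names are made up, the 7.0 second ERC value is a commonly used figure rather than something specified in this thread, and the capability test simply looks for the "SCT Error Recovery Control supported." line that smartctl prints in its capabilities section. Only the 180 second timeout comes from the text above.

```shell
#!/bin/sh
# Hypothetical boot-time helper (run as root). For each drive: if it
# advertises ERC, set a 7.0 s limit; otherwise raise the kernel's command
# timeout to 180 s. Neither setting is assumed to survive a power cycle.

# timeout_path: map /dev/sdX to its sysfs command-timeout file.
timeout_path() {
    printf '/sys/block/%s/device/timeout\n' "${1##*/}"
}

# erc_supported: succeed if smartctl capability text on stdin lists ERC.
erc_supported() {
    grep -q 'SCT Error Recovery Control supported'
}

fix_drive_timeout() {
    dev=$1
    if smartctl -c "$dev" | erc_supported; then
        # scterc values are in tenths of a second: 70 = 7.0 s read and write.
        smartctl -l scterc,70,70 "$dev" >/dev/null
        echo "$dev: ERC set to 7.0 s"
    else
        echo 180 > "$(timeout_path "$dev")"
        echo "$dev: no ERC, kernel timeout raised to 180 s"
    fi
}

# Example for four members:
#   for d in /dev/sdc /dev/sdd /dev/sde /dev/sdf; do fix_drive_timeout "$d"; done
```

The idea is that the drive must give up on a bad sector before the kernel gives up on the drive; either a short drive-side limit or a long kernel-side timeout achieves that ordering.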
Enterprise and NAS drives work out of the box. Desktop/green drives do not.
Some reading assignments from old discussions (read whole threads if you
have time):
http://marc.info/?l=linux-raid&m=139050322510249&w=2
http://marc.info/?l=linux-raid&m=135863964624202&w=2
http://marc.info/?l=linux-raid&m=135811522817345&w=1
http://marc.info/?l=linux-raid&m=133761065622164&w=2
http://marc.info/?l=linux-raid&m=132477199207506
http://marc.info/?l=linux-raid&m=133665797115876&w=2
http://marc.info/?l=linux-raid&m=142487508806844&w=3
http://marc.info/?l=linux-raid&m=144535576302583&w=2
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-26 2:22 ` Phil Turmel
@ 2016-08-26 2:54 ` Benjammin2068
2016-08-26 12:38 ` Phil Turmel
0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-08-26 2:54 UTC (permalink / raw)
To: linux-raid
On 08/25/2016 09:22 PM, Phil Turmel wrote:
> On 08/25/2016 09:20 PM, Ben wrote:
>
>> I read a lot of conflicting info on SCT/ERC online (well, TLER anyway)
>> -- Adam likes it enabled. What say the rest of you?
> Adam is correct, and it's not a matter of "like".
"like" was just an expression.
>
>
> You either must have
> it enabled, or you *must* apply the kernel driver timeout work-around
> (180 seconds) for that drive. Failure to do so results in crashed arrays.
For the ST1000DM003, its SMART capabilities state "SCT Status supported" -- what does that mean in comparison with the other HD103SJ drives?
Does it do SCT but not let the user control it, or does it not do it at all?
(smartctl -l scterc /dev/sde yields a message that implies control is not supported)
>
> Enterprise and NAS drives work out of the box. Desktop/green drives do not.
Yea - I didn't buy any green drives (purposefully anyway) for this system.
>
> Some reading assignments from old discussions (read whole threads if you
> have time):
>
> http://marc.info/?l=linux-raid&m=139050322510249&w=2
> http://marc.info/?l=linux-raid&m=135863964624202&w=2
> http://marc.info/?l=linux-raid&m=135811522817345&w=1
> http://marc.info/?l=linux-raid&m=133761065622164&w=2
> http://marc.info/?l=linux-raid&m=132477199207506
> http://marc.info/?l=linux-raid&m=133665797115876&w=2
> http://marc.info/?l=linux-raid&m=142487508806844&w=3
> http://marc.info/?l=linux-raid&m=144535576302583&w=2
>
Thanks, will go read.
-Ben
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-26 2:54 ` Benjammin2068
@ 2016-08-26 12:38 ` Phil Turmel
0 siblings, 0 replies; 25+ messages in thread
From: Phil Turmel @ 2016-08-26 12:38 UTC (permalink / raw)
To: Benjammin2068, linux-raid
On 08/25/2016 10:54 PM, Benjammin2068 wrote:
>> You either must have
>> it enabled, or you *must* apply the kernel driver timeout work-around
>> (180 seconds) for that drive. Failure to do so results in crashed arrays.
>
> For the ST1000DM003, its SMART capabilities state "SCT Status supported" -- what does that mean in comparison with the other HD103SJ drives?
>
> Does it do SCT but not let the user control it, or does it not do it at all?
ERC is a feature within the SCT standard. For modern hard drives,
claiming "SCT" support is comparable to a bottled water supplier
advertising that their product is wet.
> (smartctl -l scterc /dev/sde yields a message that implies control is not supported)
ERC on the other hand is a valuable feature that modern drive
manufacturers make you pay extra for.
>> Enterprise and NAS drives work out of the box. Desktop/green drives do not.
>
> Yea - I didn't buy any green drives (purposefully anyway) for this system.
I originally wrote that sentence as "Desktop drives do not." I added
"/green" to clarify that some non-enterprise, non-NAS drives aren't
marketed as desktop drives, but still lack ERC functionality.
Your ST1000DM003 is marketed as a desktop drive. Seagate's product page
for this model has links to other models for specialty use cases,
including NAS.
>> Some reading assignments from old discussions (read whole threads if you
>> have time):
>>
>> http://marc.info/?l=linux-raid&m=139050322510249&w=2
>> http://marc.info/?l=linux-raid&m=135863964624202&w=2
>> http://marc.info/?l=linux-raid&m=135811522817345&w=1
>> http://marc.info/?l=linux-raid&m=133761065622164&w=2
>> http://marc.info/?l=linux-raid&m=132477199207506
>> http://marc.info/?l=linux-raid&m=133665797115876&w=2
>> http://marc.info/?l=linux-raid&m=142487508806844&w=3
>> http://marc.info/?l=linux-raid&m=144535576302583&w=2
>
> Thanks, will go read.
You will find detailed explanations for my comments above in these old
threads.
Phil
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-26 1:20 ` Ben
2016-08-26 2:22 ` Phil Turmel
@ 2016-08-26 18:07 ` Wols Lists
2016-08-28 18:29 ` Benjammin2068
1 sibling, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-08-26 18:07 UTC (permalink / raw)
To: Ben, linux-raid
On 26/08/16 02:20, Ben wrote:
> [root@quantum ~]# smartctl -a /dev/sde
> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family: Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
> Device Model: ST1000DM003-1ER162
> Serial Number: Z4YDLXWJ
> LU WWN Device Id: 5 000c50 091877801
> Firmware Version: CC45
> User Capacity: 1,000,204,886,016 bytes [1.00 TB]
> Sector Sizes: 512 bytes logical, 4096 bytes physical
> Device is: In smartctl database [for details use: -P show]
Sorry Ben - that drive was NOT a smart buy !!! Seagate Barracuda :-(
You MUST enable the timeout on this drive :-(
Gut feel tells me most 1TB or less drives are okay in a raid - the
Barracudas are an exception :-( I've got two 3TB Barracudas mirrored,
and from reading the list, there's no way I'd go raid5 for more capacity
without ditching them.
Most people seem to get WD Reds - I've asked about Seagate NAS but I've
not picked up on any reports about them - good or bad. Barracudas - the
news is pretty much all bad :-(
Cheers,
Wol
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-26 18:07 ` Wols Lists
@ 2016-08-28 18:29 ` Benjammin2068
2016-08-28 19:20 ` Anthony Youngman
2016-08-28 23:54 ` Adam Goryachev
0 siblings, 2 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-08-28 18:29 UTC (permalink / raw)
To: linux-raid
On 08/26/2016 01:07 PM, Wols Lists wrote:
> On 26/08/16 02:20, Ben wrote:
>> [root@quantum ~]# smartctl -a /dev/sde
>> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
>> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
>>
>> === START OF INFORMATION SECTION ===
>> Model Family: Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
>> Device Model: ST1000DM003-1ER162
>> Serial Number: Z4YDLXWJ
>> LU WWN Device Id: 5 000c50 091877801
>> Firmware Version: CC45
>> User Capacity: 1,000,204,886,016 bytes [1.00 TB]
>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>> Device is: In smartctl database [for details use: -P show]
> Sorry Ben - that drive was NOT a smart buy !!! Seagate Barracuda :-(
>
> You MUST enable the timeout on this drive :-(
>
> Gut feel tells me most 1TB or less drives are okay in a raid - the
> Barracudas are an exception :-( I've got two 3TB Barracudas mirrored,
> and from reading the list, there's no way I'd go raid5 for more capacity
> without ditching them.
>
> Most people seem to get WD Reds - I've asked about Seagate NAS but I've
> not picked up on any reports about them - good or bad. Barracudas - the
> news is pretty much all bad :-(
>
>
Yea, I figured that out -- I just couldn't find a decent, detailed reference for what "SCT status supported" means versus the more fully featured drives.
And this drive is not going to stay in the array. (Seagate recommended it as the replacement, sort of -- the same model, but not this sub-model.)
I'm going to get some more WD Reds (or decent NAS-friendly mechs) and pull this puppy out of the stack and use it elsewhere.
Thanks for the confirmations!
-Ben
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-28 18:29 ` Benjammin2068
@ 2016-08-28 19:20 ` Anthony Youngman
2016-08-29 1:23 ` Benjammin2068
2016-08-28 23:54 ` Adam Goryachev
1 sibling, 1 reply; 25+ messages in thread
From: Anthony Youngman @ 2016-08-28 19:20 UTC (permalink / raw)
To: Benjammin2068, linux-raid
On 28/08/16 19:29, Benjammin2068 wrote:
> And this drive is not going to stay in the array. (Seagate recommended it as the replacement, sort of -- the same model, but not this sub-model.)
If they knew you were using it in a raid, and recommended it, then I
don't know about your laws but over here in the UK I'd send it back as
"unfit for purpose". Under SOGA (Sale Of Goods Act) they've sold you a
pup and it's their problem, not yours.
(UK law assumes the salesman knows more than you, and so long as you
tell them what you want, that forms part of the contract. Which means if
they sell you something that does not meet the requirements you told
them, they have to put matters right - either swap the drive for
something that is suitable, or give you a refund. They can charge the
difference if "suitable" means a more expensive drive, but a lot of UK
shops would swallow the loss if they had recommended the wrong drive.)
Cheers,
Wol
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-28 18:29 ` Benjammin2068
2016-08-28 19:20 ` Anthony Youngman
@ 2016-08-28 23:54 ` Adam Goryachev
2016-08-29 1:25 ` Benjammin2068
1 sibling, 1 reply; 25+ messages in thread
From: Adam Goryachev @ 2016-08-28 23:54 UTC (permalink / raw)
To: Benjammin2068, linux-raid
On 29/08/16 04:29, Benjammin2068 wrote:
>
> On 08/26/2016 01:07 PM, Wols Lists wrote:
>> On 26/08/16 02:20, Ben wrote:
>>> [root@quantum ~]# smartctl -a /dev/sde
>>> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
>>> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family: Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
>>> Device Model: ST1000DM003-1ER162
>>> Serial Number: Z4YDLXWJ
>>> LU WWN Device Id: 5 000c50 091877801
>>> Firmware Version: CC45
>>> User Capacity: 1,000,204,886,016 bytes [1.00 TB]
>>> Sector Sizes: 512 bytes logical, 4096 bytes physical
>>> Device is: In smartctl database [for details use: -P show]
>> Sorry Ben - that drive was NOT a smart buy !!! Seagate Barracuda :-(
>>
>> You MUST enable the timeout on this drive :-(
>>
>> Gut feel tells me most 1TB or less drives are okay in a raid - the
>> Barracudas are an exception :-( I've got two 3TB Barracudas mirrored,
>> and from reading the list, there's no way I'd go raid5 for more capacity
>> without ditching them.
>>
>> Most people seem to get WD Reds - I've asked about Seagate NAS but I've
>> not picked up on any reports about them - good or bad. Barracudas - the
>> news is pretty much all bad :-(
>>
>>
> Yea, I figured that out -- I just couldn't find a decent, detailed reference for what "SCT status supported" means versus the more fully featured drives.
When I saw this, I assumed it meant you can ask for the status, and it
will tell you it is disabled, but there is no support to modify the
status (i.e., turn it on). Totally useless for all intents and purposes....
Then again, I could be wrong... but compare that with your other drive, which
showed additional capabilities, or with my one here:
SCT capabilities: (0x0039) SCT Status supported.
                           SCT Error Recovery Control supported.
                           SCT Feature Control supported.
                           SCT Data Table supported.
ie, the second one is probably what you want, the third allows you to
turn it on/off, and no idea about the last option....
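For reference, the hex value in that smartctl line can be decoded by hand. Below is a minimal Python sketch, assuming the bit layout that smartctl appears to use for the SCT capabilities word (0x01 status, 0x08 ERC, 0x10 feature control, 0x20 data table). The names SCT_BITS, decode_sct and has_erc are made up for illustration, and 0x0001 is a hypothetical value for a drive that reports only "SCT Status supported"; it was not read from the ST1000DM003.

```python
# Assumed bit meanings for the "SCT capabilities" word as printed by smartctl.
# Check them against your smartctl version before relying on this.
SCT_BITS = {
    0x01: "SCT Status",
    0x08: "SCT Error Recovery Control",
    0x10: "SCT Feature Control",
    0x20: "SCT Data Table",
}


def decode_sct(caps: int) -> list:
    """Return the names of the SCT features whose bits are set in caps."""
    return [name for bit, name in SCT_BITS.items() if caps & bit]


def has_erc(caps: int) -> bool:
    """ERC needs both the base SCT bit (0x01) and the ERC bit (0x08)."""
    return (caps & 0x09) == 0x09


print(decode_sct(0x0039))  # the value quoted above
print(has_erc(0x0039))     # True
print(has_erc(0x0001))     # False (hypothetical "status only" drive)
```

If has_erc() comes out false for a drive, there is probably nothing for "smartctl -l scterc" to control, which would match the message Ben saw.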
Regards,
Adam
--
Adam Goryachev Website Managers www.websitemanagers.com.au
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-28 19:20 ` Anthony Youngman
@ 2016-08-29 1:23 ` Benjammin2068
0 siblings, 0 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-08-29 1:23 UTC (permalink / raw)
To: linux-raid
On 08/28/2016 02:20 PM, Anthony Youngman wrote:
> On 28/08/16 19:29, Benjammin2068 wrote:
>> And this drive is not going to stay in the array. (Seagate recommended it as the replacement, sort of -- the same model, but not this sub-model.)
>
> If they knew you were using it in a raid, and recommended it, then I don't know about your laws but over here in the UK I'd send it back as "unfit for purpose". Under SOGA (Sale Of Goods Act) they've sold you a pup and it's their problem, not yours.
>
> (UK law assumes the salesman knows more than you, and so long as you tell them what you want, that forms part of the contract. Which means if they sell you something that does not meet the requirements you told them, they have to put matters right - either swap the drive for something that is suitable, or give you a refund. They can charge the difference if "suitable" means a more expensive drive, but a lot of UK shops would swallow the loss if they had recommended the wrong drive.)
>
In the US.
I'll have to look at my receipt. The recommendation was when I purchased the *last* drive, not this current set. But I copied and pasted part numbers, so I'll have to look to see what's up.
Like I said, I can find a use for them elsewhere. It's not a huge deal.
-Ben
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-28 23:54 ` Adam Goryachev
@ 2016-08-29 1:25 ` Benjammin2068
2016-08-29 11:19 ` Wols Lists
0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-08-29 1:25 UTC (permalink / raw)
To: linux-raid
On 08/28/2016 06:54 PM, Adam Goryachev wrote:
> When I saw this, I assumed it meant you can ask for the status, and it will tell you it is disabled, but there is no support to modify the status (i.e., turn it on). Totally useless for all intents and purposes....
>
> Then again, I could be wrong... but compare that with your other drive, which showed additional capabilities, or with my one here:
> SCT capabilities: (0x0039) SCT Status supported.
> SCT Error Recovery Control supported.
> SCT Feature Control supported.
> SCT Data Table supported.
>
> ie, the second one is probably what you want, the third allows you to turn it on/off, and no idea about the last option....
>
Right - I get that. But not knowing *for sure* I thought I would go look it up and google wasn't exactly helpful for a developer style description of what exactly the difference was.
again, no worries. I'll get me some of the right drives one way or another.
-Ben
* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
2016-08-29 1:25 ` Benjammin2068
@ 2016-08-29 11:19 ` Wols Lists
2016-09-18 17:13 ` Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)) Benjammin2068
0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-08-29 11:19 UTC (permalink / raw)
To: Benjammin2068, linux-raid
On 29/08/16 02:25, Benjammin2068 wrote:
> Right - I get that. But not knowing *for sure* I thought I would go look it up and google wasn't exactly helpful for a developer style description of what exactly the difference was.
>
> again, no worries. I'll get me some of the right drives one way or another.
I don't know whether you can still get them, but there was a post about
a crashed raid1 array here not long ago, and the array contained a
couple of 1TB Seagate Constellations. Those DID support raid, but
they're probably discontinued now :-(
Cheers,
Wol
* Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-08-29 11:19 ` Wols Lists
@ 2016-09-18 17:13 ` Benjammin2068
2016-09-18 17:50 ` Chris Murphy
2016-09-18 18:08 ` Benjammin2068
0 siblings, 2 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 17:13 UTC (permalink / raw)
To: linux-raid
As a follow-up about my arrays, I have a question about the new WDs that use the larger physical sector size but still support 512B sectors.
I bought some WD Red (WD10EFRX) drives.
When I let the Linux "Disk Utility" (palimpsest <- who the heck named that anyway?) do the RAID management with a new drive, it partitions on cylinders and not sectors.
So it makes a partition and then complains to me that it's off by 512 bytes, which could affect performance.
Gee. Thanks.
So I can use g/parted -- or fdisk....
but I thought I'd get any suggestions for the preferred tool and any pitfalls to watch out for.
Thanks,
-Ben
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 17:13 ` Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)) Benjammin2068
@ 2016-09-18 17:50 ` Chris Murphy
2016-09-18 18:41 ` Benjammin2068
2016-09-18 18:08 ` Benjammin2068
1 sibling, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-09-18 17:50 UTC (permalink / raw)
To: Benjammin2068; +Cc: Linux-RAID
On Sun, Sep 18, 2016 at 11:13 AM, Benjammin2068 <benjammin2068@gmail.com> wrote:
> As a follow-up about my arrays, I have a question about the new WDs that use the larger physical sector size but still support 512B sectors.
>
> I bought some WD Red (WD10EFRX) drives.
>
> When I let the Linux "Disk Utility" (palimpsest <- who the heck named that anyway?) do the RAID management with a new drive, it partitions on cylinders and not sectors.
>
> So it makes a partition and then complains to me that it's off by 512 bytes, which could affect performance.
This is one of the dumbest things, haha. I do not for the life of me
understand why distributions won't backport this, if they're unwilling
to put modern tools for modern hardware in their distributions. It's
one of the simplest, safest backports they could do and yet they
don't. Incredible to me.
Anyway, yeah partition with something not from the Pleistocene.
Seriously, it's that old, it's that much of a solved problem, for
probably 5 years, maybe even longer.
Any version of gdisk will do this correctly out of the box, so you can
just install that from your existing old distro presumably. And if you
can't, then get a recent live CD from pretty much anybody: Fedora 23
or Fedora 24 has gdisk already on the media, and its version of parted
and fdisk, also included, all do alignment to 4KiB sectors correctly.
Actually, on either Fedora live media version you can do
dnf install https://kojipkgs.fedoraproject.org//packages/blivet-gui/2.0.1/1.fc25/noarch/blivet-gui-2.0.1-1.fc25.noarch.rpm
Which is the current version, and it will work on F24 for sure and
maybe/probably F23 also. And dnf will sort out any additional
dependencies needed. It has a similar gparted style UI, but it will do
all kinds of wild things: mdadm raid, LVM raid, Btrfs. It'll create
the partitions, RAID, LV's, file systems, and it will discover things
already on the drive and properly wipe their signatures with a proper
tear down before creating the new things. So you don't end up with
crusty old stuff coming back to haunt you some other day.
--
Chris Murphy
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 17:13 ` Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)) Benjammin2068
2016-09-18 17:50 ` Chris Murphy
@ 2016-09-18 18:08 ` Benjammin2068
1 sibling, 0 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 18:08 UTC (permalink / raw)
To: linux-raid
As an update to this, here's some data:
The older Samsung HD103SJ drives (3 of the 4 members of this RAID5 are still alive and well in this stack) have partition #1 (/dev/sdX1), which lists out as:
> [root@quantum myth]# sfdisk -l -uM /dev/sdc <-- this is the output from one of the 3 HD103SJ drives. The partition was originally created by palimpsest.
>
> Disk /dev/sdc: 121601 cylinders, 255 heads, 63 sectors/track
> Units = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0
>
>    Device Boot   Start      End      MiB    #blocks  Id  System
> /dev/sdc1           0+  953867-  953868-  976760001  fd  Linux raid autodetect
> /dev/sdc2           0        -        0           0   0  Empty
> /dev/sdc3           0        -        0           0   0  Empty
> /dev/sdc4           0        -        0           0   0  Empty
When I do the math:
976,760,001 * 1024 = 1,000,202,241,024 bytes -- OK, so that's the size of /dev/sdX1.
Now we take 1,000,202,241,024 / 4096 (block size of the new drives) = 244,190,000.25 -- so the old partition runs 1024 bytes (two 512-byte sectors) past a whole number of 4096-byte blocks, and that's the mismatch I hit when trying to switch over to the new model.
Is there a best practice for how to contend with this? (Resize the partition somehow on the RAID and then shrink the partition sizes by 2 sectors so they divide by 8 nicely? I know. Sounds insane. I have backups. I'd do it. :P )
Should I just eat the performance hit for now?
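The arithmetic above can be double-checked with a few lines of Python. This is only a sanity-check sketch: the block count is the "#blocks" figure from the sfdisk listing, and the variable names are illustrative.

```python
BLOCKS_1K = 976_760_001   # "#blocks" column from sfdisk (1024-byte units)
PHYSICAL = 4096           # physical sector size of the new drives
LOGICAL = 512             # logical sector size

size_bytes = BLOCKS_1K * 1024
whole, remainder = divmod(size_bytes, PHYSICAL)

print(size_bytes)            # 1000202241024
print(whole, remainder)      # 244190000 1024
print(remainder // LOGICAL)  # 2 (512-byte sectors past the last 4 KiB boundary)
```

The remainder of 1024 bytes is the ".25" in the division: the partition size is not a whole number of 4096-byte blocks.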
Thanks,
-Ben
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 17:50 ` Chris Murphy
@ 2016-09-18 18:41 ` Benjammin2068
2016-09-18 19:17 ` Wols Lists
0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 18:41 UTC (permalink / raw)
To: Chris Murphy; +Cc: Linux-RAID
On 09/18/2016 12:50 PM, Chris Murphy wrote:
>
> This is one of the dumbest things, haha. I do not for the life of me
> understand why distributions won't backport this, if they're unwilling
> to put modern tools for modern hardware in their distributions. It's
> one of the simplest, safest backports they could do and yet they
> don't. Incredible to me.
Yeaaaa.... and considering how often I have to do these kinds of installs or admin... it's... well.. yea.
> Any version of gdisk will do this correctly out of the box, so you can
> just install that from your existing old distro presumably. And if you
> can't, then get a recent live CD from pretty much anybody: Fedora 23
> or Fedora 24 has gdisk already on the media, and its version of parted
> and fdisk, also included, all do alignment to 4KiB sectors correctly.
>
> Actually, on either Fedora live media version you can do
>
> dnf install https://kojipkgs.fedoraproject.org//packages/blivet-gui/2.0.1/1.fc25/noarch/blivet-gui-2.0.1-1.fc25.noarch.rpm
>
> Which is the current version, and it will work on F24 for sure and
> maybe/probably F23 also. And dnf will sort out any additional
> dependencies needed. It has a similar gparted style UI, but it will do
> all kinds of wild things: mdadm raid, LVM raid, Btrfs. It'll create
> the partitions, RAID, LV's, file systems, and it will discover things
> already on the drive and properly wipe their signatures with a proper
> tear down before creating the new things. So you don't end up with
> crusty old stuff coming back to haunt you some other day.
>
I'll check - this is CentOS... but I've (as shown in followup email) played with fdisk (which doesn't bother me) and some of the others...
now I just have to sort out this offset issue which I think I'm stuck with due to different partition sizes.
-Ben
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 18:41 ` Benjammin2068
@ 2016-09-18 19:17 ` Wols Lists
2016-09-18 19:58 ` Benjammin2068
0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-09-18 19:17 UTC (permalink / raw)
To: Benjammin2068, Chris Murphy; +Cc: Linux-RAID
On 18/09/16 19:41, Benjammin2068 wrote:
> I'll check - this is CentOS... but I've (as shown in followup email) played with fdisk (which doesn't bother me) and some of the others...
>
> now I just have to sort out this offset issue which I think I'm stuck with due to different partition sizes.
Don't quite understand what you're trying to do, but ...
I'm sure you know this, but getting the physical/logical block size
out-of-sync hurts disk performance. And copying a smaller partition into
a larger allocated space is perfectly harmless. So...
I'd simply use a modern partition manager (such as gdisk) to partition
your new drives such that the new partitions are larger than the
existing ones, and are properly aligned relative to the drive geometry.
Then copy the old partitions across however you were planning - whether
it's "mdadm --replace" or stopping the array and "dd if=old-device
of=new-device" or whatever.
If you've got a bit of wasted space, or whatever, who cares.
You can resize your file-systems to use all available space, if you wish
(can't remember how, whenever I've done that sort of stuff it hasn't
been hard).
But I'd certainly try and avoid those offset warnings - it smacks to me
of a mismatch between 512-byte blocks and 4K disk sectors, and I
wouldn't want the drive firmware messing about correcting mismatches
between OS 4K blocks and drive 4K blocks. I don't fully understand it
but I know there was a lot of grief with exactly this sort of thing in
the transition from 512-byte to 4K.
Cheers,
Wol
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 19:17 ` Wols Lists
@ 2016-09-18 19:58 ` Benjammin2068
2016-09-18 21:21 ` Wols Lists
0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 19:58 UTC (permalink / raw)
To: Wols Lists, Chris Murphy; +Cc: Linux-RAID
On 09/18/2016 02:17 PM, Wols Lists wrote:
>
> I'm sure you know this, but getting the physical/logical block size
> out-of-sync hurts disk performance. And copying a smaller partition into
> a larger allocated space is perfectly harmless. So...
>
> I'd simply use a modern partition manager (such as gdisk) to partition
> your new drives such that the new partitions are larger than the
> existing ones, and are properly aligned relative to the drive geometry.
>
> Then copy the old partitions across however you were planning - whether
> it's "mdadm --replace" or stopping the array and "dd if=old-device
> of=new-device" or whatever.
>
> If you've got a bit of wasted space, or whatever, who cares.
> You can resize your file-systems to use all available space, if you wish
> (can't remember how, whenever I've done that sort of stuff it hasn't
> been hard).
>
> But I'd certainly try and avoid those offset warnings - it smacks to me
> of a mismatch between 512-byte blocks and 4K disk sectors, and I
> wouldn't want the drive firmware messing about correcting mismatches
> between OS 4K blocks and drive 4K blocks. I don't fully understand it
> but I know there was a lot of grief with exactly this sort of thing in
> the transition from 512-byte to 4K.
>
Aha! That's what I needed to know.
I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072 bytes) than the original /dev/sdX1s on the old HD103SJ drives.
You've answered my question perfectly.
I can use sfdisk or parted to get that done...
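The 3072-byte figure can be reproduced by rounding the old partition size up to the next multiple of 4096. A minimal Python sketch, assuming the 976,760,001 KiB size of the old /dev/sdX1 reported by sfdisk; round_up is a helper written for this example, not part of any partitioning tool.

```python
def round_up(value: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


old_bytes = 976_760_001 * 1024        # size of the old /dev/sdX1
new_bytes = round_up(old_bytes, 4096)  # next whole 4 KiB block

print(new_bytes - old_bytes)  # 3072 extra bytes, i.e. 3/4 of a 4 KiB block
print(old_bytes // 512)       # 1953520002 sectors (old size)
print(new_bytes // 512)       # 1953520008 sectors (divisible by 8)
```

Anything at or above the rounded-up sector count (and starting on an aligned sector) should be large enough to take the old partition's contents.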
Thanks a bunch!
-Ben
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 19:58 ` Benjammin2068
@ 2016-09-18 21:21 ` Wols Lists
2016-09-18 21:29 ` Benjammin2068
0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-09-18 21:21 UTC (permalink / raw)
To: Benjammin2068, Chris Murphy; +Cc: Linux-RAID
On 18/09/16 20:58, Benjammin2068 wrote:
> Aha! That's what I needed to know.
>
> I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072 bytes) than the original /dev/sdX1s on the old HD103SJ drives.
Good. It's a bit like string logic - if the buffer is bigger than the
string everything's fine, but if the string is bigger than the buffer,
well, ooopppssssss.
Basically, I think the root cause of all this mess is that drive
sectors/blocks/whatever used to be 512 bytes. So, obviously, it made
sense to have sector 0 be the boot sector, and your first partition
started in sector 1. If your drives are small, you don't want to waste
space.
Then the new drives came along with 4K sectors. Aarghh. Put an old-style
partition scheme on a new-style drive, and every OS 4K block would start
in the 2nd 512-byte block of a 4K drive sector. So every disk write from
the OS would force the drive to read two sectors from disk, overlay the
OS block over them, and write them both back. Not nice. And the latest
drives refuse to do that!
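To make the overlap concrete, here is a small Python sketch that lists which 4K physical sectors a single 4K OS block lands on for a given partition start. It assumes 512-byte logical sectors on 4096-byte physical sectors; physical_sectors is a name invented for this illustration, and start sectors 63 and 2048 are included as common examples alongside the "sector 1" case described above.

```python
def physical_sectors(start_lba, block, block_size=4096, logical=512, physical=4096):
    """Physical sectors touched by OS block number `block` of a partition
    that starts at logical sector `start_lba`."""
    first_byte = start_lba * logical + block * block_size
    last_byte = first_byte + block_size - 1
    return list(range(first_byte // physical, last_byte // physical + 1))


print(physical_sectors(1, 0))     # [0, 1]  -> straddles two physical sectors
print(physical_sectors(63, 0))    # [7, 8]  -> old cylinder-style start, also straddles
print(physical_sectors(2048, 0))  # [256]   -> aligned start, one physical sector
```

Whenever the list has two entries, a 4K write cannot be done as one plain physical-sector write, which is where the read-modify-write cost comes from.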
Which is one of the reasons why modern partitioning programs start the
first partition - iirc - at the 1 MiB mark (sector 2048) of the disk.
Leaving plenty of space for the boot/startup code.
So it's not worth replicating your old partitions directly on the new
drives. Just make sure the new drives are the same size (or a bit
larger) than the old ones, and move the data across. Bit like copying a
string :-)
Cheers,
Wol
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 21:21 ` Wols Lists
@ 2016-09-18 21:29 ` Benjammin2068
2016-09-19 6:25 ` Wols Lists
0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 21:29 UTC (permalink / raw)
To: Wols Lists; +Cc: Linux-RAID
On 09/18/2016 04:21 PM, Wols Lists wrote:
> On 18/09/16 20:58, Benjammin2068 wrote:
>> Aha! That's what I needed to know.
>>
>> I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072 bytes) than the original /dev/sdX1s on the old HD103SJ drives.
> Good. It's a bit like string logic - if the buffer is bigger than the
> string everything's fine, but if the string is bigger than the buffer,
> well, ooopppssssss.
>
> Basically, I think the root cause of all this mess is that drive
> sectors/blocks/whatever used to be 512 bytes. So, obviously, it made
> sense to have sector 0 be the boot sector, and your first partition
> started in sector 1. If your drives are small, you don't want to waste
> space.
>
> Then the new drives came along with 4K sectors. Aarghh. Put an old-style
> partition scheme on a new-style drive, and every OS 4K block would start
> in the 2nd 512-byte block of a 4K drive sector. So every disk write from
> the OS would force the drive to read two sectors from disk, overlay the
> OS block over them, and write them both back. Not nice. And the latest
> drives refuse to do that!
hah.. yea.. I remember when it happened (and why). (I still have a Seagate ST-251 40MB MFM HD sitting in a box with my Atari software on it. Right now, it's Schrodinger's drive. It's still working as long as I don't pull it out and test it. LoL....)
Drive companies claimed (and maybe rightfully so) that the 512B sector with all the seeks required to read data was wasteful. (considering the armature movement needed for scattered files and people who didn't defrag their drives.)
Also, the number of sectors that could be numbered on a drive was an issue with the sizes of drives coming out.
2^32 sectors @ 512 bytes = 2,199,023,255,552 bytes (2 TiB) <-- doesn't that number ring a bell? ;)
So they moved to bigger sector sizes.
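That limit is easy to recompute. A short Python sketch, assuming a 32-bit sector address (as in the classic MBR partition table); max_bytes is an illustrative helper, not something from the thread.

```python
def max_bytes(sector_size: int, address_bits: int = 32) -> int:
    """Largest capacity addressable with `address_bits`-bit sector numbers."""
    return (2 ** address_bits) * sector_size


print(max_bytes(512))              # 2199023255552  (2 TiB)
print(max_bytes(4096))             # 17592186044416 (16 TiB)
print(max_bytes(512) // 2 ** 40)   # 2
print(max_bytes(4096) // 2 ** 40)  # 16
```

With the same 32-bit addressing, 4096-byte sectors raise the ceiling by a factor of eight.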
> Which is one of the reasons why modern partitioning programs start the
> first partition - iirc - at the 1 MiB mark (sector 2048) of the disk.
> Leaving plenty of space for the boot/startup code.
Yup. Now with all the bootloaders...
>
> So it's not worth replicating your old partitions directly on the new
> drives. Just make sure the new drives are the same size (or a bit
> larger) than the old ones, and move the data across. Bit like copying a
> string :-)
Sounds good. I was more worried about the specifics of the partition and how mdadm sees a larger sized partition -- NOT just a larger sized drive. (on which a same size partition could be built)
Thanks again,
-Ben
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-18 21:29 ` Benjammin2068
@ 2016-09-19 6:25 ` Wols Lists
2016-09-19 16:17 ` Benjammin2068
0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-09-19 6:25 UTC (permalink / raw)
To: Benjammin2068; +Cc: Linux-RAID
On 18/09/16 22:29, Benjammin2068 wrote:
> Sounds good. I was more worried about the specifics of the partition and how mdadm sees a larger sized partition -- NOT just a larger sized drive. (on which a same size partition could be built)
Yeah. I've done that a couple of times. Create the new partition larger
than the old one. dd the old partition across. Use whatever
filesystem-specific tool there was to grow the file system into all
available space on the partition.
Oh yes - and be damn careful with FAT :-) I can't remember the details,
but when there was a problem it used to prefer a faulty filesystem size
to the partition size, and would gaily sail off the end of the
partition, trashing the next partition. My "record to USB" TV seems
rather prone to this :-(
Cheers,
Wol
* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
2016-09-19 6:25 ` Wols Lists
@ 2016-09-19 16:17 ` Benjammin2068
0 siblings, 0 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-09-19 16:17 UTC (permalink / raw)
To: Wols Lists; +Cc: Linux-RAID
On 09/19/2016 01:25 AM, Wols Lists wrote:
>
> Yeah. I've done that a couple of times. Create the new partition larger
> than the old one. dd the old partition across. Use whatever
> filesystem-specific tool there was to grow the file system into all
> available space on the partition.
>
> Oh yes - and be damn careful with FAT :-) I can't remember the details,
> but when there was a problem it used to prefer a faulty filesystem size
> to the partition size, and would gaily sail off the end of the
> partition, trashing the next partition. My "record to USB" TV seems
> rather prone to this :-(
>
These drives are wholly allocated to nothing but the RAID array... so I only have to make 1 partition and it's more or less the whole disk. :)
I've got the new WDs online and am growing that RAID5 to a RAID6 as we speak.
(two thumbs up)
I have (2) HD103SJ drives left in the array... one was installed when the array was built and has about 44500 hours on it... while the other only has about 38400 hours on it.
smartctl is keeping an eye on them for me. ;)
The rest of the drives are relatively new (especially after the episode of drive failures a couple weeks ago).
Thanks again for the help everyone!
-Ben