Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)

* Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
From: Ben Kamen @ 2016-08-22 21:51 UTC
  To: linux-raid

Hey all. I'm looking at the RAID Wiki and need some help.

First, some info:

I have a RAID5 with 4 members, /dev/sd[cdef]1. Last night, sdc1
reported a SMART error recommending drive replacement (after I'd watched
sector errors pile up for about a week).

No problem: I shut down the drive, pulled it, replaced it with a cold
spare, and started the rebuild (around midnight CDT).

At 5:43am, I got this message:

This is an automatically generated mail message from mdadm
running on quantum

A Fail event had been detected on md device /dev/md127.

It could be related to component device /dev/sde1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda2[0] sdb2[2]
      511988 blocks super 1.0 [2/2] [UU]

md127 : active raid5 sdc1[4] sdf1[6] sde1[1](F) sdd1[5]
      2930276352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [U_U_]
      [===========>.........]  recovery = 55.9% (546131076/976758784) finish=381.6min speed=18805K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk

md1 : active raid1 sda3[0] sdb3[2]
      239489916 blocks super 1.1 [2/2] [UU]
      bitmap: 2/2 pages [8KB], 65536KB chunk

md10 : active raid1 sda1[0] sdb1[2]
      4193272 blocks super 1.1 [2/2] [UU]

unused devices: <none>

/dev/md127 is the one with issues.
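The status fields in the md127 entry already tell the story: [4/2] means four members are expected but only two are in sync (the rebuilding sdc1 does not count as in sync), [U_U_] marks which slots are up, and (F) flags the member md dropped. A small Python sketch that pulls those fields out of the quoted /proc/mdstat text and re-derives the kernel's finish= estimate from the block counts and speed; the helper names are made up for illustration, and the code only parses text, it never touches a device.

```python
import re

# The md127 entry from the /proc/mdstat text quoted in the mdadm alert.
MDSTAT = """\
md127 : active raid5 sdc1[4] sdf1[6] sde1[1](F) sdd1[5]
      2930276352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [U_U_]
      [===========>.........]  recovery = 55.9% (546131076/976758784) finish=381.6min speed=18805K/sec
"""

def parse_md_status(text):
    """Return (expected, in_sync, slot_map) from the '[n/m] [U_..]' fields."""
    m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", text)
    if m is None:
        raise ValueError("no [n/m] [U_] status found")
    return int(m.group(1)), int(m.group(2)), m.group(3)

def failed_members(text):
    """Member devices flagged '(F)' on the md line."""
    return re.findall(r"(\w+)\[\d+\]\(F\)", text)

def recovery_eta_minutes(text):
    """Remaining 1K blocks divided by speed in K/sec, converted to minutes."""
    m = re.search(r"\((\d+)/(\d+)\).*?speed=(\d+)K/sec", text)
    done, total, speed_kib = (int(g) for g in m.groups())
    return (total - done) / speed_kib / 60

expected, in_sync, slots = parse_md_status(MDSTAT)
missing = expected - in_sync
print(f"{in_sync}/{expected} in sync, slots [{slots}], failed: {failed_members(MDSTAT)}")
# RAID5 tolerates at most one missing member.
if missing == 0:
    print("healthy")
elif missing == 1:
    print("degraded but readable")
else:
    print(f"{missing} members missing: RAID5 cannot serve all data")
print(f"recovery ETA ~ {recovery_eta_minutes(MDSTAT):.1f} min")
```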

It looks like the SATA controller had issues. I couldn't see sde, so
I rebooted (scold me later).

All the drives are available. smartctl tells me /dev/sde is as happy as
can be (it has a few bad sectors and is slated for replacement next, but
SMART says the drive is healthy).

I looked at the RAID Wiki and saved the mdadm --examine info. Among the
members, the event counts of the happy and unhappy members are off
by 25.
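Comparing event counts across members is easy to script once the --examine output is saved. A minimal sketch, assuming the output of mdadm --examine for several devices is concatenated, with each section starting at a /dev/...: line and containing an "Events : N" line (as v1.x superblock reports do). The sample text and its numbers are hypothetical, chosen only to reproduce the 25-event gap described above.

```python
import re

def event_counts(examine_text):
    """Map each device in saved `mdadm --examine` output to its event count."""
    counts, device = {}, None
    for line in examine_text.splitlines():
        header = re.match(r"(/dev/\S+):\s*$", line)
        if header:                       # start of a new device section
            device = header.group(1)
            continue
        m = re.match(r"\s*Events\s*:\s*(\d+)", line)
        if m and device:
            counts[device] = int(m.group(1))
    return counts

def lag_behind_newest(counts):
    """How many events each member is behind the most recent one."""
    newest = max(counts.values())
    return {dev: newest - ev for dev, ev in counts.items()}

# Hypothetical, heavily trimmed --examine output (made-up numbers).
SAMPLE = """\
/dev/sdd1:
          Events : 5025
/dev/sde1:
          Events : 5000
/dev/sdf1:
          Events : 5025
"""
counts = event_counts(SAMPLE)
print(lag_behind_newest(counts))
```

A small lag on one member, with the others agreeing, is generally the situation that --assemble --force is meant for.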

But forcing the assembly fails with:

mdadm --assemble --force /dev/md127 /dev/sd[cdef]1
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping
mdadm: /dev/sdf1 is busy - skipping
mdadm: Found some drive for an array that is already active: /dev/md/:BigRAID
mdadm: giving up.

So before I mess up ANYTHING else...

What should I be doing?

(Should I be stopping the RAID? Right now it seems like it's running.)
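The "busy - skipping" lines mean those partitions are still held by an existing (degraded) instance of md127, and mdadm cannot reassemble members that another array holds, so that instance has to be stopped before a forced assembly can work. A sketch of the sequence in Python, dry-run by default so it only prints the commands; it assumes nothing on md127 is mounted and simply reuses the member list from the command above.

```python
import shlex
import subprocess

def recovery_commands(array, members):
    """Stop the half-assembled array, then force-assemble it from members."""
    return [
        # 'busy - skipping' = the members are held by an existing instance.
        ["mdadm", "--stop", array],
        # --force lets mdadm accept members whose metadata looks out of date.
        ["mdadm", "--assemble", "--force", array, *members],
    ]

def run(commands, dry_run=True):
    for cmd in commands:
        print(("would run: " if dry_run else "running: ") + shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure

cmds = recovery_commands("/dev/md127",
                         ["/dev/sdc1", "/dev/sdd1", "/dev/sde1", "/dev/sdf1"])
run(cmds)  # dry run: prints only
```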

Thanks,

   -Ben

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
From: Adam Goryachev @ 2016-08-22 23:06 UTC
  To: Ben Kamen, linux-raid

On 23/08/16 07:51, Ben Kamen wrote:
> So before I mess up ANYTHING else...
>
> What should I be doing?
>
> (should I be stopping the RAID as right now it's seems like it's running)
>
> Thanks,
>
First step: if the RAID is running, do a backup.
Second step: read all about SCT/ERC, and almost certainly fix the issues
with your drives (either enable SCT/ERC on the drive or set the timeout
appropriately).
Third step: make sure your backup is up to date.
Fourth step: post the current status of the array. Is it resyncing, is
the resync pending, has it finished, etc.?
If it's finished, then don't replace the next drive in the same way; use
the replace method instead. That will keep redundancy in the array
during the replacement, and hopefully avoid this sort of issue.
Later, you might consider moving to RAID6 to add some additional
redundancy instead of using a cold spare.
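The second step can be scripted. The sketch below, similar in spirit to the timeout scripts often posted to linux-raid, asks each drive to enable SCT ERC at 7.0 seconds (smartctl takes the value in units of 100 ms) and, for drives that refuse, falls back to raising the kernel's command timeout to the 180 seconds Phil Turmel gives later in the thread. Assumptions: smartctl exits non-zero when the drive rejects the command, and on many drives neither the ERC setting nor the sysfs timeout survives a power cycle, so this would have to run at every boot. It defaults to a dry run.

```python
import subprocess
from pathlib import Path

ERC_DECISECONDS = 70       # 7.0 s; SCT ERC values are given in units of 100 ms
FALLBACK_TIMEOUT_S = 180   # kernel command timeout for drives without ERC

def plan(dev, erc_accepted):
    """Decide the fix for one drive, given whether it accepted SCT ERC."""
    if erc_accepted:
        return ("erc", dev)
    name = Path(dev).name  # "/dev/sde" -> "sde"
    return ("timeout", f"/sys/block/{name}/device/timeout", FALLBACK_TIMEOUT_S)

def apply(devices, dry_run=True):
    erc_arg = f"scterc,{ERC_DECISECONDS},{ERC_DECISECONDS}"
    for dev in devices:
        if dry_run:
            print(f"would run: smartctl -l {erc_arg} {dev}  (else raise timeout)")
            continue
        # Assumption: smartctl exits non-zero if the drive rejects SCT ERC.
        ok = subprocess.run(["smartctl", "-l", erc_arg, dev],
                            capture_output=True).returncode == 0
        action = plan(dev, ok)
        if action[0] == "timeout":
            # Needs root; the value is lost at reboot.
            Path(action[1]).write_text(f"{action[2]}\n")
        print(dev, action)

apply(["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"])  # dry run only
```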

I hope the above is helpful, but really we will need more information
about your drives before being able to make further suggestions. Output
of lsdrv (google it), smartctl, and mdadm --misc --detail /dev/md127 would
all be helpful.

Regards,
Adam


-- 
Adam Goryachev Website Managers www.websitemanagers.com.au

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
From: Wols Lists @ 2016-08-23 11:36 UTC
  To: Adam Goryachev, Ben Kamen, linux-raid

On 23/08/16 00:06, Adam Goryachev wrote:
> I hope the above is helpful, but really we will need more information
> about your drives before being able to make further suggestions. output
> of lsdrv (google it), smartctl, mdadm --misc --detail /dev/md127 would
> all be helpful.

And while it's probably too late now, read up on mdadm --replace. If
you've got the spare slots, it's much better/safer than physically
pulling a dodgy disk and replacing it.
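The --replace workflow needs a free slot for the new disk. A sketch that just builds the two mdadm invocations (mdadm 3.3 or later is assumed for --replace/--with; /dev/sdg1 is a hypothetical new partition): the new disk is added as a spare, then the failing member is copied onto it while it is still part of the array, so redundancy is kept during the copy, and the old member is only marked faulty once the copy has finished.

```python
import shlex

def replace_commands(array, old, new):
    """mdadm invocations for a hot replace of `old` by `new`."""
    return [
        ["mdadm", array, "--add", new],                     # new disk joins as a spare
        ["mdadm", array, "--replace", old, "--with", new],  # copy old -> new in place
    ]

# /dev/sdg1 is a hypothetical new partition in a free slot.
for cmd in replace_commands("/dev/md127", "/dev/sde1", "/dev/sdg1"):
    print(shlex.join(cmd))
```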

NB - get the data Adam asked for - and the output of "mdadm --examine
..." and "mdadm --detail ..." might well be useful (or might have been
included elsewhere).

Cheers,
Wol

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
From: Ben @ 2016-08-23 15:44 UTC
  To: linux-raid

On 8/23/2016 6:36 AM, Wols Lists wrote:
> On 23/08/16 00:06, Adam Goryachev wrote:
>
> And while it's probably too late now, read up on mdadm --replace. If
> you've got the spare slots, it's much better/safer than physically
> pulling a dodgy disk and replacing it.
>
> NB - get the data Adam asked for - and the output of "mdadm --examine
> ..." and "mdadm --detail ..." might well be useful (or might have been
> included elsewhere).

Hi there!

Thanks. Adam mentioned it too, and yeah, it's too late this time, but I'll know for next time.

The vast bulk of the data on the array is duplicated on another NAS, so it's not the end of the world.

Adam helped me get the array back online so I can get some things off it (like some 'nice to have' files). When it craps out, it's staying reasonably in sync...

So hopefully I'll have it resolved soon.

But I'll probably switch to RAID6 down the road.

Thanks for the help,

  -Ben

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
From: Ben @ 2016-08-26 01:20 UTC
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2646 bytes --]

As an update:

Adam's been helping me out (I'm not used to hitting "reply-all" for mailing lists, since pretty much all the ones I'm on set "Reply-To:").

I've turned on SCT/ERC for the drives, but the one that went bonkers during the rebuild (sde) still had read issues during rebuilds.

SMART reports it's OK, but... (shrug). I ended up running ddrescue onto the new replacement drive (sdc), which kept getting put back into spare status whenever a rebuild failed.

So I just copied sde -> sdc, which went pretty much flawlessly (ddrescue completed without any final complaints).
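ddrescue's verdict can be double-checked from its mapfile, if one was given on the command line (for example `ddrescue -f /dev/sde /dev/sdc sde.map`; the mapfile name is hypothetical). Any block not marked `+` was never read successfully, and by default ddrescue leaves the matching area of the output disk untouched, which is exactly where stale data would sit. A minimal Python sketch, assuming the mapfile layout of recent GNU ddrescue versions (comment lines starting with `#`, one status line, then `pos size status` lines in hex); the sample mapfile is made up.

```python
def unread_bytes(mapfile_text):
    """Total size per non-'+' status in a GNU ddrescue mapfile.

    Layout assumed: '#' comment lines, one status line
    ('current_pos current_status [current_pass]'), then one
    'pos size status' line per block, with pos and size in hex."""
    lines = [line for line in mapfile_text.splitlines()
             if line.strip() and not line.lstrip().startswith("#")]
    totals = {}
    for line in lines[1:]:                      # lines[0] is the status line
        _pos, size, status = line.split()[:3]
        if status != "+":                       # '+' = finished (read OK)
            totals[status] = totals.get(status, 0) + int(size, 16)
    return totals

# Made-up example: one 4 KiB bad area ('-') in an otherwise finished copy.
SAMPLE_MAP = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x10000000     +               1
#      pos        size  status
0x00000000  0x10000000  +
0x10000000  0x00001000  -
0x10001000  0x20000000  +
"""
print(unread_bytes(SAMPLE_MAP))   # an empty dict would mean every block was read
```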

I also ran badblocks after the copy and it did find bad blocks, yet apparently ddrescue had no issues.

So I went back to:

* bringing up the array. No problems.
* adding ANOTHER new drive (one I ordered Sunday night); it rebuilt fine.
* doing an fsck -n first, which reported no issues, so I did a regular fsck (without -y) and it never prompted me for anything.

My last step is to run rsync -n against my backup to see whether any files differ between my last backup and the current data (i.e. any files with byte oddities).
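One caveat on that plan: rsync's default quick check only compares file size and modification time, so `rsync -n` alone will not notice a file whose size and mtime are intact but which contains a sector of garbage. Adding `-c` (`--checksum`), e.g. `rsync -n -a -c -i backup/ live/` (placeholder paths), makes it compare contents. The same idea as a small Python sketch that hashes the files present in both trees; the demo uses throwaway temporary directories.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def digest(path, chunk=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def content_mismatches(backup_root, live_root):
    """Relative paths of files present in both trees whose contents differ."""
    backup_root, live_root = Path(backup_root), Path(live_root)
    diffs = []
    for src in sorted(p for p in backup_root.rglob("*") if p.is_file()):
        rel = src.relative_to(backup_root)
        dst = live_root / rel
        if dst.is_file() and digest(src) != digest(dst):
            diffs.append(str(rel))
    return diffs

# Demo: same size, same mtime, one byte different, i.e. the case a
# size+mtime quick check cannot see.
with tempfile.TemporaryDirectory() as backup_dir, \
        tempfile.TemporaryDirectory() as live_dir:
    for root, payload in ((backup_dir, b"\x00" * 512),
                          (live_dir, b"\x00" * 511 + b"\xff")):
        Path(root, "doc.bin").write_bytes(payload)
        os.utime(Path(root, "doc.bin"), (0, 0))
    demo = content_mismatches(backup_dir, live_dir)
print(demo)
```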

All this has me wondering whether those old bad sectors left some files with a sector of garbage in them.

Adam seems to think everything is fine, and so far that seems to be the case.

A few last questions:

The new drive I got was (supposed to be) the same model as the last Seagate I ordered, but SMART reports them differently (see attached).

My question about the new drive: SMART says it supports offline data collection, but I can't seem to turn it on with gsmartcontrol.

This new drive also doesn't seem to support SCT/ERC the same way.
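Both observations match the hex codes in the attached smartctl output, which are bit fields. The sketch below decodes them using the bit meanings smartctl prints; the tables are reconstructed from the attached output and common ATA usage, so treat them as an assumption. Decoded, sde's offline-capabilities byte 0x73 lacks the surface-scan bit that the HD103SJ drives' 0x5b has (which fits its 80-second offline collection time versus roughly 9000 seconds on the others), and its SCT word 0x1085 has no Error Recovery Control bit, so on this drive there is no ERC to turn on.

```python
# Bit meanings as smartctl prints them for the "Offline data collection
# capabilities" byte. Both drives have bit 0x04 clear, which smartctl
# reports as "Suspend Offline collection upon new command"; it is left out.
OFFLINE_CAP_BITS = {
    0x01: "SMART execute Offline immediate",
    0x02: "Auto Offline data collection on/off support",
    0x08: "Offline surface scan",
    0x10: "Self-test",
    0x20: "Conveyance Self-test",
    0x40: "Selective Self-test",
}

# Bits of the "SCT capabilities" word that smartctl reports.
SCT_CAP_BITS = {
    0x01: "SCT Status",
    0x08: "SCT Error Recovery Control",
    0x10: "SCT Feature Control",
    0x20: "SCT Data Table",
}

def decode(value, table):
    """Names of the capabilities whose bits are set in value."""
    return [name for bit, name in table.items() if value & bit]

for drive, offline, sct in (("sdc/sdd/sdf (HD103SJ)", 0x5B, 0x003F),
                            ("sde (ST1000DM003)", 0x73, 0x1085)):
    print(drive)
    print("  offline:", decode(offline, OFFLINE_CAP_BITS))
    print("  SCT:    ", decode(sct, SCT_CAP_BITS))
```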

Again,

/dev/sdc - the older "new" spare (bought after Seagate bought Samsung's hard-drive business and discontinued the HD103SJ model)
/dev/sdd - original RAID member
/dev/sde - brand spanking new drive purchased Sunday.
/dev/sdf - original RAID member

I realize now that one says ST1000DM005 and the other ST1000DM003. Grrr!!!

So I'd like recommendations on whether I should get better-matching drives (I can use these elsewhere) or whether it doesn't matter.

Can I mix WD Reds into this array (and eventually retire all these HD103SJ drives)? Do people even like the Reds? They seem OK.

I've read a lot of conflicting info on SCT/ERC online (well, TLER anyway). Adam likes it enabled; what say the rest of you?

And last: any caveats for upgrading this array from RAID5 to RAID6? Can I even do that in place?
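An in-place RAID5 to RAID6 conversion is normally done with mdadm --grow after adding one more disk. A sketch that computes usable capacity and builds the commands: the 976758784 KiB per-member size comes from the mdstat recovery line earlier in the thread, while /dev/sdg1 and the backup-file path are hypothetical. The backup file must live outside the array, and whether mdadm asks for one at all is assumed to depend on the mdadm version and metadata layout.

```python
import shlex

PER_MEMBER_KIB = 976758784  # per-member size, from the mdstat recovery line

def usable_kib(level, members, per_member=PER_MEMBER_KIB):
    """Usable array size: members minus parity members, times member size."""
    parity = {5: 1, 6: 2}[level]
    return (members - parity) * per_member

def raid5_to_raid6(array, new_disk, members_after, backup_file):
    return [
        ["mdadm", array, "--add", new_disk],          # joins as a spare first
        ["mdadm", "--grow", array, "--level=6",
         f"--raid-devices={members_after}", f"--backup-file={backup_file}"],
    ]

print(usable_kib(5, 4), usable_kib(6, 5))   # same usable space either way
for cmd in raid5_to_raid6("/dev/md127", "/dev/sdg1", 5, "/root/md127-grow.bak"):
    print(shlex.join(cmd))
```

Four RAID5 members and five RAID6 members give the same usable space (matching the 2930276352 blocks in mdstat); the fifth disk buys a second parity member instead of sitting on a shelf as a cold spare.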

Thanks all, (especially Adam!)

  -Ben

P.S. Check out some of the SMART parameters on /dev/sde. Head flying hours?? And they're not zero. Weird. :/ This drive kinda creeps me out.
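Those huge raw values in the /dev/sde table below are most likely packed fields rather than plain counters. Assuming the interpretation widely reported for Seagate drives (attribute 240's low 32 bits count whole hours, with the upper bits holding something else; attributes 1 and 7 put an error count in the upper 16 bits and an operation count in the lower 32), a quick Python check on the attached values:

```python
def split48(raw):
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits)."""
    return raw >> 32, raw & 0xFFFFFFFF

# Raw values from the /dev/sde table in the attachment.
HEAD_FLYING_HOURS_RAW = 109964047679495   # attribute 240
RAW_READ_ERROR_RATE_RAW = 18255632        # attribute 1
SEEK_ERROR_RATE_RAW = 269743              # attribute 7

upper, hours = split48(HEAD_FLYING_HOURS_RAW)
print(f"240: {hours} h in the low 32 bits (upper bits {upper:#06x})")
for name, raw in (("Raw_Read_Error_Rate", RAW_READ_ERROR_RATE_RAW),
                  ("Seek_Error_Rate", SEEK_ERROR_RATE_RAW)):
    errors, operations = split48(raw)
    print(f"{name}: {errors} errors over {operations} operations")
```

Under that assumption the drive shows about 7 head-flying hours against 9 power-on hours, and zero read or seek errors, which is unremarkable for a new drive.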


[-- Attachment #2: RAID.smart-info.txt --]
[-- Type: text/plain, Size: 20353 bytes --]

[root@quantum ~]# smartctl -a /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST1000DM005 HD103SJ
Serial Number:    S246JQ0D800949
LU WWN Device Id: 5 0000f0 080bb4909
Firmware Version: 1AJ10001
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Thu Aug 25 20:04:06 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 9120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 152) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   054   054   000    Old_age   Always       -       8630
  3 Spin_Up_Time            0x0023   076   071   025    Pre-fail  Always       -       7526
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       133
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   063   000    Old_age   Always       -       30 (Min/Max 21/37)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       10
200 Multi_Zone_Error_Rate   0x002a   100   096   000    Old_age   Always       -       558
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       14
========================================================================================================================
[root@quantum ~]# smartctl -a /dev/sdd
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD103SJ
Serial Number:    S246J9AB404176
LU WWN Device Id: 5 0024e9 204fbf695
Firmware Version: 1AJ10001
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Thu Aug 25 20:05:32 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 9180) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 153) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       195
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   073   070   025    Pre-fail  Always       -       8310
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       58
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       37763
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       75
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   062   000    Old_age   Always       -       31 (Min/Max 20/43)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       8
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       146
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77
========================================================================================================================
[root@quantum ~]# smartctl -a /dev/sde
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST1000DM003-1ER162
Serial Number:    Z4YDLXWJ
LU WWN Device Id: 5 000c50 091877801
Firmware Version: CC45
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ACS-2 (unknown minor revision code: 0x001f)
Local Time is:    Thu Aug 25 20:06:33 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   80) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 105) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   100   006    Pre-fail  Always       -       18255632
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       269743
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       9
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   071   068   045    Old_age   Always       -       29 (Min/Max 26/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 25 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       109964047679495
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       3907074414
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       5102115

========================================================================================================================
[root@quantum ~]# smartctl -a /dev/sdf
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD103SJ
Serial Number:    S246J9AB404174
LU WWN Device Id: 5 0024e9 204fbf676
Firmware Version: 1AJ10001
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Thu Aug 25 20:07:19 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 9360) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 156) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       353
  2 Throughput_Performance  0x0026   055   055   000    Old_age   Always       -       8559
  3 Spin_Up_Time            0x0023   073   069   025    Pre-fail  Always       -       8389
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       74
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       43724
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       92
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   063   000    Old_age   Always       -       30 (Min/Max 15/40)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       91
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       229
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       100

========================================================================================================================





^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-26  1:20                         ` Ben
@ 2016-08-26  2:22                           ` Phil Turmel
  2016-08-26  2:54                             ` Benjammin2068
  2016-08-26 18:07                           ` Wols Lists
  1 sibling, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2016-08-26  2:22 UTC (permalink / raw)
  To: Ben, linux-raid

On 08/25/2016 09:20 PM, Ben wrote:

> I read a lot of conflicting info on SCT/ERC online (well, TLER anyway)
> -- Adam likes it enabled. What say the rest of you?

Adam is correct, and it's not a matter of "like".  You either must have
it enabled, or you *must* apply the kernel driver timeout work-around
(180 seconds) for that drive.  Failure to do so results in crashed arrays.

Enterprise and NAS drives work out of the box.  Desktop/green drives do not.
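For reference, Phil's rule can be sketched as a boot-time shell loop. It is modelled on the kind of script that gets passed around on this list, but it is not anyone's official tool. It tries to set a 7.0 second ERC limit on each /dev/sd? disk. If the drive rejects that, it raises the kernel's per-device command timeout to 180 seconds. Two things are assumptions: smartctl exits non-zero when a drive rejects `-l scterc,70,70`, and the disks show up as /dev/sd?. SMARTCTL and SYSFS are overridable only so the sketch can be dry-run; they are not standard variables.

```shell
# Sketch: give every /dev/sd? disk either a short ERC limit or a long kernel
# command timeout, so the drive gives up on a bad sector before the kernel does.
# SMARTCTL and SYSFS are overridable only for dry runs/testing (assumption).
SMARTCTL=${SMARTCTL:-smartctl}
SYSFS=${SYSFS:-/sys}

fix_erc_or_timeout() {
    for blk in "$SYSFS"/block/sd*; do
        # skip anything without a SCSI command timeout (or an unmatched glob)
        [ -e "$blk/device/timeout" ] || continue
        dev=/dev/${blk##*/}
        # scterc,70,70 = 70 deciseconds (7.0 s) for reads and writes.
        # Assumes smartctl exits non-zero when the drive rejects the setting.
        if "$SMARTCTL" -l scterc,70,70 "$dev" >/dev/null 2>&1; then
            echo "$dev: ERC set to 7.0 s"
        else
            echo 180 > "$blk/device/timeout"
            echo "$dev: no ERC, kernel timeout raised to 180 s"
        fi
    done
}
```

Run it as root at every boot. On many drives the ERC setting does not survive a power cycle, and the sysfs timeout never survives a reboot.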

Some reading assignments from old discussions (read whole threads if you
have time):

http://marc.info/?l=linux-raid&m=139050322510249&w=2
http://marc.info/?l=linux-raid&m=135863964624202&w=2
http://marc.info/?l=linux-raid&m=135811522817345&w=1
http://marc.info/?l=linux-raid&m=133761065622164&w=2
http://marc.info/?l=linux-raid&m=132477199207506
http://marc.info/?l=linux-raid&m=133665797115876&w=2
http://marc.info/?l=linux-raid&m=142487508806844&w=3
http://marc.info/?l=linux-raid&m=144535576302583&w=2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-26  2:22                           ` Phil Turmel
@ 2016-08-26  2:54                             ` Benjammin2068
  2016-08-26 12:38                               ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-08-26  2:54 UTC (permalink / raw)
  To: linux-raid



On 08/25/2016 09:22 PM, Phil Turmel wrote:
> On 08/25/2016 09:20 PM, Ben wrote:
>
>> I read a lot of conflicting info on SCT/ERC online (well, TLER anyway)
>> -- Adam likes it enabled. What say the rest of you?
> Adam is correct, and it's not a matter of "like".  

"like" was just an expression.

>
>
> You either must have
> it enabled, or you *must* apply the kernel driver timeout work-around
> (180 seconds) for that drive.  Failure to do so results in crashed arrays.

For the ST1000DM003, its SMART capabilities state "SCT Status Supported" -- What does that mean in comparison with the other HD103SJ drives?

Does it do SCT but not let the user control it, or does it not do it at all?

(smartctl -l scterc /dev/sde yields a message that implies control is not supported)

>
> Enterprise and NAS drives work out of the box.  Desktop/green drives do not.

Yea - I didn't buy any green drives (purposefully anyway) for this system.

>
> Some reading assignments from old discussions (read whole threads if you
> have time):
>
> http://marc.info/?l=linux-raid&m=139050322510249&w=2
> http://marc.info/?l=linux-raid&m=135863964624202&w=2
> http://marc.info/?l=linux-raid&m=135811522817345&w=1
> http://marc.info/?l=linux-raid&m=133761065622164&w=2
> http://marc.info/?l=linux-raid&m=132477199207506
> http://marc.info/?l=linux-raid&m=133665797115876&w=2
> http://marc.info/?l=linux-raid&m=142487508806844&w=3
> http://marc.info/?l=linux-raid&m=144535576302583&w=2
>

Thanks, will go read.


  -Ben

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-26  2:54                             ` Benjammin2068
@ 2016-08-26 12:38                               ` Phil Turmel
  0 siblings, 0 replies; 25+ messages in thread
From: Phil Turmel @ 2016-08-26 12:38 UTC (permalink / raw)
  To: Benjammin2068, linux-raid

On 08/25/2016 10:54 PM, Benjammin2068 wrote:
>> You either must have
>> it enabled, or you *must* apply the kernel driver timeout work-around
>> (180 seconds) for that drive.  Failure to do so results in crashed arrays.
> 
> For the ST1000DM003, its SMART capabilities state "SCT Status Supported" -- What does that mean in comparison with the other HD103SJ drives?
> 
> Does it do SCT but not let the user control it, or does it not do it at all?

ERC is a feature within the SCT standard.  For modern hard drives,
claiming "SCT" support is comparable to a bottled water supplier
advertising that their product is wet.

> (smartctl -l scterc /dev/sde yields a message that implies control is not supported)

ERC on the other hand is a valuable feature that modern drive
manufacturers make you pay extra for.

>> Enterprise and NAS drives work out of the box.  Desktop/green drives do not.
> 
> Yea - I didn't buy any green drives (purposefully anyway) for this system.

I originally wrote that sentence as "Desktop drives do not."  I added
"/green" to clarify that some non-enterprise, non-NAS drives aren't
marketed as desktop drives, but still lack ERC functionality.

Your ST1000DM003 is marketed as a desktop drive.  Seagate's product page
for this model has links to other models for specialty use cases,
including NAS.

>> Some reading assignments from old discussions (read whole threads if you
>> have time):
>>
>> http://marc.info/?l=linux-raid&m=139050322510249&w=2
>> http://marc.info/?l=linux-raid&m=135863964624202&w=2
>> http://marc.info/?l=linux-raid&m=135811522817345&w=1
>> http://marc.info/?l=linux-raid&m=133761065622164&w=2
>> http://marc.info/?l=linux-raid&m=132477199207506
>> http://marc.info/?l=linux-raid&m=133665797115876&w=2
>> http://marc.info/?l=linux-raid&m=142487508806844&w=3
>> http://marc.info/?l=linux-raid&m=144535576302583&w=2
> 
> Thanks, will go read.

You will find detailed explanations for my comments above in these old
threads.

Phil


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-26  1:20                         ` Ben
  2016-08-26  2:22                           ` Phil Turmel
@ 2016-08-26 18:07                           ` Wols Lists
  2016-08-28 18:29                             ` Benjammin2068
  1 sibling, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-08-26 18:07 UTC (permalink / raw)
  To: Ben, linux-raid

On 26/08/16 02:20, Ben wrote:
> [root@quantum ~]# smartctl -a /dev/sde
> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
> Device Model:     ST1000DM003-1ER162
> Serial Number:    Z4YDLXWJ
> LU WWN Device Id: 5 000c50 091877801
> Firmware Version: CC45
> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Device is:        In smartctl database [for details use: -P show]

Sorry Ben - that drive was NOT a smart buy !!! Seagate Barracuda :-(

You MUST enable the timeout on this drive :-(

Gut feel tells me most 1TB or less drives are okay in a raid - the
Barracudas are an exception :-( I've got two 3TB Barracudas mirrored,
and from reading the list, there's no way I'd go raid5 for more capacity
without ditching them.

Most people seem to get WD Reds - I've asked about Seagate NAS but I've
not picked up on any reports about them - good or bad. Barracudas - the
news is pretty much all bad :-(

Cheers,
Wol

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-26 18:07                           ` Wols Lists
@ 2016-08-28 18:29                             ` Benjammin2068
  2016-08-28 19:20                               ` Anthony Youngman
  2016-08-28 23:54                               ` Adam Goryachev
  0 siblings, 2 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-08-28 18:29 UTC (permalink / raw)
  To: linux-raid



On 08/26/2016 01:07 PM, Wols Lists wrote:
> On 26/08/16 02:20, Ben wrote:
>> [root@quantum ~]# smartctl -a /dev/sde
>> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
>> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
>> Device Model:     ST1000DM003-1ER162
>> Serial Number:    Z4YDLXWJ
>> LU WWN Device Id: 5 000c50 091877801
>> Firmware Version: CC45
>> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Device is:        In smartctl database [for details use: -P show]
> Sorry Ben - that drive was NOT a smart buy !!! Seagate Barracuda :-(
>
> You MUST enable the timeout on this drive :-(
>
> Gut feel tells me most 1TB or less drives are okay in a raid - the
> Barracudas are an exception :-( I've got two 3TB Barracudas mirrored,
> and from reading the list, there's no way I'd go raid5 for more capacity
> without ditching them.
>
> Most people seem to get WD Reds - I've asked about Seagate NAS but I've
> not picked up on any reports about them - good or bad. Barracudas - the
> news is pretty much all bad :-(
>
>

Yea, I figured that out -- I just couldn't find a decent, detailed reference on what "SCT status supported" means versus the more fully featured list.

And this drive (sort of -- not this exact sub-model, but that's the replacement Seagate recommended) is not going to stay in the array.

I'm going to get some more WD Reds (or decent NAS-friendly mechs) and pull this puppy out of the stack and use it elsewhere.

Thanks for the confirmations!

 -Ben



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-28 18:29                             ` Benjammin2068
@ 2016-08-28 19:20                               ` Anthony Youngman
  2016-08-29  1:23                                 ` Benjammin2068
  2016-08-28 23:54                               ` Adam Goryachev
  1 sibling, 1 reply; 25+ messages in thread
From: Anthony Youngman @ 2016-08-28 19:20 UTC (permalink / raw)
  To: Benjammin2068, linux-raid

On 28/08/16 19:29, Benjammin2068 wrote:
> And this drive (sort of -- not this exact sub-model, but that's the replacement Seagate recommended) is not going to stay in the array.

If they knew you were using it in a raid, and recommended it, then I 
don't know about your laws but over here in the UK I'd send it back as 
"unfit for purpose". Under SOGA (Sale Of Goods Act) they've sold you a 
pup and it's their problem, not yours.

(UK law assumes the salesman knows more than you, and so long as you 
tell them what you want, that forms part of the contract. Which means if 
they sell you something that does not meet the requirements you told 
them, they have to put matters right - either swap the drive for 
something that is suitable, or give you a refund. They can charge the 
difference if "suitable" means a more expensive drive, but a lot of UK 
shops would swallow the loss if they had recommended the wrong drive.)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-28 18:29                             ` Benjammin2068
  2016-08-28 19:20                               ` Anthony Youngman
@ 2016-08-28 23:54                               ` Adam Goryachev
  2016-08-29  1:25                                 ` Benjammin2068
  1 sibling, 1 reply; 25+ messages in thread
From: Adam Goryachev @ 2016-08-28 23:54 UTC (permalink / raw)
  To: Benjammin2068, linux-raid

On 29/08/16 04:29, Benjammin2068 wrote:
>
> On 08/26/2016 01:07 PM, Wols Lists wrote:
>> On 26/08/16 02:20, Ben wrote:
>>> [root@quantum ~]# smartctl -a /dev/sde
>>> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.el6.centos.plus.x86_64] (local build)
>>> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
>>> Device Model:     ST1000DM003-1ER162
>>> Serial Number:    Z4YDLXWJ
>>> LU WWN Device Id: 5 000c50 091877801
>>> Firmware Version: CC45
>>> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Device is:        In smartctl database [for details use: -P show]
>> Sorry Ben - that drive was NOT a smart buy !!! Seagate Barracuda :-(
>>
>> You MUST enable the timeout on this drive :-(
>>
>> Gut feel tells me most 1TB or less drives are okay in a raid - the
>> Barracudas are an exception :-( I've got two 3TB Barracudas mirrored,
>> and from reading the list, there's no way I'd go raid5 for more capacity
>> without ditching them.
>>
>> Most people seem to get WD Reds - I've asked about Seagate NAS but I've
>> not picked up on any reports about them - good or bad. Barracudas - the
>> news is pretty much all bad :-(
>>
>>
> Yea, I figured that out -- I just couldn't find a decent, detailed reference on what "SCT status supported" means versus the more fully featured list.
When I saw this, I assumed it meant you can ask for the status, and it
will tell you it is disabled, but there is no support to modify the
status (i.e., turn it on). Totally useless for all intents and purposes....

Then again, I could be wrong... but compare your other drive, which
showed additional capabilities, or mine here:
SCT capabilities:              (0x0039) SCT Status supported.
                                         SCT Error Recovery Control supported.
                                         SCT Feature Control supported.
                                         SCT Data Table supported.

ie, the second one is probably what you want, the third allows you to 
turn it on/off, and no idea about the last option....
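This sketch shows how to read the capability word directly. smartctl builds those "supported" lines from bit flags in the hex value in parentheses. The bit assignments below are an assumption, based on my reading of how smartctl decodes ATA IDENTIFY word 206: 0x01 is SCT, 0x08 is Error Recovery Control, 0x10 is Feature Control and 0x20 is Data Table. Check them against the ATA/ACS spec before relying on them. Under that reading, both 0x003f and 0x0039 decode to all four features. A word without 0x08 means the drive cannot accept an ERC setting at all.

```shell
# Sketch: decode the hex word smartctl prints after "SCT capabilities:".
# Bit meanings are an assumption from my reading of smartctl's decoding of
# ATA IDENTIFY word 206; verify against the ATA/ACS specification.
#   0x01 SCT supported    0x08 Error Recovery Control
#   0x10 Feature Control  0x20 Data Table

# Pull the hex word out of `smartctl -c` output on stdin.
sct_caps_word() {
    sed -n 's/^SCT capabilities: *(\(0x[0-9a-fA-F]*\)).*/\1/p'
}

decode_sct_caps() {
    caps=$(( $1 ))                       # shell arithmetic accepts 0x... hex
    if [ $(( caps & 0x01 )) -eq 0 ]; then
        echo "no SCT"
        return 0
    fi
    out="SCT"
    [ $(( caps & 0x08 )) -ne 0 ] && out="$out ERC"
    [ $(( caps & 0x10 )) -ne 0 ] && out="$out FeatureControl"
    [ $(( caps & 0x20 )) -ne 0 ] && out="$out DataTable"
    echo "$out"
}
```

Usage: `decode_sct_caps "$(smartctl -c /dev/sde | sct_caps_word)"`.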

Regards,
Adam

-- 
Adam Goryachev Website Managers www.websitemanagers.com.au

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-28 19:20                               ` Anthony Youngman
@ 2016-08-29  1:23                                 ` Benjammin2068
  0 siblings, 0 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-08-29  1:23 UTC (permalink / raw)
  To: linux-raid

On 08/28/2016 02:20 PM, Anthony Youngman wrote:
> On 28/08/16 19:29, Benjammin2068 wrote:
>> And this drive (sort of -- not this exact sub-model, but that's the replacement Seagate recommended) is not going to stay in the array.
>
> If they knew you were using it in a raid, and recommended it, then I don't know about your laws but over here in the UK I'd send it back as "unfit for purpose". Under SOGA (Sale Of Goods Act) they've sold you a pup and it's their problem, not yours.
>
> (UK law assumes the salesman knows more than you, and so long as you tell them what you want, that forms part of the contract. Which means if they sell you something that does not meet the requirements you told them, they have to put matters right - either swap the drive for something that is suitable, or give you a refund. They can charge the difference if "suitable" means a more expensive drive, but a lot of UK shops would swallow the loss if they had recommended the wrong drive.)
>

In the US.

I'll have to look at my receipt. The recommendation was when I purchased the *last* drive, not this current set. But I copied and pasted part numbers, so I'll have to look to see what's up.

Like I said, I can find a use for them elsewhere. It's not a huge deal.

 -Ben


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-28 23:54                               ` Adam Goryachev
@ 2016-08-29  1:25                                 ` Benjammin2068
  2016-08-29 11:19                                   ` Wols Lists
  0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-08-29  1:25 UTC (permalink / raw)
  To: linux-raid

On 08/28/2016 06:54 PM, Adam Goryachev wrote:
> When I saw this, I assumed it meant you can ask for the status, and it will tell you it is disabled, but there is no support to modify the status (i.e., turn it on). Totally useless for all intents and purposes....
>
> Then again, I could be wrong... but compare your other drive, which showed additional capabilities, or mine here:
> SCT capabilities:              (0x0039) SCT Status supported.
>                                         SCT Error Recovery Control supported.
>                                         SCT Feature Control supported.
>                                         SCT Data Table supported.
>
> ie, the second one is probably what you want, the third allows you to turn it on/off, and no idea about the last option....
>


Right - I get that. But not knowing *for sure*, I thought I would go look it up, and Google wasn't exactly helpful in finding a developer-style description of exactly what the difference was.

again, no worries. I'll get me some of the right drives one way or another.

 -Ben


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)
  2016-08-29  1:25                                 ` Benjammin2068
@ 2016-08-29 11:19                                   ` Wols Lists
  2016-09-18 17:13                                     ` Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)) Benjammin2068
  0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-08-29 11:19 UTC (permalink / raw)
  To: Benjammin2068, linux-raid

On 29/08/16 02:25, Benjammin2068 wrote:
> Right - I get that. But not knowing *for sure*, I thought I would go look it up, and Google wasn't exactly helpful in finding a developer-style description of exactly what the difference was.
> 
> again, no worries. I'll get me some of the right drives one way or another.

I don't know whether you can still get them, but there was a post about
a crashed raid1 array here not long ago, and the array contained a
couple of 1TB Seagate Constellations. Those DID support raid, but
they're probably discontinued now :-(

Cheers,
Wol

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-08-29 11:19                                   ` Wols Lists
@ 2016-09-18 17:13                                     ` Benjammin2068
  2016-09-18 17:50                                       ` Chris Murphy
  2016-09-18 18:08                                       ` Benjammin2068
  0 siblings, 2 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 17:13 UTC (permalink / raw)
  To: linux-raid

In a follow-up question about my arrays, I have a question about the new WDs that have a larger physical sector size but still support 512B sectors.

I bought some WD Reds (WD10EFRX) drives.

When I let the Linux "Disk Utility" (palimpsest <- who the heck named that anyway?) do the RAID management with a new drive, it aligns partitions to cylinders, not sectors.

So it makes a partition and then complains to me that it's off by 512 bytes, which could affect performance.

Gee. Thanks.

So I can use g/parted -- or fdisk....

but I thought I'd ask for suggestions on the preferred tool and any pitfalls to watch out for.

Thanks,

 -Ben

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 17:13                                     ` Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)) Benjammin2068
@ 2016-09-18 17:50                                       ` Chris Murphy
  2016-09-18 18:41                                         ` Benjammin2068
  2016-09-18 18:08                                       ` Benjammin2068
  1 sibling, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-09-18 17:50 UTC (permalink / raw)
  To: Benjammin2068; +Cc: Linux-RAID

On Sun, Sep 18, 2016 at 11:13 AM, Benjammin2068 <benjammin2068@gmail.com> wrote:
> In a follow-up question about my arrays, I have a question about the new WDs that have a larger physical sector size but still support 512B sectors.
>
> I bought some WD Reds (WD10EFRX) drives.
>
> When I let the Linux "Disk Utility" (palimpsest <- who the heck named that anyway?) do the RAID management with a new drive, it aligns partitions to cylinders, not sectors.
>
> So it makes a partition and then complains to me that it's off by 512 bytes, which could affect performance.

This is one of the dumbest things, haha. I do not for the life of me
understand why a distribution won't backport this, if they're unwilling
to put modern tools for modern hardware in their distributions. It's
one of the simplest, safest backports they could do and yet they
don't. Incredible to me.

Anyway, yeah partition with something not from the Pleistocene.
Seriously, it's that old, it's that much of a solved problem, for
probably 5 years, maybe even longer.

Any version of gdisk will do this correctly out of the box, so you can
just install that from your existing old distro presumably. And if you
can't, then get a recent live CD from pretty much anybody: Fedora 23
or Fedora 24 has gdisk already on the media, and its version of parted
and fdisk, also included, all do alignment to 4KiB sectors correctly.

Actually, on either Fedora live media version you can do

dnf install https://kojipkgs.fedoraproject.org//packages/blivet-gui/2.0.1/1.fc25/noarch/blivet-gui-2.0.1-1.fc25.noarch.rpm

Which is the current version, and it will work on F24 for sure and
maybe/probably F23 also. And dnf will sort out any additional
dependencies needed. It has a similar gparted style UI, but it will do
all kinds of wild things: mdadm raid, LVM raid, Btrfs. It'll create
the partitions, RAID, LV's, file systems, and it will discover things
already on the drive and properly wipe their signatures with a proper
tear down before creating the new things. So you don't end up with
crusty old stuff coming back to haunt you some other day.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 17:13                                     ` Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)) Benjammin2068
  2016-09-18 17:50                                       ` Chris Murphy
@ 2016-09-18 18:08                                       ` Benjammin2068
  1 sibling, 0 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 18:08 UTC (permalink / raw)
  To: linux-raid

As an update to this, here's some data:


The older Samsung HD103SJ drives (3 of the 4 drives in the RAID5 are still alive and well in this stack) have partition #1 (/dev/sdX1), which lists as:

> [root@quantum myth]# sfdisk -l -uM /dev/sdc        <-- this is the output from one of the 3 HD103SJ drives. The partition was originally created by palimpsest.
>
> Disk /dev/sdc: 121601 cylinders, 255 heads, 63 sectors/track
> Units = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0
>
>    Device Boot Start   End    MiB    #blocks   Id  System
> /dev/sdc1         0+ 953867- 953868- 976760001   fd  Linux raid autodetect
> /dev/sdc2         0      -      0          0    0  Empty
> /dev/sdc3         0      -      0          0    0  Empty
> /dev/sdc4         0      -      0          0    0  Empty

When I do the math:

976,760,001 * 1024  = 1,000,202,241,024 bytes --- ok, so that's /dev/sdX1

Now take 1,000,202,241,024 / 4096 (the physical sector size of the new drives) = 244,190,000.25, so the partition isn't a whole number of 4K sectors: I have a 1024-byte (two 512-byte sectors) difference between the 2 models when trying to switch over.
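The arithmetic above can be checked with a few lines of shell. This assumes 64-bit shell arithmetic, as in bash and dash on 64-bit systems. The input is the size in 1 KiB blocks, taken from sfdisk's #blocks column. The function prints four values:

- the size in bytes
- the remainder past the last 4096-byte boundary
- the padding needed to reach the next boundary
- the rounded-up size in 512-byte sectors

It only looks at the size. Whether the partition's start is aligned is a separate question.

```shell
# Sketch: is a partition size (in 1 KiB blocks, as in sfdisk's "#blocks"
# column) a whole number of 4096-byte physical sectors?
# Assumes 64-bit shell arithmetic (bash and dash on 64-bit systems).
check_4k_size() {
    bytes=$(( $1 * 1024 ))
    rem=$(( bytes % 4096 ))               # bytes past the last 4 KiB boundary
    pad=$(( (4096 - rem) % 4096 ))        # bytes needed to reach the next one
    sectors=$(( (bytes + pad) / 512 ))    # rounded-up size in 512-byte sectors
    echo "bytes=$bytes remainder=$rem pad=$pad sectors=$sectors"
}
```

`check_4k_size 976760001` prints `bytes=1000202241024 remainder=1024 pad=3072 sectors=1953520008`. So an aligned replacement partition needs 3072 more bytes (6 sectors) than the old one.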

Is there a best practice for how to contend with this? (Resize the partition somehow on the RAID, then cut each partition by 2 sectors so its sector count divides evenly by 8? I know. Sounds insane. I have backups. I'd do it. :P )

Should I just eat the performance hit for now?

Thanks,

 -Ben



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 17:50                                       ` Chris Murphy
@ 2016-09-18 18:41                                         ` Benjammin2068
  2016-09-18 19:17                                           ` Wols Lists
  0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 18:41 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Linux-RAID

On 09/18/2016 12:50 PM, Chris Murphy wrote:
>
> This is one of the dumbest things, haha. I do not for the life of me
> understand why a distribution won't backport this, if they're unwilling
> to put modern tools for modern hardware in their distributions. It's
> one of the simplest, safest backports they could do and yet they
> don't. Incredible to me.

Yeaaaa.... and considering how often I have to do these kinds of installs or admin... it's... well.. yea.


> Any version of gdisk will do this correctly out of the box, so you can
> just install that from your existing old distro presumably. And if you
> can't, then get a recent live CD from pretty much anybody: Fedora 23
> or Fedora 24 has gdisk already on the media, and its version of parted
> and fdisk, also included, all do alignment to 4KiB sectors correctly.
>
> Actually, on either Fedora live media version you can do
>
> dnf install https://kojipkgs.fedoraproject.org//packages/blivet-gui/2.0.1/1.fc25/noarch/blivet-gui-2.0.1-1.fc25.noarch.rpm
>
> Which is the current version, and it will work on F24 for sure and
> maybe/probably F23 also. And dnf will sort out any additional
> dependencies needed. It has a similar gparted style UI, but it will do
> all kinds of wild things: mdadm raid, LVM raid, Btrfs. It'll create
> the partitions, RAID, LV's, file systems, and it will discover things
> already on the drive and properly wipe their signatures with a proper
> tear down before creating the new things. So you don't end up with
> crusty old stuff coming back to haunt you some other day.
>

I'll check - this is CentOS... but I've (as shown in followup email) played with fdisk (which doesn't bother me) and some of the others...

now I just have to sort out this offset issue which I think I'm stuck with due to different partition sizes.

 -Ben


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 18:41                                         ` Benjammin2068
@ 2016-09-18 19:17                                           ` Wols Lists
  2016-09-18 19:58                                             ` Benjammin2068
  0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-09-18 19:17 UTC (permalink / raw)
  To: Benjammin2068, Chris Murphy; +Cc: Linux-RAID

On 18/09/16 19:41, Benjammin2068 wrote:
> I'll check - this is CentOS... but I've (as shown in followup email) played with fdisk (which doesn't bother me) and some of the others...
> 
> now I just have to sort out this offset issue which I think I'm stuck with due to different partition sizes.

Don't quite understand what you're trying to do, but ...

I'm sure you know this, but getting the physical/logical block size
out-of-sync hurts disk performance. And copying a smaller partition into
a larger allocated space is perfectly harmless. So...

I'd simply use a modern partition manager (such as gdisk) to partition
your new drives such that the new partitions are larger than the
existing ones, and are properly aligned relative to the drive geometry.

Then copy the old partitions across however you were planning - whether
it's "mdadm --replace" or stopping the array and "dd old-device
new-device" or whatever.
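Wol's procedure can be sketched as a dry run that only prints the commands, in three steps:

1. An sgdisk call creates partition 1 on the new disk. It starts at sector 2048 (1 MiB, a multiple of 4 KiB) and is rounded up to whole 4 KiB, at least as large as the old member.
2. `mdadm --add` adds that partition to the array.
3. `mdadm --replace ... --with` migrates the old member onto it.

The device names are hypothetical placeholders, and FD00 is gdisk's Linux RAID type code. The sketch assumes the new disk is blank and may get a GPT label. `--replace` also needs hot-replace support in both mdadm and the kernel, which an older distro may lack. Review the output before running any of it.

```shell
# Sketch (dry run): print, don't run, the commands to partition a new disk
# and migrate one md member onto it. All device names are placeholders.
# Assumes a blank new disk (sgdisk writes a GPT label) and sdX-style names,
# where partition 1 is "${disk}1" (not true for e.g. nvme0n1p1).
plan_replace() {
    md=$1 old=$2 newdisk=$3 min_sectors=$4
    start=2048                                 # 1 MiB, a multiple of 4 KiB
    sectors=$(( (min_sectors + 7) / 8 * 8 ))   # round up to whole 4 KiB
    end=$(( start + sectors - 1 ))             # sgdisk's end sector is inclusive
    echo "sgdisk -n 1:$start:$end -t 1:FD00 $newdisk"
    echo "mdadm $md --add ${newdisk}1"
    echo "mdadm $md --replace $old --with ${newdisk}1"
}
```

For example, `plan_replace /dev/md127 /dev/sde1 /dev/sdg 1953520002` uses 976,760,001 KiB times 2 as the old partition size in 512-byte sectors. Here /dev/sdg is a made-up name for the new disk.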

If you've got a bit of wasted space, or whatever, who cares.
You can resize your file-systems to use all available space, if you wish
(can't remember how, whenever I've done that sort of stuff it hasn't
been hard).

But I'd certainly try and avoid those offset warnings - it smacks to me
of a mismatch between 512-byte blocks and 4K disk sectors, and I
wouldn't want the drive firmware messing about correcting mismatches
between OS 4K blocks and drive 4K blocks. I don't fully understand it
but I know there was a lot of grief with exactly this sort of thing in
the transition from 512-byte to 4K.
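To check for the mismatch Wol describes on a live system, read two values the kernel exposes in sysfs: each partition's start (in 512-byte sectors) and the disk's physical block size. This minimal sketch assumes the standard /sys/block/<disk>/<part>/start and /sys/block/<disk>/queue/physical_block_size attributes. SYSFS is overridable only for testing. An old DOS-style start at sector 63 on a 4K drive lands 3584 bytes past a boundary. A start at sector 2048 is aligned.

```shell
# Sketch: how far is a partition's start from a physical-sector boundary?
# Uses the kernel's sysfs attributes; SYSFS is overridable for testing.
SYSFS=${SYSFS:-/sys}

# $1 = start in 512-byte sectors, $2 = physical block size in bytes
misalign_bytes() {
    echo $(( $1 * 512 % $2 ))
}

# e.g. check_partition sdc sdc1
check_partition() {
    start=$(cat "$SYSFS/block/$1/$2/start")                 # 512-byte units
    phys=$(cat "$SYSFS/block/$1/queue/physical_block_size")
    off=$(misalign_bytes "$start" "$phys")
    if [ "$off" -eq 0 ]; then
        echo "$2: start sector $start is aligned to $phys bytes"
    else
        echo "$2: start sector $start is $off bytes past a $phys-byte boundary"
    fi
}
```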

Cheers,
Wol

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 19:17                                           ` Wols Lists
@ 2016-09-18 19:58                                             ` Benjammin2068
  2016-09-18 21:21                                               ` Wols Lists
  0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 19:58 UTC (permalink / raw)
  To: Wols Lists, Chris Murphy; +Cc: Linux-RAID



On 09/18/2016 02:17 PM, Wols Lists wrote:
>
> I'm sure you know this, but getting the physical/logical block size
> out-of-sync hurts disk performance. And copying a smaller partition into
> a larger allocated space is perfectly harmless. So...
>
> I'd simply use a modern partition manager (such as gdisk) to partition
> your new drives such that the new partitions are larger than the
> existing ones, and are properly aligned relative to the drive geometry.
>
> Then copy the old partitions across however you were planning - whether
> it's "mdadm --replace" or stopping the array and "dd old-device
> new-device" or whatever.
>
> If you've got a bit of wasted space, or whatever, who cares.
> You can resize your file-systems to use all available space, if you wish
> (can't remember how, whenever I've done that sort of stuff it hasn't
> been hard).
>
> But I'd certainly try and avoid those offset warnings - it smacks to me
> of a mismatch between 512-byte blocks and 4K disk sectors, and I
> wouldn't want the drive firmware messing about correcting mismatches
> between OS 4K blocks and drive 4K blocks. I don't fully understand it
> but I know there was a lot of grief with exactly this sort of thing in
> the transition from 512-byte to 4K.
>


Aha! That's what I needed to know.

I was wondering whether I could make a partition that's (I think) 3/4 of a 4K block larger (3072 bytes) than the original /dev/sdX1 partitions on the old HD103SJ drives.
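For reference, 3072 bytes is three quarters of a 4096-byte block, or six 512-byte logical sectors. Below is a minimal Python sketch of that sizing arithmetic, assuming 512-byte logical and 4096-byte physical sectors. The function name is hypothetical, and the example length (a 1 TB disk's 1,953,525,168 sectors minus a partition starting at sector 63) is an illustrative guess, not a figure from this thread.

```python
SECTOR = 512   # logical sector size in bytes (assumed)
PHYS = 4096    # physical sector size of the new drives in bytes (assumed)


def padded_partition_sectors(old_sectors: int, extra_bytes: int = 3072,
                             align_bytes: int = PHYS) -> int:
    """Length in 512-byte sectors that is at least old_sectors plus
    extra_bytes and is a whole number of align_bytes-sized blocks."""
    wanted = old_sectors * SECTOR + extra_bytes
    # Ceiling division, then scale back up to the alignment unit.
    aligned = -(-wanted // align_bytes) * align_bytes
    return aligned // SECTOR


print(3072 // SECTOR)                        # 6 extra logical sectors
# Hypothetical old partition: 1953525168 - 63 sectors (illustrative only).
print(padded_partition_sectors(1953525105))  # 1953525112
```

Rounding up to a multiple of 8 sectors keeps the partition length a whole number of 4 KiB sectors; the start sector still has to be aligned on its own.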

You've answered my question perfectly.

I can use sfdisk or parted to get that done...

Thanks a bunch!

 -Ben


* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 19:58                                             ` Benjammin2068
@ 2016-09-18 21:21                                               ` Wols Lists
  2016-09-18 21:29                                                 ` Benjammin2068
  0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-09-18 21:21 UTC (permalink / raw)
  To: Benjammin2068, Chris Murphy; +Cc: Linux-RAID

On 18/09/16 20:58, Benjammin2068 wrote:
> Aha! That's what I needed to know.
> 
> I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072bytes) than the original /dev/sdX1's on the old HD103SJs drives.

Good. It's a bit like string logic - if the buffer is bigger than the
string everything's fine, but if the string is bigger than the buffer,
well, ooopppssssss.

Basically, I think the root cause of all this mess is that drive
sectors/blocks/whatever used to be 512 bytes. So, obviously, it made
sense to have sector 0 be the boot sector, and your first partition
started in sector 1. If your drives are small, you don't want to waste
space.

Then the new drives came along with 4K sectors. Aarghh. Put an old-style
partition scheme on a new-style drive, and every OS 4K block would start
in the 2nd 512-byte block of a 4K drive sector. So every disk write from
the OS would force the drive to read two sectors from disk, overlay the
OS block over them, and write them both back. Not nice. And the latest
drives refuse to do that!
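The read-modify-write point can be checked with a little arithmetic. The Python sketch below (hypothetical helper names) counts how many 4096-byte physical sectors a single 4 KiB filesystem block touches for a given partition start, assuming 512-byte logical sectors. Traditional DOS-style partition tables usually put the first partition at sector 63 rather than sector 1; both starts are misaligned, since neither is a multiple of 8.

```python
LOGICAL = 512   # bytes per logical sector (assumed)
PHYS = 4096     # bytes per physical sector on a 4K drive (assumed)


def physical_sectors_touched(start_sector: int, block_index: int,
                             block_size: int = 4096) -> int:
    """Number of physical sectors covered by one filesystem block write
    in a partition that begins at logical sector start_sector."""
    first_byte = start_sector * LOGICAL + block_index * block_size
    last_byte = first_byte + block_size - 1
    return last_byte // PHYS - first_byte // PHYS + 1


def needs_read_modify_write(start_sector: int) -> bool:
    """True if 4 KiB blocks in the partition straddle physical sectors."""
    return (start_sector * LOGICAL) % PHYS != 0


for start in (1, 63, 2048):
    print(start, physical_sectors_touched(start, 0),
          needs_read_modify_write(start))
# 1 2 True
# 63 2 True
# 2048 1 False
```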

Which is one of the reasons why modern partitioning programs start the
first partition - iirc - at the start of the 3rd megabyte of the disk.
Leaving plenty of space for the boot/startup code.
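The alignment rule itself is just rounding up. Current fdisk, gdisk and parted typically default to starting the first partition at sector 2048 (1 MiB), and a 1 MiB boundary is automatically a 4 KiB boundary. A minimal Python sketch, with a hypothetical function name, assuming 512-byte logical sectors:

```python
LOGICAL = 512                              # bytes per logical sector (assumed)
MIB_SECTORS = (1024 * 1024) // LOGICAL     # 2048 logical sectors per MiB


def first_aligned_start(min_sector: int,
                        align_sectors: int = MIB_SECTORS) -> int:
    """Round a candidate start sector up to the next alignment boundary."""
    return -(-min_sector // align_sectors) * align_sectors


print(first_aligned_start(34))   # typical first usable GPT sector -> 2048
print(first_aligned_start(63))   # old DOS-style start -> 2048
print(first_aligned_start(2048) * LOGICAL % 4096)  # 0: also 4 KiB aligned
```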

So it's not worth replicating your old partitions directly on the new
drives. Just make sure the new drives are the same size as (or a bit
larger than) the old ones, and move the data across. Bit like copying a
string :-)

Cheers,
Wol

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 21:21                                               ` Wols Lists
@ 2016-09-18 21:29                                                 ` Benjammin2068
  2016-09-19  6:25                                                   ` Wols Lists
  0 siblings, 1 reply; 25+ messages in thread
From: Benjammin2068 @ 2016-09-18 21:29 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux-RAID

On 09/18/2016 04:21 PM, Wols Lists wrote:
> On 18/09/16 20:58, Benjammin2068 wrote:
>> Aha! That's what I needed to know.
>>
>> I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072bytes) than the original /dev/sdX1's on the old HD103SJs drives.
> Good. It's a bit like string logic - if the buffer is bigger than the
> string everything's fine, but if the string is bigger than the buffer,
> well, ooopppssssss.
>
> Basically, I think the root cause of all this mess is that drive
> sectors/blocks/whatever used to be 512 bytes. So, obviously, it made
> sense to have sector 0 be the boot sector, and your first partition
> started in sector 1. If your drives are small, you don't want to waste
> space.
>
> Then the new drives came along with 4K sectors. Aarghh. Put an old-style
> partition scheme on a new-style drive, and every OS 4K block would start
> in the 2nd 512-byte block of a 4K drive sector. So every disk write from
> the OS would force the drive to read two sectors from disk, overlay the
> OS block over them, and write them both back. Not nice. And the latest
> drives refuse to do that!

hah.. yeah.. I remember when it happened (and why). (I still have a Seagate ST-251 40MB MFM HD sitting in a box with my Atari software on it. Right now, it's Schrödinger's drive: it's still working as long as I don't pull it out and test it. LoL....)

Drive companies claimed (and maybe rightfully so) that the 512B sector, with all the seeks required to read data, was wasteful (considering the head movement needed for scattered files and people who didn't defrag their drives).

Also, the number of sectors that could be addressed on a drive was becoming an issue with the sizes of drives coming out.

2^32 sectors @ 512 bytes = 2,199,023,255,552 bytes <-- doesn't that number ring a bell? ;)

So they moved to bigger sector sizes.
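That number is the 2 TiB ceiling of the MBR partition table, whose start and length fields are 32-bit sector counts. A minimal Python sketch of the arithmetic follows (hypothetical function name). Note that the limit is tied to the logical sector size: 512e drives with 4K physical sectors still present 512-byte logical sectors, so in practice it is GPT's 64-bit LBAs that get past it.

```python
def max_addressable_bytes(address_bits: int, sector_bytes: int) -> int:
    """Largest capacity reachable with address_bits-wide sector numbers."""
    return (2 ** address_bits) * sector_bytes


TIB = 2 ** 40
mbr_512 = max_addressable_bytes(32, 512)    # MBR fields with 512-byte sectors
mbr_4k = max_addressable_bytes(32, 4096)    # same fields with 4096-byte sectors

print(f"{mbr_512:,} bytes = {mbr_512 // TIB} TiB")  # 2,199,023,255,552 bytes = 2 TiB
print(f"{mbr_4k:,} bytes = {mbr_4k // TIB} TiB")    # 17,592,186,044,416 bytes = 16 TiB
```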

> Which is one of the reasons why modern partitioning programs start the
> first partition - iirc - at the start of the 3rd megabyte of the disk.
> Leaving plenty of space for the boot/startup code.

Yup. Now with all the bootloaders...

>
> So it's not worth replicating your old partitions directly on the new
> drives. Just make sure the new drives are the same size (or a bit
> larger) than the old ones, and move the data across. Bit like copying a
> string :-)

Sounds good. I was more worried about the specifics of the partition and how mdadm sees a larger-sized partition -- NOT just a larger drive (on which a same-size partition could be built).
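On the mdadm side, a larger partition is harmless: md only uses as much of each member as the smallest one provides, so the extra space sits unused until every member is larger and the array is grown (mdadm --grow --size=max). Below is a simplified Python sketch of the resulting capacity. It ignores superblock, data offset and chunk rounding. The function name is hypothetical, and the 976758784 KiB member size is the md127 figure from the /proc/mdstat output earlier in the thread.

```python
def md_array_kib(member_kib: list[int], level: int) -> int:
    """Simplified usable size (KiB) of an md RAID5/RAID6 array.

    Ignores superblock, data offset and chunk rounding."""
    parity_members = {5: 1, 6: 2}[level]
    # Every member contributes only as much as the smallest one.
    per_member = min(member_kib)
    return (len(member_kib) - parity_members) * per_member


MEMBER = 976758784  # per-member KiB from the md127 /proc/mdstat output
print(md_array_kib([MEMBER] * 4, 5))                 # 2930276352
print(md_array_kib([MEMBER] * 3 + [MEMBER + 3], 5))  # still 2930276352
print(md_array_kib([MEMBER] * 5, 6))                 # 2930276352
```

One member 3 KiB (3072 bytes) larger changes nothing, and a five-member RAID6 built from the same member size gives the same usable space as the four-member RAID5.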

Thanks again,

 -Ben



* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-18 21:29                                                 ` Benjammin2068
@ 2016-09-19  6:25                                                   ` Wols Lists
  2016-09-19 16:17                                                     ` Benjammin2068
  0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2016-09-19  6:25 UTC (permalink / raw)
  To: Benjammin2068; +Cc: Linux-RAID

On 18/09/16 22:29, Benjammin2068 wrote:
> Sounds good. I was more worried about the specifics of the partition and how mdadm sees a larger sized partition -- NOT just a larger sized drive. (on which a same size partition could be built)

Yeah. I've done that a couple of times. Create the new partition larger
than the old one. dd the old partition across. Use whatever
filesystem-specific tool there was to grow the file system into all
available space on the partition.
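The copy step described here (dd the old partition onto a larger new one) boils down to a sequential copy plus one guard: the destination must be at least as large as the source. A minimal Python sketch with hypothetical function names; it works on regular files, or on block devices given sufficient privileges. It does not grow the filesystem afterwards, which still needs the filesystem's own tool (resize2fs for ext2/3/4, for example).

```python
import os


def device_size(path: str) -> int:
    """Size in bytes of a regular file or block device (seek to the end)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def copy_partition(src: str, dst: str, chunk: int = 1024 * 1024) -> int:
    """Copy all of src onto the start of dst; refuse if dst is smaller."""
    src_size, dst_size = device_size(src), device_size(dst)
    if dst_size < src_size:
        raise ValueError(
            f"destination ({dst_size} B) is smaller than source ({src_size} B)")
    copied = 0
    # "r+b" writes in place without truncating the (larger) destination.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
            copied += len(buf)
        fout.flush()
        os.fsync(fout.fileno())   # make sure the data actually hit the disk
    return copied
```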

Oh yes - and be damn careful with FAT :-) I can't remember the details,
but when there was a problem it used to prefer a faulty filesystem size
to the partition size, and would gaily sail off the end of the
partition, trashing the next partition. My "record to USB" TV seems
rather prone to this :-(
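No specific field is named here, but one cheap sanity check is to compare the size a FAT boot sector claims with the size of the partition holding it. A minimal Python sketch, assuming the standard BPB layout (bytes per sector at offset 11, 16-bit total sectors at offset 19, 32-bit total sectors at offset 32); the function names are hypothetical.

```python
import struct


def fat_fs_bytes(boot_sector: bytes) -> int:
    """Filesystem size in bytes as recorded in a FAT boot sector's BPB."""
    bytes_per_sector = struct.unpack_from("<H", boot_sector, 11)[0]
    total16 = struct.unpack_from("<H", boot_sector, 19)[0]
    total32 = struct.unpack_from("<I", boot_sector, 32)[0]
    # The 16-bit count is zero when the 32-bit field holds the real value.
    total = total16 if total16 != 0 else total32
    return bytes_per_sector * total


def fat_fits_partition(boot_sector: bytes, partition_bytes: int) -> bool:
    """False if the filesystem claims to extend past the partition end."""
    return fat_fs_bytes(boot_sector) <= partition_bytes
```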

Cheers,
Wol

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
  2016-09-19  6:25                                                   ` Wols Lists
@ 2016-09-19 16:17                                                     ` Benjammin2068
  0 siblings, 0 replies; 25+ messages in thread
From: Benjammin2068 @ 2016-09-19 16:17 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux-RAID

On 09/19/2016 01:25 AM, Wols Lists wrote:
>
> Yeah. I've done that a couple of times. Create the new partition larger
> than the old one. dd the old partition across. Use whatever
> filesystem-specific tool there was to grow the file system into all
> available space on the partition.
>
> Oh yes - and be damn careful with FAT :-) I can't remember the details,
> but when there was a problem it used to prefer a faulty filesystem size
> to the partition size, and would gaily sail off the end of the
> partition, trashing the next partition. My "record to USB" TV seems
> rather prone to this :-(
>

These drives are wholly allocated to nothing but the RAID array... so I only have to make 1 partition and it's more or less the whole disk. :)

I've got the new WDs online and am growing that RAID5 to a RAID6 as we speak.

(two thumbs up)

I have (2) HD103SJ drives left in the array... one was installed when the array was built and has about 44500 hours on it... while the other only has about 38400 hours on it.
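For scale, those power-on hour counts (SMART attribute 9, Power_On_Hours, as smartctl reports it) convert to years as follows. This is a trivial Python sketch using 365.25-day years; the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours in an average year


def hours_to_years(hours: float) -> float:
    """Convert a power-on hour count to years."""
    return hours / HOURS_PER_YEAR


for hours in (44500, 38400):
    print(hours, round(hours_to_years(hours), 1))
# 44500 5.1
# 38400 4.4
```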

smartctl is keeping an eye on them for me. ;)

The rest of the drives are relatively new (especially after the episode of drive failures a couple weeks ago).

Thanks again for the help everyone!

 -Ben

end of thread, other threads:[~2016-09-19 16:17 UTC | newest]

Thread overview: 25+ messages
2016-08-22 21:51 Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive) Ben Kamen
2016-08-22 23:06 ` Adam Goryachev
2016-08-23 11:36   ` Wols Lists
2016-08-23 15:44     ` Ben
     [not found]   ` <CADDTLRBf9NPO6OuF4a3b+xffZgeZRqHRG+pJdPmbc9-Jat0HVQ@mail.gmail.com>
     [not found]     ` <d6d3fe0d-3f9f-985f-9bfb-051428cf221b@websitemanagers.com.au>
     [not found]       ` <57BBDA5B.3020706@gmail.com>
     [not found]         ` <57BBDC15.5030301@gmail.com>
     [not found]           ` <b8c6a380-7e6a-fda9-5834-b85271b26892@websitemanagers.com.au>
     [not found]             ` <57BC61F7.8070102@gmail.com>
     [not found]               ` <aca4e83f-9a3f-c200-7c16-3b5d9df52c1e@websitemanagers.com.au>
     [not found]                 ` <57BE450B.4030700@gmail.com>
     [not found]                   ` <56e86db5-456d-e9c1-339d-ba8903fe5dde@websitemanagers.com.au>
     [not found]                     ` <57BE52BC.6040908@gmail.com>
     [not found]                       ` <933228e0-bce4-ffad-f48d-034bf89bc07f@websitemanagers.com.au>
2016-08-26  1:20                         ` Ben
2016-08-26  2:22                           ` Phil Turmel
2016-08-26  2:54                             ` Benjammin2068
2016-08-26 12:38                               ` Phil Turmel
2016-08-26 18:07                           ` Wols Lists
2016-08-28 18:29                             ` Benjammin2068
2016-08-28 19:20                               ` Anthony Youngman
2016-08-29  1:23                                 ` Benjammin2068
2016-08-28 23:54                               ` Adam Goryachev
2016-08-29  1:25                                 ` Benjammin2068
2016-08-29 11:19                                   ` Wols Lists
2016-09-18 17:13                                     ` Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive)) Benjammin2068
2016-09-18 17:50                                       ` Chris Murphy
2016-09-18 18:41                                         ` Benjammin2068
2016-09-18 19:17                                           ` Wols Lists
2016-09-18 19:58                                             ` Benjammin2068
2016-09-18 21:21                                               ` Wols Lists
2016-09-18 21:29                                                 ` Benjammin2068
2016-09-19  6:25                                                   ` Wols Lists
2016-09-19 16:17                                                     ` Benjammin2068
2016-09-18 18:08                                       ` Benjammin2068