* hardware recovery and RAID5 services
@ 2022-01-21 16:48 David T-G
  2022-01-21 19:31 ` Wols Lists
  0 siblings, 1 reply; 17+ messages in thread
From: David T-G @ 2022-01-21 16:48 UTC (permalink / raw)
  To: Linux RAID

Hi, all --

After the holidays and the hand surgery and the other "important"
projects, I am finally able to circle back to my 4 x 4T RAID5 disk
problem.  I've been promised that I can get 30 - 60 minutes each night
to do more poking and send responses ... but that was a week ago and here
is my first chance to sit down, so I don't know how much I believe it!

Has anyone worked with any recovery companies in the US?  One of the 4T
devices failed, possibly just because of motor issues, and one is
throwing read errors, while two are just fine.  This would probably be
easy for the right folks to fix, of course at some cost; the question is
are there any folks that you might recommend and how much is that cost.

I'd appreciate any input anyone can provide ...  Meanwhile, I'm also
refreshing myself and going through all of the steps to be able to ask
good questions and present necessary information in case I do indeed get
to scrape my knuckles against this stuff enough to recover the data.


Thanks again & HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* Re: hardware recovery and RAID5 services
  2022-01-21 16:48 hardware recovery and RAID5 services David T-G
@ 2022-01-21 19:31 ` Wols Lists
  2022-01-21 19:34   ` Wols Lists
  2022-01-29 15:21   ` David T-G
  0 siblings, 2 replies; 17+ messages in thread
From: Wols Lists @ 2022-01-21 19:31 UTC (permalink / raw)
  To: David T-G, Linux RAID

On 21/01/2022 16:48, David T-G wrote:
> Has anyone worked with any recovery companies in the US?  One of the 4T
> devices failed, possibly just because of motor issues, and one is
> throwing read errors, while two are just fine.  This would probably be
> easy for the right folks to fix, of course at some cost; the question is
> are there any folks that you might recommend and how much is that cost.

I've heard assorted stories about sticking drives in the freezer to help 
them recover, and have personal experience of being given a drive that 
has "failed" and I've managed to recover. Firstly, do you have a new 4TB 
drive to recover to? My worry is trying to copy the drive with read 
errors may lead to even more trouble. I'd try and recover it with 
ddrescue, and see how much it can get back.

Secondly, I'm sure I've dealt with these people in the past, although I 
can't vouch for them ...

https://www.vogon-computer-evidence.com/our-story/

I didn't use them for recovering damaged kit, we just had a bunch of 
9-track backups, but no 9-track drive, so they dumped them to CD for us. 
From a business point of view it wasn't expensive ... see if they've got an 
operation near you. Describe your problem in as much detail as you can, 
and see if they'll give you an estimate ...

Cheers,
Wol


* Re: hardware recovery and RAID5 services
  2022-01-21 19:31 ` Wols Lists
@ 2022-01-21 19:34   ` Wols Lists
  2022-01-21 19:47     ` Wols Lists
  2022-01-29 15:21   ` David T-G
  1 sibling, 1 reply; 17+ messages in thread
From: Wols Lists @ 2022-01-21 19:34 UTC (permalink / raw)
  To: David T-G, Linux RAID

On 21/01/2022 19:31, Wols Lists wrote:
> Secondly, I'm sure I've dealt with these people in the past, although I 
> can't vouch for them ...
> 
> https://www.vogon-computer-evidence.com/our-story/

OUCH! Having found that page (which is pretty much as I remember the 
company), the rest of the web site looks like a cobweb site. So I don't 
know what's happened, but it doesn't look promising ...

Cheers,
Wol


* Re: hardware recovery and RAID5 services
  2022-01-21 19:34   ` Wols Lists
@ 2022-01-21 19:47     ` Wols Lists
  2022-01-22 14:23       ` Roger Heflin
  0 siblings, 1 reply; 17+ messages in thread
From: Wols Lists @ 2022-01-21 19:47 UTC (permalink / raw)
  To: David T-G, Linux RAID

On 21/01/2022 19:34, Wols Lists wrote:
> On 21/01/2022 19:31, Wols Lists wrote:
>> Secondly, I'm sure I've dealt with these people in the past, although 
>> I can't vouch for them ...
>>
>> https://www.vogon-computer-evidence.com/our-story/
> 
> OUCH! Having found that page (which is pretty much as I remember the 
> company), the rest of the web site looks like a cobweb site. So I don't 
> know what's happened, but it doesn't look promising ...
> 
Following up further: yes, it certainly looks like a cobweb site. The 
company was taken over by Ontrack - I've seen a couple of 
recommendations for them. But I have to reiterate that I can't vouch for 
them, just that they are a big professional company that does that sort 
of thing.

Cheers,
Wol


* Re: hardware recovery and RAID5 services
  2022-01-21 19:47     ` Wols Lists
@ 2022-01-22 14:23       ` Roger Heflin
  2022-01-22 22:23         ` Phil Turmel
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Heflin @ 2022-01-22 14:23 UTC (permalink / raw)
  To: Wols Lists; +Cc: David T-G, Linux RAID

From the recovery I know about in the last 3 years, it was several
thousand US$ per TB for the recovery.


* Re: hardware recovery and RAID5 services
  2022-01-22 14:23       ` Roger Heflin
@ 2022-01-22 22:23         ` Phil Turmel
  2022-01-23  0:20           ` anthony
                             ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Phil Turmel @ 2022-01-22 22:23 UTC (permalink / raw)
  To: Roger Heflin, Wols Lists; +Cc: David T-G, Linux RAID

Hi David, et al,

The principal of "My Hard Drive Died" is Scott Moulton, a highly 
respected member of the forensics and white-hat scene here in the 
Atlanta Metro Area.

https://myharddrivedied.com/

That said, I highly recommend copying the disk showing read errors onto 
another disk, keeping the log of sectors replaced by zeros. Then 
perform a file-by-file backup from the degraded array, using the copy 
instead of the troubled drive.

*After* you recover what you can, examine the replaced sector list and 
back-calculate what files, if any, were affected.  This will give you a 
limited and less expensive task to pay experts to solve.  Or carry on 
with whatever you ended up with.  I think your odds are good.
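
A minimal sketch of reading that log, assuming the copy was made with GNU
ddrescue and its mapfile was kept.  The function names and the sample
mapfile below are made up for illustration; the only thing relied on is
the mapfile layout (comment lines starting with "#", one current-position
line, then "pos size status" lines where "+" means the block was read).

```python
SAMPLE_MAPFILE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00101000     +               1
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00001000  -
0x00101000  0x3FF00000  +
"""


def parse_mapfile(text):
    """Return (start_byte, size_bytes, status) for every block not marked '+'."""
    unread = []
    seen_position_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if not seen_position_line:
            # The first non-comment line is the current-position line, not a block.
            seen_position_line = True
            continue
        pos, size, status = int(fields[0], 0), int(fields[1], 0), fields[2]
        if status != "+":
            unread.append((pos, size, status))
    return unread


def to_sectors(blocks, sector_size=512):
    """Convert byte ranges to (first_sector, sector_count) pairs."""
    result = []
    for pos, size, _status in blocks:
        first = pos // sector_size
        last = (pos + size - 1) // sector_size
        result.append((first, last - first + 1))
    return result
```

With the sample above, `to_sectors(parse_mapfile(SAMPLE_MAPFILE))` gives
`[(2048, 8)]`, i.e. eight 512-byte sectors starting at sector 2048 of the
copied device were never read and are the ones to trace back to files.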

And yes, the pros are not cheap.

Phil


* Re: hardware recovery and RAID5 services
  2022-01-22 22:23         ` Phil Turmel
@ 2022-01-23  0:20           ` anthony
  2022-01-29 15:25           ` David T-G
  2022-01-29 15:36           ` Wols Lists
  2 siblings, 0 replies; 17+ messages in thread
From: anthony @ 2022-01-23  0:20 UTC (permalink / raw)
  To: Phil Turmel, Roger Heflin, Wols Lists; +Cc: David T-G, Linux RAID

Looks like one of the first things I need to do when my raid testbed is 
up and running, is to set up a disk drive with dm-integrity --no-format, 
and see if I can dd successfully to it.

IFF that works, you'll then be able to just add it straight back in to 
the array, and running an integrity check will trigger a read error on 
anything that couldn't be recovered.

But there's no way I could recommend that at the moment, seeing as I 
have no idea whether or not it will actually work, even though I think 
it should.

Cheers,
Wol


* Re: hardware recovery and RAID5 services
  2022-01-21 19:31 ` Wols Lists
  2022-01-21 19:34   ` Wols Lists
@ 2022-01-29 15:21   ` David T-G
  1 sibling, 0 replies; 17+ messages in thread
From: David T-G @ 2022-01-29 15:21 UTC (permalink / raw)
  To: Linux RAID

Wol, et al --

...and then Wols Lists said...
% 
% On 21/01/2022 16:48, David T-G wrote:
% > Has anyone worked with any recovery companies in the US?  One of the 4T
...
% 
% Secondly, I'm sure I've dealt with these people in the past, although I
% can't vouch for them ...
% 
% https://www.vogon-computer-evidence.com/our-story/
[snip]

Even though they're dead now, the name is too good to pass up :-)  Thanks
for the link.


HANW

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* Re: hardware recovery and RAID5 services
  2022-01-22 22:23         ` Phil Turmel
  2022-01-23  0:20           ` anthony
@ 2022-01-29 15:25           ` David T-G
  2022-01-29 15:36           ` Wols Lists
  2 siblings, 0 replies; 17+ messages in thread
From: David T-G @ 2022-01-29 15:25 UTC (permalink / raw)
  To: Linux RAID

Phil, et al --

...and then Phil Turmel said...
% 
% Hi David, et al,
% 
% The principal of "My Hard Drive Died" is Scott Moulton, a highly respected
% member of the forensics and white-hat scene here in the Atlanta Metro Area.
% 
% https://myharddrivedied.com/

Great to know!  Thanks so much.


% 
% That said, I highly recommend copying the disk showing read errors onto
% another disk, keeping the log of sectors replaced by zeros. Then performing
% a file by file backup from the degraded array, using the copy instead of the
% troubled drive.
[snip]

I think I'm about there now.  I believe I have a "virtual array" 

  diskfarm:/mnt/10Traid50md/tmp # head -4 /proc/mdstat
  Personalities : [raid6] [raid5] [raid4] [raid0] 
  md0 : active (read-only) raid5 loop12[0] loop11[3] loop10[4]
        11720265216 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]
  
  diskfarm:/mnt/10Traid50md/tmp # mdadm -E /dev/md0
  /dev/md0:
     MBR Magic : aa55
  Partition[0] :   4294967295 sectors at            1 (type ee)

which presents me a partition with an ailing XFS filesystem on it.  I'm
hopeful that in my next round of an hour or two I can dig into superblock
adventures.
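
For the later step of working out which array sectors a bad member sector
belongs to, here is a rough sketch based on the geometry in the mdstat
output above: 4 devices, 512k chunk, algorithm 2 (left-symmetric).  It
assumes 512-byte sectors, that `role` is the member's slot in the array
(0..3), and that `member_sector` is counted from the start of the member's
data area, so the superblock data offset has already been subtracted.  The
function name is made up for illustration, and the GPT partition offset
inside md0 still has to be handled separately.

```python
CHUNK_SECTORS = 1024  # 512 KiB chunk / 512-byte sectors (assumed sector size)
RAID_DISKS = 4


def member_to_array(role, member_sector,
                    raid_disks=RAID_DISKS, chunk_sectors=CHUNK_SECTORS):
    """Map a sector on one RAID5 member to a sector in the assembled array.

    Returns None when that sector holds parity rather than data.
    Layout is left-symmetric (md "algorithm 2").
    """
    stripe, offset = divmod(member_sector, chunk_sectors)
    data_disks = raid_disks - 1
    # Parity starts on the last disk and moves one disk left per stripe.
    parity_role = data_disks - (stripe % raid_disks)
    if role == parity_role:
        return None
    # Data chunk 0 of a stripe sits on the disk just after parity, wrapping.
    data_index = (role - parity_role - 1) % raid_disks
    array_chunk = stripe * data_disks + data_index
    return array_chunk * chunk_sectors + offset
```

For example, `member_to_array(3, 1029)` returns 3077: sector 1029 on role 3
is in stripe 1, where parity sits on role 2, so role 3 carries the first
data chunk of that stripe (array chunk 3) at offset 5.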


Thanks to all for all of the input!

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* Re: hardware recovery and RAID5 services
  2022-01-22 22:23         ` Phil Turmel
  2022-01-23  0:20           ` anthony
  2022-01-29 15:25           ` David T-G
@ 2022-01-29 15:36           ` Wols Lists
  2022-01-31 15:39             ` Nix
  2 siblings, 1 reply; 17+ messages in thread
From: Wols Lists @ 2022-01-29 15:36 UTC (permalink / raw)
  To: Phil Turmel, Roger Heflin; +Cc: David T-G, Linux RAID

On 22/01/2022 22:23, Phil Turmel wrote:
> That said, I highly recommend copying the disk showing read errors onto 
> another disk, keeping the log of sectors replaced by zeros. Then 
> performing a file by file backup from the degraded array, using the copy 
> instead of the troubled drive.

I believe there is also a way of injecting a hardware error onto a 
drive. Unless you can take a backup of the backup :-) I wouldn't 
recommend it at the moment, but there's some ATA command or whatever 
that tells the drive to flag a sector as bad, and return a read error 
until it's over-written.

Obviously, doing that on the sectors that weren't rescued, and then 
doing a scrub, is going to recover your data if it's both possible and 
done right :-)
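
To illustrate why a sector that returns a read error can be rebuilt while
a silently zero-filled one cannot, here is a toy sketch of RAID5 parity
arithmetic.  It is plain XOR over byte strings, not md's actual code, and
the helper name is made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data chunks and their parity, as in one stripe of a 4-disk RAID5.
d0, d1, d2 = b"\x11\x22", b"\x33\x44", b"\x55\x66"
parity = xor_blocks([d0, d1, d2])

# If d1 is known to be unreadable, XOR of the survivors gives it back.
rebuilt_d1 = xor_blocks([d0, d2, parity])

# If d1 was silently replaced by zeros, the stripe is merely inconsistent
# and nothing says which member is the wrong one.
zeroed_d1 = b"\x00\x00"
stripe_consistent = xor_blocks([d0, zeroed_d1, d2]) == parity
```

Here `rebuilt_d1` equals the original `d1`, while `stripe_consistent` is
False for the zero-filled case.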

Cheers,
Wol


* Re: hardware recovery and RAID5 services
  2022-01-29 15:36           ` Wols Lists
@ 2022-01-31 15:39             ` Nix
  2022-01-31 18:59               ` Roger Heflin
  0 siblings, 1 reply; 17+ messages in thread
From: Nix @ 2022-01-31 15:39 UTC (permalink / raw)
  To: Wols Lists; +Cc: Phil Turmel, Roger Heflin, David T-G, Linux RAID

On 29 Jan 2022, Wols Lists told this:

> I believe there is also a way of injecting a hardware error onto a
> drive. Unless you can take a backup of the backup :-) I wouldn't
> recommend it at the moment, but there's some ATA command or whatever
> that tells the drive to flag a sector as bad, and return a read error
> until it's over-written.

See hdparm --make-bad-sector. The manpage says "EXCEPTIONALLY DANGEROUS.
DO NOT USE THIS OPTION!!". It is not lying. :)

(There is also --write-sector, which is merely VERY DANGEROUS, but can be
used to force rewrites of bad sectors. Make sure you get the sector
number right! Needless to say, if you don't, it's too late, and there's
no real way to test in advance...)


* Re: hardware recovery and RAID5 services
  2022-01-31 15:39             ` Nix
@ 2022-01-31 18:59               ` Roger Heflin
  2022-01-31 19:07                 ` Geoff Back
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Heflin @ 2022-01-31 18:59 UTC (permalink / raw)
  To: Nix; +Cc: Wols Lists, Phil Turmel, David T-G, Linux RAID

And do not do --write-sector on a disk that is in use in RAID, otherwise
that sector's data is gone.

I will completely remove a disk/partition and do --write-sector
against it and then do a --add (don't do a re-add).    In general
though I have not had a lot of luck with the write-sector fixing
and/or forcing a reallocate even when the sector is clearly bad.  I
have to conclude (based on both WD and seagate not reallocating
sectors that reliably fail rereads in <30-seconds after just being
re-written) that pretty much everyone's disk firmware must suck.

Would --make-bad-sector work to force a reallocate?


* Re: hardware recovery and RAID5 services
  2022-01-31 18:59               ` Roger Heflin
@ 2022-01-31 19:07                 ` Geoff Back
  2022-01-31 19:21                   ` Phil Turmel
  0 siblings, 1 reply; 17+ messages in thread
From: Geoff Back @ 2022-01-31 19:07 UTC (permalink / raw)
  To: Roger Heflin, Nix; +Cc: Wols Lists, Phil Turmel, David T-G, Linux RAID

On 31/01/2022 18:59, Roger Heflin wrote:
> And do not do write-sector on a disk that is in use in RAID, otherwise
> that sectors data is gone.
>
> I will completely remove a disk/partition and do --write-sectors
> against it and then do a --add (don't do a re-add).    In general
> though I have not had a lot of luck with the write-sector fixing
> and/or forcing a reallocate even when the sector is clearly bad.  I
> have to conclude (based on both WD and seagate not reallocating
> sectors that reliably fail rereads in <30-seconds after just being
> re-written) that pretty much everyone's disk firmware must suck.
>
> Would --make-bad-sector work to force a reallocate?
>
> On Mon, Jan 31, 2022 at 9:40 AM Nix <nix@esperi.org.uk> wrote:
>> On 29 Jan 2022, Wols Lists told this:
>>
>>> I believe there is also a way of injecting a hardware error onto a
>>> drive. Unless you can take a backup of the backup :-) I wouldn't
>>> recommend it at the moment, but there's some ATA command or whatever
>>> that tells the drive to flag a sector as bad, and return a read error
>>> until it's over-written.
>> See hdparm --make-bad-sector. The manpage says "EXCEPTIONALLY DANGEROUS.
>> DO NOT USE THIS OPTION!!". It is not lying. :)
>>
>> (This is also --write-sector, which is merely VERY DANGEROUS, but can be
>> used to force rewrites of bad sectors. Make sure you get the sector
>> number right! Needless to say, if you don't, it's too late, and there's
>> no real way to test in advance...)

If a disk has one or more bad sectors, surely the only logical action is
to schedule it for replacement as soon as a new one can be obtained; and
if it's still in warranty, send it back.  If the data is valuable enough
to warrant use of RAID (along with, presumably, appropriate backups)
surely it is too valuable to risk continuing to use a known faulty disk?

In which case, I would suggest that dangerous experiments that try to
force the disk to reallocate the block are arguably pointless.

Just my opinion, but one that has served me well so far.

Regards,

Geoff.

-- 
Geoff Back
What if we're all just characters in someone's nightmares?



* Re: hardware recovery and RAID5 services
  2022-01-31 19:07                 ` Geoff Back
@ 2022-01-31 19:21                   ` Phil Turmel
  2022-01-31 19:46                     ` Roger Heflin
                                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Phil Turmel @ 2022-01-31 19:21 UTC (permalink / raw)
  To: Geoff Back, Roger Heflin, Nix; +Cc: Wols Lists, David T-G, Linux RAID

On 1/31/22 14:07, Geoff Back wrote:

> 
> If a disk has one or more bad sectors, surely the only logical action is
> to schedule it for replacement as soon as a new one can be obtained; and
> if it's still in warranty, send it back.  If the data is valuable enough
> to warrant use of RAID (along with, presumably, appropriate backups)
> surely it is too valuable to risk continuing to use a known faulty disk?
> 
> In which case, I would suggest that dangerous experiments that try to
> force the disk to reallocate the block are arguably pointless.
> 
> Just my opinion, but one that has served me well so far.
> 
> Regards,
> 
> Geoff.

I would be surprised if you got a warranty replacement just for a few 
re-allocated sectors.  The sheer quantity of sectors in modern drives 
and the tiny magnetic domains involved mean **no** drive is error-free. 
Most defects are simply identified and mapped out before shipping. 
Reallocations cover the marginal cases.

I replace drives when re-allocations hit double digits, though I've had 
to run a few corner cases well past that point.
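
A small sketch of that rule of thumb, assuming the usual `smartctl -A`
attribute table where the raw value is the tenth column.  The function
names and the sample line are hypothetical, and some drives report raw
values in other formats that this would not parse.

```python
SAMPLE_ATTRIBUTES = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       24731
"""


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value, or None if not present."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def should_replace(count, threshold=10):
    """Apply the double-digits rule: replace once count reaches threshold."""
    return count is not None and count >= threshold
```

With the sample table, `reallocated_sectors(SAMPLE_ATTRIBUTES)` is 12, so
`should_replace` says yes; the threshold is a parameter because other
people draw the line elsewhere.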

Phil


* Re: hardware recovery and RAID5 services
  2022-01-31 19:21                   ` Phil Turmel
@ 2022-01-31 19:46                     ` Roger Heflin
  2022-01-31 20:04                     ` Geoff Back
  2022-01-31 20:53                     ` Wols Lists
  2 siblings, 0 replies; 17+ messages in thread
From: Roger Heflin @ 2022-01-31 19:46 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Geoff Back, Nix, Wols Lists, David T-G, Linux RAID

I have been warranty-replacing them when the sectors refuse to
reallocate, and/or the disks continue to hit the ERC/TLER timeout all
of the time with bad sectors growing rapidly with no end in sight.

If one is using raid6 and given the low rate of bad sectors, then it
is pretty unlikely that there will be data loss.  If one was using
raid5 things would be more worrisome.



* Re: hardware recovery and RAID5 services
  2022-01-31 19:21                   ` Phil Turmel
  2022-01-31 19:46                     ` Roger Heflin
@ 2022-01-31 20:04                     ` Geoff Back
  2022-01-31 20:53                     ` Wols Lists
  2 siblings, 0 replies; 17+ messages in thread
From: Geoff Back @ 2022-01-31 20:04 UTC (permalink / raw)
  To: Phil Turmel, Roger Heflin, Nix; +Cc: Wols Lists, David T-G, Linux RAID



On 31/01/2022 19:21, Phil Turmel wrote:
> On 1/31/22 14:07, Geoff Back wrote:
>
>> If a disk has one or more bad sectors, surely the only logical action is
>> to schedule it for replacement as soon as a new one can be obtained; and
>> if it's still in warranty, send it back.  If the data is valuable enough
>> to warrant use of RAID (along with, presumably, appropriate backups)
>> surely it is too valuable to risk continuing to use a known faulty disk?
>>
>> In which case, I would suggest that dangerous experiments that try to
>> force the disk to reallocate the block are arguably pointless.
>>
>> Just my opinion, but one that has served me well so far.
>>
>> Regards,
>>
>> Geoff.
> I would be surprised if you got warranty replacement just for a few 
> re-allocated sectors.  The sheer quantity of sectors in modern drives 
> and the tiny magnetic domains involved means **no** drive is error-free. 
>   Just most defects are identified and mapped out before shipping. 
> Reallocations cover the marginal cases.
>
> I replace drives when re-allocations hit double digits, though I've had 
> to run a few corners cases well past that point.
>
> Phil

I've never had a problem with any manufacturer replacing a drive that
reallocates even one sector within 12 months.  I just send them a
"smartctl -x" log.
I can't remember the last time I had a drive do its first sector
reallocate after 12 months but before end of warranty, so I can't really
comment on what the manufacturers might be like in that case.

Yes, there will be original manufacturing defects that are mapped out
before shipping.  That's fine and doesn't bother me.  But any drive that
has developed a bad sector after installation will in my experience tend
to develop more in time, and on a few occasions I've seen drives that
reallocate in "bursts" so the count remains fairly stable for a while
then jumps up 40 or 50 sectors within a few minutes.

I generally reckon that as soon as one bad sector appears on an
out-of-warranty drive (which is alerted by SMART monitoring) it's time
to start looking at replacement as soon as reasonably possible, subject
to drive availability and a good time for the swapout and rebuild.  That
might mean next-day a drive and replace immediately or it might mean
within a couple of weeks, depending on drive availability and the
operational cost of a total array failure.

I did come across a customer array on one occasion with between 50 and
1200 reallocated sectors on each of the 12 drives in the array.  It was
working and generally performance was as expected, but I would not have
dared to replace/rebuild any of those disks (it was ultimately done as a
complete new array and data migration).

As always, this is my (experience-based) opinion and your mileage may vary.

Regards,

Geoff.

-- 
Geoff Back
What if we're all just characters in someone's nightmares?



* Re: hardware recovery and RAID5 services
  2022-01-31 19:21                   ` Phil Turmel
  2022-01-31 19:46                     ` Roger Heflin
  2022-01-31 20:04                     ` Geoff Back
@ 2022-01-31 20:53                     ` Wols Lists
  2 siblings, 0 replies; 17+ messages in thread
From: Wols Lists @ 2022-01-31 20:53 UTC (permalink / raw)
  To: Geoff Back, Nix; +Cc: Linux RAID

On 31/01/2022 19:21, Phil Turmel wrote:
> On 1/31/22 14:07, Geoff Back wrote:
> 
>>
>> If a disk has one or more bad sectors, surely the only logical action is
>> to schedule it for replacement as soon as a new one can be obtained; and
>> if it's still in warranty, send it back.  If the data is valuable enough
>> to warrant use of RAID (along with, presumably, appropriate backups)
>> surely it is too valuable to risk continuing to use a known faulty disk?
>>
>> In which case, I would suggest that dangerous experiments that try to
>> force the disk to reallocate the block are arguably pointless.
>>
>> Just my opinion, but one that has served me well so far.
>>
>> Regards,
>>
>> Geoff.
> 
> I would be surprised if you got warranty replacement just for a few 
> re-allocated sectors.  The sheer quantity of sectors in modern drives 
> and the tiny magnetic domains involved means **no** drive is error-free. 
>   Just most defects are identified and mapped out before shipping. 
> Reallocations cover the marginal cases.
> 
You've also missed the point that the drive IS NOT FAULTY.

We're trying to trigger a *deliberate* fault on the new drive, when the 
copy from the old (faulty) drive failed ...

If raid does a successful read from the new drive, it will corrupt the 
raid data ...

Cheers,
Wol
