* On URE and RAID rebuild - again!
@ 2014-07-30  8:29 Gionatan Danti
  2014-07-30 11:13 ` Mikael Abrahamsson
  0 siblings, 1 reply; 18+ messages in thread
From: Gionatan Danti @ 2014-07-30  8:29 UTC (permalink / raw)
  To: linux-raid; +Cc: g.danti

Hi all,
I recently "scrubbed" the linux-raid list on URE and found some very 
interesting informations [1]. However, I don't have a definite answer on 
UREs and their effect on a RAID system, especially during rebuild - so 
be patient my me, please :)

1) From what I know, URE rate is measured in events per bit read. This 
means that a drive rated with a URE spec of "<10 in 10^16" [2] will have 
fewer than 10 unreadable sectors per 1.25 PB read, or less than one URE 
event per 125 TB read. Moreover, when a URE happens the entire sector is 
"lost". So, in the example above, with 512B sectors, I can "lose" at most 
512B per 125 TB read.
QUESTION n.1: Is this explanation correct?
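
Just to make my arithmetic explicit, here is the back-of-the-envelope 
conversion (plain shell/awk), under the assumption that the spec means 
"fewer than 10 errored sectors per 10^16 bits read":

  # convert "<10 errors per 10^16 bits read" into an upper bound on the URE rate
  awk 'BEGIN {
      bits_per_error  = 1e16 / 10;            # fewer than 1 error per 1e15 bits
      bytes_per_error = bits_per_error / 8;   # 1.25e14 bytes
      printf "fewer than one URE per %.0f TB read, on average\n", bytes_per_error / 1e12;
  }'
  # prints 125 TB, which is where the "one URE per ~125 TB" figure above comes from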

2) Does the URE rate express a probability, or is it a statistical 
record? In the first case (URE as a probability) even a relatively high 
URE rate of 10^-14 does not translate into "it will surely happen every 
12.5 TB read", but into "there is a ~63% chance a URE will happen in 
that much reading". If, however, the URE rate is the result of 
statistical evidence, I can be quite sure that it will bite me at about 
every 12.5 TB read. Sure, this is an oversimplification, but I hope to 
be sufficiently clear here :)
QUESTION n.2: Does the URE rate define a probability or a statistical 
record?
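
The ~63% comes from treating every bit read as an independent 10^-14 
chance of failure - an assumption about the model on my part, not 
something the datasheets state:

  # P(at least one URE) when reading 12.5 TB (= 1e14 bits) at a 1e-14 per-bit rate,
  # using (1 - r)^n ~= exp(-n*r) for tiny r
  awk 'BEGIN {
      r = 1e-14;              # assumed per-bit URE probability
      n = 12.5e12 * 8;        # 12.5 TB expressed in bits (= 1e14)
      printf "P(>=1 URE over 12.5 TB) ~= %.2f\n", 1 - exp(-n * r);
  }'
  # prints ~0.63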

3) From what I understand having read some other mails, in the case of a 
URE during RAID rebuild mdadm will _stop_ the rebuild and inform you of 
what happened. However, you could re-start the array, remount it and try 
to recover data via a normal filesystem copy. If, and when, the 
filesystem tries to read the data affected by the URE, mdadm will report 
a "read error" back to it and the filesystem can react as it wants 
(re-try the copy, report back to the user, abort the copy, etc.)
QUESTION n.3: Is this what really happens on parity RAID (5, 6)?
QUESTION n.4: What about mirror-striped arrays such as RAID10? Do they 
follow the same behavior?

Thank you all and sorry for the lengthy mail!


[1] http://marc.info/?l=linux-raid&m=139025054501419&w=2
[2] http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-771444.pdf

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-07-30  8:29 On URE and RAID rebuild - again! Gionatan Danti
@ 2014-07-30 11:13 ` Mikael Abrahamsson
  2014-07-30 13:05   ` Gionatan Danti
  0 siblings, 1 reply; 18+ messages in thread
From: Mikael Abrahamsson @ 2014-07-30 11:13 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-raid

On Wed, 30 Jul 2014, Gionatan Danti wrote:

> QUESTION n.1: Is this explanation correct?
> QUESTION n.2: URE define a probability or a statistical evidence?

There has been much discussion about the URE figures. Some people 
interpret them one way, others another way. Nobody here knows for sure. 
Ask your HDD vendor; if they answer, do share here!

> 3) From what I understand having read some other mails, in the case of URE 
> during RAID rebuild mdadm will _stop_ the rebuild and inform you of what 
> happened. However, you could re-start the array, remount it and try to 
> recover data via normal filesystem copy. If, and when, the filesystem will 
> try to read the data affected by URE, mdadm will report back to it a "read 
> error" and the filesystem can react as it want (re-try the copy, report back 
> to user, abort the copy, etc.)
> QUESTION n.3: is it what really happen on parity RAID (5,6)?
> QUESTION n.4: what about mirror-striped array as RAID10? They follow the same 
> behavior?

When MD encounters a URE, it should calculate that block from parity 
information and write it back. I have personally had problems with this 
not happening; it seems that if the URE doesn't happen repeatedly, MD 
might not re-write. All redundant raid levels should behave the same, so 
this should work identically for RAID1, RAID10, RAID5 and RAID6.
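
(Not something I can vouch for on every kernel, but the usual way to 
exercise this rewrite path on purpose is an explicit md scrub; a minimal 
sketch, assuming the array is /dev/md0:)

  echo check  > /sys/block/md0/md/sync_action   # scrub: read every stripe, count inconsistencies
  cat /sys/block/md0/md/mismatch_cnt            # non-zero means mismatches were found
  echo repair > /sys/block/md0/md/sync_action   # scrub again, rewriting unreadable/inconsistent data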

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: On URE and RAID rebuild - again!
  2014-07-30 11:13 ` Mikael Abrahamsson
@ 2014-07-30 13:05   ` Gionatan Danti
  2014-07-30 21:31     ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: Gionatan Danti @ 2014-07-30 13:05 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid, g.danti

On 30/07/2014 13:13, Mikael Abrahamsson wrote:
>
> There has been much discussion about the URE figures. Some people
> interpret it one way, others another way. There is nobody here that
> knows for sure. Ask your HDD vendor, if they answer, do share here!
>
- Ouch! -
I was hoping that HDD vendors were somewhat more open about their URE 
calculations... It's time for some lab tests, I think!
 >
> When MD encounters an URE, it should calculate that block from parity
> information and write it. I have personally had problems with this not
> happening, seems it might be that if the URE doesn't happen repeatedly,
> MD might not re-write. All parity raid levels should behave the same, so
> this should work identically for RAID1, RAID10, RAID5 and RAID6.
>
What about a _degraded_ array? In other words, if a degraded RAID5 
experiences a URE during rebuild, what happens? I read that most 
hardware-based RAID cards both stop rebuilding _and_ kill the entire 
array. From my understanding, mdadm should stop rebuilding but the array 
can then be restarted, mounted and backed up. Right?

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-07-30 13:05   ` Gionatan Danti
@ 2014-07-30 21:31     ` NeilBrown
  2014-07-31  7:16       ` Gionatan Danti
  0 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2014-07-30 21:31 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: Mikael Abrahamsson, linux-raid


On Wed, 30 Jul 2014 15:05:29 +0200 Gionatan Danti <g.danti@assyoma.it> wrote:

> On 30/07/2014 13:13, Mikael Abrahamsson wrote:
> >
> > There has been much discussion about the URE figures. Some people
> > interpret it one way, others another way. There is nobody here that
> > knows for sure. Ask your HDD vendor, if they answer, do share here!
> >
> - Ouch! -
> I was hoping that HDD vendors were somewhat more open about their URE 
> calculations... It's time for some lab test, I think!
>  >
> > When MD encounters an URE, it should calculate that block from parity
> > information and write it. I have personally had problems with this not
> > happening, seems it might be that if the URE doesn't happen repeatedly,
> > MD might not re-write. All parity raid levels should behave the same, so
> > this should work identically for RAID1, RAID10, RAID5 and RAID6.
> >
> What about _degraded_ array state? In other words, if a degraded RAID5 
> experiences a URE during rebuild, what happens? I read that most 
> hardware based RAID card both stop rebuilding _and_ kill the entire 
> array. From my understanding, mdadm should stop rebuilding but the array 
> can the restarted, mounted and backupped. Right?
> 
> Regards.
> 

Yes, you can usually get your data back with mdadm.

With latest code, a URE during recovery will cause a bad-block to be recorded
on the recovered device, and recovery will continue.  You end up with a
working array that has a few unreadable blocks on it.
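
(For reference, the recorded bad blocks can be inspected afterwards, 
roughly like this - the array and member names below are only examples:)

  cat /sys/block/md0/md/dev-sdb1/bad_blocks   # per-member list of bad sector ranges
  mdadm --examine-badblocks /dev/sdb1         # same list, read from the member's superblock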

NeilBrown


* Re: On URE and RAID rebuild - again!
  2014-07-30 21:31     ` NeilBrown
@ 2014-07-31  7:16       ` Gionatan Danti
  2014-08-02 16:21         ` Gionatan Danti
  0 siblings, 1 reply; 18+ messages in thread
From: Gionatan Danti @ 2014-07-31  7:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mikael Abrahamsson, linux-raid, g.danti


> Yes, you can usually get your data back with mdadm.
>
> With latest code, a URE during recovery will cause a bad-block to be recorded
> on the recovered device, and recovery will continue.  You end up with a
> working array that has a few unreadable blocks on it.
>
> NeilBrown

This is very good news :)
In the case of parity RAID I assume the entire stripe is marked as bad, 
but with a mirror (e.g. RAID10) only a single block (often 512B) is 
marked bad on the recovered device, right?

From what mdadm/kernel version is the new behavior implemented? Maybe 
the software RAID on my CentOS 6.5 is stronger than expected ;)

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-07-31  7:16       ` Gionatan Danti
@ 2014-08-02 16:21         ` Gionatan Danti
  2014-08-03  3:48           ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: Gionatan Danti @ 2014-08-02 16:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mikael Abrahamsson, linux-raid, g.danti

Hi again,
I started a little experiment regarding BER/UREs and I would like some 
informed feedback on it.

As I had a spare 500 GB Seagate Barracuda 7200.12 (max non-recoverable 
read error rate of 1 per 10^14 bits read: 
http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%207200.12/100529369e.pdf), 
I started to read it continuously with the following shell command:
dd if=/dev/sdb of=/dev/null bs=8M iflag=direct
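
(If it helps, a scripted version of that continuous read could look 
roughly like this, logging the dd exit status and any new kernel 
messages per pass - just a sketch of the idea:)

  pass=0
  while true; do
      pass=$((pass + 1))
      if ! dd if=/dev/sdb of=/dev/null bs=8M iflag=direct 2>>read-test.log; then
          echo "pass $pass: dd reported a read error" | tee -a read-test.log
      fi
      dmesg | grep -i sdb | tail -n 20 >> read-test.log   # capture any I/O errors
      echo "pass $pass complete" >> read-test.log
  done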

The drive was used as a member of a RAID10 set on one of my test 
machines, so I assume its platters are full of pseudo-random data. At 
about 100 MB/s, I have now read roughly 15 TB from it and I don't see 
any problems reported by the kernel.

Some questions:
1) Should I try a different / harder method to generate UREs? Maybe 
writing some pre-determined pseudo-random data and then comparing the 
results on read-back (I think this is more appropriate for catching 
silent data corruption, by the way)?
2) How should UREs become visible? Via error reporting through dmesg?

Thanks.

On 2014-07-31 09:16, Gionatan Danti wrote:
>> Yes, you can usually get your data back with mdadm.
>> 
>> With latest code, a URE during recovery will cause a bad-block to be 
>> recorded
>> on the recovered device, and recovery will continue.  You end up with 
>> a
>> working array that has a few unreadable blocks on it.
>> 
>> NeilBrown
> 
> This is very good news :)
> I case of parity RAID I assume the entire stripe is marked as bad, but
> with mirror (eg: RAID10) only a single block (often 512B) is marked
> bad on the recovered device, right?
> 
> From what mdadm/kernel version the new behavior is implemented? Maybe
> the software RAID on my CentOS 6.5 is stronger then expected ;)
> 
> Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-08-02 16:21         ` Gionatan Danti
@ 2014-08-03  3:48           ` NeilBrown
  2014-08-04  7:02             ` Mikael Abrahamsson
  2014-08-04 13:27             ` Gionatan Danti
  0 siblings, 2 replies; 18+ messages in thread
From: NeilBrown @ 2014-08-03  3:48 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: Mikael Abrahamsson, linux-raid


On Sat, 02 Aug 2014 18:21:07 +0200 Gionatan Danti <g.danti@assyoma.it> wrote:

> Hi again,
> I started a little experiment regarding BER/UREs and I wish to have an 
> informed feedback.
> 
> As I had a spare 500 GB Seagate Barracuda 7200.12 (BER 10^14 max: 
> http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%207200.12/100529369e.pdf), 
> I started to read it continuously with the following shell command: dd 
> if=/dev/sdb of=/dev/null bs=8M iflag=direct
> 
> The drive was used as a member of a RAID10 set on one of my test 
> machines, so I assume its platters are full of pseudo-random data. At 
> 100 MB/s, I am now at about 15 TB read from it and I don't see any 
> problem reported by the kernel.
> 
> Some questions:
> 1) I should try in different / harder mode to generate UREs? Maybe using 
> some pre-determined pseudo-random string and then comparing the results 
> (I think this is more appropriate to catch silent data corruption, by 
> the way)?

You are very unlikely to see UREs just by reading the drive over and over
again.  You could easily do that for years and not get an error.  Or you
might get one right away.


> 2) how UREs should be visible? Via error reporting through dmesg?

If you want to see how the system responds when it hits a URE, you can use the
hdparm command and the "--make-bad-sector" option.  There is also a
"--repair-sector" option which will (hopefully) repair the sector when you
are done.
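
A rough sketch of such a test (the sector number is just an example, and 
--make-bad-sector really does destroy that sector's contents, so pick 
one you can afford to lose; hdparm also insists on its safety flag):

  # deliberately corrupt one sector, then watch how the kernel reacts to reading it
  hdparm --yes-i-know-what-i-am-doing --make-bad-sector 123456 /dev/sdb
  dd if=/dev/sdb of=/dev/null bs=512 skip=123456 count=1 iflag=direct   # should fail with an I/O error
  dmesg | tail                                                          # the URE shows up here
  # afterwards, let the drive rewrite/reallocate the sector
  hdparm --yes-i-know-what-i-am-doing --repair-sector 123456 /dev/sdb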

NeilBrown


> 
> Thanks.
> 
> On 2014-07-31 09:16, Gionatan Danti wrote:
> >> Yes, you can usually get your data back with mdadm.
> >> 
> >> With latest code, a URE during recovery will cause a bad-block to be 
> >> recorded
> >> on the recovered device, and recovery will continue.  You end up with 
> >> a
> >> working array that has a few unreadable blocks on it.
> >> 
> >> NeilBrown
> > 
> > This is very good news :)
> > I case of parity RAID I assume the entire stripe is marked as bad, but
> > with mirror (eg: RAID10) only a single block (often 512B) is marked
> > bad on the recovered device, right?
> > 
> > From what mdadm/kernel version the new behavior is implemented? Maybe
> > the software RAID on my CentOS 6.5 is stronger then expected ;)
> > 
> > Regards.
> 



* Re: On URE and RAID rebuild - again!
  2014-08-03  3:48           ` NeilBrown
@ 2014-08-04  7:02             ` Mikael Abrahamsson
  2014-08-04  7:13               ` NeilBrown
  2014-08-04 13:27             ` Gionatan Danti
  1 sibling, 1 reply; 18+ messages in thread
From: Mikael Abrahamsson @ 2014-08-04  7:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: Gionatan Danti, linux-raid

On Sun, 3 Aug 2014, NeilBrown wrote:

> You are very unlikely to see UREs just be reading the drive over and over a
> again.  You easily do that for years and not get an error.  Or maybe you got
> one just then.

Also you might get an intermittent URE. I have had drives where the sector 
would be successfully read after several attempts. Why the drive 
doesn't re-write the sector when it needs hundreds or thousands of 
attempts to read it, I don't know. I would very much like to talk to 
someone who really knows how these things work end-to-end, but I don't 
have access to anyone like that. Most of the information to be found 
publicly comes from people deducing behaviour from outside of this 
"black box".

>> 2) how UREs should be visible? Via error reporting through dmesg?
>
> If you want to see how the system responds when it hits a URE, you can use the
> hdparm command and the "--make-bad-sector" option.  There is also a
> "--repair-sector" option which will (hopefully) repair the sector when you
> are done.

Does this command do the same as a real URE, i.e. will the drive retry 
until its timeout (which is what, 90 seconds on a consumer drive, 7 
seconds on an enterprise drive, right?).

If it fails immediately then it's not testing the same thing as a "real" 
URE. That might be good to know if one does testing that's supposed to 
emulate real failures.
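
(Somewhat related, though not an answer to the question: on drives that 
support SCT ERC, the internal retry timeout can at least be queried and 
shortened with smartctl - assuming the drive actually implements it:)

  smartctl -l scterc /dev/sdb        # show the current read/write error recovery timeouts
  smartctl -l scterc,70,70 /dev/sdb  # set both to 7.0 s (values are in tenths of a second)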

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: On URE and RAID rebuild - again!
  2014-08-04  7:02             ` Mikael Abrahamsson
@ 2014-08-04  7:13               ` NeilBrown
  0 siblings, 0 replies; 18+ messages in thread
From: NeilBrown @ 2014-08-04  7:13 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Gionatan Danti, linux-raid


On Mon, 4 Aug 2014 09:02:22 +0200 (CEST) Mikael Abrahamsson
<swmike@swm.pp.se> wrote:

> On Sun, 3 Aug 2014, NeilBrown wrote:

> >> 2) how UREs should be visible? Via error reporting through dmesg?
> >
> > If you want to see how the system responds when it hits a URE, you can use the
> > hdparm command and the "--make-bad-sector" option.  There is also a
> > "--repair-sector" option which will (hopefully) repair the sector when you
> > are done.
> 
> Does this command do the same as with a real URE, ie will try until the 
> timeout of the drive (which is what, 90 seconds on a consumer drive, 7 
> seconds of an enterprise drive, right?).

All I know is what I read in "man hdparm".

NeilBrown


* Re: On URE and RAID rebuild - again!
  2014-08-03  3:48           ` NeilBrown
  2014-08-04  7:02             ` Mikael Abrahamsson
@ 2014-08-04 13:27             ` Gionatan Danti
  2014-08-04 18:40               ` Mikael Abrahamsson
  1 sibling, 1 reply; 18+ messages in thread
From: Gionatan Danti @ 2014-08-04 13:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mikael Abrahamsson, linux-raid, g.danti



On 03/08/2014 05:48, NeilBrown wrote:

> You are very unlikely to see UREs just be reading the drive over and over a
> again.  You easily do that for years and not get an error.  Or maybe you got
> one just then.

True. I read over 40 TB from this disk and I haven't found any errors. 
Some SMART attributes reported so far:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH RAW_VALUE
197 Current_Pending_Sector  0x0012    100   100    000         0
198 Offline_Uncorrectable   0x0010    100   100    000         0

As you can see, no errors were reported, and I don't find anything 
suspicious in dmesg. At the very least, this should prove that articles 
such as this one [1] are quite wrong.

Maybe UREs are related to unsuccessful writes in the first place. 
I will try to repeat the test, intermixing reads with full-disk writes; 
a sketch of what I have in mind follows below.


[1] 
http://subnetmask255x4.wordpress.com/2008/10/28/sata-unrecoverable-errors-and-how-that-impacts-raid/
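
A sketch of one such write+read-back pass - writing a reproducible 
pseudo-random stream and comparing checksums, so silent corruption would 
also be caught (this of course destroys whatever is on the disk):

  SIZE_MB=102400    # region covered per pass, in MiB (100 GiB here, adjust to taste)
  SEED="pass-1"

  stream() {        # deterministic pseudo-random stream derived from $SEED
      openssl enc -aes-256-ctr -pass "pass:$SEED" -nosalt < /dev/zero 2>/dev/null
  }

  stream | dd of=/dev/sdb bs=1M count=$SIZE_MB iflag=fullblock oflag=direct
  stream | dd bs=1M count=$SIZE_MB iflag=fullblock | sha256sum   # expected checksum
  dd if=/dev/sdb bs=1M count=$SIZE_MB iflag=direct | sha256sum   # what the disk gives back
  # the two sums must match if every byte was written and read back intact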

> If you want to see how the system responds when it hits a URE, you can use the
> hdparm command and the "--make-bad-sector" option.  There is also a
> "--repair-sector" option which will (hopefully) repair the sector when you
> are done.
>
> NeilBrown
>
>
>>
>> Thanks.
>>
>> On 2014-07-31 09:16, Gionatan Danti wrote:
>>>> Yes, you can usually get your data back with mdadm.
>>>>
>>>> With latest code, a URE during recovery will cause a bad-block to be
>>>> recorded
>>>> on the recovered device, and recovery will continue.  You end up with
>>>> a
>>>> working array that has a few unreadable blocks on it.
>>>>
>>>> NeilBrown
>>>
>>> This is very good news :)
>>> I case of parity RAID I assume the entire stripe is marked as bad, but
>>> with mirror (eg: RAID10) only a single block (often 512B) is marked
>>> bad on the recovered device, right?
>>>
>>>  From what mdadm/kernel version the new behavior is implemented? Maybe
>>> the software RAID on my CentOS 6.5 is stronger then expected ;)
>>>
>>> Regards.
>>
>

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-08-04 13:27             ` Gionatan Danti
@ 2014-08-04 18:40               ` Mikael Abrahamsson
  2014-08-04 22:44                 ` Gionatan Danti
  0 siblings, 1 reply; 18+ messages in thread
From: Mikael Abrahamsson @ 2014-08-04 18:40 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-raid

On Mon, 4 Aug 2014, Gionatan Danti wrote:

> As you can find, no error was reported, and I don't find anything 
> suspicious in dmesg. At least, this should prove that article as this 
> [1] are quite wrong.

Why do you think that's wrong? 10^-14 is what the vendor guarantees. I 
have had drives with worse performance (after a couple of months I had 
several UNC sectors without reading much).

Your claim about the article being wrong is the same as saying that the 
reported risk of getting into a car accident is wrong because you've 
driven that many kilometers but haven't been in an accident yet.

This is statistics, marketing and warranty, not guaranteed behavior.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: On URE and RAID rebuild - again!
  2014-08-04 18:40               ` Mikael Abrahamsson
@ 2014-08-04 22:44                 ` Gionatan Danti
  2014-08-04 23:29                   ` NeilBrown
                                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Gionatan Danti @ 2014-08-04 22:44 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid, g.danti

On 2014-08-04 20:40, Mikael Abrahamsson wrote:
> 
> Why do you think that's wrong? 10^-14 is what the vendor guarantees. I
> have had drives with worse performance (after a couple of months I had
> several UNC sectors without reading much).
> 
> Your claim about the article being wrong is the same as saying that
> the risk reported of getting into a car accident is wrong because
> you've driven that amount of kilometers but haven't been in an
> accident yet.
> 
> This is statistics, marketing and warranty, not guaranteed behavior.

Yes, I understand this. However, the linked article (and many others) 
state:
"If you have a 2TB drive, you write 2TB to it, and then you fully read 
that, just over 6 times, then you will run into one read error, 
theoretically speaking."

I read my 500 GB drive over _60_ times, reading 3x more total data than 
stated above.

I started the entire discussion to know how UREs are calculated, trying 
to understand whether they express a probability ("there is a 1-in-10^14 
chance that a given bit cannot be read") or a statistical record ("we 
found that about 1 bit in 10^14 is not readable").

If defined as a probability, I am very lucky: if my math is OK, I had 
only about a 5% chance of reading roughly 40 TB of data without an error 
(my math is: (1-(1/10^14))^(3*(10^14)) ~= e^-3 ~= 0.05). If, on the other 
hand, UREs are defined as statistical evidence (as MTBF is), environment 
and test conditions (eg: duty cycle, read/write distribution, etc) are 
absolutely critical to understand what this parameter really means for us.
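
Spelled out, under the same "one independent chance per bit" assumption:

  # chance of reading ~40 TB with zero UREs, given a 1e-14 per-bit error rate
  awk 'BEGIN {
      r = 1e-14;            # assumed per-bit URE probability
      n = 3e14;             # ~40 TB expressed in bits, rounded (40e12 * 8 = 3.2e14)
      printf "P(no URE over ~40 TB) ~= %.3f\n", exp(-n * r);   # (1-r)^n ~= exp(-n*r)
  }'
  # prints ~0.05, i.e. roughly the 1-in-20 chance of being this "lucky" mentioned above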

I'm under the impression (and maybe I'm wrong, as usual :)) that UREs 
mainly depend on incomplete writes and/or unstable sectors. If this is 
the case, maybe the published URE values are related to the entire HDD 
warranty period. In other words, they should be read as "in normal 
conditions, with typical loads, our HDD will exhibit about one 
unrecoverable error per 10^14 bits over the entire disk lifespan".

Is this reasonable? Or am I horribly wrong?
Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-08-04 22:44                 ` Gionatan Danti
@ 2014-08-04 23:29                   ` NeilBrown
  2014-08-05  6:52                     ` Gionatan Danti
  2014-08-05 19:01                   ` Piergiorgio Sartor
  2014-08-06 16:34                   ` Chris Murphy
  2 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2014-08-04 23:29 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: Mikael Abrahamsson, linux-raid


On Tue, 05 Aug 2014 00:44:04 +0200 Gionatan Danti <g.danti@assyoma.it> wrote:

> On 2014-08-04 20:40, Mikael Abrahamsson wrote:
> > 
> > Why do you think that's wrong? 10^-14 is what the vendor guarantees. I
> > have had drives with worse performance (after a couple of months I had
> > several UNC sectors without reading much).
> > 
> > Your claim about the article being wrong is the same as saying that
> > the risk reported of getting into a car accident is wrong because
> > you've driven that amount of kilometers but haven't been in an
> > accident yet.
> > 
> > This is statistics, marketing and warranty, not guaranteed behavior.
> 
> Yes, I understand this. However, the linked article (and many others) 
> state:
> "If you have a 2TB drive, you write 2TB to it, and then you fully read 
> that, just over 6 times, then you will run into one read error, 
> theoretically speaking."

This statement is wrong, and doesn't even make any sense.  It displays a deep
misunderstanding of probability (the same deep misunderstanding that leads
people to buy lottery tickets).

> 
> I read my 500 GB drive over _60_ times, reading 3x more total data than 
> stated above.
> 
> I started the entire discussion to know how UREs are calculated, trying 
> to understand if they are expressed as probability ("1 probabily over 
> 10^14 that we can not read a sector) or a statistical record ("we found 
> that 1 on 10^14 is not readable").

Probabilities are often calculated by examining a statistical record - the
two concepts are not separate.
There is probably some theoretical analysis, some statistical analysis, some
marketing and maybe even some actuarial analysis that goes into the quoted
figure.  I remember when CPU speed was measured in "MIPS".
This stood for 
   Meaningless Indicators of Performance for Salesmen

URE rate numbers are probably equally trustworthy.

> 
> If defined as a probability, I am very lucky: if my math is OK, I should 
> have only 0.5% to read about 40 TB of data (my math is: 
> (1-(1/10^14))^(3*(10^14))). If, on the other hand, UREs are defined as 
> statistical evidence (as MTBF), environment and test conditions (eg: 
> duty cycle, read/write distribution, etc) are absolutely critical to 
> understand  what this parameter really mean for us.

The probability number doesn't tell you much at all about your drive.
Your drive probably works much better than the quoted rate, but could be much
worse.
The quoted number might say something useful about a collection of 10,000
drives, but if you can afford those, you can probably afford a competent
statistician to explain the details too.


> 
> I'm under impression (and maybe I'm wrong, as usual :)) that UREs mainly 
> depends on incomplete writes and/or unsable sectors. If this is the 
> case, maybe the published URE values are related to the entire HDD 
> warranty. In other word, they should be read as "in normal condition, 
> with typical loads, out HDD will exibit about 1/10^14 unrecoverable 
> error during the entire disk lifespan".

I'm not an electro-magnetic engineer, but I would guess that UREs are caused
by some combination of:
 - irregularities in the physical media
 - imperfections in positioning of the write head
 - fluctuations in temperature and pressure which could
   affect precise performance of resistors and capacitors etc.

and probably various quantum effects that I know nothing about.

Maybe most UREs come from a speck of dust that was in the wrong place at the
wrong time.

I think a better summary would be:
  in normal conditions and typical loads, a collection of 10^14 drives will
  exhibit errors somewhere in the collection on a regular basis.

> 
> It is reasonable? Or I am horribly wrong?
> Regards.
> 

NeilBrown


* Re: On URE and RAID rebuild - again!
  2014-08-04 23:29                   ` NeilBrown
@ 2014-08-05  6:52                     ` Gionatan Danti
  0 siblings, 0 replies; 18+ messages in thread
From: Gionatan Danti @ 2014-08-05  6:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mikael Abrahamsson, linux-raid, g.danti



On 05/08/2014 01:29, NeilBrown wrote:
> On Tue, 05 Aug 2014 00:44:04 +0200 Gionatan Danti <g.danti@assyoma.it> wrote:
>
>
> Probabilities are often calculated by examining a statistical record - the
> two concepts are not separate.
> There is probably some theoretical analysis, some statistical analysis, some
> marketing and maybe even some actuarial analysis that goes in to the quoted
> figure.  I remember when CPU speed was measured in "MIPS".
> This stood for
>     Meaningless Indicators of Performance for Salesmen
>
> URE rates numbers are probably equally trustworthy.
>
> The probability number doesn't tell you much at all about your drive.
> Your drive probably works much better than the quoted rate, but could be much
> worse.
> The quoted number might say something useful about a collection of 10,000
> drives, but if you can afford those, you can probably afford to competent
> statistician to explain the details too.
>
>
>
> I'm not an electro-magnetic engineer, but I would guess that UREs are caused
> by some combination of:
>   - irregularities in the physical media
>   - imperfections in positioning of the write head
>   - fluctuations in temperature and pressure which could
>     affect precise performance of resistors and capacitors etc.
>
> and probably various quantum effects that I know nothing about.
>
> Maybe most UREs come from a spec of dust that was in the wrong place at the
> wrong time.
>
> If think a better summary would be:
>    in normal conditions and typical loads, a collection of 10^14 drives will
>    exhibit errors somewhere in the collection on a regular basis.
>
>
> NeilBrown
>

VERY informative post. Thank you Neil.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-08-04 22:44                 ` Gionatan Danti
  2014-08-04 23:29                   ` NeilBrown
@ 2014-08-05 19:01                   ` Piergiorgio Sartor
  2014-08-05 19:42                     ` Gionatan Danti
  2014-08-06 16:34                   ` Chris Murphy
  2 siblings, 1 reply; 18+ messages in thread
From: Piergiorgio Sartor @ 2014-08-05 19:01 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: Mikael Abrahamsson, linux-raid

On Tue, Aug 05, 2014 at 12:44:04AM +0200, Gionatan Danti wrote:
[...]
> Yes, I understand this. However, the linked article (and many others) state:
> "If you have a 2TB drive, you write 2TB to it, and then you fully read that,
> just over 6 times, then you will run into one read error, theoretically
> speaking."

This means that they, who wrote the article, did not
really *test* what they wrote.
Which already tells us a lot about the quality
of the article itself.

> I read my 500 GB drive over _60_ times, reading 3x more total data than
> stated above.
> 
> I started the entire discussion to know how UREs are calculated, trying to
> understand if they are expressed as probability ("1 probabily over 10^14
> that we can not read a sector) or a statistical record ("we found that 1 on
> 10^14 is not readable").

What's the difference between "probability" and
"statistical record"?
Isn't one calculated from the other?

> If defined as a probability, I am very lucky: if my math is OK, I had
> only about a 5% chance of reading roughly 40 TB without an error (my math is:
> (1-(1/10^14))^(3*(10^14))). If, on the other hand, UREs are defined as

I'm too lazy to try to understand what 3*10^14 is.
What is it?

> statistical evidence (as MTBF), environment and test conditions (eg: duty
> cycle, read/write distribution, etc) are absolutely critical to understand
> what this parameter really mean for us.

I'm under the impression you did not fully grasp the
concept of probability in this context.
Given that it is not clear how the manufacturers
compute their numbers, both cases you describe
are the same.
All the possible conditions are included in the
probability computation.

You can state: under a worst-case scenario, *each*
bit has a probability of 10E-14 of being wrong.
What does this mean?

> I'm under impression (and maybe I'm wrong, as usual :)) that UREs mainly
> depends on incomplete writes and/or unsable sectors. If this is the case,
> maybe the published URE values are related to the entire HDD warranty. In
> other word, they should be read as "in normal condition, with typical loads,
> out HDD will exibit about 1/10^14 unrecoverable error during the entire disk
> lifespan".

As already written by others, it is not clear what
that number (10E-14) means.
A common understanding could be, as stated above,
that each bit has a *probability* of 10E-14 of being wrong.

Practically, it does *not* mean that reading 10E14 bits
will systematically deliver one wrong bit.

Furthermore, as already stated, an "average" HDD very
likely has a much lower URE probability.

> 
> It is reasonable? Or I am horribly wrong?

Is this pure curiosity on your side, or are
you trying to achieve something?

There is a report, from CERN I think, providing
real-world statistics about HDD problems.

http://storagemojo.com/2007/09/19/cerns-data-corruption-research/

bye,

pg

> Regards.
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio


* Re: On URE and RAID rebuild - again!
  2014-08-05 19:01                   ` Piergiorgio Sartor
@ 2014-08-05 19:42                     ` Gionatan Danti
  2014-08-06 17:05                       ` Chris Murphy
  0 siblings, 1 reply; 18+ messages in thread
From: Gionatan Danti @ 2014-08-05 19:42 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: Mikael Abrahamsson, linux-raid, g.danti

On 2014-08-05 21:01, Piergiorgio Sartor wrote:
> 
> This means they, who wrote the article, did not
> really *tested* what they wrote.
> Which already tells us a lot about the quality
> of the article itself.

True. The problem is that the web is full of similar articles, which 
sounded waaaaay too "suspicious" in what they said.

> What's the difference between "probability" and
> "statistical record"?
> Is not one calculated with the other?

Premise: I am not a statistics expert, so maybe I used the wrong terms 
and/or my entire reasoning is flawed.

I am trying to imagine _how_ the various vendors arrive at the claimed 
number and _how much_ confidence we can have in the URE rate. _If_ for 
some reason (eg: magnetic interference during writes and/or at rest) a 
fixed "wrong read" probability exists _and_ _if_ it is correct to 
consider each sector read as a totally independent event, HDD 
manufacturers may have a quite precise formula from which the URE rate 
is obtained.

If, on the other hand, they "simply" observe how a big drive population 
behaves over time, maybe we can expect bigger variations between drives.

I'm just speculating here; what really worried me was the "you can't 
read your 2 TB drive 6 times" argument :)

> I'm to lazy to try to understand what 3*10^14 is.
> What is it?

I have read about 40 TB of data, or 320 Tb. 10^14 bits is 12.5 TB, or 
100 Tb if you prefer. So 3*10^14 simply is the number of bits that I 
read (the URE rate is expressed as 1 event per 10^14 bits, so I thought 
it made sense to use the same scale here).

> I'm under the impression you did not grasp the
> concept of probability is such contex.
> Given that it is not clear how the manufacturers
> compute their numbers, both cases you describe
> are the same.
> All the possible conditions are included in the
> probability computation.

I can see your point...

> You can state: under worst case scenario, *each*
> bit has a probability of 10E-14 of being wrong.
> What does this mean?

... and _this_ is what really interested me. Manufacturers publish URE 
rates as "max" values, so it should be reasonable to assume that they 
are worst-case figures. If that is the case, we can be quite sure that 
our URE rate will be lower than the published specs (assuming the drives 
are deployed with care).

On the other hand, in some articles and even on this mailing list I read 
that the published URE rates really are a "max of various means" and do 
not represent a true worst-case scenario.

> As already wrote by others, it is not clear what
> that number (10E-14) means.
> A common understanding could be, as stated above,
> each bit has a *probability* of 10E-14 of being wrong.
> 
> Practically, it does *not* mean that reading 10E14 bit
> will deliver one bit wrong sistematically.

But if the spec is representative of a normal usage scenario, reading 40 
TB of data with a URE rate of 10^-14 has a very high probability (>95%) 
of returning at least one bad read ...

> Furthermore, as already again stated, very likely
> an "average" HDD has much lower URE probability.

This is reassuring :)

> 
> Is this pure curiosity from your side or are
> you trying to achieve something?
> 
> There is a report, from CERN I think, provinding
> real world statistics about HDD problems.
> 
> http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
> 
> bye,

Yes, I saw this article and read it with great interest. After all, it 
seems that the greater part of data corruption is due to 
firmware/kernel/driver bugs, and that the URE rate plays a minor role here.

Thank you very much guys. I'm sorry to bore you with all these 
questions, but I'm just trying to learn something!
Regards.


-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8


* Re: On URE and RAID rebuild - again!
  2014-08-04 22:44                 ` Gionatan Danti
  2014-08-04 23:29                   ` NeilBrown
  2014-08-05 19:01                   ` Piergiorgio Sartor
@ 2014-08-06 16:34                   ` Chris Murphy
  2 siblings, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2014-08-06 16:34 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List


On Aug 4, 2014, at 4:44 PM, Gionatan Danti <g.danti@assyoma.it> wrote:

> On 2014-08-04 20:40, Mikael Abrahamsson wrote:
>> Why do you think that's wrong? 10^-14 is what the vendor guarantees. I
>> have had drives with worse performance (after a couple of months I had
>> several UNC sectors without reading much).
>> Your claim about the article being wrong is the same as saying that
>> the risk reported of getting into a car accident is wrong because
>> you've driven that amount of kilometers but haven't been in an
>> accident yet.
>> This is statistics, marketing and warranty, not guaranteed behavior.
> 
> Yes, I understand this. However, the linked article (and many others) state:
> "If you have a 2TB drive, you write 2TB to it, and then you fully read that, just over 6 times, then you will run into one read error, theoretically speaking."

A while ago I brought up the fact that the less-than sign is ignored in such assertions, and that's why they draw the wrong conclusions.


> I read my 500 GB drive over _60_ times, reading 3x more total data than stated above.

Right, and it wouldn't surprise me if you could read it 600 times without an unrecoverable read error. Also note that the spec doesn't require that you read every sector. You could have read one sector 24414062500 times to arrive at just over 1E14 bits read, assuming it's a 512-byte-sector drive.

http://arxiv.org/pdf/cs/0701166.pdf


> I started the entire discussion to know how UREs are calculated, trying to understand if they are expressed as probability ("1 probabily over 10^14 that we can not read a sector) or a statistical record ("we found that 1 on 10^14 is not readable").

Neither. Both conclusions are overreach based on what the spec says. The spec is not stating an average, and we also know from a large body of publicly available information that drive failure rates aren't constant with age. So the distribution is actually quite complex.



> If defined as a probability, I am very lucky: if my math is OK, I had only about a 5% chance of reading roughly 40 TB of data without an error (my math is: (1-(1/10^14))^(3*(10^14))). If, on the other hand, UREs are defined as statistical evidence (as MTBF), environment and test conditions (eg: duty cycle, read/write distribution, etc) are absolutely critical to understand  what this parameter really mean for us.


A URE isn't a failure to a manufacturer. Strictly speaking they're only promising less than one error per roughly 12 TB read, in this example. If a read error happens more frequently, the implied performance hasn't been met and you're entitled to a replacement. Otherwise you're not. But in practice manufacturers hand out RMAs for warranty drives for almost any reason, which is why almost half of the drives they take back under warranty have no spec-busting problems per their testing. So the parameters users have for "failure" and those of the manufacturer aren't exactly aligned.


> 
> I'm under impression (and maybe I'm wrong, as usual :)) that UREs mainly depends on incomplete writes and/or unsable sectors. If this is the case, maybe the published URE values are related to the entire HDD warranty. In other word, they should be read as "in normal condition, with typical loads, out HDD will exibit about 1/10^14 unrecoverable error during the entire disk lifespan".

You can only arrive at this conclusion by ignoring the < sign that all the manufacturers use (or the explicitly stated "less than" phrasing).


Chris Murphy


* Re: On URE and RAID rebuild - again!
  2014-08-05 19:42                     ` Gionatan Danti
@ 2014-08-06 17:05                       ` Chris Murphy
  0 siblings, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2014-08-06 17:05 UTC (permalink / raw)
  To: linux-raid@vger.kernel.org List


On Aug 5, 2014, at 1:42 PM, Gionatan Danti <g.danti@assyoma.it> wrote:
> 
> I am trying to imagine _how_ the various vendors arrive at the claimed number and _how much_ we have confidence in URE rate.

I'd say it's next to useless. A different question needs to be asked: how much redundancy is good value relative to the value of the data? Then come up with a strategy that meets the uptime and redundancy preferences for a given budget.

>> Furthermore, as already again stated, very likely
>> an "average" HDD has much lower URE probability.
> 
> This is reassuring :)

The spec only accounts for the drive itself. Not the cables, the controller, the computer's non-ECC memory, and notably one of the greatest sources of data loss: user error. It also doesn't account for the complete implosion of the drive for any number of reasons: a head impacts the spinning surface and destroys either the data on the surface or the read/write head; actuator death; spindle motor death; logic board death; power supply death; etc.

So to mitigate drive and cable problems we use RAID. For controller, logic board and power supply failure concerns, we use clusters. More than a handful of UREs, even if they were to bust the manufacturer spec, is a minor issue next to the loss of a single drive, which represents hours or days of rebuild because one drive holds so much more data today.

Right now, md RAID 6 + XFS + Gluster clusters are a rather straightforward setup. For volume snapshots to mitigate user-induced data loss, LVM2 thinly provisioned LVs can be used. I haven't tested it yet, but I think the LVM2 integrated RAID does work with thinp LVs, so it's possible to remove a layer if you're OK with the different LVM raid management tools compared to mdadm.

Chris Murphy

