* Help build a drive reliability service!
From: Patrick McGarry @ 2017-05-24 18:57 UTC
  To: Ceph Devel, Ceph-User

Hey cephers,

Just wanted to share the genesis of a new community project that could
use a few helping hands (and any amount of feedback/discussion that
you might like to offer).

As a bit of backstory, around 2013 the Backblaze folks started
publishing statistics about hard drive reliability from within their
data center for the world to consume. This included things like model,
make, failure state, and SMART data. If you would like to view the
Backblaze data set, you can find it at:

https://www.backblaze.com/b2/hard-drive-test-data.html

While most major cloud providers are doing this for themselves
internally, we would like to replicate/enhance this effort across a
much wider segment of the population as a free service.  I think we
have a pretty good handle on the server/platform side of things, and a
couple of people who have expressed interest in building the
reliability model (although we could always use more!). What we really
need is a passionate volunteer who would like to come forward to write
the agent that sits on the drives, aggregates data, and submits daily
stats reports via an API (and potentially receives information back as
results are calculated about MTTF or potential to fail in the next
24-48 hrs).
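
To make that concrete, here is a rough sketch of what a first pass at
the agent might look like (Python, shelling out to smartctl from
smartmontools; the reporting endpoint, payload shape, and device glob
are placeholders I've made up, since no API exists yet):

#!/usr/bin/env python3
# Sketch of the proposed stats agent: collect SMART data for each drive
# once a day and POST it to a (hypothetical) reporting endpoint.
# Assumes smartmontools >= 7.0 for JSON output.
import glob
import json
import socket
import subprocess
import urllib.request

REPORT_URL = "https://example.org/api/v1/drive-stats"   # placeholder

def collect_smart(device):
    """Return smartctl's JSON report for one device."""
    # smartctl uses non-zero exit codes to flag drive problems, so don't
    # treat them as command failures here.
    out = subprocess.run(["smartctl", "--json", "--all", device],
                         capture_output=True, text=True, check=False)
    return json.loads(out.stdout)

def daily_report():
    payload = {
        "host": socket.gethostname(),
        "drives": [collect_smart(dev) for dev in sorted(glob.glob("/dev/sd?"))],
    }
    req = urllib.request.Request(
        REPORT_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=30)

if __name__ == "__main__":
    daily_report()   # run from cron or a systemd timer

Retries, queueing while offline, and receiving MTTF predictions back
are all things a real agent would need, but the core loop really is
that small.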

Currently my thinking is to build our collection method based on the
Backblaze data set so that we can use it to train our model and build
on it going forward. If this sounds like a project you would like to be
involved in (especially if you're from Backblaze!) please let me know.
I think a first pass of the agent should be something we can build in
a couple of afternoons to start testing with a small pilot group that
we already have available.
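
For a sense of what training against the Backblaze CSVs could look
like, here is a rough sketch (the column names follow the published
data set; the file path, the handful of SMART attributes, and the
choice of model are just assumptions to make the idea concrete):

# Rough sketch: fit a failure predictor on the Backblaze daily-snapshot CSVs.
# Assumes the published CSVs have been concatenated into one file; a real
# model would need care with class imbalance and time-based validation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cols = ["failure", "smart_5_raw", "smart_187_raw", "smart_188_raw",
        "smart_197_raw", "smart_198_raw"]
df = pd.read_csv("backblaze_snapshots.csv", usecols=cols).dropna()

X = df.drop(columns=["failure"])
y = df["failure"]          # 1 on the day a drive is reported failed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced")
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))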

Happy to entertain any thoughts or feedback that people might have. Thanks!

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


* Re: Help build a drive reliability service!
From: John Spray @ 2017-05-24 19:35 UTC
  To: Patrick McGarry; +Cc: Ceph Devel

On Wed, May 24, 2017 at 7:57 PM, Patrick McGarry <pmcgarry@redhat.com> wrote:
> Hey cephers,
>
> Just wanted to share the genesis of a new community project that could
> use a few helping hands (and any amount of feedback/discussion that
> you might like to offer).
>
> As a bit of backstory, around 2013 the Backblaze folks started
> publishing statistics about hard drive reliability from within their
> data center for the world to consume. This included things like model,
> make, failure state, and SMART data. If you would like to view the
> Backblaze data set, you can find it at:
>
> https://www.backblaze.com/b2/hard-drive-test-data.html
>
> While most major cloud providers are doing this for themselves
> internally, we would like to replicate/enhance this effort across a
> much wider segment of the population as a free service.  I think we
> have a pretty good handle on the server/platform side of things, and a
> couple of people who have expressed interest in building the
> reliability model (although we could always use more!), what we really
> need is a passionate volunteer who would like to come forward to write
> the agent that sits on the drives, aggregates data, and submits daily
> stats reports via an API (and potentially receives information back as
> results are calculated about MTTF or potential to fail in the next
> 24-48 hrs).
>
> Currently my thinking is to build our collection method based on the
> Backblaze data set so that we can use it to train our model and build
> from going forward. If this sounds like a project you would like to be
> involved in (especially if you're from Backblaze!) please let me know.
> I think a first pass of the agent should be something we can build in
> a couple of afternoons to start testing with a small pilot group that
> we already have available.

I happen to have already written (some time ago) an agent that
collects SMART data and posts it to a web service.  It's in golang and
links with a crudely hacked version of smartmontools to gather the
stats.

Any interest?  (hopefully I can find the code...)

John

>
> Happy to entertain any thoughts or feedback that people might have. Thanks!
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Help build a drive reliability service!
From: Patrick McGarry @ 2017-05-24 19:37 UTC
  To: John Spray; +Cc: Ceph Devel

Hey John,

That would definitely be a great place to start from if you can find
it. I can carve out a place in the Ceph GitHub org to push it to so we
can all poke at it a bit. Thanks!


On Wed, May 24, 2017 at 3:35 PM, John Spray <jspray@redhat.com> wrote:
> On Wed, May 24, 2017 at 7:57 PM, Patrick McGarry <pmcgarry@redhat.com> wrote:
>> Hey cephers,
>>
>> Just wanted to share the genesis of a new community project that could
>> use a few helping hands (and any amount of feedback/discussion that
>> you might like to offer).
>>
>> As a bit of backstory, around 2013 the Backblaze folks started
>> publishing statistics about hard drive reliability from within their
>> data center for the world to consume. This included things like model,
>> make, failure state, and SMART data. If you would like to view the
>> Backblaze data set, you can find it at:
>>
>> https://www.backblaze.com/b2/hard-drive-test-data.html
>>
>> While most major cloud providers are doing this for themselves
>> internally, we would like to replicate/enhance this effort across a
>> much wider segment of the population as a free service.  I think we
>> have a pretty good handle on the server/platform side of things, and a
>> couple of people who have expressed interest in building the
>> reliability model (although we could always use more!), what we really
>> need is a passionate volunteer who would like to come forward to write
>> the agent that sits on the drives, aggregates data, and submits daily
>> stats reports via an API (and potentially receives information back as
>> results are calculated about MTTF or potential to fail in the next
>> 24-48 hrs).
>>
>> Currently my thinking is to build our collection method based on the
>> Backblaze data set so that we can use it to train our model and build
>> from going forward. If this sounds like a project you would like to be
>> involved in (especially if you're from Backblaze!) please let me know.
>> I think a first pass of the agent should be something we can build in
>> a couple of afternoons to start testing with a small pilot group that
>> we already have available.
>
> I happen to already have written (some time ago) an agent that
> collects smart data and posts it to a web service.  It's in golang and
> links with a crudely hacked version of smartmontools to gather the
> stats.
>
> Any interest?  (hopefully I can find the code...)
>
> John
>
>>
>> Happy to entertain any thoughts or feedback that people might have. Thanks!
>>
>> --
>>
>> Best Regards,
>>
>> Patrick McGarry
>> Director Ceph Community || Red Hat
>> http://ceph.com  ||  http://community.redhat.com
>> @scuttlemonkey || @ceph
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


* Re: Help build a drive reliability service!
From: Dan van der Ster @ 2017-06-14 15:37 UTC
  To: Patrick McGarry; +Cc: Ceph Devel, Ceph-User

Hi Patrick,

We've just discussed this internally and I wanted to share some notes.

First, there are at least three separate efforts in our IT dept to
collect and analyse SMART data -- it's clearly a popular idea and
simple to implement, but this leads to repetition and begs for a
common, good solution.

One (perhaps trivial) issue is that it is hard to define exactly when
a drive has failed -- it varies depending on the storage system. For
Ceph I would define failure as EIO, which normally correlates with a
drive medium error, but there were other ideas here. So if this is to
be a general-purpose service, the sensor should have a pluggable
failure indicator.
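
Something like the following interface is roughly what I have in mind
-- a sketch only; the two backends shown (a kernel-log grep and the
smartctl overall-health bit) are illustrative examples, not a claim
about how Ceph itself surfaces EIO:

# Sketch of a pluggable failure indicator: each storage system supplies
# its own definition of "failed".
import re
import subprocess
from abc import ABC, abstractmethod

class FailureIndicator(ABC):
    @abstractmethod
    def is_failed(self, device):
        """Return True if the given device should be counted as failed."""

class KernelIOErrorIndicator(FailureIndicator):
    """Illustrative: count a drive as failed once the kernel has logged
    I/O or medium errors against it."""
    def is_failed(self, device):
        name = device.rsplit("/", 1)[-1]          # e.g. "sdb"
        log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
        return bool(re.search(rf"{name}\b.*(I/O error|Medium Error)", log,
                              flags=re.IGNORECASE))

class SmartHealthIndicator(FailureIndicator):
    """Illustrative: count a drive as failed when smartctl's overall
    health assessment fails (exit-status bit 3)."""
    def is_failed(self, device):
        rc = subprocess.run(["smartctl", "-H", device],
                            capture_output=True).returncode
        return bool(rc & 0x08)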

There was also debate about what exactly we could do with a failure
prediction model. Suppose the predictor told us a drive should fail in
one week. We could proactively drain that disk, but then would it
still fail? Will the vendor replace that drive under warranty only if
it was *about to fail*?

Lastly, and more importantly, there is a general hesitation to publish
this kind of data openly, given how negatively it could impact a
manufacturer. Our lab certainly couldn't publish a report saying "here
are the most and least reliable drives". I don't know if anonymising
the data sources would help here, but anyway I'm curious what your
thoughts are on that point. Maybe what can come out of this are the
_components_ of a drive reliability service, which could then be
deployed privately or publicly as appropriate.

Thanks!

Dan




On Wed, May 24, 2017 at 8:57 PM, Patrick McGarry <pmcgarry@redhat.com> wrote:
> Hey cephers,
>
> Just wanted to share the genesis of a new community project that could
> use a few helping hands (and any amount of feedback/discussion that
> you might like to offer).
>
> As a bit of backstory, around 2013 the Backblaze folks started
> publishing statistics about hard drive reliability from within their
> data center for the world to consume. This included things like model,
> make, failure state, and SMART data. If you would like to view the
> Backblaze data set, you can find it at:
>
> https://www.backblaze.com/b2/hard-drive-test-data.html
>
> While most major cloud providers are doing this for themselves
> internally, we would like to replicate/enhance this effort across a
> much wider segment of the population as a free service.  I think we
> have a pretty good handle on the server/platform side of things, and a
> couple of people who have expressed interest in building the
> reliability model (although we could always use more!), what we really
> need is a passionate volunteer who would like to come forward to write
> the agent that sits on the drives, aggregates data, and submits daily
> stats reports via an API (and potentially receives information back as
> results are calculated about MTTF or potential to fail in the next
> 24-48 hrs).
>
> Currently my thinking is to build our collection method based on the
> Backblaze data set so that we can use it to train our model and build
> from going forward. If this sounds like a project you would like to be
> involved in (especially if you're from Backblaze!) please let me know.
> I think a first pass of the agent should be something we can build in
> a couple of afternoons to start testing with a small pilot group that
> we already have available.
>
> Happy to entertain any thoughts or feedback that people might have. Thanks!
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


* Re: Help build a drive reliability service!
From: David Turner @ 2017-06-14 16:09 UTC
  To: Dan van der Ster, Patrick McGarry; +Cc: Ceph Devel, Ceph-User


I understand the concern over annoying drive manufacturers, but if you
have data to back it up, you aren't slandering a drive manufacturer.  If they
don't like the numbers that are found, then they should up their game or at
least request that you put in how your tests negatively affected their
drive endurance.  For instance, WD Red drives are out of warranty just by
being placed in a chassis with more than 4 disks because they aren't rated
for the increased vibration from that many disks in a chassis.

OTOH, if you are testing the drives within the bounds of the drives'
warranty, and not doing anything against the recommendation of the
manufacturer in the test use case (both physical and software), then there
is no slander when you say that drive A outperformed drive B.  I know that
the drives I run at home are not nearly as resilient as the drives that I
use at the office, but I don't put my home cluster through a fraction of
the strain that I do at the office.  The manufacturer knows that their
cheaper drive isn't as resilient as the more robust enterprise drives.
Anyway, I'm sure you guys have thought about all of that and are
_generally_ pretty smart. ;)

In an early warning system that detects a drive that is close to failing,
you could implement a command to migrate off of the disk and then run
non-stop IO on it to finish off the disk to satisfy warranties.
Potentially this could be implemented with the osd daemon via a burn-in
start-up option, where it would be an OSD in the cluster that does not check
in as up, but with a different status, so you can still monitor the health
of the failing drive from a ceph status.  This could also be useful for
people that would like to burn-in their drives, but don't want to dedicate
infrastructure to burning-in new disks before deploying them.  To make this
as easy as possible for the end user/Ceph admin, there could even be a
ceph.conf option so that OSDs which are added to the cluster and have never
been marked in run through a burn-in of X seconds (changeable in the
config, defaulting to 0 so as not to change the default behavior).  I don't
know if this is over-thinking it or adding complexity where it shouldn't
be, but it could be used to get a drive to actually fail so it can be used
for an RMA.  OTOH, for large deployments we would RMA drives in batches and
were never asked to prove that the drive failed.  We would RMA drives off of
medium errors for HDDs and SMART info for SSDs, and of course for full
failures.

On Wed, Jun 14, 2017 at 11:38 AM Dan van der Ster <dan@vanderster.com> wrote:

> Hi Patrick,
>
> We've just discussed this internally and I wanted to share some notes.
>
> First, there are at least three separate efforts in our IT dept to
> collect and analyse SMART data -- its clearly a popular idea and
> simple to implement, but this leads to repetition and begs for a
> common, good solution.
>
> One (perhaps trivial) issue is that it is hard to define exactly when
> a drive has failed -- it varies depending on the storage system. For
> Ceph I would define failure as EIO, which normally correlates with a
> drive medium error, but there were other ideas here. So if this should
> be a general purpose service, the sensor should have a pluggable
> failure indicator.
>
> There was also debate about what exactly we could do with a failure
> prediction model. Suppose the predictor told us a drive should fail in
> one week. We could proactively drain that disk, but then would it
> still fail? Will the vendor replace that drive under warranty only if
> it was *about to fail*?
>
> Lastly, and more importantly, there is a general hesitation to publish
> this kind of data openly, given how negatively it could impact a
> manufacturer. Our lab certainly couldn't publish a report saying "here
> are the most and least reliable drives". I don't know if anonymising
> the data sources would help here, but anyway I'm curious what are your
> thoughts on that point. Maybe what can come out of this are the
> _components_ of a drive reliability service, which could then be
> deployed privately or publicly as appropriate.
>
> Thanks!
>
> Dan
>
>
>
>
> On Wed, May 24, 2017 at 8:57 PM, Patrick McGarry <pmcgarry-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> wrote:
> > Hey cephers,
> >
> > Just wanted to share the genesis of a new community project that could
> > use a few helping hands (and any amount of feedback/discussion that
> > you might like to offer).
> >
> > As a bit of backstory, around 2013 the Backblaze folks started
> > publishing statistics about hard drive reliability from within their
> > data center for the world to consume. This included things like model,
> > make, failure state, and SMART data. If you would like to view the
> > Backblaze data set, you can find it at:
> >
> > https://www.backblaze.com/b2/hard-drive-test-data.html
> >
> > While most major cloud providers are doing this for themselves
> > internally, we would like to replicate/enhance this effort across a
> > much wider segment of the population as a free service.  I think we
> > have a pretty good handle on the server/platform side of things, and a
> > couple of people who have expressed interest in building the
> > reliability model (although we could always use more!), what we really
> > need is a passionate volunteer who would like to come forward to write
> > the agent that sits on the drives, aggregates data, and submits daily
> > stats reports via an API (and potentially receives information back as
> > results are calculated about MTTF or potential to fail in the next
> > 24-48 hrs).
> >
> > Currently my thinking is to build our collection method based on the
> > Backblaze data set so that we can use it to train our model and build
> > from going forward. If this sounds like a project you would like to be
> > involved in (especially if you're from Backblaze!) please let me know.
> > I think a first pass of the agent should be something we can build in
> > a couple of afternoons to start testing with a small pilot group that
> > we already have available.
> >
> > Happy to entertain any thoughts or feedback that people might have.
> Thanks!
> >
> > --
> >
> > Best Regards,
> >
> > Patrick McGarry
> > Director Ceph Community || Red Hat
> > http://ceph.com  ||  http://community.redhat.com
> > @scuttlemonkey || @ceph
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


* Re: [ceph-users] Help build a drive reliability service!
From: Z Will @ 2017-06-16  3:48 UTC
  To: Patrick McGarry; +Cc: Dan van der Ster, Ceph Devel, Ceph-User, David Turner

Hi Patrick:
    I want to ask a very tiny question. How many 9s of durability do you
claim for your storage? And how is it calculated? Based on the data you
provided, have you found a failure model to refine the storage
durability?

On Thu, Jun 15, 2017 at 12:09 AM, David Turner <drakonstein@gmail.com> wrote:
> I understand concern over annoying drive manufacturers, but if you have data
> to back it up you aren't slandering a drive manufacturer.  If they don't
> like the numbers that are found, then they should up their game or at least
> request that you put in how your tests negatively affected their drive
> endurance.  For instance, WD Red drives are out of warranty just by being
> placed in a chassis with more than 4 disks because they aren't rated for the
> increased vibration from that many disks in a chassis.
>
> OTOH, if you are testing the drives within the bounds of the drives
> warranty, and not doing anything against the recommendation of the
> manufacturer in the test use case (both physical and software), then there
> is no slander when you say that drive A outperformed drive B.  I know that
> the drives I run at home are not nearly as resilient as the drives that I
> use at the office, but I don't put my home cluster through a fraction of the
> strain that I do at the office.  The manufacturer knows that their cheaper
> drive isn't as resilient as the more robust enterprise drives.  Anyway, I'm
> sure you guys have thought about all of that and are _generally_ pretty
> smart. ;)
>
> In an early warning system that detects a drive that is close to failing,
> you could implement a command to migrate off of the disk and then run
> non-stop IO on it to finish off the disk to satisfy warranties.  Potentially
> this could be implemented with the osd daemon via a burn-in start-up option.
> Where it can be an OSD in the cluster that does not check in as up, but with
> a different status so you can still monitor the health of the failing drive
> from a ceph status.  This could also be useful for people that would like to
> burn-in their drives, but don't want to dedicate infrastructure to
> burning-in new disks before deploying them.  Making this as easy as possible
> on the end user/ceph admin, there could even be a ceph.conf option for OSDs
> that are added to the cluster and have never been been marked in to run
> through a burn-in of X seconds (changeable in the config and defaults to 0
> as to not change the default behavior).  I don't know if this is
> over-thinking it or adding complexity where it shouldn't be, but it could be
> used to get a drive to fail to use for an RMA.  OTOH, for large deployments
> we would RMA drives in batches and were never asked to prove that the drive
> failed.  We would RMA drives off of medium errors for HDDs and smart info
> for SSDs and of course for full failures.
>
> On Wed, Jun 14, 2017 at 11:38 AM Dan van der Ster <dan@vanderster.com>
> wrote:
>>
>> Hi Patrick,
>>
>> We've just discussed this internally and I wanted to share some notes.
>>
>> First, there are at least three separate efforts in our IT dept to
>> collect and analyse SMART data -- its clearly a popular idea and
>> simple to implement, but this leads to repetition and begs for a
>> common, good solution.
>>
>> One (perhaps trivial) issue is that it is hard to define exactly when
>> a drive has failed -- it varies depending on the storage system. For
>> Ceph I would define failure as EIO, which normally correlates with a
>> drive medium error, but there were other ideas here. So if this should
>> be a general purpose service, the sensor should have a pluggable
>> failure indicator.
>>
>> There was also debate about what exactly we could do with a failure
>> prediction model. Suppose the predictor told us a drive should fail in
>> one week. We could proactively drain that disk, but then would it
>> still fail? Will the vendor replace that drive under warranty only if
>> it was *about to fail*?
>>
>> Lastly, and more importantly, there is a general hesitation to publish
>> this kind of data openly, given how negatively it could impact a
>> manufacturer. Our lab certainly couldn't publish a report saying "here
>> are the most and least reliable drives". I don't know if anonymising
>> the data sources would help here, but anyway I'm curious what are your
>> thoughts on that point. Maybe what can come out of this are the
>> _components_ of a drive reliability service, which could then be
>> deployed privately or publicly as appropriate.
>>
>> Thanks!
>>
>> Dan
>>
>>
>>
>>
>> On Wed, May 24, 2017 at 8:57 PM, Patrick McGarry <pmcgarry@redhat.com>
>> wrote:
>> > Hey cephers,
>> >
>> > Just wanted to share the genesis of a new community project that could
>> > use a few helping hands (and any amount of feedback/discussion that
>> > you might like to offer).
>> >
>> > As a bit of backstory, around 2013 the Backblaze folks started
>> > publishing statistics about hard drive reliability from within their
>> > data center for the world to consume. This included things like model,
>> > make, failure state, and SMART data. If you would like to view the
>> > Backblaze data set, you can find it at:
>> >
>> > https://www.backblaze.com/b2/hard-drive-test-data.html
>> >
>> > While most major cloud providers are doing this for themselves
>> > internally, we would like to replicate/enhance this effort across a
>> > much wider segment of the population as a free service.  I think we
>> > have a pretty good handle on the server/platform side of things, and a
>> > couple of people who have expressed interest in building the
>> > reliability model (although we could always use more!), what we really
>> > need is a passionate volunteer who would like to come forward to write
>> > the agent that sits on the drives, aggregates data, and submits daily
>> > stats reports via an API (and potentially receives information back as
>> > results are calculated about MTTF or potential to fail in the next
>> > 24-48 hrs).
>> >
>> > Currently my thinking is to build our collection method based on the
>> > Backblaze data set so that we can use it to train our model and build
>> > from going forward. If this sounds like a project you would like to be
>> > involved in (especially if you're from Backblaze!) please let me know.
>> > I think a first pass of the agent should be something we can build in
>> > a couple of afternoons to start testing with a small pilot group that
>> > we already have available.
>> >
>> > Happy to entertain any thoughts or feedback that people might have.
>> > Thanks!
>> >
>> > --
>> >
>> > Best Regards,
>> >
>> > Patrick McGarry
>> > Director Ceph Community || Red Hat
>> > http://ceph.com  ||  http://community.redhat.com
>> > @scuttlemonkey || @ceph
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


* RE: [ceph-users] Help build a drive reliability service!
From: Allen Samuels @ 2017-06-16 15:48 UTC
  To: Z Will, Patrick McGarry
  Cc: Dan van der Ster, Ceph Devel, Ceph-User, David Turner

The question is tiny, the answer is Yuge ;-)

Ceph by itself can't quote a specific durability. The actual durability is a combination of the HW that you use, the failure scenarios that you're looking at, and the specific configuration of Ceph. Ceph provides a toolkit that allows you to go beyond the durability of a specific piece of HW and synthesize a system-level durability that's much (or, more often, much much much much much) better.

To get a true system-level durability number you have to combine all of the different failure-mode probabilities into an aggregate number. Usually failure modes are modeled as uncorrelated events, which makes the math simple and is accurate enough for most purposes.

There are LOTS of failure modes (cluster-level, drive-level and sector-level failure modes all have scenarios that lead to data loss and hence impact system-level durability). But this thread is focused on drive-level events, so we'll confine ourselves to those.

For a simple case like 2x replication (i.e., you have two copies of the data lying around -- RAID-1), you're looking at the case where you get a first drive failure and then a second drive failure BEFORE you've had a chance to rebuild/recover from the first drive failure. This means that you actually have two input variables to the computation: the drive failure rate (typically quoted as AFR -- annual/average failure rate, the percentage of drives that will fail within a calendar year) AND the recovery time period. However, this is the per-drive durability; since you want the cluster-level durability, you have to scale this up by the total number of drives in the system (since ANY drive failure ANYWHERE in the system presumably is a cluster-level durability failure, and the events are uncorrelated).

The durability then becomes: "What are the odds that I'll have a second drive failure WHILE I'm still rebuilding the first drive, TIMES the number of drives?" -- which is simply AFR * rebuild time * # of drives (with suitable unit conversions, of course).
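
As a worked illustration of that formula, here is a small sketch (every number below is an assumption picked for the example, not a measurement):

import math

# Back-of-envelope durability estimate for 2x replication, using the
# simplified expression above (uncorrelated failures, constant AFR).
HOURS_PER_YEAR = 24 * 365

afr = 0.02                # assumed 2% annual failure rate per drive
n_drives = 1000           # drives in the cluster
drive_tb = 8.0            # raw capacity per drive, TB
fill_fraction = 0.5       # Ceph only rebuilds 'live' data
rebuild_mb_per_s = 50.0   # assumed throttled recovery rate
operator_delay_h = 1.0    # time before recovery actually starts

# Window of vulnerability: delay + live data / recovery rate
live_bytes = drive_tb * 1e12 * fill_fraction
rebuild_hours = operator_delay_h + live_bytes / (rebuild_mb_per_s * 1e6) / 3600

# AFR * rebuild time * number of drives (rebuild time expressed in years)
p_loss_per_year = afr * (rebuild_hours / HOURS_PER_YEAR) * n_drives

print(f"rebuild window: {rebuild_hours:.1f} h")
print(f"estimated annual data-loss probability: {p_loss_per_year:.3f}")
print(f"'nines' of durability: {-math.log10(p_loss_per_year):.1f}")

With those made-up inputs the window of vulnerability is about a day and the estimate lands around 5% per year (roughly 1.3 nines), which is why you either shrink the window or add more redundancy.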

One warning: AFR isn't a constant number ;-) -- all drives (SSD or HDD) are subject to wear-out. In a long-running cluster you will typically have a population of drives of varying age, and you might need to factor that into your equations based on your expected expansion, tech refresh, drive retirement policies, etc.

Rebuild time can be tricky. First you have to include the time from when the first drive fails until you actually start the rebuild (is this a manually initiated process? How long before somebody actually swaps the drive and pushes the button to start, or do you have hot standbys?). Then you have to factor in the amount of data to be rebuilt (Ceph only rebuilds 'live' data, not the whole drive, so if your cluster is 50% full you benefit from only rebuilding half of the drive). Finally, you have to figure in the rebuild rate. The last item is often a problem: the greater the rebuild rate, the less performance is available for normal operations. Essentially you have to overprovision your cluster's performance level to be able to perform rebuilds at a reasonable rate [in the extreme, imagine if it took a YEAR to rebuild a drive...].

Triple replication or +2 erasure coding have essentially the same math (with potentially different rebuild rates :-)): what's the probability that you'll have three drive failures within the window of vulnerability, which is a function of the rebuild time?

In short, by overprovisioning on performance and raw capacity (replication/erasure coding) you can achieve arbitrarily high levels of insurance (durability) against this failure mode. It's a function of how big your wallet is....







Allen Samuels  
R&D Engineering Fellow 

Western Digital® 
Email:  allen.samuels@wdc.com 
Office:  +1-408-801-7030
Mobile: +1-408-780-6416 

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Z Will
Sent: Thursday, June 15, 2017 11:48 PM
To: Patrick McGarry <pmcgarry@redhat.com>
Cc: Dan van der Ster <dan@vanderster.com>; Ceph Devel <ceph-devel@vger.kernel.org>; Ceph-User <ceph-users@ceph.com>; David Turner <drakonstein@gmail.com>
Subject: Re: [ceph-users] Help build a drive reliability service!

Hi Patrick:
    I want to ask a  very tiny question. How much 9s do you claim your storage durability? And how is it calculated ? Based on the data you provided , have you find some failure model to refine the storage durability ?

On Thu, Jun 15, 2017 at 12:09 AM, David Turner <drakonstein@gmail.com> wrote:
> I understand concern over annoying drive manufacturers, but if you 
> have data to back it up you aren't slandering a drive manufacturer.  
> If they don't like the numbers that are found, then they should up 
> their game or at least request that you put in how your tests 
> negatively affected their drive endurance.  For instance, WD Red 
> drives are out of warranty just by being placed in a chassis with more 
> than 4 disks because they aren't rated for the increased vibration from that many disks in a chassis.
>
> OTOH, if you are testing the drives within the bounds of the drives 
> warranty, and not doing anything against the recommendation of the 
> manufacturer in the test use case (both physical and software), then 
> there is no slander when you say that drive A outperformed drive B.  I 
> know that the drives I run at home are not nearly as resilient as the 
> drives that I use at the office, but I don't put my home cluster 
> through a fraction of the strain that I do at the office.  The 
> manufacturer knows that their cheaper drive isn't as resilient as the 
> more robust enterprise drives.  Anyway, I'm sure you guys have thought 
> about all of that and are _generally_ pretty smart. ;)
>
> In an early warning system that detects a drive that is close to 
> failing, you could implement a command to migrate off of the disk and 
> then run non-stop IO on it to finish off the disk to satisfy 
> warranties.  Potentially this could be implemented with the osd daemon via a burn-in start-up option.
> Where it can be an OSD in the cluster that does not check in as up, 
> but with a different status so you can still monitor the health of the 
> failing drive from a ceph status.  This could also be useful for 
> people that would like to burn-in their drives, but don't want to 
> dedicate infrastructure to burning-in new disks before deploying them.  
> Making this as easy as possible on the end user/ceph admin, there 
> could even be a ceph.conf option for OSDs that are added to the 
> cluster and have never been been marked in to run through a burn-in of 
> X seconds (changeable in the config and defaults to 0 as to not change 
> the default behavior).  I don't know if this is over-thinking it or 
> adding complexity where it shouldn't be, but it could be used to get a 
> drive to fail to use for an RMA.  OTOH, for large deployments we would 
> RMA drives in batches and were never asked to prove that the drive 
> failed.  We would RMA drives off of medium errors for HDDs and smart info for SSDs and of course for full failures.
>
> On Wed, Jun 14, 2017 at 11:38 AM Dan van der Ster <dan@vanderster.com>
> wrote:
>>
>> Hi Patrick,
>>
>> We've just discussed this internally and I wanted to share some notes.
>>
>> First, there are at least three separate efforts in our IT dept to 
>> collect and analyse SMART data -- its clearly a popular idea and 
>> simple to implement, but this leads to repetition and begs for a 
>> common, good solution.
>>
>> One (perhaps trivial) issue is that it is hard to define exactly when 
>> a drive has failed -- it varies depending on the storage system. For 
>> Ceph I would define failure as EIO, which normally correlates with a 
>> drive medium error, but there were other ideas here. So if this 
>> should be a general purpose service, the sensor should have a 
>> pluggable failure indicator.
>>
>> There was also debate about what exactly we could do with a failure 
>> prediction model. Suppose the predictor told us a drive should fail 
>> in one week. We could proactively drain that disk, but then would it 
>> still fail? Will the vendor replace that drive under warranty only if 
>> it was *about to fail*?
>>
>> Lastly, and more importantly, there is a general hesitation to 
>> publish this kind of data openly, given how negatively it could 
>> impact a manufacturer. Our lab certainly couldn't publish a report 
>> saying "here are the most and least reliable drives". I don't know if 
>> anonymising the data sources would help here, but anyway I'm curious 
>> what are your thoughts on that point. Maybe what can come out of this 
>> are the _components_ of a drive reliability service, which could then 
>> be deployed privately or publicly as appropriate.
>>
>> Thanks!
>>
>> Dan
>>
>>
>>
>>
>> On Wed, May 24, 2017 at 8:57 PM, Patrick McGarry 
>> <pmcgarry@redhat.com>
>> wrote:
>> > Hey cephers,
>> >
>> > Just wanted to share the genesis of a new community project that 
>> > could use a few helping hands (and any amount of 
>> > feedback/discussion that you might like to offer).
>> >
>> > As a bit of backstory, around 2013 the Backblaze folks started 
>> > publishing statistics about hard drive reliability from within 
>> > their data center for the world to consume. This included things 
>> > like model, make, failure state, and SMART data. If you would like 
>> > to view the Backblaze data set, you can find it at:
>> >
>> > https://www.backblaze.com/b2/hard-drive-test-data.html
>> >
>> > While most major cloud providers are doing this for themselves 
>> > internally, we would like to replicate/enhance this effort across a 
>> > much wider segment of the population as a free service.  I think we 
>> > have a pretty good handle on the server/platform side of things, 
>> > and a couple of people who have expressed interest in building the 
>> > reliability model (although we could always use more!), what we 
>> > really need is a passionate volunteer who would like to come 
>> > forward to write the agent that sits on the drives, aggregates 
>> > data, and submits daily stats reports via an API (and potentially 
>> > receives information back as results are calculated about MTTF or 
>> > potential to fail in the next
>> > 24-48 hrs).
>> >
>> > Currently my thinking is to build our collection method based on 
>> > the Backblaze data set so that we can use it to train our model and 
>> > build from going forward. If this sounds like a project you would 
>> > like to be involved in (especially if you're from Backblaze!) please let me know.
>> > I think a first pass of the agent should be something we can build 
>> > in a couple of afternoons to start testing with a small pilot group 
>> > that we already have available.
>> >
>> > Happy to entertain any thoughts or feedback that people might have.
>> > Thanks!
>> >
>> > --
>> >
>> > Best Regards,
>> >
>> > Patrick McGarry
>> > Director Ceph Community || Red Hat
>> > http://ceph.com  ||  http://community.redhat.com @scuttlemonkey || 
>> > @ceph _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>