* RAID Class Drives`
@ 2010-03-17 13:48 Randy Terbush
  2010-03-18 16:45 ` Joachim Otahal
  2010-03-18 19:43 ` Randy Terbush
  0 siblings, 2 replies; 20+ messages in thread
From: Randy Terbush @ 2010-03-17 13:48 UTC (permalink / raw)
  To: linux-raid

Greetings RAIDers,

Apologies if this topic has been thrashed here before. Google is not
showing me much love on the topic and that which I have found does not
convey consensus. So I am coming to the experts to get the verdict.

Recent event: I spent a fair amount of time on the line with Seagate
support yesterday who informed me that their desktop drives will not
work in a RAID array. Now I may have been living in a cave for the
past 20 years, but I always had a modem.

As I started to dig into this a bit more, looking for info on TLER,
ERC, etc., my understanding is that these "RAID class" drives simply
don't attempt the same depth of error recovery as the "desktop"
alternative, and instead report back to the RAID controller immediately
rather than dawdling with fixing the problem themselves.

If this is true, then I can understand where this might cause a RAID
system some problems. However, I do not understand why the RAID system
cannot detect the type of drive it is dealing with and either disable
the behavior on the drive or allow more time for the drive to respond
before kicking it out of the array.
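
(For reference, both knobs involved here can at least be inspected from
Linux without changing anything - a rough sketch, assuming a smartctl
build with SCT ERC support; /dev/sda is only an example device:)

  # ask the drive for its current SCT Error Recovery Control timers
  smartctl -l scterc /dev/sda

  # the kernel's per-device command timeout in seconds (default 30);
  # when it expires, the command is aborted and md sees an I/O error
  cat /sys/block/sda/device/timeout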

Just to give some background on how I got to this point, but not to
distract from the main question, here is where I have been...

Over past 5 years, have been struggling with a 4 drive mdraid array
configured for RAID5. This is not a busy system by any stretch. Just a
media server for my own personal use. Started out using the SATA
headers on the MB. Gave up and bought a cheapy hardware RAID
controller. Thought better of that decision and went back to software
RAID using the hardware RAID controller as a SATA expansion card. Gave
up on that and went back to the SATA headers on the MB (had replaced
the MB along the way).

Over that period, threw out original 4 drives and replaced them with
newer bigger Seagate Barracudas. Bought snazzier and snazzier cables
along the way. Discovered a firmware upgrade for the Barracudas that I
thought had recently fixed the problem.

After speaking with Seagate yesterday, I booted off of the SeaTools
image and ran tests on all drives. The two suspect drives did have
errors that were corrected by the test software. But alas, attempting
to reassemble this array fails, dropping one drive to failed spare
status and another to spare which has been the behavior I have been
fighting for years.

So the question becomes, do I try it again with the replacement drives
that Seagate is sending me, or do I hang them in my "desktop" and
spend the money for RAID Class drives? (I've grown tired of this
learning experience and would like to just have a dependable storage
system)

And to tag onto that question, is there any reason why mdraid cannot
detect these "lesser" drives and behave differently?

Why would these drives be developing errors as a result of their
torturous experience in a RAID array?

Thanks for any light you can shed on this issue.

-Randy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-17 13:48 RAID Class Drives` Randy Terbush
@ 2010-03-18 16:45 ` Joachim Otahal
  2010-03-19  8:15   ` John Robinson
  2010-03-18 19:43 ` Randy Terbush
  1 sibling, 1 reply; 20+ messages in thread
From: Joachim Otahal @ 2010-03-18 16:45 UTC (permalink / raw)
  To: Randy Terbush; +Cc: linux-raid

Randy Terbush wrote:
> So the question becomes, do I try it again with the replacement drives
> that Seagate is sending me, or do I hang them in my "desktop" and
> spend the money for RAID Class drives? (I've grown tired of this
> learning experience and would like to just have a dependable storage
> system)
>    
Desktop class drives are usually enough. On today's motherboards the
chipset SATA is enough too. You should take care of the temperature of
the drives,
30°C to 35°C is preferred, above 35°C the lifespan goes down, over 40°C 
rapidly down.
Do you have a regular checkarray interval? Like this one from Debian
(monthly, on the first Sunday):
57 0 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
Do you have a regular SMART check? Don't only check the SMART status; keep
a history of some values which change over time, most notably the
Reallocated Sector Count. If that one changes every week on one drive
(or even faster), it is time to take that drive out of the array.
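
A minimal sketch of both ideas (device names and paths are only examples,
adjust them to your own setup):

  # trigger the same scrub that checkarray runs, for a single array
  echo check > /sys/block/md0/md/sync_action

  # append a dated snapshot of the interesting SMART counters to a log,
  # so changes over time stay visible
  for d in /dev/sd[abcd]; do
      echo "$(date +%F) $d: $(smartctl -A $d | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector')"
  done >> /var/log/smart-history.log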

> Why would these drives be developing errors as a result of their
> torturous experience in a RAID array?
>    
I don't think RAID is more stress than normal use at home; it depends on
how long they run, how often they spin up, and how hot they get.
As for your description of the error behaviour: this is correct.
RAID SATA drives don't spend a minute or more trying to read a possibly
failing sector; most only try for less than 5 seconds to re-read a
sector. Their mechanics also have tighter tolerances, which allows the
higher MTBF values.

kind regards,

Joachim Otahal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-17 13:48 RAID Class Drives` Randy Terbush
  2010-03-18 16:45 ` Joachim Otahal
@ 2010-03-18 19:43 ` Randy Terbush
  2010-04-18 12:11   ` CoolCold
  1 sibling, 1 reply; 20+ messages in thread
From: Randy Terbush @ 2010-03-18 19:43 UTC (permalink / raw)
  To: linux-raid

Let me follow-up to share what I have learned and what I have managed
to do to get this array to re-assemble.

I've received several responses from people telling me that they don't
have any problem with their "desktop class" drives being dropped from
the array. Congratulations to you all. I suspect that there may be a
common theme in the drives that you are using: they may have different error
correction, may be smaller than 500GB, or may not support the SCT
command set.

One of the first responses I received privately was from a gentleman
who gave me the hint I needed regarding the SCT-ERC command. He
shared my frustration and actually presents a very compelling example
where this is a big problem. He works to support a commercial NAS
product which uses "desktop" class drives and fights this problem
continually.

With this new knowledge gained I started digging a bit more and ran
across a set of patches to smartmontools which allow editing the values
for SCT-ERC. You can find that source here:
http://www.csc.liv.ac.uk/~greg/projects/erc/
FWIW, the Seagate Barracudas that I am running have non-volatile
storage for this variable. Not that I am recommending Seagate. Far
from it....

I can confirm that all of my drives had this value "disabled" which
means it allows the drive to go off and take as much time as it needs
to fix its own problem.

I set the values to 7 seconds for the 4 drives in my array and
attempted to rebuild the array. Unfortunately, it failed again. So I
reset the values to 5 seconds and fired off the rebuild once again and
managed to get through the rebuild process.
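
For anyone wanting to reproduce this, the commands look roughly like the
following (a sketch; it assumes a smartctl with the SCT ERC patches, or a
later smartmontools release that includes the feature, and the timers are
given in tenths of a second - /dev/sd[abcd] are only example names):

  # show the current read/write ERC timers of one drive
  smartctl -l scterc /dev/sda

  # set both timers to 7.0 seconds on all four members
  for d in /dev/sd[abcd]; do smartctl -l scterc,70,70 $d; done

  # or to 5.0 seconds, which is what finally worked here
  for d in /dev/sd[abcd]; do smartctl -l scterc,50,50 $d; done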

Now this solution does not satisfy the situation where you are
hot-plugging drives, but it at least gets me over my hurdle.

Seems it would be a nice improvement to md to actually detect the
SCT-ERC setting, warn when it cannot change the value and offer to set
these to reasonable values for the RAID application.
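
In the meantime the setting can simply be reapplied at every boot, which
also covers drives that only hold it in volatile memory - a rough sketch
for rc.local or an init script (device names are again just examples):

  # re-arm error recovery control on all array members at boot time;
  # drives without SCT ERC support will simply print an error here
  for d in /dev/sd[abcd]; do
      smartctl -l scterc,70,70 "$d"
  done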

Here's to happy storage...

On Wed, Mar 17, 2010 at 7:48 AM, Randy Terbush <randy@terbush.org> wrote:
> Greetings RAIDers,
>
> Apologies if this topic has been thrashed here before. Google is not
> showing me much love on the topic and that which I have found does not
> convey consensus. So I am coming to the experts to get the verdict.
>
> Recent event: I spent a fair amount of time on the line with Seagate
> support yesterday who informed me that their desktop drives will not
> work in a RAID array. Now I may have been living in a cave for the
> past 20 years, but I always had a modem.
>
> As I started to dig into this a bit more, looking for info on TLER,
> ERC, etc., my understanding is that these "RAID class" drives simply
> don't attempt the same depth of error recovery as the "desktop"
> alternative, and instead report back to the RAID controller immediately
> rather than dawdling with fixing the problem themselves.
>
> If this is true, then I can understand where this might cause a RAID
> system some problems. However, I do not understand why the RAID system
> cannot detect the type of drive it is dealing with and either disable
> the behavior on the drive or allow more time for the drive to respond
> before kicking it out of the array.
>
> Just to give some background on how I got to this point, but not to
> distract from the main question, here is where I have been...
>
> Over past 5 years, have been struggling with a 4 drive mdraid array
> configured for RAID5. This is not a busy system by any stretch. Just a
> media server for my own personal use. Started out using the SATA
> headers on the MB. Gave up and bought a cheapy hardware RAID
> controller. Thought better of that decision and went back to software
> RAID using the hardware RAID controller as a SATA expansion card. Gave
> up on that and went back to the SATA headers on the MB (had replaced
> the MB along the way).
>
> Over that period, threw out original 4 drives and replaced them with
> newer bigger Seagate Barracudas. Bought snazzier and snazzier cables
> along the way. Discovered a firmware upgrade for the Barracudas that I
> thought had recently fixed the problem.
>
> After speaking with Seagate yesterday, I booted off of the SeaTools
> image and ran tests on all drives. The two suspect drives did have
> errors that were corrected by the test software. But alas, attempting
> to reassemble this array fails, dropping one drive to failed spare
> status and another to spare which has been the behavior I have been
> fighting for years.
>
> So the question becomes, do I try it again with the replacement drives
> that Seagate is sending me, or do I hang them in my "desktop" and
> spend the money for RAID Class drives? (I've grown tired of this
> learning experience and would like to just have a dependable storage
> system)
>
> And to tag onto that question, is there any reason why mdraid cannot
> detect these "lesser" drives and behave differently?
>
> Why would these drives be developing errors as a result of their
> torturous experience in a RAID array?
>
> Thanks for any light you can shed on this issue.
>
> -Randy
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-18 16:45 ` Joachim Otahal
@ 2010-03-19  8:15   ` John Robinson
  2010-03-19 16:43     ` Aryeh Gregor
  2010-03-19 17:53     ` Joachim Otahal
  0 siblings, 2 replies; 20+ messages in thread
From: John Robinson @ 2010-03-19  8:15 UTC (permalink / raw)
  To: Joachim Otahal; +Cc: linux-raid

On 18/03/2010 16:45, Joachim Otahal wrote:
> [...]  You should take care of the temperature of the drives,
> 30°C to 35°C is preferred, above 35°C the lifespan goes down, over 40°C 
> rapidly down.

Do you have a reference for this? Most drives' operating temperature 
range is specified up to 55°C, sometimes higher for enterprise drives, 
without any indication (apart from common sense perhaps) that running 
them this hot reduces lifespan.

Cheers,

John.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-19  8:15   ` John Robinson
@ 2010-03-19 16:43     ` Aryeh Gregor
  2010-03-19 16:53       ` Mattias Wadenstein
                         ` (2 more replies)
  2010-03-19 17:53     ` Joachim Otahal
  1 sibling, 3 replies; 20+ messages in thread
From: Aryeh Gregor @ 2010-03-19 16:43 UTC (permalink / raw)
  To: John Robinson; +Cc: Joachim Otahal, linux-raid

On Fri, Mar 19, 2010 at 4:15 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> Do you have a reference for this? Most drives' operating temperature range
> is specified up to 55°C, sometimes higher for enterprise drives, without any
> indication (apart from common sense perhaps) that running them this hot
> reduces lifespan.

Google's study of >100,000 disks over 9 months or so
<http://labs.google.com/papers/disk_failures.html> suggests that
hotter drives don't fail much more often:

". . . failures do not increase when the average temperature
increases. In fact, there is a clear trend showing that lower
temperatures are associated with higher failure rates.  Only at very
high temperatures is there a slight reversal of this trend." (page 5
of PDF)

"We can conclude that at moderate temperature ranges it is likely that
there are other effects which affect failure rates much more strongly
than temperatures do." (page 6)

They were using SATA and PATA consumer drives, 5400 RPM to 7200 RPM,
80 to 400 GB, put into production in or after 2001 (from page 3).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-19 16:43     ` Aryeh Gregor
@ 2010-03-19 16:53       ` Mattias Wadenstein
  2010-03-19 18:14       ` Joachim Otahal
  2010-03-22  6:55       ` Leslie Rhorer
  2 siblings, 0 replies; 20+ messages in thread
From: Mattias Wadenstein @ 2010-03-19 16:53 UTC (permalink / raw)
  To: Aryeh Gregor; +Cc: John Robinson, Joachim Otahal, linux-raid

On Fri, 19 Mar 2010, Aryeh Gregor wrote:

> On Fri, Mar 19, 2010 at 4:15 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
>> Do you have a reference for this? Most drives' operating temperature range
>> is specified up to 55°C, sometimes higher for enterprise drives, without any
>> indication (apart from common sense perhaps) that running them this hot
>> reduces lifespan.
>
> Google's study of >100,000 disks over 9 months or so
> <http://labs.google.com/papers/disk_failures.html> suggests that
> hotter drives don't fail much more often:
>
> ". . . failures do not increase when the average temperature
> increases. In fact, there is a clear trend showing that lower
> temperatures are associated with higher failure rates.  Only at very
> high temperatures is there a slight reversal of this trend." (page 5
> of PDF)

Do check out figure 5 though; based on that I wouldn't run the drives
hotter than 40-45°C, since it does seem to indicate that hot drives don't
last as long. But then again, neither would I run them at <30°C...

/Mattias Wadenstein

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-19  8:15   ` John Robinson
  2010-03-19 16:43     ` Aryeh Gregor
@ 2010-03-19 17:53     ` Joachim Otahal
  2010-03-20 17:26       ` Bill Davidsen
  1 sibling, 1 reply; 20+ messages in thread
From: Joachim Otahal @ 2010-03-19 17:53 UTC (permalink / raw)
  To: John Robinson; +Cc: linux-raid

John Robinson wrote:
> On 18/03/2010 16:45, Joachim Otahal wrote:
>> [...]  You should take care of the temperature of the drives,
>> 30°C to 35°C is preferred, above 35°C the lifespan goes down, over 
>> 40°C rapidly down.
>
> Do you have a reference for this? Most drives' operating temperature 
> range is specified up to 55°C, sometimes higher for enterprise drives, 
> without any indication (apart from common sense perhaps) that running 
> them this hot reduces lifespan.
>
> Cheers,
>
> John.
>
About half a year ago the German publisher c't did its own testing (or
reported on a big test, I cannot remember which) of what the best
temperature for desktop drives is. The statistics varied from drive to
drive, since some run less than 5°C over room temperature while others
run 15°C or more over room temperature (of course mounted behind a silent
fan which keeps the air moving, no turbine mode).
The result was that 10°C and 15°C are not good for the drives. The
"perfect sweet spot" changes from drive to drive (even within one
manufacturer), but all of them had their sweet spot somewhere around
20°C to 35°C, with variation in the range of measurement error.
Some drives had a higher failure rate at 40°C, while for others 55°C was
no problem at all and showed no real change in the failure rate. Those
last two were the extreme cases.

Some of my drives are 2°C above room temperature, others are 12°C over
room temperature. Since I really take care that none reaches 40°C even in
summer, the failure rate has gone down from "every few months" to once in
the 3 years during which I have really taken care of the drive
temperatures. There are 6 drives currently in use in my private machines,
from 750GB (the hottest of all my drives) up to 1.5 TB; only one of them
shows a gradual change in the SMART values (reallocated sector count),
which means it will probably fail in about 1.5 years if the error rate
stays constant. At work (at least on the two machines 100% under my
control) I have seen the same effect: keep the HDs cool and they will
live long; let them get over 40°C and be ready to replace them soon.

Joachim Otahal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-19 16:43     ` Aryeh Gregor
  2010-03-19 16:53       ` Mattias Wadenstein
@ 2010-03-19 18:14       ` Joachim Otahal
  2010-03-22  6:55       ` Leslie Rhorer
  2 siblings, 0 replies; 20+ messages in thread
From: Joachim Otahal @ 2010-03-19 18:14 UTC (permalink / raw)
  To: Aryeh Gregor; +Cc: John Robinson, linux-raid

Aryeh Gregor wrote:
> On Fri, Mar 19, 2010 at 4:15 AM, John Robinson
> <john.robinson@anonymous.org.uk>  wrote:
>    
>> Do you have a reference for this? Most drives' operating temperature range
>> is specified up to 55°C, sometimes higher for enterprise drives, without any
>> indication (apart from common sense perhaps) that running them this hot
>> reduces lifespan.
>>      
> Google's study of>100,000 disks over 9 months or so
> <http://labs.google.com/papers/disk_failures.html>  suggests that
> hotter drives don't fail much more often:
>    
Thanks for the link.
That study was referred to as a comparison to the current situation. But
Google only tested drives up to 400 GB in that statistic, and I DO remember
that my old 160GB to 500GB drives were all a lot hotter than any of my
current drives. Those Samsung 1TB drives only reached 35°C during the last
hot summer, but I did let the fan rotate faster during the summer.

Current real world data of my drives (Windows main machine, room 
temperature is 23°C right now):
SAMSUNG 1 TB 27°C (HD103UJ)
WD 1 TB 30°C (WD10 EACS-00ZJBO)
WD 750 GB 34°C (WD75 00AACS-00ZJBO)
SAMSUNG 1 TB 25°C (HD103UJ)

Linux Server (mirrored drives):
SEAGATE 1.5 TB 34°C (ST31500341AS)
SEAGATE 1.5 TB 34°C (ST31500341AS) <- this one will probably fail in one
to one and a half years if the realloc-sector count continues to develop
this way.

Joachim Otahal


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-19 17:53     ` Joachim Otahal
@ 2010-03-20 17:26       ` Bill Davidsen
  2010-03-21 16:14         ` Eric Shubert
  0 siblings, 1 reply; 20+ messages in thread
From: Bill Davidsen @ 2010-03-20 17:26 UTC (permalink / raw)
  To: Joachim Otahal; +Cc: John Robinson, linux-raid

Joachim Otahal wrote:
> John Robinson wrote:
>> On 18/03/2010 16:45, Joachim Otahal wrote:
>>> [...]  You should take care of the temperature of the drives,
>>> 30°C to 35°C is preferred, above 35°C the lifespan goes down, over 
>>> 40°C rapidly down.
>>
>> Do you have a reference for this? Most drives' operating temperature 
>> range is specified up to 55°C, sometimes higher for enterprise 
>> drives, without any indication (apart from common sense perhaps) that 
>> running them this hot reduces lifespan.
>>
>> Cheers,
>>
>> John.
>>
> About half a year ago the German publisher c't did its own testing (or
> reported on a big test, I cannot remember which) of what the best
> temperature for desktop drives is. The statistics varied from drive to
> drive, since some run less than 5°C over room temperature while others
> run 15°C or more over room temperature (of course mounted behind a
> silent fan which keeps the air moving, no turbine mode).
> The result was that 10°C and 15°C are not good for the drives. The
> "perfect sweet spot" changes from drive to drive (even within one
> manufacturer), but all of them had their sweet spot somewhere around
> 20°C to 35°C, with variation in the range of measurement error.
> Some drives had a higher failure rate at 40°C, while for others 55°C was
> no problem at all and showed no real change in the failure rate. Those
> last two were the extreme cases.
>
> Some of my drives are 2°C above room temperature, others are 12°C over
> room temperature. Since I really take care that none reaches 40°C even
> in summer, the failure rate has gone down from "every few months" to
> once in the 3 years during which I have really taken care of the drive
> temperatures. There are 6 drives currently in use in my private
> machines, from 750GB (the hottest of all my drives) up to 1.5 TB; only
> one of them shows a gradual change in the SMART values (reallocated
> sector count), which means it will probably fail in about 1.5 years if
> the error rate stays constant. At work (at least on the two machines
> 100% under my control) I have seen the same effect: keep the HDs cool
> and they will live long; let them get over 40°C and be ready to replace
> them soon.

40°C is a good target, readily available to people in the Arctic. It 
requires a lot of cooling to do it in normal climates where the ambient 
may be mid to high 40s. Fortunately my experience looks more like 
Google's, as long as you move enough air over the drive to avoid hot 
spots they seem to do well, hitting 43-46 much of the time. If I replace 
them because they're obsolete and working, they lasted long enough. 
Perhaps being "always on" is part of longevity, the ones I have on for 
5-6 years seldom fail, the desktop cycled daily maybe half that.

I do note that the WD drives run about 8°C cooler than Seagate. That's 
the "black" drive, I guess, the "green" drives would run cooler, based 
on power use. I will switch to them next build.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-20 17:26       ` Bill Davidsen
@ 2010-03-21 16:14         ` Eric Shubert
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Shubert @ 2010-03-21 16:14 UTC (permalink / raw)
  To: linux-raid

Bill Davidsen wrote:
> Joachim Otahal wrote:
>> John Robinson wrote:
>>> On 18/03/2010 16:45, Joachim Otahal wrote:
>>>> [...]  You should take care of the temperature of the drives,
>>>> 30°C to 35°C is preferred, above 35°C the lifespan goes down, over 
>>>> 40°C rapidly down.
>>>
>>> Do you have a reference for this? Most drives' operating temperature 
>>> range is specified up to 55°C, sometimes higher for enterprise 
>>> drives, without any indication (apart from common sense perhaps) that 
>>> running them this hot reduces lifespan.
>>>
>>> Cheers,
>>>
>>> John.
>>>
>> About half a year ago the German publisher c't did its own testing (or
>> reported on a big test, I cannot remember which) of what the best
>> temperature for desktop drives is. The statistics varied from drive to
>> drive, since some run less than 5°C over room temperature while others
>> run 15°C or more over room temperature (of course mounted behind a
>> silent fan which keeps the air moving, no turbine mode).
>> The result was that 10°C and 15°C are not good for the drives. The
>> "perfect sweet spot" changes from drive to drive (even within one
>> manufacturer), but all of them had their sweet spot somewhere around
>> 20°C to 35°C, with variation in the range of measurement error.
>> Some drives had a higher failure rate at 40°C, while for others 55°C was
>> no problem at all and showed no real change in the failure rate. Those
>> last two were the extreme cases.
>>
>> Some of my drives are 2°C above room temperature, others are 12°C over
>> room temperature. Since I really take care that none reaches 40°C even
>> in summer, the failure rate has gone down from "every few months" to
>> once in the 3 years during which I have really taken care of the drive
>> temperatures. There are 6 drives currently in use in my private
>> machines, from 750GB (the hottest of all my drives) up to 1.5 TB; only
>> one of them shows a gradual change in the SMART values (reallocated
>> sector count), which means it will probably fail in about 1.5 years if
>> the error rate stays constant. At work (at least on the two machines
>> 100% under my control) I have seen the same effect: keep the HDs cool
>> and they will live long; let them get over 40°C and be ready to replace
>> them soon.
> 
> 40°C is a good target, readily available to people in the Arctic. It 
> requires a lot of cooling to do it in normal climates where the ambient 
> may be mid to high 40s. Fortunately my experience looks more like 
> Google's, as long as you move enough air over the drive to avoid hot 
> spots they seem to do well, hitting 43-46 much of the time. If I replace 
> them because they're obsolete and working, they lasted long enough. 
> Perhaps being "always on" is part of longevity, the ones I have on for 
> 5-6 years seldom fail, the desktop cycled daily maybe half that.
> 
> I do note that the WD drives run about 8°C cooler than Seagate. That's 
> the "black" drive, I guess, the "green" drives would run cooler, based 
> on power use. I will switch to them next build.
> 

I find this whole discussion of drives interesting. Thanks to everyone 
for their input.

A thought occurred to me today. Realizing that the drives are generating 
heat, *if* it's true that drives which run hotter have a shorter 
lifetime (which is debatable), it's possible that the cause of heat 
generation (friction?) is the contributing factor to the shorter 
lifetime, and not the heat itself. IOW, if a drive runs hot, removing 
the heat more quickly (reducing its operating temp) wouldn't necessarily 
increase the drive's lifetime.

Just a thought.

-- 
-Eric 'shubes'


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: RAID Class Drives`
  2010-03-19 16:43     ` Aryeh Gregor
  2010-03-19 16:53       ` Mattias Wadenstein
  2010-03-19 18:14       ` Joachim Otahal
@ 2010-03-22  6:55       ` Leslie Rhorer
  2010-03-22 16:29         ` Eric Shubert
  2 siblings, 1 reply; 20+ messages in thread
From: Leslie Rhorer @ 2010-03-22  6:55 UTC (permalink / raw)
  To: 'Aryeh Gregor', 'John Robinson'
  Cc: 'Joachim Otahal', linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Aryeh Gregor
> Sent: Friday, March 19, 2010 11:43 AM
> To: John Robinson
> Cc: Joachim Otahal; linux-raid@vger.kernel.org
> Subject: Re: RAID Class Drives`
> 
> On Fri, Mar 19, 2010 at 4:15 AM, John Robinson
> <john.robinson@anonymous.org.uk> wrote:
> > Do you have a reference for this? Most drives' operating temperature
> range
> > is specified up to 55°C, sometimes higher for enterprise drives, without
> any
> > indication (apart from common sense perhaps) that running them this hot
> > reduces lifespan.
> 
> Google's study of >100,000 disks over 9 months or so
> <http://labs.google.com/papers/disk_failures.html> suggests that
> hotter drives don't fail much more often:
> 
> ". . . failures do not increase when the average temperature
> increases. In fact, there is a clear trend showing that lower
> temperatures are associated with higher failure rates.  Only at very
> high temperatures is there a slight reversal of this trend." (page 5
> of PDF)
> 
> "We can conclude that at moderate temperature ranges it is likely that
> there are other effects which affect failure rates much more strongly
> than temperatures do." (page 6)
> 
> They were using SATA and PATA consumer drives, 5400 RPM to 7200 RPM,
> 80 to 400 GB, put into production in or after 2001 (from page 3).

	First of all, note that what they call "high" temperatures in the paper
are not really very high.  Eighty C is roughly the boiling point of Ethyl
Alcohol, and in human terms this is considered quite hot.  Immersion of body
tissues in a large volume of 80C water for several seconds will result in
moderately severe burns.  For most mechanical systems however, 80C is not
particularly hot.  Many solid state electronics systems can withstand 80C
internal temperatures indefinitely.  An average healthy adult human being
has a body core temperature of 37C, and a device with a 40C surface
temperature is barely warm to the touch.  It is not hot. Unless one employs
a refrigerated fluid cooling system or a Peltier junction to actively cool
it, no drive system is ever going to be less than 30C if the room
temperature is anything other than uncomfortably cold.  It's rather cold in
my house right now, because I have the heat shut off to save money, yet the
coolest drive in my arrays - which have very effective forced air systems
built in to them - is 31C.  Most are over 33C by a wide margin.  Come
summer, all of them will be over 40C.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-22  6:55       ` Leslie Rhorer
@ 2010-03-22 16:29         ` Eric Shubert
  2010-03-23  1:23           ` Brad Campbell
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Shubert @ 2010-03-22 16:29 UTC (permalink / raw)
  To: linux-raid

Leslie Rhorer wrote:
>> -----Original Message-----
>> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
>> owner@vger.kernel.org] On Behalf Of Aryeh Gregor
>> Sent: Friday, March 19, 2010 11:43 AM
>> To: John Robinson
>> Cc: Joachim Otahal; linux-raid@vger.kernel.org
>> Subject: Re: RAID Class Drives`
>>
>> On Fri, Mar 19, 2010 at 4:15 AM, John Robinson
>> <john.robinson@anonymous.org.uk> wrote:
>>> Do you have a reference for this? Most drives' operating temperature
>> range
>>> is specified up to 55°C, sometimes higher for enterprise drives, without
>> any
>>> indication (apart from common sense perhaps) that running them this hot
>>> reduces lifespan.
>> Google's study of >100,000 disks over 9 months or so
>> <http://labs.google.com/papers/disk_failures.html> suggests that
>> hotter drives don't fail much more often:
>>
>> ". . . failures do not increase when the average temperature
>> increases. In fact, there is a clear trend showing that lower
>> temperatures are associated with higher failure rates.  Only at very
>> high temperatures is there a slight reversal of this trend." (page 5
>> of PDF)
>>
>> "We can conclude that at moderate temperature ranges it is likely that
>> there are other effects which affect failure rates much more strongly
>> than temperatures do." (page 6)
>>
>> They were using SATA and PATA consumer drives, 5400 RPM to 7200 RPM,
>> 80 to 400 GB, put into production in or after 2001 (from page 3).
> 
> 	First of all, note that what they call "high" temperatures in the paper
> are not really very high.  Eighty C is roughly the boiling point of Ethyl
> Alcohol, and in human terms this is considered quite hot.  Immersion of body
> tissues in a large volume of 80C water for several seconds will result in
> moderately severe burns.  For most mechanical systems however, 80C is not
> particularly hot.  Many solid state electronics systems can withstand 80C
> internal temperatures indefinitely.  An average healthy adult human being
> has a body core temperature of 37C, and a device with a 40C surface
> temperature is barely warm to the touch.  It is not hot. Unless one employs
> a refrigerated fluid cooling system or a Peltier junction to actively cool
> it, no drive system is ever going to be less than 30C if the room
> temperature is anything other than uncomfortably cold.  It's rather cold in
> my house right now, because I have the heat shut off to save money, yet the
> coolest drive in my arrays - which have very effective forced air systems
> built in to them - is 31C.  Most are over 33C by a wide margin.  Come
> summer, all of them will be over 40C.
> 

I had a few drives running at about 55C for a couple years, with no 
failures (knock wood). These were used drives before being put into that 
environment, so they arguably had already survived the "infant mortality 
syndrome" that the google study identified. Would I recommend running 
drives at 55C? No, but I wouldn't be too concerned about it either.

FWIW.

-- 
-Eric 'shubes'


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-22 16:29         ` Eric Shubert
@ 2010-03-23  1:23           ` Brad Campbell
  2010-03-23 17:45             ` Eric Shubert
  0 siblings, 1 reply; 20+ messages in thread
From: Brad Campbell @ 2010-03-23  1:23 UTC (permalink / raw)
  To: Eric Shubert; +Cc: linux-raid

Eric Shubert wrote:

> I had a few drives running at about 55C for a couple years, with no 
> failures (knock wood). These were used drives before being put into that 
> environment, so they arguably had already survived the "infant mortality 
> syndrome" that the google study identified. Would I recommend running 
> drives at 55C? No, but I wouldn't be too concerned about it either.

I know of at least one manufacturer who voids the warranty if the drive exceeds 55 Degrees. I have a 
couple of drives here that have the "Exceeded 55 degrees" mark permanently recorded in their SMART 
data now. Having said that, I've not had issues with them yet (they only have about 14,000 hours on 
them).

I'd wager that extended running at elevated temperatures has the potential to affect the bearing 
lubricant, but it's just an hunch. I've never killed a drive from overtemp (and in a couple of cases 
I actually tried).

Brad
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-23  1:23           ` Brad Campbell
@ 2010-03-23 17:45             ` Eric Shubert
  2010-04-02  5:43               ` Leslie Rhorer
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Shubert @ 2010-03-23 17:45 UTC (permalink / raw)
  To: linux-raid

Brad Campbell wrote:
> Eric Shubert wrote:
> 
>> I had a few drives running at about 55C for a couple years, with no 
>> failures (knock wood). These were used drives before being put into 
>> that environment, so they arguably had already survived the "infant 
>> mortality syndrome" that the google study identified. Would I 
>> recommend running drives at 55C? No, but I wouldn't be too concerned 
>> about it either.
> 
> I know of at least one manufacturer who voids the warranty if the drive 
> exceeds 55 Degrees.

Care to name names? (Inquiring minds want to know!)

> I have a couple of drives here that have the 
> "Exceeded 55 degrees" mark permanently recorded in their SMART data now. 
> Having said that, I've not had issues with them yet (they only have 
> about 14,000 hours on them).
> 
> I'd wager that extended running at elevated temperatures has the 
> > potential to affect the bearing lubricant, but it's just a hunch. I've
> never killed a drive from overtemp (and in a couple of cases I actually 
> tried).

That would be my guess as well, although I'm sure that there are high 
temp lubes available. Just depends on what the mfr uses.

-- 
-Eric 'shubes'


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: RAID Class Drives`
  2010-03-23 17:45             ` Eric Shubert
@ 2010-04-02  5:43               ` Leslie Rhorer
  2010-04-02 20:04                 ` Richard Scobie
  0 siblings, 1 reply; 20+ messages in thread
From: Leslie Rhorer @ 2010-04-02  5:43 UTC (permalink / raw)
  To: 'Eric Shubert', linux-raid



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Eric Shubert
> Sent: Tuesday, March 23, 2010 12:45 PM
> To: linux-raid@vger.kernel.org
> Subject: Re: RAID Class Drives`
> 
> Brad Campbell wrote:
> > Eric Shubert wrote:
> >
> >> I had a few drives running at about 55C for a couple years, with no
> >> failures (knock wood). These were used drives before being put into
> >> that environment, so they arguably had already survived the "infant
> >> mortality syndrome" that the google study identified. Would I
> >> recommend running drives at 55C? No, but I wouldn't be too concerned
> >> about it either.
> >
> > I know of at least one manufacturer who voids the warranty if the drive
> > exceeds 55 Degrees.
> 
> Care to name names? (Inquiring minds want to know!)
> 
> > I have a couple of drives here that have the
> > "Exceeded 55 degrees" mark permanently recorded in their SMART data now.
> > Having said that, I've not had issues with them yet (they only have
> > about 14,000 hours on them).
> >
> > I'd wager that extended running at elevated temperatures has the
> > potential to affect the bearing lubricant, but it's just a hunch. I've
> > never killed a drive from overtemp (and in a couple of cases I actually
> > tried).
> 
> That would be my guess as well, although I'm sure that there are high
> temp lubes available. Just depends on what the mfr uses.

	I would not expect a hard drive to use any fluid lubricant at all in
its bearings, although it is possible.  Nonetheless, 55C is *NOT* a high
temperature for any industrial lubricant, dry or fluid.  Most petroleum
based and organic lubricants can easily withstand temperatures well in
excess of 140C indefinitely.  The motor oil in your car's engine is
subjected to much higher temperatures than that daily, and if it were not
for the blow-by of hot gases laden with graphite particles and un-burned
gasoline from the engine cylinders, the oil would last for many years.  I
would expect the drives to use Delrin or Teflon bearings, or possibly
aluminum on brass, without any fluid lubricant at all.  Any of these can
easily withstand close to or more than 200C.

	The main source of failure of the drive is going to be its thin film
metallic oxide coating, whose life will be significantly reduced as the
temperature increases.  The next most heat-sensitive part of the drive is
going to be the electronics.  This is especially true since some of the
electronic components (which are generating most of the heat in the drive
other than the head actuator servo) may not be very well thermally coupled
with the aluminum housing.  This means their temperature is going to be much
higher than the temperature of the aluminum case of the drive.  Most Silicon
and Metal Oxide semiconductors will start to fail more rapidly over 80C, but
below 80C, most semiconductors have lifetimes in the dozens of years.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-04-02  5:43               ` Leslie Rhorer
@ 2010-04-02 20:04                 ` Richard Scobie
  2010-04-05  2:50                   ` Leslie Rhorer
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Scobie @ 2010-04-02 20:04 UTC (permalink / raw)
  To: Leslie Rhorer; +Cc: 'Eric Shubert', linux-raid

Leslie Rhorer wrote:

> 	I would not expect a hard drive to use any fluid lubricant at all in
> its bearings, although it is possible.  Nonetheless, 55C is *NOT* a high

Google "disk drive fluid bearing". Many current drives use fluid rather 
than the previously used precision ball bearings.

> temperature for any industrial lubricant, dry or fluid.  Most petroleum
> based and organic lubricants can easily withstand temperatures well in
> excess of 140C indefinitely.  The motor oil in your car's engine is
> subjected to much higher temperatures than that daily, and if it were not
> for the blow-by of hot gases laden with graphite particles and un-burned
> gasoline from the engine cylinders, the oil would last for many years.  I

Off topic, but a significant cause of motor oil degradation is
increasing viscosity due to the lighter fractions evaporating over time
at high temperature.

> would expect the drives to use Delrin or Teflon bearings, or possibly
> aluminum on brass, without any fluid lubricant at all.  Any of these can
> easily withstand close to or more than 200C.

Prior to the relatively recent practice of disk drive heads being parked 
  off the surface of the platter, it was not uncommon for drives that 
had been run for extended periods, at high temperatures, to not restart
after having been shut down.

In many cases this was caused by stiction, brought on due to 
vaporisation of bearing lubricant depositing back onto the platter surface.

Regards,

Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: RAID Class Drives`
  2010-04-02 20:04                 ` Richard Scobie
@ 2010-04-05  2:50                   ` Leslie Rhorer
  0 siblings, 0 replies; 20+ messages in thread
From: Leslie Rhorer @ 2010-04-05  2:50 UTC (permalink / raw)
  To: 'Richard Scobie'; +Cc: 'Eric Shubert', linux-raid

> -----Original Message-----
> From: Richard Scobie [mailto:richard@sauce.co.nz]
> Sent: Friday, April 02, 2010 3:04 PM
> To: Leslie Rhorer
> Cc: 'Eric Shubert'; linux-raid@vger.kernel.org
> Subject: Re: RAID Class Drives`
> 
> Leslie Rhorer wrote:
> 
> > 	I would not expect a hard drive to use any fluid lubricant at all in
> > its bearings, although it is possible.  Nonetheless, 55C is *NOT* a high
> 
> Google "disk drive fluid bearing". Many current drives use fluid rather
> than the previously used precision ball bearings.
> 
> > temperature for any industrial lubricant, dry or fluid.  Most petroleum
> > based and organic lubricants can easily withstand temperatures well in
> > excess of 140C indefinitely.  The motor oil in your car's engine is
> > subjected to much higher temperatures than that daily, and if it were
> not
> > for the blow-by of hot gases laden with graphite particles and un-burned
> > gasoline from the engine cylinders, the oil would last for many years.
> I
> 
> Off topic, but a significant cause of motor oil degradation is
> increasing viscosity due to the lighter fractions evaporating over time
> at high temperature.

	This is true.  The cylinder walls and head surfaces get *VERY* hot.

> > would expect the drives to use Delrin or Teflon bearings, or possibly
> > aluminum on brass, without any fluid lubricant at all.  Any of these can
> > easily withstand close to or more than 200C.
> 
> Prior to the relatively recent practice of disk drive heads being parked
>   off the surface of the platter, it was not uncommon for drives that
> had been run for extended periods, at high temperatures, to not restart
> after having been shut down.
> 
> In many cases this was caused by stiction, brought on due to
> vaporisation of bearing lubricant depositing back onto the platter
> surface.

	I was aware of the "stiction" problem.  I was not aware it was due
to bearing lubricant.  My impression was it was due to the platter
lubricant.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
  2010-03-18 19:43 ` Randy Terbush
@ 2010-04-18 12:11   ` CoolCold
       [not found]     ` <4BCB6484.7040500@stud.tu-ilmenau.de>
  0 siblings, 1 reply; 20+ messages in thread
From: CoolCold @ 2010-04-18 12:11 UTC (permalink / raw)
  To: Randy Terbush; +Cc: linux-raid

On Thu, Mar 18, 2010 at 11:43 PM, Randy Terbush <randy@terbush.org> wrote:
> Let me follow-up to share what I have learned and what I have managed
> to do to get this array to re-assemble.
>
> I've received several responses from people telling me that they don't
> have any problem with their "desktop class" drives being dropped from
> the array. Congratulations to you all. I suspect that there may be a
> common theme in the drives that you are using: they may have different error
> correction, may be smaller than 500GB, or may not support the SCT
> command set.
>
> One of the first responses I received privately was from a gentleman
> who gave me the hint I needed regarding the SCT-ERC command. He
> shared my frustration and actually presents a very compelling example
> where this is a big problem. He works to support a commercial NAS
> product which uses "desktop" class drives and fights this problem
> continually.
>
> With this new knowledge gained I started digging a bit more and ran
> across a set of patches to smartmontools which allow editing the values
> for SCT-ERC. You can find that source here:
> http://www.csc.liv.ac.uk/~greg/projects/erc/
> FWIW, the Seagate Barracudas that I am running have non-volatile
> storage for this variable. Not that I am recommending Seagate. Far
> from it....
>
> I can confirm that all of my drives had this value "disabled" which
> means it allows the drive to go off and take as much time as it needs
> to fix its own problem.
>
> I set the values to 7 seconds for the 4 drives in my array and
> attempted to rebuild the array. Unfortunately, it failed again. So I
> reset the values to 5 seconds and fired off the rebuild once again and
> managed to get through the rebuild process.
I don't really understand one point - why did it fail? Did the
controller drop the device because it wasn't responding, or did md do
this? Rephrasing my question - is this really "tuning" so that the
controller does not drop the device and report an error, or is it for md?
And if the drive has errors anyway, why shouldn't it be dropped? Is the
idea that, in case of a read error, we can try to rewrite the sector from
the alive part of the array? If we have a write error, we are going to
drop the drive from the array anyway...

>
> Now this solution does not satisfy the situation where you are
> hot-plugging drives, but it at least gets me over my hurdle.
>
> Seems it would be a nice improvement to md to actually detect the
> SCT-ERC setting, warn when it cannot change the value and offer to set
> these to reasonable values for the RAID application.
>
> Here's to happy storage...
>
> On Wed, Mar 17, 2010 at 7:48 AM, Randy Terbush <randy@terbush.org> wrote:
>> Greetings RAIDers,
>>
>> Apologies if this topic has been thrashed here before. Google is not
>> showing me much love on the topic and that which I have found does not
>> convey consensus. So I am coming to the experts to get the verdict.
>>
>> Recent event: I spent a fair amount of time on the line with Seagate
>> support yesterday who informed me that their desktop drives will not
>> work in a RAID array. Now I may have been living in a cave for the
>> past 20 years, but I always had a modem.
>>
>> As I started to dig into this a bit more, looking for info on TLER,
>> ERC, etc., my understanding is that these "RAID class" drives simply
>> don't attempt the same depth of error recovery as the "desktop"
>> alternative, and instead report back to the RAID controller immediately
>> rather than dawdling with fixing the problem themselves.
>>
>> If this is true, then I can understand where this might cause a RAID
>> system some problems. However, I do not understand why the RAID system
>> cannot detect the type of drive it is dealing with and either disable
>> the behavior on the drive or allow more time for the drive to respond
>> before kicking it out of the array.
>>
>> Just to give some background on how I got to this point, but not to
>> distract from the main question, here is where I have been...
>>
>> Over past 5 years, have been struggling with a 4 drive mdraid array
>> configured for RAID5. This is not a busy system by any stretch. Just a
>> media server for my own personal use. Started out using the SATA
>> headers on the MB. Gave up and bought a cheapy hardware RAID
>> controller. Thought better of that decision and went back to software
>> RAID using the hardware RAID controller as a SATA expansion card. Gave
>> up on that and went back to the SATA headers on the MB (had replaced
>> the MB along the way).
>>
>> Over that period, threw out original 4 drives and replaced them with
>> newer bigger Seagate Barracudas. Bought snazzier and snazzier cables
>> along the way. Discovered a firmware upgrade for the Barracudas that I
>> thought had recently fixed the problem.
>>
>> After speaking with Seagate yesterday, I booted off of the SeaTools
>> image and ran tests on all drives. The two suspect drives did have
>> errors that were corrected by the test software. But alas, attempting
>> to reassemble this array fails, dropping one drive to failed spare
>> status and another to spare which has been the behavior I have been
>> fighting for years.
>>
>> So the question becomes, do I try it again with the replacement drives
>> that Seagate is sending me, or do I hang them in my "desktop" and
>> spend the money for RAID Class drives? (I've grown tired of this
>> learning experience and would like to just have a dependable storage
>> system)
>>
>> And to tag onto that question, is there any reason why mdraid cannot
>> detect these "lesser" drives and behave differently?
>>
>> Why would these drives be developing errors as a result of their
>> torturous experience in a RAID array?
>>
>> Thanks for any light you can shed on this issue.
>>
>> -Randy
>>



-- 
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
       [not found]     ` <4BCB6484.7040500@stud.tu-ilmenau.de>
@ 2010-04-19 10:11       ` CoolCold
       [not found]         ` <4BCC7C27.1000606@stud.tu-ilmenau.de>
  0 siblings, 1 reply; 20+ messages in thread
From: CoolCold @ 2010-04-19 10:11 UTC (permalink / raw)
  To: st0ff; +Cc: Linux RAID

On Sun, Apr 18, 2010 at 11:59 PM, Stefan /*St0fF*/ Hübner
<stefan.huebner@stud.tu-ilmenau.de> wrote:
> Hi!
>
> On 18.04.2010 14:11, CoolCold wrote:
>> I don't really understand one point - why did it fail? Did the
>> controller drop the device because it wasn't responding, or did md do
>> this? Rephrasing my question - is this really "tuning" so that the
>> controller does not drop the device and report an error, or is it for md?
>
> If a desktop class drive starts its error recovery, it becomes
> unresponsive.  MD sees this, but it isn't smart about it: it tries to
> rewrite the sector while the drive itself is still in error recovery
> mode and therefore unresponsive.  The write fails, and MD drops the
> device.
Stop, stop. It is clear that timing out a "read" request after a short
period is a good idea, but I want to know about writes.
Does the write fail because the controller returns something like "media
error", or does md have internal operation timeouts?
Even if the drive doesn't become unresponsive and returns "error on
write", it will be dropped anyway.

So, the SCT-ERC setting will prevent the drive from being unresponsive
for a long time, which may be desirable in case:
a) md doesn't have its own timeout mechanism and the whole md device
would otherwise be stuck, or
b) the drive (another partition of it) is part of another array/LVM
PV/whatever and that device would be stuck too (see the kernel timeout
sketch below).
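
(The other knob, for drives whose ERC cannot be changed at all, is the
kernel's per-device command timeout - only a sketch, sda is an example
device:)

  # default is 30 seconds; raising it gives a desktop drive time to finish
  # its own error recovery instead of being reset, which md then sees as a
  # failed member
  echo 180 > /sys/block/sda/device/timeout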


>
>> And if the drive has errors anyway, why shouldn't it be dropped? Is the
>> idea that, in case of a read error, we can try to rewrite the sector from
>> the alive part of the array? If we have a write error, we are going to
>> drop the drive from the array anyway...
>>
> Because that is the reason hdd manufacturers built in spare sectors and
> internal error recovery procedures.  Think about the write density of
> today's drives, and about the many influences they must cope with -
> the nearly atomically-sized bits on the platter.
> IT IS NOT POSSIBLE to build a perfect drive to today's specifications.
> That's why this exists in the first place.  And as we can see it is a
> very easy way for hdd manufacturers to make extra money.  Just label
> those drives that also passed the very last quality test (not only the
> other 10 tests before) "superdrive", sell them for nearly double the
> price, and give them a firmware where the advanced features are simply
> enabled by default...
>
> /stefan
>
>



-- 
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RAID Class Drives`
       [not found]         ` <4BCC7C27.1000606@stud.tu-ilmenau.de>
@ 2010-04-19 20:10           ` CoolCold
  0 siblings, 0 replies; 20+ messages in thread
From: CoolCold @ 2010-04-19 20:10 UTC (permalink / raw)
  To: Linux RAID

On Mon, Apr 19, 2010 at 7:52 PM, Stefan /*St0fF*/ Hübner
<stefan.huebner@stud.tu-ilmenau.de> wrote:
> Hi again!
>
> On 19.04.2010 12:11, CoolCold wrote:
>> Stop-stop. It is clear that timeouting "read" request in short period
>> is good idea, but i wanna know about writes
>> Does write fail because controller returns smth like "media error" or
>> md has internal operation timeouts?
>
> AFAIK the kernel has internal timeouts that make unresponsive disks
> drop out.
>
>> Even if drive doens't become irresponsible and returns "error on
>> write" it will be dropped anyway.
>
> I'm not sure, you might be right.
>>
>> So, SCT-ERC setting will prevent drive to be irresponsible for long
>> time which may be desirable in case of:
>> a) md doesn't have it's own timeouting mechanism and the whole md
>> device will be stucked
>> b) drive ( another partition ) is part of another array/lvm
>> pv/whatever and that device will be stucked too.
>>
>
> Nope.  In ATA8-ACS it is put roughly like this: if the ERC timer is
> about to expire, it is the duty of the drive to reallocate the sector
> and save the data-to-write onto the spare sector.
>
> So actually, if a write error occurs the drive should never report it as
> long as spare sectors are available.  That's because of ncq/tcq and scsi
> taskfiles - which make it nearly impossible to find out which write
> command failed and reconstruct the data for a retry.  On the other hand:
> the drive should still know the data somehow ;)
>
> The same applies to non-enabled write-erc.  The drive only runs the
> whole error-correction before it reallocates the sector (I had this 14
> times already (according to SMART) on my laptop - weird if it doesn't
> respond for 2min, but then all of a sudden everything's great again).
>
> All the best and I hope I could help at least a bit,
> Stefan
>

Thanks a lot!

After some additional googling, I've found a similar dialogue on this
list - http://kerneltrap.org/mailarchive/linux-raid/2010/3/25/6883733 .
Its contents and links cleared up almost everything for me.

1) ERC (TLER / CCTL) is primarily aimed at read requests.
2) It should also help on writes, because "if the ERC timer is about to
expire, it is the duty of the drive to reallocate the sector and save the
data-to-write onto the spare sector".

According to these statements, if md kicks out a drive with ERC enabled,
the drive is almost dead. Is that true in practice? ;)
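
(One way to check that in practice on a kicked member - only a sketch,
device names are examples:)

  # long surface scan plus the counters that matter most
  smartctl -t long /dev/sdb
  smartctl -A /dev/sdb | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

  # if it still looks healthy, putting it back lets md rewrite the weak spots
  mdadm /dev/md0 --re-add /dev/sdb1    # or --add if --re-add is refused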

-- 
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread

Thread overview: 20+ messages
2010-03-17 13:48 RAID Class Drives` Randy Terbush
2010-03-18 16:45 ` Joachim Otahal
2010-03-19  8:15   ` John Robinson
2010-03-19 16:43     ` Aryeh Gregor
2010-03-19 16:53       ` Mattias Wadenstein
2010-03-19 18:14       ` Joachim Otahal
2010-03-22  6:55       ` Leslie Rhorer
2010-03-22 16:29         ` Eric Shubert
2010-03-23  1:23           ` Brad Campbell
2010-03-23 17:45             ` Eric Shubert
2010-04-02  5:43               ` Leslie Rhorer
2010-04-02 20:04                 ` Richard Scobie
2010-04-05  2:50                   ` Leslie Rhorer
2010-03-19 17:53     ` Joachim Otahal
2010-03-20 17:26       ` Bill Davidsen
2010-03-21 16:14         ` Eric Shubert
2010-03-18 19:43 ` Randy Terbush
2010-04-18 12:11   ` CoolCold
     [not found]     ` <4BCB6484.7040500@stud.tu-ilmenau.de>
2010-04-19 10:11       ` CoolCold
     [not found]         ` <4BCC7C27.1000606@stud.tu-ilmenau.de>
2010-04-19 20:10           ` CoolCold
