* Bad disks or bug ?
@ 2005-01-11 12:39 Ing. Gianluca Alberici
2005-01-11 13:00 ` Mark Nipper
0 siblings, 1 reply; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 12:39 UTC (permalink / raw)
To: linux-kernel
Hello,
I have a question about the following:

  Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
  Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: error=0x40
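For reference, the two bytes in those log lines can be decoded against the ATA status/error register bit definitions. A minimal sketch (bit names follow the ATA register layout, not any particular kernel source):

```python
# Decode the status/error bytes that the IDE driver's dma_intr prints.
STATUS_BITS = {
    0x80: "BusyStat", 0x40: "DriveReady", 0x20: "DeviceFault",
    0x10: "SeekComplete", 0x08: "DataRequest", 0x04: "CorrectedError",
    0x02: "Index", 0x01: "Error",
}
ERROR_BITS = {
    0x80: "BadCRC", 0x40: "UncorrectableError", 0x20: "MediaChanged",
    0x10: "SectorIdNotFound", 0x08: "MediaChangeRequested",
    0x04: "DriveStatusError", 0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}

def decode(byte: int, table: dict) -> list:
    """Return the names of the bits set in `byte`."""
    return [name for bit, name in table.items() if byte & bit]

print(decode(0x51, STATUS_BITS))  # status byte from the log
print(decode(0x40, ERROR_BITS))   # error byte from the log
```

status=0x51 decodes to DriveReady, SeekComplete, Error (exactly the flags the kernel prints), and error=0x40 is the "uncorrectable data" bit: the drive itself reported a sector it could not read, which is why bad sectors are the usual diagnosis.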
...Of course the obvious explanation for such an error is that the drive has bad
sectors and I'd better replace it, BUT:
I have many production servers running 2.4.27; everything seems OK while
they're new.
After about a year of production, many of them (5 at present)
begin to show the problem, BUT:
1) All of them show the problem on hdb (ALL OF THEM)
2) I've never had problems on hda on ANY server; the disks are the same: same
size, same partitioning
3) Typically hdb is used as a disk mirror
4) Often a mkfs.ext3 -c -c solves the problem, bringing the bad sectors
back to life!
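The -c -c option makes mkfs.ext3 run badblocks in read-write mode, and writing to a pending bad sector gives the drive's firmware a chance to remap it to a spare, which would explain sectors coming "back to life". A minimal Python sketch of the read-write scan idea, run against a scratch file rather than a real device (the pattern bytes mirror the ones badblocks cycles through; everything else is illustrative):

```python
import os
import tempfile

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # the four patterns badblocks -w uses
BLOCK = 1024

def rw_scan(path, nblocks):
    """Write each test pattern to every block, read it back, and
    return the block numbers whose contents do not match."""
    bad = set()
    with open(path, "r+b") as f:
        for pat in PATTERNS:
            buf = bytes([pat]) * BLOCK
            for i in range(nblocks):
                f.seek(i * BLOCK)
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())
            for i in range(nblocks):
                f.seek(i * BLOCK)
                if f.read(BLOCK) != buf:
                    bad.add(i)
    return sorted(bad)

# Demo on a scratch file instead of a real block device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(8 * BLOCK)
    name = tmp.name
print(rw_scan(name, 8))  # a healthy "device" reports no bad blocks
os.unlink(name)
```

On a real disk the write that fails to verify is the interesting part: a modern drive transparently reallocates the sector on write, so the scan can come back clean even though spare sectors were consumed.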
How do you explain that? Overload on hdb due to mirroring, and surface
degradation?
OR some kind of voodoo on my hdbs?
NOTE: When I searched the internet for such errors on disks (not
ramdisks or loop devices), I ALWAYS found the problem reported on hdb!!!
Could it depend on the master/slave configuration? A kernel bug? Something else?
What do you say?
thanks for your time,
Gianluca Alberici
* Re: Bad disks or bug ?
2005-01-11 12:39 Bad disks or bug ? Ing. Gianluca Alberici
@ 2005-01-11 13:00 ` Mark Nipper
2005-01-11 13:02 ` Ing. Gianluca Alberici
0 siblings, 1 reply; 7+ messages in thread
From: Mark Nipper @ 2005-01-11 13:00 UTC (permalink / raw)
To: Ing. Gianluca Alberici; +Cc: linux-kernel
On 11 Jan 2005, Ing. Gianluca Alberici wrote:
> How do you explain that ? Overload on hdb due to mirroring and surface
> degradation ?
> OR a kind of vodoo on my hdbs ?
Is it possible that hdb is closer to a high heat source,
or is not being cooled as well as hda, if all these machines
share the same case design?
--
Mark Nipper e-contacts:
4475 Carter Creek Parkway nipsy@bitgnome.net
Apartment 724 http://nipsy.bitgnome.net/
Bryan, Texas, 77802-4481 AIM/Yahoo: texasnipsy ICQ: 66971617
(979)575-3193 MSN: nipsy@tamu.edu
* Re: Bad disks or bug ?
2005-01-11 13:00 ` Mark Nipper
@ 2005-01-11 13:02 ` Ing. Gianluca Alberici
[not found] ` <41E3D2A7.3000002@sgi.com>
0 siblings, 1 reply; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 13:02 UTC (permalink / raw)
To: Mark Nipper; +Cc: linux-kernel
Hello,
Very interesting news from Massimo...
About the heat source, Mark, I thought about that too, and hdb is
always the bottom (CS enabled) device in the rackmounts (so the colder
one), right in front of the usual ball-bearing fans!!!!
I am beginning to believe all this deserves a deeper investigation...
Waiting for comments,
Gianluca
Mark Nipper wrote:
> On 11 Jan 2005, Ing. Gianluca Alberici wrote:
>
>> How do you explain that? Overload on hdb due to mirroring, and surface
>> degradation?
>> OR some kind of voodoo on my hdbs?
>
> Is it possible that hdb is closer to a high heat source,
> or is not being cooled as well as hda, if all these machines
> share the same case design?
[parent not found: <Pine.LNX.4.61.0501111452500.23213@alexandria.physik.uni-oldenburg.de>]
* Re: Bad disks or bug ?
[not found] <Pine.LNX.4.61.0501111452500.23213@alexandria.physik.uni-oldenburg.de>
@ 2005-01-11 15:07 ` Ing. Gianluca Alberici
0 siblings, 0 replies; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 15:07 UTC (permalink / raw)
To: stamer, linux-kernel
Heinrich,
Your analysis shows you know well what we're talking about, and I think
you've finally given the best explanation...
I know the problems of the IBM DTLA series very well (I have a dozen
zombies here...) and was mainly concerned about Maxtor disks.
Finally, I agree it must be something to do with how the hardware is
used and/or with the firmware...
I will of course replace my backup drive immediately!
Regards and thanks,
Gianluca
Heinrich Stamerjohanns wrote:
> Dear Gianluca,
>
> I guess you have an IBM Deskstar (or now Hitachi), possibly a DTLA-307045?
>
> We have the same setup: main disk is /dev/hda, backup disk (every night)
> is /dev/hdb.
>
> One just crashed yesterday; it is the fifth IBM (out of five) that
> has crashed (and this was already a replacement disk...).
> Don't worry, your IBM /dev/hda will crash sooner or later as well ;)
>
> But when I investigated after the first crash, I read that these disks
> in particular do not cope with being used only infrequently, but then
> heavily (no use at all, then continuous backup...). The firmware has
> supposedly changed since then, but it has not helped the replacement
> drive. So I guess your problem with /dev/hdb is a hardware rather than
> a software problem.
>
> To be sure, you could make /dev/hdb your main disk and back up to /dev/hda.
> I am quite sure that /dev/hda would then give up first. (But it happened to
> us that the main drive died two days later, without a replacement backup
> drive...)
>
> Greetings, Heinrich
>
>
>--
> Dr. Heinrich Stamerjohanns Tel. +49-441-798-4276
> Institute for Science Networking stamer@uni-oldenburg.de
> University of Oldenburg http://isn.uni-oldenburg.de/~stamer
>
>
>
end of thread, other threads:[~2005-01-11 15:12 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-11 12:39 Bad disks or bug ? Ing. Gianluca Alberici
2005-01-11 13:00 ` Mark Nipper
2005-01-11 13:02 ` Ing. Gianluca Alberici
[not found] ` <41E3D2A7.3000002@sgi.com>
2005-01-11 13:24 ` Ing. Gianluca Alberici
2005-01-11 14:23 ` Bill Davidsen
[not found] <Pine.LNX.4.61.0501111452500.23213@alexandria.physik.uni-oldenburg.de>
2005-01-11 15:07 ` Ing. Gianluca Alberici
-- strict thread matches above, loose matches on Subject: below --
2005-01-11 12:18 Ing. Gianluca Alberici