* Bad disks or bug ?
From: Ing. Gianluca Alberici @ 2005-01-11 12:18 UTC
To: linux-kernel
Hello,
I have a question about the following:
> Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: error=0x40
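For reference, the two register values in that log can be decoded bit by bit. The following is only a minimal sketch in Python, not taken from the kernel source: the bit labels follow the ATA status and error register layout as the old IDE driver printed them, and the names `STATUS_BITS`, `ERROR_BITS` and `decode` are made up for this illustration.

```python
# Sketch only: bit labels follow the ATA register layout as printed by the
# old Linux IDE driver; the table and function names here are hypothetical.

# ATA status register, most significant bit first.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# ATA error register (meaningful only when the Error status bit is set).
# Bits 0x20 and 0x08 (media-change related) are left out of this sketch.
ERROR_BITS = [
    (0x80, "BadSector/BadCRC"),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the labels of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


print(decode(0x51, STATUS_BITS))
print(decode(0x40, ERROR_BITS))
```

Decoding 0x51 gives DriveReady, SeekComplete and Error, which matches the text the kernel printed. 0x40 in the error register is the uncorrectable-data bit, meaning the drive reported that it could not read a sector.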
Of course, the usual explanation for such an error is that the drive has
bad sectors and I had better replace it, BUT:
I have many production servers running 2.4.27, and everything seems OK
while they are new.
After about a year in production many of them (5 at present)
begin to show the problem, BUT:
1) All of them show the problem on hdb (ALL OF THEM)
2) I have never had problems on hda on ANY server; the disks are the same,
same size, same partitioning
3) Typically hdb is used as a disk mirror
4) Many times a mkfs.ext3 -c -c solves the problem, bringing the bad
sectors back to life!
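The observation in point 1 can be checked mechanically by counting dma_intr error reports per device across the logs of each server. This is a minimal sketch, assuming syslog lines in the same format as the excerpt at the top of this message; `LINE_RE` and `tally_errors` are hypothetical names, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: error=0x40
# Assumption: the device name is of the form hdX, as in the excerpt.
LINE_RE = re.compile(r"kernel: (hd[a-z]): dma_intr: error=(0x[0-9a-fA-F]+)")


def tally_errors(lines):
    """Count dma_intr error reports per (device, error code) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            device, code = match.group(1), match.group(2).lower()
            counts[(device, code)] += 1
    return counts


# The two log lines quoted in this message, used as sample input.
sample = [
    "Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: status=0x51 "
    "{ DriveReady SeekComplete Error }",
    "Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: error=0x40",
]

for (device, code), count in sorted(tally_errors(sample).items()):
    print(device, code, count)
```

On a real server the same function could be fed an open log file, for example `tally_errors(open("/var/log/messages"))`; that path is an assumption about the local syslog setup. Only the `error=` lines are counted, so each failure is tallied once rather than once per log line.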
How do you explain that? Overload on hdb due to mirroring, and surface
degradation?
OR some kind of voodoo on my hdbs?
NOTE: When I searched the internet for such errors on real disks (not
ramdisks or loop devices), I ALWAYS found problems on hdb!
Could it depend on the master/slave configuration? A kernel bug? Something else?
What do you think?
Thanks for your time,
Gianluca Alberici
* Re: Bad disks or bug ?
From: Ing. Gianluca Alberici @ 2005-01-11 15:07 UTC
To: stamer, linux-kernel
Heinrich,
Your analysis shows you know this problem well, and I think you have
given the best explanation so far...
I know the problems of the IBM DTLA series very well (I have a dozen
zombies here...) and was mainly concerned about the Maxtor disks.
In the end I agree it must have something to do with how the hardware
is used and/or with the firmware...
I will of course replace my backup drive immediately!
Regards and thanks,
Gianluca
Heinrich Stamerjohanns wrote:
>Dear Gianluca,
>
>I guess you have an IBM Deskstar (or now Hitachi), possibly a DTLA-307045?
>
>We have the same setup: main disk is /dev/hda, backup disk (every night)
>is /dev/hdb.
>
>One just crashed yesterday, it is the fifth IBM (out of five) that
>crashed (this is already a replacement disk...).
>Don't worry your IBM /dev/hda will crash sooner or later as well ;)
>
>But when I investigated after the first crash, I read something to the
>effect that these disks in particular do not cope well with being used
>only infrequently, but then heavily (no use at all, then continuous
>backup...). The firmware has supposedly changed since then, but that
>has not helped the replacement drive.
>So I guess your problem with /dev/hdb is a hardware problem rather than
>a software problem.
>
>To be sure you could make /dev/hdb your main disk and backup to /dev/hda.
>I am quite sure that /dev/hda will give up first then. (But it happened to
>us that the main drive died two days later, without a replaced backup
>drive...)
>
>Greetings, Heinrich
>
>
>--
> Dr. Heinrich Stamerjohanns Tel. +49-441-798-4276
> Institute for Science Networking stamer@uni-oldenburg.de
> University of Oldenburg http://isn.uni-oldenburg.de/~stamer
>
>
>
* Re: Bad disks or bug ?
From: Bill Davidsen @ 2005-01-11 14:23 UTC
To: linux-kernel, Ing. Gianluca Alberici; +Cc: Prarit Bhargava, linux-kernel
Ing. Gianluca Alberici wrote:
> Prarit,
>
> I run 2.4.27 on all my machines.
>
> I have never had problems other than this one (if we want to say it's
> not a disk problem).
>
> I have seen many people on mailing lists having this very problem,
> always on hdb, with a lot of different kernels and machines.
>
> I have basically two kinds of cabinets:
>
> - The 'Antec style' tower
> - Rackmount cases
>
> Everything is well cooled... I am sure about that.
>
> Always ASUS A7V, Athlon XP 2xxx+, NEVER OVERCLOCKED, of course...
>
> The disks are mainly Maxtor or IBM.
>
> Basically I was wondering whether to swap the disks on a server just to try...
>
> These are the things that drive me mad!
You might see if you can swap positions in the case first; after that I
guess you would have nothing left to try but actually flipping master
with slave (and contents as well, good fun).
Just wondering, where is your swap? Your journal? Do you have a small
fast drive you could use for swap and an external journal? I checked all
the machines which have a similar configuration, and I don't see any
pattern, although all the machines with really high I/O load are using SCSI.
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
* Re: Bad disks or bug ?
From: Ing. Gianluca Alberici @ 2005-01-11 13:24 UTC
To: Prarit Bhargava, linux-kernel
Prarit,
I run 2.4.27 on all my machines.
I have never had problems other than this one (if we want to say it's
not a disk problem).
I have seen many people on mailing lists having this very problem,
always on hdb, with a lot of different kernels and machines.
I have basically two kinds of cabinets:
- The 'Antec style' tower
- Rackmount cases
Everything is well cooled... I am sure about that.
Always ASUS A7V, Athlon XP 2xxx+, NEVER OVERCLOCKED, of course...
The disks are mainly Maxtor or IBM.
Basically I was wondering whether to swap the disks on a server just to try...
These are the things that drive me mad!
Prarit Bhargava wrote:
> Hi Gianluca,
>
> How old is your kernel?
> P.
>
> Ing. Gianluca Alberici wrote:
>
>> Hello,
>>
>> Very interesting news from Massimo...
>>
>> About the heat source, Mark, I thought about that too, but hdb is
>> always the bottom (CS enabled) device in the rackmounts (so the
>> cooler one), right in front of the usual ball-bearing fans!
>>
>> I am beginning to believe all this deserves a deeper investigation...
>>
>> Waiting for comments,
>>
>> Gianluca
>>
>> Mark Nipper wrote:
>>
>>> On 11 Jan 2005, Ing. Gianluca Alberici wrote:
>>>
>>>
>>>> How do you explain that? Overload on hdb due to mirroring, and surface
>>>> degradation?
>>>> OR some kind of voodoo on my hdbs?
>>>>
>>>
>>> Is it possible that hdb is closer to a high heat source
>>> or is not being cooled as well as hda, if all these machines are the
>>> same case design?
>>>
>>>
>>>
* Re: Bad disks or bug ?
From: Ing. Gianluca Alberici @ 2005-01-11 13:02 UTC
To: Mark Nipper; +Cc: linux-kernel
Hello,
Very interesting news from Massimo...
About the heat source, Mark, I thought about that too, but hdb is
always the bottom (CS enabled) device in the rackmounts (so the
cooler one), right in front of the usual ball-bearing fans!
I am beginning to believe all this deserves a deeper investigation...
Waiting for comments,
Gianluca
Mark Nipper wrote:
>On 11 Jan 2005, Ing. Gianluca Alberici wrote:
>
>
>>How do you explain that? Overload on hdb due to mirroring, and surface
>>degradation?
>>OR some kind of voodoo on my hdbs?
>>
>
> Is it possible that hdb is closer to a high heat source
>or is not being cooled as well as hda, if all these machines are the
>same case design?
>
>
>
* Re: Bad disks or bug ?
From: Mark Nipper @ 2005-01-11 13:00 UTC
To: Ing. Gianluca Alberici; +Cc: linux-kernel
On 11 Jan 2005, Ing. Gianluca Alberici wrote:
> How do you explain that? Overload on hdb due to mirroring, and surface
> degradation?
> OR some kind of voodoo on my hdbs?
Is it possible that hdb is closer to a high heat source
or is not being cooled as well as hda, if all these machines are the
same case design?
--
Mark Nipper e-contacts:
4475 Carter Creek Parkway nipsy@bitgnome.net
Apartment 724 http://nipsy.bitgnome.net/
Bryan, Texas, 77802-4481 AIM/Yahoo: texasnipsy ICQ: 66971617
(979)575-3193 MSN: nipsy@tamu.edu
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GG/IT d- s++:+ a- C++$ UBL++++$ P--->+++ L+++$ !E---
W++(--) N+ o K++ w(---) O++ M V(--) PS+++(+) PE(--)
Y+ PGP t+ 5 X R tv b+++@ DI+(++) D+ G e h r++ y+(**)
------END GEEK CODE BLOCK------
---begin random quote of the moment---
"That we are not much sicker and much madder than we are is
due exclusively to that most blessed and blessing of all
natural graces, sleep."
-- Aldous Huxley
----end random quote of the moment----
* Bad disks or bug ?
From: Ing. Gianluca Alberici @ 2005-01-11 12:39 UTC
To: linux-kernel
Hello,
I have a question about the following:
> Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: status=0x51 { DriveReady
> SeekComplete Error }
> Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: error=0x40
Of course, the usual explanation for such an error is that the drive has
bad sectors and I had better replace it, BUT:
I have many production servers running 2.4.27, and everything seems OK
while they are new.
After about a year in production many of them (5 at present)
begin to show the problem, BUT:
1) All of them show the problem on hdb (ALL OF THEM)
2) I have never had problems on hda on ANY server; the disks are the same,
same size, same partitioning
3) Typically hdb is used as a disk mirror
4) Many times a mkfs.ext3 -c -c solves the problem, bringing the bad
sectors back to life!
How do you explain that? Overload on hdb due to mirroring, and surface
degradation?
OR some kind of voodoo on my hdbs?
NOTE: When I searched the internet for such errors on real disks (not
ramdisks or loop devices), I ALWAYS found problems on hdb!
Could it depend on the master/slave configuration? A kernel bug? Something else?
What do you think?
Thanks for your time,
Gianluca Alberici