linux-kernel.vger.kernel.org archive mirror
* Bad disks or bug ?
@ 2005-01-11 12:18 Ing. Gianluca Alberici
  0 siblings, 0 replies; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 12:18 UTC (permalink / raw)
  To: linux-kernel

Hello,

I have a little doubt about the following....

    Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: error=0x40
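The two hex values in the log can be decoded bit by bit. Below is a minimal Python sketch, assuming the standard ATA Status and Error register layouts; the label strings approximate the ones the 2.4 IDE driver prints, and STATUS_BITS, ERROR_BITS, decode and decode_status are hypothetical names. It shows that status=0x51 is DriveReady | SeekComplete | Error and that error=0x40 is the UNC (uncorrectable data) bit, which usually points at the medium rather than the cable (cable trouble more typically shows up as a CRC error, 0x84).

```python
# Bits of the ATA Status register; labels approximate the strings the
# Linux 2.4 IDE driver prints in its "status=0x.. { ... }" messages.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Common bits of the ATA Error register (meaningful only when Status.Error is set).
ERROR_BITS = [
    (0x80, "BadCRC/BadSector"),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the labels of all bits set in value, in table order."""
    return [name for mask, name in table if value & mask]


def decode_status(status):
    # While Busy is set the other status bits are not valid, so report only Busy.
    if status & 0x80:
        return ["Busy"]
    return decode(status, STATUS_BITS)


if __name__ == "__main__":
    print(decode_status(0x51))       # the status=0x51 value from the log
    print(decode(0x40, ERROR_BITS))  # the error=0x40 value from the log
```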

Of course, the usual explanation for such an error is that the drive has bad
sectors and I'd better replace it, BUT:

I have many production servers running 2.4.27, and everything seems OK while
they're new.
After about a year of production, many of them (5 at present)
begin to show the problem, BUT:

1) All of them show the problem on hdb (ALL OF THEM)
2) Never had problems on hda on ANY server; the disks are the same, same
size, same partitioning
3) Typically hdb is used as a disk mirror
4) Many times a mkfs.ext3 -c -c solves the problem, bringing the bad sectors
back to life!
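Per the mke2fs manual, giving -c twice makes mkfs.ext3 run the slower read-write bad-block test instead of the read-only one. A likely reason the sectors "come back to life" is that writing to a sector the drive could not read lets the firmware rewrite or remap it, so the read errors stop even though the surface may still be degrading. To see which blocks are unreadable before reformatting, a read-only scan is enough. Below is a minimal Python sketch, assuming Linux and root access when pointed at a device such as /dev/hdb; scan_readable is a hypothetical helper, not part of e2fsprogs.

```python
import os


def scan_readable(path, block_size=4096):
    """Read path block by block and return the indices of unreadable blocks.

    Read-only, in the spirit of badblocks' default mode; it does NOT perform
    the read-write pass that mkfs.ext3 -c -c requests. Blocks already in the
    kernel's cache may be served without touching the disk.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        nblocks = (size + block_size - 1) // block_size
        for i in range(nblocks):
            try:
                os.pread(fd, block_size, i * block_size)
            except OSError:
                # A media error on the device surfaces here as EIO.
                bad.append(i)
    finally:
        os.close(fd)
    return bad
```

On a healthy disk the result is an empty list; a non-empty result gives block numbers (in units of block_size) to compare with the LBA sectors the kernel reports.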

How do you explain that? Overload on hdb due to mirroring and surface
degradation?
OR some kind of voodoo on my hdbs?

NOTE: when I searched the Internet for such errors on disks (not
ramdisks or loop devices), I ALWAYS found problems on hdb!!!

Could it depend on the master/slave configuration? A kernel bug? Something else?
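To check whether the errors really cluster on one IDE position across servers, the kernel logs can be tallied per device. Below is a minimal Python sketch, assuming syslog lines in the format quoted above and the default Linux IDE naming (hda/hdb are master/slave on the first channel, hdc/hdd on the second); DMA_ERR, ide_position, tally and report are hypothetical names.

```python
import re
from collections import Counter

# Matches lines such as
#   "Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: status=0x51 { ... }"
# and captures the IDE device name. Only the status= line is counted, so each
# error event (status line followed by error line) is counted once.
DMA_ERR = re.compile(r"kernel: (hd[a-z]): dma_intr: status=")


def ide_position(dev):
    """Map a Linux IDE name to (channel, role): hda/hdb share channel 0,
    hdc/hdd share channel 1, and so on; even letters are masters."""
    index = ord(dev[2]) - ord("a")
    return index // 2, "master" if index % 2 == 0 else "slave"


def tally(lines):
    """Count dma_intr status errors per IDE device in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = DMA_ERR.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def report(lines):
    """Return one summary line per device, sorted by device name."""
    out = []
    for dev, n in sorted(tally(lines).items()):
        channel, role = ide_position(dev)
        out.append(f"{dev}: {n} dma_intr error(s) (ide{channel} {role})")
    return out
```

Feeding it the kernel logs collected from all the servers (for example an opened log file) shows at a glance whether the errors follow the slave position or the individual drives.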

What do you think?

Thanks for your time,

Gianluca Alberici


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad disks or bug ?
       [not found] <Pine.LNX.4.61.0501111452500.23213@alexandria.physik.uni-oldenburg.de>
@ 2005-01-11 15:07 ` Ing. Gianluca Alberici
  0 siblings, 0 replies; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 15:07 UTC (permalink / raw)
  To: stamer, linux-kernel

Heinrich,

Your analysis shows you know well what we're talking about, and I think
you finally gave the best explanation...

I know the problems of the IBM DTLA series very well (I have a dozen
zombies here...) and was mainly concerned about the Maxtor disks.

In the end, I agree it must be something related to how the hardware is
used and/or to the firmware...

I will of course replace my backup drive immediately!

Regards and thanks,

Gianluca

Heinrich Stamerjohanns wrote:

>Dear Gianluca,
>
>I guess you have an IBM Deskstar (or now Hitachi), possibly a DTLA-307045?
>
>We have the same setup: main disk is /dev/hda, backup disk (every night) 
>is /dev/hdb.
>
>One just crashed yesterday; it is the fifth IBM (out of five) that has
>crashed (this one was already a replacement disk...).
>Don't worry, your IBM /dev/hda will crash sooner or later as well ;)
>
>But when I investigated after the first crash, I read that these disks
>in particular do not cope well with being used only infrequently and
>then heavily (no use at all, then continuous backup...). The firmware
>has supposedly been changed since then, but that has not helped the
>replacement drive.
>So I guess your problem with /dev/hdb is a hardware rather than
>a software problem.
>
>To be sure, you could make /dev/hdb your main disk and back up to /dev/hda.
>I am quite sure that /dev/hda will then give up first. (But it happened to
>us that the main drive died two days later, before the backup drive had
>been replaced...)
>
>Greetings, Heinrich
> 
>
>--
>  Dr. Heinrich Stamerjohanns        Tel. +49-441-798-4276
>  Institute for Science Networking  stamer@uni-oldenburg.de
>  University of Oldenburg           http://isn.uni-oldenburg.de/~stamer

* Re: Bad disks or bug ?
  2005-01-11 13:24       ` Ing. Gianluca Alberici
@ 2005-01-11 14:23         ` Bill Davidsen
  0 siblings, 0 replies; 7+ messages in thread
From: Bill Davidsen @ 2005-01-11 14:23 UTC (permalink / raw)
  To: linux-kernel, Ing. Gianluca Alberici; +Cc: Prarit Bhargava, linux-kernel

Ing. Gianluca Alberici wrote:
> Prarit,
> 
> I run 2.4.27 on all my machines.
> 
> Never had any problems except this one (if we want to say it's not a disk problem).
> 
> I have seen many people on mailing lists having this very problem,
> always on hdb, with a lot of different kernels and machines.
> 
> I have basically two kinds of cabinets:
> 
> - The 'Antec style' tower
> - Rackmount cases
> 
> Everything is well cooled... I am sure about that.
> 
> Always ASUS A7V, Athlon XP 2xxx+, NEVER OVERCLOCKED, of course...
> 
> Disks are mainly Maxtor or IBM.
> 
> Basically I was wondering whether to swap the disks on a server just to try....
> 
> ....These are the things that drive me mad!

You might see if you can swap positions in the case first; after that I
guess you would have nothing left to try but actually flipping master with
slave (and the contents as well, good fun).

Just wondering, where is your swap? Your journal? Do you have a small
fast drive you could use for swap and an external journal? I checked all
the machines which have a similar configuration, and I don't see any
pattern, although all the machines with really high I/O load are using SCSI.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


* Re: Bad disks or bug ?
       [not found]     ` <41E3D2A7.3000002@sgi.com>
@ 2005-01-11 13:24       ` Ing. Gianluca Alberici
  2005-01-11 14:23         ` Bill Davidsen
  0 siblings, 1 reply; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 13:24 UTC (permalink / raw)
  To: Prarit Bhargava, linux-kernel

Prarit,

I run 2.4.27 on all my machines.

Never had any problems except this one (if we want to say it's not a disk problem).

I have seen many people on mailing lists having this very problem,
always on hdb, with a lot of different kernels and machines.

I have basically two kinds of cabinets:

- The 'Antec style' tower
- Rackmount cases

Everything is well cooled... I am sure about that.

Always ASUS A7V, Athlon XP 2xxx+, NEVER OVERCLOCKED, of course...

Disks are mainly Maxtor or IBM.

Basically I was wondering whether to swap the disks on a server just to try....

....These are the things that drive me mad!


Prarit Bhargava wrote:

> Hi Gianluca,
>
> How old is your kernel?
> P.
>
> Ing. Gianluca Alberici wrote:
>
>> Hello,
>>
>> Very interesting news from Massimo...
>>
>> About the heat source, Mark, I thought about that too, and hdb is
>> always the bottom (CS enabled) device in the rackmounts (so the cooler
>> one), in front of the usual ball-bearing fans!!!!
>>
>> I am beginning to believe all this deserves a deeper investigation...
>> Waiting for comments,
>>
>> Gianluca
>>
>> Mark Nipper wrote:
>>
>>> On 11 Jan 2005, Ing. Gianluca Alberici wrote:
>>>
>>>> How do you explain that? Overload on hdb due to mirroring and surface
>>>> degradation?
>>>> OR some kind of voodoo on my hdbs?
>>>
>>>     Is it possible that hdb is closer to a high heat source,
>>> or is not being cooled as well as hda, if all these machines have
>>> the same case design?
>> -
>> To unsubscribe from this list: send the line "unsubscribe 
>> linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad disks or bug ?
  2005-01-11 13:00 ` Mark Nipper
@ 2005-01-11 13:02   ` Ing. Gianluca Alberici
       [not found]     ` <41E3D2A7.3000002@sgi.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 13:02 UTC (permalink / raw)
  To: Mark Nipper; +Cc: linux-kernel

Hello,

Very interesting news from Massimo...

About the heat source, Mark, I thought about that too, and hdb is
always the bottom (CS enabled) device in the rackmounts (so the cooler
one), in front of the usual ball-bearing fans!!!!

I am beginning to believe all this deserves a deeper investigation...

Waiting for comments,

Gianluca

Mark Nipper wrote:

>On 11 Jan 2005, Ing. Gianluca Alberici wrote:
>
>>How do you explain that? Overload on hdb due to mirroring and surface
>>degradation?
>>OR some kind of voodoo on my hdbs?
>
>	Is it possible that hdb is closer to a high heat source,
>or is not being cooled as well as hda, if all these machines have
>the same case design?


* Re: Bad disks or bug ?
  2005-01-11 12:39 Ing. Gianluca Alberici
@ 2005-01-11 13:00 ` Mark Nipper
  2005-01-11 13:02   ` Ing. Gianluca Alberici
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Nipper @ 2005-01-11 13:00 UTC (permalink / raw)
  To: Ing. Gianluca Alberici; +Cc: linux-kernel

On 11 Jan 2005, Ing. Gianluca Alberici wrote:
> How do you explain that? Overload on hdb due to mirroring and surface
> degradation?
> OR some kind of voodoo on my hdbs?

	Is it possible that hdb is closer to a high heat source,
or is not being cooled as well as hda, if all these machines have
the same case design?

-- 
Mark Nipper                                                e-contacts:
4475 Carter Creek Parkway                           nipsy@bitgnome.net
Apartment 724                               http://nipsy.bitgnome.net/
Bryan, Texas, 77802-4481           AIM/Yahoo: texasnipsy ICQ: 66971617
(979)575-3193                                      MSN: nipsy@tamu.edu


* Bad disks or bug ?
@ 2005-01-11 12:39 Ing. Gianluca Alberici
  2005-01-11 13:00 ` Mark Nipper
  0 siblings, 1 reply; 7+ messages in thread
From: Ing. Gianluca Alberici @ 2005-01-11 12:39 UTC (permalink / raw)
  To: linux-kernel

Hello,

I have a little doubt about the following....

    Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    Sep 10 12:50:30 abivrs0 kernel: hdb: dma_intr: error=0x40

Of course, the usual explanation for such an error is that the drive has bad
sectors and I'd better replace it, BUT:

I have many production servers running 2.4.27, and everything seems OK while
they're new.
After about a year of production, many of them (5 at present)
begin to show the problem, BUT:

1) All of them show the problem on hdb (ALL OF THEM)
2) Never had problems on hda on ANY server; the disks are the same, same
size, same partitioning
3) Typically hdb is used as a disk mirror
4) Many times a mkfs.ext3 -c -c solves the problem, bringing the bad sectors
back to life!

How do you explain that? Overload on hdb due to mirroring and surface
degradation?
OR some kind of voodoo on my hdbs?

NOTE: when I searched the Internet for such errors on disks (not
ramdisks or loop devices), I ALWAYS found problems on hdb!!!

Could it depend on the master/slave configuration? A kernel bug? Something else?

What do you think?

Thanks for your time,

Gianluca Alberici




end of thread

Thread overview: 7+ messages
2005-01-11 12:18 Bad disks or bug ? Ing. Gianluca Alberici
2005-01-11 12:39 Ing. Gianluca Alberici
2005-01-11 13:00 ` Mark Nipper
2005-01-11 13:02   ` Ing. Gianluca Alberici
     [not found]     ` <41E3D2A7.3000002@sgi.com>
2005-01-11 13:24       ` Ing. Gianluca Alberici
2005-01-11 14:23         ` Bill Davidsen
     [not found] <Pine.LNX.4.61.0501111452500.23213@alexandria.physik.uni-oldenburg.de>
2005-01-11 15:07 ` Ing. Gianluca Alberici

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).