linux-kernel.vger.kernel.org archive mirror
* massive filesystem corruption with 2.4.9
From: Kristian @ 2001-08-21  8:00 UTC
  To: linux-kernel

Hello.

Since linux-2.4.5 the same errors have kept occurring sporadically after the cold
boot in the morning. (My computer is powered off during the night.) Every second
day I notice that my syslog says something like the following:

Aug 21 09:01:06 adlib kernel: EXT2-fs error (device ide0(3,5)): ext2_new_block:
Allocating block in system zone - block = 3
Aug 21 09:01:06 adlib kernel: EXT2-fs error (device ide0(3,5)):
ext2_free_blocks: Freeing blocks in system zones - Block = 4, count = 1
Aug 21 09:01:06 adlib kernel: EXT2-fs error (device ide0(3,5)): ext2_new_block:
Allocating block in system zone - block = 37
Aug 21 09:01:06 adlib kernel: EXT2-fs error (device ide0(3,5)): ext2_new_block:
Allocating block in system zone - block = 45
Aug 21 09:01:07 adlib kernel: mtrr: base(0x42000000) is not aligned on a
size(0x1800000) boundary
Aug 21 09:01:09 adlib last message repeated 2 times
Aug 21 09:01:26 adlib PAM_unix[1929]: (login) session opened for user root by
LOGIN(uid=0)
Aug 21 09:01:26 adlib  -- root[1929]: ROOT LOGIN ON tty1
Aug 21 09:01:30 adlib kernel: EXT2-fs error (device ide0(3,5)):
ext2_free_blocks: Freeing blocks in system zones - Block = 41, count = 4
Aug 21 09:01:30 adlib kernel: EXT2-fs error (device ide0(3,5)): ext2_new_block:
Allocating block in system zone - block = 4
Aug 21 09:01:30 adlib kernel: EXT2-fs error (device ide0(3,5)): ext2_new_block:
Allocating block in system zone - block = 7
Aug 21 09:01:30 adlib kernel: EXT2-fs error (device ide0(3,5)):
ext2_free_blocks: Freeing blocks in system zones - Block = 8, count = 2
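
The two kinds of message come from sanity checks in the ext2 block allocator, in the functions named in the log (ext2_new_block and ext2_free_blocks). Below is a simplified sketch of those checks, assuming they work like the 2.4-era fs/ext2/balloc.c: a block belongs to a group's "system zone" if it is the group's block bitmap, its inode bitmap, or part of its inode table. The struct group_layout type, the function names and the layout values are hypothetical and only illustrate the test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group layout. The fields mirror what an ext2 group
 * descriptor records; the values used in the usage note are made up. */
struct group_layout {
    unsigned long block_bitmap;   /* block holding the block bitmap */
    unsigned long inode_bitmap;   /* block holding the inode bitmap */
    unsigned long inode_table;    /* first block of the inode table */
    unsigned long itb_per_group;  /* inode-table blocks per group */
};

/* True if b lies in [first, first + len - 1]. */
static bool in_range(unsigned long b, unsigned long first, unsigned long len)
{
    return b >= first && b <= first + len - 1;
}

/* "Allocating block in system zone": the block just taken from the
 * free-block bitmap is a bitmap block or lies inside the inode table. */
bool alloc_hits_system_zone(const struct group_layout *g, unsigned long blk)
{
    return blk == g->block_bitmap ||
           blk == g->inode_bitmap ||
           in_range(blk, g->inode_table, g->itb_per_group);
}

/* "Freeing blocks in system zones": the run [blk, blk + count - 1]
 * covers a bitmap block, or starts or ends inside the inode table. */
bool free_hits_system_zone(const struct group_layout *g,
                           unsigned long blk, unsigned long count)
{
    assert(count > 0);
    return in_range(g->block_bitmap, blk, count) ||
           in_range(g->inode_bitmap, blk, count) ||
           in_range(blk, g->inode_table, g->itb_per_group) ||
           in_range(blk + count - 1, g->inode_table, g->itb_per_group);
}
```

With a hypothetical layout of block bitmap at 2, inode bitmap at 3 and a 512-block inode table starting at 4, both alloc_hits_system_zone(&g, 3) and free_hits_system_zone(&g, 41, 4) are true. The check only detects that the allocator was about to hand out or free a block already used for metadata; it does not say why the on-disk bitmap or group descriptor went bad.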

Today it destroyed my superblock, and all of my top-level directories were placed
in /lost+found. I repaired the filesystem with e2fsck 1.14 from a very old rescue
disk and then again with 1.23, renaming the directories and moving them back into
place by hand. A lot of device files and some .h files could not be recovered.

These fatal errors have been occurring since 2.4.5 (I haven't tested 2.4.8). When I
work with 2.4.4 everything is fine!

I'm already using the newest versions of e2fsck (1.23) and util-linux (2.11f). My
Red Hat 6.2 (Rotkäppchen) is rather old, but I don't like gcc 2.96 at all.

When the errors occurred after a complete crash with 2.4.6, I also sent this report
directly to the ext2 developers, but they didn't answer.

Maybe you could help me here?

Kristian

·· · · reach me :: · ·· ·· ·  · ·· · ··  · ··· · ·
                            :: http://www.korseby.net
                            :: http://www.tomlab.de
kristian@korseby.net ....::

* Re: massive filesystem corruption with 2.4.9
From: Christian Widmer @ 2001-08-21  8:34 UTC
  To: Kristian; +Cc: linux-kernel

I had similar problems with 2.4.6. Unfortunately I didn't save the errors, so
I can't compare the messages. I can only say that 2.4.6 destroyed the ext2
filesystems on a 40GB and a 60GB Maxtor disk. Since then my NFS server has been
running 2.2.19, which works fine (with minor problems*).

* After a client has mounted an export once, I can't unmount that partition on
the server, even after the client has unmounted the export.




-- 
christian widmer
zurlindenstrasse 294, 8003 zurich, switzerland
email:  cwidmer@iiic.ethz.ch
phone: ++41 (0)1 491 03 68


* Re: massive filesystem corruption with 2.4.9
From: Kristian @ 2001-08-21 10:14 UTC
  To: cwidmer; +Cc: linux-kernel

Christian Widmer wrote:
> I had similar problems with 2.4.6. Unfortunately I didn't save the errors, so
> I can't compare the messages. I can only say that 2.4.6 destroyed the ext2
> filesystems on a 40GB and a 60GB Maxtor disk. Since then my NFS server has been
> running 2.2.19, which works fine (with minor problems*).
> 
> * After a client has mounted an export once, I can't unmount that partition on
> the server, even after the client has unmounted the export.

I have several more entries in my log file. It would be no problem to collect
them if that would help. I forgot to mention that I'm using a 41 GB IBM disk (hda:
IBM-DTLA-305040, ATA DISK drive) and that this problem only occurs on my root
partition (hda5); the always-mounted /boot partition (hda1) and the occasionally
mounted misc partition (hda7) are not affected.

I don't use any NFS.

Kristian

* Re: massive filesystem corruption with 2.4.9
From: Kristian @ 2001-08-21 19:06 UTC
  To: Alan Cox; +Cc: cwidmer, linux-kernel

Alan Cox wrote:
> Does memtest86 show up anything on this box ?

No errors...

Btw: As far as I remember, the problem first appeared after I patched 2.4.5 with
ac13 or ac15. Maybe a clean 2.4.5 works fine, but I'm not sure about this; it was
some time ago. Did you make any important ext2-related changes in 2.4.5-ac? I
could revert to that old kernel and test it if that is relevant.

Kristian

* Re: massive filesystem corruption with 2.4.9
From: Alan Cox @ 2001-08-21 16:26 UTC
  To: cwidmer; +Cc: Kristian, Alan Cox, linux-kernel

> I also don't believe it is a memory problem. That RAM has worked for over 2
> years with no errors found by memtest (memtest86, Intel's memtest), while
> compiling XFree86 several times and many, many kernels.
> 
> And I never had any problems until I first tried a 2.4.x kernel on the
> fileserver (that was 2.4.6), so I moved the fileserver back to 2.2.19.

Nod. I can follow that reasoning. I've come across boxes that failed with memory
errors under 2.4 but not under 2.2. So far, however, those have all shown up
with memtest86, or been Athlon optimisation triggered via things

Curiouser and curiouser


* Re: massive filesystem corruption with 2.4.9
From: Alan Cox @ 2001-08-21 16:23 UTC
  To: Kristian; +Cc: Alan Cox, cwidmer, linux-kernel

> No. I can't find any VIA chipset. I'm really surprised. :-) But it is an
> original Compaq board (EP series) with a horrible BIOS. It seems that they're
> using Intel only.

440BX - good chipset.

Does memtest86 show up anything on this box ?


* Re: massive filesystem corruption with 2.4.9
From: Christian Widmer @ 2001-08-21 16:18 UTC
  To: Kristian, Alan Cox; +Cc: linux-kernel

>  > If your disk is in UDMA33/66 mode you can pretty much rule the
>  > disk out, as the data is protected.
I think that is the case with the Promise driver.

>  > If you have a VIA chipset, especially if there is an SB Live! in the
>  > machine, then that may be the cause (there are fixes in 2.4.8-ac, and there
>  > should be a fix in 2.4.9, but Linus' tree also applies another bogus change,
>  > which should be harmless).
It was an Intel LX chipset.

I also don't believe it is a memory problem. That RAM has worked for over 2 years
with no errors found by memtest (memtest86, Intel's memtest), while compiling
XFree86 several times and many, many kernels.

And I never had any problems until I first tried a 2.4.x kernel on the
fileserver (that was 2.4.6), so I moved the fileserver back to 2.2.19.


* Re: massive filesystem corruption with 2.4.9
From: Kristian @ 2001-08-21 16:00 UTC
  To: Alan Cox; +Cc: cwidmer, linux-kernel

Alan Cox wrote:
 > Typically this indicates disk, memory or chipset problems. If your disk is
 > in UDMA33/66 mode you can pretty much rule the disk out, as the data is
 > protected.
 >
 > If you have a VIA chipset, especially if there is an SB Live! in the machine,
 > then that may be the cause (there are fixes in 2.4.8-ac, and there should be a
 > fix in 2.4.9, but Linus' tree also applies another bogus change, which should
 > be harmless).

No. I can't find any VIA chipset. I'm really surprised. :-) But it is an
original Compaq board (EP series) with a horrible BIOS. It seems that they're
using Intel only.

I tested my hard disk with IBM's Drive Fitness program. It detected no
errors.

Here's the output of it:

     Model                   : IBM-DTLA-305040
     Serial no.              : YJ025714
     Capacity                : 41.17 GB
     Cache size              : 380 KB
     Microcode level         : TW4OA60A
     ATA Compliance          : ATA-5

     Ultra DMA
       Highest mode          : 5
       Active mode           : 1

     Settings
       Write cache           : Enabled
       Read look-ahead       : Enabled
       Auto reassign         : Enabled
       S.M.A.R.T. operations : Enabled
       S.M.A.R.T. status     : Good
       ABLE                  : Disabled
       AAM                   : Disabled
       Security feature      : Supported
         Password            : Not Set


Here is the output of cat /proc/pci:
PCI devices found:
    Bus  0, device   0, function  0:
      Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 3).
        Master Capable.  Latency=64.
        Prefetchable 32 bit memory at 0x44000000 [0x47ffffff].
    Bus  0, device   1, function  0:
      PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 3).
        Master Capable.  Latency=64.  Min Gnt=140.
    Bus  0, device  14, function  0:
      Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 5).
        IRQ 11.
        Master Capable.  Latency=66.  Min Gnt=8.Max Lat=56.
        Prefetchable 32 bit memory at 0x48100000 [0x48100fff].
        I/O at 0x1000 [0x101f].
        Non-prefetchable 32 bit memory at 0x48000000 [0x480fffff].
    Bus  0, device  15, function  0:
      Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 6).
        IRQ 11.
        Master Capable.  Latency=64.  Min Gnt=12.Max Lat=128.
        I/O at 0x1080 [0x10bf].
    Bus  0, device  16, function  0:
      Multimedia video controller: Brooktree Corporation Bt878 (rev 17).
        IRQ 11.
        Master Capable.  Latency=66.  Min Gnt=16.Max Lat=40.
        Prefetchable 32 bit memory at 0x48200000 [0x48200fff].
    Bus  0, device  16, function  1:
      Multimedia controller: Brooktree Corporation Bt878 (rev 17).
        IRQ 11.
        Master Capable.  Latency=66.  Min Gnt=4.Max Lat=255.
        Prefetchable 32 bit memory at 0x48300000 [0x48300fff].
    Bus  0, device  20, function  0:
      ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2).
    Bus  0, device  20, function  1:
      IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1).
        Master Capable.  Latency=64.
        I/O at 0x1040 [0x104f].
    Bus  0, device  20, function  2:
      USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1).
        IRQ 11.
        Master Capable.  Latency=64.
        I/O at 0x1020 [0x103f].
    Bus  0, device  20, function  3:
      Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2).
        IRQ 9.
    Bus  1, device   0, function  0:
      VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 130).
        IRQ 11.
        Master Capable.  Latency=64.  Min Gnt=16.Max Lat=32.
        Prefetchable 32 bit memory at 0x42000000 [0x43ffffff].
        Non-prefetchable 32 bit memory at 0x40800000 [0x40803fff].
        Non-prefetchable 32 bit memory at 0x40000000 [0x407fffff].

/proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 597.413
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1192.75

/proc/interrupts:

    0:    1337585          XT-PIC  timer
    1:      46916          XT-PIC  keyboard
    2:          0          XT-PIC  cascade
    8:          1          XT-PIC  rtc
   11:     162990          XT-PIC  es1371, bttv, eth0
   12:     294331          XT-PIC  PS/2 Mouse
   14:      21839          XT-PIC  ide0
   15:         17          XT-PIC  ide1
NMI:          0
ERR:          0


Kristian

* Re: massive filesystem corruption with 2.4.9
From: Alan Cox @ 2001-08-21 13:58 UTC
  To: cwidmer; +Cc: Kristian, linux-kernel

> > Aug 21 09:01:06 adlib kernel: EXT2-fs error (device ide0(3,5)):
> > ext2_new_block: Allocating block in system zone - block = 3
> > Aug 21 09:01:06 adlib kernel: EXT2-fs error (device ide0(3,5)):
> > ext2_free_blocks: Freeing blocks in system zones - Block = 4, count = 1

Typically this indicates disk, memory or chipset problems. If your disk is
in UDMA33/66 mode you can pretty much rule the disk out, as the data is
protected.
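
The reason UDMA lets the disk be ruled out is that Ultra DMA transfers between host and drive are protected by a CRC, so corruption on the IDE cable is flagged as an interface CRC error instead of being stored silently. Below is a minimal sketch of such a check: a bitwise CRC-16 over 16-bit words with the generator x^16 + x^12 + x^5 + 1. The seed 0x4ABA used in the example and the MSB-first word ordering are assumptions, and the function name is hypothetical; this is not a bit-exact model of what the ATA hardware computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with generator x^16 + x^12 + x^5 + 1 (0x1021),
 * consuming one 16-bit word per step, most significant bit first.
 * ASSUMPTION: seed and bit order are illustrative only; this is not
 * a bit-exact model of the ATA Ultra DMA interface CRC. */
uint16_t crc16_words(const uint16_t *words, size_t n, uint16_t seed)
{
    assert(words != NULL || n == 0);
    uint16_t crc = seed;
    for (size_t i = 0; i < n; i++) {
        crc = (uint16_t)(crc ^ words[i]);   /* fold the next word in */
        for (int bit = 0; bit < 16; bit++) {
            if (crc & 0x8000)               /* top bit set: shift and reduce */
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Because a CRC with this generator detects every single-bit error, flipping any one bit of a transferred block changes the result. This only covers the cable between host and drive: data that is already wrong in RAM, or damaged by the chipset before the transfer, is sent with a valid CRC, which is why memory and the chipset stay on the list of suspects.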

If you have a VIA chipset, especially if there is an SB Live! in the machine,
then that may be the cause (there are fixes in 2.4.8-ac, and there should be a
fix in 2.4.9, but Linus' tree also applies another bogus change, which should be
harmless).

> > These fatal errors have been occurring since 2.4.5 (I haven't tested 2.4.8).
> > When I work with 2.4.4 everything is fine!

What hardware?


Thread overview: 9+ messages
2001-08-21  8:00 massive filesystem corruption with 2.4.9 Kristian
2001-08-21  8:34 ` Christian Widmer
2001-08-21 10:14   ` Kristian
     [not found] <no.id>
2001-08-21 13:58 ` Alan Cox
2001-08-21 16:00   ` Kristian
2001-08-21 16:18     ` Christian Widmer
2001-08-21 16:23 ` Alan Cox
2001-08-21 19:06   ` Kristian
2001-08-21 16:26 ` Alan Cox
