* [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737
@ 2009-04-28  9:28 bugzilla-daemon
  2009-04-29  1:06 ` [Bug 13201] " bugzilla-daemon
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-04-28  9:28 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201

           Summary: kernel BUG at fs/ext4/extents.c:2737
           Product: File System
           Version: 2.5
    Kernel Version: 2.6.29.1
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: ext4
        AssignedTo: fs_ext4@kernel-bugs.osdl.org
        ReportedBy: franco@fugro-fsi.com.au
        Regression: No


I got this while testing ext4 on an external RAID system. The system has 4
identical RAID systems, each with a single 13TB filesystem. Only one of the 4
failed the test, which was simply to write 8GB files until the disk filled up.
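
For anyone who wants to try a similar test, here is a minimal sketch of a
"write fixed-size files until the disk fills up" loop. It is not the script
used in this report, which was not posted. The function name, the file naming
scheme, and the max_files safety limit are all hypothetical.

```python
import errno
import os


def fill_with_files(directory, file_size, chunk_size=1 << 20, max_files=None):
    """Write files of file_size bytes into directory until ENOSPC.

    Returns the number of files written completely. max_files is a
    hypothetical safety limit so the sketch can be tried without
    actually filling a disk.
    """
    chunk = b"\0" * chunk_size
    count = 0
    while max_files is None or count < max_files:
        path = os.path.join(directory, f"fill_{count:06d}.dat")
        try:
            with open(path, "wb") as f:
                remaining = file_size
                while remaining > 0:
                    n = min(chunk_size, remaining)
                    f.write(chunk[:n])
                    remaining -= n
                # Push the data to the device so I/O errors surface here.
                f.flush()
                os.fsync(f.fileno())
        except OSError as e:
            if e.errno == errno.ENOSPC:
                break  # disk full: the expected end of the test
            raise      # anything else (e.g. EIO) is a real failure
        count += 1
    return count
```

Calling fill_with_files("/data143", 8 * 1024**3) would approximate the test
described above. Any OSError other than ENOSPC is re-raised, so an I/O
failure like the error -5 in the log below would stop the run.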

The complete messages file is attached.

Apr 25 01:59:38 echo19 kernel: EXT4-fs error (device dm-2):
__ext4_get_inode_loc: unable to read inode block - inode=761860,
block=3612686232
Apr 25 01:59:38 echo19 kernel: EXT4-fs error (device dm-2) in
ext4_reserve_inode_write: IO failure
Apr 25 01:59:38 echo19 kernel: mpage_da_map_blocks block allocation failed for
inode 761860 at logical offset 699276 with max blocks 1024 with error -5
Apr 25 01:59:38 echo19 kernel: This should not happen.!! Data will be lost
Apr 25 01:59:38 echo19 kernel: ------------[ cut here ]------------
Apr 25 01:59:38 echo19 kernel: kernel BUG at fs/ext4/extents.c:2737!

The filesystem was totally inaccessible so I reset the system. On reboot, the
filesystem couldn't be mounted - bad superblock.

I ran fsck a few times before I could remount the filesystem. All the
directories were lost, but the test files were intact in lost+found.

I can't see any errors anywhere that might indicate that this is a hardware
problem but these are brand new systems using SAS host connections which we
haven't used before.

I've remade the broken filesystem and restarted the test on this and 11 other
identical filesystems. I'll let you know if the problem recurs.

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
@ 2009-04-29  1:06 ` bugzilla-daemon
  2009-04-29  3:15 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-04-29  1:06 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201





--- Comment #2 from Franco Broi <franco@fugro-fsi.com.au>  2009-04-29 01:06:09 ---
Of the 12 tests, 2 produced errors.

EXT4-fs error (device dm-5): ext4_mb_generate_buddy: EXT4-fs: group 9: 32768
blocks in bitmap, 1023 in gd

The filesystem seems OK, I can ls the test files.

EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 0: 32768
blocks in bitmap, 970 in gd
EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 0: 32768
blocks in bitmap, 32766 in gd
EXT4-fs error (device dm-3): ext4_init_block_bitmap: Checksum bad for group 1
EXT4-fs error (device dm-3): ext4_mb_generate_buddy: EXT4-fs: group 1: 0 blocks
in bitmap, 1023 in gd
EXT4-fs error (device dm-3): ext4_dx_find_entry: bad entry in directory #15:
directory entry across blocks - offset=28672, inode=0, rec_len=65536,
name_len=0
EXT4-fs error (device dm-3): ext4_add_entry: bad entry in directory #15:
directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2:
directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2:
directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2:
directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2:
directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0
EXT4-fs error (device dm-3): htree_dirblock_to_tree: bad entry in directory #2:
directory entry across blocks - offset=0, inode=0, rec_len=65536, name_len=0

Although df looks ok
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vgdata--143-data143
                     13456415384 13232157624 224257760  99% /data143
# ls /data143

Produces no output.

At this point I will need to switch back to ext3 so that I can get this disk
into production but I do have a small window to run some more tests if anyone
has any ideas.
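
The ext4_mb_generate_buddy messages above each report a free-block count from
the bitmap and a different count from the group descriptor ("gd"). A small
sketch like the following could pull those pairs out of a saved log for
comparison. The regular expression is written against the message text quoted
in this thread only, and the function name is hypothetical.

```python
import re

# Pattern based on the message format quoted in this thread; other kernel
# versions may word the message differently.
BUDDY_RE = re.compile(
    r"EXT4-fs error \(device (?P<dev>[\w-]+)\): ext4_mb_generate_buddy: "
    r"EXT4-fs: group (?P<group>\d+): (?P<bitmap>\d+) blocks in bitmap, "
    r"(?P<gd>\d+) in gd"
)


def buddy_mismatches(log_text):
    """Return (device, group, bitmap_count, gd_count) for each message."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (m["dev"], int(m["group"]), int(m["bitmap"]), int(m["gd"]))
        for m in BUDDY_RE.finditer(flat)
    ]
```

Run over the dm-3 messages above, this would list groups 0 and 1 with their
differing counts. It only summarises what the kernel already logged.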


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
  2009-04-29  1:06 ` [Bug 13201] " bugzilla-daemon
@ 2009-04-29  3:15 ` bugzilla-daemon
  2009-04-30  3:00 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-04-29  3:15 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201


Eric Sandeen <sandeen@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandeen@redhat.com




--- Comment #3 from Eric Sandeen <sandeen@redhat.com>  2009-04-29 03:15:31 ---
Franco, sorry we haven't gotten back with suggestions on this.  It looks like
you have hit a couple different end results.  We've had a few reports of
corruption on larger filesystems which makes us wonder if there might be a
problem above 8T somewhere... 
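
As a rough check on the 8T question, the arithmetic below places the failing
block from the first report relative to that boundary. The thread never
states the block size. 4 KiB is an assumption, consistent with the 32768
blocks per group in the logs. This shows where the block lies and nothing
about the cause.

```python
BLOCK_SIZE = 4096   # assumption: 4 KiB blocks, not stated in the report
TIB = 1 << 40


def block_to_tib(block, block_size=BLOCK_SIZE):
    """Byte offset of a filesystem block, expressed in TiB."""
    return block * block_size / TIB


# With 4 KiB blocks, 8 TiB corresponds to block number 2**31, the first
# value that no longer fits in a signed 32-bit integer.
boundary_block = 8 * TIB // BLOCK_SIZE

# Block number from the __ext4_get_inode_loc message in the first report.
failed_block = 3612686232

print(boundary_block == 2**31)
print(failed_block > boundary_block)
print(block_to_tib(failed_block))
```

Under that assumption the unreadable inode block sits above the 8 TiB mark
but still below 2**32 blocks.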

The current upstream git tree (or the 2.6.30-rc3-git5 prepatch) has more extent
validity checking in it; if you do have the time for another test, running on
that codebase may yield more info, depending on where the problem lies.

-Eric


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
  2009-04-29  1:06 ` [Bug 13201] " bugzilla-daemon
  2009-04-29  3:15 ` bugzilla-daemon
@ 2009-04-30  3:00 ` bugzilla-daemon
  2009-04-30  9:39 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-04-30  3:00 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201





--- Comment #4 from Franco Broi <franco@fugro-fsi.com.au>  2009-04-30 03:00:48 ---
I ran a test overnight using 2.6.30-rc3-git5 and it didn't fail. Not sure if
this is a good or bad thing?

I've deleted the files and started the test again.

By the way, deleting files with ext4 is lightning fast: it only takes about 5
minutes to delete 13TB! Again, not sure if this is a good or bad thing, since
it doesn't give you much time to hit Ctrl-C...


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
                   ` (2 preceding siblings ...)
  2009-04-30  3:00 ` bugzilla-daemon
@ 2009-04-30  9:39 ` bugzilla-daemon
  2009-05-19 18:04 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-04-30  9:39 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201





--- Comment #5 from Franco Broi <franco@fugro-fsi.com.au>  2009-04-30 09:39:21 ---
I've now got filesystem corruption with 2.6.30-rc3-git5. It looks pretty much
the same as before.

Apr 30 17:30:56 echo20 kernel: EXT4-fs error (device dm-3):
ext4_mb_generate_buddy: EXT4-fs: group 0: 32768 blocks in bitmap, 23495 in gd
Apr 30 17:30:56 echo20 kernel: EXT4-fs error (device dm-3):
ext4_mb_mark_diskspace_used: Allocating block 1024 in system zone of 0 group

When I do an ls in the test directory I get lots of Input/output errors:

EXT4-fs error (device dm-3): ext4_lookup: deleted inode referenced: 127
EXT4-fs error (device dm-3): ext4_lookup: deleted inode referenced: 358
EXT4-fs error (device dm-3): ext4_lookup: deleted inode referenced: 196

Anything you want me to try?


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
                   ` (3 preceding siblings ...)
  2009-04-30  9:39 ` bugzilla-daemon
@ 2009-05-19 18:04 ` bugzilla-daemon
  2009-05-23 10:10 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-05-19 18:04 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201


Theodore Tso <tytso@mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu




--- Comment #6 from Theodore Tso <tytso@mit.edu>  2009-05-19 18:04:05 ---
Could you try replicating this problem in 2.6.30-rc6?   We fixed a race
condition in i_cached_extents which could very well have caused your problem.
I'm hoping it will close this and a few other mystery bug reports we've had
over the past couple of months.  (The bug is an old one, but we had struggled
to find a reliable reproduction case.)


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
                   ` (4 preceding siblings ...)
  2009-05-19 18:04 ` bugzilla-daemon
@ 2009-05-23 10:10 ` bugzilla-daemon
  2009-06-05  0:51 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-05-23 10:10 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201





--- Comment #7 from Franco Broi <franco@fugro-fsi.com.au>  2009-05-23 10:10:50 ---
I won't be able to recreate the original test conditions, but I'll run a test
with a single large filesystem within a couple of weeks.


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
                   ` (5 preceding siblings ...)
  2009-05-23 10:10 ` bugzilla-daemon
@ 2009-06-05  0:51 ` bugzilla-daemon
  2009-06-08 16:49 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-06-05  0:51 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201





--- Comment #8 from Franco Broi <franco@fugro-fsi.com.au>  2009-06-05 00:51:52 ---
I haven't been able to recreate the problem using 2.6.30-rc8, but the test
conditions aren't identical to before. Would it make a difference that only a
single filesystem is being written to, and not 4 simultaneously as in the
original tests?


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
                   ` (6 preceding siblings ...)
  2009-06-05  0:51 ` bugzilla-daemon
@ 2009-06-08 16:49 ` bugzilla-daemon
  2009-06-09  5:10 ` bugzilla-daemon
  2009-08-26 18:11 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-06-08 16:49 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201





--- Comment #9 from Theodore Tso <tytso@mit.edu>  2009-06-08 16:49:28 ---
If this is the same problem as the one which we fixed with identical symptoms,
what matters is multiple processes/threads writing to the same file at the same
time.  People using NFS or SAMBA on a backup server seemed to be the most
common scenario for triggering this (admittedly very hard to reproduce) bug.
We finally got lucky in that someone had a setup which allowed for reliable
reproduction of the bug, so we could finally sink our teeth into it.

So if what you saw was the same as the bug we fixed in 2.6.30-rc6, no it
shouldn't make a difference.   If it is a completely different bug, then of
course all bets are off.  In general though whether you are writing to one
filesystem or 4 filesystems shouldn't make a difference, except in that it
might change the timing necessary to hit a race condition (and in the case of
the bug that we found and fixed, it was highly timing dependent; in fact, even
after we found the problem, we weren't able to come up with a reliable
reproduction case, even though the problem was obvious on paper and the one
user who could reliably reproduce reported it went away once the patch was
applied; IIRC, Eric finally put in a delay into the code to widen the race
window to the point where he could replicate it.)
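
The general technique mentioned here, adding a delay to widen a race window,
can be illustrated with a toy userspace example. This is not the ext4 code or
Eric's patch. It is a generic read-modify-write race on a shared counter, and
every name in it is hypothetical.

```python
import threading
import time


def run(workers, iterations, delay, use_lock):
    """Increment a shared counter from several threads.

    Sleeping between the read and the write stands in for the delay
    inserted to widen the race window. Without the lock, updates can
    be lost. With it, the final value is exact.
    """
    state = {"value": 0}
    lock = threading.Lock()

    def bump():
        current = state["value"]       # read
        time.sleep(delay)              # artificially widened window
        state["value"] = current + 1   # write back a possibly stale value

    def worker():
        for _ in range(iterations):
            if use_lock:
                with lock:
                    bump()
            else:
                bump()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["value"]
```

With use_lock=False and a non-zero delay, the result typically falls well
short of workers * iterations. With use_lock=True it matches exactly. With
delay=0 the unlocked version may pass most of the time, which is the kind of
timing dependence described above.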


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
                   ` (7 preceding siblings ...)
  2009-06-08 16:49 ` bugzilla-daemon
@ 2009-06-09  5:10 ` bugzilla-daemon
  2009-08-26 18:11 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-06-09  5:10 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201





--- Comment #10 from Franco Broi <franco@fugro-fsi.com.au>  2009-06-09 05:10:31 ---
(In reply to comment #9)
> If this is the same problem as the one which we fixed with identical symptoms,
> what matters is multiple processes/threads writing to the same file at the same
> time.

Then it doesn't sound like it's the same bug. My tests are very simple: they
just keep writing 8GB files until the disk fills up. There is no concurrent
access to files or even the filesystem, and the machines are completely
standalone.


* [Bug 13201] kernel BUG at fs/ext4/extents.c:2737
  2009-04-28  9:28 [Bug 13201] New: kernel BUG at fs/ext4/extents.c:2737 bugzilla-daemon
                   ` (8 preceding siblings ...)
  2009-06-09  5:10 ` bugzilla-daemon
@ 2009-08-26 18:11 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2009-08-26 18:11 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=13201


Valerie Aurora <vaurora@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vaurora@redhat.com




--- Comment #11 from Valerie Aurora <vaurora@redhat.com>  2009-08-26 18:11:07 ---
Given that the bug appears to be fixed, and we can't reproduce the original
conditions or get more data on this bug, it seems like we should close this
bug.
