* Re: Bad things happening to journaled filesystem machines; Was: Oops in kjournald
@ 2005-01-11 16:10 Anders Saaby
From: Anders Saaby @ 2005-01-11 16:10 UTC (permalink / raw)
  To: linux-kernel


Well, I just want to count myself in. I have had quite a few of these,
both on XFS and EXT3. I tried a lot of different 2.6.x kernels without
luck. (See other mails here on LKML from me and Jakob Oestergaard et al.)

It seems something is very broken in VFS(?) on 2.6 (at least after 2.6.5,
which was the last kernel I didn't see this on, but it had other errors that
forced me away from it).

Sadly, it looks to me as if either no one cares (enough) about this, or no one
is capable of fixing it (myself included). :(

2.4.28 fixed it for me. I just have to live with the poor performance.

Jeffrey E. Hundstad wrote:

> I've had 4 machines do similar things.  It happens during backups
> or during updatedb.  This has happened on 2.6.9, 2.6.10, 2.6.10-ac7, and
> 2.6.10-ac8.  I've seen several similar reports with journaled file
> systems.  I use XFS exclusively, but have seen reports on XFS and EXT3.
> I would report something more useful but what I'm usually left with is
> XFS unmounted and nothing useful on the screen.  This has been on Xeon,
> Pentium-3 and Athlon systems. ...wish I could report more but perhaps it
> will add /part/ of a data point.

Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: -

* Oops in kjournald
@ 2005-01-11  9:05 Roland Rosenfeld
  2005-01-11  9:24 ` Bad things happening to journaled filesystem machines; Was: " Jeffrey E. Hundstad
From: Roland Rosenfeld @ 2005-01-11  9:05 UTC (permalink / raw)
  To: Linux Kernel Mailinglist

In the last week I got the following Oops five times on two machines.
It happens under very heavy load (mail servers based on Debian sarge
with postfix 2.1.4), when the load stays above 4 for some hours.

---------------------- schnipp --------------------------------
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: e1000 tg3 bonding rtc unix
CPU:    0
EIP:    0060:[<c01a65a5>]    Not tainted VLI
EFLAGS: 00010282   (2.6.10-686-nc-smp-1)
EIP is at journal_commit_transaction+0x315/0x12f0
eax: f157117c   ebx: 00000000   ecx: 00000000   edx: f7cffb80
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 171, threadinfo=f6f7e000 task=f6e92520)
Stack: e2efd5cc e2efd5cc 00000008 000011da 00000000 f6e97ac0 f6f7e000 f6f7e000
       f6ef6bb8 f6e97a14 00000000 00000000 00000000 00000000 00000000 f0fa9dac
       e8a39ecc 000011da 00000000 f6e92520 c012d580 f6f7fe44 f6f7fe44 00000000
Call Trace:
 [<c012d580>] autoremove_wake_function+0x0/0x60
 [<c012d580>] autoremove_wake_function+0x0/0x60
 [<c01a9af5>] kjournald+0xe5/0x240
 [<c011b837>] do_exit+0x2d7/0x480
 [<c012d580>] autoremove_wake_function+0x0/0x60
 [<c012d580>] autoremove_wake_function+0x0/0x60
 [<c0102572>] ret_from_fork+0x6/0x14
 [<c01a99f0>] commit_timeout+0x0/0x10
 [<c01a9a10>] kjournald+0x0/0x240
 [<c01007f5>] kernel_thread_helper+0x5/0x10
Code: 00 8b 45 20 85 c0 75 be 8b 44 24 38 85 c0 0f 85 16 0d 00 00 8b 45 18 85 c0 0f 84 83 00 00 00 be 00 e0 ff ff 21 e6 8b 78 20 8b 1f <f0> ff 43 0c 8b 03 a8 04 0f 85 b3 0c 00 00 89 5c 24 04 8b 94 24
 <6>note: kjournald[171] exited with preempt_count 1
---------------------- schnipp --------------------------------

I run a 2.6.10 kernel (with aacraid 1.1.5 2372 driver from Adaptec,
everything else vanilla) on Dual Xeon machines. The kernel has ext3fs
and XFS compiled in but currently all filesystems are ext3.

In the logs I don't see anything, because the machines freeze with the
above message. (I retyped the messages from the screen, so there may be
some typos; if necessary, I have a screenshot here to correct them.)

What can I do to fix this problem?  Searching with Google I didn't find
any hint for further investigation, and my kernel knowledge is very
limited :-|



