linux-kernel.vger.kernel.org archive mirror
* Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
@ 2005-02-15 14:56 Ralf Hildebrandt
  2005-02-16 15:33 ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Ralf Hildebrandt @ 2005-02-15 14:56 UTC
  To: linux-kernel

Today our mailserver froze after just one day of uptime. I was able to
capture the Oops on the screen using my digital camera:

http://www.stahl.bau.tu-bs.de/~hildeb/bugreport/

Keywords: EIP is at journal_commit_transaction, process kjournald

# mount
/dev/cciss/c0d0p6 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/cciss/c0d0p5 on /boot type ext3 (rw)
/dev/shm on /var/amavis type tmpfs (rw,noatime,size=200m,mode=770,uid=104,gid=108)

-- 
Ralf Hildebrandt (i.A. des IT-Zentrum)          Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-15 14:56 Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction) Ralf Hildebrandt
@ 2005-02-16 15:33 ` Jan Kara
  2005-02-16 20:04   ` Ralf Hildebrandt
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2005-02-16 15:33 UTC
  To: linux-kernel

  Hello,

> Today our mailserver froze after just one day of uptime. I was able to
> capture the Oops on the screen using my digital camera:
> 
> http://www.stahl.bau.tu-bs.de/~hildeb/bugreport/
> 
> Keywords: EIP is at journal_commit_transaction, process kjournald
  I guess the system is SMP... Sadly, a few lines at the beginning of the
report are missing (they probably scrolled off the screen), but it looks
similar to several other oopses I've seen reported recently. Is this
the first time you have hit this bug?

> # mount
> /dev/cciss/c0d0p6 on / type ext3 (rw,errors=remount-ro)
> proc on /proc type proc (rw)
> sysfs on /sys type sysfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> tmpfs on /dev/shm type tmpfs (rw)
> /dev/cciss/c0d0p5 on /boot type ext3 (rw)
> /dev/shm on /var/amavis type tmpfs (rw,noatime,size=200m,mode=770,uid=104,gid=108)

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-16 15:33 ` Jan Kara
@ 2005-02-16 20:04   ` Ralf Hildebrandt
  2005-02-16 21:54     ` Dale Blount
  0 siblings, 1 reply; 10+ messages in thread
From: Ralf Hildebrandt @ 2005-02-16 20:04 UTC
  To: linux-kernel

* Jan Kara <jack@suse.cz>:

>   I guess the system is SMP...

Indeed it is. Dual Xeon with SMP.

>   Sadly a few lines in the beginning of the
> report are missing (probably scrolled off the screen)

Yes, this sucks. I rebooted with vesafb active, now I do have 50 lines :)

> but it seems similar to several other oopses I've seen reported
> recently. Is this the first time you hit this bug?

It's actually the second time. The first time it hit the SAME box but
with kernel-2.6.10 (vanilla) after 30 days of uptime. Nobody had a
camera at hand, so I couldn't take a photo.

Any suggestions? One difference between the 2.6.10 and 2.6.10-ac12
builds is that 2.6.10 had no in-kernel IRQ balancing, while in
2.6.10-ac12 I activated it.

-- 
Ralf Hildebrandt (i.A. des IT-Zentrum)          Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-16 20:04   ` Ralf Hildebrandt
@ 2005-02-16 21:54     ` Dale Blount
  2005-02-16 22:00       ` Ralf Hildebrandt
  2005-02-16 22:55       ` Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Dale Blount @ 2005-02-16 21:54 UTC
  To: Ralf Hildebrandt; +Cc: linux-kernel

On Wed, 2005-02-16 at 21:04 +0100, Ralf Hildebrandt wrote:
> * Jan Kara <jack@suse.cz>:
> 
> >   I guess the system is SMP...
> 
> Indeed it is. Dual Xeon with SMP.
> 

This looks very similar (at least to me) to an OOPS I posted with 2.6.9
on 12/03/2004.
http://marc.theaimsgroup.com/?l=linux-kernel&m=110210705504716&w=2

My system is also a dual Xeon using SMP and Hyperthreading
(/proc/cpuinfo shows 4 cpus).
Mine, like Ralf's, is also a mail server running postfix using ext3 for
the spool directory.

> > but it seems similar to several other oopses I've seen reported
> > recently. Is this the first time you hit this bug?
> 
> It's actually the second time. The first time it hit the SAME box but
> with kernel-2.6.10 (vanilla) after 30 days of uptime. Nobody had a
> camera at hand, so I couldn't take a photo.
> 

I've actually hit this bug (assuming it's the same) with 2.6.10 also.  I
had to power cycle remotely and unfortunately didn't have the serial
console logging enabled when it happened with 2.6.10.  I upgraded from
2.4.23 to 2.6.8.1 and crashed within a week, and continued to crash at
least monthly after that.  It had been running 2.4.23 for 200+ days with
no problems.

Hope this helps trace it back.

Dale


* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-16 21:54     ` Dale Blount
@ 2005-02-16 22:00       ` Ralf Hildebrandt
  2005-02-16 22:55       ` Andrew Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Ralf Hildebrandt @ 2005-02-16 22:00 UTC
  To: linux-kernel

* Dale Blount <linux-kernel@dale.us>:

> This looks very similar (at least to me) to an OOPS I posted with 2.6.9
> on 12/03/2004.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=110210705504716&w=2

Could be.

> My system is also a dual Xeon using SMP and Hyperthreading
> (/proc/cpuinfo shows 4 cpus).

Same system here.

> Mine, like Ralf's, is also a mail server running postfix using ext3 for
> the spool directory.

Same here.

> I've actually hit this bug (assuming it's the same) with 2.6.10 also.  I
> had to power cycle remotely and unfortunately didn't have the serial
> console logging enabled when it happened with 2.6.10.  I upgraded from
> 2.4.23 to 2.6.8.1 and crashed within a week, and continued to crash at
> least monthly after that.  It had been running 2.4.23 for 200+ days with
> no problems.
> 
> Hope this helps trace it back.

Me too


-- 
Ralf Hildebrandt (i.A. des IT-Zentrum)          Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-16 21:54     ` Dale Blount
  2005-02-16 22:00       ` Ralf Hildebrandt
@ 2005-02-16 22:55       ` Andrew Morton
  2005-02-17 10:58         ` Ralf Hildebrandt
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2005-02-16 22:55 UTC
  To: Dale Blount; +Cc: Ralf.Hildebrandt, linux-kernel

Dale Blount <linux-kernel@dale.us> wrote:
>
> This looks very similar (at least to me) to an OOPS I posted with 2.6.9
> on 12/03/2004.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=110210705504716&w=2

There have been a handful of reports - there's surely a race in there.

Unfortunately I've yet to see a report from which we can identify the
offending line in the very large journal_commit_transaction() function.

The best way to do that is to ensure that the kernel was built with
CONFIG_DEBUG_INFO, note the offending EIP value, then do

# gdb vmlinux
(gdb) l *0xc0<whatever>
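
A minimal sketch of that procedure, assuming a 2.6-era i386 Oops whose
register dump contains a line of the form "EIP:    0060:[<address>]".
The sample address below is invented for illustration and is not taken
from the report in this thread; the helper names oops_eip and
gdb_list_cmd are likewise hypothetical. The script only prints the gdb
command instead of running it, so it works without a vmlinux at hand:

```shell
#!/bin/sh
# Sketch: turn the "EIP:" line of an i386 Oops into a gdb "list" command.
# ASSUMPTION: the sample address is made up; it is NOT from the reported Oops.

# Read Oops text on stdin and print the first bracketed code address
# found on an "EIP:" line, prefixed with 0x (e.g. 0xc01a2b3c).
oops_eip() {
    sed -n 's/^EIP:.*\[<\([0-9a-fA-F]\{8\}\)>].*/0x\1/p' | head -n 1
}

# Print a non-interactive gdb invocation for a vmlinux that was built
# with CONFIG_DEBUG_INFO=y.
#   $1 = path to the uncompressed vmlinux
#   $2 = address such as 0xc01a2b3c
gdb_list_cmd() {
    printf 'gdb -batch -ex "list *%s" %s\n' "$2" "$1"
}

# Hypothetical sample line in the format printed by the i386 Oops code.
sample='EIP:    0060:[<c01a2b3c>]    Not tainted VLI'
addr=$(printf '%s\n' "$sample" | oops_eip)
gdb_list_cmd ./vmlinux "$addr"
```

With the real address substituted, the printed command can be run from
the kernel build tree. "addr2line -e vmlinux <address>" is an
alternative that prints only the source file and line number.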

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-16 22:55       ` Andrew Morton
@ 2005-02-17 10:58         ` Ralf Hildebrandt
  2005-02-17 13:21           ` Ralf Hildebrandt
  0 siblings, 1 reply; 10+ messages in thread
From: Ralf Hildebrandt @ 2005-02-17 10:58 UTC
  To: Dale Blount, linux-kernel

* Andrew Morton <akpm@osdl.org>:

> There have been a handful of reports - there's surely a race in there.
> 
> Unfortunately I've yet to see a report from which we can identify the
> offending line in the very large journal_commit_transaction() function.

:(

> 
> The best way to do that is to ensure that the kernel was built with
> CONFIG_DEBUG_INFO, note the offending EIP value, then do
> 
> # gdb vmlinux
> (gdb) l *0xc0<whatever>

I'm rebuilding the ac12 kernel which crashed on me after just one day
and will reboot it today.

-- 
Ralf Hildebrandt (i.A. des IT-Zentrum)          Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-17 10:58         ` Ralf Hildebrandt
@ 2005-02-17 13:21           ` Ralf Hildebrandt
  2005-02-17 15:51             ` Randy.Dunlap
  0 siblings, 1 reply; 10+ messages in thread
From: Ralf Hildebrandt @ 2005-02-17 13:21 UTC
  To: Dale Blount, linux-kernel

* Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>:

> > The best way to do that is to ensure that the kernel was built with
> > CONFIG_DEBUG_INFO, note the offending EIP value, then do
> > 
> > # gdb vmlinux
> > (gdb) l *0xc0<whatever>
> 
> I'm rebuilding the ac12 kernel which crashed on me after just one day
> and will reboot it today.

Is it normal that the kernel with debugging enabled is not larger than
the normal kernel?

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-17 13:21           ` Ralf Hildebrandt
@ 2005-02-17 15:51             ` Randy.Dunlap
  2005-02-17 16:00               ` Ralf Hildebrandt
  0 siblings, 1 reply; 10+ messages in thread
From: Randy.Dunlap @ 2005-02-17 15:51 UTC
  To: Ralf Hildebrandt; +Cc: Dale Blount, linux-kernel

Ralf Hildebrandt wrote:
> * Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>:
> 
> 
>>>The best way to do that is to ensure that the kernel was built with
>>>CONFIG_DEBUG_INFO, note the offending EIP value, then do
>>>
>>># gdb vmlinux
>>>(gdb) l *0xc0<whatever>
>>
>>I'm rebuilding the ac12 kernel which crashed on me after just one day
>>and will reboot it today.
> 
> 
> Is it normal that the kernel with debugging enabled is not larger than
> the normal kernel?

No, it should be much larger.  Recheck the .config file
for CONFIG_DEBUG_INFO=y.  Maybe you need to do 'make clean'
first.

-- 
~Randy

* Re: Oops in 2.6.10-ac12 in kjournald (journal_commit_transaction)
  2005-02-17 15:51             ` Randy.Dunlap
@ 2005-02-17 16:00               ` Ralf Hildebrandt
  0 siblings, 0 replies; 10+ messages in thread
From: Ralf Hildebrandt @ 2005-02-17 16:00 UTC
  To: Dale Blount, linux-kernel

* Randy.Dunlap <rddunlap@osdl.org>:

> >Is it normal that the kernel with debugging enabled is not larger than
> >the normal kernel?
> 
> No, it should be much larger.  Recheck the .config file
> for CONFIG_DEBUG_INFO=y.  Maybe you need to do 'make clean'
> first.

CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_HIGHMEM is not set
CONFIG_DEBUG_INFO=y
# CONFIG_FRAME_POINTER is not set
CONFIG_EARLY_PRINTK=y

I built that kernel using "make-kpkg":

make-kpkg clean
CONCURRENCY_LEVEL=4 MAKEFLAGS="CC=gcc-3.4" make-kpkg --revision=20050217 kernel_image
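
As a sketch of how to double-check such a build, the helpers below
(hypothetical names) test whether a .config enables CONFIG_DEBUG_INFO
and whether an ELF file carries a .debug_info section. One possible
explanation for the unchanged size, not confirmed anywhere in this
thread, is that the bootable image is produced from a stripped copy of
vmlinux, so the debug data would show up only in the uncompressed
vmlinux in the build tree, which is also the file gdb needs:

```shell
#!/bin/sh
# Sketch: verify that debug info was requested and actually ended up in a file.
# The function names are illustrative, not part of any kernel tooling.

# Succeeds if the given .config enables CONFIG_DEBUG_INFO.
config_has_debug_info() {
    grep -q '^CONFIG_DEBUG_INFO=y$' "$1"
}

# Succeeds if the given ELF file (e.g. the uncompressed vmlinux in the
# build tree) contains a .debug_info section. Requires binutils' readelf.
elf_has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Self-contained demonstration on a throwaway config fragment.
tmp=$(mktemp)
printf '%s\n' 'CONFIG_DEBUG_KERNEL=y' 'CONFIG_DEBUG_INFO=y' > "$tmp"
if config_has_debug_info "$tmp"; then
    result=enabled
else
    result=disabled
fi
echo "CONFIG_DEBUG_INFO: $result"
rm -f "$tmp"
```

On a real tree one would call "config_has_debug_info .config" and
"elf_has_debug_info vmlinux" from the top of the kernel source
directory after the build has finished.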

-- 
Ralf Hildebrandt (i.A. des IT-Zentrum)          Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de
