linux-kernel.vger.kernel.org archive mirror
* Re: 2.6.test11 bug
@ 2003-12-08  3:46 Rafal Skoczylas
  2003-12-08  4:17 ` William Lee Irwin III
  2003-12-08  5:17 ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Rafal Skoczylas @ 2003-12-08  3:46 UTC
  To: linux-kernel

[ I am sorry if this message doesn't get threaded into the original
thread in your software, but I read lkml occasionally through
usenet-gate, so I didn't have the original message in my mbox and
couldn't just hit "reply" ;) ]

On Mon, 08 Dec 2003 03:30:12 +0100 Gordon Cormack wrote:
> I have read the FAQ but I'm confused about how to report a 2.6
> kernel bug, or who to report it to.
> [...]
> Dec  6 13:16:01 flax20 kernel: Bad page state at free_hot_cold_page
> Dec  6 13:16:01 flax20 kernel: flags:0x02000114 mapping:00000000
> mapped:1 count:0
> [...]

Hello.
I am experiencing behaviour similar to that described above.
In my case it is mlnetd (from the mldonkey package) which seems to be
responsible for driving the kernel to a crash[1].
After a few hours of running, either the process gets killed or the
system crashes (I am only able to reboot it with alt+prntscr+b, and it
seems unable to [S]ync or [U]nmount filesystems - I have lost
a few files which were open at the time of the crash[2]).

It may be worth mentioning that I don't remember having such a crash
on 2.6.0-test9, which I used for a couple of weeks (from the first day
it appeared on ftp.kernel.org until test11 - I skipped test10).

Hardware:
---------
ALi M1647, Duron 1200, 512MB sdram.

Kernel:
-------
Linux poziomka 2.6.0-test11 #32 Fri Dec 5 21:10:40 CET 2003 i686
AMD_Duron(TM)Processor unknown Shameless Compilation

Compiled with gcc-3.3.2.

Logs:
-----

--- The last time, I got the following:

Bad page state at free_hot_cold_page
flags:0x01020008 mapping:d38afe68 mapped:0 count:0
Backtrace: 
Call Trace:
 [bad_page+93/144] bad_page+0x5d/0x90
 [free_hot_cold_page+82/256] free_hot_cold_page+0x52/0x100
 [zap_pte_range+358/416] zap_pte_range+0x166/0x1a0
 [zap_pmd_range+75/112] zap_pmd_range+0x4b/0x70
 [unmap_page_range+67/112] unmap_page_range+0x43/0x70 
 [unmap_vmas+225/528] unmap_vmas+0xe1/0x210
 [exit_mmap+123/400] exit_mmap+0x7b/0x190
 [mmput+100/192] mmput+0x64/0xc0
 [do_exit+282/976] do_exit+0x11a/0x3d0
 [do_group_exit+58/176] do_group_exit+0x3a/0xb0
 [get_signal_to_deliver+590/848] get_signal_to_deliver+0x24e/0x350
 [do_signal+149/288] do_signal+0x95/0x120
 [schedule+761/1392] schedule+0x2f9/0x570
 [pipe_write+0/800] pipe_write+0x0/0x320
 [sys_rt_sigsuspend+222/272] sys_rt_sigsuspend+0xde/0x110
 [syscall_call+7/11] syscall_call+0x7/0xb
Trying to fix it up, but a reboot is needed

--- But sometimes I get things like this:

Unable to handle kernel paging request at virtual address 5a85fb5c
 printing eip:
c011e6c4
*pde = 00000000
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[remove_wait_queue+36/112]    Not tainted
EFLAGS: 00010002
EIP is at remove_wait_queue+0x24/0x70
eax: defb4000   ebx: da85fb58   ecx: 5a85fb58   edx: db0468b0
esi: db0468bc   edi: 00000292   ebp: defb5fa0   esp: defb5f58
ds: 007b   es: 007b   ss: 0068
Process mlnetd (pid: 1456, threadinfo=defb4000 task=dd72e100)
Stack: db0468ac db046008 db046000 c0167484 00000000 cad86c08 00000001 c01681f5
       defb5fa0 cad86c00 defb5fa0 00000041 defb4000 08376b48 cad86c08 00000000
       cad86c00 00000001 c01674b0 db046000 00000000 3fd32b9c 083767d8 00000001
Call Trace:
 [poll_freewait+36/80] poll_freewait+0x24/0x50
 [sys_poll+581/656] sys_poll+0x245/0x290
 [__pollwait+0/208] __pollwait+0x0/0xd0
 [syscall_call+7/11] syscall_call+0x7/0xb

Code: 89 59 04 89 0b c7 46 04 00 02 20 00 c7 42 0c 00 01 10 00 57
 <6>note: mlnetd[1456] exited with preempt_count 1
bad: scheduling while atomic!
Call Trace:
 [schedule+1373/1392] schedule+0x55d/0x570
 [unmap_page_range+67/112] unmap_page_range+0x43/0x70
 [unmap_vmas+436/528] unmap_vmas+0x1b4/0x210
 [exit_mmap+123/400] exit_mmap+0x7b/0x190
 [mmput+100/192] mmput+0x64/0xc0
 [do_exit+282/976] do_exit+0x11a/0x3d0
 [do_page_fault+0/1292] do_page_fault+0x0/0x50c
 [die+225/240] die+0xe1/0xf0
 [do_page_fault+474/1292] do_page_fault+0x1da/0x50c
 [do_IRQ+253/304] do_IRQ+0xfd/0x130
 [common_interrupt+24/32] common_interrupt+0x18/0x20
 [tcp_poll+18/352] tcp_poll+0x12/0x160
 [do_page_fault+0/1292] do_page_fault+0x0/0x50c
 [error_code+45/56] error_code+0x2d/0x38
 [remove_wait_queue+36/112] remove_wait_queue+0x24/0x70
 [poll_freewait+36/80] poll_freewait+0x24/0x50
 [sys_poll+581/656] sys_poll+0x245/0x290
 [__pollwait+0/208] __pollwait+0x0/0xd0
 [syscall_call+7/11] syscall_call+0x7/0xb

(this last Call Trace repeated 2 more times)
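
For anyone comparing these traces mechanically: each entry above carries
the same location twice, once in decimal inside the brackets and once in
hex after them (symbol+offset/size). The following is only a minimal
sketch, assuming Python 3; the names ENTRY and parse_entry are
hypothetical and not part of any kernel tool. It parses one entry and
checks that the two copies agree.

```python
import re

# Matches entries such as "[bad_page+93/144] bad_page+0x5d/0x90".
# ENTRY and parse_entry are illustrative names, not an existing tool.
ENTRY = re.compile(
    r"\[(?P<dsym>\w+)\+(?P<doff>\d+)/(?P<dsize>\d+)\]\s+"
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_entry(line):
    """Return (symbol, offset, size) as ints, or None for non-entry lines."""
    m = ENTRY.search(line)
    if m is None:
        return None
    symbol = m.group("sym")
    offset = int(m.group("off"), 16)
    size = int(m.group("size"), 16)
    # The bracketed decimal copy should describe the same location.
    decimal_copy = (m.group("dsym"), int(m.group("doff")), int(m.group("dsize")))
    if decimal_copy != (symbol, offset, size):
        raise ValueError("decimal and hex forms disagree: " + line.strip())
    return symbol, offset, size


if __name__ == "__main__":
    print(parse_entry(" [bad_page+93/144] bad_page+0x5d/0x90"))
```

Lines that are not trace entries (for example "Call Trace:") return None,
so a whole log can be filtered line by line.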

If there is any important information missing, feel free to ask.
Regards.

[1] Actually, I am not sure, but this is the only candidate because
it uses more and more memory over time, and the crash or kill occurs
at more or less the same level of memory usage (~10-12% of 512MB).
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
nils      1781 11.1 10.5 41252 38796 pts/2   S    03:39   3:15 mlnetd
                    ^^^^
[2] The file loss is probably an XFS-related problem.

nils.
-- 
"Blessed is the man, who having nothing to say, abstains from giving wordy
evidence of the fact."  -- http://secprog.org/who/rs/quote.php?id=1

[parent not found: <10kzo-7mZ-11@gated-at.bofh.it>]
* 2.6.test11 bug
@ 2003-12-08  2:24 Gordon Cormack
  2003-12-08  2:37 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Gordon Cormack @ 2003-12-08  2:24 UTC
  To: linux-kernel

Hi,

I have read the FAQ but I'm confused about how to report a 2.6
kernel bug, or who to report it to.

Here it is in a nutshell.

--- kernel ---

Linux xxx20.uwaterloo.ca 2.6.0-test11 #2 SMP Thu Nov 27 14:46:01 EST 2003 i686 athlon i386 GNU/Linux

---- log ----

Dec  6 13:16:01 flax20 kernel: Bad page state at free_hot_cold_page
Dec  6 13:16:01 flax20 kernel: flags:0x02000114 mapping:00000000 mapped:1 count:0
Dec  6 13:16:01 flax20 kernel: Backtrace:
Dec  6 13:16:01 flax20 kernel: Call Trace:
Dec  6 13:16:01 flax20 kernel:  [<c013f98d>] bad_page+0x5d/0x90
Dec  6 13:16:01 flax20 kernel:  [<c0140041>] free_hot_cold_page+0x61/0xf0
Dec  6 13:16:01 flax20 kernel:  [<c014063c>] __pagevec_free+0x1c/0x30
Dec  6 13:16:01 flax20 kernel:  [<c014527f>] release_pages+0x11f/0x140
Dec  6 13:16:01 flax20 kernel:  [<c01544a6>] free_pages_and_swap_cache+0x56/0x90
Dec  6 13:16:01 flax20 kernel:  [<c014cf60>] unmap_region+0x150/0x160
Dec  6 13:16:01 flax20 kernel:  [<c014d28f>] do_munmap+0x11f/0x170
Dec  6 13:16:01 flax20 kernel:  [<c014d325>] sys_munmap+0x45/0x70
Dec  6 13:16:01 flax20 kernel:  [<c010adcf>] syscall_call+0x7/0xb
Dec  6 13:16:01 flax20 kernel:
Dec  6 13:16:01 flax20 kernel: Trying to fix it up, but a reboot is needed
Dec  6 16:31:14 flax20 syslogd 1.4.1: restart.

--- comments ---

The problem has occurred only once on one of 13 dual-processor machines.
The one that crashed was one (of seven) with dual AMD 1900+, 2GB RAM,
and 4 ATA hard drives.  None of the other machines have crashed in the
9 days since I installed the kernel.

It appears to be related to high memory pressure but I can't reproduce
it with simple programs.  The machines run text search software that is
both CPU and IO intensive.

---

I'm not a developer but I am running a pretty significant load on these
machines and am prepared to do instrumentation/investigation in order to
diagnose the problem.

As an aside, all versions of the 2.4 kernel are brought to their knees
in this application ("kswapd problems" hit full force and none of the
suggested patches worked).  Even with the occasional crash, 2.6.test11 is
way better.

-- 
Gordon V. Cormack     CS Dept, University of Waterloo, Canada N2L 3G1
gvcormack@uwaterloo.ca            http://cormack.uwaterloo.ca/cormack


end of thread, other threads:[~2003-12-09 23:27 UTC | newest]

Thread overview: 11+ messages
2003-12-08  3:46 2.6.test11 bug Rafal Skoczylas
2003-12-08  4:17 ` William Lee Irwin III
2003-12-08  5:17 ` Linus Torvalds
2003-12-08  9:02   ` Xavier Bestel
2003-12-08 16:27   ` Rafal Skoczylas
     [not found]   ` <20031208161742.GB9087@secprog.org>
     [not found]     ` <Pine.LNX.4.58.0312080848560.13236@home.osdl.org>
2003-12-08 17:12       ` 2.6.test11 bug Linus Torvalds
     [not found]         ` <20031209194827.GA22265@secprog.org>
     [not found]           ` <Pine.LNX.4.58.0312091221440.21456@home.osdl.org>
2003-12-09 22:31             ` Rafal Skoczylas
2003-12-09 23:26               ` Linus Torvalds
     [not found] <10kzo-7mZ-11@gated-at.bofh.it>
     [not found] ` <10m88-2wd-1@gated-at.bofh.it>
     [not found]   ` <10xdJ-28r-23@gated-at.bofh.it>
     [not found]     ` <10xdJ-28r-25@gated-at.bofh.it>
     [not found]       ` <10xdJ-28r-21@gated-at.bofh.it>
2003-12-08 22:28         ` Rafal Skoczylas
  -- strict thread matches above, loose matches on Subject: below --
2003-12-08  2:24 Gordon Cormack
2003-12-08  2:37 ` Linus Torvalds
