linux-kernel.vger.kernel.org archive mirror
* mm/rmap.c negative page map count BUG.
@ 2006-01-03  8:26 Dave Jones
  2006-01-03 11:42 ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2006-01-03  8:26 UTC
  To: Linux Kernel

This has cropped up from time to time in the last few Fedora
kernels, by several users. I just got another report that it's
still a problem on 2.6.15rc7 based kernels (so likely .15 final too).

 kernel: kernel BUG at mm/rmap.c:486!
 kernel: invalid operand: 0000 [#1]
 kernel: Modules linked in: parport_pc lp parport nfs lockd nfs_acl autofs4 sunrpc dm_mod ipv6 uhci_hcd shpchp i2c_piix4 i2c_core snd_es18xx snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tlan floppy ext3 jbd aic7xxx scsi_transport_spi sd_mod scsi_mod
 kernel: CPU:    0
 kernel: EIP:    0060:[<c01502b2>]    Not tainted VLI
 kernel: EFLAGS: 00010286   (2.6.14-1.1769_FC4) 
 kernel: EIP is at page_remove_rmap+0x25/0x2f
 kernel: eax: ffffffff   ebx: c8331e30   ecx: c1152360   edx: c1152360
 kernel: esi: 08f8c000   edi: c1152360   ebp: 00000000   esp: c2114d78
 kernel: ds: 007b   es: 007b   ss: 0068
 kernel: Process udevd (pid: 11892, threadinfo=c2114000 task=c54af030)
 kernel: Stack: c0149a90 c3d70e34 c0419b20 c349d440 ffffffff ffffff3f c349d490 c2e6b08c 
 kernel:        09000000 c2114dfc c2e6b08c c0149ccb 08ecb000 09000000 c2114dfc 00000000 
 kernel:        c3d70e34 c0419b20 c2e6b08c 0913dfff 08ecb000 c3d70e34 0913e000 c2114e24 
 kernel: Call Trace:
 kernel:  [<c0149a90>] zap_pte_range+0x105/0x25a  [<c0149ccb>] unmap_page_range+0xe6/0x110
 kernel:  [<c0149dc7>] unmap_vmas+0xd2/0x1f1     [<c014e5f2>] exit_mmap+0x5f/0xda
 kernel:  [<c0119669>] mmput+0x1f/0x95     [<c0162f1f>] exec_mmap+0xc7/0x149
 kernel:  [<c0163084>] flush_old_exec+0x7b/0x8b7     [<c01595ff>] vfs_read+0xf6/0x158
 kernel:  [<c0162e4e>] kernel_read+0x37/0x41     [<c0182e30>] load_elf_binary+0x2b9/0xd8e
 kernel:  [<c01408b0>] __alloc_pages+0x57/0x2ed     [<c01df430>] copy_from_user+0x42/0x82
 kernel:  [<c0182b77>] load_elf_binary+0x0/0xd8e     [<c0163b32>] search_binary_handler+0x7a/0x243
 kernel:  [<c0163ee3>] do_execve+0x1e8/0x210     [<c0101b3f>] sys_execve+0x30/0x72
 kernel:  [<c0102ec5>] syscall_call+0x7/0xb    
 kernel: Code: 2e 0d 33 c0 eb bf 89 c2 83 40 08 ff 0f 98 c0 84 c0 75 01 c3 8b 42 08 83 c0 01 78 0f ba ff ff ff ff b8 10 00 00 00 e9 32 0b ff ff <0f> 0b e6 01 2e 0d 33 c0 eb e7 55 57 56 53 83 ec 0c 89 c7 89 d3 

The BUG it's hitting is the BUG_ON(page_mapcount(page) < 0); in page_remove_rmap()
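
For readers following along, here is a minimal userspace sketch of the accounting that this BUG_ON() guards. It assumes the 2.6.14-era convention that page->_mapcount starts at -1 and that page_mapcount() returns _mapcount + 1. The model_* names are hypothetical stand-ins, and a plain int replaces atomic_t, so this shows only the arithmetic, not the kernel code.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; only the field that
 * matters for this BUG is modelled, and a plain int replaces atomic_t. */
struct model_page {
	int _mapcount;	/* assumed to start at -1, meaning "no mappings" */
};

/* Models page_mapcount(): the number of ptes mapping the page. */
static int model_page_mapcount(const struct model_page *page)
{
	return page->_mapcount + 1;
}

/* Models atomic_add_negative(): add, then report whether the result is
 * negative. */
static bool model_add_negative(int delta, int *counter)
{
	*counter += delta;
	return *counter < 0;
}

/* Returns true where the kernel would hit BUG_ON(page_mapcount(page) < 0). */
static bool model_page_remove_rmap(struct model_page *page)
{
	if (model_add_negative(-1, &page->_mapcount))
		return model_page_mapcount(page) < 0;
	return false;
}
```

Under these assumptions, removing the last mapping takes _mapcount from 0 to -1 and is harmless; one more removal takes it to -2, so page_mapcount() reads -1 and the assertion fires. That would be consistent with the "eax: ffffffff" in the register dump above.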

anyone with any ideas wtf happened here ?

shortly after hitting this, the users usually report things like ...

kernel: Bad page state at free_hot_cold_page (in process 'kswapd0', page c1152360)
kernel: flags:0x80000010 mapping:00000000 mapcount:-1 count:0

None of the reports seen so far had binary modules loaded, and there are no
obvious signs of hardware failure (some of the users have run memtest86 with
no problems found).

		Dave



* Re: mm/rmap.c negative page map count BUG.
  2006-01-03  8:26 mm/rmap.c negative page map count BUG Dave Jones
@ 2006-01-03 11:42 ` Nick Piggin
  2006-01-03 13:53   ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-01-03 11:42 UTC
  To: Dave Jones; +Cc: Linux Kernel

Dave Jones wrote:
> This has cropped up from time to time in the last few Fedora
> kernels, by several users. I just got another report that it's
> still a problem on 2.6.15rc7 based kernels (so likely .15 final too).
> 
>  kernel: kernel BUG at mm/rmap.c:486!
>  kernel: invalid operand: 0000 [#1]
>  kernel: Modules linked in: parport_pc lp parport nfs lockd nfs_acl autofs4 sunrpc dm_mod ipv6 uhci_hcd shpchp i2c_piix4 i2c_core snd_es18xx snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tlan floppy ext3 jbd aic7xxx scsi_transport_spi sd_mod scsi_mod
>  kernel: CPU:    0
>  kernel: EIP:    0060:[<c01502b2>]    Not tainted VLI
>  kernel: EFLAGS: 00010286   (2.6.14-1.1769_FC4) 
>  kernel: EIP is at page_remove_rmap+0x25/0x2f
>  kernel: eax: ffffffff   ebx: c8331e30   ecx: c1152360   edx: c1152360
>  kernel: esi: 08f8c000   edi: c1152360   ebp: 00000000   esp: c2114d78
>  kernel: ds: 007b   es: 007b   ss: 0068
>  kernel: Process udevd (pid: 11892, threadinfo=c2114000 task=c54af030)
>  kernel: Stack: c0149a90 c3d70e34 c0419b20 c349d440 ffffffff ffffff3f c349d490 c2e6b08c 
>  kernel:        09000000 c2114dfc c2e6b08c c0149ccb 08ecb000 09000000 c2114dfc 00000000 
>  kernel:        c3d70e34 c0419b20 c2e6b08c 0913dfff 08ecb000 c3d70e34 0913e000 c2114e24 
>  kernel: Call Trace:
>  kernel:  [<c0149a90>] zap_pte_range+0x105/0x25a  [<c0149ccb>] unmap_page_range+0xe6/0x110
>  kernel:  [<c0149dc7>] unmap_vmas+0xd2/0x1f1     [<c014e5f2>] exit_mmap+0x5f/0xda
>  kernel:  [<c0119669>] mmput+0x1f/0x95     [<c0162f1f>] exec_mmap+0xc7/0x149
>  kernel:  [<c0163084>] flush_old_exec+0x7b/0x8b7     [<c01595ff>] vfs_read+0xf6/0x158
>  kernel:  [<c0162e4e>] kernel_read+0x37/0x41     [<c0182e30>] load_elf_binary+0x2b9/0xd8e
>  kernel:  [<c01408b0>] __alloc_pages+0x57/0x2ed     [<c01df430>] copy_from_user+0x42/0x82
>  kernel:  [<c0182b77>] load_elf_binary+0x0/0xd8e     [<c0163b32>] search_binary_handler+0x7a/0x243
>  kernel:  [<c0163ee3>] do_execve+0x1e8/0x210     [<c0101b3f>] sys_execve+0x30/0x72
>  kernel:  [<c0102ec5>] syscall_call+0x7/0xb    
>  kernel: Code: 2e 0d 33 c0 eb bf 89 c2 83 40 08 ff 0f 98 c0 84 c0 75 01 c3 8b 42 08 83 c0 01 78 0f ba ff ff ff ff b8 10 00 00 00 e9 32 0b ff ff <0f> 0b e6 01 2e 0d 33 c0 eb e7 55 57 56 53 83 ec 0c 89 c7 89 d3 
> 
> The BUG it's hitting is the BUG_ON(page_mapcount(page) < 0); in page_remove_rmap()
> 
> anyone with any ideas wtf happened here ?
> 
> shortly after hitting this, the users usually report things like ...
> 
> kernel: Bad page state at free_hot_cold_page (in process 'kswapd0', page c1152360)
> kernel: flags:0x80000010 mapping:00000000 mapcount:-1 count:0
> 

Well it isn't PG_reserved, so it is unlikely to be something like ZERO_PAGE.
That kswapd eventually frees it indicates it is a regular pagecache page on
the LRU... so it is unusual that nobody has reported it here.
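
The flag reasoning above can be checked mechanically. The sketch below decodes the reported flags value 0x80000010. The bit numbers are an assumption taken from the 2.6.14-era include/linux/page-flags.h (PG_dirty = 4, PG_reserved = 10) and should be verified against the exact tree; the MODEL_* and model_* names are hypothetical.

```c
#include <stdbool.h>

/* Bit numbers assumed from 2.6.14-era include/linux/page-flags.h;
 * verify against the exact tree before relying on them. */
#define MODEL_PG_DIRTY     4
#define MODEL_PG_RESERVED 10

/* Returns true if the given bit is set in flags. */
static bool model_test_flag(unsigned long flags, int bit)
{
	return (flags >> bit) & 1UL;
}

/* Writes the indices of all set bits into out[] (at most max entries)
 * and returns how many were written. */
static int model_set_bits(unsigned long flags, int out[], int max)
{
	int n = 0;
	int bit;

	for (bit = 0; bit < (int)(sizeof(flags) * 8) && n < max; bit++)
		if (model_test_flag(flags, bit))
			out[n++] = bit;
	return n;
}
```

For 0x80000010 this reports bits 4 and 31 only, so bit 10 is clear, which matches the observation that the page is not PG_reserved. Bit 31 lies outside the flag range and is presumably part of the zone encoding kept in the high bits of page->flags.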

Can you reproduce it? On a kernel.org kernel? Can you print ->flags, ->count,
->mapping, etc instead of going BUG?

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.



* Re: mm/rmap.c negative page map count BUG.
  2006-01-03 11:42 ` Nick Piggin
@ 2006-01-03 13:53   ` Dave Jones
  2006-01-04 23:53     ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2006-01-03 13:53 UTC
  To: Nick Piggin; +Cc: Linux Kernel

On Tue, Jan 03, 2006 at 10:42:07PM +1100, Nick Piggin wrote:
 
 > Well it isn't PG_reserved, so it is unlikely to be something like ZERO_PAGE.
 > That kswapd eventually frees it indicates it is a regular pagecache page on
 > the LRU... so it is unusual that nobody has reported it here.
 > 
 > Can you reproduce it?

I can't :(

 > On a kernel.org kernel?

Only some of our users hit it, which makes it tricky to reproduce.

 > Can you print ->flags, ->count, ->mapping, etc instead of going BUG?

I can add some instrumentation like this though, and see what turns up.

thanks,

		Dave


* Re: mm/rmap.c negative page map count BUG.
  2006-01-03 13:53   ` Dave Jones
@ 2006-01-04 23:53     ` Andrew Morton
  2006-01-04 23:56       ` Dave Jones
  2006-01-05  7:47       ` Dave Jones
  0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2006-01-04 23:53 UTC
  To: Dave Jones; +Cc: nickpiggin, linux-kernel

Dave Jones <davej@redhat.com> wrote:
>
>  > Can you print ->flags, ->count, ->mapping, etc instead of going BUG?
> 
> I can add some instrumentation like this though, and see what turns up.

Can we get that instrumentation into the upstream kernel please?  We do
seem to be hitting rmap assertions too often for it to be dud
hardware/bodgy drivers/etc.


* Re: mm/rmap.c negative page map count BUG.
  2006-01-04 23:53     ` Andrew Morton
@ 2006-01-04 23:56       ` Dave Jones
  2006-01-05  0:16         ` Andrew Morton
  2006-01-05  7:47       ` Dave Jones
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Jones @ 2006-01-04 23:56 UTC
  To: Andrew Morton; +Cc: nickpiggin, linux-kernel

On Wed, Jan 04, 2006 at 03:53:26PM -0800, Andrew Morton wrote:
 > Dave Jones <davej@redhat.com> wrote:
 > >
 > >  > Can you print ->flags, ->count, ->mapping, etc instead of going BUG?
 > > 
 > > I can add some instrumentation like this though, and see what turns up.
 > 
 > Can we get that instrumentation into the upstream kernel please?  We do
 > seem to be hitting rmap assertions too often for it to be dud
 > hardware/bodgy drivers/etc.

This is what I came up with..
anything missing ?

		Dave

--- linux-2.6.14/mm/rmap.c~	2006-01-03 08:53:32.000000000 -0500
+++ linux-2.6.14/mm/rmap.c	2006-01-03 08:58:19.000000000 -0500
@@ -484,6 +484,13 @@ void page_remove_rmap(struct page *page)
 	BUG_ON(PageReserved(page));
 
 	if (atomic_add_negative(-1, &page->_mapcount)) {
+		if (page_mapcount(page) < 0) {
+			printk (KERN_EMERG "Eeek! page_mapcount(page) went negative! (%d)\n", page->_mapcount);
+			printk (KERN_EMERG "  page->flags = %x\n", page->flags);
+			printk (KERN_EMERG "  page->count = %x\n", page->_count);
+			printk (KERN_EMERG "  page->mapping = %p\n", page->mapping);
+		}
+		
 		BUG_ON(page_mapcount(page) < 0);
 		/*
 		 * It would be tidy to reset the PageAnon mapping here,


* Re: mm/rmap.c negative page map count BUG.
  2006-01-04 23:56       ` Dave Jones
@ 2006-01-05  0:16         ` Andrew Morton
  2006-01-05  0:31           ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2006-01-05  0:16 UTC
  To: Dave Jones; +Cc: nickpiggin, linux-kernel

Dave Jones <davej@redhat.com> wrote:
>
> +			printk (KERN_EMERG "Eeek! page_mapcount(page) went negative! (%d)\n", page->_mapcount);

page_mapcount(page);

> +			printk (KERN_EMERG "  page->flags = %x\n", page->flags);

%lx

> +			printk (KERN_EMERG "  page->count = %x\n", page->_count);

page_count(page);
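
A rough userspace illustration of the three review comments above: read the counters through their accessors rather than passing the atomic_t fields to printk, and print the unsigned long flags with %lx. The diag_* names are hypothetical, atomic_t is modelled as a struct wrapping an int, and the +1 offset on both counters is an assumption about the 2.6.14-era accessors.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for atomic_t: a struct, so handing it to a %d or
 * %x conversion directly is a type mismatch. */
typedef struct { int counter; } diag_atomic_t;

/* Hypothetical stand-in for struct page with only the printed fields. */
struct diag_page {
	unsigned long flags;
	diag_atomic_t _count;
	diag_atomic_t _mapcount;
};

/* Models page_mapcount(); the +1 offset is assumed from 2.6.14. */
static int diag_page_mapcount(const struct diag_page *page)
{
	return page->_mapcount.counter + 1;
}

/* Models page_count(); the +1 offset is assumed from 2.6.14. */
static int diag_page_count(const struct diag_page *page)
{
	return page->_count.counter + 1;
}

/* Formats the diagnostics with conversions that match the argument types:
 * %d for int, %lx for unsigned long, %x for unsigned int. */
static int diag_format(char *buf, size_t len, const struct diag_page *page)
{
	return snprintf(buf, len, "mapcount=%d flags=%lx count=%x",
			diag_page_mapcount(page), page->flags,
			(unsigned int)diag_page_count(page));
}
```

With raw fields of -2 and -1 and flags of 0x80000010, this prints a mapcount of -1 and a count of 0, the same values as the "Bad page state" report at the start of the thread.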




* Re: mm/rmap.c negative page map count BUG.
  2006-01-05  0:16         ` Andrew Morton
@ 2006-01-05  0:31           ` Dave Jones
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Jones @ 2006-01-05  0:31 UTC
  To: Andrew Morton; +Cc: nickpiggin, linux-kernel

On Wed, Jan 04, 2006 at 04:16:40PM -0800, Andrew Morton wrote:
 > Dave Jones <davej@redhat.com> wrote:
 > >
 > > +			printk (KERN_EMERG "Eeek! page_mapcount(page) went negative! (%d)\n", page->_mapcount);
 > 
 > page_mapcount(page);
 > 
 > > +			printk (KERN_EMERG "  page->flags = %x\n", page->flags);
 > 
 > %lx
 > 
 > > +			printk (KERN_EMERG "  page->count = %x\n", page->_count);
 > 
 > page_count(page);

Ugh, almost an error per line. I suck.

		Dave

--- linux-2.6.14/mm/rmap.c~	2006-01-03 08:53:32.000000000 -0500
+++ linux-2.6.14/mm/rmap.c	2006-01-03 08:58:19.000000000 -0500
@@ -484,6 +484,13 @@ void page_remove_rmap(struct page *page)
 	BUG_ON(PageReserved(page));
 
 	if (atomic_add_negative(-1, &page->_mapcount)) {
+		if (page_mapcount(page) < 0) {
+			printk (KERN_EMERG "Eeek! page_mapcount(page) went negative! (%d)\n", page_mapcount(page));
+			printk (KERN_EMERG "  page->flags = %lx\n", page->flags);
+			printk (KERN_EMERG "  page->count = %x\n", page_count(page));
+			printk (KERN_EMERG "  page->mapping = %p\n", page->mapping);
+		}
+		
 		BUG_ON(page_mapcount(page) < 0);
 		/*
 		 * It would be tidy to reset the PageAnon mapping here,


* Re: mm/rmap.c negative page map count BUG.
  2006-01-04 23:53     ` Andrew Morton
  2006-01-04 23:56       ` Dave Jones
@ 2006-01-05  7:47       ` Dave Jones
  2006-01-05  8:11         ` Arjan van de Ven
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Jones @ 2006-01-05  7:47 UTC
  To: Andrew Morton; +Cc: nickpiggin, linux-kernel

On Wed, Jan 04, 2006 at 03:53:26PM -0800, Andrew Morton wrote:
 > Dave Jones <davej@redhat.com> wrote:
 > >
 > >  > Can you print ->flags, ->count, ->mapping, etc instead of going BUG?
 > > 
 > > I can add some instrumentation like this though, and see what turns up.
 > 
 > Can we get that instrumentation into the upstream kernel please?  We do
 > seem to be hitting rmap assertions too often for it to be dud
 > hardware/bodgy drivers/etc.

I had a quick skim through bugme.osdl.org & Red Hat bugzilla.

Seems to be a few variants of this problem reported.
Quite a few Fedora users have hit it over the last year,
but what I find fascinating is that there's not a single
occurrence of "BUG at mm/rmap.c" in our 2.6.9 based RHEL4 bug reports.

		Dave


2005-08-07
http://bugme.osdl.org/show_bug.cgi?id=3636

	Oct 25 04:41:47 www kernel: kernel BUG at mm/rmap.c:474!
	Oct 25 04:41:47 www kernel: invalid operand: 0000 [#4]
	Oct 25 04:41:47 www kernel: PREEMPT
	Oct 25 04:41:47 www kernel: Modules linked in:
	Oct 25 04:41:47 www kernel: CPU:    0
	Oct 25 04:41:47 www kernel: EIP:    0060:[<c0147319>]    Not tainted VLI
	Oct 25 04:41:47 www kernel: EFLAGS: 00010286   (2.6.9)
	Oct 25 04:41:47 www kernel: EIP is at page_remove_rmap+0x29/0x40
	Oct 25 04:41:47 www kernel: eax: ffffffff   ebx: 000dd000   ecx: c1160bc0   edx: c1160bc0
	Oct 25 04:41:47 www kernel: esi: c5e6f894   edi: c1160bc0   ebp: 00100000   esp: c9e93e90
	Oct 25 04:41:47 www kernel: ds: 007b   es: 007b   ss: 0068
	Oct 25 04:41:47 www kernel: Process show_bug.cgi (pid: 16375, threadinfo=c9e92000 task=cdac9020)
	Oct 25 04:41:47 www kernel: Stack: c0140ce6 c1160bc0 c02e6790 c9dec7a0 00000000 0b05e067 08948000 c4325088
	Oct 25 04:41:47 www kernel:        08648000 00000000 c0140e47 c045a008 c4325084 08548000 00100000 00000000
	Oct 25 04:41:47 www kernel:        c045a008 08548000 c4325088 08648000 00000000 c0140ebb c045a008 c4325084
	Oct 25 04:41:47 www kernel: Call Trace:
	Oct 25 04:41:47 www kernel:  [<c0140ce6>] zap_pte_range+0x126/0x230
	Oct 25 04:41:47 www kernel:  [<c02e6790>] ip_rcv_finish+0x0/0x270
	Oct 25 04:41:47 www kernel:  [<c0140e47>] zap_pmd_range+0x57/0x80
	Oct 25 04:41:47 www kernel:  [<c0140ebb>] unmap_page_range+0x4b/0x80
	Oct 25 04:41:47 www kernel:  [<c0140fed>] unmap_vmas+0xfd/0x1c0
	Oct 25 04:41:47 www kernel:  [<c0145593>] exit_mmap+0x83/0x160
	Oct 25 04:41:47 www kernel:  [<c01161d4>] mmput+0x64/0xb0
	Oct 25 04:41:47 www kernel:  [<c011aa72>] do_exit+0x152/0x420
	Oct 25 04:41:47 www kernel:  [<c010654d>] do_IRQ+0xfd/0x130
	Oct 25 04:41:47 www kernel:  [<c011adca>] do_group_exit+0x3a/0xb0
	Oct 25 04:41:47 www kernel:  [<c010421b>] syscall_call+0x7/0xb

2005-03-22
http://bugme.osdl.org/show_bug.cgi?id=4388

	Nov  4 13:55:03 localhost kernel: kernel BUG at mm/rmap.c:487!
	Nov  4 13:55:03 localhost kernel: invalid operand: 0000 [#1]
	Nov  4 13:55:03 localhost kernel: PREEMPT 
	Nov  4 13:55:03 localhost kernel: Modules linked in: radeon drm
	Nov  4 13:55:03 localhost kernel: CPU:    0
	Nov  4 13:55:03 localhost kernel: EIP:    0060:[page_remove_rmap+71/96]    Not tainted VLI
	Nov  4 13:55:03 localhost kernel: EFLAGS: 00010286   (2.6.14) 
	Nov  4 13:55:03 localhost kernel: EIP is at page_remove_rmap+0x47/0x60
	Nov  4 13:55:03 localhost kernel: eax: ffffffff   ebx: ccdbd244   ecx: 00000002   edx: c11cb8c0
	Nov  4 13:55:03 localhost kernel: esi: c11cb8c0   edi: 41891000   ebp: ce246d88   esp: ce246d80
	Nov  4 13:55:03 localhost kernel: ds: 007b   es: 007b   ss: 0068
	Nov  4 13:55:03 localhost kernel: Process postmaster (pid: 1914, threadinfo=ce246000 task=ce179560)
	Nov  4 13:55:04 localhost kernel: Stack: c014943d ccdbd244 ce246dac c014dd6c c11cb8c0 00000000 00000001 0e5c6025 
	Nov  4 13:55:04 localhost kernel:        cebab41c 41897000 41897000 ce246dd8 c014df24 c04e94ac cebab418 4188f000 
	Nov  4 13:55:04 localhost kernel:        41897000 00000000 41896fff 00008000 41897000 cd7a8634 ce246e18 c014e039 
	Nov  4 13:55:04 localhost kernel: Call Trace:
	Nov  4 13:55:04 localhost kernel:  [show_stack+171/240] show_stack+0xab/0xf0
	Nov  4 13:55:04 localhost kernel:  [show_registers+399/560] show_registers+0x18f/0x230
	Nov  4 13:55:04 localhost kernel:  [die+237/400] die+0xed/0x190
	Nov  4 13:55:04 localhost kernel:  [do_trap+137/208] do_trap+0x89/0xd0
	Nov  4 13:55:04 localhost kernel:  [do_invalid_op+170/192] do_invalid_op+0xaa/0xc0
	Nov  4 13:55:04 localhost kernel:  [error_code+79/84] error_code+0x4f/0x54
	Nov  4 13:55:04 localhost kernel:  [zap_pte_range+220/512] zap_pte_range+0xdc/0x200
	Nov  4 13:55:04 localhost kernel:  [unmap_page_range+148/208] unmap_page_range+0x94/0xd0
	Nov  4 13:55:04 localhost kernel:  [unmap_vmas+217/544] unmap_vmas+0xd9/0x220
	Nov  4 13:55:04 localhost kernel:  [exit_mmap+130/352] exit_mmap+0x82/0x160
	Nov  4 13:55:04 localhost kernel:  [mmput+53/176] mmput+0x35/0xb0
	Nov  4 13:55:04 localhost kernel:  [exit_mm+170/352] exit_mm+0xaa/0x160
	Nov  4 13:55:04 localhost kernel:  [do_exit+206/1184] do_exit+0xce/0x4a0
	Nov  4 13:55:04 localhost kernel:  [do_group_exit+59/208] do_group_exit+0x3b/0xd0
	Nov  4 13:55:04 localhost kernel:  [get_signal_to_deliver+515/848] get_signal_to_deliver+0x203/0x350
	Nov  4 13:55:04 localhost kernel:  [do_signal+87/288] do_signal+0x57/0x120
	Nov  4 13:55:04 localhost kernel:  [do_notify_resume+42/60] do_notify_resume+0x2a/0x3c
	Nov  4 13:55:04 localhost kernel:  [work_notifysig+19/25] work_notifysig+0x13/0x19

2005-08-23
http://bugme.osdl.org/show_bug.cgi?id=4873

	Jul 11 17:55:09 us401 kernel: kernel BUG at mm/rmap.c:493!
	Jul 11 17:55:09 us401 kernel: invalid operand: 0000 [#1]
	Jul 11 17:55:09 us401 kernel: SMP 
	Jul 11 17:55:09 us401 kernel: Modules linked in: netconsole iptable_nat ipv6 ipt_TOS iptable_mangle ip_conntrack_ftp ip_conntrack_irc ipt_LOG ipt_limit ipt_multiport autofs ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables sg scsi_mod parport_pc parport microcode loop video thermal processor fan button battery ac raid1
	Jul 11 17:55:09 us401 kernel: CPU:    2
	Jul 11 17:55:09 us401 kernel: EIP:    0060:[<c0151e99>]    Not tainted VLI
	Jul 11 17:55:09 us401 kernel: EFLAGS: 00010286   (2.6.12.1) 
	Jul 11 17:55:09 us401 kernel: EIP is at page_remove_rmap+0x39/0x50
	Jul 11 17:55:09 us401 kernel: eax: ffffffff   ebx: 00013508   ecx: 00000038   edx: c126a100
	Jul 11 17:55:09 us401 kernel: esi: ef60d720   edi: c126a100   ebp: 08ae4000   esp: ee869e84
	Jul 11 17:55:09 us401 kernel: ds: 007b   es: 007b   ss: 0068
	Jul 11 17:55:09 us401 kernel: Process httpd (pid: 28353, threadinfo=ee868000 task=d2d0c530)
	Jul 11 17:55:09 us401 kernel: Stack: c0145cd4 00013508 c014a9a7 c126a100 d2065be8 13508067 00000000 00000000 
	Jul 11 17:55:09 us401 kernel:        f5e52228 08ad0000 08b27000 c014ac16 c201a900 f5e52228 08ad0000 08b27000 
	Jul 11 17:55:09 us401 kernel:        00000000 08b26fff 08b26fff 08b27000 f77ba380 00057000 08b27000 08b27000 
	Jul 11 17:55:09 us401 kernel: Call Trace:
	Jul 11 17:55:09 us401 kernel:  [<c0145cd4>] mark_page_accessed+0x34/0x40
	Jul 11 17:55:09 us401 kernel:  [<c014a9a7>] zap_pte_range+0x107/0x270
	Jul 11 17:55:09 us401 kernel:  [<c014ac16>] unmap_page_range+0x106/0x150
	Jul 11 17:55:09 us401 kernel:  [<c014ad56>] unmap_vmas+0xf6/0x250
	Jul 11 17:55:09 us401 kernel:  [<c014f6b3>] unmap_region+0xb3/0x160
	Jul 11 17:55:09 us401 kernel:  [<c014f9df>] do_munmap+0x10f/0x150
	Jul 11 17:55:09 us401 kernel:  [<c014de22>] sys_brk+0x112/0x120
	Jul 11 17:55:09 us401 kernel:  [<c0102daf>] sysenter_past_esp+0x54/0x75
	Jul 11 17:55:09 us401 kernel: Code: f0 83 42 08 ff 0f 98 c0 84 c0 74 1b 8b 42 08 40 78 19 c7 04 24 10 00 00 00 b8 ff ff ff ff 89 44 24 04 e8 bb f3 fe ff 83 c4 08 c3

2005-11-27
http://bugme.osdl.org/show_bug.cgi?id=5666

	kernel BUG at mm/rmap.c:487!    
	invalid operand: 0000 [#1]    
	Modules linked in: af_packet ipt_limit ipt_state iptable_mangle iptable_nat    
	ip_nat iptable_filter ipt_ULOG ip_tables ipv6 ip_conntrack_ftp ip_conntrack    
	via_rhine sis900 mii unix    
	CPU:    0    
	EIP:    0060:[<c014b5a7>]    Tainted: G   M  VLI    
	EFLAGS: 00010286   (2.6.14)    
	EIP is at page_remove_rmap+0x37/0x50    
	eax: ffffffff   ebx: d5097c20   ecx: c03e9dcc   edx: c11fa560    
	esi: b7f08000   edi: c11fa560   ebp: 00000020   esp: cf9ddebc    
	ds: 007b   es: 007b   ss: 0068    
	Process apache2 (pid: 22104, threadinfo=cf9dc000 task=dd0850b0)    
	Stack: c11f3fe0 d5097c20 c0145298 c11fa560 b76bc000 d7daab7c b7f2d000 b7f2d000    
	       b7f2cfff c014541a c03e9dcc d7daab7c b7f06000 b7f2d000 00000000 00027000    
	       b7f2d000 b7f2d000 d15e7284 c0145529 c03e9dcc d15e7284 b7f06000 b7f2d000    
	Call Trace:    
	 [<c0145298>] zap_pte_range+0xd8/0x1d0    
	 [<c014541a>] unmap_page_range+0x8a/0xb0    
	 [<c0145529>] unmap_vmas+0xe9/0x1e0    
	 [<c0149a59>] exit_mmap+0x79/0x150    
	 [<c01181dc>] mmput+0x2c/0x80    
	 [<c011c3a8>] do_exit+0xd8/0x390    
	 [<c011c6d4>] do_group_exit+0x34/0x70    
	 [<c0103075>] syscall_call+0x7/0xb    
	Code: 75 33 83 42 08 ff 0f 98 c0 84 c0 74 1a 8b 42 08 40 78 18 c7 44 24 04 ff    
	ff ff ff c7 04 24 10 00 00 00 e8 8d 10 ff ff 83 c4 08 c3 <0f> 0b e7 01 c0 2a 33    
	c0 eb de 0f 0b e4 01 c0 2a 33 c0 eb c3 90  

2005-12-16
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175925

Dec 15 02:57:13 garvin kernel: kernel BUG at mm/rmap.c:487!
Dec 15 02:57:13 garvin kernel: invalid operand: 0000 [#1]
Dec 15 02:57:13 garvin kernel: Modules linked in: loop parport_pc lp parport nfs lockd nfs_acl autofs4 sunrpc dm_mod ipv6 uhci_hcd i2c_piix4 i2c_core snd_es18xx snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tlan floppy ext3 jbd aic7xxx scsi_transport_spi sd_mod scsi_mod
Dec 15 02:57:13 garvin kernel: CPU:    0
Dec 15 02:57:13 garvin kernel: EIP:    0060:[<c014f97b>]    Not tainted VLI
Dec 15 02:57:13 garvin kernel: EFLAGS: 00010286   (2.6.14-1.1637_FC4) 
Dec 15 02:57:13 garvin kernel: EIP is at page_remove_rmap+0x37/0x41
Dec 15 02:57:13 garvin kernel: eax: ffffffff   ebx: c85d5e30   ecx: 00000006   edx: c115c580
Dec 15 02:57:13 garvin kernel: esi: c115c580   edi: 0038c000   ebp: c03f7a7c   esp: cd7ddec8
Dec 15 02:57:13 garvin kernel: ds: 007b   es: 007b   ss: 0068
Dec 15 02:57:13 garvin kernel: Process udev (pid: 4008, threadinfo=cd7dd000 task=c7059ab0)
Dec 15 02:57:13 garvin kernel: Stack: c0149137 00000000 00391000 c03f7a7c c0a7d000 00391000 00391000 00390fff 
Dec 15 02:57:13 garvin kernel:        c01492ca 00391000 00000000 c03f7a7c 00009000 00391000 c4ce3ddc 00391000 
Dec 15 02:57:13 garvin kernel:        c0149401 00391000 00000000 cd7dd000 cdb671c0 cd7ddf58 002d7000 00000000 
Dec 15 02:57:13 garvin kernel: Call Trace:
Dec 15 02:57:13 garvin kernel:  [<c0149137>] zap_pte_range+0xe5/0x1f5
Dec 15 02:57:13 garvin kernel:  [<c01492ca>] unmap_page_range+0x83/0xb7
Dec 15 02:57:13 garvin kernel:  [<c0149401>] unmap_vmas+0x103/0x222
Dec 15 02:57:13 garvin kernel:  [<c014dc05>] exit_mmap+0x7c/0x14c
Dec 15 02:57:13 garvin kernel:  [<c01189a0>] mmput+0x1f/0x95
Dec 15 02:57:13 garvin kernel:  [<c011d33d>] do_exit+0xe0/0x3b8
Dec 15 02:57:13 garvin kernel:  [<c011d66a>] do_group_exit+0x29/0x90
Dec 15 02:57:13 garvin kernel:  [<c0102edd>] syscall_call+0x7/0xb
Dec 15 02:57:13 garvin kernel: Code: ff 0f 98 c0 84 c0 75 01 c3 8b 42 08 83 c0 01 90 78 19 ba ff ff ff ff b8 10 00 00 00 e9 43 0c ff ff 0f 0b e4 01 ad 4a 32 c0 eb d2 <0f> 0b e7 01 ad 4a 32 c0 eb dd 55 57 56 53 83 ec 04 89 c7 89 d3 

2004-09-11
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121902
(mention of the BUG in comment #46 on 2.6.8, albeit nvidia tainted).

2004-06-21
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126454

Two instances, at least one 'went away' with a hardware upgrade.
Could be a coincidence.

2004-07-15
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=127903
Wow, the oldest so far. All the way back to 2.6.6.
But again 'went away' with memory module replacements.

2004-11-28
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=141035
Several flavours. Nothing conclusive. Was mistakenly
believed to be possibly related to the amd errata at the time
and closed.

2005-06-02
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=157557
More of the same. Memory corruption after the first oops perhaps?

2005-07-09
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=159364
Another AMD user. Reports the problem 'went away' with an
update to 2.6.12.3



* Re: mm/rmap.c negative page map count BUG.
  2006-01-05  7:47       ` Dave Jones
@ 2006-01-05  8:11         ` Arjan van de Ven
  2006-01-05 11:15           ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Arjan van de Ven @ 2006-01-05  8:11 UTC
  To: Dave Jones; +Cc: Andrew Morton, nickpiggin, linux-kernel


> Quite a few Fedora users have hit it over the last year,
> but what I find fascinating is that there's not a single
> occurrence of "BUG at mm/rmap.c" in our 2.6.9 based RHEL4 bug reports.

could mean it's caused by consumer hardware code...



* Re: mm/rmap.c negative page map count BUG.
  2006-01-05  8:11         ` Arjan van de Ven
@ 2006-01-05 11:15           ` Dave Jones
  2006-01-05 11:18             ` Arjan van de Ven
  2006-01-05 19:00             ` Octavio Alvarez
  0 siblings, 2 replies; 19+ messages in thread
From: Dave Jones @ 2006-01-05 11:15 UTC
  To: Arjan van de Ven; +Cc: Andrew Morton, nickpiggin, linux-kernel

On Thu, Jan 05, 2006 at 09:11:51AM +0100, Arjan van de Ven wrote:
 > 
 > > Quite a few Fedora users have hit it over the last year,
 > > but what I find fascinating is that there's not a single
 > > occurrence of "BUG at mm/rmap.c" in our 2.6.9 based RHEL4 bug reports.
 > 
 > could mean it's caused by consumer hardware code...

Yeah. People buying enterprise distros do tend to buy branded RAM
with goodies like ECC from big name suppliers instead of a cheap $20
noname DIMM from "Joe's computers".

So it *could* be a lot of these are crappy hardware, especially
as some of the reports do indicate that the problem went away
when they upgraded their RAM.  Some of the others though, I'm
not so sure.

		Dave



* Re: mm/rmap.c negative page map count BUG.
  2006-01-05 11:15           ` Dave Jones
@ 2006-01-05 11:18             ` Arjan van de Ven
  2006-01-05 11:26               ` Dave Jones
  2006-01-05 19:00             ` Octavio Alvarez
  1 sibling, 1 reply; 19+ messages in thread
From: Arjan van de Ven @ 2006-01-05 11:18 UTC
  To: Dave Jones; +Cc: Andrew Morton, nickpiggin, linux-kernel

On Thu, 2006-01-05 at 06:15 -0500, Dave Jones wrote:
> On Thu, Jan 05, 2006 at 09:11:51AM +0100, Arjan van de Ven wrote:
>  > 
>  > > Quite a few Fedora users have hit it over the last year,
>  > > but what I find fascinating is that there's not a single
>  > > occurrence of "BUG at mm/rmap.c" in our 2.6.9 based RHEL4 bug reports.
>  > 
>  > could mean it's caused by consumer hardware code...
> 
> Yeah. People buying enterprise distros do tend to buy branded RAM
> with goodies like ECC from big name suppliers instead of a cheap $20
> noname DIMM from "Joe's computers".
> 
> So it *could* be a lot of these are crappy hardware, especially
> as some of the reports do indicate that the problem went away
> when they upgraded their RAM.  Some of the others though, I'm
> not so sure.

it could also be some consumer-mostly device, or driver thereof. say
video capture or weird usb gizmo



* Re: mm/rmap.c negative page map count BUG.
  2006-01-05 11:18             ` Arjan van de Ven
@ 2006-01-05 11:26               ` Dave Jones
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Jones @ 2006-01-05 11:26 UTC
  To: Arjan van de Ven; +Cc: Andrew Morton, nickpiggin, linux-kernel

On Thu, Jan 05, 2006 at 12:18:43PM +0100, Arjan van de Ven wrote:
 > On Thu, 2006-01-05 at 06:15 -0500, Dave Jones wrote:
 > > On Thu, Jan 05, 2006 at 09:11:51AM +0100, Arjan van de Ven wrote:
 > >  > 
 > >  > > Quite a few Fedora users have hit it over the last year,
 > >  > > but what I find fascinating is that there's not a single
 > >  > > occurrence of "BUG at mm/rmap.c" in our 2.6.9 based RHEL4 bug reports.
 > >  > 
 > >  > could mean it's caused by consumer hardware code...
 > > 
 > > Yeah. People buying enterprise distros do tend to buy branded RAM
 > > with goodies like ECC from big name suppliers instead of a cheap $20
 > > noname DIMM from "Joe's computers".
 > > 
 > > So it *could* be a lot of these are crappy hardware, especially
 > > as some of the reports do indicate that the problem went away
 > > when they upgraded their RAM.  Some of the others though, I'm
 > > not so sure.
 > 
 > it could also be some consumer-mostly device, or driver thereof. say
 > video capture or weird usb gizmo

except looking at the oopses, there's no obvious pattern amongst
the modules loaded.  Though they could all have a commonality as
a built-in driver, it's a long-shot.

even looking at the Fedora ones alone, which have no built-in
drivers, there's nothing that immediately jumps out like
"ooh, radeon again".  I'll look through them again tomorrow,
but first, sleep.

		Dave



* Re: mm/rmap.c negative page map count BUG.
  2006-01-05 11:15           ` Dave Jones
  2006-01-05 11:18             ` Arjan van de Ven
@ 2006-01-05 19:00             ` Octavio Alvarez
  2006-01-11  8:01               ` Octavio Alvarez Piza
  1 sibling, 1 reply; 19+ messages in thread
From: Octavio Alvarez @ 2006-01-05 19:00 UTC
  To: Dave Jones, Arjan van de Ven; +Cc: Andrew Morton, nickpiggin, linux-kernel

On Thu, 05 Jan 2006 03:15:20 -0800, Dave Jones <davej@redhat.com> wrote:

> On Thu, Jan 05, 2006 at 09:11:51AM +0100, Arjan van de Ven wrote:
>  >
>  > > Quite a few Fedora users have hit it over the last year,
>  > > but what I find fascinating is that there's not a single
>  > > occurrence of "BUG at mm/rmap.c" in our 2.6.9 based RHEL4 bug reports.
>  >
>  > could mean it's caused by consumer hardware code...
>
> Yeah. People buying enterprise distros do tend to buy branded RAM
> with goodies like ECC from big name suppliers instead of a cheap $20
> noname DIMM from "Joe's computers".
>
> So it *could* be a lot of these are crappy hardware, especially
> as some of the reports do indicate that the problem went away
> when they upgraded their RAM.  Some of the others though, I'm
> not so sure.

Nevertheless, there are more instances of the bug in recent versions.
For me, version 2.6.10 or 2.6.11 seems to be the big difference, from
1 bug monthly to --suddenly-- 4 weekly.

I'm experiencing that problem too. I have noticed that sometimes
"bad_page_state" triggers before the BUG is reported.

http://lkml.org/lkml/2005/12/14/449

I have already installed the instrumentation Dave provided. I'll see how
it goes.



* Re: mm/rmap.c negative page map count BUG.
  2006-01-05 19:00             ` Octavio Alvarez
@ 2006-01-11  8:01               ` Octavio Alvarez Piza
  2006-01-11 16:12                 ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Octavio Alvarez Piza @ 2006-01-11  8:01 UTC (permalink / raw)
  To: Octavio Alvarez
  Cc: Dave Jones, Arjan van de Ven, Andrew Morton, nickpiggin, linux-kernel

On Thu, 05 Jan 2006 11:00:41 -0800
"Octavio Alvarez" <alvarezp@alvarezp.ods.org> wrote:

> On Thu, 05 Jan 2006 03:15:20 -0800, Dave Jones <davej@redhat.com> wrote:
> 
> > On Thu, Jan 05, 2006 at 09:11:51AM +0100, Arjan van de Ven wrote:
> >  >
> >  > > Quite a few Fedora users have hit it over the last year,
> >  > > but what I find fascinating is that there's not a single
> >  > > occurrence of "BUG at mm/rmap.c" in our 2.6.9 based RHEL4 bug
> >  > > reports.
> >  >
> >  > could mean it's caused by consumer hardware code...
> >
> > Yeah. People buying enterprise distros do tend to buy branded RAM
> > with goodies like ECC from big name suppliers instead of a cheap $20
> > noname DIMM from "Joe's computers".
> >
> > So it *could* be a lot of these are crappy hardware, especially
> > as some of the reports do indicate that the problem went away
> > when they upgraded their RAM.  Some of the others though, I'm
> > not so sure.
> 
> Nevertheless, there are more instances of the bug in recent versions.
> For me, version 2.6.10 or 2.6.11 seems to be the big difference, from
> 1 bug monthly to --suddenly-- 4 weekly.
> 
> I'm experiencing that problem too. I have noticed that sometimes
> "bad_page_state" triggers before the BUG is reported.
> 
> http://lkml.org/lkml/2005/12/14/449
> 
> I have already installed the instrumentation Dave provided. I'll see
> how it goes.

I have found another instance of "bad_page_state" with mapcount:-1
before hitting BUG_ON().

sh-3.00$ cat /var/log/kernel | tail -n 19
Bad page state at free_hot_cold_page (in process 'X', page c1140c60)
flags:0x80010008 mapping:00000000
mapcount:-65536 count:0
Backtrace:
 [<c012eee2>] bad_page+0x5c/0x92
 [<c012f56c>] free_hot_cold_page+0x58/0xc2
 [<c012fbb6>] __pagevec_free+0x17/0x1d
 [<c0133e28>] __pagevec_release_nonlru+0x72/0x7f
 [<c0134c04>] shrink_list+0x2ef/0x386
 [<c0134e23>] shrink_cache+0xe7/0x210
 [<c0135323>] shrink_zone+0xac/0xc4
 [<c0135389>] shrink_caches+0x4e/0x5b
 [<c0135466>] try_to_free_pages+0xd0/0x190
 [<c012fa0a>] __alloc_pages+0x170/0x271
 [<c01382ca>] do_anonymous_page+0x37/0x107
 [<c013865c>] __handle_mm_fault+0xa6/0x15e
 [<c02a4cf4>] do_page_fault+0x188/0x545
 [<c02a4b6c>] do_page_fault+0x0/0x545
 [<c0102c9f>] error_code+0x4f/0x54
Trying to fix it up, but a reboot is needed

sh-3.00$ uptime
 23:56:29 up 1 day,  2:45,  4 users,  load average: 1.22, 1.16, 1.12

sh-3.00$ uname -a
Linux octavio 2.6.15 #13 Sat Jan 7 17:37:22 PST 2006 i686 unknown unknown GNU/Linux

I ran memtest86+ for 24 hours prior to installing the latest kernel boot
with no errors reported.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: mm/rmap.c negative page map count BUG.
  2006-01-11  8:01               ` Octavio Alvarez Piza
@ 2006-01-11 16:12                 ` Hugh Dickins
  2006-01-11 16:21                   ` Arjan van de Ven
  2006-01-11 16:58                   ` Octavio Alvarez Piza
  0 siblings, 2 replies; 19+ messages in thread
From: Hugh Dickins @ 2006-01-11 16:12 UTC (permalink / raw)
  To: Octavio Alvarez Piza
  Cc: Dave Jones, Arjan van de Ven, Andrew Morton, nickpiggin, linux-kernel

On Wed, 11 Jan 2006, Octavio Alvarez Piza wrote:
> On Thu, 05 Jan 2006 11:00:41 -0800
> "Octavio Alvarez" <alvarezp@alvarezp.ods.org> wrote:
> 
> I have found another instance of "bad_page_state" with mapcount:-1
> before hitting BUG_ON().
> 
> Bad page state at free_hot_cold_page (in process 'X', page c1140c60)
> flags:0x80010008 mapping:00000000 mapcount:-65536 count:0

No, that's mapcount -65536 not -1.

That means page->_mapcount contained 0xfffeffff when it should have
contained 0xffffffff.  A single bit got cleared.  Probably bad memory,
overheating, something of that kind.

> I ran memtest86+ for 24 hours prior to installing the latest kernel boot
> with no errors reported.

Well, you've done your best to rule out that possibility, yes.

We can't rule out that something somewhere in the kernel has
scribbled on that location, but I've no guesses what.

Hugh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: mm/rmap.c negative page map count BUG.
  2006-01-11 16:12                 ` Hugh Dickins
@ 2006-01-11 16:21                   ` Arjan van de Ven
  2006-01-11 16:58                   ` Octavio Alvarez Piza
  1 sibling, 0 replies; 19+ messages in thread
From: Arjan van de Ven @ 2006-01-11 16:21 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Octavio Alvarez Piza, Dave Jones, Andrew Morton, nickpiggin,
	linux-kernel


> 
> That means page->_mapcount contained 0xfffeffff when it should have

> We can't rule out that something somewhere in the kernel has
> scribbled on that location, but I've no guesses what.

could be an rwsem/rwlock



btw.. which video driver is in use? (X tends to do rather evil things at
times via /dev/mem, but that is very much driver specific)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: mm/rmap.c negative page map count BUG.
  2006-01-11 16:12                 ` Hugh Dickins
  2006-01-11 16:21                   ` Arjan van de Ven
@ 2006-01-11 16:58                   ` Octavio Alvarez Piza
  2006-01-11 17:18                     ` Hugh Dickins
  2006-01-11 17:24                     ` Andrew Morton
  1 sibling, 2 replies; 19+ messages in thread
From: Octavio Alvarez Piza @ 2006-01-11 16:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Arjan van de Ven, Andrew Morton, nickpiggin, linux-kernel

On Wed, 11 Jan 2006 16:12:54 +0000 (GMT)
Hugh Dickins <hugh@veritas.com> wrote:

> On Wed, 11 Jan 2006, Octavio Alvarez Piza wrote:
> > On Thu, 05 Jan 2006 11:00:41 -0800
> > "Octavio Alvarez" <alvarezp@alvarezp.ods.org> wrote:
> > 
> > I have found another instance of "bad_page_state" with mapcount:-1
> > before hitting BUG_ON().
> > 
> > Bad page state at free_hot_cold_page (in process 'X', page c1140c60)
> > flags:0x80010008 mapping:00000000 mapcount:-65536 count:0
> 
> No, that's mapcount -65536 not -1.
> 

That's right, this might be a different issue. Now that it was X and not
"kswapd0" and that Arjan has asked me, I've realized that I'm using the
binary nVidia driver. I had gotten pretty much the same issue with the
open driver, though. Still, since I changed kernels to 2.6.15, I'll try
again to catch the bad page state with the free nv driver.
 
> That means page->_mapcount contained 0xfffeffff when it should have
> contained 0xffffffff.  A single bit got cleared.  Probably bad memory,
> overheating, something of that kind.

BTW, what's the first 8 in flags:0x80010008? I can't find 1<<31 in
include/linux/page-flags.h

Octavio.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: mm/rmap.c negative page map count BUG.
  2006-01-11 16:58                   ` Octavio Alvarez Piza
@ 2006-01-11 17:18                     ` Hugh Dickins
  2006-01-11 17:24                     ` Andrew Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Hugh Dickins @ 2006-01-11 17:18 UTC (permalink / raw)
  To: Octavio Alvarez Piza
  Cc: Dave Jones, Arjan van de Ven, Andrew Morton, nickpiggin, linux-kernel

On Wed, 11 Jan 2006, Octavio Alvarez Piza wrote:
> 
> BTW, what's the first 8 in flags:0x80010008? I can't find 1<<31 in
> include/linux/page-flags.h

It's the zone that page belongs to (you won't, I think, get involved
in nodes and sections): see helpful comment on "page->flags layout"
in include/linux/mm.h, and definitions in include/linux/mmzone.h.

I'd have to make a fool of myself by doing arithmetic in public,
probably getting it wrong, to tell you precisely which zone the
8 meant in 2.6.15 in your config; but it's not interesting anyway.

Hugh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: mm/rmap.c negative page map count BUG.
  2006-01-11 16:58                   ` Octavio Alvarez Piza
  2006-01-11 17:18                     ` Hugh Dickins
@ 2006-01-11 17:24                     ` Andrew Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2006-01-11 17:24 UTC (permalink / raw)
  To: Octavio Alvarez Piza; +Cc: hugh, davej, arjan, nickpiggin, linux-kernel

Octavio Alvarez Piza <alvarezp@alvarezp.ods.org> wrote:
>
>  > That means page->_mapcount contained 0xfffeffff when it should have
>  > contained 0xffffffff.  A single bit got cleared.  Probably bad memory,
>  > overheating, something of that kind.
> 
>  BTW, what's the first 8 in flags:0x80010008? I can't find 1<<31 in
>  include/linux/page-flags.h

That's the page's zone identifier.  We stuff that into the high bits of
page->flags for page_zone().

^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2006-01-03  8:26 mm/rmap.c negative page map count BUG Dave Jones
2006-01-03 11:42 ` Nick Piggin
2006-01-03 13:53   ` Dave Jones
2006-01-04 23:53     ` Andrew Morton
2006-01-04 23:56       ` Dave Jones
2006-01-05  0:16         ` Andrew Morton
2006-01-05  0:31           ` Dave Jones
2006-01-05  7:47       ` Dave Jones
2006-01-05  8:11         ` Arjan van de Ven
2006-01-05 11:15           ` Dave Jones
2006-01-05 11:18             ` Arjan van de Ven
2006-01-05 11:26               ` Dave Jones
2006-01-05 19:00             ` Octavio Alvarez
2006-01-11  8:01               ` Octavio Alvarez Piza
2006-01-11 16:12                 ` Hugh Dickins
2006-01-11 16:21                   ` Arjan van de Ven
2006-01-11 16:58                   ` Octavio Alvarez Piza
2006-01-11 17:18                     ` Hugh Dickins
2006-01-11 17:24                     ` Andrew Morton
