* [1/1] make bad_page() print all of page->flags
@ 2004-09-01  6:14 William Lee Irwin III
  2004-09-01  7:28 ` Ian Wienand
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: William Lee Irwin III @ 2004-09-01  6:14 UTC (permalink / raw)
  To: linux-ia64

On Wed, Sep 01, 2004 at 03:59:33PM +1000, Ian Wienand wrote:
> With 2.6.9-rc1 on IA64 I have this problem when I boot on an rx2600
> (with both SMP and uniprocessor builds) :
> Bad page state at free_hot_cold_page (in process 'swapper', page e0000000049bcd08)
> flags:0x00000000 mapping:0000000000000000 mapcount:1 count:0
> Backtrace:
> [-- no backtrace is printed, unfortunately, and the machine stops dead --]
> But I also noticed that virtual mem map was right at around the same
> place :
> Virtual mem_map starts at 0xe000000004928000
> So I turned that off in the config, and it appears to work OK.  I
> noticed there were a lot of rmap changes with respect to locking just
> recently put in, and I suspect they are the culprit.  I'm afraid this
> is a little over my head, but I'm willing to try any suggestions.

(1) is this struct page actually in your virtual mem_map?
(2) page->flags looks 32-bit, but ia64 doesn't define
	ARCH_HAS_ATOMIC_UNSIGNED that I can tell; what's going on there?
	Let's see if those flags are really all zero.
(3) "stops dead" isn't a very good description; deadlock? livelock?
	interrupts on or off?


-- wli

bad_page() only prints out 8 hexadecimal digits of page->flags regardless
of sizeof(page_flags_t). This leads to confusing and/or incomplete bug
reports. The following patch uses a field width argument to replace the
hardcoded %08lx so that bad_page() may print the whole of page->flags.


Index: mm2-2.6.9-rc1/mm/page_alloc.c
===================================================================
--- mm2-2.6.9-rc1.orig/mm/page_alloc.c	2004-08-31 01:06:55.000000000 -0700
+++ mm2-2.6.9-rc1/mm/page_alloc.c	2004-08-31 23:06:23.558598368 -0700
@@ -79,9 +79,9 @@
 {
 	printk(KERN_EMERG "Bad page state at %s (in process '%s', page %p)\n",
 		function, current->comm, page);
-	printk(KERN_EMERG "flags:0x%08lx mapping:%p mapcount:%d count:%d\n",
-		(unsigned long)page->flags, page->mapping,
-		page_mapcount(page), page_count(page));
+	printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
+		(int)(2*sizeof(page_flags_t)), (unsigned long)page->flags,
+		page->mapping, page_mapcount(page), page_count(page));
 	printk(KERN_EMERG "Backtrace:\n");
 	dump_stack();
 	printk(KERN_EMERG "Trying to fix it up, but a reboot is needed\n");
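
For readers unfamiliar with the `*` field width, here is a minimal userspace sketch of the same formatting trick. It is not the kernel code: `page_flags_t` is a stand-in typedef (assumed here to be `unsigned long`), and `format_flags()` is a hypothetical helper that uses snprintf() in place of printk(). The width is taken from an extra int argument, so the number of zero-padded hex digits tracks the size of the type rather than being fixed at 8.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel type; assumed to be unsigned long in this sketch. */
typedef unsigned long page_flags_t;

/*
 * Hypothetical helper: format flags with two hex digits per byte of
 * page_flags_t. The '*' in "%0*lx" consumes an int argument as the
 * minimum field width, and the '0' flag pads with leading zeros.
 * Returns what snprintf() returns.
 */
static int format_flags(char *buf, size_t len, page_flags_t flags)
{
	return snprintf(buf, len, "flags:0x%0*lx",
			(int)(2 * sizeof(page_flags_t)),
			(unsigned long)flags);
}
```

Calling `format_flags(buf, sizeof(buf), 0)` should produce 16 zero digits where `page_flags_t` is 8 bytes wide and 8 digits where it is 4 bytes wide. The cast to int matters because sizeof yields a size_t, while `*` expects an int.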


* Re: [1/1] make bad_page() print all of page->flags
  2004-09-01  6:14 [1/1] make bad_page() print all of page->flags William Lee Irwin III
@ 2004-09-01  7:28 ` Ian Wienand
  2004-09-01  7:39 ` William Lee Irwin III
  2004-09-01 15:24 ` Bjorn Helgaas
  2 siblings, 0 replies; 4+ messages in thread
From: Ian Wienand @ 2004-09-01  7:28 UTC (permalink / raw)
  To: linux-ia64

On Tue, Aug 31, 2004 at 11:14:43PM -0700, William Lee Irwin III wrote:
> (1) is this struct page actually in your virtual mem_map?

I'm assuming so, since when it's turned off it works?

> (2) page->flags looks 32-bit, but ia64 doesn't define
> 	ARCH_HAS_ATOMIC_UNSIGNED that I can tell; what's going on there?
> 	Let's see if those flags are really all zero.

Thanks, I applied the patch and it seems to be all zeros

Bad page state at free_hot_cold_page (in process 'swapper', page e000000004940c98)
flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0
Backtrace:
[-- stops again --]

> (3) "stops dead" isn't a very good description; deadlock? livelock?
> 	interrupts on or off?

It doesn't respond to magic-sysrq and I guess it takes an MCA because
it spontaneously reboots after about 10 seconds, though doesn't print
anything useful before it does that.

Thanks for your help,

-i


* Re: [1/1] make bad_page() print all of page->flags
  2004-09-01  6:14 [1/1] make bad_page() print all of page->flags William Lee Irwin III
  2004-09-01  7:28 ` Ian Wienand
@ 2004-09-01  7:39 ` William Lee Irwin III
  2004-09-01 15:24 ` Bjorn Helgaas
  2 siblings, 0 replies; 4+ messages in thread
From: William Lee Irwin III @ 2004-09-01  7:39 UTC (permalink / raw)
  To: linux-ia64

On Tue, Aug 31, 2004 at 11:14:43PM -0700, William Lee Irwin III wrote:
>> (1) is this struct page actually in your virtual mem_map?

On Wed, Sep 01, 2004 at 05:28:17PM +1000, Ian Wienand wrote:
> I'm assuming so, since when it's turned off it works?

That's not quite enough to tell. Generally you need some kind of dump
of the index of the sparse array (probably the ptes mapping it), so
that the virtual addresses falling within it can be detected.


On Tue, Aug 31, 2004 at 11:14:43PM -0700, William Lee Irwin III wrote:
>> (2) page->flags looks 32-bit, but ia64 doesn't define
>> 	ARCH_HAS_ATOMIC_UNSIGNED that I can tell; what's going on there?
>> 	Let's see if those flags are really all zero.

On Wed, Sep 01, 2004 at 05:28:17PM +1000, Ian Wienand wrote:
> Thanks, I applied the patch and it seems to be all zeros
> Bad page state at free_hot_cold_page (in process 'swapper', page e000000004940c98)
> flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0
> Backtrace:
> [-- stops again --]

My bet is that this is not a real page, or otherwise that it's
uninitialized and some kind of boundary conditions in
free_area_init_core() or surrounding functions have been flubbed.


On Tue, Aug 31, 2004 at 11:14:43PM -0700, William Lee Irwin III wrote:
>> (3) "stops dead" isn't a very good description; deadlock? livelock?
>> 	interrupts on or off?

On Wed, Sep 01, 2004 at 05:28:17PM +1000, Ian Wienand wrote:
> It doesn't respond to magic-sysrq and I guess it takes an MCA because
> it spontaneously reboots after about 10 seconds, though doesn't print
> anything useful before it does that.

"Fatal machine check" sounds like a good description of this.


-- wli


* Re: [1/1] make bad_page() print all of page->flags
  2004-09-01  6:14 [1/1] make bad_page() print all of page->flags William Lee Irwin III
  2004-09-01  7:28 ` Ian Wienand
  2004-09-01  7:39 ` William Lee Irwin III
@ 2004-09-01 15:24 ` Bjorn Helgaas
  2 siblings, 0 replies; 4+ messages in thread
From: Bjorn Helgaas @ 2004-09-01 15:24 UTC (permalink / raw)
  To: linux-ia64

On Wednesday 01 September 2004 1:28 am, Ian Wienand wrote:
> On Tue, Aug 31, 2004 at 11:14:43PM -0700, William Lee Irwin III wrote:
> > (3) "stops dead" isn't a very good description; deadlock? livelock?
> > 	interrupts on or off?
> 
> It doesn't respond to magic-sysrq and I guess it takes an MCA because
> it spontaneously reboots after about 10 seconds, though doesn't print
> anything useful before it does that.

You mentioned this was an HP rx2600.  You can tell for sure whether it
was an MCA (and get useful information about it) by doing "errdump clear"
at the EFI shell prompt before reproducing the problem, stopping at the
EFI shell prompt again during the subsequent reboot, and doing
"errdump mca".

There's also a "salinfo" package that automatically retrieves and logs
this information.  It's part of Debian, and the sources are here:

    ftp://ftp.kernel.org/pub/linux/kernel/people/helgaas

or I can send you an RPM for RHEL3 if you need it.
