Re: Make sure we populate the initroot filesystem late enough

* Re: Make sure we populate the initroot filesystem late enough
       [not found] <200612112059.kBBKx1j7022473@hera.kernel.org>
@ 2007-02-26  0:00   ` David Woodhouse
  0 siblings, 0 replies; 56+ messages in thread
From: David Woodhouse @ 2007-02-26  0:00 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: torvalds, linuxppc-dev, john stultz

On Mon, 2006-12-11 at 20:59 +0000, Linux Kernel Mailing List wrote:
> Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8d610dd52dd1da696e199e4b4545f33a2a5de5c6
> Commit:     8d610dd52dd1da696e199e4b4545f33a2a5de5c6
> Parent:     8993780a6e44fb4e7ed34e33458506a775356c6e
> Author:     Linus Torvalds <torvalds@woody.osdl.org>
> AuthorDate: Mon Dec 11 12:12:04 2006 -0800
> Committer:  Linus Torvalds <torvalds@woody.osdl.org>
> CommitDate: Mon Dec 11 12:12:04 2006 -0800
> 
>     Make sure we populate the initroot filesystem late enough
>     
>     We should not initialize rootfs before all the core initializers have
>     run.  So do it as a separate stage just before starting the regular
>     driver initializers.
>     
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>

This seems to be what's triggering the apparent memory corruption we've
been seeing recently -- in the case of the Fedora kernel it manifests
itself as a BUG() in cache_alloc_refill() when the pmac ide driver
initialises.

Another report was at http://lkml.org/lkml/2006/12/17/4

We've been seeing it on a Mac Mini too, and I managed to reproduce it on
my shinybook this evening by booting with 'mem=512M'.

One side-effect of this patch is to move the call to free_initrd() much
later in the init sequence, potentially after other memory management
code is assuming it's already been freed.

-- 
dwmw2


* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  0:00   ` David Woodhouse
@ 2007-02-26  0:24     ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2007-02-26  0:24 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linux Kernel Mailing List, linuxppc-dev, john stultz

On Sun, 25 Feb 2007, David Woodhouse wrote:
> 
> One side-effect of this patch is to move the call to free_initrd() much
> later in the init sequence, potentially after other memory management
> code is assuming it's already been freed.

Hmm. No, I don't think that should be a problem. free_initmem() only
happens at the very end, after do_basic_setup() has been run, which includes
all the initcall stuff.

However, it's an interesting observation. How sure are you that it's this
commit that triggers it? You say "This seems to be what's triggering ..",
I'm wondering how firm that is..

		Linus

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  0:24     ` Linus Torvalds
@ 2007-02-26  0:45       ` David Woodhouse
  -1 siblings, 0 replies; 56+ messages in thread
From: David Woodhouse @ 2007-02-26  0:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linuxppc-dev, john stultz

On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
> Hmm. No, I don't think that should be a problem. free_initmem() only
> happens at the very end, after do_basic_setup() has been run, which includes
> all the initcall stuff.

> However, it's an interesting observation. How sure are you that it's this
> commit that triggers it? You say "This seems to be what's triggering ..",
> I'm wondering how firm that is..

I found it with git-bisect. The Fedora kernel has been broken on this
particular 512MiB Mac Mini for a while, and now I've reverted the patch
it seems to be fine again. So I'm fairly sure. I'll be surer in a few
minutes once the full RPM build has finished with the patch reverted.

Of course, it could easily be an entirely separate bug which by some
bizarre coincidence is just triggered by this.

-- 
dwmw2

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  0:24     ` Linus Torvalds
@ 2007-02-26  1:17       ` David Woodhouse
  -1 siblings, 0 replies; 56+ messages in thread
From: David Woodhouse @ 2007-02-26  1:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linuxppc-dev, john stultz

On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
> Hmm. No, I don't think that should be a problem. free_initmem() only
> happens at the very end, after do_basic_setup() has been run, which
> includes all the initcall stuff.

I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
even this hack seems sufficient to 'fix' it:

--- arch/powerpc/mm/init_32.c   2007-02-25 20:06:54.000000000 -0500
+++ arch/powerpc/mm/init_32.c.not       2007-02-25 20:06:41.000000000 -0500
@@ -243,13 +243,14 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
        if (start < end)
-               printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+               printk ("NOT Freeing initrd memory: %ldKiB would be freed\n", (end - start) >> 10);
+       return;
        for (; start < end; start += PAGE_SIZE) {
                ClearPageReserved(virt_to_page(start));
                init_page_count(virt_to_page(start));
                free_page(start);
                totalram_pages++;
        }
 }
 #endif


-- 
dwmw2


* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  1:17       ` David Woodhouse
@ 2007-02-26  3:45         ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2007-02-26  3:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linux Kernel Mailing List, linuxppc-dev, john stultz

On Sun, 25 Feb 2007, David Woodhouse wrote:
> 
> I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
> even this hack seems sufficient to 'fix' it:

Ok. Clearly something is using that memory. That said, I *suspect* that 
the commit that you bisected to is just showing the problem indirectly. 
The ordering shouldn't make any difference, but it can obviously make a 
huge difference in various allocation patterns etc, thus just showing a 
pre-existing problem more clearly..

Can you try adding something like

	memset(start, 0xf0, end - start);

to before the return? That might give a better idea of exactly what is 
using it after it's free'd, hopefully by having the user trigger some more 
spectacular oops..

It is, of course, also entirely possible that the rootfs unpacking change 
really *was* buggy, and I am just missing something totally obvious. The 
memset() might still make it more obvious, though. Maybe.

>         if (start < end)
> -               printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +               printk ("NOT Freeing initrd memory: %ldKiB would be freed\n", (end - start) >> 10);

.. so adding the "memset()" here would be what I'm suggesting ..

> +       return;

.. and you might as well leave the return there, so that nobody else comes 
along and re-uses the memory. That should just improve on the chances of 
the memset() hopefully catching the problem..

		Linus "I don't see anything wrong" Torvalds
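
The poisoning idea above can be tried out in userspace. What follows is only
a minimal sketch, not kernel code: poison_range() and is_poison_word() are
hypothetical helper names, and an ordinary array stands in for the initrd
region. It shows why the 0xf0 fill helps: any pointer-sized value that a
stale user later loads from the region comes back as 0xf0f0..., which is
easy to recognise in an oops. Since free_initrd_mem() receives start and end
as unsigned long, the sketch casts the start address to a pointer before
handing it to memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_BYTE 0xf0u	/* fill pattern suggested in the thread */

/* Fill [start, end) with the poison byte. The bounds are integers, as in
 * free_initrd_mem(), so start is cast to a pointer for memset(). */
static void poison_range(uintptr_t start, uintptr_t end)
{
	assert(start <= end);
	memset((void *)start, POISON_BYTE, end - start);
}

/* Return 1 if every byte of value equals the poison byte, i.e. the value
 * was most likely loaded from a poisoned region. */
static int is_poison_word(uintptr_t value)
{
	size_t i;

	for (i = 0; i < sizeof(value); i++) {
		if (((value >> (8 * i)) & 0xffu) != POISON_BYTE)
			return 0;
	}
	return 1;
}
```

A register or pointer field that satisfies is_poison_word() in a backtrace
would point at a reader of the poisoned range rather than at random
corruption.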

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  3:45         ` Linus Torvalds
@ 2007-02-26  4:01           ` David Woodhouse
  -1 siblings, 0 replies; 56+ messages in thread
From: David Woodhouse @ 2007-02-26  4:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linuxppc-dev, john stultz

On Sun, 2007-02-25 at 19:45 -0800, Linus Torvalds wrote:
> Ok. Clearly something is using that memory. That said, I *suspect* that 
> the commit that you bisected to is just showing the problem indirectly. 
> The ordering shouldn't make any difference, but it can obviously make a 
> huge difference in various allocation patterns etc, thus just showing a 
> pre-existing problem more clearly..

Indeed.

> Can you try adding something like
> 
>         memset(start, 0xf0, end - start);

Yeah, I did that before giving up on it for the day and going in search
of dinner. It changes the failure mode to a BUG() in
cache_free_debugcheck(), at line 2876 of mm/slab.c

It smells like the pages weren't actually reserved in the first place
and we were blithely allocating them. The only problem with that theory
is that the initrd doesn't seem to be getting corrupted -- and if we
were handing out its pages like that then surely _something_ would have
scribbled on it before we tried to read it.

When I head back in tomorrow morning I'll instrument free_initrd_mem()
to check that the PageReserved bit was actually set on each page, before
clearing it. And I'll make the page allocation routines check whether
they're giving out pages between initrd_start and initrd_end, etc.

-- 
dwmw2
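
The PageReserved audit described above might look roughly like this when
modelled in userspace. This is a sketch under stated assumptions: struct
mock_page, MOCK_PG_RESERVED, count_unreserved() and audit_and_release() are
all hypothetical stand-ins, not the kernel's struct page or page-flag API.
The point is only the ordering: count the pages that were never reserved
first, then clear the bit.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PG_RESERVED 0x1u	/* hypothetical flag bit, not the kernel's value */

/* Hypothetical stand-in for the kernel's per-page descriptor. */
struct mock_page {
	unsigned int flags;
};

/* Count pages whose reserved bit is NOT set. A non-zero result would mean
 * some initrd pages were never protected from the allocator. */
static size_t count_unreserved(const struct mock_page *pages, size_t npages)
{
	size_t i, bad = 0;

	assert(pages != NULL || npages == 0);
	for (i = 0; i < npages; i++) {
		if (!(pages[i].flags & MOCK_PG_RESERVED))
			bad++;
	}
	return bad;
}

/* Audit first, then clear the bit on every page, mirroring the plan to
 * check PageReserved on each page before clearing it. Returns the number
 * of pages that were found unreserved. */
static size_t audit_and_release(struct mock_page *pages, size_t npages)
{
	size_t bad = count_unreserved(pages, npages);
	size_t i;

	for (i = 0; i < npages; i++)
		pages[i].flags &= ~MOCK_PG_RESERVED;
	return bad;
}
```

In the real function the equivalent check would sit inside the existing
per-page loop, reporting each offending page before ClearPageReserved()
runs.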

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  4:01           ` David Woodhouse
@ 2007-02-26  4:13             ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2007-02-26  4:13 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linux Kernel Mailing List, linuxppc-dev, john stultz



On Sun, 25 Feb 2007, David Woodhouse wrote:
>
> > Can you try adding something like
> > 
> >         memset(start, 0xf0, end - start);
> 
> Yeah, I did that before giving up on it for the day and going in search
> of dinner. It changes the failure mode to a BUG() in
> cache_free_debugcheck(), at line 2876 of mm/slab.c

Ok, that's just strange. 

One obvious thing to do would be to remove all the "__initdata" entries in 
mm/slab.c.. But I'd also like to see the full backtrace for the  BUG_ON(), 
in case that gives any clues at all.

> It smells like the pages weren't actually reserved in the first place
> and we were blithely allocating them. The only problem with that theory
> is that the initrd doesn't seem to be getting corrupted -- and if we
> were handing out its pages like that then surely _something_ would have
> scribbled on it before we tried to read it.

Yeah, I don't think it's necessarily initrd itself, I'd be more inclined 
to think that the reason you see this change with the initrd unpacking is 
simply that it does a lot of allocations for the initrd files, so I think 
it is only indirectly involved - just because it ends up being a slab 
user.

> When I head back in tomorrow morning I'll instrument free_initrd_mem()
> to check that the PageReserved bit was actually set on each page, before
> clearing it. And I'll make the page allocation routines check whether
> they're giving out pages between initrd_start and initrd_end, etc.

Sounds like a sane plan.

			Linus

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  4:01           ` David Woodhouse
@ 2007-02-26  6:59             ` William Lee Irwin III
  -1 siblings, 0 replies; 56+ messages in thread
From: William Lee Irwin III @ 2007-02-26  6:59 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, Linux Kernel Mailing List, linuxppc-dev, john stultz

On Sun, Feb 25, 2007 at 11:01:06PM -0500, David Woodhouse wrote:
> Yeah, I did that before giving up on it for the day and going in search
> of dinner. It changes the failure mode to a BUG() in
> cache_free_debugcheck(), at line 2876 of mm/slab.c
> It smells like the pages weren't actually reserved in the first place
> and we were blithely allocating them. The only problem with that theory
> is that the initrd doesn't seem to be getting corrupted -- and if we
> were handing out its pages like that then surely _something_ would have
> scribbled on it before we tried to read it.
> When I head back in tomorrow morning I'll instrument free_initrd_mem()
> to check that the PageReserved bit was actually set on each page, before
> clearing it. And I'll make the page allocation routines check whether
> they're giving out pages between initrd_start and initrd_end, etc.

Another few things to try would be inserting checks in page_alloc.c for
pages in that specific range before some flag set in free_initrd_mem()
is set, and (conflicting with that, though easily reconciled) unmapping
initrd memory in free_initrd_mem() instead of freeing it.


-- wli
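
The allocator-side check suggested above could be modelled as follows. This
is a userspace sketch only: struct initrd_guard, SKETCH_PAGE_SIZE and the
three functions are hypothetical names, and the 4096-byte page size is an
assumption. The guard records the initrd bounds plus a flag that
free_initrd_mem() would set; any page handed out inside the range while the
flag is still clear is treated as a bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Hypothetical bookkeeping for the initrd region. */
struct initrd_guard {
	uintptr_t start;	/* first byte of the initrd */
	uintptr_t end;		/* one past the last byte of the initrd */
	int freed;		/* set once the initrd has been released */
};

/* Return 1 if the page starting at page_addr overlaps [start, end). */
static int page_overlaps_initrd(const struct initrd_guard *g,
				uintptr_t page_addr)
{
	assert(g != NULL);
	return page_addr < g->end && page_addr + SKETCH_PAGE_SIZE > g->start;
}

/* Return 1 if handing out this page would be a bug: it lies inside the
 * initrd and the initrd has not been released yet. */
static int alloc_is_suspicious(const struct initrd_guard *g,
			       uintptr_t page_addr)
{
	return !g->freed && page_overlaps_initrd(g, page_addr);
}

/* What free_initrd_mem() would do in this model before releasing pages. */
static void mark_initrd_freed(struct initrd_guard *g)
{
	g->freed = 1;
}
```

Once the flag is set, allocations from the range are expected and the check
goes quiet, which is how the two conflicting suggestions can be reconciled.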

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  1:17       ` David Woodhouse
@ 2007-02-26 15:51         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 56+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-26 15:51 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, linuxppc-dev, Linux Kernel Mailing List, john stultz

On Sun, 2007-02-25 at 20:17 -0500, David Woodhouse wrote:
> On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
> > Hmm. No, I don't think that should be a problem. free_initmem() only
> > happens at the very end, after do_basic_setup() has been run, which
> > includes all the initcall stuff.
> 
> I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
> even this hack seems sufficient to 'fix' it:

Could be a powerpc specific bug in initrd handling... I'm still
traveling so I can't really look at it right now, but I wouldn't be
surprised if some of that code did indeed bitrot.

Ben.

> --- arch/powerpc/mm/init_32.c   2007-02-25 20:06:54.000000000 -0500
> +++ arch/powerpc/mm/init_32.c.not       2007-02-25 20:06:41.000000000 -0500
> @@ -243,13 +243,14 @@ void free_initmem(void)
>  #ifdef CONFIG_BLK_DEV_INITRD
>  void free_initrd_mem(unsigned long start, unsigned long end)
>  {
>         if (start < end)
> -               printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +               printk ("NOT Freeing initrd memory: %ldKiB would be freed\n", (end - start) >> 10);
> +       return;
>         for (; start < end; start += PAGE_SIZE) {
>                 ClearPageReserved(virt_to_page(start));
>                 init_page_count(virt_to_page(start));
>                 free_page(start);
>                 totalram_pages++;
>         }
>  }
>  #endif
> 
> 


* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  4:01           ` David Woodhouse
@ 2007-02-26 15:53             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 56+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-26 15:53 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, linuxppc-dev, Linux Kernel Mailing List, john stultz

On Sun, 2007-02-25 at 23:01 -0500, David Woodhouse wrote:

> Yeah, I did that before giving up on it for the day and going in search
> of dinner. It changes the failure mode to a BUG() in
> cache_free_debugcheck(), at line 2876 of mm/slab.c
> 
> It smells like the pages weren't actually reserved in the first place
> and we were blithely allocating them. The only problem with that theory
> is that the initrd doesn't seem to be getting corrupted -- and if we
> were handing out its pages like that then surely _something_ would have
> scribbled on it before we tried to read it.
> 
> When I head back in tomorrow morning I'll instrument free_initrd_mem()
> to check that the PageReserved bit was actually set on each page, before
> clearing it. And I'll make the page allocation routines check whether
> they're giving out pages between initrd_start and initrd_end, etc.

And check that we didn't end up stupidly having the initrd share a page
with something else ... (like not aligned end or such thingy).

Ben.
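
The shared-page concern above comes down to simple address arithmetic. A
minimal sketch, with hypothetical helper names and the page size passed in
as a parameter rather than taken from the kernel's PAGE_SIZE: if the initrd
starts or ends partway through a page, freeing that page also frees
whatever else lives in it.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of addr within its page; page_size must be a power of two. */
static uintptr_t page_offset(uintptr_t addr, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & (page_size - 1);
}

/* Bytes between end and the next page boundary. Non-zero means the last
 * initrd page also holds bytes that do not belong to the initrd. */
static uintptr_t tail_slack(uintptr_t end, uintptr_t page_size)
{
	uintptr_t used = page_offset(end, page_size);

	return used ? page_size - used : 0;
}

/* Return 1 if the initrd neither starts nor ends partway through a page. */
static int initrd_owns_whole_pages(uintptr_t start, uintptr_t end,
				   uintptr_t page_size)
{
	return page_offset(start, page_size) == 0 &&
	       tail_slack(end, page_size) == 0;
}
```

For example, with 4096-byte pages an initrd ending at 0x2f00 leaves 0x100
bytes of its last page to some other owner.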



* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26 15:53             ` Benjamin Herrenschmidt
@ 2007-02-26 16:00               ` Segher Boessenkool
  -1 siblings, 0 replies; 56+ messages in thread
From: Segher Boessenkool @ 2007-02-26 16:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Woodhouse, john stultz, Linus Torvalds,
	Linux Kernel Mailing List, linuxppc-dev

> And check that we didn't end up stupidly having the initrd share a page
> with something else ... (like not aligned end or such thingy).

David tested that yesterday, it's not the case.  Too bad,
would have been too easy ;-)
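
For anyone wanting to repeat that check, here is a minimal userspace sketch
of the alignment test. The 4096-byte page size and the sketch_* names are
assumptions for illustration only; kernel code would use PAGE_SIZE and its
own helpers, and this is not what David actually ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration; the real value is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* True if addr sits exactly on a page boundary. */
static inline bool sketch_page_aligned(uintptr_t addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * True if freeing whole pages covering [start, end) would also free bytes
 * that lie outside the region, i.e. the first or last page is shared with
 * something else.
 */
static inline bool sketch_region_shares_page(uintptr_t start, uintptr_t end)
{
	assert(start <= end);
	return !sketch_page_aligned(start) || !sketch_page_aligned(end);
}
```

An unaligned end is the case Ben describes: the last page would hold both
initrd bytes and somebody else's data.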


Segher


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  4:13             ` Linus Torvalds
@ 2007-02-26 16:24               ` David Woodhouse
  -1 siblings, 0 replies; 56+ messages in thread
From: David Woodhouse @ 2007-02-26 16:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, linuxppc-dev, john stultz

On Sun, 2007-02-25 at 20:13 -0800, Linus Torvalds wrote:
> 
> On Sun, 25 Feb 2007, David Woodhouse wrote:
> >
> > > Can you try adding something like
> > > 
> > >         memset(start, 0xf0, end - start);
> > 
> > Yeah, I did that before giving up on it for the day and going in search
> > of dinner. It changes the failure mode to a BUG() in
> > cache_free_debugcheck(), at line 2876 of mm/slab.c
>
> Ok, that's just strange. 

In this case I hadn't left the 'return' in free_initrd_mem(). I was
poisoning the pages and then returning them to the pool as usual.

If I poison the pages and _don't_ return them to the pool, it boots
fine. PageReserved is set on every page in the initrd region; total
page_count() is equal to the number of pages (which doesn't
_necessarily_ mean that page_count() for every page is equal to 1 but
it's a strong hint that that's the case).

Looking in /dev/mem after it boots, I see that my poison is still
present throughout the whole region.
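
The checks described above can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: struct sketch_page is a
hypothetical stand-in for the kernel's per-page state, not struct page, and
all sketch_* names are made up for illustration. It also shows why a total
count equal to the number of pages is only a hint, since the report tracks
pages whose individual count is not 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Poison byte taken from the memset() suggested in the thread. */
#define SKETCH_POISON 0xf0

/* Hypothetical stand-in for per-page state; NOT the kernel's struct page. */
struct sketch_page {
	bool reserved;	/* models PageReserved() */
	int count;	/* models page_count() */
};

/* Result of walking the region once. */
struct sketch_report {
	size_t not_reserved;	/* pages whose reserved bit was clear */
	long total_count;	/* sum of the per-page counts */
	size_t bad_count;	/* pages whose count is not exactly 1 */
};

static inline struct sketch_report
sketch_check_pages(const struct sketch_page *pages, size_t n)
{
	struct sketch_report r = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].reserved)
			r.not_reserved++;
		if (pages[i].count != 1)
			r.bad_count++;
		r.total_count += pages[i].count;
	}
	return r;
}

/* Fill [start, end) with the poison byte. */
static inline void sketch_poison(unsigned char *start, unsigned char *end)
{
	memset(start, SKETCH_POISON, (size_t)(end - start));
}

/* Offset of the first byte that no longer holds poison, or len if intact. */
static inline size_t
sketch_first_clobbered(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != SKETCH_POISON)
			return i;
	return len;
}
```

A region whose counts are 2, 0 and 1 sums to 3 pages' worth, yet two of the
three pages are wrong, which is the gap the "strong hint" wording leaves open.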

> One obvious thing to do would be to remove all the "__initdata" entries in 
> mm/slab.c..

This is biting us long before we call free_initmem().

>  But I'd also like to see the full backtrace for the  BUG_ON(), 
> in case that gives any clues at all.

I'll see if I can find a camera. 

> > It smells like the pages weren't actually reserved in the first place
> > and we were blithely allocating them. The only problem with that theory
> > is that the initrd doesn't seem to be getting corrupted -- and if we
> > were handing out its pages like that then surely _something_ would have
> > scribbled on it before we tried to read it.
> 
> Yeah, I don't think it's necessarily initrd itself, I'd be more inclined 
> to think that the reason you see this change with the initrd unpacking is 
> simply that it does a lot of allocations for the initrd files, so I think 
> it is only indirectly involved - just because it ends up being a slab 
> user.

Whatever happens, initrd as a 'slab user' is fine. The crashes happen
_later_, when someone else is using the memory which used to belong to
the initrd. In that 'BUG at slab.c:2876' I mentioned above, r3 was
within the initrd region. As I said, I'll try to find a camera.
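
The "is this address inside the old initrd" test is just a half-open range
check. A minimal sketch, assuming start and end are the region bounds (the
sketch_in_region name is hypothetical, and the bounds would come from
initrd_start and initrd_end in the real case):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies in the half-open range [start, end). */
static inline bool
sketch_in_region(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	return addr >= start && addr < end;
}
```

The same predicate is what an instrumented page allocator would apply to
every page it hands out.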

-- 
dwmw2

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  4:01           ` David Woodhouse
@ 2007-02-26 16:44             ` Milton Miller
  -1 siblings, 0 replies; 56+ messages in thread
From: Milton Miller @ 2007-02-26 16:44 UTC (permalink / raw)
  To: David Woodhouse; +Cc: LKML, linuxppc-dev

On Feb 27, 2007, at 2:24 AM, David Woodhouse wrote:
> On Sun, 2007-02-25 at 20:13 -0800, Linus Torvalds wrote:
>> On Sun, 25 Feb 2007, David Woodhouse wrote:
>>>> Can you try adding something like
>>>>
>>>>         memset(start, 0xf0, end - start);
>>>
>>> Yeah, I did that before giving up on it for the day and going in 
>>> search
>>> of dinner. It changes the failure mode to a BUG() in
>>> cache_free_debugcheck(), at line 2876 of mm/slab.c
>>
>> Ok, that's just strange.
>
> In this case I hadn't left the 'return' in free_initrd_mem(). I was
> poisoning the pages and then returning them to the pool as usual.
>
> If I poison the pages and _don't_ return them to the pool, it boots
> fine. PageReserved is set on every page in the initrd region; total
> page_count() is equal to the number of pages (which doesn't
> _necessarily_ mean that page_count() for every page is equal to 1 but
> it's a strong hint that that's the case).
>
> Looking in /dev/mem after it boots, I see that my poison is still
> present throughout the whole region.
>
>> One obvious thing to do would be to remove all the "__initdata" 
>> entries in
>> mm/slab.c..
>
> This is biting us long before we call free_initmem().
>
>>  But I'd also like to see the full backtrace for the  BUG_ON(),
>> in case that gives any clues at all.
>
> I'll see if I can find a camera.
>
>>> It smells like the pages weren't actually reserved in the first place
>>> and we were blithely allocating them. The only problem with that 
>>> theory
>>> is that the initrd doesn't seem to be getting corrupted -- and if we
>>> were handing out its pages like that then surely _something_ would 
>>> have
>>> scribbled on it before we tried to read it.
>>
>> Yeah, I don't think it's necessarily initrd itself, I'd be more 
>> inclined
>> to think that the reason you see this change with the initrd 
>> unpacking is
>> simply that it does a lot of allocations for the initrd files, so I 
>> think
>> it is only indirectly involved - just because it ends up being a slab
>> user.
>
> Whatever happens, initrd as a 'slab user' is fine. The crashes happen
> _later_, when someone else is using the memory which used to belong to
> the initrd. In that 'BUG at slab.c:2876' I mentioned above, r3 was
> within the initrd region. As I said, I'll try to find a camera.


Just a thought,

Any chance you are using one of the unusual code paths, like the
bootloader moving the initrd or using a kernel-crash region?


milton


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26  0:00   ` David Woodhouse
@ 2007-02-26 19:27     ` john stultz
  -1 siblings, 0 replies; 56+ messages in thread
From: john stultz @ 2007-02-26 19:27 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linux Kernel Mailing List, torvalds, linuxppc-dev

On Sun, 2007-02-25 at 19:00 -0500, David Woodhouse wrote:
> On Mon, 2006-12-11 at 20:59 +0000, Linux Kernel Mailing List wrote:
> > 
> >     Make sure we populate the initroot filesystem late enough
>
> This seems to be what's triggering the apparent memory corruption we've
> been seeing recently -- in the case of the Fedora kernel it manifests
> itself as a BUG() in cache_alloc_refill() when the pmac ide driver
> initialises.
> 
> Another report was at http://lkml.org/lkml/2006/12/17/4
> 
> We've been seeing it on a Mac Mini too, and I managed to reproduce it on
> my shinybook this evening by booting with 'mem=512M'.

Just for reference (as it's not in the thread linked above), this issue
disappeared for me after some config changes (I somehow changed my
selection when I backtracked and then moved forward w/ git bisect).

I've not been able to reproduce it since, but I know others (BCC'ed on
this note) have seen it, and I might prod them to come forth with details
(and broken .config files).

thanks
-john



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26 15:51         ` Benjamin Herrenschmidt
@ 2007-02-26 20:51           ` Kumar Gala
  -1 siblings, 0 replies; 56+ messages in thread
From: Kumar Gala @ 2007-02-26 20:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Woodhouse, linuxppc-dev, Linus Torvalds,
	Linux Kernel Mailing List, john stultz


On Feb 26, 2007, at 9:51 AM, Benjamin Herrenschmidt wrote:

> On Sun, 2007-02-25 at 20:17 -0500, David Woodhouse wrote:
>> On Sun, 2007-02-25 at 16:24 -0800, Linus Torvalds wrote:
>>> Hmm. No, I don't think that should be a problem. free_initmem() only
>>> happens at the very end, after do_basic_setup() has been run, which
>>> includes all the initcall stuff.
>>
>> I'm inclined to agree that it _shouldn't_ be a problem. Nevertheless,
>> even this hack seems sufficient to 'fix' it:
>
> Could be a powerpc specific bug in initrd handling... I'm still
> traveling so I can't really look at it right now, but I wouldn't be
> surprised if some of that code did indeed bitrot.
>
> Ben.

Could there be some issue with initrd getting reserved properly via
prom_init.c? I know we make sure there are memreserves in the fdt
for initrd on embedded ppc.

- k

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26 16:44             ` Milton Miller
@ 2007-02-26 20:57               ` David Woodhouse
  -1 siblings, 0 replies; 56+ messages in thread
From: David Woodhouse @ 2007-02-26 20:57 UTC (permalink / raw)
  To: Milton Miller; +Cc: LKML, linuxppc-dev, torvalds

On Mon, 2007-02-26 at 10:44 -0600, Milton Miller wrote:
> Any chance you are using one of the unusual code paths, like the 
> bootloader moving the initrd or using a kernel crash region?

I'm doing nothing special. And I'm less sure now about the trigger. I
built a Fedora 7 test 2 install tree with the patch reverted, and
managed to boot and install.... but now when I boot the _same_ machine
with the same CD, it fails. 

Now I'm starting to wonder if it's something the firmware sets up to DMA
to a certain region of memory, which makes it non-deterministic. And the
other things we're blaming are only making a difference because they
change the layout of what we have in memory.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26 20:57               ` David Woodhouse
@ 2007-02-26 21:17                 ` Linus Torvalds
  -1 siblings, 0 replies; 56+ messages in thread
From: Linus Torvalds @ 2007-02-26 21:17 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Milton Miller, LKML, linuxppc-dev

On Mon, 26 Feb 2007, David Woodhouse wrote:
> 
> Now I'm starting to wonder if it's something the firmware sets up to DMA
> to a certain region of memory, which makes it non-deterministic. And the
> other things we're blaming are only making a difference because they
> change the layout of what we have in memory.

USB controller issues? We used to have these really hard-to-debug problems 
with the USB controller being active and having had the BIOS set up the 
command queues etc. Really subtle. It's why we now have PCI quirks for 
shutting up (most) USB controllers very early.

If there is some USB controller that we miss, or that sets up its command 
chain to some unexpected area (so that USB is active and corrupting memory 
even very early on), that could explain it.

		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26 19:27     ` john stultz
@ 2007-02-26 22:27       ` Paul TBBle Hampson
  -1 siblings, 0 replies; 56+ messages in thread
From: Paul TBBle Hampson @ 2007-02-26 22:27 UTC (permalink / raw)
  To: john stultz
  Cc: David Woodhouse, Linux Kernel Mailing List, torvalds, linuxppc-dev

On Mon, Feb 26, 2007 at 11:27:47AM -0800, john stultz wrote:
> On Sun, 2007-02-25 at 19:00 -0500, David Woodhouse wrote:
>> On Mon, 2006-12-11 at 20:59 +0000, Linux Kernel Mailing List wrote:
> >> 
> >>     Make sure we populate the initroot filesystem late enough

>> This seems to be what's triggering the apparent memory corruption we've
>> been seeing recently -- in the case of the Fedora kernel it manifests
>> itself as a BUG() in cache_alloc_refill() when the pmac ide driver
>> initialises.
>> 
>> Another report was at http://lkml.org/lkml/2006/12/17/4
>> 
>> We've been seeing it on a Mac Mini too, and I managed to reproduce it on
>> my shinybook this evening by booting with 'mem=512M'.

> Just for reference (as its not in the thread linked above), this issue
> disappeared for me after some config changes (I somehow changed my
> selection when I backtracked and then moved forward w/ git bisect).

> I've not been able to reproduce it since, but I know others (BCC'ed on
> this note) have seen it and might prod them to come forth with details
> (and broken .config files)

In my case, disabling CPU_FREQ_PMAC made the failure go away.
After reverting this patch, CPU_FREQ_PMAC is once again operating
successfully, so far.

-- 
-----------------------------------------------------------
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
On-hiatus Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
Paul.Hampson@Pobox.Com

Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
 -- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
-----------------------------------------------------------

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26 21:17                 ` Linus Torvalds
@ 2007-02-27  6:46                   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 56+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-27  6:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, linuxppc-dev, LKML, Milton Miller

> USB controller issues? We used to have these really hard-to-debug problems 
> with the USB controller being active and having had the BIOS set up the 
> command queues etc. Really subtle. It's why we now have PCI quirks for 
> shutting up (most) USB controllers very early.

On powermacs or powerbooks, the USB controller is shut down by the
firmware when we call the "quiesce" OF call from prom_init.c, which
happens before the kernel relocates itself to 0 and takes over memory.
Unless we fucked up something in there, I wouldn't expect that to be the
cause.

> If there is some USB controller that we miss, or that sets up its command 
> chain to some unexpected area (so that USB is active and corrupting memory 
> even very early on), that could explain it.

Had we set up the OHCI controller when the crash happened? Maybe we broke
something subtle in the USB stack?

Ben.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-26 22:27       ` Paul TBBle Hampson
@ 2007-02-27  6:48         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 56+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-27  6:48 UTC (permalink / raw)
  To: Paul TBBle Hampson
  Cc: john stultz, torvalds, David Woodhouse,
	Linux Kernel Mailing List, linuxppc-dev


> > I've not been able to reproduce it since, but I know others (BCC'ed on
> > this note) have seen it and might prod them to come forth with details
> > (and broken .config files)
> 
> In my case, disabling CPU_FREQ_PMAC made the failure go away.
> After reverting this patch, CPU_FREQ_PMAC is once again operating
> successfully, so far.

Hrm.. which cpufreq method is used on both your machines ? If it's the
one involving the PMU, it does involve a full hard reset of the
processor (with appropriate cache flushes etc...), maybe something's
going wrong in that area....

Ben.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-27  6:48         ` Benjamin Herrenschmidt
@ 2007-02-27 11:58           ` Segher Boessenkool
  -1 siblings, 0 replies; 56+ messages in thread
From: Segher Boessenkool @ 2007-02-27 11:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Woodhouse, torvalds, john stultz,
	Linux Kernel Mailing List, linuxppc-dev, Paul TBBle Hampson

>>> I've not been able to reproduce it since, but I know others (BCC'ed 
>>> on
>>> this note) have seen it and might prod them to come forth with 
>>> details
>>> (and broken .config files)
>>
>> In my case, disabling CPU_FREQ_PMAC made the failure go away.
>> After reverting this patch, CPU_FREQ_PMAC is once again operating
>> successfully, so far.
>
> Hrm.. which cpufreq method is used on both your machines ? If it's the
> one involving the PMU, it does involve a full hard reset of the
> processor (with appropriate cache flushes etc...), maybe something's
> going wrong in that area....

It's most likely a red herring; lots of config changes
make the bug go away on some kernel versions (but not
on others). The problem is very sensitive to changes in
memory layout.


Segher


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-27 11:58           ` Segher Boessenkool
@ 2007-02-28  6:43             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 56+ messages in thread
From: Benjamin Herrenschmidt @ 2007-02-28  6:43 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: David Woodhouse, torvalds, john stultz,
	Linux Kernel Mailing List, linuxppc-dev, Paul TBBle Hampson

> It's most likely a red herring, lots of config changes
> make the bug go away on some kernel versions (but not
> on others); the problem is very sensitive to changes in
> memory layout.

I wouldn't be that sure ... I've had problems in the past with PMU based
cpufreq... looks like flushing all caches and hard-resetting the
processor on the fly when there can be pending DMAs might be a source of
trouble... especially on CPUs that don't have working cache flush HW
assist.

Ben.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-28  6:43             ` Benjamin Herrenschmidt
@ 2007-02-28 10:13               ` David Woodhouse
  -1 siblings, 0 replies; 56+ messages in thread
From: David Woodhouse @ 2007-02-28 10:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Segher Boessenkool, torvalds, john stultz,
	Linux Kernel Mailing List, linuxppc-dev, Paul TBBle Hampson

On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:
> I wouldn't be that sure ... I've had problems in the past with PMU based
> cpufreq... looks like flushing all caches and hard-resetting the
> processor on the fly when there can be pending DMAs might be a source of
> trouble... especially on CPUs that don't have working cache flush HW
> assist. 

I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
They all fall over with the latest kernel, although the shinybook only
does so immediately when booted with mem=512M. The shinybook does crash
later with new kernels though; I don't yet know why. It could be the
same thing, or it could be something different. That one seemed to
appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where
we did nothing but turn CONFIG_SYSFS_DEPRECATED on.

I don't blame cpufreq. At various times I've been equally convinced that
it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-02-28 10:13               ` David Woodhouse
@ 2007-03-01  0:30                 ` Michael Ellerman
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael Ellerman @ 2007-03-01  0:30 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Benjamin Herrenschmidt, john stultz, Linux Kernel Mailing List,
	linuxppc-dev, Paul TBBle Hampson, torvalds

On Wed, 2007-02-28 at 10:13 +0000, David Woodhouse wrote:
> On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:
> > I wouldn't be that sure ... I've had problems in the past with PMU based
> > cpufreq... looks like flushing all caches and hard-resetting the
> > processor on the fly when there can be pending DMAs might be a source of
> > trouble... especially on CPUs that don't have working cache flush HW
> > assist. 
> 
> I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
> I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
> They all fall over with the latest kernel, although the shinybook only
> does so immediately when booted with mem=512M. The shinybook does crash
> later with new kernels though; I don't yet know why. It could be the
> same thing, or it could be something different. That one seemed to
> appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where
> we did nothing but turn CONFIG_SYSFS_DEPRECATED on.
> 
> I don't blame cpufreq. At various times I've been equally convinced that
> it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.

Is there any pattern to the way it dies? Or is it just randomly dying
somewhere depending on which config options you have enabled?

This is starting to sound reminiscent of a bug I chased for a while last
year on Power5, but didn't find. It was "fixed" on some machines by
disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options.
Unfortunately it magically stopped reproducing so I never caught it :/

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-03-01  0:30                 ` Michael Ellerman
@ 2007-03-12 23:01                   ` Paul TBBle Hampson
  -1 siblings, 0 replies; 56+ messages in thread
From: Paul TBBle Hampson @ 2007-03-12 23:01 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: David Woodhouse, Benjamin Herrenschmidt, john stultz,
	Linux Kernel Mailing List, linuxppc-dev, torvalds

On Thu, Mar 01, 2007 at 09:30:56AM +0900, Michael Ellerman wrote:
> On Wed, 2007-02-28 at 10:13 +0000, David Woodhouse wrote:
>> On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:
>>> I wouldn't be that sure ... I've had problems in the past with PMU based
>>> cpufreq... looks like flushing all caches and hard-resetting the
>>> processor on the fly when there can be pending DMAs might be a source of
>>> trouble... especially on CPUs that don't have working cache flush HW
>>> assist.
>> 
>> I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
>> I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
>> They all fall over with the latest kernel, although the shinybook only
>> does so immediately when booted with mem=512M. The shinybook does crash
>> later with new kernels though; I don't yet know why. It could be the
>> same thing, or it could be something different. That one seemed to
>> appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where
>> we did nothing but turn CONFIG_SYSFS_DEPRECATED on.
>> 
>> I don't blame cpufreq. At various times I've been equally convinced that
>> it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.

> Is there any pattern to the way it dies? Or is it just randomly dying
> somewhere depending on which config options you have enabled?

> This is starting to sound reminiscent of a bug I chased for a while last
> year on Power5, but didn't find. It was "fixed" on some machines by
> disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options.
> Unfortunately it magically stopped reproducing so I never caught it :/

Hmm. The crash came back after I booted into Mac OS X and back. It was however
a different crash, I believe it was coming from the USB modules (as it would
keep going when it happened, and get another crash, which tended to scroll away
too fast for me to capture) but I believe it was still getting down into the
slab code and actually dying there.

However, reverting the reversion of
8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying
the following patch:

diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
--- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c  2007-02-05 05:44:54.000000000 +1100
+++ linux-source-2.6.20/arch/powerpc/mm/init_32.c       2007-03-10 11:03:56.000000000 +1100
@@ -244,7 +244,8 @@
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
        if (start < end)
-               printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+               printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+       return;
        for (; start < end; start += PAGE_SIZE) {
                ClearPageReserved(virt_to_page(start));
                init_page_count(virt_to_page(start));

which if I recall correctly David Woodhouse posted to this thread,
seems to have fixed it.

I dunno if it's relevant, but my initrd.img is 13193315 bytes long,
(i.e. 99 bytes over 12884k) and the above logs:
"NOT Freeing initrd memory: 12888k freed"
which makes sense...
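
The two figures agree if the initrd region is rounded out to whole pages
before the size is printed. A minimal user-space sketch of that arithmetic,
assuming 4 KiB pages and a page-aligned start address (neither is stated
above). initrd_freed_kb() is a hypothetical helper for illustration, not a
kernel function; its final shift mirrors the (end - start) >> 10 in the
printk:

```c
#include <assert.h>

#define PAGE_SHIFT 12UL                 /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Round an address up to the next page boundary. */
static unsigned long page_align_up(unsigned long addr)
{
    return (addr + PAGE_SIZE - 1) & PAGE_MASK;
}

/*
 * Hypothetical helper: size in KiB that would be reported for an initrd
 * of `bytes` bytes loaded at a page-aligned address `start`, once the
 * end of the region has been rounded up to a page boundary.
 */
static unsigned long initrd_freed_kb(unsigned long start, unsigned long bytes)
{
    unsigned long end;

    assert((start & ~PAGE_MASK) == 0);  /* start must be page-aligned */
    end = page_align_up(start + bytes);
    return (end - start) >> 10;         /* same shift as the printk */
}
```

With the 13193315-byte image, 13193315 >> 10 is 12884 with 99 bytes left
over, and rounding up to 3222 whole pages gives 12888k, matching the log
line.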

I of course completely failed to think to check this with the crashing
kernel; if it seems relevant I can roll back to it and get the numbers.

-- 
-----------------------------------------------------------
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
On-hiatus Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
Paul.Hampson@Pobox.Com

Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
 -- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
-----------------------------------------------------------

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-03-12 23:01                   ` Paul TBBle Hampson
@ 2007-03-13  3:03                     ` Kumar Gala
  -1 siblings, 0 replies; 56+ messages in thread
From: Kumar Gala @ 2007-03-13  3:03 UTC (permalink / raw)
  To: Paul TBBle Hampson
  Cc: Michael Ellerman, Linux Kernel Mailing List, linuxppc-dev,
	john stultz, torvalds, David Woodhouse


On Mar 12, 2007, at 6:01 PM, Paul TBBle Hampson wrote:

> On Thu, Mar 01, 2007 at 09:30:56AM +0900, Michael Ellerman wrote:
>> On Wed, 2007-02-28 at 10:13 +0000, David Woodhouse wrote:
>>> On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:
>>>> I wouldn't be that sure ... I've had problems in the past with PMU based
>>>> cpufreq... looks like flushing all caches and hard-resetting the
>>>> processor on the fly when there can be pending DMAs might be a source of
>>>> trouble... especially on CPUs that don't have working cache flush HW
>>>> assist.
>>>
>>> I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
>>> I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
>>> They all fall over with the latest kernel, although the shinybook only
>>> does so immediately when booted with mem=512M. The shinybook does crash
>>> later with new kernels though; I don't yet know why. It could be the
>>> same thing, or it could be something different. That one seemed to
>>> appear between Fedora's 2.6.19-1.2913 and 2.6.19-1.2914 kernels, where
>>> we did nothing but turn CONFIG_SYSFS_DEPRECATED on.
>>>
>>> I don't blame cpufreq. At various times I've been equally convinced that
>>> it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.
>
>> Is there any pattern to the way it dies? Or is it just randomly dying
>> somewhere depending on which config options you have enabled?
>
>> This is starting to sound reminiscent of a bug I chased for a while last
>> year on Power5, but didn't find. It was "fixed" on some machines by
>> disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options.
>> Unfortunately it magically stopped reproducing so I never caught it :/
>
> Hmm. The crash came back after I booted into Mac OS X and back. It was however
> a different crash, I believe it was coming from the USB modules (as it would
> keep going when it happened, and get another crash, which tended to scroll away
> too fast for me to capture) but I believe it was still getting down into the
> slab code and actually dying there.
>
> However, reverting the reversion of
> 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying
> the following patch:
>
> diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
> --- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c  2007-02-05 05:44:54.000000000 +1100
> +++ linux-source-2.6.20/arch/powerpc/mm/init_32.c       2007-03-10 11:03:56.000000000 +1100
> @@ -244,7 +244,8 @@
>  void free_initrd_mem(unsigned long start, unsigned long end)
>  {
>         if (start < end)
> -               printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +               printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +       return;
>         for (; start < end; start += PAGE_SIZE) {
>                 ClearPageReserved(virt_to_page(start));
>                 init_page_count(virt_to_page(start));
>
> which if I recall correctly David Woodhouse posted to this thread,
> seems to have fixed it.
>
> I dunno if it's relevant, but my initrd.img is 13193315 bytes long,
> (ie 99 bytes over 12884k) and the above logs:
> "NOT Freeing initrd memory: 12888k freed"
> which makes sense...
>
> I of course completely failed to think to check this with the crashing
> kernel, if it seems relevant I can roll back to it and get the numbers.

Have you tried 2.6.20.2? There was a significant bug in get_order()
that was deemed to be causing these issues.
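
For context, get_order(size) is expected to return the smallest allocation
order whose block, PAGE_SIZE << order, holds at least size bytes. Callers
size page allocations from it, so a wrong result could plausibly show up as
memory corruption. A user-space sketch of that contract, assuming 4 KiB
pages. get_order_sketch() is an illustrative name; the loop is a simple
shift-and-count version, not the 2.6.20 code, and it does not reproduce the
bug itself:

```c
#include <assert.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Sketch of the get_order() contract: return the smallest order such
 * that (PAGE_SIZE << order) >= size. User-space illustration only.
 */
static int get_order_sketch(unsigned long size)
{
    int order = -1;

    assert(size > 0);                   /* a zero size is not handled here */
    size = (size - 1) >> (PAGE_SHIFT - 1);
    do {
        size >>= 1;
        order++;
    } while (size);
    return order;
}
```

For example, get_order_sketch(4096) is 0 and get_order_sketch(4097) is 1,
since one byte past a page needs a two-page block.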

- k

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-03-12 23:01                   ` Paul TBBle Hampson
@ 2007-03-13  7:03                     ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 56+ messages in thread
From: Benjamin Herrenschmidt @ 2007-03-13  7:03 UTC (permalink / raw)
  To: Paul TBBle Hampson
  Cc: Michael Ellerman, David Woodhouse, john stultz,
	Linux Kernel Mailing List, linuxppc-dev, torvalds


> Hmm. The crash came back after I booted into Mac OS X and back. It was however
> a different crash, I believe it was coming from the USB modules (as it would
> keep going when it happened, and get another crash, which tended to scroll away
> too fast for me to capture) but I believe it was still getting down into the
> slab code and actually dying there.

Have you tried, instead, to apply
38f3323037de22bb0089d08be27be01196e7148b ? (That is revert
39d61db0edb34d60b83c5e0d62d0e906578cc707).

I suspect this is the proper fix...

Ben.

> However, reverting the reversion of
> 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying
> the following patch:
> 
> diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
> --- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c  2007-02-05 05:44:54.000000000 +1100
> +++ linux-source-2.6.20/arch/powerpc/mm/init_32.c       2007-03-10 11:03:56.000000000 +1100
> @@ -244,7 +244,8 @@
>  void free_initrd_mem(unsigned long start, unsigned long end)
>  {
>         if (start < end)
> -               printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +               printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +       return;
>         for (; start < end; start += PAGE_SIZE) {
>                 ClearPageReserved(virt_to_page(start));
>                 init_page_count(virt_to_page(start));
> 
> which if I recall correctly David Woodhouse posted to this thread,
> seems to have fixed it.
> 
> I dunno if it's relevant, but my initrd.img is 13193315 bytes long,
> (ie 99 bytes over 12884k) and the above logs:
> "NOT Freeing initrd memory: 12888k freed"
> which makes sense...
> 
> I of course completely failed to think to check this with the crashing
> kernel, if it seems relevant I can roll back to it and get the numbers.
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
@ 2007-03-13  7:03                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 56+ messages in thread
From: Benjamin Herrenschmidt @ 2007-03-13  7:03 UTC (permalink / raw)
  To: Paul TBBle Hampson
  Cc: john stultz, Linux Kernel Mailing List, linuxppc-dev, torvalds,
	David Woodhouse


> Hmm. The crash came back after I booted into Mac OS X and back. It was however
> a different crash, I believe it was coming from the USB modules (as it would
> keep going when it happened, and get another crash, which tended to scroll away
> too fast for me to capture) but I believe it was still getting down into the
> slab code and actually dying there.

Have you tried, instead, to apply
38f3323037de22bb0089d08be27be01196e7148b ? (That is revert
39d61db0edb34d60b83c5e0d62d0e906578cc707).

I suspect this is the proper fix...

Ben.

> However, reverting the reversion of
> 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying
> the following patch:
> 
> diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
> --- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c  2007-02-05 05:44:54.000000000 +1100
> +++ linux-source-2.6.20/arch/powerpc/mm/init_32.c       2007-03-10 11:03:56.000000000 +1100
> @@ -244,7 +244,8 @@
>  void free_initrd_mem(unsigned long start, unsigned long end)
>  {
>         if (start < end)
> -               printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +               printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +       return;
>         for (; start < end; start += PAGE_SIZE) {
>                 ClearPageReserved(virt_to_page(start));
>                 init_page_count(virt_to_page(start));
> 
> which if I recall correctly David Woodhouse posted to this thread,
> seems to have fixed it.
> 
> I dunno if it's relevant, but my initrd.img is 13193315 bytes long,
> (ie 99 bytes over 12884k) and the above logs:
> "NOT Freeing initrd memory: 12888k freed"
> which makes sense...
> 
> I of course completely failed to think to check this with the crashing
> kernel, if it seems relevant I can roll back to it and get the numbers.
> 
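
The size figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, not kernel code: it assumes 4 KiB pages (typical for 32-bit PowerPC, but an assumption here) and assumes the freed region spans whole pages, so that a "(end - start) >> 10" style message reports the page-rounded size. The names SKETCH_PAGE_SIZE, round_up_to_page, reported_kib and bytes_past_page_boundary are hypothetical and exist only for this illustration.

```c
/* Assumption: 4 KiB pages. The real kernel uses PAGE_SIZE, which may differ. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round a byte count up to a whole number of pages. */
static unsigned long round_up_to_page(unsigned long bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}

/*
 * Size in KiB that a "(end - start) >> 10" message would show,
 * assuming the region covers the image rounded up to whole pages.
 */
static unsigned long reported_kib(unsigned long image_bytes)
{
    return round_up_to_page(image_bytes) >> 10;
}

/* How many bytes the image extends past the last page boundary below it. */
static unsigned long bytes_past_page_boundary(unsigned long image_bytes)
{
    return image_bytes % SKETCH_PAGE_SIZE;
}
```

With the 13193315-byte image mentioned above, bytes_past_page_boundary() gives 99 and reported_kib() gives 12888. Under these assumptions, the 99 bytes of overhang cost one extra 4 KiB page, which would account for the log showing 12888k rather than 12884k.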

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Make sure we populate the initroot filesystem late enough
  2007-03-13  7:03                     ` Benjamin Herrenschmidt
@ 2007-03-16  7:20                       ` Paul TBBle Hampson
  -1 siblings, 0 replies; 56+ messages in thread
From: Paul TBBle Hampson @ 2007-03-16  7:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Ellerman, David Woodhouse, john stultz,
	Linux Kernel Mailing List, linuxppc-dev, torvalds

On Tue, Mar 13, 2007 at 08:03:49AM +0100, Benjamin Herrenschmidt wrote:

>> Hmm. The crash came back after I booted into Mac OS X and back. It was however
>> a different crash, I believe it was coming from the USB modules (as it would
>> keep going when it happened, and get another crash, which tended to scroll away
>> too fast for me to capture) but I believe it was still getting down into the
>> slab code and actually dying there.

> Have you tried applying
> 38f3323037de22bb0089d08be27be01196e7148b instead? (That is, revert
> 39d61db0edb34d60b83c5e0d62d0e906578cc707.)

That's working fine at the moment, and has even survived a trip to Mac
OS X and back.

Thank you.

-- 
-----------------------------------------------------------
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
On-hiatus Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
Paul.Hampson@Pobox.Com

Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
 -- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
-----------------------------------------------------------

^ permalink raw reply	[flat|nested] 56+ messages in thread


