* [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
@ 2009-09-16  6:37 Pekka Enberg
  2009-09-16  6:55 ` David Rientjes
  2009-09-17 10:08 ` Mel Gorman
  0 siblings, 2 replies; 26+ messages in thread
From: Pekka Enberg @ 2009-09-16  6:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, cl, heiko.carstens, mingo, npiggin, sachinp

The SLQB allocator is known to be broken on certain PowerPC and S390
configurations. Disable the allocator in Kconfig for those architectures
until the issues are resolved.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
---
 init/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index c0d8a47..aaeddeb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1033,6 +1033,7 @@ config SLUB
 
 config SLQB
 	bool "SLQB (Queued allocator)"
+	depends on !PPC && !S390
 	help
 	  SLQB is a proposed new slab allocator.
 
-- 
1.5.6.3

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-16  6:37 [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390 Pekka Enberg
@ 2009-09-16  6:55 ` David Rientjes
  2009-09-16  7:01   ` Pekka Enberg
  2009-09-17 10:08 ` Mel Gorman
  1 sibling, 1 reply; 26+ messages in thread
From: David Rientjes @ 2009-09-16  6:55 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

On Wed, 16 Sep 2009, Pekka Enberg wrote:

> The SLQB allocator is known to be broken on certain PowerPC and S390
> configurations. Disable the allocator in Kconfig for those architectures
> until the issues are resolved.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Nick Piggin <npiggin@suse.de>
> Cc: Sachin Sant <sachinp@in.ibm.com>
> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
> ---
>  init/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index c0d8a47..aaeddeb 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1033,6 +1033,7 @@ config SLUB
>  
>  config SLQB
>  	bool "SLQB (Queued allocator)"
> +	depends on !PPC && !S390
>  	help
>  	  SLQB is a proposed new slab allocator.
>  

I think this should be (!PPC && !S390) || EXPERIMENTAL so that it can 
still be enabled for debugging and development.
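
For illustration only, a sketch of how that dependency might look,
assuming the rest of the SLQB entry stays exactly as in the patch quoted
above (the comment line is added for this example and is not part of the
proposal):

```kconfig
config SLQB
	bool "SLQB (Queued allocator)"
	# Hidden on PPC and S390 unless EXPERIMENTAL is also set.
	depends on (!PPC && !S390) || EXPERIMENTAL
	help
	  SLQB is a proposed new slab allocator.
```

With this form, a PPC or S390 config with EXPERIMENTAL=y would still be
offered SLQB; without EXPERIMENTAL the option would stay hidden there.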

Is this in preparation for slqb inclusion as a non-default slab allocator 
in 2.6.32?  2.6.33?

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-16  6:55 ` David Rientjes
@ 2009-09-16  7:01   ` Pekka Enberg
  2009-09-16  8:04     ` Heiko Carstens
  0 siblings, 1 reply; 26+ messages in thread
From: Pekka Enberg @ 2009-09-16  7:01 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

Hi David,

On Wed, 16 Sep 2009, Pekka Enberg wrote:
> > The SLQB allocator is known to be broken on certain PowerPC and S390
> > configurations. Disable the allocator in Kconfig for those architectures
> > until the issues are resolved.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Christoph Lameter <cl@linux-foundation.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Nick Piggin <npiggin@suse.de>
> > Cc: Sachin Sant <sachinp@in.ibm.com>
> > Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
> > ---
> >  init/Kconfig |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index c0d8a47..aaeddeb 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1033,6 +1033,7 @@ config SLUB
> >  
> >  config SLQB
> >  	bool "SLQB (Queued allocator)"
> > +	depends on !PPC && !S390
> >  	help
> >  	  SLQB is a proposed new slab allocator.
> >  

On Tue, 2009-09-15 at 23:55 -0700, David Rientjes wrote:
> I think this should be (!PPC && !S390) || EXPERIMENTAL so that it can 
> still be enabled for debugging and development.

Everybody enables EXPERIMENTAL so that seems pointless. Developers can
hack Kconfig locally, no?

On Tue, 2009-09-15 at 23:55 -0700, David Rientjes wrote:
> Is this in preparation for slqb inclusion as a non-default slab
> allocator in 2.6.32?  2.6.33?

Non-default for 2.6.32.

			Pekka


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-16  7:01   ` Pekka Enberg
@ 2009-09-16  8:04     ` Heiko Carstens
  0 siblings, 0 replies; 26+ messages in thread
From: Heiko Carstens @ 2009-09-16  8:04 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Rientjes, linux-kernel, akpm, cl, mingo, npiggin, sachinp

On Wed, Sep 16, 2009 at 10:01:08AM +0300, Pekka Enberg wrote:
> On Wed, 16 Sep 2009, Pekka Enberg wrote:
> > > The SLQB allocator is known to be broken on certain PowerPC and S390
> > > configurations. Disable the allocator in Kconfig for those architectures
> > > until the issues are resolved.

Looks ok to me. I'll debug s390 when time permits.

> > >  init/Kconfig |    1 +
> > >  1 files changed, 1 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index c0d8a47..aaeddeb 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1033,6 +1033,7 @@ config SLUB
> > >  
> > >  config SLQB
> > >  	bool "SLQB (Queued allocator)"
> > > +	depends on !PPC && !S390
> > >  	help
> > >  	  SLQB is a proposed new slab allocator.
> > >  
> 
> On Tue, 2009-09-15 at 23:55 -0700, David Rientjes wrote:
> > I think this should be (!PPC && !S390) || EXPERIMENTAL so that it can 
> > still be enabled for debugging and development.
> 
> Everybody enables EXPERIMENTAL so that seems pointless. Developers can
> hack Kconfig locally, no?

Exactly.

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-16  6:37 [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390 Pekka Enberg
  2009-09-16  6:55 ` David Rientjes
@ 2009-09-17 10:08 ` Mel Gorman
  2009-09-17 10:29   ` Pekka Enberg
  1 sibling, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2009-09-17 10:08 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

On Wed, Sep 16, 2009 at 09:37:39AM +0300, Pekka Enberg wrote:
> The SLQB allocator is known to be broken on certain PowerPC and S390
> configurations. Disable the allocator in Kconfig for those architectures
> until the issues are resolved.
> 

Can the issues be summarised?

The danger is that if SLQB is silently disabled, the breakage will never be
noticed or debugged :/

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Nick Piggin <npiggin@suse.de>
> Cc: Sachin Sant <sachinp@in.ibm.com>
> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
> ---
>  init/Kconfig |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index c0d8a47..aaeddeb 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1033,6 +1033,7 @@ config SLUB
>  
>  config SLQB
>  	bool "SLQB (Queued allocator)"
> +	depends on !PPC && !S390
>  	help
>  	  SLQB is a proposed new slab allocator.
>  
> -- 
> 1.5.6.3

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 10:08 ` Mel Gorman
@ 2009-09-17 10:29   ` Pekka Enberg
  2009-09-17 10:57     ` Mel Gorman
  0 siblings, 1 reply; 26+ messages in thread
From: Pekka Enberg @ 2009-09-17 10:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

Hi Mel,

On Wed, Sep 16, 2009 at 09:37:39AM +0300, Pekka Enberg wrote:
> > The SLQB allocator is known to be broken on certain PowerPC and S390
> > configurations. Disable the allocator in Kconfig for those architectures
> > until the issues are resolved. 
> 
> Can the issues be summarised?

It's a boot time crash during module load:

http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg33092.html

AFAICT, it's related to a memoryless node 0. Nick suggested it could be
a latent bug in the kernel that's triggered by SLQB.

On Thu, 2009-09-17 at 11:08 +0100, Mel Gorman wrote:
> The danger is if SLQB is being silently disabled, it'll never be noticed
> or debugged :/

Maybe, but that's not an excuse to push something that's known to break.

The other alternative is to skip this release cycle but I'm not sure
what we'd gain with that. Nick already stated in private that he'll try
to arrange for some time with ppc machines to debug the thing and we
hope to be able to fix it by 2.6.32 final.

Btw, the code is in slqb/core branch of slab.git in case someone wants
to take a stab at fixing the bug.

			Pekka


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 10:29   ` Pekka Enberg
@ 2009-09-17 10:57     ` Mel Gorman
  2009-09-17 11:13       ` Pekka Enberg
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Mel Gorman @ 2009-09-17 10:57 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

On Thu, Sep 17, 2009 at 01:29:24PM +0300, Pekka Enberg wrote:
> Hi Mel,
> 
> On Wed, Sep 16, 2009 at 09:37:39AM +0300, Pekka Enberg wrote:
> > > The SLQB allocator is known to be broken on certain PowerPC and S390
> > > configurations. Disable the allocator in Kconfig for those architectures
> > > until the issues are resolved. 
> > 
> > Can the issues be summarised?
> 
> It's a boot time crash during module load:
> 
> http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg33092.html
> 
> AFAICT, it's related to a memoryless node 0. Nick suggested it could be
> a latent bug in the kernel that's triggered by SLQB.
> 

The danger is that this isn't a PPC or s390 bug as such, but a bug that
triggers either when there are memoryless nodes or when node 0 is memoryless.
Hence, there is no guarantee that your Kconfig option will catch all instances
where this bug triggers.  Granted, the configuration is most likely a PPC
machine :)

> On Thu, 2009-09-17 at 11:08 +0100, Mel Gorman wrote:
> > The danger is if SLQB is being silently disabled, it'll never be noticed
> > or debugged :/
> 
> Maybe, but that's not an excuse to push something that's known to break.
> 

Wow, this is from back in May! Lame.

I'm against silently disabling it. Memoryless nodes are extremely rare but
bugs crop up there occasionally and take a long time to catch and squash. SLQB
breaking there is not going to cause widespread damage, but it will force a
fix to be developed by the people with access to the affected machines.

> The other alternative is to skip this release cycle but I'm not sure
> what we'd gain with that. Nick already stated in private that he'll try
> to arrange for some time with ppc machines to debug the thing and we
> hope to be able to fix it by 2.6.32 final.
> 

I have access to a ppc machine but not necessarily one with memoryless nodes
that can reproduce this problem.

Assuming Sachin is the reporter and we are in the same company, maybe I
have access to the machine. Sachin, can you mail me privately what this
machine is called, and let's see if I can get some time on it? By
any chance, was this bisected or did it just show up when SLQB became
the default?

Total aside, does anybody know offhand whether fake NUMA support allows the
creation of memoryless nodes to help reproduce problems like this? If I can't
get a real machine, that's the approach I'll be trying.

> Btw, the code is in slqb/core branch of slab.git in case someone wants
> to take a stab at fixing the bug.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 10:57     ` Mel Gorman
@ 2009-09-17 11:13       ` Pekka Enberg
  2009-09-17 11:18         ` Mel Gorman
  2009-09-17 11:23       ` Sachin Sant
  2009-09-17 12:12       ` Heiko Carstens
  2 siblings, 1 reply; 26+ messages in thread
From: Pekka Enberg @ 2009-09-17 11:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

Hi Mel,

On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> The danger is that this isn't a PPC or s390 bug then as such, but a bug where
> there are either memoryless nodes or when node 0 is memoryless.  Hence, there
> is no guarantee that your Kconfig option will catch all instances where this
> bug triggers.  Granted, the configuration is most likely a PPC machine :)

Yes, I suggested making SLQB depend on !NUMA to Nick but he didn't like
that as it's known to be good on x86 NUMA configs.
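
As a rough sketch of that alternative (an illustration assuming the same
SLQB entry as in the patch at the top of this thread, not something that
was posted or applied):

```kconfig
config SLQB
	bool "SLQB (Queued allocator)"
	# Illustrative alternative: exclude every NUMA build instead of
	# naming individual architectures.
	depends on !NUMA
	help
	  SLQB is a proposed new slab allocator.
```

That would also hide SLQB on x86 NUMA configs, which is the objection
mentioned above.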

On Thu, 2009-09-17 at 11:08 +0100, Mel Gorman wrote:
> > > The danger is if SLQB is being silently disabled, it'll never be noticed
> > > or debugged :/
> > 
> > Maybe, but that's not an excuse to push something that's known to break. 

On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> Wow, this is from back in May! Lame.

Heh, my (lame) excuse is lack of relevant hardware.... ;-)

On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> I'm against silently disabling it. Memoryless nodes are extremely rare but
> bugs crop up there occasionally and take a long time to catch and squash. SLQB
> breaking there is not going to cause widespread damage but force a fix to
> be developed by the people with access to the affected machines.

Hey, if someone sends me a fix for the bug well before the merge window
closes, that would be great! But there's no way we're adding new core
kernel code that's _known_ to break people's configs, at least not
through slab.git. If disabling SLQB is not acceptable and we're unable
to fix things, we'll just have to skip this release cycle.

On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> Total aside, does anybody know handily if fake NUMA support allows the
> creation of memoryless nodes help reproducing problems like this? If I can't
> get a real machine, that'll be the approach I'll be trying.

That would be useful, yes.

			Pekka


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:13       ` Pekka Enberg
@ 2009-09-17 11:18         ` Mel Gorman
  2009-09-17 11:23           ` Pekka Enberg
  2009-09-17 11:41           ` Nick Piggin
  0 siblings, 2 replies; 26+ messages in thread
From: Mel Gorman @ 2009-09-17 11:18 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

On Thu, Sep 17, 2009 at 02:13:39PM +0300, Pekka Enberg wrote:
> Hi Mel,
> 
> On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> > The danger is that this isn't a PPC or s390 bug then as such, but a bug where
> > there are either memoryless nodes or when node 0 is memoryless.  Hence, there
> > is no guarantee that your Kconfig option will catch all instances where this
> > bug triggers.  Granted, the configuration is most likely a PPC machine :)
> 
> Yes, I suggested making SLQB depend on !NUMA to Nick but he didn't like
> that as it's known to be good on x86 NUMA configs.
> 

Agreed.

> On Thu, 2009-09-17 at 11:08 +0100, Mel Gorman wrote:
> > > > The danger is if SLQB is being silently disabled, it'll never be noticed
> > > > or debugged :/
> > > 
> > > Maybe, but that's not an excuse to push something that's known to break. 
> 
> On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> > Wow, this is from back in May! Lame.
> 
> Heh, my (lame) excuse is lack of relevant hardware.... ;-)
> 

I'm not blaming you. It's just ... unfortunate :/

> On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> > I'm against silently disabling it. Memoryless nodes are extremely rare but
> > bugs crop up there occasionally and take a long time to catch and squash. SLQB
> > breaking there is not going to cause widespread damage but force a fix to
> > be developed by the people with access to the affected machines.
> 
> Hey, if someone sends me fix for the bug well before the merge window
> closes, that would be great! But there's no way we're adding new core
> kernel code that's _known_ to break peoples configs, at least not
> through slab.git. If disabling SLQB is not acceptable and we're unable
> to fix things, we'll just have to skip this release cycle.
> 

Please consider disabling it as an option in the rc3 stage or the like.
With luck, I'll find a suitable machine in time and see what can be
done. I just don't like the idea of x86 defaulting to one allocator and
ppc defaulting to another.

> On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> > Total aside, does anybody know handily if fake NUMA support allows the
> > creation of memoryless nodes help reproducing problems like this? If I can't
> > get a real machine, that'll be the approach I'll be trying.
> 
> That would be useful, yes.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:18         ` Mel Gorman
@ 2009-09-17 11:23           ` Pekka Enberg
  2009-09-17 11:41           ` Nick Piggin
  1 sibling, 0 replies; 26+ messages in thread
From: Pekka Enberg @ 2009-09-17 11:23 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin, sachinp

Hi Mel,

On Thu, 2009-09-17 at 12:18 +0100, Mel Gorman wrote:
> Please consider disabling it as an option in the rc3 stage or the like.
> With luck, I'll find a suitable machine in time and see what can be
> done. I just don't like the idea of x86 defaulting to one allocator and
> ppc defaulting to another.

SLQB is _not_ the default allocator in slqb/core and won't be one for
2.6.32 even if we do manage to sneak the allocator to linus.git.

			Pekka


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 10:57     ` Mel Gorman
  2009-09-17 11:13       ` Pekka Enberg
@ 2009-09-17 11:23       ` Sachin Sant
  2009-09-17 11:38         ` Nick Piggin
  2009-09-17 12:12       ` Heiko Carstens
  2 siblings, 1 reply; 26+ messages in thread
From: Sachin Sant @ 2009-09-17 11:23 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Pekka Enberg, linux-kernel, akpm, cl, heiko.carstens, mingo, npiggin

Mel Gorman wrote:
> I have access to a ppc machine but not necessarily one with a memoryless nodes
> that can reproduce this problem.
>
> Assuming Sachin is the reporter and we are in the same company, maybe I
> have access to the machine. Sachin, can you mail me privately what this
> machine is called and lets see can I get some time on that machine? By
> any chance, was this bisected or did it just show up when SLQB became
> the default?
>   
Mel,

I have sent you the access details for the machine. This bug showed
up when SLQB was enabled as the default in linux-next.

Thanks
-Sachin

> Total aside, does anybody know handily if fake NUMA support allows the
> creation of memoryless nodes help reproducing problems like this? If I can't
> get a real machine, that'll be the approach I'll be trying.
>
>   
>> Btw, the code is in slqb/core branch of slab.git in case someone wants
>> to take a stab at fixing the bug.
>>
>>     
>
>   


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:23       ` Sachin Sant
@ 2009-09-17 11:38         ` Nick Piggin
  2009-09-17 11:43           ` Pekka Enberg
  0 siblings, 1 reply; 26+ messages in thread
From: Nick Piggin @ 2009-09-17 11:38 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Mel Gorman, Pekka Enberg, linux-kernel, akpm, cl, heiko.carstens, mingo

On Thu, Sep 17, 2009 at 04:53:41PM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
> >I have access to a ppc machine but not necessarily one with a memoryless 
> >nodes
> >that can reproduce this problem.
> >
> >Assuming Sachin is the reporter and we are in the same company, maybe I
> >have access to the machine. Sachin, can you mail me privately what this
> >machine is called and lets see can I get some time on that machine? By
> >any chance, was this bisected or did it just show up when SLQB became
> >the default?
> >  
> Mel,
> 
> Have sent you the access details for the machine. This bug showed
> up when SLQB was enabled as default in linux-next

Maybe it will be better to hold off merging until this is
debugged then? If it is merged as a non-default option, then
I can't see it being a huge issue to merge it a bit after the
window? It doesn't touch anything else...




* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:18         ` Mel Gorman
  2009-09-17 11:23           ` Pekka Enberg
@ 2009-09-17 11:41           ` Nick Piggin
  2009-09-17 18:18             ` Mel Gorman
  1 sibling, 1 reply; 26+ messages in thread
From: Nick Piggin @ 2009-09-17 11:41 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Pekka Enberg, linux-kernel, akpm, cl, heiko.carstens, mingo, sachinp

On Thu, Sep 17, 2009 at 12:18:28PM +0100, Mel Gorman wrote:
> On Thu, Sep 17, 2009 at 02:13:39PM +0300, Pekka Enberg wrote:
> > On Thu, 2009-09-17 at 11:08 +0100, Mel Gorman wrote:
> > > > > The danger is if SLQB is being silently disabled, it'll never be noticed
> > > > > or debugged :/
> > > > 
> > > > Maybe, but that's not an excuse to push something that's known to break. 
> > 
> > On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> > > Wow, this is from back in May! Lame.
> > 
> > Heh, my (lame) excuse is lack of relevant hardware.... ;-)
> > 
> 
> I'm not blaming you. It's just ... unfortunate :/

Ahh... it's pretty lame of me. Sachin has been a willing tester :(
I have spent quite a few hours looking at it but I never found
many good leads. Much appreciated if you can make more progress on
it.


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:38         ` Nick Piggin
@ 2009-09-17 11:43           ` Pekka Enberg
  2009-09-17 11:52             ` Nick Piggin
  0 siblings, 1 reply; 26+ messages in thread
From: Pekka Enberg @ 2009-09-17 11:43 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Sachin Sant, Mel Gorman, linux-kernel, akpm, cl, heiko.carstens,
	mingo, Linus Torvalds

Hi Nick,

Mel Gorman wrote:
>> >I have access to a ppc machine but not necessarily one with a memoryless
>> >nodes
>> >that can reproduce this problem.
>> >
>> >Assuming Sachin is the reporter and we are in the same company, maybe I
>> >have access to the machine. Sachin, can you mail me privately what this
>> >machine is called and lets see can I get some time on that machine? By
>> >any chance, was this bisected or did it just show up when SLQB became
>> >the default?

On Thu, Sep 17, 2009 at 04:53:41PM +0530, Sachin Sant wrote:
>> Have sent you the access details for the machine. This bug showed
>> up when SLQB was enabled as default in linux-next

On Thu, Sep 17, 2009 at 2:38 PM, Nick Piggin <npiggin@suse.de> wrote:
> Maybe it will be better to hold off merging until this is
> debugged then? If it is merged as a non-default option, then
> I can't see it being a huge issue to merge it a bit after the
> window? It doesn't touch anything else...

Me sending anything but bug fixes to Linus after the merge window is
closed...? That's one scary thought!

But yeah, I can hold the pull request until the issue is resolved. If
Linus doesn't want to merge SLQB for 2.6.32, we'll just try again for
2.6.33.

                        Pekka

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:43           ` Pekka Enberg
@ 2009-09-17 11:52             ` Nick Piggin
  2009-09-17 11:55               ` Pekka Enberg
  0 siblings, 1 reply; 26+ messages in thread
From: Nick Piggin @ 2009-09-17 11:52 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Sachin Sant, Mel Gorman, linux-kernel, akpm, cl, heiko.carstens,
	mingo, Linus Torvalds

On Thu, Sep 17, 2009 at 02:43:38PM +0300, Pekka Enberg wrote:
> Hi Nick,
> 
> Mel Gorman wrote:
> >> >I have access to a ppc machine but not necessarily one with a memoryless
> >> >nodes
> >> >that can reproduce this problem.
> >> >
> >> >Assuming Sachin is the reporter and we are in the same company, maybe I
> >> >have access to the machine. Sachin, can you mail me privately what this
> >> >machine is called and lets see can I get some time on that machine? By
> >> >any chance, was this bisected or did it just show up when SLQB became
> >> >the default?
> 
> On Thu, Sep 17, 2009 at 04:53:41PM +0530, Sachin Sant wrote:
> >> Have sent you the access details for the machine. This bug showed
> >> up when SLQB was enabled as default in linux-next
> 
> On Thu, Sep 17, 2009 at 2:38 PM, Nick Piggin <npiggin@suse.de> wrote:
> > Maybe it will be better to hold off merging until this is
> > debugged then? If it is merged as a non-default option, then
> > I can't see it being a huge issue to merge it a bit after the
> > window? It doesn't touch anything else...
> 
> Me sending anything but bug fixes to Linus after the merge window is
> closed...? That's one scary thought!

Well, I don't know if it is much different than merging with known
bugs and expecting to resolve them. Maybe it circumvents the
_letter_ of the merge window law, but in spirit it is probably
nicer to merge after fixing the bug :)

But I'll let you decide how to proceed.
 

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:52             ` Nick Piggin
@ 2009-09-17 11:55               ` Pekka Enberg
  0 siblings, 0 replies; 26+ messages in thread
From: Pekka Enberg @ 2009-09-17 11:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Sachin Sant, Mel Gorman, linux-kernel, akpm, cl, heiko.carstens,
	mingo, Linus Torvalds

On Thu, 2009-09-17 at 13:52 +0200, Nick Piggin wrote:
> > Me sending anything but bug fixes to Linus after the merge window is
> > closed...? That's one scary thought!
> 
> Well, I don't know if it is much different than merging with known
> bugs and expecting to resolve them. Maybe it circumvents the
> _letter_ of the merge window law, but in spirit it is probably
> nicer to merge after fixing the bug :)
> 
> But I'll let you decide how to proceed.

I'll just wait for a fix to appear. Saves me from sleepless nights
worrying about Linus' revenge.

			Pekka


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 10:57     ` Mel Gorman
  2009-09-17 11:13       ` Pekka Enberg
  2009-09-17 11:23       ` Sachin Sant
@ 2009-09-17 12:12       ` Heiko Carstens
  2009-09-17 12:16         ` Pekka Enberg
  2 siblings, 1 reply; 26+ messages in thread
From: Heiko Carstens @ 2009-09-17 12:12 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Pekka Enberg, linux-kernel, akpm, cl, mingo, npiggin, sachinp

On Thu, Sep 17, 2009 at 11:57:08AM +0100, Mel Gorman wrote:
> On Thu, Sep 17, 2009 at 01:29:24PM +0300, Pekka Enberg wrote:
> > On Wed, Sep 16, 2009 at 09:37:39AM +0300, Pekka Enberg wrote:
> > > > The SLQB allocator is known to be broken on certain PowerPC and S390
> > > > configurations. Disable the allocator in Kconfig for those architectures
> > > > until the issues are resolved. 
> > > 
> > > Can the issues be summarised?
> > 
> > It's a boot time crash during module load:
> > 
> > http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg33092.html
> > 
> > AFAICT, it's related to a memoryless node 0. Nick suggested it could be
> > a latent bug in the kernel that's triggered by SLQB.
> 
> The danger is that this isn't a PPC or s390 bug then as such, but a bug where
> there are either memoryless nodes or when node 0 is memoryless.  Hence, there
> is no guarantee that your Kconfig option will catch all instances where this
> bug triggers.  Granted, the configuration is most likely a PPC machine :)

Ok, I just wanted to debug this on s390. But... the bug seems to have
disappeared.
I pulled in

git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6.git slqb/core

and tried defconfig (+SLQB) as well as allyesconfig (+SLQB). Both started and
didn't show the crash-before-console-is-active which went away when switching
to a different allocator.
So the s390 restriction seems to be resolved. Don't know why...

* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 12:12       ` Heiko Carstens
@ 2009-09-17 12:16         ` Pekka Enberg
  2009-09-17 12:21           ` Heiko Carstens
  0 siblings, 1 reply; 26+ messages in thread
From: Pekka Enberg @ 2009-09-17 12:16 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Mel Gorman, linux-kernel, akpm, cl, mingo, npiggin, sachinp

On Thu, 2009-09-17 at 14:12 +0200, Heiko Carstens wrote:
> Ok, I just wanted to debug this on s390. But... the bug seems to have
> disappeared.
> I pulled in
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6.git slqb/core
> 
> and tried defconfig (+SLQB) as well as allyesconfig (+SLQB). Both started and
> didn't show the crash-before-console-is-active which went away when switching
> to a different allocator.
> So the s390 restriction seems to be resolved. Don't know why...

IIRC, this is the only bug fix that was merged after your report:

http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=ff61c4950125b09b5e5a83d48a6c81827e9d67ab

			Pekka


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 12:16         ` Pekka Enberg
@ 2009-09-17 12:21           ` Heiko Carstens
  2009-09-17 12:36             ` Nick Piggin
  0 siblings, 1 reply; 26+ messages in thread
From: Heiko Carstens @ 2009-09-17 12:21 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Mel Gorman, linux-kernel, akpm, cl, mingo, npiggin, sachinp

On Thu, Sep 17, 2009 at 03:16:21PM +0300, Pekka Enberg wrote:
> On Thu, 2009-09-17 at 14:12 +0200, Heiko Carstens wrote:
> > Ok, I just wanted to debug this on s390. But... the bug seems to have
> > disappeared.
> > I pulled in
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6.git slqb/core
> > 
> > and tried defconfig (+SLQB) as well as allyesconfig (+SLQB). Both started and
> > didn't show the crash-before-console-is-active which went away when switching
> > to a different allocator.
> > So the s390 restriction seems to be resolved. Don't know why...
> 
> IIRC, this is the only bug fix that was merged after your report:
> 
> http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=ff61c4950125b09b5e5a83d48a6c81827e9d67ab

Still works even if I revert that patch. Probably something else interfered.


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 12:21           ` Heiko Carstens
@ 2009-09-17 12:36             ` Nick Piggin
  2009-09-17 12:42               ` Pekka Enberg
  0 siblings, 1 reply; 26+ messages in thread
From: Nick Piggin @ 2009-09-17 12:36 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Pekka Enberg, Mel Gorman, linux-kernel, akpm, cl, mingo, sachinp

On Thu, Sep 17, 2009 at 02:21:41PM +0200, Heiko Carstens wrote:
> On Thu, Sep 17, 2009 at 03:16:21PM +0300, Pekka Enberg wrote:
> > On Thu, 2009-09-17 at 14:12 +0200, Heiko Carstens wrote:
> > > Ok, I just wanted to debug this on s390. But... the bug seems to have
> > > disappeared.
> > > I pulled in
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6.git slqb/core
> > > 
> > > and tried defconfig (+SLQB) as well as allyesconfig (+SLQB). Both started and
> > > didn't show the crash-before-console-is-active which went away when switching
> > > to a different allocator.
> > > So the s390 restriction seems to be resolved. Don't know why...
> > 
> > IIRC, this is the only bug fix that was merged after your report:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=ff61c4950125b09b5e5a83d48a6c81827e9d67ab
> 
> Still works even if I revert that patch. Probably something else interfered.

Nasty. I don't suppose it would be too much work to try bisecting it?
Unfortunately with this kind of thing, bisecting doesn't always point
to anything meaningful anyway :(


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 12:36             ` Nick Piggin
@ 2009-09-17 12:42               ` Pekka Enberg
  0 siblings, 0 replies; 26+ messages in thread
From: Pekka Enberg @ 2009-09-17 12:42 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Heiko Carstens, Mel Gorman, linux-kernel, akpm, cl, mingo, sachinp

On Thu, Sep 17, 2009 at 3:36 PM, Nick Piggin <npiggin@suse.de> wrote:
> On Thu, Sep 17, 2009 at 02:21:41PM +0200, Heiko Carstens wrote:
>> On Thu, Sep 17, 2009 at 03:16:21PM +0300, Pekka Enberg wrote:
>> > On Thu, 2009-09-17 at 14:12 +0200, Heiko Carstens wrote:
>> > > Ok, I just wanted to debug this on s390. But... the bug seems to have
>> > > disappeared.
>> > > I pulled in
>> > >
>> > > git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6.git slqb/core
>> > >
>> > > and tried defconfig (+SLQB) as well as allyesconfig (+SLQB). Both started and
>> > > didn't show the crash-before-console-is-active which went away when switching
>> > > to a different allocator.
>> > > So the s390 restriction seems to be resolved. Don't know why...
>> >
>> > IIRC, this is the only bug fix that was merged after your report:
>> >
>> > http://git.kernel.org/?p=linux/kernel/git/penberg/slab-2.6.git;a=commitdiff;h=ff61c4950125b09b5e5a83d48a6c81827e9d67ab
>>
>> Still works even if I revert that patch. Probably something else interfered.
>
> Nasty. I don't suppose it would be too much work to try bisecting it?
> Unfortunately with this kind of thing, bisecting doesn't always point
> to anything meaningful anyway :(

Bisecting won't work here because I rebased the branch to fix merge
conflicts and move the Kconfig and Makefile changes at the end of the
series to make sure SLQB won't break "git bisect". Blah, I guess I
shouldn't have done that. :-(

                        Pekka


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 11:41           ` Nick Piggin
@ 2009-09-17 18:18             ` Mel Gorman
  2009-09-17 18:28               ` Nick Piggin
  0 siblings, 1 reply; 26+ messages in thread
From: Mel Gorman @ 2009-09-17 18:18 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pekka Enberg, linux-kernel, akpm, cl, heiko.carstens, mingo, sachinp

On Thu, Sep 17, 2009 at 01:41:16PM +0200, Nick Piggin wrote:
> On Thu, Sep 17, 2009 at 12:18:28PM +0100, Mel Gorman wrote:
> > On Thu, Sep 17, 2009 at 02:13:39PM +0300, Pekka Enberg wrote:
> > > On Thu, 2009-09-17 at 11:08 +0100, Mel Gorman wrote:
> > > > > > The danger is if SLQB is being silently disabled, it'll never be noticed
> > > > > > or debugged :/
> > > > > 
> > > > > Maybe, but that's not an excuse to push something that's known to break. 
> > > 
> > > On Thu, 2009-09-17 at 11:57 +0100, Mel Gorman wrote:
> > > > Wow, this is from back in May! Lame.
> > > 
> > > Heh, my (lame) excuse is lack of relevant hardware.... ;-)
> > > 
> > 
> > I'm not blaming you. It's just ... unfortunate :/
> 
> Ahh... it's pretty lame of me. Sachin has been a willing tester :(
> I have spent quite a few hours looking at it but I never found
> many good leads. Much appreciated if you can make more progress on
> it.

Nothing much so far. I've reproduced the problem based on 2.6.31 and slqb-core
from Pekka's tree but not a whole pile else. I don't know SLQB at all so the
investigation is fuzzy. It appears to initialise SLQB ok but crashes later when
setting up SCSI. Not 100% sure what the triggering event is but it might be
userspace starting up and other CPUs get involved, possibly corrupting lists.

This machine has two CPUs (0, 1) and two nodes with actual memory (2,3).
After applying a patch to kmem_cache_create, I see in the console

MEL::Creating cache pgd_cache CPU 0 Node 0
MEL::Creating cache pmd_cache CPU 0 Node 0
MEL::Creating cache pid_namespace CPU 0 Node 0
MEL::Creating cache shmem_inode_cache CPU 0 Node 0
MEL::Creating cache scsi_data_buffer CPU 1 Node 0

It crashes at this point during creation before the struct kmem_cache has
been allocated from kmem_cache_cache. Note it's kmem_cache_cache we are
failing to allocate from, not scsi_data_buffer.

I have no theories yet but will stick with it. Any suggestions on where
to investigate are welcome. Will pick this up again tomorrow.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 18:18             ` Mel Gorman
@ 2009-09-17 18:28               ` Nick Piggin
  2009-09-17 18:38                 ` Christoph Lameter
  2009-09-18 15:56                 ` Mel Gorman
  0 siblings, 2 replies; 26+ messages in thread
From: Nick Piggin @ 2009-09-17 18:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Pekka Enberg, linux-kernel, akpm, cl, heiko.carstens, mingo, sachinp

On Thu, Sep 17, 2009 at 07:18:32PM +0100, Mel Gorman wrote:
> > Ahh... it's pretty lame of me. Sachin has been a willing tester :(
> > I have spent quite a few hours looking at it but I never found
> > many good leads. Much appreciated if you can make more progress on
> > it.
> 
> Nothing much so far. I've reproduced the problem based on 2.6.31 and slqb-core
> from Pekka's tree but not a whole pile else. I don't know SLQB at all so the
> investigation is fuzzy. It appears to initialise SLQB ok but crashes later when
> setting up SCSI. Not 100% sure what the triggering event is but it might be
> userspace starting up and other CPUs get involved, possibly corrupting lists.
> 
> This machine has two CPUs (0, 1) and two nodes with actual memory (2,3).
> After applying a patch to kmem_cache_create, I see in the console
> 
> MEL::Creating cache pgd_cache CPU 0 Node 0
> MEL::Creating cache pmd_cache CPU 0 Node 0
> MEL::Creating cache pid_namespace CPU 0 Node 0
> MEL::Creating cache shmem_inode_cache CPU 0 Node 0
> MEL::Creating cache scsi_data_buffer CPU 1 Node 0
> 
> It crashes at this point during creation before the struct kmem_cache has
> been allocated from kmem_cache_cache. Note it's kmem_cache_cache we are
> failing to allocate from, not scsi_data_buffer.

Yes, it's crashing in kmem_cache_create, when trying to allocate from
kmem_cache_cache.

I didn't get much further. I had thought something must be NULL or
not set up correctly in kmem_cache_cache, but I didn't work out what.

If you can identify the precondition which causes the crash (or even
just have a static counter of the number of caches created, to trigger
at the crashing cache create), then perhaps you can dump some more
details of the kmem_cache_cache.
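
The static-counter idea above could look roughly like the following. This is a minimal userspace sketch, not kernel code: `struct cache_model`, `cache_create_hook()` and `TRIGGER_AT` are hypothetical names invented for illustration, and the real struct kmem_cache in mm/slqb.c has a different layout. In the kernel the output would go through printk() rather than printf().

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_NODES 4

/* Hypothetical stand-in for the few fields worth dumping; this is NOT
 * the layout of the real struct kmem_cache. */
struct cache_model {
	const char *name;
	void *node_slab[MODEL_NODES];	/* NULL if the node was never set up */
};

/* 1-based index of the cache creation to trap on. Assumed to have been
 * read off an earlier boot log (e.g. the last "Creating cache" line). */
#define TRIGGER_AT 5

/* Call once per cache creation. Returns 1 on the TRIGGER_AT-th call,
 * after dumping the bootstrap cache, and 0 on every other call. */
int cache_create_hook(const struct cache_model *bootstrap, const char *name)
{
	static int nr_created;	/* number of creations seen so far */
	int i;

	assert(bootstrap != NULL);

	if (++nr_created != TRIGGER_AT)
		return 0;

	printf("creating %s (#%d), dumping %s\n",
	       name, nr_created, bootstrap->name);
	for (i = 0; i < MODEL_NODES; i++)
		printf("  node_slab[%d] = %p\n", i, bootstrap->node_slab[i]);
	return 1;
}
```

Counting creations instead of matching on the cache name keeps the hook usable even if the crashing cache turns out to be a different one on another boot; only TRIGGER_AT needs adjusting.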

Thanks,
Nick


* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 18:28               ` Nick Piggin
@ 2009-09-17 18:38                 ` Christoph Lameter
  2009-09-17 18:51                   ` Nick Piggin
  2009-09-18 15:56                 ` Mel Gorman
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2009-09-17 18:38 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mel Gorman, Pekka Enberg, linux-kernel, akpm, heiko.carstens,
	mingo, sachinp

On Thu, 17 Sep 2009, Nick Piggin wrote:

> > This machine has two CPUs (0, 1) and two nodes with actual memory (2,3).
> > After applying a patch to kmem_cache_create, I see in the console
> >
> > MEL::Creating cache pgd_cache CPU 0 Node 0
> > MEL::Creating cache pmd_cache CPU 0 Node 0
> > MEL::Creating cache pid_namespace CPU 0 Node 0
> > MEL::Creating cache shmem_inode_cache CPU 0 Node 0
> > MEL::Creating cache scsi_data_buffer CPU 1 Node 0

So we have two nodes 2,3 nothing on node 0 and we are creating caches on
the node that does not exist? SLQB assumes node 0 is present?



* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 18:38                 ` Christoph Lameter
@ 2009-09-17 18:51                   ` Nick Piggin
  0 siblings, 0 replies; 26+ messages in thread
From: Nick Piggin @ 2009-09-17 18:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, linux-kernel, akpm, heiko.carstens,
	mingo, sachinp

On Thu, Sep 17, 2009 at 02:38:08PM -0400, Christoph Lameter wrote:
> On Thu, 17 Sep 2009, Nick Piggin wrote:
> 
> > > This machine has two CPUs (0, 1) and two nodes with actual memory (2,3).
> > > After applying a patch to kmem_cache_create, I see in the console
> > >
> > > MEL::Creating cache pgd_cache CPU 0 Node 0
> > > MEL::Creating cache pmd_cache CPU 0 Node 0
> > > MEL::Creating cache pid_namespace CPU 0 Node 0
> > > MEL::Creating cache shmem_inode_cache CPU 0 Node 0
> > > MEL::Creating cache scsi_data_buffer CPU 1 Node 0
> 
> So we have two nodes 2,3 nothing on node 0 and we are creating caches on
> the node that does not exist? SLQB assumes node 0 is present?

It might do somewhere, but it shouldn't. It will ask for node 0
by default presumably if CPU0 is on node 0, but it should be able
to fall back...
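
To make "should be able to fall back" concrete, here is a small userspace sketch of a fallback choice. It is an illustration only: `pick_node()` and the "nearest node id" policy are assumptions made for this example, not what SLQB or the page allocator actually does (the kernel orders fallback nodes by distance through zonelists).

```c
#include <assert.h>

#define MAX_NODES 4

/* Returns the preferred node if it has memory, otherwise the node with
 * memory whose id is closest to it, or -1 if no node has memory.
 * "Closest id" is a made-up policy used only for this sketch. */
int pick_node(const int has_memory[MAX_NODES], int preferred)
{
	int dist, nid;

	assert(preferred >= 0 && preferred < MAX_NODES);

	for (dist = 0; dist < MAX_NODES; dist++) {
		nid = preferred + dist;
		if (nid < MAX_NODES && has_memory[nid])
			return nid;
		nid = preferred - dist;
		if (nid >= 0 && has_memory[nid])
			return nid;
	}
	return -1;
}
```

With the layout from the report (memory only on nodes 2 and 3), a request that prefers node 0 would be redirected to node 2 under this policy instead of touching per-node data that was never initialised.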



* Re: [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390
  2009-09-17 18:28               ` Nick Piggin
  2009-09-17 18:38                 ` Christoph Lameter
@ 2009-09-18 15:56                 ` Mel Gorman
  1 sibling, 0 replies; 26+ messages in thread
From: Mel Gorman @ 2009-09-18 15:56 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pekka Enberg, linux-kernel, akpm, cl, heiko.carstens, mingo, sachinp

On Thu, Sep 17, 2009 at 08:28:42PM +0200, Nick Piggin wrote:
> On Thu, Sep 17, 2009 at 07:18:32PM +0100, Mel Gorman wrote:
> > > Ahh... it's pretty lame of me. Sachin has been a willing tester :(
> > > I have spent quite a few hours looking at it but I never found
> > > many good leads. Much appreciated if you can make more progress on
> > > it.
> > 
> > Nothing much so far. I've reproduced the problem based on 2.6.31 and slqb-core
> > from Pekka's tree but not a whole pile else. I don't know SLQB at all so the
> > investigation is fuzzy. It appears to initialise SLQB ok but crashes later when
> > setting up SCSI. Not 100% sure what the triggering event is but it might be
> > userspace starting up and other CPUs get involved, possibly corrupting lists.
> > 
> > This machine has two CPUs (0, 1) and two nodes with actual memory (2,3).
> > After applying a patch to kmem_cache_create, I see in the console
> > 
> > MEL::Creating cache pgd_cache CPU 0 Node 0
> > MEL::Creating cache pmd_cache CPU 0 Node 0
> > MEL::Creating cache pid_namespace CPU 0 Node 0
> > MEL::Creating cache shmem_inode_cache CPU 0 Node 0
> > MEL::Creating cache scsi_data_buffer CPU 1 Node 0
> > 
> > It crashes at this point during creation before the struct kmem_cache has
> > been allocated from kmem_cache_cache. Note it's kmem_cache_cache we are
> > failing to allocate from, not scsi_data_buffer.
> 
> Yes, it's crashing in kmem_cache_create, when trying to allocate from
> kmem_cache_cache.
> 
> I didn't get much further. I had thought something must be NULL or
> not set up correctly in kmem_cache_cache, but I didn't work out what.
> 

Somehow it's getting scrambled but I couldn't see anything wrong with
the locking. Weirdly, the following patch allows the kernel to boot much
further. Is it possible the DEFINE_PER_CPU() trick for defining per-node data
is being busted by recent per-cpu changes? Sorry that this is pretty crude,
it's my first proper reading of SLQB so it's slow going.

Although booting gets further with the following patch, it quickly hits
an OOM-storm until the machine dies with the vast majority of pages being
allocated by the slab allocator. There must be some flaw in the node fallback
logic that means that pages are allocated for every slab allocation.

diff --git a/mm/slqb.c b/mm/slqb.c
index 4ca85e2..4d72be2 100644
--- a/mm/slqb.c
+++ b/mm/slqb.c
@@ -1944,16 +1944,16 @@ static void init_kmem_cache_node(struct kmem_cache *s,
 static DEFINE_PER_CPU(struct kmem_cache_cpu, kmem_cache_cpus);
 #endif
 #ifdef CONFIG_NUMA
-/* XXX: really need a DEFINE_PER_NODE for per-node data, but this is better than
- * a static array */
-static DEFINE_PER_CPU(struct kmem_cache_node, kmem_cache_nodes);
+/* XXX: really need a DEFINE_PER_NODE for per-node data because a static
+ *      array is wasteful */
+static struct kmem_cache_node kmem_cache_nodes[MAX_NUMNODES];
 #endif
 
 #ifdef CONFIG_SMP
 static struct kmem_cache kmem_cpu_cache;
 static DEFINE_PER_CPU(struct kmem_cache_cpu, kmem_cpu_cpus);
 #ifdef CONFIG_NUMA
-static DEFINE_PER_CPU(struct kmem_cache_node, kmem_cpu_nodes); /* XXX per-nid */
+static struct kmem_cache_node kmem_cpu_nodes[MAX_NUMNODES]; /* XXX per-nid */
 #endif
 #endif
 
@@ -1962,7 +1962,7 @@ static struct kmem_cache kmem_node_cache;
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct kmem_cache_cpu, kmem_node_cpus);
 #endif
-static DEFINE_PER_CPU(struct kmem_cache_node, kmem_node_nodes); /*XXX per-nid */
+static struct kmem_cache_node kmem_node_nodes[MAX_NUMNODES]; /*XXX per-nid */
 #endif
 
 #ifdef CONFIG_SMP
@@ -2918,15 +2918,15 @@ void __init kmem_cache_init(void)
 	for_each_node_state(i, N_NORMAL_MEMORY) {
 		struct kmem_cache_node *n;
 
-		n = &per_cpu(kmem_cache_nodes, i);
+		n = &kmem_cache_nodes[i];
 		init_kmem_cache_node(&kmem_cache_cache, n);
 		kmem_cache_cache.node_slab[i] = n;
 #ifdef CONFIG_SMP
-		n = &per_cpu(kmem_cpu_nodes, i);
+		n = &kmem_cpu_nodes[i];
 		init_kmem_cache_node(&kmem_cpu_cache, n);
 		kmem_cpu_cache.node_slab[i] = n;
 #endif
-		n = &per_cpu(kmem_node_nodes, i);
+		n = &kmem_node_nodes[i];
 		init_kmem_cache_node(&kmem_node_cache, n);
 		kmem_node_cache.node_slab[i] = n;
 	}
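
One possible reading of why the patch above helps, offered as an assumption rather than something established in this thread: per_cpu(var, i) is indexed by CPU id, while the loop variable i here is a node id. On a machine with CPUs 0-1 but memory on nodes 2-3, the node ids do not correspond to any possible CPU, so the per-CPU lookup may land outside valid per-CPU storage. The userspace model below only illustrates that mismatch; `percpu_slot()`, `pernode_slot()` and `struct node_model` are hypothetical names, and the real per_cpu() macro performs no bounds check at all.

```c
#include <assert.h>
#include <stddef.h>

#define NR_POSSIBLE_CPUS 2	/* CPUs 0 and 1, as on the reported machine */
#define MAX_NODES        4	/* node ids 0..3; only 2 and 3 have memory */

/* Hypothetical stand-in for struct kmem_cache_node. */
struct node_model {
	int initialised;
};

static struct node_model percpu_area[NR_POSSIBLE_CPUS];	/* one copy per CPU */
static struct node_model pernode_area[MAX_NODES];	/* one slot per node id */

/* Model of per_cpu(var, idx): storage exists only for possible CPUs.
 * Returning NULL here stands in for "no valid storage"; the kernel
 * macro would instead compute an address from a bogus offset. */
struct node_model *percpu_slot(int idx)
{
	if (idx < 0 || idx >= NR_POSSIBLE_CPUS)
		return NULL;
	return &percpu_area[idx];
}

/* Model of the static array used by the patch: every node id below
 * MAX_NODES has a slot, whether or not a CPU with that id exists. */
struct node_model *pernode_slot(int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return NULL;
	return &pernode_area[nid];
}
```

In this model, looking up nodes 2 and 3 through percpu_slot() finds nothing, while pernode_slot() always has a slot for them, at the cost of the wasted array entries that the XXX comment in the patch mentions.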



Thread overview: 26+ messages
2009-09-16  6:37 [RFC/PATCH] SLQB: Mark the allocator as broken PowerPC and S390 Pekka Enberg
2009-09-16  6:55 ` David Rientjes
2009-09-16  7:01   ` Pekka Enberg
2009-09-16  8:04     ` Heiko Carstens
2009-09-17 10:08 ` Mel Gorman
2009-09-17 10:29   ` Pekka Enberg
2009-09-17 10:57     ` Mel Gorman
2009-09-17 11:13       ` Pekka Enberg
2009-09-17 11:18         ` Mel Gorman
2009-09-17 11:23           ` Pekka Enberg
2009-09-17 11:41           ` Nick Piggin
2009-09-17 18:18             ` Mel Gorman
2009-09-17 18:28               ` Nick Piggin
2009-09-17 18:38                 ` Christoph Lameter
2009-09-17 18:51                   ` Nick Piggin
2009-09-18 15:56                 ` Mel Gorman
2009-09-17 11:23       ` Sachin Sant
2009-09-17 11:38         ` Nick Piggin
2009-09-17 11:43           ` Pekka Enberg
2009-09-17 11:52             ` Nick Piggin
2009-09-17 11:55               ` Pekka Enberg
2009-09-17 12:12       ` Heiko Carstens
2009-09-17 12:16         ` Pekka Enberg
2009-09-17 12:21           ` Heiko Carstens
2009-09-17 12:36             ` Nick Piggin
2009-09-17 12:42               ` Pekka Enberg
