linux-kernel.vger.kernel.org archive mirror
* [PATCH 2.6.17-rc1-mm1] sched_domain-handle-kmalloc-failure-fix
From: Lee Schermerhorn @ 2006-04-06 19:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Eric Whitney

[PATCH] sched_domain-handle-kmalloc-failure-fix

2.6.17-rc1-mm1 hangs during boot on HP rx8620 and dl585 -- both 4 node
NUMA platforms.  Problem is in build_sched_domains() setting up the
sched_group_nodes[] lists, resulting from patch:
sched_domain-handle-kmalloc-failure.patch

The referenced patch does not propagate the "next" pointer from the head
of the list, resulting in a loop between the last 2 groups in the list.
This causes a tight loop/hang in init_numa_sched_groups_power() because 
'sg->next' never == 'group_head' when you have > 2 nodes.

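This patch seems to fix the problem: each new group now takes over
'prev->next', which still points back at the head of the list, before
it is linked in after 'prev', so the list stays circular.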
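
The failure mode described above can be reproduced outside the kernel with a minimal sketch. Everything here is a hypothetical stand-in: 'struct group' replaces struct sched_group, build_ring() mimics only the linking step touched by this patch, and it assumes the head group starts out pointing at itself. The walk is bounded so that the broken list reports failure instead of hanging.

```c
#include <assert.h>

/* Hypothetical stand-in for struct sched_group; only the link matters. */
struct group {
	struct group *next;
};

/*
 * Link n groups (n >= 1) from a caller-supplied array into a list.
 * use_fix == 0 links with "sg->next = prev"       (line removed by the patch)
 * use_fix != 0 links with "sg->next = prev->next" (line added by the patch)
 * Returns the head of the list.
 */
static struct group *build_ring(struct group *g, int n, int use_fix)
{
	struct group *head = &g[0];
	struct group *prev = head;
	int i;

	assert(n >= 1);
	head->next = head;	/* assumption: head starts as a ring of one */

	for (i = 1; i < n; i++) {
		struct group *sg = &g[i];

		sg->next = use_fix ? prev->next : prev;
		prev->next = sg;
		prev = sg;
	}
	return head;
}

/*
 * Walk from head->next, as a "do { ... } while (sg != head)" style loop
 * would, and report whether the walk gets back to head within 'limit'
 * steps. Returns 1 if it does, 0 if it does not.
 */
static int returns_to_head(const struct group *head, int limit)
{
	const struct group *sg = head->next;
	int steps;

	for (steps = 0; steps < limit; steps++) {
		if (sg == head)
			return 1;
		sg = sg->next;
	}
	return 0;
}
```

With four groups, returns_to_head(build_ring(g, 4, 0), 16) gives 0 because the walk bounces between the last two groups, while the same call with use_fix set to 1 gives 1. With only two groups both variants get back to the head, which matches the hang showing up only with more than 2 nodes.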

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

Index: linux-2.6.17-rc1-mm1/kernel/sched.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/kernel/sched.c	2006-04-06 15:18:32.000000000 -0400
+++ linux-2.6.17-rc1-mm1/kernel/sched.c	2006-04-06 15:20:49.000000000 -0400
@@ -6360,7 +6360,7 @@ static int build_sched_domains(const cpu
 			}
 			sg->cpu_power = 0;
 			sg->cpumask = tmp;
-			sg->next = prev;
+			sg->next = prev->next;
 			cpus_or(covered, covered, tmp);
 			prev->next = sg;
 			prev = sg;




* Re: [PATCH 2.6.17-rc1-mm1] sched_domain-handle-kmalloc-failure-fix
From: Dave Hansen @ 2006-04-07 18:01 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: linux-kernel, Andrew Morton, Eric Whitney

On Thu, 2006-04-06 at 15:58 -0400, Lee Schermerhorn wrote:
> 2.6.17-rc1-mm1 hangs during boot on HP rx8620 and dl585 -- both 4 node
> NUMA platforms.  Problem is in build_sched_domains() setting up the
> sched_group_nodes[] lists, resulting from patch:
> sched_domain-handle-kmalloc-failure.patch
> 
> The referenced patch does not propagate the "next" pointer from the head
> of the list, resulting in a loop between the last 2 groups in the list.
> This causes a tight loop/hang in init_numa_sched_groups_power() because 
> 'sg->next' never == 'group_head' when you have > 2 nodes. 

Wow.  I'm incredibly impressed that you tracked that down.  I can't
believe how horribly unintelligible that code is.

I ran into the same freeze on a 4-node NUMA-Q.  Your patch fixed it.

Is there any good reason that sched domains has to roll its own linked
lists?  Why not use list_heads?  Seems like it would avoid crappy
problems like this.

-- Dave
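
For illustration of the list_head question above, here is a userspace sketch of an intrusive circular doubly linked list. It is only modeled on the kernel's struct list_head and list_add_tail(); the names ring_node, ring_init(), ring_add_tail(), ring_count(), group_of() and struct node_group are hypothetical and are not the kernel API or the sched domains code. The point is that one insertion helper owns all pointer updates, so callers never assign 'next' by hand.

```c
#include <stddef.h>

/* Userspace imitation of an intrusive circular list node. */
struct ring_node {
	struct ring_node *next, *prev;
};

/* Hypothetical group type that embeds the link instead of a bare 'next'. */
struct node_group {
	int id;
	struct ring_node link;
};

/* An empty ring is a head that points at itself in both directions. */
static void ring_init(struct ring_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'node' just before 'head', i.e. at the tail of the ring. */
static void ring_add_tail(struct ring_node *node, struct ring_node *head)
{
	struct ring_node *tail = head->prev;

	node->next = head;
	node->prev = tail;
	tail->next = node;
	head->prev = node;
}

/* Count entries by walking until the walk is back at the head. */
static int ring_count(const struct ring_node *head)
{
	const struct ring_node *p;
	int n = 0;

	for (p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Recover the containing group from its embedded link. */
static struct node_group *group_of(struct ring_node *node)
{
	return (struct node_group *)((char *)node -
				     offsetof(struct node_group, link));
}
```

One difference from the sched_group lists: this sketch uses a separate sentinel head, whereas there the head of the list is itself a group, so a real conversion would need more than a drop-in replacement.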


