linux-kernel.vger.kernel.org archive mirror
* wierd failures from -mm1
@ 2006-04-07 18:05 Martin Bligh
       [not found] ` <1144433309.24221.7.camel@localhost.localdomain>
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Bligh @ 2006-04-07 18:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Andy Whitcroft

I hadn't mailed this out for a while because we weren't sure whether it was
-mm or a testing glitch, but there have been no -git releases, so Andy
reran -mm to double-check, and it still seems to be there. A subsequent
test of rc1 + cons patches didn't hit this ... I think -mm has issues ;-)

Look at the 2.6.17-rc1-mm1 column from: http://test.kernel.org/

Drilling down into the console logs:

http://test.kernel.org/abat/27597/debug/console.log
Hangs after testing NMI watchdog.
http://test.kernel.org/abat/27596/debug/console.log
Hangs after bringing up cpus.

http://test.kernel.org/abat/27598/debug/console.log
http://test.kernel.org/abat/27593/debug/console.log
Both fail with reiserfs fsck errors; at first sight they look like just
dirty root partitions, but I don't think they are.



Filesystem is clean
Failed to lock the process to fsck the mounted ro partition. Bad address.
fsck.reiserfs /dev/sda3 failed (status 0x8). Run manually!


Note that it's actually saying it's clean.


* Re: wierd failures from -mm1
       [not found] ` <1144433309.24221.7.camel@localhost.localdomain>
@ 2006-04-07 18:20   ` Martin Bligh
  2006-04-07 19:11     ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Bligh @ 2006-04-07 18:20 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andrew Morton, linux-kernel, Andy Whitcroft

Dave Hansen wrote:
> On Fri, 2006-04-07 at 11:05 -0700, Martin Bligh wrote:
> 
>>http://test.kernel.org/abat/27596/debug/console.log
>>Hangs after bringing up cpus. 
> 
> 
> See attached patch.  It fixes curly.

Splendid, thanks. This may well fix the first two ... I think the reiser
thing is likely still borked though.

M.

> -- Dave
> 
> 
> ------------------------------------------------------------------------
> 
> Subject: [PATCH 2.6.17-rc1-mm1] sched_domain-handle-kmalloc-failure-fix
> From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
> Date: Thu, 06 Apr 2006 15:58:47 -0400
> To: linux-kernel <linux-kernel@vger.kernel.org>
> CC: Andrew Morton <akpm@osdl.org>, Eric Whitney <eric.whitney@hp.com>
> 
> 
> [PATCH] sched_domain-handle-kmalloc-failure-fix
> 
> 2.6.17-rc1-mm1 hangs during boot on HP rx8620 and dl585 -- both 4 node
> NUMA platforms.  Problem is in build_sched_domains() setting up the
> sched_group_nodes[] lists, resulting from patch:
> sched_domain-handle-kmalloc-failure.patch
> 
> The referenced patch does not propagate the "next" pointer from the head
> of the list, resulting in a loop between the last 2 groups in the list.
> This causes a tight loop/hang in init_numa_sched_groups_power() because 
> 'sg->next' never == 'group_head' when you have > 2 nodes.
> 
> This patch seems to fix the problem.  
> 
> Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
> Index: linux-2.6.17-rc1-mm1/kernel/sched.c
> ===================================================================
> --- linux-2.6.17-rc1-mm1.orig/kernel/sched.c	2006-04-06 15:18:32.000000000 -0400
> +++ linux-2.6.17-rc1-mm1/kernel/sched.c	2006-04-06 15:20:49.000000000 -0400
> @@ -6360,7 +6360,7 @@ static int build_sched_domains(const cpu
>  			}
>  			sg->cpu_power = 0;
>  			sg->cpumask = tmp;
> -			sg->next = prev;
> +			sg->next = prev->next;
>  			cpus_or(covered, covered, tmp);
>  			prev->next = sg;
>  			prev = sg;
> 
> 
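
To illustrate the list bug described in the patch above, here is a minimal
user-space sketch in C. The names struct group, build_ring() and
ring_length() are hypothetical and do not come from kernel/sched.c; only the
three pointer assignments inside the loop mirror the hunk, and the
single-element starting ring is an assumption about the surrounding code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sched_group: an id and a next pointer. */
struct group {
	int id;
	struct group *next;
};

static struct group *new_group(int id)
{
	struct group *g = calloc(1, sizeof(*g));

	if (!g)
		abort();	/* keep the sketch simple: no error recovery */
	g->id = id;
	return g;
}

/*
 * Build a circular list of n groups (n >= 1) by appending after 'prev'.
 * fixed == 0 uses the buggy assignment (sg->next = prev);
 * fixed != 0 uses the patched one (sg->next = prev->next).
 * Nodes are never freed, which is acceptable for a short-lived demo.
 */
static struct group *build_ring(int n, int fixed)
{
	struct group *head = new_group(0);
	struct group *prev = head;
	int i;

	head->next = head;	/* assumed: the head starts as a ring of one */
	for (i = 1; i < n; i++) {
		struct group *sg = new_group(i);

		sg->next = fixed ? prev->next : prev;
		prev->next = sg;
		prev = sg;
	}
	return head;
}

/*
 * Follow sg->next until we are back at head. Returns the number of steps,
 * or -1 if head is not reached within 'limit' steps (the hang case).
 */
static int ring_length(const struct group *head, int limit)
{
	const struct group *sg = head;
	int steps = 0;

	do {
		sg = sg->next;
		if (++steps > limit)
			return -1;
	} while (sg != head);
	return steps;
}
```

With four groups, ring_length(build_ring(4, 1), 100) returns 4, while
ring_length(build_ring(4, 0), 100) returns -1 because the walk bounces
between the last two groups and never gets back to the head. With two groups
both variants return 2, which is consistent with the hang only showing up
with more than 2 nodes.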



* Re: wierd failures from -mm1
  2006-04-07 18:20   ` Martin Bligh
@ 2006-04-07 19:11     ` Andrew Morton
  2006-04-08 14:28       ` Martin J. Bligh
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2006-04-07 19:11 UTC (permalink / raw)
  To: Martin Bligh; +Cc: haveblue, linux-kernel, apw

Martin Bligh <mbligh@mbligh.org> wrote:
>
> Dave Hansen wrote:
>  > On Fri, 2006-04-07 at 11:05 -0700, Martin Bligh wrote:
>  > 
>  >>http://test.kernel.org/abat/27596/debug/console.log
>  >>Hangs after bringing up cpus. 
>  > 
>  > 
>  > See attached patch.  It fixes curly.
> 
>  Splendid -thanks. This may well fix the first two ... I think the reiser
>  thing is likely still borked though.

The reiserfsck problem looks like a failed mlockall.  Reverting
mm-posix-memory-lock.patch should fix it.
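
As a rough illustration of that failure mode: "Bad address" in the fsck
output is the standard error string for EFAULT, which suggests the locking
call returned with errno set to EFAULT. A minimal user-space sketch follows.
The helper names, the message wording, and the MCL_CURRENT | MCL_FUTURE flags
are assumptions for illustration and are not taken from reiserfsprogs.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Try to lock all current and future pages of this process into RAM.
 * Returns 0 on success, or the errno value reported by mlockall().
 */
static int lock_process_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return errno;
	return 0;
}

/* Format the result into buf, in the style of the fsck message. */
static void describe_lock_result(int err, char *buf, size_t len)
{
	if (err == 0)
		snprintf(buf, len, "Process memory locked.");
	else
		snprintf(buf, len, "Failed to lock the process: %s.",
			 strerror(err));
}
```

On a healthy kernel lock_process_memory() is expected to return 0, or an
error such as ENOMEM or EPERM when the caller's locked-memory limit is too
low. Passing EFAULT to describe_lock_result() produces the "Bad address"
wording seen in the console logs.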



* Re: wierd failures from -mm1
  2006-04-07 19:11     ` Andrew Morton
@ 2006-04-08 14:28       ` Martin J. Bligh
  0 siblings, 0 replies; 4+ messages in thread
From: Martin J. Bligh @ 2006-04-08 14:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: haveblue, linux-kernel, apw

Andrew Morton wrote:
> Martin Bligh <mbligh@mbligh.org> wrote:
> 
>>Dave Hansen wrote:
>> > On Fri, 2006-04-07 at 11:05 -0700, Martin Bligh wrote:
>> > 
>> >>http://test.kernel.org/abat/27596/debug/console.log
>> >>Hangs after bringing up cpus. 
>> > 
>> > 
>> > See attached patch.  It fixes curly.
>>
>> Splendid -thanks. This may well fix the first two ... I think the reiser
>> thing is likely still borked though.
> 
> 
> The reiserfsck problem looks like a failed mlockall.  Reverting
> mm-posix-memory-lock.patch should fix it.

Didn't manage to get that test kicked off before you released -mm2,
which seems to work fine (across the boxes that still work, at least).

M.

