Date: Mon, 8 Dec 2003 03:39:14 +1100
From: Anton Blanchard
To: Ingo Molnar
Cc: Zwane Mwaikambo, "Martin J. Bligh", linux-kernel
Subject: Re: [patch] sched-HT-2.6.0-test11-A5
Message-ID: <20031207163914.GB19412@krispykreme>
References: <1027750000.1069604762@[10.10.2.4]> <392900000.1070737269@[10.10.2.4]>
User-Agent: Mutt/1.5.4i

Hi,

> i've seen a similar crash once on a 2-way (4-way) HT box, so there some
> startup race going on most likely.

I'm seeing bootup crashes every now and then on a ppc64 box too.

A few other things I've noticed:

- nr_running looks to be wrong. On an idle machine just after booting:

  00:07:20 up 14 min, 3 users, load average: 8.00, 7.67, 4.95

  It's a 4-core, 8-thread machine, so perhaps we are counting idle threads.

- The printk had me confused; we are really mapping cpu2 onto cpu1's
  runqueue. Patch below.

- I tried the HT scheduler with NUMA enabled. Same machine, 4 cores and
  8 threads, each NUMA node has 2 cores, 4 threads. It's easy to end up
  in a sub-optimal state:

  Cpu0 :   0.0% user,  0.0% system,  0.0% nice, 100.0% idle,  0.0% IO-wait
  Cpu1 :   0.0% user,  0.0% system,  0.0% nice, 100.0% idle,  0.0% IO-wait
  Cpu2 : 100.0% user,  0.0% system,  0.0% nice,   0.0% idle,  0.0% IO-wait
  Cpu3 : 100.0% user,  0.0% system,  0.0% nice,   0.0% idle,  0.0% IO-wait
  Cpu4 : 100.0% user,  0.0% system,  0.0% nice,   0.0% idle,  0.0% IO-wait
  Cpu5 :   0.0% user,  0.0% system,  0.0% nice, 100.0% idle,  0.0% IO-wait
  Cpu6 : 100.0% user,  0.0% system,  0.0% nice,   0.0% idle,  0.0% IO-wait
  Cpu7 :   0.7% user,  0.7% system,  0.0% nice,  98.6% idle,  0.0% IO-wait

  cpu0/1 are an SMT pair, and cpus 0-3 are a NUMA node. As you can see,
  cpu0/1 is free while cpu2/3 is busy on both threads.

  So far we have noticed that nr_cpus_node should probably be
  nr_runqueues_node now; otherwise the inter-node balancing code could
  make bad decisions. However, in this case the imbalance is within the
  node, so I'm not sure why the cpu0/1 runqueue hasn't stolen a task
  from cpu2/3.

Anton

--- foo/kernel/sched.c.ff	2003-12-03 02:03:41.000000000 -0600
+++ foo/kernel/sched.c	2003-12-04 11:37:40.980022085 -0600
@@ -1452,7 +1452,7 @@
 	runqueue_t *rq2 = cpu_rq(cpu2);
 	int cpu2_idx_orig = cpu_idx(cpu2), cpu2_idx;
 
-	printk("mapping CPU#%d's runqueue to CPU#%d's runqueue.\n", cpu1, cpu2);
+	printk("mapping CPU#%d's runqueue to CPU#%d's runqueue.\n", cpu2, cpu1);
 	BUG_ON(rq1 == rq2 || rq2->nr_running || rq_idx(cpu1) != cpu1);
 	/*
 	 * At this point, we dont have anything in the runqueue yet. So,
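
For illustration only, a minimal userspace sketch of the load-average
arithmetic in the first observation above: assuming one shared runqueue per
physical core and that each SMT sibling's idle task is (incorrectly) counted
as runnable, an otherwise idle 4-core, 8-thread machine sums to exactly the
8.00 seen in the load average. The fake_rq struct and the
count_idle_in_nr_running flag below are hypothetical names for the sketch,
not the actual sched.c code.

#include <stdio.h>

#define NR_CORES	4
#define SMT_PER_CORE	2

struct fake_rq {
	unsigned long nr_running;	/* runnable tasks on this shared runqueue */
};

int main(void)
{
	struct fake_rq rq[NR_CORES];
	unsigned long total = 0;
	int count_idle_in_nr_running = 1;	/* the suspected accounting bug */
	int i;

	for (i = 0; i < NR_CORES; i++) {
		/* machine is idle: no real runnable tasks ... */
		rq[i].nr_running = 0;
		/* ... but each sibling's idle thread gets counted as runnable */
		if (count_idle_in_nr_running)
			rq[i].nr_running += SMT_PER_CORE;
		total += rq[i].nr_running;
	}

	/* prints 8, matching the observed "load average: 8.00" on an idle box */
	printf("global nr_running = %lu\n", total);
	return 0;
}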