From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S263851AbUDFOxe (ORCPT ); Tue, 6 Apr 2004 10:53:34 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263853AbUDFOxe (ORCPT ); Tue, 6 Apr 2004 10:53:34 -0400 Received: from e32.co.us.ibm.com ([32.97.110.130]:5032 "EHLO e32.co.us.ibm.com") by vger.kernel.org with ESMTP id S263851AbUDFOx1 (ORCPT ); Tue, 6 Apr 2004 10:53:27 -0400 Date: Tue, 6 Apr 2004 20:23:48 +0530 From: Srivatsa Vaddagiri To: Ingo Molnar Cc: rusty@au1.ibm.com, nickpiggin@yahoo.com.au, akpm@osdl.org, linux-kernel@vger.kernel.org, lhcs-devel@lists.sourceforge.net Subject: Re: [Experimental CPU Hotplug PATCH] - Move migrate_all_tasks to CPU_DEAD handling Message-ID: <20040406145348.GA8516@in.ibm.com> Reply-To: vatsa@in.ibm.com References: <20040405121824.GA8497@in.ibm.com> <20040406072543.GA21626@elte.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040406072543.GA21626@elte.hu> User-Agent: Mutt/1.4.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 06, 2004 at 09:25:43AM +0200, Ingo Molnar wrote: > the question is, how much actual latency does the current 'freeze > everything' solution cause? We should prefer simplicity and debuggability > over cleverness of implementation - it's not like we'll have hotplug systems > on everyone's desk in the next year or so. > > also, even assuming a hotplug CPU system, CPU replacement events are not > that common, so the performance of the CPU-down op should not be a big > issue. The function depends on the # of tasks only linearly, and we have > tons of other code that is linear on the # of tasks - in fact we just > finished removing all the quadratic functions. Ingo, I obtained some latency measurements of migrate_all_tasks() on a 4-way 1.2 GHz Power4 PPC64 (p630) box. They are as below: Number of Tasks Cycles (get_cycles) spent in migrate_all_tasks (ms) =========================================================================== 10536 803244 (5.3 ms) 30072 2587940 (17 ms) Extending this to 100000 tasks makes the stoppage time to be for 8 million cycles (~50 ms). My main concern of stopping the machine for so much time was not performance, rather the effect it may have on functioning of the system. The fact that we freeze the machine for (possibly) tons of cycles doing nothing but migration made me uncomfortable. _and_ the fact that it can very well be avoided :) Can we rule out any side effects because of this stoppage? Watchdog timers, cluster heartbeats, jiffies, ..? Not sure .. It just felt much more "safe" and efficient to delegate migration to more safer time in CPU_DEAD notification, when rest of the machine is running. Plus this avoids the cpu_is_offline check in the more hotter path (load_balance/try_to_wake_up)!! -- Thanks and Regards, Srivatsa Vaddagiri, Linux Technology Center, IBM Software Labs, Bangalore, INDIA - 560017