From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932991Ab3FQM07 (ORCPT );
	Mon, 17 Jun 2013 08:26:59 -0400
Received: from mail-la0-f41.google.com ([209.85.215.41]:63431 "EHLO
	mail-la0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932859Ab3FQM05 (ORCPT );
	Mon, 17 Jun 2013 08:26:57 -0400
MIME-Version: 1.0
In-Reply-To: <20130617092033.GL3204@twins.programming.kicks-ass.net>
References: <1370589652-24549-1-git-send-email-alex.shi@intel.com>
	<1370589652-24549-4-git-send-email-alex.shi@intel.com>
	<20130617092033.GL3204@twins.programming.kicks-ass.net>
Date: Mon, 17 Jun 2013 20:26:55 +0800
Message-ID:
Subject: Re: [patch v8 3/9] sched: set initial value of runnable avg for new forked task
From: Lei Wen
To: Peter Zijlstra
Cc: Alex Shi , mingo@redhat.com, tglx@linutronix.de,
	akpm@linux-foundation.org, bp@alien8.de, pjt@google.com,
	namhyung@kernel.org, efault@gmx.de, morten.rasmussen@arm.com,
	vincent.guittot@linaro.org, preeti@linux.vnet.ibm.com,
	viresh.kumar@linaro.org, linux-kernel@vger.kernel.org,
	mgorman@suse.de, riel@redhat.com, wangyun@linux.vnet.ibm.com,
	Jason Low , Changlong Xie , sgruszka@redhat.com, fweisbec@gmail.com
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter,

On Mon, Jun 17, 2013 at 5:20 PM, Peter Zijlstra wrote:
> On Fri, Jun 14, 2013 at 06:02:45PM +0800, Lei Wen wrote:
>> Hi Alex,
>>
>> On Fri, Jun 7, 2013 at 3:20 PM, Alex Shi wrote:
>> > We need initialize the se.avg.{decay_count, load_avg_contrib} for a
>> > new forked task.
>> > Otherwise random values of above variables cause mess when do new task
>> > enqueue:
>> >     enqueue_task_fair
>> >         enqueue_entity
>> >             enqueue_entity_load_avg
>> >
>> > and make forking balancing imbalance since incorrect load_avg_contrib.
>> >
>> > Further more, Morten Rasmussen notice some tasks were not launched at
>> > once after created. So Paul and Peter suggest giving a start value for
>> > new task runnable avg time same as sched_slice().
>>
>> I am confused at this comment, how set slice to runnable avg would change
>> the behavior of "some tasks were not launched at once after created"?
>>
>> IMHO, I could only tell that for the new forked task, it could be run if current
>> task already be set as need_resched, and preempt_schedule or
>> preempt_schedule_irq is called.
>>
>> Since the set slice to avg behavior would not affect this task's vruntime,
>> and hence cannot make current running task be need_sched, if
>> previously it cannot.
>
>
> So the 'problem' is that our running avg is a 'floating' average; ie. it
> decays with time. Now we have to guess about the future of our newly
> spawned task -- something that is nigh impossible seeing these CPU
> vendors keep refusing to implement the crystal ball instruction.

I am curious about this "crystal ball instruction" remark. :)
Could it be real? I mean, what kind of hw mechanism could achieve such
magic power? As I see it, silicon vendors could provide more monitoring
units, but I don't think hw has the power to precisely predict the sw's
behavior...

>
> So there's two asymptotic cases we want to deal well with; 1) the case
> where the newly spawned program will be 'nearly' idle for its lifetime;
> and 2) the case where its cpu-bound.
>
> Since we have to guess, we'll go for worst case and assume its
> cpu-bound; now we don't want to make the avg so heavy adjusting to the
> near-idle case takes forever. We want to be able to quickly adjust and
> lower our running avg.
>
> Now we also don't want to make our avg too light, such that it gets
> decremented just for the new task not having had a chance to run yet --
> even if when it would run, it would be more cpu-bound than not.
>
> So what we do is we make the initial avg of the same duration as that we
> guess it takes to run each task on the system at least once -- aka
> sched_slice().
>
> Of course we can defeat this with wakeup/fork bombs, but in the 'normal'
> case it should be good enough.
>
>
> Does that make sense?

Thanks for your detailed explanation. Very useful indeed! :)
BTW, I have no questions about the patch itself; I was just confused by
the patch's comment "some tasks were not launched at once after created".

Thanks,
Lei
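
The trade-off Peter describes (an initial average that is neither too
heavy nor too light) can be illustrated with a toy simulation. This is
only a sketch under stated assumptions, not the kernel's implementation:
it uses floating point rather than the kernel's fixed-point arithmetic,
it assumes a decay factor whose contributions halve every 32 periods,
and the names Y, step, runnable_ratio and periods_until_below are
hypothetical helpers invented for this illustration. The task is seeded
as fully cpu-bound (runnable sum equal to period sum), then left idle,
and we count how many periods pass before its runnable ratio drops
below a threshold.

```python
# Toy model of a decaying ("floating") runnable average.
# Assumption: contributions halve every 32 periods, so Y ** 32 == 0.5.
Y = 0.5 ** (1 / 32)


def step(runnable_sum, period_sum, runnable):
    """Advance one period: decay both sums, then add this period's share."""
    contribution = 1.0 if runnable else 0.0
    return runnable_sum * Y + contribution, period_sum * Y + 1.0


def runnable_ratio(runnable_sum, period_sum):
    """Fraction of (decayed) time the task was runnable."""
    return runnable_sum / period_sum


def periods_until_below(initial, threshold=0.5):
    """Seed the task as cpu-bound with `initial` periods of history,
    keep it idle, and count periods until its ratio falls below
    `threshold`."""
    if initial <= 0:
        raise ValueError("initial history must be positive")
    runnable_sum = period_sum = float(initial)
    periods = 0
    while runnable_ratio(runnable_sum, period_sum) >= threshold:
        runnable_sum, period_sum = step(runnable_sum, period_sum,
                                        runnable=False)
        periods += 1
    return periods


if __name__ == "__main__":
    # Hypothetical seeds: nearly empty, slice-sized, and near-saturated.
    for seed in (0.01, 6, 47):
        print(seed, periods_until_below(seed))
```

In this model a near-empty seed loses its "cpu-bound" guess after a
single idle period (too light), a near-saturated seed holds on to it for
more than 30 periods (too heavy), and a slice-sized seed of a few
periods lands in between, which is the behavior the explanation above
argues for.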