Subject: Re: [RFC patch 1/2] sched: dynamically adapt granularity with nr_running
From: Peter Zijlstra
To: Mike Galbraith
Cc: LKML, Linus Torvalds, Andrew Morton, Ingo Molnar, Thomas Gleixner, Tony Lindgren, Mike Galbraith, Steven Rostedt
Date: Mon, 13 Sep 2010 17:53:35 +0200
Message-ID: <1284393215.2275.383.camel@laptop>
In-Reply-To: <1284386179.10436.6.camel@marge.simson.net>

On Mon, 2010-09-13 at 15:56 +0200, Mike Galbraith wrote:
> > One option is to simply get rid of that stuff in check_preempt_tick()
> > and instead do a wakeup-preempt check on the leftmost task instead.
>
> That's what I wanted to boil it down to instead of putting the extra
> preempt check in, but it kills the longish slices of low load.  IIRC,
> when I tried that, it demolished throughput.

Hrm.. yes it would..

So the reason for all this:

	/*
	 * Ensure that a task that missed wakeup preemption by a
	 * narrow margin doesn't have to wait for a full slice.
	 * This also mitigates buddy induced latencies under load.
	 */

Is to avoid tasks getting too far ahead in virtual time due to buddies,
right?

Would something like the below work? Don't actually use delta_exec to
filter, but use wakeup_gran + min_gran on virtual time (much like Steve
suggested) and then verify using __sched_gran(). Or have I now totally
confused myself backwards?

 - delta_exec is walltime, and should thus be compared against a
   weighted unit like slice,
 - delta is a vruntime unit, and is thus weight free, hence we can use
   granularity/unweighted units.

---
 kernel/sched_fair.c |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9b5b4f8..7f418de 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -485,6 +485,16 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	return slice;
 }
 
+static u64 __sched_gran(unsigned long nr_running)
+{
+	unsigned long latency = sysctl_sched_latency;
+
+	if (nr_running >= nr_latency)
+		return sysctl_sched_min_granularity;
+
+	return latency / nr_running;
+}
+
 /*
  * We calculate the vruntime slice of a to be inserted task
  *
@@ -865,14 +875,16 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
-	if (delta_exec < sysctl_sched_min_granularity)
-		return;
-
 	if (cfs_rq->nr_running > 1) {
 		struct sched_entity *se = __pick_next_entity(cfs_rq);
 		s64 delta = curr->vruntime - se->vruntime;
+		u64 wakeup_gran = sysctl_sched_wakeup_granularity;
+		u64 min_gran = sysctl_sched_min_granularity;
+
+		if (delta < wakeup_gran + min_gran)
+			return;
 
-		if (delta > ideal_runtime)
+		if (delta > wakeup_gran + __sched_gran(cfs_rq->nr_running))
 			resched_task(rq_of(cfs_rq)->curr);
 	}
 }