Date: Wed, 11 Jan 2017 22:16:46 +0100
From: luca abeni
To: Juri Lelli
Cc: Daniel Bristot de Oliveira, linux-kernel@vger.kernel.org,
 Peter Zijlstra, Ingo Molnar, Claudio Scordino, Steven Rostedt,
 Tommaso Cucinotta
Subject: Re: [RFC v4 0/6] CPU reclaiming for SCHED_DEADLINE
Message-ID: <20170111221646.4da97555@sweethome>
In-Reply-To: <20170111150647.GK10415@e106622-lin>
References: <1483097591-3871-1-git-send-email-lucabe72@gmail.com>
 <6c4ab4ec-7164-fafe-5efc-990f3cf31269@redhat.com>
 <20170104131755.573651ca@sweethome>
 <20170111121951.GI10415@e106622-lin>
 <20170111133926.7ec0a5b0@luca>
 <20170111150647.GK10415@e106622-lin>

On Wed, 11 Jan 2017 15:06:47 +0000
Juri Lelli wrote:

> On 11/01/17 13:39, Luca Abeni wrote:
> > Hi Juri,
> > (I reply from my new email address)
> >
> > On Wed, 11 Jan 2017 12:19:51 +0000
> > Juri Lelli wrote:
> > [...]
> > > > > For example, with my taskset, with a hypothetical perfect
> > > > > balance of the whole runqueue, one possible scenario is:
> > > > >
> > > > > CPU     0 1 2 3
> > > > > # TASKS 3 3 3 2
> > > > >
> > > > > In this case, CPUs 0, 1 and 2 are at 100% local utilization.
> > > > > Thus, the current tasks on these CPUs will have their runtime
> > > > > decreased by GRUB. Meanwhile, the lucky tasks on CPU 3 get to
> > > > > use additional time that they "globally" do not have, because
> > > > > the system as a whole has a load higher than the 66.6...% of
> > > > > that local runqueue. In practice, part of the time taken from
> > > > > the tasks on CPUs [0-2] is being used by the tasks on CPU 3,
> > > > > until the next migration of any task, which changes which
> > > > > tasks are the lucky ones... but without any guarantee that
> > > > > every task will be a lucky one on every activation, which
> > > > > causes the problem.
> > > > >
> > > > > Does it make sense?
> > > >
> > > > Yes; but my impression is that gEDF will migrate tasks so that
> > > > the distribution of the reclaimed CPU bandwidth is almost
> > > > uniform... Instead, you saw huge differences in the
> > > > utilisations (and I do not think that "compressing" the
> > > > utilisations from 100% to 95% can decrease the utilisation of a
> > > > task from 33% to 25% / 26%... :)
> > >
> > > I tried to replicate Daniel's experiment, but I don't see such a
> > > skewed allocation. The tasks get a reasonably uniform bandwidth,
> > > and the trace looks fairly good as well (all processes get to run
> > > on the different processors at some time).
> >
> > With some effort, I replicated the issue noticed by Daniel... I
> > think it also depends on the CPU speed (and on good or bad luck :),
> > but the "unfair" CPU allocation can actually happen.
>
> Yeah, actual allocation in general varies. I guess the question is: do
> we care? We currently don't load balance considering utilizations,
> only dynamic deadlines matter.

Right...
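Just to make the arithmetic behind this explicit, here is a
back-of-the-envelope illustration (plain userspace C, not kernel code;
it assumes the idealized 3/3/3/2 placement above, tasks with nominal
utilization 1/3, ignores the global 95% limit, and only encodes the
rough rule that with purely per-runqueue reclaiming a task receives
about u_i / Uact of its local CPU):

/*
 * Back-of-the-envelope illustration: 11 tasks with nominal
 * utilization 1/3, placed 3/3/3/2 on 4 CPUs, with purely
 * per-runqueue reclaiming.
 */
#include <stdio.h>

int main(void)
{
	const double u = 1.0 / 3.0;           /* nominal task utilization */
	const int ntasks[4] = { 3, 3, 3, 2 }; /* tasks per CPU */
	int cpu;

	for (cpu = 0; cpu < 4; cpu++) {
		double uact = ntasks[cpu] * u; /* local active utilization */
		double received = u / uact;    /* reclaimed share per task */

		printf("CPU %d: Uact = %.2f, each task receives ~%.0f%%\n",
		       cpu, uact, received * 100.0);
	}
	return 0;
}

So, in this idealized picture, the two tasks on CPU 3 can stretch to
about 50% of a CPU while the others stay around their nominal 33%, and
which tasks end up being the "lucky" ones changes at every migration.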
But the problem is that, with the version of GRUB I proposed, this
unfairness can result in some tasks receiving less CPU time than the
guaranteed amount (because some other tasks receive much more). I
think there are at least two possible ways to fix this (without
changing the migration strategy), and I am working on them...
(hopefully, I'll post something next week)

> > > I was expecting that the task could consume 0.5 worth of bandwidth
> > > with the given global limit. Is the current behaviour intended?
> > >
> > > If we want to change this behaviour, maybe something like the
> > > following might work?
> > >
> > > delta_exec = (delta * to_ratio((1ULL << 20) -
> > >              rq->dl.non_deadline_bw, rq->dl.running_bw)) >> 20
> >
> > My current patch does
> >	(delta * rq->dl.running_bw * rq->dl.deadline_bw_inv) >> 20 >> 8;
> > where rq->dl.deadline_bw_inv has been set to
> >	to_ratio(global_rt_runtime(), global_rt_period()) >> 12;
> >
> > This seems to work fine, and should introduce less overhead than
> > to_ratio().
> >
>
> Sure, we don't want to do divisions if we can. Why the intermediate
> right shifts, though?

I wrote it like this as a reminder that the ">> 20" comes from how
to_ratio() computes the utilization, and the additional ">> 8" comes
from the fact that deadline_bw_inv is shifted left by 8 to avoid
losing precision (I used 8 instead of 20 so that the computation can
hopefully be performed on 32 bits... Of course I can revise this if
needed).

If needed, I can change the ">> 20 >> 8" into ">> 28", or remove the
">> 12" from the deadline_bw_inv computation (so that we can use
">> 40", or ">> 20 >> 20", in grub_reclaim()).


			Thanks,
				Luca
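P.S.: for reference, the rule above written out in context (just a
sketch using the field names from this thread; the signature of
grub_reclaim() and the exact code in the patches may differ):

/*
 * Sketch only (field names as used in this thread):
 *
 *   rq->dl.running_bw      - local active utilization Uact, in <<20
 *                            fixed point (as produced by to_ratio())
 *   rq->dl.deadline_bw_inv - 1 / Umax in <<8 fixed point, precomputed
 *                            once as
 *                            to_ratio(global_rt_runtime(),
 *                                     global_rt_period()) >> 12
 *
 * so delta is scaled by Uact / Umax without any division in the hot
 * path; ">> 20" removes the fixed point of running_bw and ">> 8" the
 * fixed point of deadline_bw_inv (equivalently, a single ">> 28").
 */
static u64 grub_reclaim(u64 delta, struct rq *rq)
{
	return (delta * rq->dl.running_bw * rq->dl.deadline_bw_inv)
			>> 20 >> 8;
}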