Date: Fri, 29 Jan 2016 22:21:13 +0100
From: Luca Abeni
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Juri Lelli
Subject: Re: [RFC 5/8] Track the "total rq utilisation" too
Message-ID: <20160129222113.4f0f40b5@luca-1225C>
In-Reply-To: <20160129150605.GA2906@worktop>
References: <1452785094-3086-1-git-send-email-luca.abeni@unitn.it>
	<1452785094-3086-6-git-send-email-luca.abeni@unitn.it>
	<20160114194837.GE6357@twins.programming.kicks-ass.net>
	<20160115075049.1e24bde8@luca-1225C>
	<20160115083359.GC3421@worktop>
	<20160115101511.726cfe29@luca-1225C>
	<20160129150605.GA2906@worktop>

Hi Peter,

On Fri, 29 Jan 2016 16:06:05 +0100 Peter Zijlstra wrote:

> On Fri, Jan 15, 2016 at 10:15:11AM +0100, Luca Abeni wrote:
> > There is also a newer paper, that will be published at ACM SAC 2016
> > (so, it is not available yet), but is based on this technical
> > report: http://arxiv.org/abs/1512.01984
> > This second paper describes some more complex algorithms (easily
> > implementable over this patchset) that are able to guarantee hard
> > schedulability for SCHED_DEADLINE tasks with reclaiming on SMP.
>
> So I finally got around to reading the relevant sections of that
> paper (5.1 and 5.2).
>
> The paper introduces two alternatives:
>
>  - parallel reclaim (5.1)
>  - sequential reclaim (5.2)
>
> The parent patch introduces the accounting required for sequential
> reclaiming IIUC.

The patches I posted implement something similar to sequential
reclaiming (they actually implement the algorithm described in the
RTLWS paper: http://disi.unitn.it/~abeni/reclaiming/rtlws14-grub.pdf).

Both the parallel and the sequential reclaiming algorithms can be
implemented on top of the RFC patches I posted: see
https://github.com/lucabe72/linux-reclaiming/commits/reclaiming-new-v3
starting from the commit "Move to the new M-GRUB definition (BCL
incarnation)", which implements sequential reclaiming; parallel
reclaiming is implemented in the following 2 commits. I did not post
these last patches because I feel they are too premature even for an
RFC :)

> Thinking about things however, I think I would prefer parallel
> reclaim over sequential reclaim. The problem I see with sequential
> reclaim is that under light load jobs might land on different CPUs
> and not benefit from reclaim (as much) since the 'spare' bandwidth
> is stuck on other CPUs.
>
> Now I suppose the exact conditions to hit that worst case might be
> quite hard to trigger, in which case it might just not matter in
> practical terms.
>
> But maybe I'm mistaken, the paper doesn't seem to compare the two
> approaches in this way.

The technical report does not present any comparison, but we have an
ACM SAC paper (still to be published) with some experiments comparing
the two algorithms. And you are right: parallel reclaiming seems to
work better.
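Just to make the difference between the two approaches more concrete,
the accounting done by the posted patches is, per runqueue, something
like this (only a simplified sketch; the function and field names
below are made up for illustration, the real code is in the patches):

/*
 * Sketch of the per-runqueue accounting: each dl_rq tracks the sum of
 * the bandwidths of the SCHED_DEADLINE tasks that are "active" on it.
 * Both helpers are called with the runqueue lock held (on wakeup, and
 * at the "0-lag" time after a task blocks).
 */
static inline void add_active_bw(struct dl_rq *dl_rq,
				 struct sched_dl_entity *dl_se)
{
	dl_rq->active_bw += dl_se->dl_bw;
}

static inline void sub_active_bw(struct dl_rq *dl_rq,
				 struct sched_dl_entity *dl_se)
{
	dl_rq->active_bw -= dl_se->dl_bw;
}

/*
 * GRUB reclaiming then changes the runtime accounting in
 * update_curr_dl() from
 *	dl_se->runtime -= delta_exec;
 * to something proportional to the active utilization, e.g.
 *	dl_se->runtime -= (dl_rq->active_bw * delta_exec) >> BW_SHIFT;
 * so a task can run longer when part of the runqueue bandwidth is not
 * being used.
 */

The important point (for the overhead discussion below) is that this
counter is only touched under the runqueue lock, so no cross-runqueue
synchronisation is needed.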
However, parallel reclaiming requires a "global" (per scheduling
domain) variable to keep track of the total active (or inactive)
utilization, and this variable is updated every time a task blocks or
unblocks (I sketch below, after my signature, what this could look
like). So I expected more overhead, or scalability issues... But in
the experiments I have not been able to measure this overhead (I
"only" tested on a dual Xeon, with 4 cores per CPU... Maybe more
runqueues/CPU cores are needed to see scalability issues?).

Also, the implementation of parallel reclaiming is approximate: for
example, when a task wakes up on a runqueue, the "global" active
utilization is updated... But before updating it we should properly
account the runtime of the SCHED_DEADLINE tasks executing on all the
other runqueues of the scheduling domain; instead, I only account the
runtime when a tick fires or when a task blocks. Still, my experiments
did not show any performance degradation because of this
approximation.

I got the impression that the "per-runqueue active utilization" can be
tracked with less overhead (and can be more useful: for example, to
drive frequency scaling), and this is why I posted this patchset as an
RFC... But if you think that a "global active utilization" (leading to
parallel reclaiming) can be more useful, I can reorder the patches and
post an RFC with a parallel reclaiming implementation.


			Thanks,
				Luca
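p.s.: in case it helps the discussion, the "global" accounting of the
parallel variant would look more or less like this (again, only a
sketch: the "active_bw" field and its lock do not exist with these
names, and the real code is in the git branch mentioned above):

/*
 * Sketch of the "parallel" variant: the active utilization is tracked
 * per scheduling domain (the root domain here), so every task
 * block/unblock touches a counter shared by all the CPUs of the
 * domain.
 */
static void add_domain_active_bw(struct root_domain *rd,
				 struct sched_dl_entity *dl_se)
{
	raw_spin_lock(&rd->active_bw_lock);
	rd->active_bw += dl_se->dl_bw;
	raw_spin_unlock(&rd->active_bw_lock);
}

static void sub_domain_active_bw(struct root_domain *rd,
				 struct sched_dl_entity *dl_se)
{
	raw_spin_lock(&rd->active_bw_lock);
	rd->active_bw -= dl_se->dl_bw;
	raw_spin_unlock(&rd->active_bw_lock);
}

/*
 * Strictly speaking, before changing rd->active_bw the runtime of the
 * SCHED_DEADLINE tasks currently executing on the other runqueues of
 * the domain should be updated; my implementation approximates this
 * by charging the runtime only when a tick fires or when a task
 * blocks.
 */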