Date: Tue, 7 Nov 2023 09:16:02 +0100
From: Peter Zijlstra
To: Daniel Bristot de Oliveira
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    linux-kernel@vger.kernel.org, Luca Abeni, Tommaso Cucinotta,
    Thomas Gleixner, Joel Fernandes, Vineeth Pillai, Shuah Khan, Phil Auld
Subject: Re: [PATCH v5 7/7] sched/fair: Fair server interface
Message-ID: <20231107081602.GP8262@noisy.programming.kicks-ass.net>
References: <26adad2378c8b15533e4f6216c2863341e587f57.1699095159.git.bristot@kernel.org>
 <20231106154042.GH3818@noisy.programming.kicks-ass.net>
 <9a7222ed-88f8-4a3f-9d83-09b7fb977c27@kernel.org>
In-Reply-To: <9a7222ed-88f8-4a3f-9d83-09b7fb977c27@kernel.org>

On Mon, Nov 06, 2023 at 05:29:49PM +0100, Daniel Bristot de Oliveira wrote:
> On 11/6/23 16:40, Peter Zijlstra wrote:
> > On Sat, Nov 04, 2023 at 11:59:24AM +0100, Daniel Bristot de Oliveira wrote:
> >> Add an interface for fair server setup on debugfs.
> >>
> >> Each rq have three files under /sys/kernel/debug/sched/rq/CPU{ID}:
> >>
> >> - fair_server_runtime: set runtime in ns
> >> - fair_server_period: set period in ns
> >> - fair_server_defer: on/off for the defer mechanism
> >>
> >
> > This then leaves /proc/sys/kernel/sched_rt_{period,runtime}_us to be the
> > total available bandwidth control, right?
>
> right, but thinking aloud... given that the per-cpu files are already
> allocating the bandwidth on the dl_rq, the spare time for fair scheduler
> is granted.
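
Right, and since those per-CPU knobs are just a runtime/period pair fed
to the dl_rq, the resulting utilization is simply runtime/period. As a
rough illustration only (this is not from the patch; the values are made
up, debugfs is assumed mounted at /sys/kernel/debug, and the file names
follow the changelog above), setting CPU0's server to 50ms every 100ms,
i.e. a utilization of 0.5, could look something like:

/*
 * Sketch: configure the CPU0 fair server from userspace.
 * Example values, in nanoseconds.
 */
#include <stdio.h>

static int write_knob(const char *path, long long ns)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%lld\n", ns);
	return fclose(f);
}

int main(void)
{
	/* write the period first; the kernel may check runtime <= period */
	write_knob("/sys/kernel/debug/sched/rq/CPU0/fair_server_period",
		   100000000LL);	/* 100ms period, in ns */
	write_knob("/sys/kernel/debug/sched/rq/CPU0/fair_server_runtime",
		   50000000LL);		/* 50ms runtime, in ns */
	return 0;
}

(fair_server_defer is left alone here; per the changelog it only toggles
the defer mechanism on/off.)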
>
> Still, we can have them there as a safeguard to not overloading the deadline
> scheduler... (thinking aloud 2) as long as global is a thing... as we get away
> from it, that global limitation will make less sense - still better to have a
> form of limitation so people are aware of bandwidth until there.

Yeah, so having a limit on the deadline thing seems prudent as a way to
model system overhead. I mean 100% sounds nice, but then all the models
also assume no interrupts, no scheduler or migration overhead etc.. So
setting a slightly lower max seems far more realistic to me.

That said, the period/bandwidth thing is now slightly odd, as we really
only care about the utilization. But whatever. One thing at a time.

> > But then shouldn't we also rip out the throttle thingy right quick?
> >
>
> I was thinking about moving the entire throttling machinery inside
> CONFIG_RT_GROUP_SCHED for now, because GROUP_SCHED depends on it, no?

Yes. Until we can delete all that code we'll have to keep some of that.

> With the next step on moving the dl server as the base for the
> hierarchical scheduling... That will rip out the
> CONFIG_RT_GROUP_SCHED... with a thing with a per-cpu interface.
>
> Does it make sense?

I'm still not sure how to deal with affinities and deadline servers for
RT. There's a bunch of issues and I think we've only got some of them
solved.

The semi-partitioned thing (someone was working on that, I think you
know the guy) solves DL 'entities' having affinities.

But the problem with FIFO is that FIFO tasks don't have inherent
bandwidth. This in turn means that any server for FIFO needs to be
minimally concurrent, otherwise you hand out bandwidth to lower priority
tasks that the higher priority task might want etc.. (Andersson's group
has papers here).

Specifically, imagine a server with U=1.5 and 3 tasks: a high prio task
that requires .8, a medium prio task that requires .6, and a low prio
task that soaks up whatever it can get its little grubby paws on.

Then with minimal concurrency this works out nicely: high gets .8, mid
gets .6, and low gets the remaining .1.

If OTOH you don't limit concurrency and let them all run concurrently,
you can end up with the situation where they each get .5. Which is
obviously fail. (There's a toy model of this arithmetic at the end of
this mail.)

Add affinities here though and you're up a creek: how do you distribute
utilization between the slices, what slices, etc.. You say give them a
per-cpu cgroup interface and have them configure it themselves, but
that's a god-awful thing to ask userspace to do.

Ideally, I'd delete all of FIFO; it's such a horrid trainwreck, a total
and abysmal failure of a model -- thank you POSIX :-(
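
For completeness, the arithmetic of the U=1.5 example as a toy model.
This is just the numbers, not scheduler code, and the "unrestricted"
case assumes the 1.5 of bandwidth ends up as three .5 per-CPU slices,
which is the failure mode described above:

/*
 * Toy model: three FIFO tasks behind one server with U=1.5.
 * high wants .8, mid wants .6, low soaks up whatever is left.
 */
#include <stdio.h>

int main(void)
{
	const double want[] = { 0.8, 0.6, 1.0 };	/* high, mid, low (greedy) */
	double left = 1.5, got;
	int i;

	/* minimally concurrent: hand out the budget in priority order */
	printf("minimal concurrency:\n");
	for (i = 0; i < 3; i++) {
		got = want[i] < left ? want[i] : left;
		left -= got;
		printf("  task %d gets %.1f\n", i, got);	/* .8, .6, .1 */
	}

	/* fully concurrent: each task on its own CPU, capped at .5 */
	printf("unrestricted concurrency:\n");
	for (i = 0; i < 3; i++) {
		got = want[i] < 0.5 ? want[i] : 0.5;
		printf("  task %d gets %.1f\n", i, got);	/* .5 each */
	}
	return 0;
}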