From: Thomas Gleixner
To: peterz@infradead.org
Cc: Linus Torvalds, LKML, linux-arch, Paul McKenney,
 the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Daniel Bristot de Oliveira, Will Deacon, Andrew Morton,
 Linux-MM, Russell King, Linux ARM, Chris Zankel, Max Filippov,
 linux-xtensa@linux-xtensa.org, Jani Nikula, Joonas Lahtinen,
 Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
 Ard Biesheuvel, Herbert Xu, Vineet Gupta,
 "open list:SYNOPSYS ARC ARCHITECTURE", Arnd Bergmann, Guo Ren,
 linux-csky@vger.kernel.org, Michal Simek, Thomas Bogendoerfer,
 linux-mips@vger.kernel.org, Nick Hu, Greentime Hu, Vincent Chen,
 Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
 linuxppc-dev, "David S. Miller", linux-sparc
Subject: Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
Date: Wed, 23 Sep 2020 15:35:52 +0200
Message-ID: <87imc4aa4n.fsf@nanos.tec.linutronix.de>
In-Reply-To: <20200923084032.GU1362448@hirez.programming.kicks-ass.net>
References: <20200919091751.011116649@linutronix.de>
 <87mu1lc5mp.fsf@nanos.tec.linutronix.de>
 <87k0wode9a.fsf@nanos.tec.linutronix.de>
 <87eemwcpnq.fsf@nanos.tec.linutronix.de>
 <87a6xjd1dw.fsf@nanos.tec.linutronix.de>
 <87sgbbaq0y.fsf@nanos.tec.linutronix.de>
 <20200923084032.GU1362448@hirez.programming.kicks-ass.net>

On Wed, Sep 23 2020 at 10:40, peterz wrote:
> Right, so I'm concerned. migrate_disable() wrecks pretty much all
> Real-Time scheduler theory we have, and PREEMPT_RT bringing it in is
> somewhat ironic.

It's even more ironic that the approach of PREEMPT_RT has been
'pragmatic ignorance of theory' from the very beginning, and despite
violating all theories it still works. :)

> Yes, it allows breaking up non-preemptible regions of non-deterministic
> duration, and thereby both reduce and bound the scheduling latency, the
> cost for doing that is that the theory on CPU utilization/bandwidth go
> out the window.

I agree that the theory goes out of the window, but does it matter in
practice? I've yet to see a report of migrate-disable stacking being the
culprit of a missed deadline, and I surely have stared at lots of
reports in the past 10+ years.

> To easily see this consider an SMP system with a number of tasks equal
> to the number of CPUs. On a regular (preempt_disable) kernel we can
> always run each task, by virtue of always having an idle CPU to take the
> task.
>
> However, with migrate_disable() we can have each task preempted in a
> migrate_disable() region, worse we can stack them all on the _same_ CPU
> (super ridiculous odds, sure).
> And then we end up only able to run one
> task, with the rest of the CPUs picking their nose.
>
> The end result is that, like with unbounded latency, we will still miss
> our deadline, simply because we got starved for CPU.

I'm well aware of that.

> Now, while we could (with a _lot_ of work) rework the kernel to not rely
> on the implicit per-cpu ness of things like spinlock_t, the moment we
> bring in basic primitives that rely on migrate_disable() we're stuck
> with it.

Right, but we are stuck with per-CPUness, and disentangling that is just
infeasible IMO.

> The problem is; afaict there's been no research into this problem.

There is no research on a lot of things the kernel does :)

> There might be scheduling (read: balancing) schemes that can
> mitigate/solve this problem, or it might prove to be a 'hard' problem,
> I just don't know.

In practice, balancing surely can take into account the number of
preempted tasks which are in a migrate-disable section, which would be
just another measure to work around the fact that the kernel is not
adhering to the theories. It never did that, even w/o migrate disable.

> But once we start down this road, it's going to be hell to get rid of
> it.

Like most of the other things the kernel came up with to deal with the
oddities of modern hardware :)

> That's why I've been arguing (for many years) to strictly limit this to
> PREEMPT_RT and only as a stop-gap, not a fundamental primitive to build
> on.

I know, but short of rewriting the world, I'm not seeing the faintest
plan to remove the stop gap. :)

As we discussed not long ago, we have too many inconsistent preemption
models already. RT is adding yet another one, and that's worse than
introducing migrate disable as a generally available facility. IMO,
reaching a point of consistency where our different preemption models
(NONE, VOLUNTARY, PREEMPT, RT) build on each other is far more
important.
For !RT, migrate disable is far less of a danger than for RT kernels,
because the amount of code which will use it is rather limited compared
to the code which still disables preemption implicitly through spin and
rw locks.

On RT, converting these locks to 'sleepable spinlocks' is only possible
because RT forces almost everything into task context, and migrate
disable is just the obvious decomposition of preempt disable, which
implicitly disables migration. But that means that RT is by orders of
magnitude more prone to run into the scheduling trainwreck you are
worried about. It just refuses to do so, at least with real-world
workloads.

I'm surely in favour of having solid theories behind implementations,
but at some point you just have to bite the bullet and choose pragmatism
in order to make progress.

Proliferating inconsistency is not real progress, as it violates the
most fundamental engineering principles. That's by far more dangerous
than violating scheduling theories, which are built on perfect models
and therefore enforce violation by practical implementations anyway.

Thanks,

        tglx