Date: Tue, 7 Nov 2023 08:44:51 -0800
From: "Paul E. McKenney"
To: Segher Boessenkool
Cc: Peter Zijlstra, j.alglave@ucl.ac.uk, will@kernel.org,
 catalin.marinas@arm.com, linux@armlinux.org.uk, mpe@ellerman.id.au,
 npiggin@gmail.com, palmer@dabbelt.com, parri.andrea@gmail.com,
 linux-kernel@vger.kernel.org, linux-toolchains@vger.kernel.org,
 boqun.feng@gmail.com, davidtgoldblatt@gmail.com
Subject: Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion
Reply-To: paulmck@kernel.org
References: <20231105230859.GH8262@noisy.programming.kicks-ass.net>
 <20231107095745.GD19790@gate.crashing.org>
In-Reply-To: <20231107095745.GD19790@gate.crashing.org>
X-Mailing-List: linux-toolchains@vger.kernel.org

On Tue, Nov 07, 2023 at 03:57:45AM -0600, Segher Boessenkool wrote:
> On Mon, Nov 06, 2023 at 06:16:24PM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> > > Which is a contradiction if ever I saw one. It both claims this atrocity
> > > fixes our volatile_if() woes while at the same time saying we're
> > > unaffected because we don't use any of the C/C++ atomic batshit.
> >
> > I guess that my traditional reply would be that if you are properly
> > confused by all this, that just means that you were reading carefully.
>
> I'll put that in my quote box :-)

Please accept my condolences. ;-)

> > I am very much against incurring real overhead to solve an issue that is
> > an issue only in theory and not in practice. I wish I could confidently
> > say that my view will prevail, but...
>
> Given enough time most theory turns out to be practice.

Actually, in this field, it tends to be the other way around. After all,
if practice followed theory, we would have all abandoned locks for
non-blocking synchronization in the 1990s, and then switched to
transactional memory in the decade following.
> If what you
> are writing has a constrained scope, or a limited impact, or both, you
> can ignore this "we'll deal with it later if/when it shows up". But a
> compiler does not have that luxury at all: it has to make correct
> translations from source code to assembler code (or machine code
> directly, for some compilers), or refuse to compile something. Making
> an incorrect translation is not an option.

But in this case, it would be most excellent if compiler practice were
to follow theory. Because the theory of avoiding OOTA without emitting
extraneous instructions is quite simple: Avoid breaking semantic
dependencies.

Given that, as you say, incorrect translation is not an option, it
should not be hard to satisfy oneself that a correct compiler must avoid
breaking semantic dependencies, at least for compilers that do not
indulge in value speculation, which would be a very brave indulgence.
Or, alternatively, that any such breaking constitutes a compiler bug.

But when this approach was put forward about a decade ago, compiler
writers were quite resistant. Plus they argued that concurrency was a
niche use case, which might be a less convincing argument these days.

> > If this goes through and if developers see any overhead from relaxed
> > atomics in a situation that matters to them, they will reach for some
> > other tool. Inline assembly and volatile accesses, I suppose. Or the
> > traditional approach of a compiler flag.
>
> And I understand you want the standards to be more useful for the kernel
> concurrency model? Why, exactly? I can think of many reasons, but I'm
> a bit lost as to what motivates you here :-)

My hope is that a number of useful and efficient concurrent-code idioms
can be implemented with help from the compiler, in the Linux kernel and
elsewhere. Or failing that, at least with less resistance on the part
of the compiler. Here is an incomplete list:

1.  When it is necessary for fast-path code to load from a shared
    variable, it should be possible to use a single normal load
    instruction for this purpose. Our current conversation touches
    on this point.

2.  Address dependencies extending to loads do not generate OOTA, but
    are another example of this "just use a normal load" issue.
    Address dependencies need more careful handling because compilers
    can and do convert them to control dependencies. Which is why the
    Linux-kernel advice in rcu_dereference.rst is to avoid using
    integers for address dependencies, and most especially to avoid
    using booleans for this.

    But it would be nice to be able to tell the compiler that a given
    variable, function parameter, and/or return value carried a
    dependency. At a minimum, it would be nice if the compiler could
    complain if that dependency would be broken.

3.  It should be possible to implement 50-year-old concurrent
    algorithms straightforwardly (LIFO push stack being the poster
    boy here [1]). Right now, pointer provenance gets in the way
    of this. On the other hand, clang/LLVM does quite well without
    pointer provenance, so one has to wonder just how important
    pointer-provenance-based optimizations really are.

    Hans Boehm, were he looking over my shoulder, would add that
    there are a number of single-threaded algorithms that are made
    unnecessarily complex due to pointer provenance issues [2].
    Hans and I often come down on opposite sides of concurrency
    issues, so in cases like this one where we do agree, you just
    might want to pay attention. ;-)

    Anthony Williams goes further, arguing that pointers should at
    least sometimes just be bags of bits, that is, that the role of
    pointer provenance should be greatly reduced or even eliminated [3].

4.  Semantics of volatile.
    Perhaps the current state is the best that can be hoped for, but
    given that the current state is a few vague words in the standard
    in combination with the fact that C-language device drivers must
    be able to use volatile to reliably and concurrently access memory
    shared with device firmware, one would hope for better.

5.  UB on signed integer overflow. Right now, the kernel just forces
    wrapping, which works, so maybe we don't really care all that much.
    But at this point, it seems to me that it was a mistake for the
    language to have failed to provide a means of specifying signed
    integers that wrap (int_wrap?). (Yes, yes, you can get them by
    making an atomic signed int, but that is not exactly an ergonomic
    workaround.)

Is this really too much to ask? If so, why?

The Linux-kernel memory model and associated coding guidelines form a
way of achieving the items on this list with the current level of
cooperation from the compilers, or perhaps more accurately, current
level of lack of cooperation.

The thing is that C11 and C++11 did pretty much the bare minimum to
support concurrency. After all, when that effort started back in 2005,
concurrency really was a niche use case. It is only reasonable to expect
a few adjustments almost 20 years on.

Hey, you asked!!! ;-)

                                                        Thanx, Paul

[1] https://docs.google.com/document/d/12paeC4suYAmVZlQvqytiCjGmEeGdJn0X7oM0SRLMALk/edit
[2] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1726r5.pdf
[3] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2188r1.html
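
For concreteness, here is a minimal sketch of the kind of LIFO push stack
that item 3 has in mind, written with C11 atomics. It is an illustration
under assumptions, not code from the kernel or from the papers cited above:
the names struct node, struct lifo, lifo_push(), lifo_pop_all(), and
lifo_demo() are all made up for this example, and the pop-everything
operation is my choice of the simplest way to avoid the ABA problem that a
single-node pop would have. The comment in lifo_push() marks the spot where,
as I understand the provenance discussion, the trouble arises.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types, named only for this illustration. */
struct node {
	struct node *next;
	int value;
};

struct lifo {
	_Atomic(struct node *) top;
};

/* Push one node: snapshot top, link to it, then try to swing top. */
static void lifo_push(struct lifo *s, struct node *n)
{
	struct node *old = atomic_load_explicit(&s->top, memory_order_relaxed);

	do {
		/*
		 * If a concurrent pop-all has already freed the node that
		 * "old" points to, "old" is an invalid ("zapped") pointer
		 * here in the abstract machine, even though this code
		 * only stores and compares it and never dereferences it.
		 */
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&s->top, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Remove the whole list in one atomic exchange and return its head. */
static struct node *lifo_pop_all(struct lifo *s)
{
	return atomic_exchange_explicit(&s->top, (struct node *)NULL,
					memory_order_acquire);
}

/* Single-threaded smoke test: push 1, 2, 3 and read them back. */
static int lifo_demo(void)
{
	struct lifo s;
	struct node n[3] = { { NULL, 1 }, { NULL, 2 }, { NULL, 3 } };
	int digits = 0;

	atomic_init(&s.top, NULL);
	for (int i = 0; i < 3; i++)
		lifo_push(&s, &n[i]);
	/* Encode the pop order as decimal digits, most recent push first. */
	for (struct node *p = lifo_pop_all(&s); p; p = p->next)
		digits = digits * 10 + p->value;
	assert(atomic_load(&s.top) == NULL);
	return digits;
}
```

Calling lifo_demo() exercises the push and pop-all paths on one thread only,
so it checks the LIFO ordering but says nothing about the concurrent case
that the provenance concern is actually about.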