From: Uros Bizjak
Date: Wed, 18 Oct 2023 23:09:10 +0200
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
To: Linus Torvalds
Cc: Nadav Amit, "the arch/x86 maintainers", Linux Kernel Mailing List,
    Andy Lutomirski, Brian Gerst, Denys Vlasenko, "H. Peter Anvin",
    Peter Zijlstra, Thomas Gleixner, Josh Poimboeuf, Nick Desaulniers

On Wed, Oct 18, 2023 at 10:51 PM Uros Bizjak wrote:
>
> On Wed, Oct 18, 2023 at 10:34 PM Linus Torvalds wrote:
> >
> > On Wed, 18 Oct 2023 at 13:22, Linus Torvalds wrote:
> > >
> > > And yes, sometimes we use actual volatile accesses for them
> > > (READ_ONCE() and WRITE_ONCE()) but those are *horrendous* in general,
> > > and are much too strict. Not only does gcc generally lose its mind
> > > when it sees volatile (ie it stops doing various sane combinations
> > > that would actually be perfectly valid), but it obviously also stops
> > > doing CSE on the loads (as it has to).
> >
> > Note, in case you wonder what I mean by "lose its mind", try this
> > (extremely stupid) test program:
> >
> >     void a(volatile int *i) { ++*i; }
> >     void b(int *i) { ++*i; }
> >
> > and note that the non-volatile version does
> >
> >     addl $1, (%rdi)
> >
> > but the volatile version then refuses to combine the read+write into a
> > rmw instruction, and generates
> >
> >     movl (%rdi), %eax
> >     addl $1, %eax
> >     movl %eax, (%rdi)
> >
> > instead.
> >
> > Sure, it's correct, but it's an example of how 'volatile' ends up
> > disabling a lot of other optimizations than just the "don't remove the
> > access".
> >
> > Doing the volatile as one rmw instruction would still have been very
> > obviously valid - it's still doing a read and a write. You don't need
> > two instructions for that.
>
> FYI: This is the reason RMW instructions in percpu.h are not (blindly)
> converted to C ops. They will remain in their (volatile or not) asm
> form because of the above reason, and due to the fact that they don't
> combine with anything.
>
> > I'm not complaining, and I understand *why* it happens - compiler
> > writers very understandably go "oh, I'm not touching that".
> >
> > I'm just trying to point out that volatile really screws up code
> > generation even aside from the "access _exactly_ once" issue.
> >
> > So using inline asm and relying on gcc doing (minimal) CSE will then
> > generate better code than volatile ever could, even when we just use a
> > simple 'mov" instruction. At least you get that basic combining
> > effect, even if it's not great.
>
> Actually, RMW insns are better written in asm, while simple "mov"
> should be converted to (volatile or not) memory access. On x86 "mov"s
> from memory (reads) will combine nicely with almost all other
> instructions.

BTW: There was a discussion that GCC should construct RMW instructions
also when the memory location is marked volatile, but there was no
resolution reached. So, the "I'm not touching that" approach remains.

However, GCC *will* combine a volatile read with a follow-up instruction.

Uros.
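
For anyone who wants to check the last point, here is a minimal sketch of the three cases discussed in this thread. The function names are hypothetical and chosen only for illustration; nothing here is taken from percpu.h. The comments describe the code generation claimed above rather than guaranteed output, which can differ between compiler versions and options, so it is worth inspecting the result of something like "gcc -O2 -S" directly.

```c
/* Illustrative only: function names are hypothetical, not kernel code. */

/*
 * Volatile read feeding an add. Per the message above, GCC is expected
 * to fold the load into the follow-up instruction as a memory operand
 * instead of emitting a separate "mov" first.
 */
int add_volatile_read(volatile int *p, int x)
{
    return x + *p;
}

/*
 * Volatile read-modify-write. As quoted earlier in the thread, GCC keeps
 * this as a separate load, add and store instead of a single RMW insn.
 */
void inc_volatile(volatile int *p)
{
    ++*p;
}

/*
 * Non-volatile read-modify-write. As quoted earlier in the thread, this
 * becomes a single "addl $1, (%rdi)".
 */
void inc_plain(int *p)
{
    ++*p;
}
```

All three functions behave identically at the C level; the only difference is in how the accesses may be combined into instructions.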