From: Linus Torvalds <torvalds@linux-foundation.org>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: Nadav Amit <namit@vmware.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Nick Desaulniers <ndesaulniers@google.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
Date: Wed, 18 Oct 2023 13:22:19 -0700
Message-ID: <CAHk-=wgoWOcToLYbuL2GccbNXwj_MH-LxmB_7MMjw6uu50k57Q@mail.gmail.com>
In-Reply-To: <CAFULd4Zhw=zoDtir03FdPxJD15GZ5N=SV9=4Z45_Q_P9BL1rvQ@mail.gmail.com>

On Wed, 18 Oct 2023 at 12:33, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> This patch works for me:

Looks fine.

But you actually bring up another issue:

> BTW: I also don't understand the comment from include/linux/smp.h:
>
> /*
>  * Allow the architecture to differentiate between a stable and unstable read.
>  * For example, x86 uses an IRQ-safe asm-volatile read for the unstable but a
>  * regular asm read for the stable.

I think the comment is badly worded, but I think the issue may actually be real.

One word: rematerialization.

The thing is, turning inline asm accesses to regular compiler loads
has a *very* bad semantic problem: the compiler may now feel like it
can not only combine the loads (ok), but also possibly rematerialize
values by re-doing the loads (NOT OK!).

IOW, the kernel often has very strict requirements of "at most once"
behavior, because doing two loads might give different results.

The cpu number is a good example of this.

And yes, sometimes we use actual volatile accesses for them
(READ_ONCE() and WRITE_ONCE()) but those are *horrendous* in general,
and are much too strict. Not only does gcc generally lose its mind
when it sees volatile (ie it stops doing various sane combinations
that would actually be perfectly valid), but it obviously also stops
doing CSE on the loads (as it has to).

So the "non-volatile asm" has been a great way to get the "at most
once" behavior: it's safe wrt interrupts changing the value, because
you will see *one* value, not two. As far as we know, gcc never
rematerializes the output of an inline asm. So when you use an inline
asm, you may have the result CSE'd, but you'll never see it generate
more than *one* copy of the inline asm.

(Of course, as with so much about inline asm, that "knowledge" is not
necessarily explicitly spelled out anywhere, and it's just "that's how
it has always worked").

IOW, look at code like the one in swiotlb_pool_find_slots(), which does this:

        int start = raw_smp_processor_id() & (pool->nareas - 1);

and the use of 'start' really is meant to be just a good heuristic, in
that different concurrent CPUs will start looking in different pools.
So that code is basically "cpu-local by default", but it's purely
about locality, it's not some kind of correctness issue, and it's not
necessarily run when the code is *tied* to a particular CPU.

But what *is* important is that 'start' have *one* value, and one
value only. So look at that loop, which basically does

        do {
                .. use the 'i' based on 'start' ..
                if (++i >= pool->nareas)
                        i = 0;
        } while (i != start);

and it is very important indeed that the compiler does *not* think
"Oh, I can rematerialize the 'start' value".

See what I'm saying? Using 'volatile' for loading the current CPU
value would be bad for performance for no good reason. But loading it
multiple times would be a *bug*.
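
A minimal sketch of the failure mode, under loudly stated assumptions:
fake_cpu_id(), fake_cpu_seq, walk_areas_once() and walk_areas_reread()
are hypothetical names, not kernel code. fake_cpu_id() returns a
different value on each call, as if the task kept migrating, and
walk_areas_reread() writes out by hand what a rematerializing compiler
would effectively turn the loop into.

```c
#include <stddef.h>

#define FAKE_SEQ_LEN 4

/* Hypothetical: successive "cpu numbers" seen by one migrating task. */
static const unsigned int fake_cpu_seq[FAKE_SEQ_LEN] = { 2, 3, 1, 0 };
static size_t fake_cpu_pos;

static unsigned int fake_cpu_id(void)
{
	return fake_cpu_seq[fake_cpu_pos++ % FAKE_SEQ_LEN];
}

/*
 * One snapshot of 'start': every area is visited exactly once.
 * nareas must be a power of two. Returns the number of areas visited.
 */
static unsigned int walk_areas_once(unsigned int nareas)
{
	unsigned int start = fake_cpu_id() & (nareas - 1);
	unsigned int i = start;
	unsigned int visited = 0;

	do {
		visited++;
		if (++i >= nareas)
			i = 0;
	} while (i != start);

	return visited;
}

/*
 * 'start' re-read at the loop condition: the walk may stop early or
 * run long. max_iters only bounds the demonstration.
 */
static unsigned int walk_areas_reread(unsigned int nareas,
				      unsigned int max_iters)
{
	unsigned int i = fake_cpu_id() & (nareas - 1);
	unsigned int visited = 0;

	do {
		visited++;
		if (++i >= nareas)
			i = 0;
	} while (visited < max_iters &&
		 i != (fake_cpu_id() & (nareas - 1)));

	return visited;
}
```

With four areas, the snapshot version always visits all four. With the
sequence above, the re-reading version stops after a single area,
because the second read happens to match the incremented index.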

Using inline asm is basically perfect here: the compiler can *combine*
two inline asms into one, but once we have a value for 'start', it
won't change, because the compiler is not going to decide "I can drop
this value, and just re-do the inline asm to rematerialize it".

This all makes me worried about the __seg_fs thing.

For 'current', this is all perfect. Rematerializing current is
actually better than spilling and reloading the value.

But for something like raw_smp_processor_id(), rematerializing would
be a correctness problem, and a really horrible one (because in
practice, the code would work 99.9999% of the time, and then once in a
blue moon, it would rematerialize a different value).

See the problem?

I guess we could use the stdatomics to try to explain these issues to
the compiler, but I don't even know what the C interfaces look like or
whether they are stable and usable across the range of compilers we
use.
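
For reference, a minimal sketch of what the C11 <stdatomic.h> interface
for a single relaxed load looks like. fake_cpu_slot and snapshot_cpu()
are hypothetical names, and nothing here claims that compilers optimize
relaxed atomics any better than volatile accesses, or that this is
usable in the kernel.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a per-cpu value that can change under us. */
static _Atomic int fake_cpu_slot = 1;

/*
 * One relaxed atomic load: no ordering against other memory accesses
 * is requested, only that the value is read as a single access.
 */
static int snapshot_cpu(void)
{
	return atomic_load_explicit(&fake_cpu_slot, memory_order_relaxed);
}
```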

               Linus