linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Uros Bizjak <ubizjak@gmail.com>, peterz@infradead.org
Cc: Nadav Amit <namit@vmware.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Nick Desaulniers <ndesaulniers@google.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
Date: Wed, 18 Oct 2023 15:40:05 -0700	[thread overview]
Message-ID: <CAHk-=whEc2HR3En32uyAufPM3tEh8J4+dot6JyGW=Eg5SEhx7A@mail.gmail.com> (raw)
In-Reply-To: <CAFULd4Zj5hTvATZUVYhUGrxH3fiAUWjO9C27UV_USf2H164thQ@mail.gmail.com>

On Wed, 18 Oct 2023 at 14:40, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> The ones in "raw" form are not IRQ safe and these are implemented
> without volatile qualifier.

You are misreading it.

Both *are* irq safe - on x86.

The difference between "this_cpu_xyz()" and "raw_cpu_xyz()" is that on
*other* architectures, "raw_cpu_xyz():" can be a lot more efficient,
because other architectures may need to do extra work to make the
"this" version be atomic on a particular CPU.

See for example __count_vm_event() vs count_vm_event().

In fact, that particular use isn't even in an interrupt-safe context,
that's an example of literally "I'd rather be fast that correct for
certain statistics that aren't all that important".

They two versions generate the same code on x86, but on other
architectures, __count_vm_event() can *much* simpler and faster
because it doesn't disable interrupts or do other special things.

But on x86, the whole "interrupt safety" is a complete red herring.
Both of them generate the exact same instruction.

On x86, the "volatile" is actually for a completely different reason:
to avoid too much CSE by the compiler.

See  commit b59167ac7baf ("x86/percpu: Fix this_cpu_read()").

In fact, that commit went overboard, and just added "volatile" to
*every* percpu read.

So then people complained about *that*, and PeterZ did commit
0b9ccc0a9b14 ("x86/percpu: Differentiate this_cpu_{}() and
__this_cpu_{}()"), which basically made that "qual or not" be a macro
choice.

And in the process, it now got added to all the RMW ops, that didn't
actually need it or want it in the first place, since they won't be
CSE'd, since they depend on the input.

So that commit basically generalized the whole thing entirely
pointlessly, and caused your current confusion.

End result: we should remove 'volatile' from the RMW ops. It doesn't
do anything on x86. All it does is make us have two subtly different
versions that we don't care about the difference.

End result two: we should make it clear that "this_cpu_read()" vs
"raw_cpu_read()" are *NOT* about interrupts. Even on architectures
where the RMW ops need to have irq protection (so that they are atomic
wrt interrupts also modifying the value), the *READ* operation
obviously has no such issue.

For the raw_cpu_read() vs this_cpu_read() case, the only issue is
whether you can CSE the result.

And in 99% of all cases, you can - and want to - CSE it. But as that
commit b59167ac7baf shows, sometimes you cannot.

Side note: the code that caused that problem is this:

  __always_inline void __cyc2ns_read(struct cyc2ns_data *data)
  {
        int seq, idx;

        do {
                seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
                ...
        } while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
  }

where the issue is that the this_cpu_read() of that sequence number
needs to be ordered.

Honestly, that code is just buggy and bad.  We should never have
"fixed" it by changing the semantics of this_cpu_read() in the first
place.

The problem is that it re-implements its own locking model, and as so
often happens when people do that, they do it completely wrongly.

Look at the *REAL* sequence counter code in <linux/seqlock.h>. Notice
how in raw_read_seqcount_begin() we have

        unsigned _seq = __read_seqcount_begin(s);
        smp_rmb();

because it actually does the proper barriers. Notice how the garbage
code in __cyc2ns_read() doesn't have them - and how it was buggy as a
result.

(Also notice how this all predates our "we should use load_acquire()
instead of smb_rmb()", but whatever).

IOW, all the "volatiles" in the x86 <asm/percpu.h> file are LITERAL
GARBAGE and should not exist, and are due to a historical mistake.

                   Linus

  reply	other threads:[~2023-10-18 22:40 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-10 16:42 [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr() Uros Bizjak
2023-10-10 17:32 ` Linus Torvalds
2023-10-10 18:22   ` Uros Bizjak
2023-10-10 18:25     ` Nadav Amit
2023-10-10 18:42       ` Linus Torvalds
2023-10-10 18:37     ` Linus Torvalds
2023-10-10 18:41       ` Uros Bizjak
2023-10-10 18:52         ` Linus Torvalds
2023-10-11  7:27           ` Uros Bizjak
2023-10-11  7:45             ` Uros Bizjak
2023-10-11 19:40               ` Linus Torvalds
2023-10-11 18:42           ` Uros Bizjak
2023-10-11 19:51             ` Linus Torvalds
2023-10-11 19:52               ` Linus Torvalds
2023-10-11 20:00               ` Uros Bizjak
2023-10-11 22:37               ` Ingo Molnar
2023-10-11 23:15                 ` H. Peter Anvin
2023-10-12  1:35                   ` Josh Poimboeuf
2023-10-12  6:19                     ` Ingo Molnar
2023-10-12 16:08                       ` Josh Poimboeuf
2023-10-12 17:59                         ` Ingo Molnar
2023-10-12 21:30                           ` Josh Poimboeuf
2023-10-13 10:52                             ` Ingo Molnar
2023-10-11  7:41       ` Nadav Amit
2023-10-11 19:37         ` Linus Torvalds
2023-10-11 21:32           ` Uros Bizjak
2023-10-11 21:54             ` Linus Torvalds
2023-10-12 15:19               ` Nadav Amit
2023-10-12 16:33                 ` Uros Bizjak
2023-10-12 16:55                   ` Uros Bizjak
2023-10-12 17:10                     ` Linus Torvalds
2023-10-12 17:47                       ` Linus Torvalds
2023-10-12 18:01                         ` Uros Bizjak
2023-10-13  9:38                           ` Uros Bizjak
2023-10-13 11:53                             ` Uros Bizjak
2023-10-13 16:38                               ` Linus Torvalds
2023-10-12 17:52                       ` Uros Bizjak
2023-11-20  9:39                       ` Use %a asm operand modifier to obtain %rip-relative addressing Uros Bizjak
2023-10-12 16:56                   ` [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr() Linus Torvalds
2023-10-12 17:16                 ` Linus Torvalds
2023-10-12 19:32                   ` Nadav Amit
2023-10-12 19:40                     ` Linus Torvalds
2023-10-16 18:52                 ` Uros Bizjak
2023-10-16 19:24                   ` Linus Torvalds
2023-10-16 20:35                     ` Nadav Amit
2023-10-16 20:59                       ` Linus Torvalds
2023-10-16 23:02                       ` Linus Torvalds
2023-10-16 23:14                         ` Linus Torvalds
2023-10-17  7:23                         ` Nadav Amit
2023-10-17 19:00                           ` Linus Torvalds
2023-10-17 19:11                             ` Uros Bizjak
2023-10-17 21:05                               ` Uros Bizjak
2023-10-17 21:53                                 ` Linus Torvalds
2023-10-17 22:06                                   ` Nadav Amit
2023-10-17 22:29                                     ` Nadav Amit
2023-10-18  7:46                                   ` Uros Bizjak
2023-10-18  9:04                                     ` Uros Bizjak
2023-10-18 10:54                                       ` Nadav Amit
2023-10-18 12:14                                         ` Uros Bizjak
2023-10-18 13:15                                           ` Uros Bizjak
2023-10-18 14:46                                             ` Nadav Amit
2023-10-18 15:17                                               ` Uros Bizjak
2023-10-18 16:03                                                 ` Nadav Amit
2023-10-18 16:26                                                   ` Linus Torvalds
2023-10-18 17:23                                                     ` Uros Bizjak
2023-10-18 18:11                                                       ` Linus Torvalds
2023-10-18 18:08                                                     ` Uros Bizjak
2023-10-18 18:15                                                       ` Linus Torvalds
2023-10-18 18:26                                                         ` Uros Bizjak
2023-10-18 19:33                                                           ` Uros Bizjak
2023-10-18 20:17                                                             ` Nadav Amit
2023-10-18 20:22                                                             ` Linus Torvalds
2023-10-18 20:34                                                               ` Linus Torvalds
2023-10-18 20:51                                                                 ` Uros Bizjak
2023-10-18 21:09                                                                   ` Uros Bizjak
2023-10-18 21:10                                                                   ` Linus Torvalds
2023-10-18 21:40                                                                     ` Uros Bizjak
2023-10-18 22:40                                                                       ` Linus Torvalds [this message]
2023-10-18 23:06                                                                         ` Linus Torvalds
2023-10-19  7:04                                                                         ` Uros Bizjak
2023-10-19 16:59                                                                           ` Linus Torvalds
2023-10-19 17:21                                                                             ` Uros Bizjak
2023-10-19 18:06                                                                               ` Linus Torvalds
2023-10-19 18:16                                                                                 ` Uros Bizjak
2023-10-19 18:49                                                                                   ` Linus Torvalds
2023-10-19 19:07                                                                                     ` Linus Torvalds
2023-10-20  7:57                                                                                       ` Uros Bizjak
2023-10-19 21:04                                                                                   ` Linus Torvalds
2023-10-19 22:39                                                                                     ` Linus Torvalds
2023-10-20  8:08                                                                                       ` Uros Bizjak
2023-10-19  8:44                                                                         ` Peter Zijlstra
2023-10-19  8:54                                                                         ` Peter Zijlstra
2023-10-19 17:04                                                                           ` Linus Torvalds
2023-10-19 18:13                                                                             ` Peter Zijlstra
2023-10-19 18:22                                                                               ` Linus Torvalds
2023-10-19 18:37                                                                                 ` Uros Bizjak
2023-10-19  9:07                                                                         ` Peter Zijlstra
2023-10-19  9:23                                                                           ` Uros Bizjak
2023-10-18 20:42                                                               ` Uros Bizjak
2023-10-19 16:32                                                               ` Uros Bizjak
2023-10-19 17:08                                                                 ` Linus Torvalds
2023-10-18 18:29                                                       ` Nadav Amit
2023-10-18 16:12                                             ` Linus Torvalds
2023-10-18 17:07                                               ` Uros Bizjak
2023-10-18 18:01                                                 ` Linus Torvalds
2023-10-16 21:09                   ` Uros Bizjak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whEc2HR3En32uyAufPM3tEh8J4+dot6JyGW=Eg5SEhx7A@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=brgerst@gmail.com \
    --cc=dvlasenk@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=namit@vmware.com \
    --cc=ndesaulniers@google.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=ubizjak@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).