All of lore.kernel.org
* A few proposals, this time from the C++ standards committee
@ 2024-03-17  9:14 Paul E. McKenney
  2024-03-17 18:50 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-03-17  9:14 UTC (permalink / raw)
  To: linux-toolchains; +Cc: peterz, hpa, rostedt, gregkh, keescook, torvalds

Hello!

Another language, another standards-committee meeting, another set of
potentially relevant papers.  ;-)

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

P2414R2 — Pointer lifetime-end zap proposed solutions
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2414r2.pdf

	Yet another run at making it easier to express some concurrent
	algorithms dating back to the 1970s.  There has been some
	movement on making the CAS old-value assignment recompute pointer
	provenance, and the most controversial operation turns out to be
	implementable with Linux-kernel barrier().  I have no idea how
	things will go with the notion that atomic pointers should not be
	subject to lifetime-end pointer zap.  It should be interesting...
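
	As a minimal sketch (mine, not from the paper) of the 1970s-era
	algorithm at issue, here is the classic CAS-based LIFO push.
	The loaded 'old' may point to a node that another thread popped,
	freed, and reallocated, so its value is formally indeterminate
	in the abstract machine even though the code works on real
	hardware:

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
};

static _Atomic(struct node *) top;

static void push(struct node *n)
{
	struct node *old = atomic_load_explicit(&top, memory_order_relaxed);

	do {
		/* 'old' may be a zapped (lifetime-ended) pointer here. */
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&top, &old, n,
			memory_order_release, memory_order_relaxed));
}
```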

D3181R0 — Atomic stores and object lifetimes
	This one was late to the party, so is not formally published.
	It deals with an odd corner case in the C and C++ memory models
	in which an atomic_thread_fence(memory_order_release) cannot
	completely emulate a store-release operation.  We avoided this
	problem in the Linux-kernel memory model, and hardware seems to
	do the right thing.  Actually, the speed of light, the atomic
	nature of matter, and the causal nature of the universe being
	what they appear to be, hardware would have some difficulty
	causing trouble here.  But the abstract machine is ignorant of
	the laws of physics, so this should be good clean fun!	;-)

	There is an example code fragment here:

	https://github.com/llvm/llvm-project/issues/64188
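
	The equivalence the paper pokes at can be sketched as follows
	(my rendering, not code from the issue): a release fence
	followed by a relaxed store is normally assumed to be at least
	as strong as a plain store-release, and the corner case is
	about where that emulation can leak:

```c
#include <stdatomic.h>

static _Atomic int x;
static int data;

/* Store-release: 'data' is ordered before 'x' for any acquire load
 * of 'x' that observes the value 1. */
static void writer_release(void)
{
	data = 42;
	atomic_store_explicit(&x, 1, memory_order_release);
}

/* Fence version, intended to emulate writer_release(). */
static void writer_fence(void)
{
	data = 42;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&x, 1, memory_order_relaxed);
}
```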

D3125R0 — Pointer tagging
	Another one that is late to the party, and thus not yet formally
	published.  The idea is to provide a way to access the pointer
	bits that are irrelevant to dereferencing: low-order bits freed
	up by object alignment, or unused high-order bits.
	It would be nice.  The devil is in the details.
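	A hand-rolled sketch of the sort of facility being proposed
	(helper names are mine, not from the paper): stashing a small
	tag in the low-order bits of a suitably aligned pointer.  The
	uintptr_t round-trip is only implementation-defined, which is
	exactly why a blessed interface would be nice:

```c
#include <stdint.h>

#define PTR_TAG_MASK ((uintptr_t)0x3)	/* assumes >= 4-byte alignment */

/* Merge a two-bit tag into the alignment bits of 'p'. */
static inline void *ptr_set_tag(void *p, unsigned int tag)
{
	return (void *)((uintptr_t)p | ((uintptr_t)tag & PTR_TAG_MASK));
}

/* Recover the original, dereferenceable pointer. */
static inline void *ptr_clear_tag(void *p)
{
	return (void *)((uintptr_t)p & ~PTR_TAG_MASK);
}

/* Read back the tag bits. */
static inline unsigned int ptr_get_tag(void *p)
{
	return (unsigned int)((uintptr_t)p & PTR_TAG_MASK);
}
```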

CWG2298 — Actions and expression evaluation
	https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2298

	Language lawyering on portions of the C and C++ memory models.
	Nevertheless, it might be useful to tooling.

LWG3941 — atomics.order inadvertently prohibits widespread implementation techniques
	https://cplusplus.github.io/LWG/issue3941

	"Memory models are hard."  ;-)

	Everyone agrees that the implementations are doing the right
	thing, but we need to get the memory-model definition to agree.
	The Linux-kernel memory model avoids this problem by being more
	of a hardware memory model than a language-level memory model.
	(LKMM pays the price by not completely modeling compiler
	optimizations, so pick your poison carefully.)

LWG4004 — The load and store operation in atomics.order p1 is ambiguous
	https://cplusplus.github.io/LWG/issue4004

	Probably just nomenclature.  Probably.  ;-)

---

And these don't seem to have much to do with the C language, but
here they are anyway:

P3149R0 — async_scope -- Creating scopes for non-sequential concurrency
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3149r0.pdf

P3300R0 — C++ Asynchronous Parallel Algorithms
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3300r0.html

P2882R0 — An Event Model for C++ Executors
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2882r0.html

P3179R0 — C++ parallel range algorithms
	https://isocpp.org/files/papers/P3179R0.html

P3135R0 — Hazard Pointer Extensions
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3135r0.pdf

P3138R0 — views::cache_last
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3138r0.html

P2964R0 — Allowing user-defined types in std::simd
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2964r0.html

P0260R8 — C++ Concurrent Queues
	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p0260r7.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
@ 2024-03-17 18:50 ` Linus Torvalds
  2024-03-17 20:56   ` Paul E. McKenney
  2024-03-17 20:50 ` Linus Torvalds
  2024-03-19  7:41 ` Marco Elver
  2 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2024-03-17 18:50 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Another language, another standards-committee meeting, another set of
> potentially relevant papers.  ;-)
>
> Thoughts?

These seem to be mainly all just due to entirely self-inflicted damage
by the C++ standards body.

Well, except for the pointer tagging, which is a reasonable feature
but I suspect the syntax and the way to let people specify *which*
bits to tag would be painful.

The self-inflicted ones seem to all be because of the horrible
syntax-based abstract machine that the C++ standards body refuses to
give up. It is the source of pretty much every single memory ordering
issue.

Yes, that whole "semantics based on high-level language syntax and
contexts" is all lovely - if you do purely functional programming and
you have zero actual interactions with hardware.

But going back all the way to K&R, the C language definition has
always had that whiff of "oh, reality actually matters", and you can
see it in how "volatile" was described. The C standards people always
distrusted that "take it closer to a real machine" for some reason,
and the C++ people seem to have actively hated it, which is part of
why C++ then got _so_ confused with what the actual semantics of
"volatile" really are, because of the whole "is an lvalue an access"
thing etc etc.

(Yes, yes, I know they then introduced "generalized lvalues" aka
"glvalues" to fix that particular braindamage, but my point is that at
no point did they realize that the problem went deeper).

The whole concept of "abstract machine" is broken. Not because it was
a bad idea originally as a way to describe some amount of portability
issues. I guarantee that is how it started for K&R - as a way to just
avoid talking about very concrete limits (word size etc).

But the C++ standard people try *SO*HARD* to describe what a valid
optimization is without ever talking about reality that it has become
a completely broken thing.

I have a solution for it all, but my solution involves throwing out
all that pointless and wasted effort, and involves talking about
optimizations in terms of actual observable differences on real
hardware. So my solution is obviously not acceptable to the C++ people
who have a serious case of Stockholm syndrome with their whole failed
model. These people refuse to admit that their whole approach is
broken.

I quote from the standard:

    The semantic descriptions in this document define a parameterized
    nondeterministic abstract machine. This document places no
    requirement on the structure of conforming implementations. In
    particular, they need not copy or emulate the structure of the
    abstract machine.

    Rather, conforming implementations are required to emulate (only)
    the observable behavior of the abstract machine as explained below.

and the problem here really is that not only does it start from a
ridiculous assumption ("parameterized nondeterministic abstract
machine"), but it ends with a problem that then needs to be defined
("observable behavior") because you started from such an overly
pointless mental exercise.

So my suggestion is that somebody put some psychoactive drugs into the
fountain machine at the next C++ standards meeting, and when all the
members are susceptible to sane suggestions, you instead tell them
that the abstract machine was a mistake. And then you tell them that
you should always generate code as if you were a "simple compiler"
(it's interesting to note that your "lifetime zap" paper actually
talks about that, so *somebody* has a f*cking clue - I haven't seen
that model of "simple compiler" in the C++ standard before).

And then you define the notion of acceptable optimizations as the ones
that have the same results as the simple compiler.

IOW, you make it all *concrete*. And the issue of memory ordering ends
up being pretty much the exact same as the issue of "volatile".
Certain loads and stores can only be combined and moved in certain
ways. Because atomics, memory ordering, and volatile are all basically
the same issue: this is where you deal with reality.

Ta-daa. No stupid abstract machine problems. No odd - and pretty much
unsolvable - impedance issues between "real hardware" and "abstract
machine". In fact, if you do it right, you get rid of the "undefined
behavior" catch-all phrase for "we can't describe this, and it ends up
depending on things that depend on runtime differences".

And no,  it's not going to happen. And putting psychoactive drugs in
the fountain machine is immoral.

Too bad.

               Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
  2024-03-17 18:50 ` Linus Torvalds
@ 2024-03-17 20:50 ` Linus Torvalds
  2024-03-17 21:04   ` Paul E. McKenney
                     ` (2 more replies)
  2024-03-19  7:41 ` Marco Elver
  2 siblings, 3 replies; 19+ messages in thread
From: Linus Torvalds @ 2024-03-17 20:50 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> D3181R0 — Atomic stores and object lifetimes
>         This one was late to the party, so is not formally published.
>         It deals with an odd corner case in the C and C++ memory models
>         in which an atomic_thread_fence(memory_order_release) cannot
>         completely emulate a store-release operation.  We avoided this
>         problem in the Linux-kernel memory model, and hardware seems to
>         do the right thing.  Actually, the speed of light, the atomic
>         nature of matter, and the causal nature of the universe being
>         what they appear to be, hardware would have some difficulty
>         causing trouble here.  But the abstract machine is ignorant of
>         the laws of physics, so this should be good clean fun!  ;-)
>
>         There is an example code fragment here:
>
>         https://github.com/llvm/llvm-project/issues/64188

Looking closer at this one, it seems to be purely a compiler bug.

Assuming you want to honor memory ordering in the first place, you
*cannot* move a store that ends up later being visible to other
threads past a function call, because you don't know if that function
call might contain a memory barrier.

There's no laws of physics of speed of light or causality issues at
all. The bug they describe in that github issue happens on real
hardware, not on some kind of abstract machine.

In fact, I think the problem case can be simplified further:

  int *bug(int N)
  {
    int* p = malloc(sizeof(int));
    *p = N;
    function_call();
    return p;
  }

without having that "atomic<int>& a" argument involved at all.

If the compiler moves the store to 'p' to after the function call, and
then does a "return p" (which exposes that memory location), and the
function call has any "memory_order_release" store in it (which the
compiler cannot know), then there needs to be some guarantee that a
third party (that may have done an "acquire" on the same thing that
"function_call()" did a release on) always sees the store of N before
it sees that other store.

Now, on x86, this happens automatically, because even if you move the
"*p = N" down to after the function call, all stores are releases, so
by the time 'p' becomes visible to anybody else, you are guaranteed to
see the right ordering.

But on pretty much any other architecture than s390 and x86, you need
to add your own memory barrier if you did the store to '*p' after the
function call, because otherwise you end up violating the
'memory_order_release' in the called function that you didn't even
see.

And yes, to a compiler person, that is very annoying, because
'function_call()' itself clearly doesn't know anything about 'p', so
you'd think that there are no _possible_ visible ordering differences.

But if the C++ standards body thinks that the re-ordering is fine, the
C++ standards body is standardizing on "memory ordering is not real".

I can't find the actual standards text for this, but at least
according to cppreference.com (I don't know how official that is), we
have a very clear rule (and honestly, it's the _only_ possible sane
rule for release->consume, so I hope it's official):

   All memory writes (non-atomic and relaxed atomic) that
   happened-before the atomic store from the point of view of thread A,
   become visible side-effects within those operations in thread B into
   which the load operation carries dependency, that is, once the atomic
   load is completed, those operators and functions in thread B that use
   the value obtained from the load are guaranteed to see what thread A
   wrote to memory.

so this is all completely unambiguous. The compiler is *WRONG* to move
the store to '*p' to after the function call, unless it also adds its
own 'release' ordering.
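A litmus-style rendering of the scenario (the 'flag' variable and the
body of function_call() are illustrative assumptions, not code from
the GitHub issue): if the compiler sinks "*p = N" below the call, the
release store inside function_call() no longer covers it, and a third
thread that acquires 'flag' and then picks up the returned pointer can
observe the allocation before N lands in it on a weakly ordered
machine:

```c
#include <stdatomic.h>
#include <stdlib.h>

static _Atomic int flag;

static void function_call(void)
{
	/* The release that bug()'s caller-side compiler cannot see. */
	atomic_store_explicit(&flag, 1, memory_order_release);
}

static int *bug(int N)
{
	int *p = malloc(sizeof(int));

	*p = N;			/* must stay before the release above */
	function_call();
	return p;
}
```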

Weak memory ordering is subtle and difficult. What else is new?

                 Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 18:50 ` Linus Torvalds
@ 2024-03-17 20:56   ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-03-17 20:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 11:50:08AM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Another language, another standards-committee meeting, another set of
> > potentially relevant papers.  ;-)
> >
> > Thoughts?
> 
> These seem to be mainly all just due to entirely self-inflicted damage
> by the C++ standards body.
> 
> Well, except for the pointer tagging, which is a reasonable feature
> but I suspect the syntax and the way to let people specify *which*
> bits to tag would be painful.
> 
> The self-inflicted ones seem to all be because of the horrible
> syntax-based abstract machine that the C++ standards body refuses to
> give up. It is the source of pretty much every single memory ordering
> issue.
> 
> Yes, that whole "semantics based on high-level language syntax and
> contexts" is all lovely - if you do purely functional programming and
> you have zero actual interactions with hardware.

My decades of interactions with these committees summed up in a single
sentence.  Thank you for that, it did me good!  ;-)

> But going back all the way to K&R, the C language definition has
> always had that whiff of "oh, reality actually matters", and you can
> see it in how "volatile" was described. The C standards people always
> distrusted that "take it closer to a real machine" for some reason,
> and the C++ people seem to have actively hated it, which is part of
> why C++ then got _so_ confused with what the actual semantics of
> "volatile" really are, because of the whole "is an lvalue an access"
> thing etc etc.
> 
> (Yes, yes, I know they then introduced "generalized lvalues" aka
> "glvalues" to fix that particular braindamage, but my point is that at
> no point did they realize that the problem went deeper).

I have many times had to ask those who were inveighing against volatile
whether they wanted their credit cards to continue working.  And more than
once have had to follow up with an explanation of the connection between
volatile, device drivers, kernels, computers, and credit-card processing.
(And yes, one guy actually responded that he would be happy if his credit
card stopped working.)

> The whole concept of "abstract machine" is broken. Not because it was
> a bad idea originally as a way to describe some amount of portability
> issues. I guarantee that is how it started for K&R - as a way to just
> avoid talking about very concrete limits (word size etc).
> 
> But the C++ standard people try *SO*HARD* to describe what a valid
> optimization is without ever talking about reality that it has become
> a completely broken thing.
> 
> I have a solution for it all, but my solution involves throwing out
> all that pointless and wasted effort, and involves talking about
> optimizations in terms of actual observable differences on real
> hardware. So my solution is obviously not acceptable to the C++ people
> who have a serious case of Stockholm syndrome with their whole failed
> model. These people refuse to admit that their whole approach is
> broken.
> 
> I quote from the standard:
> 
>     The semantic descriptions in this document define a parameterized
>     nondeterministic abstract machine. This document places no
>     requirement on the structure of conforming implementations. In
>     particular, they need not copy or emulate the structure of the
>     abstract machine.
> 
>     Rather, conforming implementations are required to emulate (only)
>     the observable behavior of the abstract machine as explained below.
> 
> and the problem here really is that not only does it start from a
> ridiculous assumption ("parameterized nondeterministic abstract
> machine"), but it ends with a problem that then needs to be defined
> ("observable behavior") because you started from such an overly
> pointless mental exercise.

I would prefer that the abstract machine be constrained by the laws of
physics.  Those wishing to make analysis tools are not quite so happy
with that preference, but on the other hand, almost all C and C++ code
is fed through a compiler and run on real hardware that is subject to
the constraints of the objective universe.

> So my suggestion is that somebody put some psychoactive drugs into the
> fountain machine at the next C++ standards meeting, and when all the
> members are susceptible to sane suggestions, you instead tell them
> that the abstract machine was a mistake. And then you tell them that
> you should always generate code as if you were a "simple compiler"
> (it's interesting to note that your "lifetime zap" paper actually
> talks about that, so *somebody* has a f*cking clue - I haven't seen
> that model of "simple compiler" in the C++ standard before).

I did just that (minus the psychoactive drug, just in case there is any
question) at a workshop a couple of months ago.  Things quieted down a
bit when I noted that it would be acceptable for the abstract machine
to be extended to account for the limitations of the physical universe.

> And then you define the notion of acceptable optimizations as the ones
> that have the same results as the simple compiler.
> 
> IOW, you make it all *concrete*. And the issue of memory ordering ends
> up being pretty much the exact same as the issue of "volatile".
> Certain loads and stores can only be combined and moved in certain
> ways. Because atomics, memory ordering, and volatile are all basically
> the same issue: this is where you deal with reality.
> 
> Ta-daa. No stupid abstract machine problems. No odd - and pretty much
> unsolvable - impedance issues between "real hardware" and "abstract
> machine". In fact, if you do it right, you get rid of the "undefined
> behavior" catch-all phrase for "we can't describe this, and it ends up
> depending on things that depend on runtime differences".
> 
> And no,  it's not going to happen. And putting psychoactive drugs in
> the fountain machine is immoral.
> 
> Too bad.

I would of course word this a bit differently, but I cannot argue with
your overall assessment of the technical situation.

On the other hand, I have not given up hope, and so I invoke the wise
words that are often attributed to George Box: "All models are wrong,
but some are useful."  The C++ abstract machine is a model that is
wrong in that it does not account for hardware (at least not very well)
or even for the laws of physics that govern all hardware.

But as long as the discussion is confined to the non-concurrent code
interacting with itself, the C++ abstract machine is almost always quite
useful.  Too bad that I am almost always working with concurrency and
with the underlying hardware.  On the other hand, what is life without
a challenge?  ;-)

							Thanx, Paul


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 20:50 ` Linus Torvalds
@ 2024-03-17 21:04   ` Paul E. McKenney
  2024-03-17 21:44   ` Linus Torvalds
  2024-03-18 16:32   ` Linus Torvalds
  2 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2024-03-17 21:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 01:50:02PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 02:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > D3181R0 — Atomic stores and object lifetimes
> >         This one was late to the party, so is not formally published.
> >         It deals with an odd corner case in the C and C++ memory models
> >         in which an atomic_thread_fence(memory_order_release) cannot
> >         completely emulate a store-release operation.  We avoided this
> >         problem in the Linux-kernel memory model, and hardware seems to
> >         do the right thing.  Actually, the speed of light, the atomic
> >         nature of matter, and the causal nature of the universe being
> >         what they appear to be, hardware would have some difficulty
> >         causing trouble here.  But the abstract machine is ignorant of
> >         the laws of physics, so this should be good clean fun!  ;-)
> >
> >         There is an example code fragment here:
> >
> >         https://github.com/llvm/llvm-project/issues/64188
> 
> Looking closer at this one, it seems to be purely a compiler bug.
> 
> Assuming you want to honor memory ordering in the first place, you
> *cannot* move a store that ends up later being visible to other
> threads past a function call, because you don't know if that function
> call might contain a memory barrier.
> 
> There's no laws of physics of speed of light or causality issues at
> all. The bug they describe in that github issue happens on real
> hardware, not on some kind of abstract machine.
> 
> In fact, I think the problem case can be simplified further:
> 
>   int *bug(int N)
>   {
>     int* p = malloc(sizeof(int));
>     *p = N;
>     function_call();
>     return p;
>   }
> 
> without having that "atomic<int>& a" argument involved at all.
> 
> If the compiler moves the store to 'p' to after the function call, and
> then does a "return p" (which exposes that memory location), and the
> function call has any "memory_order_release" store in it (which the
> compiler cannot know), then there needs to be some guarantee that a
> third party (that may have done an "acquire" on the same thing that
> "function_call()" did a release on) always sees the store of N before
> it sees that other store.
> 
> Now, on x86, this happens automatically, because even if you move the
> "*p = N" down to after the function call, all stores are releases, so
> by the time 'p' becomes visible to anybody else, you are guaranteed to
> see the right ordering.
> 
> But on pretty much any other architecture than s390 and x86, you need
> to add your own memory barrier if you did the store to '*p' after the
> function call, because otherwise you end up violating the
> 'memory_order_release' in the called function that you didn't even
> see.
> 
> And yes, to a compiler person, that is very annoying, because
> 'function_call()' itself clearly doesn't know anything about 'p', so
> you'd think that there are no _possible_ visible ordering differences.
> 
> But if the C++ standards body thinks that the re-ordering is fine, the
> C++ standards body is standardizing on "memory ordering is not real".
> 
> I can't find the actual standards text for this, but at least
> according to cppreference.com (I don't know how official that is), we
> have a very clear rule (and honestly, it's the _only_ possible sane
> rule for release->consume, so I hope it's official):
> 
>    All memory writes (non-atomic and relaxed atomic) that
>    happened-before the atomic store from the point of view of thread A,
>    become visible side-effects within those operations in thread B into
>    which the load operation carries dependency, that is, once the atomic
>    load is completed, those operators and functions in thread B that use
>    the value obtained from the load are guaranteed to see what thread A
>    wrote to memory.
> 
> so this is all completely unambiguous. The compiler is *WRONG* to move
> the store to '*p' to after the function call, unless it also adds its
> own 'release' ordering.
> 
> Weak memory ordering is subtle and difficult. What else is new?

All good points.  In short, if they really badly want that optimization,
they will need to provide some way to tell the compiler of ordering
provided by external functions, and a way to shut down those
optimizations.

But it just might be simpler to forgo the optimizations.  ;-)

							Thanx, Paul


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 20:50 ` Linus Torvalds
  2024-03-17 21:04   ` Paul E. McKenney
@ 2024-03-17 21:44   ` Linus Torvalds
  2024-03-17 22:02     ` Paul E. McKenney
  2024-03-18 16:32   ` Linus Torvalds
  2 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2024-03-17 21:44 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Looking closer at this one, it seems to be purely a compiler bug.

Side note: it may be that all that protects us in the kernel from this
compiler bug is the fact that we do not let the compiler know that
"kmalloc()" returns some private memory. So for that particular
pattern, the compiler doesn't actually know that 'p' is some private
pointer and not visible to anybody else.

In C++, particularly with 'new', the compiler might be much more aware
of the fact that nobody can possibly see 'p' outside of that function
until after the return.

So a scarier example without that kind of issue might be something like this:

    extern void unlock(void);
    extern void lock(void);
    extern void wait_for_x(int *);

    void buggy(void)
    {
        int p = 5;
        unlock();
        wait_for_x(&p);
        lock();
    }

where the basic theory of operation is that we're calling that
function with a lock held, and then that "wait_for_x()" thing does
something that exposes the value and waits for it to be changed.

And at the time of the "unlock()", a buggy compiler *might* think that
the value of "p" is entirely private to that function, so the compiler
might decide to compile this as

 - call "unlock()"

 - *then* set 'p' to 5, and pass off the address to the wait function

and that is very buggy on weakly ordered machines for the very same
reasons that that github issue was raised - it is re-ordering the
store wrt the store-release inherent in the 'unlock()'.

Now, in my quick tests, that doesn't actually happen. I sincerely hope
it is because the compiler sees "Oh, somebody is taking the address of
'p'" and just the act of that address-of will make the compiler know
that it has to serialize stores to 'p' with any function calls - even
function calls that happen before the address is taken.

But that

     https://github.com/llvm/llvm-project/issues/64188

that you linked to certainly seems to imply that some versions of
clang have made the equivalent of that mistake, and could possibly
hoist the assignment to 'p' to after the 'unlock()' call.
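One defensive rewrite (my sketch, not a tested kernel pattern) is a
Linux-kernel-style compiler barrier that pins the store to 'p' before
the unlock().  The lock primitives are stubbed with C11 atomics so the
example stands alone; in the scenario above they are opaque external
functions:

```c
#include <stdatomic.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

static _Atomic int lockvar = 1;		/* starts out held */
static int *exposed;

static void unlock(void)
{
	atomic_store_explicit(&lockvar, 0, memory_order_release);
}

static void lock(void)
{
	while (atomic_exchange_explicit(&lockvar, 1, memory_order_acquire))
		;
}

static void wait_for_x(int *p)
{
	exposed = p;	/* the real version would publish and block */
}

static void not_buggy(void)
{
	int p = 5;

	barrier();	/* keep the store to 'p' from sinking past unlock() */
	unlock();
	wait_for_x(&p);
	lock();
}
```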

           Linus


* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 21:44   ` Linus Torvalds
@ 2024-03-17 22:02     ` Paul E. McKenney
  2024-03-17 22:34       ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2024-03-17 22:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 02:44:09PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Looking closer at this one, it seems to be purely a compiler bug.
> 
> Side note: it may be that all that protects us in the kernel from this
> compiler bug is the fact that we do not let the compiler know that
> "kmalloc()" returns some private memory. So for that particular
> pattern, the compiler doesn't actually know that 'p' is some private
> pointer and not visible to anybody else.

Sadly, we really do let the compiler know:

static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)

#define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc

#define __malloc                        __attribute__((__malloc__))

Maybe we should stop doing so:

------------------------------------------------------------------------

diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index 28566624f008f..7b4db0cd093a2 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -181,7 +181,7 @@
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute
  * clang: https://clang.llvm.org/docs/AttributeReference.html#malloc
  */
-#define __malloc                        __attribute__((__malloc__))
+#define __malloc
 
 /*
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-mode-type-attribute

------------------------------------------------------------------------

> In C++, particularly with 'new', the compiler might be much more aware
> of the fact that nobody can possibly see 'p' outside of that function
> until after the return.
> 
> So a scarier example without that kind of issue might be something like this:
> 
>     extern void unlock(void);
>     extern void lock(void);
>     extern void wait_for_x(int *);
> 
>     void buggy(void)
>     {
>         int p = 5;
>         unlock();
>         wait_for_x(&p);
>         lock();
>     }
> 
> where the basic theory of operation is that we're calling that
> function with a lock held, and then that "wait_for_x()" thing does
> something that exposes the value and waits for it to be changed.
> 
> And at the time of the "unlock()", a buggy compiler *might* think that
> the value of "p" is entirely private to that function, so the compiler
> might decide to compile this as
> 
>  - call "unlock()"
> 
>  - *then* set 'p' to 5, and pass off the address to the wait function
> 
> and that is very buggy on weakly ordered machines for the very same
> reasons that that github issue was raised - it is re-ordering the
> store wrt the store-release inherent in the 'unlock()'.
> 
> Now, in my quick tests, that doesn't actually happen. I sincerely hope
> it is because the compiler sees "Oh, somebody is taking the address of
> 'p'" and just the act of that address-of will make the compiler know
> that it has to serialize stores to 'p' with any function calls - even
> function calls that happen before the address is taken.
> 
> But that
> 
>      https://github.com/llvm/llvm-project/issues/64188
> 
> that you linked to certainly seems to imply that some versions of
> clang have made the equivalent of that mistake, and could possibly
> hoist the assignment to 'p' to after the 'unlock()' call.

Yes, it really does happen in some cases.

And I agree that there are likely a great many failure cases.

The initial examples were user error where the pointer was handed off to
some other thread without synchronizing the lifetime of the pointed-to
object.  I chastised them for this, and they eventually came up with
the external function hiding the atomic_thread_fence() from the compiler.

							Thanx, Paul

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 22:02     ` Paul E. McKenney
@ 2024-03-17 22:34       ` Linus Torvalds
  2024-03-17 23:46         ` Jonathan Martin
  2024-03-18  0:42         ` Paul E. McKenney
  0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2024-03-17 22:34 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 15:02, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Sadly, we really do let the compiler know:

Oops.

Oh well.

Can you please give the standards body that simplified example of
mine, together with a litmus test, and an explanation for why it's
very fundamentally wrong to move a store past a function call that you
don't know?

Because this is literally fundamental. If compilers move that store
past a random function call, they *will* have destroyed memory
ordering on arm64.

Not on x86, no. Which just means that 99% of all the testing we do for
the kernel won't find this. But on weakly ordered architectures, it
really is very very wrong, and no amount of language lawyering will
ever make it right.

Now, maybe some function attribute (like the already existing
"__attribute__((pure))" or "__attribute__((const))") can then be used
to say "this function has no memory ordering side effects". But without
that kind of explicit knowledge, the compiler really must not do that
code movement.

And this isn't a kernel issue. This is literally a "without this, all
the memory ordering verbiage is just broken fantasy".

And honestly, compiler writers DO NOT UNDERSTAND memory ordering, and
they don't understand the whole "abstract machine" thing either. This
needs to be a litmus test with real code and real explanation. IOW,
tell them that code like this:

    extern void external_function(void);

    int *buggy(void)
    {
        int *p = new int;
        *p = 5;
        external_function();
        return p;
    }

absolutely *has* to generate code like

        mov     w0, #4
        bl      _Znwm
        mov     w8, #5
        mov     x19, x0
        str     w8, [x0]
        bl      _Z17external_functionv

on arm64, and explain to them *why* that 'str' has to be before the
function call and cannot be moved around a function.

Or explain to them that if they move that store across the function
call (because "obviously the function cannot possibly need it"), they
need to make the 'str' be a 'stlr'.

Make it very concrete, because I *guarantee* that if you explain it in
terms of some abstract machine, it's not going to really make them
understand. It's too far removed from the actual problem case.

           Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 22:34       ` Linus Torvalds
@ 2024-03-17 23:46         ` Jonathan Martin
  2024-03-18  0:42         ` Paul E. McKenney
  1 sibling, 0 replies; 19+ messages in thread
From: Jonathan Martin @ 2024-03-17 23:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains

Hello;

I doubt I have any reason to speak on this but I am compelled to try. I have been thinking this over for a long time and this may be an appropriate entry point.

If the OSI model is combined with the six-tier hierarchy of controls used in safety for industrial systems, you get something that can permute by substituting the application layer with any other position in the ring. I’ve named this as a substitute for the clamoring of idiocy that is DevOps Tree(3).0 to be “Development of Secure Applications,” and “Secure Operations Development Authority.” Dosa with Soda.

The OSI model is obvious enough that I might as well be handing over a toddler's puzzle cube, but I don’t know if you are aware of the six-tier hierarchy of controls since the five-tier model has better advertising:

ELIMINATION, SUBSTITUTION, ISOLATION;

ENGINEERING CONTROLS, ADMINISTRATIVE CONTROLS, Personal Protective Equipment;

Additionally:

PRESENTATION, SESSION, TRANSPORT;

NETWORK, DATA-LINK, PHYSICAL;

<< APPLICATION -> ROLE

If it follows naturally through the maxim of “God Programmers Create God Programs,” actual and real implementation of this model would lead to a massive reduction in lead time for the Five stages of Project Management.

~A9

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 22:34       ` Linus Torvalds
  2024-03-17 23:46         ` Jonathan Martin
@ 2024-03-18  0:42         ` Paul E. McKenney
  2024-03-18  1:49           ` Linus Torvalds
  1 sibling, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2024-03-18  0:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 03:34:17PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 15:02, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Sadly, we really do let the compiler know:
> 
> Oops.
> 
> Oh well.

I was confused into thinking the same a few years back as well.  :-/

> Can you please give the standards body that simplified example of
> mine, together with a litmus test, and an explanation for why it's
> very fundamentally wrong to move a store past a function call that you
> don't know?
> 
> Because this is literally fundamental. If compilers move that store
> past a random function call, they *will* have destroyed memory
> ordering on arm64.
> 
> Not on x86, no. Which just means that 99% of all the testing we do for
> the kernel won't find this. But on weakly ordered architectures, it
> really is very very wrong, and no amount of language lawyering will
> ever make it right.
> 
> Now, maybe some function attribute (like the already existing
> "__attribute__((pure))" or "__attribute__((const))") can then be used
> to say "this function has no memory ordering side effects". But without
> that kind of explicit knowledge, the compiler really must not do that
> code movement.
> 
> And this isn't a kernel issue. This is literally a "without this, all
> the memory ordering verbiage is just broken fantasy".
> 
> And honestly, compiler writers DO NOT UNDERSTAND memory ordering, and
> they don't understand the whole "abstract machine" thing either.

The compiler writers' protestations about concurrency being a niche use
case certainly are wearing a bit thin, aren't they?  My smartphone has
eight hardware threads, which was considered to be a huge number not
that many decades back.  ;-)

On the other hand, there is much more awareness of concurrency in that
group than 20 years ago, so there is hope.

>                                                                  This
> needs to be a litmus test with real code and real explanation. IOW,
> tell them that code like this:
> 
>     extern void external_function(void);
> 
>     int *buggy(void)
>     {
>         int *p = new int;
>         *p = 5;
>         external_function();
>         return p;
>     }
> 
> absolutely *has* to generate code like
> 
>         mov     w0, #4
>         bl      _Znwm
>         mov     w8, #5
>         mov     x19, x0
>         str     w8, [x0]
>         bl      _Z17external_functionv
> 
> on arm64, and explain to them *why* that 'str' has to be before the
> function call and cannot be moved around a function.
> 
> Or explain to them that if they move that store across the function
> call (because "obviously the function cannot possibly need it"), they
> need to make the 'str' be a 'stlr'.
> 
> Make it very concrete, because I *guarantee* that if you explain it in
> terms of some abstract machine, it's not going to really make them
> understand. It's too far removed from the actual problem case.

Done.

One interesting complication is a guarantee of ordering versus the
possibility of ordering.  If a function is unmarked, the compiler must
assume that it might provide full ordering, but must not itself rely
on it providing any ordering at all.

It should be an interesting discussion.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  0:42         ` Paul E. McKenney
@ 2024-03-18  1:49           ` Linus Torvalds
  2024-03-18  2:44             ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2024-03-18  1:49 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On the other hand, there is much more awareness of concurrency in that
> group than 20 years ago, so there is hope.

Yeah. But when I say "compiler writers don't understand memory
ordering", it's not that I think they need to be singled out - pretty
much *nobody* understands it.

Christ, I'm supposed to know it fairly well, and I still get it wrong
myself regularly and have to really think about it (and honestly just
prefer leaning on a few standard patterns rather than having to think
about it too much).

So "awareness of concurrency" is one thing, and I agree it's getting
much better.

Actually getting memory ordering right - even when you are aware of
concurrency - is another thing entirely.

                 Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  1:49           ` Linus Torvalds
@ 2024-03-18  2:44             ` Paul E. McKenney
  2024-03-18  2:57               ` Randy Dunlap
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2024-03-18  2:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On the other hand, there is much more awareness of concurrency in that
> > group than 20 years ago, so there is hope.
> 
> Yeah. But when I say "compiler writers don't understand memory
> ordering", it's not that I think they need to be singled out - pretty
> much *nobody* understands it.

Fair enough!

> Christ, I'm supposed to know it fairly well, and I still get it wrong
> myself regularly and have to really think about it (and honestly just
> prefer leaning on a few standard patterns rather than having to think
> about it too much).
> 
> So "awareness of concurrency" is one thing, and I agree it's getting
> much better.
> 
> Actually getting memory ordering right - even when you are aware of
> concurrency - is another thing entirely.

Agreed, myself included.  So we should all use the standard patterns where
we can, getting ourselves into memory-model trouble when those patterns
are not cutting it.  And over time, we add to the standard patterns.

But we are making progress.  Fifty years ago, the consensus was that
developers could not be trusted to get while-loop conditions right.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  2:44             ` Paul E. McKenney
@ 2024-03-18  2:57               ` Randy Dunlap
  2024-03-18  4:42                 ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Randy Dunlap @ 2024-03-18  2:57 UTC (permalink / raw)
  To: paulmck, Linus Torvalds
  Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook



On 3/17/24 19:44, Paul E. McKenney wrote:
> On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
>> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>
>>> On the other hand, there is much more awareness of concurrency in that
>>> group than 20 years ago, so there is hope.
>>
>> Yeah. But when I say "compiler writers don't understand memory
>> ordering", it's not that I think they need to be singled out - pretty
>> much *nobody* understands it.
> 
> Fair enough!
> 
>> Christ, I'm supposed to know it fairly well, and I still get it wrong
>> myself regularly and have to really think about it (and honestly just
>> prefer leaning on a few standard patterns rather than having to think
>> about it too much).
>>
>> So "awareness of concurrency" is one thing, and I agree it's getting
>> much better.
>>
>> Actually getting memory ordering right - even when you are aware of
>> concurrency - is another thing entirely.
> 
> Agreed, myself included.  So we should all use the standard patterns where
> we can, getting ourselves into memory-model trouble when those patterns
> are not cutting it.  And over time, we add to the standard patterns.
> 
> But we are making progress.  Fifty years ago, the consensus was that
> developers could not be trusted to get while-loop conditions right.  ;-)

I was using for loops and do-until loops 50 years ago, but maybe not "while"
loops. Or are you off by 10 years or so?

-- 
#Randy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  2:57               ` Randy Dunlap
@ 2024-03-18  4:42                 ` Paul E. McKenney
  2024-03-18  4:45                   ` Randy Dunlap
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2024-03-18  4:42 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Linus Torvalds, linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

On Sun, Mar 17, 2024 at 07:57:31PM -0700, Randy Dunlap wrote:
> 
> 
> On 3/17/24 19:44, Paul E. McKenney wrote:
> > On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
> >> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>
> >>> On the other hand, there is much more awareness of concurrency in that
> >>> group than 20 years ago, so there is hope.
> >>
> >> Yeah. But when I say "compiler writers don't understand memory
> >> ordering", it's not that I think they need to be singled out - pretty
> >> much *nobody* understands it.
> > 
> > Fair enough!
> > 
> >> Christ, I'm supposed to know it fairly well, and I still get it wrong
> >> myself regularly and have to really think about it (and honestly just
> >> prefer leaning on a few standard patterns rather than having to think
> >> about it too much).
> >>
> >> So "awareness of concurrency" is one thing, and I agree it's getting
> >> much better.
> >>
> >> Actually getting memory ordering right - even when you are aware of
> >> concurrency - is another thing entirely.
> > 
> > Agreed, myself included.  So we should all use the standard patterns where
> > we can, getting ourselves into memory-model trouble when those patterns
> > are not cutting it.  And over time, we add to the standard patterns.
> > 
> > But we are making progress.  Fifty years ago, the consensus was that
> > developers could not be trusted to get while-loop conditions right.  ;-)
> 
> I was using for loops and do-until loops 50 years ago, but maybe not "while"
> loops. Or are you off by 10 years or so?

So was I.  Yet in the late 1970s, I attended a talk by a guy named
Edsger Dijkstra with examples claiming that you could not trust ordinary
developers to correctly write "while" loops.  Sort of like some people
today claim that ordinary developers cannot be trusted to write concurrent
code.

Of course, one might reasonably argue that developers cannot be trusted
to write much of any code at all.  Some days I would agree.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18  4:42                 ` Paul E. McKenney
@ 2024-03-18  4:45                   ` Randy Dunlap
  0 siblings, 0 replies; 19+ messages in thread
From: Randy Dunlap @ 2024-03-18  4:45 UTC (permalink / raw)
  To: paulmck
  Cc: Linus Torvalds, linux-toolchains, peterz, hpa, rostedt, gregkh, keescook



On 3/17/24 21:42, Paul E. McKenney wrote:
> On Sun, Mar 17, 2024 at 07:57:31PM -0700, Randy Dunlap wrote:
>>
>>
>> On 3/17/24 19:44, Paul E. McKenney wrote:
>>> On Sun, Mar 17, 2024 at 06:49:17PM -0700, Linus Torvalds wrote:
>>>> On Sun, 17 Mar 2024 at 17:42, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>
>>>>> On the other hand, there is much more awareness of concurrency in that
>>>>> group than 20 years ago, so there is hope.
>>>>
>>>> Yeah. But when I say "compiler writers don't understand memory
>>>> ordering", it's not that I think they need to be singled out - pretty
>>>> much *nobody* understands it.
>>>
>>> Fair enough!
>>>
>>>> Christ, I'm supposed to know it fairly well, and I still get it wrong
>>>> myself regularly and have to really think about it (and honestly just
>>>> prefer leaning on a few standard patterns rather than having to think
>>>> about it too much).
>>>>
>>>> So "awareness of concurrency" is one thing, and I agree it's getting
>>>> much better.
>>>>
>>>> Actually getting memory ordering right - even when you are aware of
>>>> concurrency - is another thing entirely.
>>>
>>> Agreed, myself included.  So we should all use the standard patterns where
>>> we can, getting ourselves into memory-model trouble when those patterns
>>> are not cutting it.  And over time, we add to the standard patterns.
>>>
>>> But we are making progress.  Fifty years ago, the consensus was that
>>> developers could not be trusted to get while-loop conditions right.  ;-)
>>
>> I was using for loops and do-until loops 50 years ago, but maybe not "while"
>> loops. Or are you off by 10 years or so?
> 
> So was I.  Yet in the late 1970s, I attended a talk by a guy named
> Edsger Dijkstra with examples claiming that you could not trust ordinary
> developers to correctly write "while" loops.  Sort of like some people
> today claim that ordinary developers cannot be trusted to write concurrent
> code.
> 
> Of course, one might reasonably argue that developers cannot be trusted
> to write much of any code at all.  Some days I would agree.  ;-)

Ack that.

-- 
#Randy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17 20:50 ` Linus Torvalds
  2024-03-17 21:04   ` Paul E. McKenney
  2024-03-17 21:44   ` Linus Torvalds
@ 2024-03-18 16:32   ` Linus Torvalds
  2024-03-18 16:48     ` H. Peter Anvin
  2 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2024-03-18 16:32 UTC (permalink / raw)
  To: paulmck; +Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook

[ Final note on this, I hope ]

On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Now, on x86, this happens automatically, because even if you move the
> "*p = N" down to after the function call, all stores are releases, so
> by the time 'p' becomes visible to anybody else, you are guaranteed to
> see the right ordering.

Actually, I take that back.

Even x86 (and s390) can see problems from the "move store past a
function call" issue, although they require more effort by the
compiler.

Because while it is true that every store is a release on x86, and as
such any store that exposes the address to another CPU will
automatically have done the release that also guarantees that the
value 'N' is visible to any other thread (and then the acquire will
guarantee that the other end sees the right value), that doesn't
necessarily fix the problem.

Why? Once the compiler has missed the original memory barrier (that
was in the function that it moved the store past), the compiler could
end up doing further store movement, and simply generate the code to
do the '*p = N' store after exposing the address.

At that point, even a strong memory ordering won't help - although it
would probably make the problem easier to spot as a human (and would
probably make it easier to trigger too, since then things like
interrupts, preemption or single-stepping would also make the window
to see it much much bigger).

So x86 wouldn't be immune to this, it would just require more
reordering by the compiler (which might in turn require that the
function was inlined in order to give that re-ordering possibility, of
course).

               Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-18 16:32   ` Linus Torvalds
@ 2024-03-18 16:48     ` H. Peter Anvin
  0 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2024-03-18 16:48 UTC (permalink / raw)
  To: Linus Torvalds, paulmck
  Cc: linux-toolchains, peterz, rostedt, gregkh, keescook

On March 18, 2024 9:32:55 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>[ Final note on this, I hope ]
>
>On Sun, 17 Mar 2024 at 13:50, Linus Torvalds
><torvalds@linux-foundation.org> wrote:
>>
>> Now, on x86, this happens automatically, because even if you move the
>> "*p = N" down to after the function call, all stores are releases, so
>> by the time 'p' becomes visible to anybody else, you are guaranteed to
>> see the right ordering.
>
>Actually, I take that back.
>
>Even x86 (and s390) can see problems from the "move store past a
>function call" issue, although they require more effort by the
>compiler.
>
>Because while it is true that every store is a release on x86, and as
>such any store that exposes the address to another CPU will
>automatically have done the release that also guarantees that the
>value 'N' is visible to any other thread (and then the acquire will
>guarantee that the other end sees the right value), that doesn't
>necessarily fix the problem.
>
>Why? Once the compiler has missed the original memory barrier (that
>was in the function that it moved the store past), the compiler could
>end up doing further store movement, and simply generate the code to
>do the '*p = N' store after exposing the address.
>
>At that point, even a strong memory ordering won't help - although it
>would probably make the problem easier to spot as a human (and would
>probably make it easier to trigger too, since then things like
>interrupts, preemption or single-stepping would also make the window
>to see it much much bigger).
>
>So x86 wouldn't be immune to this, it would just require more
>reordering by the compiler (which might in turn require that the
>function was inlined in order to give that re-ordering possibility, of
>course).
>
>               Linus

Hardware memory order doesn't mean anything if the compiler is the one messing it up...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
  2024-03-17 18:50 ` Linus Torvalds
  2024-03-17 20:50 ` Linus Torvalds
@ 2024-03-19  7:41 ` Marco Elver
  2024-03-19  8:07   ` Jakub Jelinek
  2 siblings, 1 reply; 19+ messages in thread
From: Marco Elver @ 2024-03-19  7:41 UTC (permalink / raw)
  To: paulmck
  Cc: linux-toolchains, peterz, hpa, rostedt, gregkh, keescook,
	torvalds, Evgenii Stepanov, Kostya Serebryany

Hi Paul,

On Sun, 17 Mar 2024 at 10:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> ------------------------------------------------------------------------
[...]
> D3125R0 — Pointer tagging
>         Another one that is late to the party, and thus not yet formally
>         published.  The idea is to provide a way to access pointer bits
>         that are not relevant to pointer dereferencing for pointers to
>         properly aligned objects or that are unused high-order bits.
>         It would be nice.  The devil is in the details.

You mention it's not formally published, but is there a draft that is
already accessible somewhere?

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: A few proposals, this time from the C++ standards committee
  2024-03-19  7:41 ` Marco Elver
@ 2024-03-19  8:07   ` Jakub Jelinek
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Jelinek @ 2024-03-19  8:07 UTC (permalink / raw)
  To: Marco Elver
  Cc: paulmck, linux-toolchains, peterz, hpa, rostedt, gregkh,
	keescook, torvalds, Evgenii Stepanov, Kostya Serebryany

On Tue, Mar 19, 2024 at 08:41:27AM +0100, Marco Elver wrote:
> Hi Paul,
> 
> On Sun, 17 Mar 2024 at 10:14, Paul E. McKenney <paulmck@kernel.org> wrote:
> > ------------------------------------------------------------------------
> [...]
> > D3125R0 — Pointer tagging
> >         Another one that is late to the party, and thus not yet formally
> >         published.  The idea is to provide a way to access pointer bits
> >         that are not relevant to pointer dereferencing for pointers to
> >         properly aligned objects or that are unused high-order bits.
> >         It would be nice.  The devil is in the details.
> 
> You mention it's not formally published, but is there a draft that is
> already accessible somewhere?

https://wg21.link/D3125R0

	Jakub


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-03-19  8:07 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-17  9:14 A few proposals, this time from the C++ standards committee Paul E. McKenney
2024-03-17 18:50 ` Linus Torvalds
2024-03-17 20:56   ` Paul E. McKenney
2024-03-17 20:50 ` Linus Torvalds
2024-03-17 21:04   ` Paul E. McKenney
2024-03-17 21:44   ` Linus Torvalds
2024-03-17 22:02     ` Paul E. McKenney
2024-03-17 22:34       ` Linus Torvalds
2024-03-17 23:46         ` Jonathan Martin
2024-03-18  0:42         ` Paul E. McKenney
2024-03-18  1:49           ` Linus Torvalds
2024-03-18  2:44             ` Paul E. McKenney
2024-03-18  2:57               ` Randy Dunlap
2024-03-18  4:42                 ` Paul E. McKenney
2024-03-18  4:45                   ` Randy Dunlap
2024-03-18 16:32   ` Linus Torvalds
2024-03-18 16:48     ` H. Peter Anvin
2024-03-19  7:41 ` Marco Elver
2024-03-19  8:07   ` Jakub Jelinek
