All of lore.kernel.org
 help / color / mirror / Atom feed
* Control Dependencies vs C Compilers
@ 2020-10-06 11:47 Peter Zijlstra
  2020-10-06 12:37 ` David Laight
  2020-10-06 21:20 ` Florian Weimer
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-06 11:47 UTC (permalink / raw)
  To: linux-toolchains, Will Deacon, Paul McKenney
  Cc: linux-kernel, stern, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

Hi,

Let's give this linux-toolchains thing a test-run...

As some of you might know, there's a bit of a discrepancy between what
compiler and kernel people consider 'valid' use of the compiler :-)

One area where this shows up is in implicit (memory) ordering provided
by the hardware, which we kernel people would like to use to avoid
explicit fences (expensive) but which the compiler is unaware of and
could ruin (bad).

During the last LPC we had a session on that; find here:

  https://linuxplumbersconf.org/event/7/contributions/821/

With recordings of the event here:

  https://youtu.be/FFjV9f_Ub9o?t=89

That presentation covers 3 different implicit dependencies and various
ways in which a compiler can ruin the game. For this thread, I'd like to
limit things to just control-dependencies. We can start separate threads
for the other issues.

In short, the control dependency relies on the hardware never
speculating stores (instant OOTA) to provide a LOAD->STORE ordering.
That is, a LOAD must be completed to resolve a conditional branch, the
STORE is after the branch and cannot be made visible until the branch is
determined (which implies the load is complete).

However, our 'dear' C language has no clue of any of this.

So given code like:

	x = *foo;
	if (x > 42)
		*bar = 1;

Which, if literally translated into assembly, would provide a
LOAD->STORE order between foo and bar, could, in the hands of an
evil^Woptimizing compiler, become:

	x = *foo;
	*bar = 1;

because it knows, through value tracking, that the condition must be
true.

Our Documentation/memory-barriers.txt has a Control Dependencies section
(which I shall not replicate here for brevity) which lists a number of
caveats. But in general the work-around we use is:

	x = READ_ONCE(*foo);
	if (x > 42)
		WRITE_ONCE(*bar, 1);

Where READ/WRITE_ONCE() cast the variable volatile. The volatile
qualifier dissuades the compiler from assuming it knows things and we
then hope it will indeed emit the branch like we'd expect.


Now, hoping the compiler generates correct code is clearly not ideal and
very dangerous indeed. Which is why my question to the compiler folks
assembled here is:

  Can we get a C language extention for this?

And while we have a fair number (and growing) existing users of this in
the kernel, I'd not be adverse to having to annotate them.

Any suggestions from the compiler people present on how they'd like to
provide us this feature?

Even just being able to detect this going wrong would be a step forward.

 ~ Peter

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: Control Dependencies vs C Compilers
  2020-10-06 11:47 Control Dependencies vs C Compilers Peter Zijlstra
@ 2020-10-06 12:37 ` David Laight
  2020-10-06 12:49   ` Willy Tarreau
  2020-10-06 13:31   ` Peter Zijlstra
  2020-10-06 21:20 ` Florian Weimer
  1 sibling, 2 replies; 21+ messages in thread
From: David Laight @ 2020-10-06 12:37 UTC (permalink / raw)
  To: 'Peter Zijlstra', linux-toolchains, Will Deacon, Paul McKenney
  Cc: linux-kernel, stern, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

From: Peter Zijlstra
> Sent: 06 October 2020 12:47
> Hi,
> 
> Let's give this linux-toolchains thing a test-run...
> 
> As some of you might know, there's a bit of a discrepancy between what
> compiler and kernel people consider 'valid' use of the compiler :-)
> 
> One area where this shows up is in implicit (memory) ordering provided
> by the hardware, which we kernel people would like to use to avoid
> explicit fences (expensive) but which the compiler is unaware of and
> could ruin (bad).
...
> 
> In short, the control dependency relies on the hardware never
> speculating stores (instant OOTA) to provide a LOAD->STORE ordering.
> That is, a LOAD must be completed to resolve a conditional branch, the
> STORE is after the branch and cannot be made visible until the branch is
> determined (which implies the load is complete).
> 
> However, our 'dear' C language has no clue of any of this.
> 
> So given code like:
> 
> 	x = *foo;
> 	if (x > 42)
> 		*bar = 1;
> 
> Which, if literally translated into assembly, would provide a
> LOAD->STORE order between foo and bar, could, in the hands of an
> evil^Woptimizing compiler, become:
> 
> 	x = *foo;
> 	*bar = 1;
> 
> because it knows, through value tracking, that the condition must be
> true.
> 
> Our Documentation/memory-barriers.txt has a Control Dependencies section
> (which I shall not replicate here for brevity) which lists a number of
> caveats. But in general the work-around we use is:
> 
> 	x = READ_ONCE(*foo);
> 	if (x > 42)
> 		WRITE_ONCE(*bar, 1);

An alternative is to 'persuade' the compiler that
any 'tracked' value for a local variable is invalid.
Rather like the way that barrier() 'invalidates' memory.
So you generate:

	x = *foo
	asm ("" : "+r" (x));
	if (x > 42)
		*bar = 1;

Since the "+r" constraint indicates that the value of 'x'
might have changed it can't optimise based on any
presumed old value.
(Unless it looks inside the asm opcodes...)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 12:37 ` David Laight
@ 2020-10-06 12:49   ` Willy Tarreau
  2020-10-06 13:31   ` Peter Zijlstra
  1 sibling, 0 replies; 21+ messages in thread
From: Willy Tarreau @ 2020-10-06 12:49 UTC (permalink / raw)
  To: David Laight
  Cc: 'Peter Zijlstra',
	linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 12:37:06PM +0000, David Laight wrote:
> > Our Documentation/memory-barriers.txt has a Control Dependencies section
> > (which I shall not replicate here for brevity) which lists a number of
> > caveats. But in general the work-around we use is:
> > 
> > 	x = READ_ONCE(*foo);
> > 	if (x > 42)
> > 		WRITE_ONCE(*bar, 1);
> 
> An alternative is to 'persuade' the compiler that
> any 'tracked' value for a local variable is invalid.
> Rather like the way that barrier() 'invalidates' memory.
> So you generate:
> 
> 	x = *foo
> 	asm ("" : "+r" (x));
> 	if (x > 42)
> 		*bar = 1;
> 
> Since the "+r" constraint indicates that the value of 'x'
> might have changed it can't optimise based on any
> presumed old value.
> (Unless it looks inside the asm opcodes...)

I'm using exactly this in userland to prevent the compiler from guessing
what I'm doing with a variable, and it's also useful sometimes to shut up
certain warnings when I know a condition is satisfied but can hardly be
expressed in a way to please the compiler. Overall I find that it's no
big deal and forces the developer to think twice before doing it, which
is probably a good thing in general.

Willy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 12:37 ` David Laight
  2020-10-06 12:49   ` Willy Tarreau
@ 2020-10-06 13:31   ` Peter Zijlstra
  2020-10-06 14:23     ` stern
  2020-10-06 15:07     ` David Laight
  1 sibling, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-06 13:31 UTC (permalink / raw)
  To: David Laight
  Cc: linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 12:37:06PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 06 October 2020 12:47
> > Hi,
> > 
> > Let's give this linux-toolchains thing a test-run...
> > 
> > As some of you might know, there's a bit of a discrepancy between what
> > compiler and kernel people consider 'valid' use of the compiler :-)
> > 
> > One area where this shows up is in implicit (memory) ordering provided
> > by the hardware, which we kernel people would like to use to avoid
> > explicit fences (expensive) but which the compiler is unaware of and
> > could ruin (bad).
> ...
> > 
> > In short, the control dependency relies on the hardware never
> > speculating stores (instant OOTA) to provide a LOAD->STORE ordering.
> > That is, a LOAD must be completed to resolve a conditional branch, the
> > STORE is after the branch and cannot be made visible until the branch is
> > determined (which implies the load is complete).
> > 
> > However, our 'dear' C language has no clue of any of this.
> > 
> > So given code like:
> > 
> > 	x = *foo;
> > 	if (x > 42)
> > 		*bar = 1;
> > 
> > Which, if literally translated into assembly, would provide a
> > LOAD->STORE order between foo and bar, could, in the hands of an
> > evil^Woptimizing compiler, become:
> > 
> > 	x = *foo;
> > 	*bar = 1;
> > 
> > because it knows, through value tracking, that the condition must be
> > true.
> > 
> > Our Documentation/memory-barriers.txt has a Control Dependencies section
> > (which I shall not replicate here for brevity) which lists a number of
> > caveats. But in general the work-around we use is:
> > 
> > 	x = READ_ONCE(*foo);
> > 	if (x > 42)
> > 		WRITE_ONCE(*bar, 1);
> 
> An alternative is to 'persuade' the compiler that
> any 'tracked' value for a local variable is invalid.
> Rather like the way that barrier() 'invalidates' memory.
> So you generate:
> 
> 	x = *foo
> 	asm ("" : "+r" (x));
> 	if (x > 42)
> 		*bar = 1;
> 
> Since the "+r" constraint indicates that the value of 'x'
> might have changed it can't optimise based on any
> presumed old value.
> (Unless it looks inside the asm opcodes...)

The compiler can still try and lift the store out of the block, possibly
by inventing more stores.

Please go read memory-barriers.txt for a bunch of other examples.

This thread is not to collect work-arounds that might convince a
compiler to emit the desired code as a side effect, but to get the
compiler people involved and get control-dependencies recognised such
that correct code gen is guaranteed.

Only if we get the compiler people on board and have them provide means
are we guaranteed safe from the optimizer. Otherwise we'll just keep
playing whack-a-mole with fancy new optimization techniques. And given
how horridly painful it is to debug memory ordering problems, I feel it
is best to make sure we're not going to have to more than necessary.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 13:31   ` Peter Zijlstra
@ 2020-10-06 14:23     ` stern
  2020-10-06 14:43       ` Peter Zijlstra
  2020-10-06 15:07     ` David Laight
  1 sibling, 1 reply; 21+ messages in thread
From: stern @ 2020-10-06 14:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Laight, linux-toolchains, Will Deacon, Paul McKenney,
	linux-kernel, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 03:31:15PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 06, 2020 at 12:37:06PM +0000, David Laight wrote:
> > From: Peter Zijlstra
> > > Sent: 06 October 2020 12:47
> > > Hi,
> > > 
> > > Let's give this linux-toolchains thing a test-run...
> > > 
> > > As some of you might know, there's a bit of a discrepancy between what
> > > compiler and kernel people consider 'valid' use of the compiler :-)
> > > 
> > > One area where this shows up is in implicit (memory) ordering provided
> > > by the hardware, which we kernel people would like to use to avoid
> > > explicit fences (expensive) but which the compiler is unaware of and
> > > could ruin (bad).
> > ...
> > > 
> > > In short, the control dependency relies on the hardware never
> > > speculating stores (instant OOTA) to provide a LOAD->STORE ordering.
> > > That is, a LOAD must be completed to resolve a conditional branch, the
> > > STORE is after the branch and cannot be made visible until the branch is
> > > determined (which implies the load is complete).
> > > 
> > > However, our 'dear' C language has no clue of any of this.
> > > 
> > > So given code like:
> > > 
> > > 	x = *foo;
> > > 	if (x > 42)
> > > 		*bar = 1;
> > > 
> > > Which, if literally translated into assembly, would provide a
> > > LOAD->STORE order between foo and bar, could, in the hands of an
> > > evil^Woptimizing compiler, become:
> > > 
> > > 	x = *foo;
> > > 	*bar = 1;
> > > 
> > > because it knows, through value tracking, that the condition must be
> > > true.
> > > 
> > > Our Documentation/memory-barriers.txt has a Control Dependencies section
> > > (which I shall not replicate here for brevity) which lists a number of
> > > caveats. But in general the work-around we use is:
> > > 
> > > 	x = READ_ONCE(*foo);
> > > 	if (x > 42)
> > > 		WRITE_ONCE(*bar, 1);
> > 
> > An alternative is to 'persuade' the compiler that
> > any 'tracked' value for a local variable is invalid.
> > Rather like the way that barrier() 'invalidates' memory.
> > So you generate:
> > 
> > 	x = *foo
> > 	asm ("" : "+r" (x));
> > 	if (x > 42)
> > 		*bar = 1;
> > 
> > Since the "+r" constraint indicates that the value of 'x'
> > might have changed it can't optimise based on any
> > presumed old value.
> > (Unless it looks inside the asm opcodes...)
> 
> The compiler can still try and lift the store out of the block, possibly
> by inventing more stores.
> 
> Please go read memory-barriers.txt for a bunch of other examples.
> 
> This thread is not to collect work-arounds that might convince a
> compiler to emit the desired code as a side effect, but to get the
> compiler people involved and get control-dependencies recognised such
> that correct code gen is guaranteed.
> 
> Only if we get the compiler people on board and have them provide means
> are we guaranteed safe from the optimizer. Otherwise we'll just keep
> playing whack-a-mole with fancy new optimization techniques. And given
> how horridly painful it is to debug memory ordering problems, I feel it
> is best to make sure we're not going to have to more than necessary.

Given that you would have to add a compiler annotation, isn't it just as 
easy to use READ_ONCE and WRITE_ONCE?

Or are you worried that even with READ_ONCE and WRITE_ONCE, the compiler 
might still somehow defeat the control dependency?

Alan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 14:23     ` stern
@ 2020-10-06 14:43       ` Peter Zijlstra
  2020-10-06 15:16         ` Nick Clifton
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-06 14:43 UTC (permalink / raw)
  To: stern
  Cc: David Laight, linux-toolchains, Will Deacon, Paul McKenney,
	linux-kernel, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 10:23:24AM -0400, stern@rowland.harvard.edu wrote:
> On Tue, Oct 06, 2020 at 03:31:15PM +0200, Peter Zijlstra wrote:

> > Only if we get the compiler people on board and have them provide means
> > are we guaranteed safe from the optimizer. Otherwise we'll just keep
> > playing whack-a-mole with fancy new optimization techniques. And given
> > how horridly painful it is to debug memory ordering problems, I feel it
> > is best to make sure we're not going to have to more than necessary.
> 
> Given that you would have to add a compiler annotation, isn't it just as 
> easy to use READ_ONCE and WRITE_ONCE?
> 
> Or are you worried that even with READ_ONCE and WRITE_ONCE, the compiler 
> might still somehow defeat the control dependency?

Yes.

Also, not all instances actually have the WRITE_ONCE() on. The one in
the perf ringbuffer for example uses local_read() for the load (which is
basically READ_ONCE()), but doesn't have WRITE_ONCE() on the inside.

Also, afaiu WRITE_ONCE() also doesn't stop the compiler from lifting it
if it thinks it can get away with it -- memory-barriers.txt has
examples.

And then there's the case where the common branch has a store, I know
ARM64 ARM isn't clear on that, but I'm thinking any actual hardware will
still have to respect it, and it's a matter of 'fixing' the ARM.


Mostly I just want the compiler people to say they'll guarantee the
behaviour if we do 'X'. If 'X' happens to be 'any dynamic branch headed
by a volatile load' that's fine by me.

If they want a new keyword or attribute, that's also fine. But I want to
have the compiler people tell me what they want and guarantee they'll
not come and wreck things.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: Control Dependencies vs C Compilers
  2020-10-06 13:31   ` Peter Zijlstra
  2020-10-06 14:23     ` stern
@ 2020-10-06 15:07     ` David Laight
  1 sibling, 0 replies; 21+ messages in thread
From: David Laight @ 2020-10-06 15:07 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

> Please go read memory-barriers.txt for a bunch of other examples.

My brain hurts.
It's not as though I've not known about most of this since
the first SMP capable SPARC cpus that had a 'store buffer'.
(Probably 30 years ago.)

I sometimes wonder whether compiling things like the kernel
requires a compiler mode that defaults everything (except
function locals) to 'volatile'.
Although that still does fix control dependencies.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 14:43       ` Peter Zijlstra
@ 2020-10-06 15:16         ` Nick Clifton
  2020-10-06 15:37           ` David Laight
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Clifton @ 2020-10-06 15:16 UTC (permalink / raw)
  To: Peter Zijlstra, stern
  Cc: David Laight, linux-toolchains, Will Deacon, Paul McKenney,
	linux-kernel, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

Hi Guys,

  OK, so playing the devils advocate here...

> Mostly I just want the compiler people to say they'll guarantee the
> behaviour if we do 'X'. If 'X' happens to be 'any dynamic branch headed
> by a volatile load' that's fine by me.

  Is a compiler hack really the right way to go here ?

  After all, if you do get this feature added it will make it
  harder to compile the kernel with other compilers (*cough 
  LLVM cough*), or older versions of gcc.  Plus code like this
  is often subject to very aggressive optimization and all it
  takes is one bug in the compiler implementation and you lose
  the gains you were trying for.

  My suggestion as an alternative is to use assembler instead.
  That way you can guarantee that you get the instructions you
  want in the order that you want them.  It should be fairly
  straightforward to create a macro or inline function that
  contains the necessary code and this can be done once and 
  then used wherever the functionality is required.

Cheers
  Nick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: Control Dependencies vs C Compilers
  2020-10-06 15:16         ` Nick Clifton
@ 2020-10-06 15:37           ` David Laight
  2020-10-06 15:50             ` Paul E. McKenney
  0 siblings, 1 reply; 21+ messages in thread
From: David Laight @ 2020-10-06 15:37 UTC (permalink / raw)
  To: 'Nick Clifton', Peter Zijlstra, stern
  Cc: linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

>   My suggestion as an alternative is to use assembler instead.
>   That way you can guarantee that you get the instructions you
>   want in the order that you want them.  It should be fairly
>   straightforward to create a macro or inline function that
>   contains the necessary code and this can be done once and
>   then used wherever the functionality is required.

C exists because K&R got fed up of writing pdp-11 assembler.
Compared to some modern ones it is nice and easy to write
(I'm old enough to have used pdp-11.)

You can't put control dependencies inside C asm statements.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 15:37           ` David Laight
@ 2020-10-06 15:50             ` Paul E. McKenney
  2020-10-06 16:10               ` Willy Tarreau
  0 siblings, 1 reply; 21+ messages in thread
From: Paul E. McKenney @ 2020-10-06 15:50 UTC (permalink / raw)
  To: David Laight
  Cc: 'Nick Clifton',
	Peter Zijlstra, stern, linux-toolchains, Will Deacon,
	linux-kernel, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 03:37:24PM +0000, David Laight wrote:
> >   My suggestion as an alternative is to use assembler instead.
> >   That way you can guarantee that you get the instructions you
> >   want in the order that you want them.  It should be fairly
> >   straightforward to create a macro or inline function that
> >   contains the necessary code and this can be done once and
> >   then used wherever the functionality is required.
> 
> C exists because K&R got fed up of writing pdp-11 assembler.
> Compared to some modern ones it is nice and easy to write
> (I'm old enough to have used pdp-11.)
> 
> You can't put control dependencies inside C asm statements.

What David said!

And not only are modern machine languages quite complex and strange
compared to that of the PDP-11, but writing core Linux-kernel code
in assembler requires writing it 25 times, once for each of the
supported architectures.  And even that is an underestimate because
some architectures have multiple variants, for but one example, arm's
multiple instruction sets.

Comparing 25+ assembly languages to but two compilers most definitely
motivates looking hard at doing something with the compilers.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 15:50             ` Paul E. McKenney
@ 2020-10-06 16:10               ` Willy Tarreau
  2020-10-06 16:22                 ` David Laight
  0 siblings, 1 reply; 21+ messages in thread
From: Willy Tarreau @ 2020-10-06 16:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Laight, 'Nick Clifton',
	Peter Zijlstra, stern, linux-toolchains, Will Deacon,
	linux-kernel, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 08:50:52AM -0700, Paul E. McKenney wrote:
> Comparing 25+ assembly languages to but two compilers most definitely
> motivates looking hard at doing something with the compilers.

+1, especially since the kernel is not special; anyone working with
threads faces the same issues in userland, which are often hidden
behind the implicit whole-memory clobbers of certain operations or
the call to mutex manipulation functions, but which become a reality
again as soon as you go slightly away from these and try to use
lockless mechanisms.

Willy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: Control Dependencies vs C Compilers
  2020-10-06 16:10               ` Willy Tarreau
@ 2020-10-06 16:22                 ` David Laight
  2020-10-06 16:31                   ` Paul E. McKenney
  0 siblings, 1 reply; 21+ messages in thread
From: David Laight @ 2020-10-06 16:22 UTC (permalink / raw)
  To: 'Willy Tarreau', Paul E. McKenney
  Cc: 'Nick Clifton',
	Peter Zijlstra, stern, linux-toolchains, Will Deacon,
	linux-kernel, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

From: Willy Tarreau
> Sent: 06 October 2020 17:11
> 
> On Tue, Oct 06, 2020 at 08:50:52AM -0700, Paul E. McKenney wrote:
> > Comparing 25+ assembly languages to but two compilers most definitely
> > motivates looking hard at doing something with the compilers.
> 
> +1, especially since the kernel is not special; anyone working with
> threads faces the same issues in userland, which are often hidden
> behind the implicit whole-memory clobbers of certain operations or
> the call to mutex manipulation functions, but which become a reality
> again as soon as you go slightly away from these and try to use
> lockless mechanisms.

AFAICT most windows and android apps completely ignore the problem
of thread locking - which is why the crash and lock up all the time :-)

I've spent most of the day looking at some library traces from a
customer bug.
I almost suspect a bug in the pthread mutex code on their system.
They are using a nice, modern, 3.10.0-957.el7.x86_64 kernel.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 16:22                 ` David Laight
@ 2020-10-06 16:31                   ` Paul E. McKenney
  0 siblings, 0 replies; 21+ messages in thread
From: Paul E. McKenney @ 2020-10-06 16:31 UTC (permalink / raw)
  To: David Laight
  Cc: 'Willy Tarreau', 'Nick Clifton',
	Peter Zijlstra, stern, linux-toolchains, Will Deacon,
	linux-kernel, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 04:22:10PM +0000, David Laight wrote:
> From: Willy Tarreau
> > Sent: 06 October 2020 17:11
> > 
> > On Tue, Oct 06, 2020 at 08:50:52AM -0700, Paul E. McKenney wrote:
> > > Comparing 25+ assembly languages to but two compilers most definitely
> > > motivates looking hard at doing something with the compilers.
> > 
> > +1, especially since the kernel is not special; anyone working with
> > threads faces the same issues in userland, which are often hidden
> > behind the implicit whole-memory clobbers of certain operations or
> > the call to mutex manipulation functions, but which become a reality
> > again as soon as you go slightly away from these and try to use
> > lockless mechanisms.
> 
> AFAICT most windows and android apps completely ignore the problem
> of thread locking - which is why the crash and lock up all the time :-)
> 
> I've spent most of the day looking at some library traces from a
> customer bug.
> I almost suspect a bug in the pthread mutex code on their system.
> They are using a nice, modern, 3.10.0-957.el7.x86_64 kernel.

It would be good if the compiler were more helpful!  Failing that, if
-something- could be more helpful!!!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 11:47 Control Dependencies vs C Compilers Peter Zijlstra
  2020-10-06 12:37 ` David Laight
@ 2020-10-06 21:20 ` Florian Weimer
  2020-10-07  9:32   ` Peter Zijlstra
  1 sibling, 1 reply; 21+ messages in thread
From: Florian Weimer @ 2020-10-06 21:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

* Peter Zijlstra:

> Our Documentation/memory-barriers.txt has a Control Dependencies section
> (which I shall not replicate here for brevity) which lists a number of
> caveats. But in general the work-around we use is:
>
> 	x = READ_ONCE(*foo);
> 	if (x > 42)
> 		WRITE_ONCE(*bar, 1);
>
> Where READ/WRITE_ONCE() cast the variable volatile. The volatile
> qualifier dissuades the compiler from assuming it knows things and we
> then hope it will indeed emit the branch like we'd expect.
>
>
> Now, hoping the compiler generates correct code is clearly not ideal and
> very dangerous indeed. Which is why my question to the compiler folks
> assembled here is:
>
>   Can we get a C language extention for this?

For what exactly?

Do you want a compiler that never simplifies conditional expressions
(like some people want compilers that never re-associate floating point
operations)?

With a bit of effort, we could prototype such a compiler and run
benchmarks against a kernel that was built using it.  But I'm not sure
if that's a good use of resources.

> And while we have a fair number (and growing) existing users of this in
> the kernel, I'd not be adverse to having to annotate them.

But not using READ_ONCE and WRITE_ONCE?

I think in GCC, they are called __atomic_load_n(foo, __ATOMIC_RELAXED)
and __atomic_store_n(foo, __ATOMIC_RELAXED).  GCC can't optimize relaxed
MO loads and stores because the C memory model is defective and does not
actually guarantee the absence of out-of-thin-air values (a property it
was supposed to have).  Obviously, actual implementations want to
provide this guarantee.  So it's really impossible right now to argue
about this in any formal way and determine the validity of optimizations
(which is why there are none, hopefully).

In standard C, there is <stdatomic.h>, but its relaxed MO loads and
stores can of course be optimized, so you'll need a compiler extension
here.

A different way of annotating this would be a variant of _Atomic where
plain accesses have relaxed MO, not seq-cst MO.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-06 21:20 ` Florian Weimer
@ 2020-10-07  9:32   ` Peter Zijlstra
  2020-10-07 10:20     ` Florian Weimer
  2020-10-07 10:30     ` Willy Tarreau
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-07  9:32 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

On Tue, Oct 06, 2020 at 11:20:01PM +0200, Florian Weimer wrote:
> * Peter Zijlstra:
> 
> > Our Documentation/memory-barriers.txt has a Control Dependencies section
> > (which I shall not replicate here for brevity) which lists a number of
> > caveats. But in general the work-around we use is:
> >
> > 	x = READ_ONCE(*foo);
> > 	if (x > 42)
> > 		WRITE_ONCE(*bar, 1);
> >
> > Where READ/WRITE_ONCE() cast the variable volatile. The volatile
> > qualifier dissuades the compiler from assuming it knows things and we
> > then hope it will indeed emit the branch like we'd expect.
> >
> >
> > Now, hoping the compiler generates correct code is clearly not ideal and
> > very dangerous indeed. Which is why my question to the compiler folks
> > assembled here is:
> >
> >   Can we get a C language extention for this?
> 
> For what exactly?

A branch that cannot be optimized away and prohibits lifting stores
over. One possible suggestion would be allowing the volatile keyword as
a qualifier to if.

	x = *foo;
	volatile if (x > 42)
		*bar = 1;

This would tell the compiler that the condition is special in that it
must emit a conditional branch instruction and that it must not lift
stores (or sequence points) over it.

> Do you want a compiler that never simplifies conditional expressions
> (like some people want compilers that never re-associate floating point
> operations)?

No. I'm fine with optimizing things in general, I just want to be able
to control/limit it for a few specific cases.

> > And while we have a fair number (and growing) existing users of this in
> > the kernel, I'd not be adverse to having to annotate them.
> 
> But not using READ_ONCE and WRITE_ONCE?

I'm OK with READ_ONCE(), but the WRITE_ONCE() doesn't help much, if
anything. The compiler is always allowed to lift stores, regardless of
the qualifiers used.

> I think in GCC, they are called __atomic_load_n(foo, __ATOMIC_RELAXED)
> and __atomic_store_n(foo, __ATOMIC_RELAXED).  GCC can't optimize relaxed
> MO loads and stores because the C memory model is defective and does not
> actually guarantee the absence of out-of-thin-air values (a property it
> was supposed to have).

AFAIK people want to get that flaw in the C memory model fixed (which to
me seemd like a very good idea).

Also, AFAIK the compiler would be allowed to lift __atomic_store_n(foo,
__ATOMIC_RELAXED) out of a branch.

> A different way of annotating this would be a variant of _Atomic where
> plain accesses have relaxed MO, not seq-cst MO.

So Linux isn't going to use _Atomic, we disagree with the C memory model
too much. Also, volatile is perfectly sufficient for things.

I know there's a bunch of people in the C committee that want to get rid
of volatile, but that's just not going to happen in the real world,
there's too much volatile out there.

More to the point, Linux already relies on this without the later stores
being annotated, and it works because lifting those stores just really
doesn't make sense (and it's further constrained by sequence points,
although I'm not sure what, if anything, the effect of LTO optimization
is on sequence points -- inline for example removes sequence points,
which is sometimes scary as heck).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-07  9:32   ` Peter Zijlstra
@ 2020-10-07 10:20     ` Florian Weimer
  2020-10-07 11:50       ` Peter Zijlstra
  2020-10-07 10:30     ` Willy Tarreau
  1 sibling, 1 reply; 21+ messages in thread
From: Florian Weimer @ 2020-10-07 10:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

* Peter Zijlstra:

> On Tue, Oct 06, 2020 at 11:20:01PM +0200, Florian Weimer wrote:
>> * Peter Zijlstra:
>> 
>> > Our Documentation/memory-barriers.txt has a Control Dependencies section
>> > (which I shall not replicate here for brevity) which lists a number of
>> > caveats. But in general the work-around we use is:
>> >
>> > 	x = READ_ONCE(*foo);
>> > 	if (x > 42)
>> > 		WRITE_ONCE(*bar, 1);
>> >
>> > Where READ/WRITE_ONCE() cast the variable volatile. The volatile
>> > qualifier dissuades the compiler from assuming it knows things and we
>> > then hope it will indeed emit the branch like we'd expect.
>> >
>> >
>> > Now, hoping the compiler generates correct code is clearly not ideal and
>> > very dangerous indeed. Which is why my question to the compiler folks
>> > assembled here is:
>> >
>> >   Can we get a C language extention for this?
>> 
>> For what exactly?
>
> A branch that cannot be optimized away and prohibits lifting stores
> over. One possible suggestion would be allowing the volatile keyword as
> a qualifier to if.
>
> 	x = *foo;
> 	volatile if (x > 42)
> 		*bar = 1;
>
> This would tell the compiler that the condition is special in that it
> must emit a conditional branch instruction and that it must not lift
> stores (or sequence points) over it.

But it's not the if statement, but the expression in it, right?  So this
would not be a valid transformation:

 	x = *foo;
        bool flag = x > 42;
 	volatile if (flag)
 		*bar = 1;

And if we had this:

 	unsigned x = *foo;
 	volatile if (x >= 17 && x < 42)
 		*bar = 1;

Would it be valid to transform this into (assuming that I got the
arithmetic correct):

 	unsigned x = *foo;
 	volatile if ((x - 17) < 25)
 		*bar = 1;

Or would this destroy the magic because arithmetic happens on the value
before the comparison?

>> But not using READ_ONCE and WRITE_ONCE?
>
> I'm OK with READ_ONCE(), but the WRITE_ONCE() doesn't help much, if
> anything. The compiler is always allowed to lift stores, regardless of
> the qualifiers used.

I would hope that with some level of formalization, it can be shown that
no additional synchronization is necessary beyond the load/conditional
sequence.

>> I think in GCC, they are called __atomic_load_n(foo, __ATOMIC_RELAXED)
>> and __atomic_store_n(foo, __ATOMIC_RELAXED).  GCC can't optimize relaxed
>> MO loads and stores because the C memory model is defective and does not
>> actually guarantee the absence of out-of-thin-air values (a property it
>> was supposed to have).
>
> AFAIK people want to get that flaw in the C memory model fixed (which to
> me seemd like a very good idea).

It's been a long time since people realized that this problem exists,
with several standard releases since then.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-07  9:32   ` Peter Zijlstra
  2020-10-07 10:20     ` Florian Weimer
@ 2020-10-07 10:30     ` Willy Tarreau
  1 sibling, 0 replies; 21+ messages in thread
From: Willy Tarreau @ 2020-10-07 10:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Weimer, linux-toolchains, Will Deacon, Paul McKenney,
	linux-kernel, stern, parri.andrea, boqun.feng, npiggin, dhowells,
	j.alglave, luc.maranget, akiyks, dlustig, joel, torvalds

On Wed, Oct 07, 2020 at 11:32:43AM +0200, Peter Zijlstra wrote:
> A branch that cannot be optimized away and prohibits lifting stores
> over. One possible suggestion would be allowing the volatile keyword as
> a qualifier to if.
> 
> 	x = *foo;
> 	volatile if (x > 42)
> 		*bar = 1;
> 
> This would tell the compiler that the condition is special in that it
> must emit a conditional branch instruction and that it must not lift
> stores (or sequence points) over it.

This test is interesting, because if foo and bar are of the same type,
nothing prevents them from aliasing and the compiler cannot make wild
guesses on them (i.e. they may be plain memory as well as memory-mapped
registers).

Extending it like this shows a difference between the use of volatile
and __atomic_{load,store}_n. While both are correct in that each access
is properly performed, for an unknown reason the compiler decided to
implement two distinct branches in the atomic case and to inflate the
code:

$ gcc -v
gcc version 9.3.0 (GCC)

$ cat foo-volatile.c
long foobar(long *foo, long *bar)
{
        *(volatile long *)bar = 10;
        if (*(volatile long *)foo <= 42)
                *(volatile long *)bar = 64;
        if (*(volatile long *)foo > 42)
                *(volatile long *)bar = 0;
        return *(volatile long *)bar;
}
$ gcc -c -O2 foo-volatile.c
$ objdump -dr foo-volatile.o 
0000000000000000 <foobar>:
   0:   48 c7 06 0a 00 00 00    movq   $0xa,(%rsi)
   7:   48 8b 07                mov    (%rdi),%rax
   a:   48 83 f8 2a             cmp    $0x2a,%rax
   e:   7f 07                   jg     17 <foobar+0x17>
  10:   48 c7 06 40 00 00 00    movq   $0x40,(%rsi)
  17:   48 8b 07                mov    (%rdi),%rax
  1a:   48 83 f8 2a             cmp    $0x2a,%rax
  1e:   7e 07                   jle    27 <foobar+0x27>
  20:   48 c7 06 00 00 00 00    movq   $0x0,(%rsi)
  27:   48 8b 06                mov    (%rsi),%rax
  2a:   c3                      retq   


$ cat foo-atomic.c  
long foobar(long *foo, long *bar)
{
        __atomic_store_n(bar, 10, __ATOMIC_RELAXED);
        if (__atomic_load_n(foo, __ATOMIC_RELAXED) <= 42)
                __atomic_store_n(bar, 64, __ATOMIC_RELAXED);
        if (__atomic_load_n(foo, __ATOMIC_RELAXED) > 42)
                __atomic_store_n(bar, 0, __ATOMIC_RELAXED);
        return __atomic_load_n(bar, __ATOMIC_RELAXED);
}
$ objdump -dr foo-atomic.o 
0000000000000000 <foobar>:
   0:   48 c7 06 0a 00 00 00    movq   $0xa,(%rsi)
   7:   48 8b 07                mov    (%rdi),%rax
   a:   48 83 f8 2a             cmp    $0x2a,%rax
   e:   7e 10                   jle    20 <foobar+0x20>
  10:   48 8b 07                mov    (%rdi),%rax
  13:   48 83 f8 2a             cmp    $0x2a,%rax
  17:   7f 17                   jg     30 <foobar+0x30>
  19:   48 8b 06                mov    (%rsi),%rax
  1c:   c3                      retq   
  1d:   0f 1f 00                nopl   (%rax)
  20:   48 c7 06 40 00 00 00    movq   $0x40,(%rsi)
  27:   48 8b 07                mov    (%rdi),%rax
  2a:   48 83 f8 2a             cmp    $0x2a,%rax
  2e:   7e e9                   jle    19 <foobar+0x19>
  30:   48 c7 06 00 00 00 00    movq   $0x0,(%rsi)
  37:   48 8b 06                mov    (%rsi),%rax
  3a:   c3                      retq   

When building at -Os both produce the same code as the volatile version
above. It *seems* to me that the volatile version always produces more
optimal code, but is it always correct ? This is just an illustration of
how tricky this can currently be and how confusing it can sometimes be
for the developer to make sure the desired code is emitted in a few
special cases. And just for this, having the compiler support more easily
predictable constructs would be a nice improvement.

Willy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-07 10:20     ` Florian Weimer
@ 2020-10-07 11:50       ` Peter Zijlstra
  2020-10-07 17:11         ` Paul E. McKenney
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-07 11:50 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-toolchains, Will Deacon, Paul McKenney, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

On Wed, Oct 07, 2020 at 12:20:41PM +0200, Florian Weimer wrote:
> * Peter Zijlstra:

> > A branch that cannot be optimized away and prohibits lifting stores
> > over. One possible suggestion would be allowing the volatile keyword as
> > a qualifier to if.
> >
> > 	x = *foo;
> > 	volatile if (x > 42)
> > 		*bar = 1;
> >
> > This would tell the compiler that the condition is special in that it
> > must emit a conditional branch instruction and that it must not lift
> > stores (or sequence points) over it.
> 
> But it's not the if statement, but the expression in it, right? 

No, it *IS* the if statement, the magic is a conditional branch
instruction and the fact that CPUs are not allowed to speculate stores
(which would lead to instant OOTA).

> So this would not be a valid transformation:
> 
>  	x = *foo;
>         bool flag = x > 42;
>  	volatile if (flag)
>  		*bar = 1;

It would be valid, it still ensures the load of *foo happens before the
store of *bar.

> And if we had this:
> 
>  	unsigned x = *foo;
>  	volatile if (x >= 17 && x < 42)
>  		*bar = 1;
> 
> Would it be valid to transform this into (assuming that I got the
> arithmetic correct):
> 
>  	unsigned x = *foo;
>  	volatile if ((x - 17) < 25)
>  		*bar = 1;
> 
> Or would this destroy the magic because arithmetic happens on the value
> before the comparison?

Nope, that'd still be good. The critical part is that the resolution of
the conditional branch depend on the load. All these transformations
preserve that.

So we use the data dependency between the load and the branch
instruction coupled with the inability to speculate stores, to generate
a LOAD to STORE ordering.

> >> But not using READ_ONCE and WRITE_ONCE?
> >
> > I'm OK with READ_ONCE(), but the WRITE_ONCE() doesn't help much, if
> > anything. The compiler is always allowed to lift stores, regardless of
> > the qualifiers used.
> 
> I would hope that with some level of formalization, it can be shown that
> no additional synchronization is necessary beyond the load/conditional
> sequence.

Agreed. Those are the critical part, the tricky bit is ensuring the
compiler doesn't lift stuff over the condition.

> >> I think in GCC, they are called __atomic_load_n(foo, __ATOMIC_RELAXED)
> >> and __atomic_store_n(foo, __ATOMIC_RELAXED).  GCC can't optimize relaxed
> >> MO loads and stores because the C memory model is defective and does not
> >> actually guarantee the absence of out-of-thin-air values (a property it
> >> was supposed to have).
> >
> > AFAIK people want to get that flaw in the C memory model fixed (which to
> > me seemd like a very good idea).
> 
> It's been a long time since people realized that this problem exists,
> with several standard releases since then.

I've been given to believe it is a hard problem. Personally I hold the
opinion that prohibiting store speculation (of all kinds) is both
necesary and sufficient to avoid OOTA. But I have 0 proof for that.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-07 11:50       ` Peter Zijlstra
@ 2020-10-07 17:11         ` Paul E. McKenney
  2020-10-07 21:07           ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Paul E. McKenney @ 2020-10-07 17:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Weimer, linux-toolchains, Will Deacon, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

On Wed, Oct 07, 2020 at 01:50:54PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 12:20:41PM +0200, Florian Weimer wrote:
> > * Peter Zijlstra:

[ . . . ]

> > >> I think in GCC, they are called __atomic_load_n(foo, __ATOMIC_RELAXED)
> > >> and __atomic_store_n(foo, __ATOMIC_RELAXED).  GCC can't optimize relaxed
> > >> MO loads and stores because the C memory model is defective and does not
> > >> actually guarantee the absence of out-of-thin-air values (a property it
> > >> was supposed to have).
> > >
> > > AFAIK people want to get that flaw in the C memory model fixed (which to
> > > me seemd like a very good idea).
> > 
> > It's been a long time since people realized that this problem exists,
> > with several standard releases since then.
> 
> I've been given to believe it is a hard problem. Personally I hold the
> opinion that prohibiting store speculation (of all kinds) is both
> necesary and sufficient to avoid OOTA. But I have 0 proof for that.

There are proofs for some definitions of store speculation, for example,
as proposed by Demsky and Boehm [1] and as prototyped by Demsky's student,
Peizhao Ou [2].  But these require marking all accesses and end up being
optimized variants of acquire load and release store.  One optimization
is that if you have a bunch of loads followed by a bunch of stores,
the compiler can emit a single memory-barrier instruction between the
last load and the first store.

I am not a fan of this approach.

Challenges include:

o	Unmarked accesses.  Compilers are quite aggressive about
	moving normal code.

o	Separately compiled code.  For example, does the compiler have
	unfortunatel optimization opportunities when "volatile if" 
	appears in one translation unit and the dependent stores in
	some other translation unit?

o	LTO, as has already been mentioned in this thread.

Probably other issues as well, but a starting point.

							Thanx, Paul

[1]	https://dl.acm.org/doi/10.1145/2618128.2618134
	"Outlawing ghosts: avoiding out-of-thin-air results"
	Hans-J. Boehm and Brian Demsky.

[2]	https://escholarship.org/uc/item/2vm546k1
	"An Initial Study of Two Approaches to Eliminating Out-of-Thin-Air
	Results" Peizhao Ou.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-07 17:11         ` Paul E. McKenney
@ 2020-10-07 21:07           ` Peter Zijlstra
  2020-10-07 21:20             ` Paul E. McKenney
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2020-10-07 21:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Florian Weimer, linux-toolchains, Will Deacon, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

On Wed, Oct 07, 2020 at 10:11:07AM -0700, Paul E. McKenney wrote:

> Challenges include:
> 
> o	Unmarked accesses.  Compilers are quite aggressive about
> 	moving normal code.

Which is why this thread exists :-) We wants to dis-allow lifting the
stores over our volatile-if.

> o	Separately compiled code.  For example, does the compiler have
> 	unfortunatel optimization opportunities when "volatile if" 
> 	appears in one translation unit and the dependent stores in
> 	some other translation unit?

It can hardly lift anything outside a TU (barring the next point). So I
don't see how it can go wrong here. This is in fact the case with the
perf ringbuffer. The ctrl-dep lives in a different TU from the
stores.

> o	LTO, as has already been mentioned in this thread.

So I would probably advocate the volatile-if to be a full sync point,
and LTO would have to preserve that.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Control Dependencies vs C Compilers
  2020-10-07 21:07           ` Peter Zijlstra
@ 2020-10-07 21:20             ` Paul E. McKenney
  0 siblings, 0 replies; 21+ messages in thread
From: Paul E. McKenney @ 2020-10-07 21:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Florian Weimer, linux-toolchains, Will Deacon, linux-kernel,
	stern, parri.andrea, boqun.feng, npiggin, dhowells, j.alglave,
	luc.maranget, akiyks, dlustig, joel, torvalds

On Wed, Oct 07, 2020 at 11:07:17PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2020 at 10:11:07AM -0700, Paul E. McKenney wrote:
> 
> > Challenges include:
> > 
> > o	Unmarked accesses.  Compilers are quite aggressive about
> > 	moving normal code.
> 
> Which is why this thread exists :-) We wants to dis-allow lifting the
> stores over our volatile-if.

Of course.  But you should expect this point to be a continual source
of shock and surprise to compiler folks.  ;-)

> > o	Separately compiled code.  For example, does the compiler have
> > 	unfortunatel optimization opportunities when "volatile if" 
> > 	appears in one translation unit and the dependent stores in
> > 	some other translation unit?
> 
> It can hardly lift anything outside a TU (barring the next point). So I
> don't see how it can go wrong here. This is in fact the case with the
> perf ringbuffer. The ctrl-dep lives in a different TU from the
> stores.

I don't see how it could either, but I have been surprised before.

> > o	LTO, as has already been mentioned in this thread.
> 
> So I would probably advocate the volatile-if to be a full sync point,
> and LTO would have to preserve that.

Completely agreed!  And probably not the only place that LTO needs
to be reined in a bit.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-10-07 21:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-06 11:47 Control Dependencies vs C Compilers Peter Zijlstra
2020-10-06 12:37 ` David Laight
2020-10-06 12:49   ` Willy Tarreau
2020-10-06 13:31   ` Peter Zijlstra
2020-10-06 14:23     ` stern
2020-10-06 14:43       ` Peter Zijlstra
2020-10-06 15:16         ` Nick Clifton
2020-10-06 15:37           ` David Laight
2020-10-06 15:50             ` Paul E. McKenney
2020-10-06 16:10               ` Willy Tarreau
2020-10-06 16:22                 ` David Laight
2020-10-06 16:31                   ` Paul E. McKenney
2020-10-06 15:07     ` David Laight
2020-10-06 21:20 ` Florian Weimer
2020-10-07  9:32   ` Peter Zijlstra
2020-10-07 10:20     ` Florian Weimer
2020-10-07 11:50       ` Peter Zijlstra
2020-10-07 17:11         ` Paul E. McKenney
2020-10-07 21:07           ` Peter Zijlstra
2020-10-07 21:20             ` Paul E. McKenney
2020-10-07 10:30     ` Willy Tarreau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.