* Potentially missing "memory" clobbers in bitops.h for x86
@ 2019-03-28 14:14 Alexander Potapenko
  2019-03-28 16:22 ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Potapenko @ 2019-03-28 14:14 UTC (permalink / raw)
  To: Paul McKenney, Peter Zijlstra, H. Peter Anvin, Ingo Molnar
  Cc: LKML, Dmitriy Vyukov, James Y Knight

Hello,

arch/x86/include/asm/bitops.h defines clear_bit(nr, addr) for
non-constant |nr| values as follows:

void clear_bit(long nr, volatile unsigned long *addr) {
  asm volatile("lock; btr %1,%0"
    : "+m"(*(volatile long *)addr)
    : "Ir" (nr));
  }
(https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/bitops.h#L111)

According to the comments in the file, |nr| may be arbitrarily large.
However the assembly constraints only imply that the first unsigned
long value at |addr| is written to.
This may result in the compiler ignoring the effect of the asm directive.

Consider the following example (https://godbolt.org/z/naTmjn):

  #include <stdio.h>
  void clear_bit(long nr, volatile unsigned long *addr) {
    asm volatile("lock; btr %1,%0"
      : "+m"(*(volatile long *)addr)
      : "Ir" (nr));
  }

  unsigned long foo() {
    unsigned long addr[2] = {1, 2};
    clear_bit(65, addr);
    return addr[0] + addr[1];
  }

  int main() {
    printf("foo: %lu\n", foo());
  }

Depending on the optimization level, the program may print either 1
(for -O0 and -O1) or 3 (for -O2 and -O3).
This is because on higher optimization levels GCC assumes that addr[1]
is unchanged and directly propagates the constant to the result.
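
In other words, at -O2 the compiler effectively treats foo() as if it
had been written as follows (an illustration of the constant
propagation, not actual compiler output):

  unsigned long foo() {
    unsigned long addr[2] = {1, 2};
    clear_bit(65, addr);  /* the asm only declares addr[0] as written */
    return addr[0] + 2;   /* addr[1] assumed to still be 2 */
  }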

I suspect the definitions of clear_bit() and similar functions are
lacking the "memory" clobber.
But the whole file tends to be very picky about whether this clobber
needs to be applied in each case, so in the case of a performance
penalty we may need to consider alternative approaches to fixing this
code.
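
For concreteness, the most direct form of that fix would be a sketch
like the following (untested):

void clear_bit(long nr, volatile unsigned long *addr) {
  asm volatile("lock; btr %1,%0"
    : "+m"(*(volatile long *)addr)
    : "Ir" (nr)
    : "memory");  /* any memory may have been written */
}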


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-28 14:14 Potentially missing "memory" clobbers in bitops.h for x86 Alexander Potapenko
@ 2019-03-28 16:22 ` Paul E. McKenney
  2019-03-29 15:54   ` Alexander Potapenko
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2019-03-28 16:22 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On Thu, Mar 28, 2019 at 03:14:12PM +0100, Alexander Potapenko wrote:
> Hello,
> 
> arch/x86/include/asm/bitops.h defines clear_bit(nr, addr) for
> non-constant |nr| values as follows:
> 
> void clear_bit(long nr, volatile unsigned long *addr) {
>   asm volatile("lock; btr %1,%0"
>     : "+m"(*(volatile long *)addr)
>     : "Ir" (nr));
>   }
> (https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/bitops.h#L111)
> 
> According to the comments in the file, |nr| may be arbitrarily large.
> However the assembly constraints only imply that the first unsigned
> long value at |addr| is written to.
> This may result in the compiler ignoring the effect of the asm directive.
> 
> Consider the following example (https://godbolt.org/z/naTmjn):
> 
>   #include <stdio.h>
>   void clear_bit(long nr, volatile unsigned long *addr) {
>     asm volatile("lock; btr %1,%0"
>       : "+m"(*(volatile long *)addr)
>       : "Ir" (nr));
>   }
> 
>   unsigned long foo() {
>     unsigned long addr[2] = {1, 2};
>     clear_bit(65, addr);
>     return addr[0] + addr[1];
>   }
> 
>   int main() {
>     printf("foo: %lu\n", foo());
>   }
> 
> Depending on the optimization level, the program may print either 1
> (for -O0 and -O1) or 3 (for -O2 and -O3).
> This is because on higher optimization levels GCC assumes that addr[1]
> is unchanged and directly propagates the constant to the result.
> 
> I suspect the definitions of clear_bit() and similar functions are
> lacking the "memory" clobber.
> But the whole file tends to be very picky about whether this clobber
> needs to be applied in each case, so in the case of a performance
> penalty we may need to consider alternative approaches to fixing this
> code.

Is there a way of indicating a clobber only on the specific location
affected?  I suppose that one approach would be to calculate in C code
a pointer to the specific element of the addr[] array, which would
put the specific clobbered memory location onto the outputs list.
Or keep the current calculation, but also add "addr[nr / sizeof(long)]"
to the output list, thus telling the compiler exactly what is being
clobbered.  Assuming that actually works...
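
Something along these lines, perhaps (completely untested, and assuming
the divisor should really be the number of bits per long, that is,
BITS_PER_LONG, rather than sizeof(long)):

void clear_bit(long nr, volatile unsigned long *addr)
{
        asm volatile("lock; btr %2,%0"
                     : "+m" (*(volatile long *)addr),
                       /* dummy output: the word that is actually written */
                       "+m" (addr[nr / BITS_PER_LONG])
                     : "Ir" (nr));
}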

Of course, this would force the compiler to actually compute the
offset, which would slow things down.  I have no idea whether this
would be better or worse than just using the "memory" clobber.

							Thanx, Paul



* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-28 16:22 ` Paul E. McKenney
@ 2019-03-29 15:54   ` Alexander Potapenko
  2019-03-29 20:52     ` H. Peter Anvin
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Potapenko @ 2019-03-29 15:54 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, H. Peter Anvin, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On Thu, Mar 28, 2019 at 5:22 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> On Thu, Mar 28, 2019 at 03:14:12PM +0100, Alexander Potapenko wrote:
> > Hello,
> >
> > arch/x86/include/asm/bitops.h defines clear_bit(nr, addr) for
> > non-constant |nr| values as follows:
> >
> > void clear_bit(long nr, volatile unsigned long *addr) {
> >   asm volatile("lock; btr %1,%0"
> >     : "+m"(*(volatile long *)addr)
> >     : "Ir" (nr));
> >   }
> > (https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/bitops.h#L111)
> >
> > According to the comments in the file, |nr| may be arbitrarily large.
> > However the assembly constraints only imply that the first unsigned
> > long value at |addr| is written to.
> > This may result in the compiler ignoring the effect of the asm directive.
> >
> > Consider the following example (https://godbolt.org/z/naTmjn):
> >
> >   #include <stdio.h>
> >   void clear_bit(long nr, volatile unsigned long *addr) {
> >     asm volatile("lock; btr %1,%0"
> >       : "+m"(*(volatile long *)addr)
> >       : "Ir" (nr));
> >   }
> >
> >   unsigned long foo() {
> >     unsigned long addr[2] = {1, 2};
> >     clear_bit(65, addr);
> >     return addr[0] + addr[1];
> >   }
> >
> >   int main() {
> >     printf("foo: %lu\n", foo());
> >   }
> >
> > Depending on the optimization level, the program may print either 1
> > (for -O0 and -O1) or 3 (for -O2 and -O3).
> > This is because on higher optimization levels GCC assumes that addr[1]
> > is unchanged and directly propagates the constant to the result.
> >
> > I suspect the definitions of clear_bit() and similar functions are
> > lacking the "memory" clobber.
> > But the whole file tends to be very picky about whether this clobber
> > needs to be applied in each case, so in the case of a performance
> > penalty we may need to consider alternative approaches to fixing this
> > code.
>
> Is there a way of indicating a clobber only on the specific location
> affected?
We're unaware of such a way.
> I suppose that one approach would be to calculate in C code
> a pointer to the specific element of the addr[] array, which would
> put the specific clobbered memory location onto the outputs list.
> Or keep the current calculation, but also add "addr[nr / sizeof(long)]"
> to the output list, thus telling the compiler exactly what is being
> clobbered.  Assuming that actually works...
That works, but sometimes GCC emits the code calculating addr[nr/64]
before the BTR instruction even though the instruction itself never
uses that computed address.
Adding this output for clear_bit() makes around 120 kernel functions
change their sizes. I only checked a handful, but some contained code
like:
  cmovns %rdi, %rax
  sarq $6, %rax
  leaq (%rsi,%rax,8), %rax

Another idea we've come up with is to cast the pointer in BITOP_ADDR()
to a very big array: https://godbolt.org/z/apHvDc
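Roughly like this (a sketch of the idea rather than the exact code from
that link; the array bound below is an arbitrary illustration):

  /* Cast the address to a pointer to a huge array so the whole range
     is considered part of the memory operand. */
  #define BITOP_ADDR(x) "+m" (*(volatile long (*)[0x100000]) (x))

  static inline void clear_bit(long nr, volatile unsigned long *addr)
  {
    asm volatile("lock; btr %1,%0"
      : BITOP_ADDR(addr)
      : "Ir" (nr));
  }
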
Doing so prevents the compiler from caching the assembly inputs, but
has its own drawbacks.
First, it may yield incorrect assumptions about the size of an object
passed into the assembly (which may lead to e.g. dropping some size
checks).
Second, for smaller arrays these casts may result in excessive
clobbers (this happens in misc_register(), where GCC decides that the
input parameter |misc| may alias with the global |misc_minors|
bitmap).
Applying the cast to clear_bit() only changes the size of 3 kernel
functions: hub_port_reset(), misc_register(),
tick_broadcast_setup_oneshot().
None of them have visible semantic changes.

> Of course, this would force the compiler to actually compute the
> offset, which would slow things down.  I have no idea whether this
> would be better or worse than just using the "memory" clobber.
Just adding the "memory" clobber to clear_bit() changes sizes of 5
kernel functions (the three mentioned above, plus hub_activate() and
native_send_call_func_ipi()) by a small margin.
This probably means the performance impact of this clobber is
negligible in this case.
>                                                         Thanx, Paul
>
--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-29 15:54   ` Alexander Potapenko
@ 2019-03-29 20:52     ` H. Peter Anvin
  2019-03-29 21:09       ` Paul E. McKenney
  2019-04-01 15:00       ` Alexander Potapenko
  0 siblings, 2 replies; 12+ messages in thread
From: H. Peter Anvin @ 2019-03-29 20:52 UTC (permalink / raw)
  To: Alexander Potapenko, Paul McKenney
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Dmitriy Vyukov, James Y Knight

On 3/29/19 8:54 AM, Alexander Potapenko wrote:
> 
>> Of course, this would force the compiler to actually compute the
>> offset, which would slow things down.  I have no idea whether this
>> would be better or worse than just using the "memory" clobber.
> Just adding the "memory" clobber to clear_bit() changes sizes of 5
> kernel functions (the three mentioned above, plus hub_activate() and
> native_send_call_func_ipi()) by a small margin.
> This probably means the performance impact of this clobber is
> negligible in this case.

I would agree with that.

Could you perhaps verify whether or not any of the above functions
contains a currently manifest bug?

Note: the atomic versions of these functions obviously need to have
"volatile" and the clobber anyway, as they are by definition barriers
and moving memory operations around them would be a very serious error.

	-hpa





* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-29 20:52     ` H. Peter Anvin
@ 2019-03-29 21:09       ` Paul E. McKenney
  2019-03-29 21:51         ` H. Peter Anvin
  2019-04-01 15:00       ` Alexander Potapenko
  1 sibling, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2019-03-29 21:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alexander Potapenko, Peter Zijlstra, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On Fri, Mar 29, 2019 at 01:52:33PM -0700, H. Peter Anvin wrote:
> On 3/29/19 8:54 AM, Alexander Potapenko wrote:
> > 
> >> Of course, this would force the compiler to actually compute the
> >> offset, which would slow things down.  I have no idea whether this
> >> would be better or worse than just using the "memory" clobber.
> > Just adding the "memory" clobber to clear_bit() changes sizes of 5
> > kernel functions (the three mentioned above, plus hub_activate() and
> > native_send_call_func_ipi()) by a small margin.
> > This probably means the performance impact of this clobber is
> > negligible in this case.
> 
> I would agree with that.
> 
> Could you perhaps verify whether or not any of the above functions
> contains a currently manifest bug?
> 
> Note: the atomic versions of these functions obviously need to have
> "volatile" and the clobber anyway, as they are by definition barriers
> and moving memory operations around them would be a very serious error.

The atomic functions that return void don't need to order anything except
the input and output arguments.  The oddness with clear_bit() is that the
memory changed isn't necessarily the quantity referenced by the argument,
if the number of bits specified is large.

So (for example) atomic_inc() does not need a "memory" clobber, right?

							Thanx, Paul



* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-29 21:09       ` Paul E. McKenney
@ 2019-03-29 21:51         ` H. Peter Anvin
  2019-03-29 22:05           ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2019-03-29 21:51 UTC (permalink / raw)
  To: paulmck
  Cc: Alexander Potapenko, Peter Zijlstra, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On 3/29/19 2:09 PM, Paul E. McKenney wrote:
>>
>> Note: the atomic versions of these functions obviously need to have
>> "volatile" and the clobber anyway, as they are by definition barriers
>> and moving memory operations around them would be a very serious error.
> 
> The atomic functions that return void don't need to order anything except
> the input and output arguments.  The oddness with clear_bit() is that the
> memory changed isn't necessarily the quantity referenced by the argument,
> if the number of bits specified is large.
> 
> So (for example) atomic_inc() does not need a "memory" clobber, right?
> 

I don't believe that is true: the code calling it has a reasonable
expectation that previous memory operations have finished and later
memory operations have not started from the point of view of another
processor. You are more of an expert on memory ordering than I am, but
I'm 89% sure that there is plenty of code in the kernel which makes that
assumption.

	-hpa



* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-29 21:51         ` H. Peter Anvin
@ 2019-03-29 22:05           ` Paul E. McKenney
  2019-03-29 22:30             ` hpa
  2019-04-01 10:53             ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Paul E. McKenney @ 2019-03-29 22:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Alexander Potapenko, Peter Zijlstra, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On Fri, Mar 29, 2019 at 02:51:26PM -0700, H. Peter Anvin wrote:
> On 3/29/19 2:09 PM, Paul E. McKenney wrote:
> >>
> >> Note: the atomic versions of these functions obviously need to have
> >> "volatile" and the clobber anyway, as they are by definition barriers
> >> and moving memory operations around them would be a very serious error.
> > 
> > The atomic functions that return void don't need to order anything except
> > the input and output arguments.  The oddness with clear_bit() is that the
> > memory changed isn't necessarily the quantity referenced by the argument,
> > if the number of bits specified is large.
> > 
> > So (for example) atomic_inc() does not need a "memory" clobber, right?
> 
> I don't believe that is true: the code calling it has a reasonable
> expectation that previous memory operations have finished and later
> memory operations have not started from the point of view of another
> processor. You are more of an expert on memory ordering than I am, but
> I'm 89% sure that there is plenty of code in the kernel which makes that
> assumption.

From Documentation/core-api/atomic_ops.rst:

------------------------------------------------------------------------
	void atomic_add(int i, atomic_t *v);
	void atomic_sub(int i, atomic_t *v);
	void atomic_inc(atomic_t *v);
	void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these two routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
------------------------------------------------------------------------

So, no, these functions do not imply any ordering other than to the
variable modified.  This one predates my joining the Linux kernel
community.  ;-)  So any cases where someone is relying on atomic_inc()
to provide ordering are bugs.

Now for value-returning atomics, for example, atomic_inc_return(),
full ordering is indeed required.
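
To illustrate the difference, a value-returning increment on x86 looks
something like the sketch below (not the actual kernel code, just the
general shape, with the "memory" clobber providing the compiler-side
ordering):

static inline int atomic_inc_return_sketch(atomic_t *v)
{
        int inc = 1;

        /* xadd leaves the previous counter value in "inc" */
        asm volatile(LOCK_PREFIX "xaddl %0, %1"
                     : "+r" (inc), "+m" (v->counter)
                     : : "memory", "cc");
        return inc + 1;
}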

							Thanx, Paul



* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-29 22:05           ` Paul E. McKenney
@ 2019-03-29 22:30             ` hpa
  2019-04-01 10:53             ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: hpa @ 2019-03-29 22:30 UTC (permalink / raw)
  To: paulmck, Paul E. McKenney
  Cc: Alexander Potapenko, Peter Zijlstra, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On March 29, 2019 3:05:54 PM PDT, "Paul E. McKenney" <paulmck@linux.ibm.com> wrote:
>On Fri, Mar 29, 2019 at 02:51:26PM -0700, H. Peter Anvin wrote:
>> On 3/29/19 2:09 PM, Paul E. McKenney wrote:
>> >>
>> >> Note: the atomic versions of these functions obviously need to
>have
>> >> "volatile" and the clobber anyway, as they are by definition
>barriers
>> >> and moving memory operations around them would be a very serious
>error.
>> > 
>> > The atomic functions that return void don't need to order anything
>except
>> > the input and output arguments.  The oddness with clear_bit() is
>that the
>> > memory changed isn't necessarily the quantity referenced by the
>argument,
>> > if the number of bits specified is large.
>> > 
>> > So (for example) atomic_inc() does not need a "memory" clobber,
>right?
>> 
>> I don't believe that is true: the code calling it has a reasonable
>> expectation that previous memory operations have finished and later
>> memory operations have not started from the point of view of another
>> processor. You are more of an expert on memory ordering than I am,
>but
>> I'm 89% sure that there is plenty of code in the kernel which makes
>that
>> assumption.
>
>From Documentation/core-api/atomic_ops.rst:
>
>------------------------------------------------------------------------
>	void atomic_add(int i, atomic_t *v);
>	void atomic_sub(int i, atomic_t *v);
>	void atomic_inc(atomic_t *v);
>	void atomic_dec(atomic_t *v);
>
>These four routines add and subtract integral values to/from the given
>atomic_t value.  The first two routines pass explicit integers by
>which to make the adjustment, whereas the latter two use an implicit
>adjustment value of "1".
>
>One very important aspect of these two routines is that they DO NOT
>require any explicit memory barriers.  They need only perform the
>atomic_t counter update in an SMP safe manner.
>------------------------------------------------------------------------
>
>So, no, these functions do not imply any ordering other than to the
>variable modified.  This one predates my joining the Linux kernel
>community.  ;-)  So any cases where someone is relying on atomic_inc()
>to provide ordering are bugs.
>
>Now for value-returning atomics, for example, atomic_inc_return(),
>full ordering is indeed required.
>
>							Thanx, Paul

Ok.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-29 22:05           ` Paul E. McKenney
  2019-03-29 22:30             ` hpa
@ 2019-04-01 10:53             ` Peter Zijlstra
  2019-04-01 15:44               ` Paul E. McKenney
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2019-04-01 10:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: H. Peter Anvin, Alexander Potapenko, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On Fri, Mar 29, 2019 at 03:05:54PM -0700, Paul E. McKenney wrote:
> On Fri, Mar 29, 2019 at 02:51:26PM -0700, H. Peter Anvin wrote:
> > On 3/29/19 2:09 PM, Paul E. McKenney wrote:
> > >>
> > >> Note: the atomic versions of these functions obviously need to have
> > >> "volatile" and the clobber anyway, as they are by definition barriers
> > >> and moving memory operations around them would be a very serious error.
> > > 
> > > The atomic functions that return void don't need to order anything except
> > > the input and output arguments.  The oddness with clear_bit() is that the
> > > memory changed isn't necessarily the quantity referenced by the argument,
> > > if the number of bits specified is large.
> > > 
> > > So (for example) atomic_inc() does not need a "memory" clobber, right?

Correct, and many implementations do not, including x86:

static __always_inline void arch_atomic_inc(atomic_t *v)
{
	asm volatile(LOCK_PREFIX "incl %0"
		     : "+m" (v->counter));
}

> > I don't believe that is true: the code calling it has a reasonable
> > expectation that previous memory operations have finished and later
> > memory operations have not started from the point of view of another
> > processor. You are more of an expert on memory ordering than I am, but
> > I'm 89% sure that there is plenty of code in the kernel which makes that
> > assumption.
> 
> From Documentation/core-api/atomic_ops.rst:

We should delete that file.


* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-03-29 20:52     ` H. Peter Anvin
  2019-03-29 21:09       ` Paul E. McKenney
@ 2019-04-01 15:00       ` Alexander Potapenko
  1 sibling, 0 replies; 12+ messages in thread
From: Alexander Potapenko @ 2019-04-01 15:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Paul McKenney, Peter Zijlstra, Ingo Molnar, LKML, Dmitriy Vyukov,
	James Y Knight

On Fri, Mar 29, 2019 at 9:52 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 3/29/19 8:54 AM, Alexander Potapenko wrote:
> >
> >> Of course, this would force the compiler to actually compute the
> >> offset, which would slow things down.  I have no idea whether this
> >> would be better or worse than just using the "memory" clobber.
> > Just adding the "memory" clobber to clear_bit() changes sizes of 5
> > kernel functions (the three mentioned above, plus hub_activate() and
> > native_send_call_func_ipi()) by a small margin.
> > This probably means the performance impact of this clobber is
> > negligible in this case.
>
> I would agree with that.
>
> Could you perhaps verify whether or not any of the above functions
> contains a currently manifest bug?
Yes, I've checked that none of the above functions contain the bug.
For a patch adding 7 memory constraints to various functions in
bitops.h, there are already 258 functions that change their size.
I've skimmed through those with the biggest diffs and didn't find any
flaws, but that may just be luck.
I would expect such bugs to become more likely once whole-program
optimization kicks in.

> Note: the atomic versions of these functions obviously need to have
> "volatile" and the clobber anyway, as they are by definition barriers
> and moving memory operations around them would be a very serious error.
>
>         -hpa
>
>
>
--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-04-01 10:53             ` Peter Zijlstra
@ 2019-04-01 15:44               ` Paul E. McKenney
  2019-04-01 16:04                 ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2019-04-01 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, Alexander Potapenko, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight

On Mon, Apr 01, 2019 at 12:53:48PM +0200, Peter Zijlstra wrote:
> On Fri, Mar 29, 2019 at 03:05:54PM -0700, Paul E. McKenney wrote:
> > On Fri, Mar 29, 2019 at 02:51:26PM -0700, H. Peter Anvin wrote:
> > > On 3/29/19 2:09 PM, Paul E. McKenney wrote:
> > > >>
> > > >> Note: the atomic versions of these functions obviously need to have
> > > >> "volatile" and the clobber anyway, as they are by definition barriers
> > > >> and moving memory operations around them would be a very serious error.
> > > > 
> > > > The atomic functions that return void don't need to order anything except
> > > > the input and output arguments.  The oddness with clear_bit() is that the
> > > > memory changed isn't necessarily the quantity referenced by the argument,
> > > > if the number of bits specified is large.
> > > > 
> > > > So (for example) atomic_inc() does not need a "memory" clobber, right?
> 
> Correct, and many implementations do not, including x86:
> 
> static __always_inline void arch_atomic_inc(atomic_t *v)
> {
> 	asm volatile(LOCK_PREFIX "incl %0"
> 		     : "+m" (v->counter));
> }

Very good!

> > > I don't believe that is true: the code calling it has a reasonable
> > > expectation that previous memory operations have finished and later
> > > memory operations have not started from the point of view of another
> > > processor. You are more of an expert on memory ordering than I am, but
> > > I'm 89% sure that there is plenty of code in the kernel which makes that
> > > assumption.
> > 
> > From Documentation/core-api/atomic_ops.rst:
> 
> We should delete that file.

Only if all of its content is fully present elsewhere.  ;-)

							Thanx, Paul



* Re: Potentially missing "memory" clobbers in bitops.h for x86
  2019-04-01 15:44               ` Paul E. McKenney
@ 2019-04-01 16:04                 ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2019-04-01 16:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: H. Peter Anvin, Alexander Potapenko, Ingo Molnar, LKML,
	Dmitriy Vyukov, James Y Knight, Will Deacon

On Mon, Apr 01, 2019 at 08:44:14AM -0700, Paul E. McKenney wrote:
> On Mon, Apr 01, 2019 at 12:53:48PM +0200, Peter Zijlstra wrote:
> > On Fri, Mar 29, 2019 at 03:05:54PM -0700, Paul E. McKenney wrote:

> > > From Documentation/core-api/atomic_ops.rst:
> > 
> > We should delete that file.
> 
> Only if all of its content is fully present elsewhere.  ;-)

Documentation/atomic_t.txt and Documentation/atomic_bitops.txt _should_
cover all of it, I _think_. If there's anything missing, please tell us
and we'll write more documents.

And I suppose we should amend atomic_bitops.txt with a reference to
include/asm-generic/bitops/atomic.h, which fully implements the atomic
bitops using atomic_t.
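
For reference, the generic version is roughly of this shape (quoting
from memory, so treat it as a sketch rather than the exact source):

static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
{
        p += BIT_WORD(nr);      /* word containing the bit */
        atomic_long_andnot(BIT_MASK(nr), (atomic_long_t *)p);
}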


