All of lore.kernel.org
 help / color / mirror / Atom feed
* Potentially Broken Address Dependency via test_bit() When Compiling With Clang
@ 2021-10-27 10:19 Paul Heidekrüger
  2021-10-27 11:56 ` David Laight
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Paul Heidekrüger @ 2021-10-27 10:19 UTC (permalink / raw)
  To: paulmck, will, peterz, boqun.feng, stern, parri.andrea,
	linux-kernel, llvm
  Cc: elver, charalampos.mainas, pramod.bhatotia

Hi all,

For my bachelor thesis, I have been working on the infamous problem of 
potentially broken dependency orderings in the Linux kernel. I'm being 
advised by Marco Elver, Charalampos Mainas, Pramod Bhatotia (Cc'd).

For context, see: 
https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf

Our approach consists of two LLVM compiler passes which annotate 
dependencies in unoptimised intermediate representation (IR) and verify 
the annotated dependencies in optimised IR. ATM, the passes only 
recognise a subset of address dependencies - everything is still WIP ;-)

We have been cross-compiling with a slightly modified version of 
allyesconfig for arm64, and the passes have now found a case that we 
would like to share with LKML for feedback: an address dependency being 
broken (?) through compiler optimisations in 
fs/afs/addr_list.c::afs_iterate_addresses().

Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:

> [...]
>   index = READ_ONCE(ac->alist->preferred);
>   if (test_bit(index, &set))
>     goto selected;
> [...]

where test_bit() expands to the following in 
include/asm-generic/bitops/non-atomic.h, lines 115 - 122:

> static __always_inline int
> arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> {
>   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
> #define test_bit arch_test_bit

The address dependency gets preserved in unoptimised IR since the virtual register %33 transitively depends on %28:

> %28 = load volatile i8, i8* %preferred, align 2, !annotation !15
> store i8 %28, i8* %tmp21, align 1
> %29 = load i8, i8* %tmp21, align 1
> %conv23 = zext i8 %29 to i32
> store i32 %conv23, i32* %index, align 4
> %30 = load i32, i32* %index, align 4
> store i32 %30, i32* %nr.addr.i, align 4
> store i64* %set, i64** %addr.addr.i, align 8
> %31 = load i64*, i64** %addr.addr.i, align 8
> %32 = load i32, i32* %nr.addr.i, align 4
> %div.i = udiv i32 %32, 64
> %idxprom.i = zext i32 %div.i to i64
> %arrayidx.i = getelementptr i64, i64* %31, i64 %idxprom.i
> %33 = load volatile i64, i64* %arrayidx.i, align 8, !annotation !16

In optimised IR, there is no dependency between the two volatile loads 
anymore:

> %11 = load volatile i8, i8* %preferred, align 2, !annotation !19
> %conv25 = zext i8 %11 to i32
> %set.0. = load volatile i64, i64* %set, align 8

Now, since @nr traces back to the READ_ONCE() to @index, does this make 
the load from @addr in test_bit() address-dependent on that READ_ONCE()? 
Should the load from @addr therefore be ordered against the READ_ONCE()?

Many thanks,
Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-27 10:19 Potentially Broken Address Dependency via test_bit() When Compiling With Clang Paul Heidekrüger
@ 2021-10-27 11:56 ` David Laight
  2021-10-27 12:17 ` Peter Zijlstra
  2021-10-27 14:27 ` Alan Stern
  2 siblings, 0 replies; 11+ messages in thread
From: David Laight @ 2021-10-27 11:56 UTC (permalink / raw)
  To: 'Paul Heidekrüger',
	paulmck, will, peterz, boqun.feng, stern, parri.andrea,
	linux-kernel, llvm
  Cc: elver, charalampos.mainas, pramod.bhatotia

From: Paul Heidekrüger
> Sent: 27 October 2021 11:20
> 
> For my bachelor thesis, I have been working on the infamous problem of
> potentially broken dependency orderings in the Linux kernel. I'm being
> advised by Marco Elver, Charalampos Mainas, Pramod Bhatotia (Cc'd).
> 
> For context, see:
> https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--
> _Dependency_ordering.pdf
> 
> Our approach consists of two LLVM compiler passes which annotate
> dependencies in unoptimised intermediate representation (IR) and verify
> the annotated dependencies in optimised IR. ATM, the passes only
> recognise a subset of address dependencies - everything is still WIP ;-)
> 
> We have been cross-compiling with a slightly modified version of
> allyesconfig for arm64, and the passes have now found a case that we
> would like to share with LKML for feedback: an address dependency being
> broken (?) through compiler optimisations in
> fs/afs/addr_list.c::afs_iterate_addresses().
> 
> Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> 
> > [...]
> >   index = READ_ONCE(ac->alist->preferred);
> >   if (test_bit(index, &set))
> >     goto selected;
> > [...]
> 
> where test_bit() expands to the following in
> include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> 
> > static __always_inline int
> > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > {
> >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > }
> > #define test_bit arch_test_bit

I don't think there is expected to be an address dependency.
The READ_ONCE() is needed to ensure the generated code doesn't use
two different values for 'index' - eg for 'nr' inside arch_test_bit().

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-27 10:19 Potentially Broken Address Dependency via test_bit() When Compiling With Clang Paul Heidekrüger
  2021-10-27 11:56 ` David Laight
@ 2021-10-27 12:17 ` Peter Zijlstra
  2021-10-27 12:24   ` Marco Elver
  2021-10-27 12:34   ` Peter Zijlstra
  2021-10-27 14:27 ` Alan Stern
  2 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2021-10-27 12:17 UTC (permalink / raw)
  To: Paul Heidekrüger
  Cc: paulmck, will, boqun.feng, stern, parri.andrea, linux-kernel,
	llvm, elver, charalampos.mainas, pramod.bhatotia

On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> Hi all,
> 
> For my bachelor thesis, I have been working on the infamous problem of 
> potentially broken dependency orderings in the Linux kernel. I'm being 
> advised by Marco Elver, Charalampos Mainas, Pramod Bhatotia (Cc'd).

Nice! Great to see someone working on this!

> For context, see: 
> https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf
> 
> Our approach consists of two LLVM compiler passes which annotate 
> dependencies in unoptimised intermediate representation (IR) and verify 
> the annotated dependencies in optimised IR. ATM, the passes only 
> recognise a subset of address dependencies - everything is still WIP ;-)
> 
> We have been cross-compiling with a slightly modified version of 
> allyesconfig for arm64, and the passes have now found a case that we 
> would like to share with LKML for feedback: an address dependency being 
> broken (?) through compiler optimisations in 
> fs/afs/addr_list.c::afs_iterate_addresses().
> 
> Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> 
> > [...]
> >   index = READ_ONCE(ac->alist->preferred);
> >   if (test_bit(index, &set))
> >     goto selected;
> > [...]
> 
> where test_bit() expands to the following in 
> include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> 
> > static __always_inline int
> > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > {
> >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > }
> > #define test_bit arch_test_bit
> 
> The address dependency gets preserved in unoptimised IR since the virtual register %33 transitively depends on %28:
> 
> > %28 = load volatile i8, i8* %preferred, align 2, !annotation !15
> > store i8 %28, i8* %tmp21, align 1
> > %29 = load i8, i8* %tmp21, align 1
> > %conv23 = zext i8 %29 to i32
> > store i32 %conv23, i32* %index, align 4
> > %30 = load i32, i32* %index, align 4
> > store i32 %30, i32* %nr.addr.i, align 4
> > store i64* %set, i64** %addr.addr.i, align 8
> > %31 = load i64*, i64** %addr.addr.i, align 8
> > %32 = load i32, i32* %nr.addr.i, align 4
> > %div.i = udiv i32 %32, 64
> > %idxprom.i = zext i32 %div.i to i64
> > %arrayidx.i = getelementptr i64, i64* %31, i64 %idxprom.i
> > %33 = load volatile i64, i64* %arrayidx.i, align 8, !annotation !16
> 
> In optimised IR, there is no dependency between the two volatile loads 
> anymore:
> 
> > %11 = load volatile i8, i8* %preferred, align 2, !annotation !19
> > %conv25 = zext i8 %11 to i32
> > %set.0. = load volatile i64, i64* %set, align 8
> 
> Now, since @nr traces back to the READ_ONCE() to @index, does this make 
> the load from @addr in test_bit() address-dependent on that READ_ONCE()? 
> Should the load from @addr therefore be ordered against the READ_ONCE()?

I would personally not consider this a dependend load. The result
depends on two loads, but there is no actual ordering between them.

  r1 = *x
  r2 = *y
  b = 1 & (r1 >> r2);

(more or less)

A dependent load would be something where the address of the second load
depends on the value of the first load, eg:

  r1 = *x;
  r2 = *(y + r1);

typically derefencing or array accesses have this pattern. The canonical
example being rcu_dereference(), and is the reason Paul Mckenney is
arguing that pointers should carry dependecies; I'll let him refer to
the many C language papers on this.

Other examples, ones we're actually worried about the compiler breaking,
are, for example, the array access as found in __ktime_get_fast_ns():

	seq = READ_ONCE(tkf->seq);
	tkr = tkf->base + (seq & 1);
	now = tkr->...

Here the dependency is on an integer (seq), and worse, only a single bit
of it. If the compiler were this to transform into something like:

	seq = READ_ONCE(tkf->seq)
	if (seq & 1) {
		// use tkf->base[1]
	} else {
		// use tkf->base[0]
	}

Then it would be broken, since the condition doesn't order the two loads
and they can be re-ordered. Which in turn breaks the premise of the
seqcount_latch construct -- see the comment that goes with
raw_write_seqcount_latch() in seqlock.h.

hth,

~Peter


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-27 12:17 ` Peter Zijlstra
@ 2021-10-27 12:24   ` Marco Elver
  2021-10-27 12:34   ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Marco Elver @ 2021-10-27 12:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Heidekrüger, paulmck, will, boqun.feng, stern,
	parri.andrea, linux-kernel, llvm, charalampos.mainas,
	pramod.bhatotia

On Wed, 27 Oct 2021 at 14:17, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> > Hi all,
> >
> > For my bachelor thesis, I have been working on the infamous problem of
> > potentially broken dependency orderings in the Linux kernel. I'm being
> > advised by Marco Elver, Charalampos Mainas, Pramod Bhatotia (Cc'd).
>
> Nice! Great to see someone working on this!
>
> > For context, see:
> > https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf
> >
> > Our approach consists of two LLVM compiler passes which annotate
> > dependencies in unoptimised intermediate representation (IR) and verify
> > the annotated dependencies in optimised IR. ATM, the passes only
> > recognise a subset of address dependencies - everything is still WIP ;-)
> >
> > We have been cross-compiling with a slightly modified version of
> > allyesconfig for arm64, and the passes have now found a case that we
> > would like to share with LKML for feedback: an address dependency being
> > broken (?) through compiler optimisations in
> > fs/afs/addr_list.c::afs_iterate_addresses().
> >
> > Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> >
> > > [...]
> > >   index = READ_ONCE(ac->alist->preferred);
> > >   if (test_bit(index, &set))
> > >     goto selected;
> > > [...]
> >
> > where test_bit() expands to the following in
> > include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> >
> > > static __always_inline int
> > > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > > {
> > >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > }
> > > #define test_bit arch_test_bit
> >
> > The address dependency gets preserved in unoptimised IR since the virtual register %33 transitively depends on %28:
> >
> > > %28 = load volatile i8, i8* %preferred, align 2, !annotation !15
> > > store i8 %28, i8* %tmp21, align 1
> > > %29 = load i8, i8* %tmp21, align 1
> > > %conv23 = zext i8 %29 to i32
> > > store i32 %conv23, i32* %index, align 4
> > > %30 = load i32, i32* %index, align 4
> > > store i32 %30, i32* %nr.addr.i, align 4
> > > store i64* %set, i64** %addr.addr.i, align 8
> > > %31 = load i64*, i64** %addr.addr.i, align 8
> > > %32 = load i32, i32* %nr.addr.i, align 4
> > > %div.i = udiv i32 %32, 64
> > > %idxprom.i = zext i32 %div.i to i64
> > > %arrayidx.i = getelementptr i64, i64* %31, i64 %idxprom.i
> > > %33 = load volatile i64, i64* %arrayidx.i, align 8, !annotation !16
> >
> > In optimised IR, there is no dependency between the two volatile loads
> > anymore:
> >
> > > %11 = load volatile i8, i8* %preferred, align 2, !annotation !19
> > > %conv25 = zext i8 %11 to i32
> > > %set.0. = load volatile i64, i64* %set, align 8
> >
> > Now, since @nr traces back to the READ_ONCE() to @index, does this make
> > the load from @addr in test_bit() address-dependent on that READ_ONCE()?
> > Should the load from @addr therefore be ordered against the READ_ONCE()?
>
> I would personally not consider this a dependend load. The result
> depends on two loads, but there is no actual ordering between them.
>
>   r1 = *x
>   r2 = *y
>   b = 1 & (r1 >> r2);
>
> (more or less)

Note that test_bit() does the load in terms of this: "...
addr[BIT_WORD(nr)] ..."
which means the address loaded does depend on 'nr'. And in the case
here 'nr' is a READ_ONCE()'d. From all the documentation we can find,
we think it's technically an addr-dep, albeit a pretty useless one.

I guess in this case nobody cares very much, because 'set' is on the
stack and not modified concurrently.

> A dependent load would be something where the address of the second load
> depends on the value of the first load, eg:
>
>   r1 = *x;
>   r2 = *(y + r1);
>
> typically derefencing or array accesses have this pattern. The canonical
> example being rcu_dereference(), and is the reason Paul Mckenney is
> arguing that pointers should carry dependecies; I'll let him refer to
> the many C language papers on this.
>
> Other examples, ones we're actually worried about the compiler breaking,
> are, for example, the array access as found in __ktime_get_fast_ns():
>
>         seq = READ_ONCE(tkf->seq);
>         tkr = tkf->base + (seq & 1);
>         now = tkr->...
>
> Here the dependency is on an integer (seq), and worse, only a single bit
> of it. If the compiler were this to transform into something like:
>
>         seq = READ_ONCE(tkf->seq)
>         if (seq & 1) {
>                 // use tkf->base[1]
>         } else {
>                 // use tkf->base[0]
>         }
>
> Then it would be broken, since the condition doesn't order the two loads
> and they can be re-ordered. Which in turn breaks the premise of the
> seqcount_latch construct -- see the comment that goes with
> raw_write_seqcount_latch() in seqlock.h.
>
> hth,
>
> ~Peter
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-27 12:17 ` Peter Zijlstra
  2021-10-27 12:24   ` Marco Elver
@ 2021-10-27 12:34   ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2021-10-27 12:34 UTC (permalink / raw)
  To: Paul Heidekrüger
  Cc: paulmck, will, boqun.feng, stern, parri.andrea, linux-kernel,
	llvm, elver, charalampos.mainas, pramod.bhatotia

On Wed, Oct 27, 2021 at 02:17:48PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> I would personally not consider this a dependend load. The result
> depends on two loads, but there is no actual ordering between them.
> 
>   r1 = *x
>   r2 = *y
>   b = 1 & (r1 >> r2);
> 
> (more or less)

melver pointed out on IRC that I missed the whole BIT_WORD(nr) thing.
And with that restored this should indeed be an address dependency.

Still, I wasn't actually expecting test_bit() to be one. Nice find.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-27 10:19 Potentially Broken Address Dependency via test_bit() When Compiling With Clang Paul Heidekrüger
  2021-10-27 11:56 ` David Laight
  2021-10-27 12:17 ` Peter Zijlstra
@ 2021-10-27 14:27 ` Alan Stern
  2021-10-28 12:37   ` Paul Heidekrüger
  2 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2021-10-27 14:27 UTC (permalink / raw)
  To: Paul Heidekrüger
  Cc: paulmck, will, peterz, boqun.feng, parri.andrea, linux-kernel,
	llvm, elver, charalampos.mainas, pramod.bhatotia

On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> Hi all,
> 
> For my bachelor thesis, I have been working on the infamous problem of 
> potentially broken dependency orderings in the Linux kernel. I'm being 
> advised by Marco Elver, Charalampos Mainas, Pramod Bhatotia (Cc'd).
> 
> For context, see: 
> https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf
> 
> Our approach consists of two LLVM compiler passes which annotate 
> dependencies in unoptimised intermediate representation (IR) and verify 
> the annotated dependencies in optimised IR. ATM, the passes only 
> recognise a subset of address dependencies - everything is still WIP ;-)
> 
> We have been cross-compiling with a slightly modified version of 
> allyesconfig for arm64, and the passes have now found a case that we 
> would like to share with LKML for feedback: an address dependency being 
> broken (?) through compiler optimisations in 
> fs/afs/addr_list.c::afs_iterate_addresses().
> 
> Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> 
> > [...]
> >   index = READ_ONCE(ac->alist->preferred);
> >   if (test_bit(index, &set))
> >     goto selected;
> > [...]
> 
> where test_bit() expands to the following in 
> include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> 
> > static __always_inline int
> > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > {
> >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > }
> > #define test_bit arch_test_bit
> 
> The address dependency gets preserved in unoptimised IR since the virtual register %33 transitively depends on %28:
> 
> > %28 = load volatile i8, i8* %preferred, align 2, !annotation !15
> > store i8 %28, i8* %tmp21, align 1
> > %29 = load i8, i8* %tmp21, align 1
> > %conv23 = zext i8 %29 to i32
> > store i32 %conv23, i32* %index, align 4
> > %30 = load i32, i32* %index, align 4
> > store i32 %30, i32* %nr.addr.i, align 4
> > store i64* %set, i64** %addr.addr.i, align 8
> > %31 = load i64*, i64** %addr.addr.i, align 8
> > %32 = load i32, i32* %nr.addr.i, align 4
> > %div.i = udiv i32 %32, 64
> > %idxprom.i = zext i32 %div.i to i64
> > %arrayidx.i = getelementptr i64, i64* %31, i64 %idxprom.i
> > %33 = load volatile i64, i64* %arrayidx.i, align 8, !annotation !16
> 
> In optimised IR, there is no dependency between the two volatile loads 
> anymore:
> 
> > %11 = load volatile i8, i8* %preferred, align 2, !annotation !19
> > %conv25 = zext i8 %11 to i32
> > %set.0. = load volatile i64, i64* %set, align 8
> 
> Now, since @nr traces back to the READ_ONCE() to @index, does this make 
> the load from @addr in test_bit() address-dependent on that READ_ONCE()? 
> Should the load from @addr therefore be ordered against the READ_ONCE()?

As others have pointed out, there really is an address dependency here 
(although it's not a very useful one and the code doesn't rely on it).

However, I can't follow the IR code.  Can you please explain in ordinary 
English how the LLVM compiler manages to lose track of this dependency?

Alan Stern

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-27 14:27 ` Alan Stern
@ 2021-10-28 12:37   ` Paul Heidekrüger
  2021-10-28 14:34     ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Heidekrüger @ 2021-10-28 12:37 UTC (permalink / raw)
  To: Alan Stern
  Cc: paulmck, will, peterz, boqun.feng, parri.andrea, linux-kernel,
	llvm, elver, charalampos.mainas, pramod.bhatotia

On Wed, Oct 27, 2021 at 10:27:20AM -0400, Alan Stern wrote:
> On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> > Hi all,
> > 
> > For my bachelor thesis, I have been working on the infamous problem of 
> > potentially broken dependency orderings in the Linux kernel. I'm being 
> > advised by Marco Elver, Charalampos Mainas, Pramod Bhatotia (Cc'd).
> > 
> > For context, see: 
> > https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf
> > 
> > Our approach consists of two LLVM compiler passes which annotate 
> > dependencies in unoptimised intermediate representation (IR) and verify 
> > the annotated dependencies in optimised IR. ATM, the passes only 
> > recognise a subset of address dependencies - everything is still WIP ;-)
> > 
> > We have been cross-compiling with a slightly modified version of 
> > allyesconfig for arm64, and the passes have now found a case that we 
> > would like to share with LKML for feedback: an address dependency being 
> > broken (?) through compiler optimisations in 
> > fs/afs/addr_list.c::afs_iterate_addresses().
> > 
> > Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> > 
> > > [...]
> > >   index = READ_ONCE(ac->alist->preferred);
> > >   if (test_bit(index, &set))
> > >     goto selected;
> > > [...]
> > 
> > where test_bit() expands to the following in 
> > include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> > 
> > > static __always_inline int
> > > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > > {
> > >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > }
> > > #define test_bit arch_test_bit
> > 
> > The address dependency gets preserved in unoptimised IR since the virtual register %33 transitively depends on %28:
> > 
> > > %28 = load volatile i8, i8* %preferred, align 2, !annotation !15
> > > store i8 %28, i8* %tmp21, align 1
> > > %29 = load i8, i8* %tmp21, align 1
> > > %conv23 = zext i8 %29 to i32
> > > store i32 %conv23, i32* %index, align 4
> > > %30 = load i32, i32* %index, align 4
> > > store i32 %30, i32* %nr.addr.i, align 4
> > > store i64* %set, i64** %addr.addr.i, align 8
> > > %31 = load i64*, i64** %addr.addr.i, align 8
> > > %32 = load i32, i32* %nr.addr.i, align 4
> > > %div.i = udiv i32 %32, 64
> > > %idxprom.i = zext i32 %div.i to i64
> > > %arrayidx.i = getelementptr i64, i64* %31, i64 %idxprom.i
> > > %33 = load volatile i64, i64* %arrayidx.i, align 8, !annotation !16
> > 
> > In optimised IR, there is no dependency between the two volatile loads 
> > anymore:
> > 
> > > %11 = load volatile i8, i8* %preferred, align 2, !annotation !19
> > > %conv25 = zext i8 %11 to i32
> > > %set.0. = load volatile i64, i64* %set, align 8
> > 
> > Now, since @nr traces back to the READ_ONCE() to @index, does this make 
> > the load from @addr in test_bit() address-dependent on that READ_ONCE()? 
> > Should the load from @addr therefore be ordered against the READ_ONCE()?
> 
> As others have pointed out, there really is an address dependency here 
> (although it's not a very useful one and the code doesn't rely on it).
> 
> However, I can't follow the IR code.  Can you please explain in ordinary 
> English how the LLVM compiler manages to lose track of this dependency?
> 
> Alan Stern

Here's what we think might be going on:
- In 'arch_test_bit()', 'addr[BIT_WORD(nr)]' expands to 'addr[(nr) / 64]'.
- Since 'addr' points to an 'unsigned long', any other result than '0' for
  '(nr) / 64' would be out of bounds and therefore undefined.
- We assume LLVM is able to figure this out and use it to get rid of the
  address computation all together.

We ran some experiments to see how optimisations behave when 'set' is in fact
an array and / or in global scope.

1. Insert a 'barrier()' in 'arch_test_bit()' before the 'return':
The dependency gets broken.

2. Make 'set' an 'unsigned long' array of size '42', keep local scope: 
The dependency gets preserved.

3. Keep 'set' as 'unsigend long', move to global scope: 
The dependency gets preserved.

4. Make 'set' an 'unsigned long' array of size '42', move to global scope: 
The dependency gets preserved.

Many thanks,
Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-28 12:37   ` Paul Heidekrüger
@ 2021-10-28 14:34     ` Alan Stern
  2021-11-02 18:35       ` Paul Heidekrüger
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2021-10-28 14:34 UTC (permalink / raw)
  To: Paul Heidekrüger
  Cc: paulmck, will, peterz, boqun.feng, parri.andrea, linux-kernel,
	llvm, elver, charalampos.mainas, pramod.bhatotia

On Thu, Oct 28, 2021 at 02:37:47PM +0200, Paul Heidekrüger wrote:
> On Wed, Oct 27, 2021 at 10:27:20AM -0400, Alan Stern wrote:
> > On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:

> > > Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> > > 
> > > > [...]
> > > >   index = READ_ONCE(ac->alist->preferred);
> > > >   if (test_bit(index, &set))
> > > >     goto selected;
> > > > [...]
> > > 
> > > where test_bit() expands to the following in 
> > > include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> > > 
> > > > static __always_inline int
> > > > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > > > {
> > > >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > > }
> > > > #define test_bit arch_test_bit

> > However, I can't follow the IR code.  Can you please explain in ordinary 
> > English how the LLVM compiler manages to lose track of this dependency?
> > 
> > Alan Stern
> 
> Here's what we think might be going on:
> - In 'arch_test_bit()', 'addr[BIT_WORD(nr)]' expands to 'addr[(nr) / 64]'.
> - Since 'addr' points to an 'unsigned long', any other result than '0' for
>   '(nr) / 64' would be out of bounds and therefore undefined.
> - We assume LLVM is able to figure this out and use it to get rid of the
>   address computation all together.

Ah, that explains it.  Yes, when set is a single unsigned long (or an 
array of length 1), the address dependency is only syntactic, not 
semantic.  As a result, we should expect that compilers will sometimes 
not preserve it.

The danger, of course, is that people relying on an ordering prescribed 
by the LKMM may get fooled because (unbeknownst to them) the dependency 
in question is not semantic.  It would be great if a static checker 
could detect such things -- but this would require some way for us to 
inform the checker about when the code does rely on a dependency 
ordering.

> We ran some experiments to see how optimisations behave when 'set' is in fact
> an array and / or in global scope.
> 
> 1. Insert a 'barrier()' in 'arch_test_bit()' before the 'return':
> The dependency gets broken.
> 
> 2. Make 'set' an 'unsigned long' array of size '42', keep local scope: 
> The dependency gets preserved.
> 
> 3. Keep 'set' as 'unsigend long', move to global scope: 
> The dependency gets preserved.

That one's a little strange.  I don't see why the scope should make any 
difference, so long as the compiler knows the actual type and length.

> 4. Make 'set' an 'unsigned long' array of size '42', move to global scope: 
> The dependency gets preserved.

Alan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-10-28 14:34     ` Alan Stern
@ 2021-11-02 18:35       ` Paul Heidekrüger
  2021-11-02 19:01         ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Heidekrüger @ 2021-11-02 18:35 UTC (permalink / raw)
  To: Alan Stern
  Cc: paulmck, will, peterz, boqun.feng, parri.andrea, linux-kernel,
	llvm, elver, charalampos.mainas, pramod.bhatotia

On Thu, Oct 28, 2021 at 10:34:46AM -0400, Alan Stern wrote:
> On Thu, Oct 28, 2021 at 02:37:47PM +0200, Paul Heidekrüger wrote:
> > On Wed, Oct 27, 2021 at 10:27:20AM -0400, Alan Stern wrote:
> > > On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> 
> > > > Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> > > > 
> > > > > [...]
> > > > >   index = READ_ONCE(ac->alist->preferred);
> > > > >   if (test_bit(index, &set))
> > > > >     goto selected;
> > > > > [...]
> > > > 
> > > > where test_bit() expands to the following in 
> > > > include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> > > > 
> > > > > static __always_inline int
> > > > > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > > > > {
> > > > >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > > > }
> > > > > #define test_bit arch_test_bit
> 
> > > However, I can't follow the IR code.  Can you please explain in ordinary 
> > > English how the LLVM compiler manages to lose track of this dependency?
> > > 
> > > Alan Stern
> > 
> > Here's what we think might be going on:
> > - In 'arch_test_bit()', 'addr[BIT_WORD(nr)]' expands to 'addr[(nr) / 64]'.
> > - Since 'addr' points to an 'unsigned long', any other result than '0' for
> >   '(nr) / 64' would be out of bounds and therefore undefined.
> > - We assume LLVM is able to figure this out and use it to get rid of the
> >   address computation all together.
> 
> Ah, that explains it.  Yes, when set is a single unsigned long (or an 
> array of length 1), the address dependency is only syntactic, not 
> semantic.  As a result, we should expect that compilers will sometimes 
> not preserve it.

In LKMM's explanation.txt, lines 450 - 453 state:

> A read event and another memory access event are linked by an address
> dependency if the value obtained by the read affects the location
> accessed by the other event.

If we understand correctly, the ambiguity you're pointing out is that by
looking at 'addr[BIT_WORD(nr)]', one could deduce an address dependency by
seeing an array subscript operator, using a value which can be traced back to a
'READ_ONCE()' (syntactic).

However, since 'addr' points to an 'unsigned long', the 'READ_ONCE()' never
affects the location accessed in 'arch_test_bit()' as the offset computed in
the subscript operator can only be, as clang identified, '0' (semantic).

> The danger, of course, is that people relying on an ordering prescribed
> by the LKMM may get fooled because (unbeknownst to them) the dependency
> in question is not semantic.

Do you think this would warrant a change to LKMM's explanation.txt to make this
more explicit?

> It would be great if a static checker
> could detect such things -- but this would require some way for us to
> inform the checker about when the code does rely on a dependency
> ordering.

The compiler passes we're working on will hopefully be able to do exactly that,
not taking into account the programmer's intent of course.

Many thanks,
Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-11-02 18:35       ` Paul Heidekrüger
@ 2021-11-02 19:01         ` Alan Stern
  2021-11-04 18:16           ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2021-11-02 19:01 UTC (permalink / raw)
  To: Paul Heidekrüger
  Cc: paulmck, will, peterz, boqun.feng, parri.andrea, linux-kernel,
	llvm, elver, charalampos.mainas, pramod.bhatotia

On Tue, Nov 02, 2021 at 07:35:57PM +0100, Paul Heidekrüger wrote:
> On Thu, Oct 28, 2021 at 10:34:46AM -0400, Alan Stern wrote:
> > On Thu, Oct 28, 2021 at 02:37:47PM +0200, Paul Heidekrüger wrote:
> > > On Wed, Oct 27, 2021 at 10:27:20AM -0400, Alan Stern wrote:
> > > > On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> > 
> > > > > Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> > > > > 
> > > > > > [...]
> > > > > >   index = READ_ONCE(ac->alist->preferred);
> > > > > >   if (test_bit(index, &set))
> > > > > >     goto selected;
> > > > > > [...]
> > > > > 
> > > > > where test_bit() expands to the following in 
> > > > > include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> > > > > 
> > > > > > static __always_inline int
> > > > > > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > > > > > {
> > > > > >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > > > > }
> > > > > > #define test_bit arch_test_bit
> > 
> > > > However, I can't follow the IR code.  Can you please explain in ordinary 
> > > > English how the LLVM compiler manages to lose track of this dependency?
> > > > 
> > > > Alan Stern
> > > 
> > > Here's what we think might be going on:
> > > - In 'arch_test_bit()', 'addr[BIT_WORD(nr)]' expands to 'addr[(nr) / 64]'.
> > > - Since 'addr' points to an 'unsigned long', any other result than '0' for
> > >   '(nr) / 64' would be out of bounds and therefore undefined.
> > > - We assume LLVM is able to figure this out and use it to get rid of the
> > >   address computation all together.
> > 
> > Ah, that explains it.  Yes, when set is a single unsigned long (or an 
> > array of length 1), the address dependency is only syntactic, not 
> > semantic.  As a result, we should expect that compilers will sometimes 
> > not preserve it.
> 
> In LKMM's explanation.txt, lines 450 - 453 state:
> 
> > A read event and another memory access event are linked by an address
> > dependency if the value obtained by the read affects the location
> > accessed by the other event.
> 
> If we understand correctly, the ambiguity you're pointing out is that by
> looking at 'addr[BIT_WORD(nr)]', one could deduce an address dependency by
> seeing an array subscript operator, using a value which can be traced back to a
> 'READ_ONCE()' (syntactic).

Exactly.  The syntax of the expression apparently indicates that the 
address to be dereferenced depends on the value of nr.

> However, since 'addr' points to an 'unsigned long', the 'READ_ONCE()' never
> affects the location accessed in 'arch_test_bit()' as the offset computed in
> the subscript operator can only be, as clang identified, '0' (semantic).

Again correct.  Although the address to be dereferenced may seem to 
depend on nr, in fact it doesn't because in any valid execution nr must 
not contain any value other than 0 (and the compiler knows this).

> > The danger, of course, is that people relying on an ordering prescribed
> > by the LKMM may get fooled because (unbeknownst to them) the dependency
> > in question is not semantic.
> 
> Do you think this would warrant a change to LKMM's explanation.txt to make this
> more explicit?

It wouldn't hurt.  Note that explanation.txt does already include a 
section talking about address dependencies from marked loads to plain 
reads (line 2260 et seq).  It also talks about the possibility of the 
compiler undermining the memory model in this regard (line 2308).

However, it doesn't mention the difference between syntactic and 
semantic dependencies.  If you would like to submit a patch adding an 
explicit description of this, please feel free to do so.

> > It would be great if a static checker
> > could detect such things -- but this would require some way for us to
> > inform the checker about when the code does rely on a dependency
> > ordering.
> 
> The compiler passes we're working on will hopefully be able to do exactly that,
> not taking into account the programmer's intent of course.

Good luck!

Alan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang
  2021-11-02 19:01         ` Alan Stern
@ 2021-11-04 18:16           ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2021-11-04 18:16 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul Heidekrüger, will, peterz, boqun.feng, parri.andrea,
	linux-kernel, llvm, elver, charalampos.mainas, pramod.bhatotia

On Tue, Nov 02, 2021 at 03:01:38PM -0400, Alan Stern wrote:
> On Tue, Nov 02, 2021 at 07:35:57PM +0100, Paul Heidekrüger wrote:
> > On Thu, Oct 28, 2021 at 10:34:46AM -0400, Alan Stern wrote:
> > > On Thu, Oct 28, 2021 at 02:37:47PM +0200, Paul Heidekrüger wrote:
> > > > On Wed, Oct 27, 2021 at 10:27:20AM -0400, Alan Stern wrote:
> > > > > On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekrüger wrote:
> > > 
> > > > > > Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:
> > > > > > 
> > > > > > > [...]
> > > > > > >   index = READ_ONCE(ac->alist->preferred);
> > > > > > >   if (test_bit(index, &set))
> > > > > > >     goto selected;
> > > > > > > [...]
> > > > > > 
> > > > > > where test_bit() expands to the following in 
> > > > > > include/asm-generic/bitops/non-atomic.h, lines 115 - 122:
> > > > > > 
> > > > > > > static __always_inline int
> > > > > > > arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> > > > > > > {
> > > > > > >   return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > > > > > > }
> > > > > > > #define test_bit arch_test_bit
> > > 
> > > > > However, I can't follow the IR code.  Can you please explain in ordinary 
> > > > > English how the LLVM compiler manages to lose track of this dependency?
> > > > > 
> > > > > Alan Stern
> > > > 
> > > > Here's what we think might be going on:
> > > > - In 'arch_test_bit()', 'addr[BIT_WORD(nr)]' expands to 'addr[(nr) / 64]'.
> > > > - Since 'addr' points to an 'unsigned long', any other result than '0' for
> > > >   '(nr) / 64' would be out of bounds and therefore undefined.
> > > > - We assume LLVM is able to figure this out and use it to get rid of the
> > > >   address computation all together.
> > > 
> > > Ah, that explains it.  Yes, when set is a single unsigned long (or an 
> > > array of length 1), the address dependency is only syntactic, not 
> > > semantic.  As a result, we should expect that compilers will sometimes 
> > > not preserve it.
> > 
> > In LKMM's explanation.txt, lines 450 - 453 state:
> > 
> > > A read event and another memory access event are linked by an address
> > > dependency if the value obtained by the read affects the location
> > > accessed by the other event.
> > 
> > If we understand correctly, the ambiguity you're pointing out is that by
> > looking at 'addr[BIT_WORD(nr)]', one could deduce an address dependency by
> > seeing an array subscript operator, using a value which can be traced back to a
> > 'READ_ONCE()' (syntactic).
> 
> Exactly.  The syntax of the expression apparently indicates that the 
> address to be dereferenced depends on the value of nr.
> 
> > However, since 'addr' points to an 'unsigned long', the 'READ_ONCE()' never
> > affects the location accessed in 'arch_test_bit()' as the offset computed in
> > the subscript operator can only be, as clang identified, '0' (semantic).
> 
> Again correct.  Although the address to be dereferenced may seem to 
> depend on nr, in fact it doesn't because in any valid execution nr must 
> not contain any value other than 0 (and the compiler knows this).
> 
> > > The danger, of course, is that people relying on an ordering prescribed
> > > by the LKMM may get fooled because (unbeknownst to them) the dependency
> > > in question is not semantic.
> > 
> > Do you think this would warrant a change to LKMM's explanation.txt to make this
> > more explicit?
> 
> It wouldn't hurt.  Note that explanation.txt does already include a 
> section talking about address dependencies from marked loads to plain 
> reads (line 2260 et seq).  It also talks about the possibility of the 
> compiler undermining the memory model in this regard (line 2308).
> 
> However, it doesn't mention the difference between syntactic and 
> semantic dependencies.  If you would like to submit a patch adding an 
> explicit description of this, please feel free to do so.
> 
> > > It would be great if a static checker
> > > could detect such things -- but this would require some way for us to
> > > inform the checker about when the code does rely on a dependency
> > > ordering.
> > 
> > The compiler passes we're working on will hopefully be able to do exactly that,
> > not taking into account the programmer's intent of course.
> 
> Good luck!

Most of the current proposals involve explicitly telling the compiler
of the programmer's intent.  But if you can make something that is
generally accepted, and that can figure out things on its own, that
would be extremely cool!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-11-04 18:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-27 10:19 Potentially Broken Address Dependency via test_bit() When Compiling With Clang Paul Heidekrüger
2021-10-27 11:56 ` David Laight
2021-10-27 12:17 ` Peter Zijlstra
2021-10-27 12:24   ` Marco Elver
2021-10-27 12:34   ` Peter Zijlstra
2021-10-27 14:27 ` Alan Stern
2021-10-28 12:37   ` Paul Heidekrüger
2021-10-28 14:34     ` Alan Stern
2021-11-02 18:35       ` Paul Heidekrüger
2021-11-02 19:01         ` Alan Stern
2021-11-04 18:16           ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.