* [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
@ 2023-01-26 17:33 Jules Maselbas
  2023-01-27 11:18 ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Jules Maselbas @ 2023-01-26 17:33 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra, Boqun Feng, Mark Rutland, Arnd Bergmann
  Cc: linux-arch, linux-kernel, Jules Maselbas

Reading and setting the atomic counter should be done through the
arch_atomic_{read,set} macros, which use the {READ,WRITE}_ONCE macros
respectively.

This prevents the compiler from potentially replaying the read access.
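As a stand-alone illustration (not part of the patch), the macros in question
reduce to volatile accesses; a minimal sketch with kernel-style definitions
inlined, assuming GNU C (__typeof__):

```c
#include <assert.h>

typedef struct { int counter; } atomic_t;

/* Stand-alone sketches of the kernel macros, for illustration only. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define arch_atomic_read(v)    READ_ONCE((v)->counter)
#define arch_atomic_set(v, i)  WRITE_ONCE(((v)->counter), (i))

/* With a plain "c = v->counter" the compiler may re-issue the load
 * later instead of keeping c in a register; the volatile access above
 * pins the snapshot to exactly one load. */
```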

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu>
---
 include/asm-generic/atomic.h | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
index 04b8be9f1a77..711e408af581 100644
--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -12,6 +12,9 @@
 #include <asm/cmpxchg.h>
 #include <asm/barrier.h>
 
+#define arch_atomic_read(v)			READ_ONCE((v)->counter)
+#define arch_atomic_set(v, i)			WRITE_ONCE(((v)->counter), (i))
+
 #ifdef CONFIG_SMP
 
 /* we can build all atomic primitives from cmpxchg */
@@ -21,7 +24,7 @@ static inline void generic_atomic_##op(int i, atomic_t *v)		\
 {									\
 	int c, old;							\
 									\
-	c = v->counter;							\
+	c = arch_atomic_read(v);					\
 	while ((old = arch_cmpxchg(&v->counter, c, c c_op i)) != c)	\
 		c = old;						\
 }
@@ -31,7 +34,7 @@ static inline int generic_atomic_##op##_return(int i, atomic_t *v)	\
 {									\
 	int c, old;							\
 									\
-	c = v->counter;							\
+	c = arch_atomic_read(v);					\
 	while ((old = arch_cmpxchg(&v->counter, c, c c_op i)) != c)	\
 		c = old;						\
 									\
@@ -43,7 +46,7 @@ static inline int generic_atomic_fetch_##op(int i, atomic_t *v)		\
 {									\
 	int c, old;							\
 									\
-	c = v->counter;							\
+	c = arch_atomic_read(v);					\
 	while ((old = arch_cmpxchg(&v->counter, c, c c_op i)) != c)	\
 		c = old;						\
 									\
@@ -58,9 +61,11 @@ static inline int generic_atomic_fetch_##op(int i, atomic_t *v)		\
 static inline void generic_atomic_##op(int i, atomic_t *v)		\
 {									\
 	unsigned long flags;						\
+	int c;								\
 									\
 	raw_local_irq_save(flags);					\
-	v->counter = v->counter c_op i;					\
+	c = arch_atomic_read(v);					\
+	arch_atomic_set(v, c c_op i);					\
 	raw_local_irq_restore(flags);					\
 }
 
@@ -68,27 +73,28 @@ static inline void generic_atomic_##op(int i, atomic_t *v)		\
 static inline int generic_atomic_##op##_return(int i, atomic_t *v)	\
 {									\
 	unsigned long flags;						\
-	int ret;							\
+	int c;								\
 									\
 	raw_local_irq_save(flags);					\
-	ret = (v->counter = v->counter c_op i);				\
+	c = arch_atomic_read(v);					\
+	arch_atomic_set(v, c c_op i);					\
 	raw_local_irq_restore(flags);					\
 									\
-	return ret;							\
+	return c c_op i;						\
 }
 
 #define ATOMIC_FETCH_OP(op, c_op)					\
 static inline int generic_atomic_fetch_##op(int i, atomic_t *v)		\
 {									\
 	unsigned long flags;						\
-	int ret;							\
+	int c;								\
 									\
 	raw_local_irq_save(flags);					\
-	ret = v->counter;						\
-	v->counter = v->counter c_op i;					\
+	c = arch_atomic_read(v);					\
+	arch_atomic_set(v, c c_op i);					\
 	raw_local_irq_restore(flags);					\
 									\
-	return ret;							\
+	return c;							\
 }
 
 #endif /* CONFIG_SMP */
@@ -127,9 +133,6 @@ ATOMIC_OP(xor, ^)
 #define arch_atomic_or				generic_atomic_or
 #define arch_atomic_xor				generic_atomic_xor
 
-#define arch_atomic_read(v)			READ_ONCE((v)->counter)
-#define arch_atomic_set(v, i)			WRITE_ONCE(((v)->counter), (i))
-
 #define arch_atomic_xchg(ptr, v)		(arch_xchg(&(ptr)->counter, (v)))
 #define arch_atomic_cmpxchg(v, old, new)	(arch_cmpxchg(&((v)->counter), (old), (new)))
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-26 17:33 [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops Jules Maselbas
@ 2023-01-27 11:18 ` Peter Zijlstra
  2023-01-27 13:49   ` Jules Maselbas
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2023-01-27 11:18 UTC (permalink / raw)
  To: Jules Maselbas
  Cc: Will Deacon, Boqun Feng, Mark Rutland, Arnd Bergmann, linux-arch,
	linux-kernel

On Thu, Jan 26, 2023 at 06:33:54PM +0100, Jules Maselbas wrote:

> @@ -58,9 +61,11 @@ static inline int generic_atomic_fetch_##op(int i, atomic_t *v)		\
>  static inline void generic_atomic_##op(int i, atomic_t *v)		\
>  {									\
>  	unsigned long flags;						\
> +	int c;								\
>  									\
>  	raw_local_irq_save(flags);					\
> -	v->counter = v->counter c_op i;					\
> +	c = arch_atomic_read(v);					\
> +	arch_atomic_set(v, c c_op i);					\
>  	raw_local_irq_restore(flags);					\
>  }

This and the others like it are a bit sad; they explicitly prevent the
compiler from using memops and force a load-store.

The alternative is writing it like:

	*(volatile int *)&v->counter c_op i;

but that's pretty shit too I suppose.
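For comparison, a hedged sketch of both shapes outside the kernel (atomic_t
and the ONCE macros re-declared locally); whether a compiler actually turns
the volatile compound assignment into a single memory-operand instruction is
implementation-dependent:

```c
#include <assert.h>

typedef struct { int counter; } atomic_t;

#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Shape from the patch: an explicit load followed by a store, which
 * rules out a single read-modify-write memory instruction. */
static inline void atomic_add_loadstore(int i, atomic_t *v)
{
	int c = READ_ONCE(v->counter);

	WRITE_ONCE(v->counter, c + i);
}

/* Alternative discussed above: one volatile compound assignment; a
 * compiler is at least permitted to emit a memop for this (e.g. x86
 * "addl" with a memory operand). */
static inline void atomic_add_memop(int i, atomic_t *v)
{
	*(volatile int *)&v->counter += i;
}
```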




* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-27 11:18 ` Peter Zijlstra
@ 2023-01-27 13:49   ` Jules Maselbas
  2023-01-27 14:34     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Jules Maselbas @ 2023-01-27 13:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, Mark Rutland, Arnd Bergmann, linux-arch,
	linux-kernel

Hi Peter,

On Fri, Jan 27, 2023 at 12:18:13PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 06:33:54PM +0100, Jules Maselbas wrote:
> 
> > @@ -58,9 +61,11 @@ static inline int generic_atomic_fetch_##op(int i, atomic_t *v)		\
> >  static inline void generic_atomic_##op(int i, atomic_t *v)		\
> >  {									\
> >  	unsigned long flags;						\
> > +	int c;								\
> >  									\
> >  	raw_local_irq_save(flags);					\
> > -	v->counter = v->counter c_op i;					\
> > +	c = arch_atomic_read(v);					\
> > +	arch_atomic_set(v, c c_op i);					\
> >  	raw_local_irq_restore(flags);					\
> >  }
> 
> This and the others like it are a bit sad, it explicitly dis-allows the
> compiler from using memops and forces a load-store.
Good point, I don't know much about atomic memops, but it is indeed a
bit sad to prevent such instructions from being used.

> The alternative is writing it like:
> 
> 	*(volatile int *)&v->counter c_op i;
I wonder if it could be possible to write something like:

        *(volatile int *)&v->counter += i;

I also noticed that GCC has some builtin/extension to do such things,
__atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
can be used in the kernel.
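The builtins do exist in both GCC and Clang (__atomic_fetch_add() and
friends); a sketch of what builtin-based generic ops could look like, leaving
aside the memory-model question:

```c
#include <assert.h>

typedef struct { int counter; } atomic_t;

/* Hypothetical builtin-based generic ops, for illustration only.
 * Relaxed ordering is shown for brevity; the kernel's value-returning
 * atomics are fully ordered, so a real mapping would need stronger
 * orderings or extra barriers. */
static inline void builtin_atomic_add(int i, atomic_t *v)
{
	(void)__atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
}

static inline int builtin_atomic_fetch_add(int i, atomic_t *v)
{
	return __atomic_fetch_add(&v->counter, i, __ATOMIC_RELAXED);
}

static inline int builtin_atomic_add_return(int i, atomic_t *v)
{
	return __atomic_add_fetch(&v->counter, i, __ATOMIC_RELAXED);
}
```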


Thanks,
-- Jules

* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-27 13:49   ` Jules Maselbas
@ 2023-01-27 14:34     ` Peter Zijlstra
  2023-01-27 22:09       ` Boqun Feng
  2023-01-30 18:15       ` Jules Maselbas
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2023-01-27 14:34 UTC (permalink / raw)
  To: Jules Maselbas
  Cc: Will Deacon, Boqun Feng, Mark Rutland, Arnd Bergmann, linux-arch,
	linux-kernel

On Fri, Jan 27, 2023 at 02:49:46PM +0100, Jules Maselbas wrote:
> Hi Peter,
> 
> On Fri, Jan 27, 2023 at 12:18:13PM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 26, 2023 at 06:33:54PM +0100, Jules Maselbas wrote:
> > 
> > > @@ -58,9 +61,11 @@ static inline int generic_atomic_fetch_##op(int i, atomic_t *v)		\
> > >  static inline void generic_atomic_##op(int i, atomic_t *v)		\
> > >  {									\
> > >  	unsigned long flags;						\
> > > +	int c;								\
> > >  									\
> > >  	raw_local_irq_save(flags);					\
> > > -	v->counter = v->counter c_op i;					\
> > > +	c = arch_atomic_read(v);					\
> > > +	arch_atomic_set(v, c c_op i);					\
> > >  	raw_local_irq_restore(flags);					\
> > >  }
> > 
> > This and the others like it are a bit sad, it explicitly dis-allows the
> > compiler from using memops and forces a load-store.
> Good point, I don't know much about atomic memops but this is indeed a
> bit sad to prevent such instructions to be used.

Depends on the platform; x86, s390, etc. have them, RISC-like things
typically don't.

> > The alternative is writing it like:
> > 
> > 	*(volatile int *)&v->counter c_op i;
> I wonder if it could be possible to write something like:
> 
>         *(volatile int *)&v->counter += i;

Should work, but give it a try, see what it does :-)

> I also noticed that GCC has some builtin/extension to do such things,
> __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
> can be used in the kernel.

On a per-architecture basis only, the C/C++ memory model does not match
the Linux Kernel memory model so using the compiler to generate the
atomic ops is somewhat tricky and needs architecture audits.


* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-27 14:34     ` Peter Zijlstra
@ 2023-01-27 22:09       ` Boqun Feng
  2023-01-30 12:23         ` Jonas Oberhauser
  2023-01-30 18:15       ` Jules Maselbas
  1 sibling, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2023-01-27 22:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jules Maselbas, Will Deacon, Mark Rutland, Arnd Bergmann,
	linux-arch, linux-kernel, Alan Stern, Andrea Parri,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Jonas Oberhauser, Hernan Ponce de Leon, Paul Heidekrüger,
	Marco Elver, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
	Gary Guo, Björn Roy Baron

On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
> > I also noticed that GCC has some builtin/extension to do such things,
> > __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
> > can be used in the kernel.
> 
> On a per-architecture basis only, the C/C++ memory model does not match
> the Linux Kernel memory model so using the compiler to generate the
> atomic ops is somewhat tricky and needs architecture audits.

Hijacking this thread a little bit, but while we are at it: do you
think it makes sense to have a config option that allows archs to
implement LKMM atomics via C11 (volatile) atomics? I know there are
gaps between the two memory models, but the option is only for the
fallback/generic implementation, so we can put in extra
barriers/orderings to make things guaranteed to work.

It'll be a code version of this document:

	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0124r7.html

(although I realise there may be a few mistakes in that doc since I
wasn't familiar with C11 memory model when I wrote part of the doc, but
these can be fixed)

Another reason I ask is that since Rust is coming, we need to provide
our LKMM atomics in Rust so that C code and Rust code can talk via the
same atomic variables, since both sides need to use the same memory
model. My choices are:

1.	Using FFI to call Linux atomic APIs: not inline therefore not
	efficient.

2.	Implementing Rust LKMM atomics in asm: much more work although
	I'm OK if we have to do it.

3.	Implementing Rust LKMM atomics with standard atomics (i.e. C/C++
	atomics):

	*	Requires Rust has "volatile" atomics, which is WIP but
		looks promising
	
	*	Less efficient compared to choice #2 but more efficient
		compared to choice #1

Ideally, choice #2 is the best option for all architectures, however, if
we have the generic implementation based on choice #3, for some archs it
may be good enough.
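For concreteness, a rough sketch of what such a C11-based fallback could look
like (the names and the conservative seq_cst fences are illustrative, not a
vetted mapping):

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_int counter; } lkmm_atomic_t;

static inline int lkmm_atomic_read(lkmm_atomic_t *v)
{
	return atomic_load_explicit(&v->counter, memory_order_relaxed);
}

static inline void lkmm_atomic_set(lkmm_atomic_t *v, int i)
{
	atomic_store_explicit(&v->counter, i, memory_order_relaxed);
}

/* LKMM value-returning ops are fully ordered; bracketing a relaxed
 * RMW with seq_cst fences is a conservative (possibly over-strong)
 * approximation of that. */
static inline int lkmm_atomic_add_return(int i, lkmm_atomic_t *v)
{
	int ret;

	atomic_thread_fence(memory_order_seq_cst);
	ret = atomic_fetch_add_explicit(&v->counter, i,
					memory_order_relaxed) + i;
	atomic_thread_fence(memory_order_seq_cst);
	return ret;
}
```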

Thoughts?

[Cc LKMM and Rust people]

Regards,
Boqun


* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-27 22:09       ` Boqun Feng
@ 2023-01-30 12:23         ` Jonas Oberhauser
  2023-01-30 18:38           ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Jonas Oberhauser @ 2023-01-30 12:23 UTC (permalink / raw)
  To: Boqun Feng, Peter Zijlstra
  Cc: Jules Maselbas, Will Deacon, Mark Rutland, Arnd Bergmann,
	linux-arch, linux-kernel, Alan Stern, Andrea Parri,
	Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
	Paul E. McKenney, Akira Yokosawa, Daniel Lustig, Joel Fernandes,
	Hernan Ponce de Leon, Paul Heidekrüger, Marco Elver,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron



On 1/27/2023 11:09 PM, Boqun Feng wrote:
> On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
>>> I also noticed that GCC has some builtin/extension to do such things,
>>> __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
>>> can be used in the kernel.
>> On a per-architecture basis only, the C/C++ memory model does not match
>> the Linux Kernel memory model so using the compiler to generate the
>> atomic ops is somewhat tricky and needs architecture audits.
> Hijack this thread a little bit, but while we are at it, do you think it
> makes sense that we have a config option that allows archs to
> implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
> between two memory models, but the option is only for fallback/generic
> implementation so we can put extra barriers/orderings to make things
> guaranteed to work.
>
> It'll be a code version of this document:
>
> 	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0124r7.html
>
> (although I realise there may be a few mistakes in that doc since I
> wasn't familiar with C11 memory model when I wrote part of the doc, but
> these can be fixed)
>
> Another reason I ask is that since Rust is coming, we need to provide
> our LKMM atomics in Rust so that C code and Rust code can talk via same
> atomic variables, since both sides need to use the same memory model.
> My choices are:
>
> 1.	Using FFI to call Linux atomic APIs: not inline therefore not
> 	efficient.
>
> 2.	Implementing Rust LKMM atomics in asm: much more work although
> 	I'm OK if we have to do it.
>
> 3.	Implementing Rust LKMM atomics with standard atomics (i.e. C/C++
> 	atomics):
>
> 	*	Requires Rust has "volatile" atomics, which is WIP but
> 		looks promising
> 	
> 	*	Less efficient compared to choice #2 but more efficient
> 		compared to choice #1
>
> Ideally, choice #2 is the best option for all architectures, however, if
> we have the generic implementation based on choice #3, for some archs it
> may be good enough.
>
> Thoughts?

Thanks for adding me to the discussion!

One reason not to rely on C11 is that old compilers don't support it,
and there may be application scenarios in which new compilers haven't
been certified.
I don't know if this is something that affects Linux, but Linux is so
big and versatile that I'd be surprised if it were irrelevant.

Another is that the C11 model is more about atomic locations than atomic 
accesses, and there are several places in the kernel where a location is 
accessed both atomically and non-atomically. This API mismatch is more 
severe than the semantic differences in my opinion, since you don't have 
guarantees of what the layout of atomics is going to be.

Perhaps you could instead rely on the compiler builtins? Note that
this may invalidate some progress properties, e.g., ticket locks
become unfair if the increment (for taking a ticket) is implemented
with a CAS loop (because a thread can fail forever to get a ticket if
the ticket counter is contended, and thus starve). There may be some
Linux atomics that don't map to any compiler builtin and need to be
implemented with such CAS loops, potentially leading to such problems.
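To make the starvation point concrete, this is the shape such a CAS-loop
fallback takes (a sketch with the GCC builtins; any real kernel version lives
behind arch_cmpxchg()):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int counter; } atomic_t;

/* Illustrative CAS-loop fetch_add: correct and lock-free, but not
 * wait-free -- under contention a thread can lose the race an
 * unbounded number of times, which is what breaks ticket-lock
 * fairness when taking a ticket is implemented this way. */
static inline int cas_loop_fetch_add(int i, atomic_t *v)
{
	int old = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);

	while (!__atomic_compare_exchange_n(&v->counter, &old, old + i,
					    false, __ATOMIC_RELAXED,
					    __ATOMIC_RELAXED))
		; /* on failure, "old" was updated to the current value */

	return old;
}
```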

I'm also curious whether link time optimization can resolve the inlining 
issue?

I think another big question for me is to what extent it makes sense
anyway to have shared memory concurrency between the Rust code and the
C code. It seems all the bad concurrency stuff from the C world would
flow into the Rust world, right?
If you can live without shared Rust & C concurrency, then perhaps you 
can get away without using LKMM in Rust at all, and just rely on its 
(C11-like) memory model internally and talk to the C code through 
synchronous, safer ways.

I'm not against having a fallback builtin-based implementation of
LKMM, and I don't think that it really needs architecture audits. What
it needs is some additional compiler barriers and memory barriers, to
ensure that the arguments about dependencies and non-atomics still
hold. E.g., a release store may not just be a "builtin release store"
but may need a compiler barrier to prevent the release store from
being moved in program order. And a "full barrier" exchange may need
an mb() in front of the operation to avoid "roach motel" ordering
(i.e., x = 1; "full barrier exchange"; y = 1 allows y = 1 to execute
before x = 1 with the compiler builtins, as far as I remember). And
there may be some other cases like this.
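A sketch of the kind of strengthening meant here, with stand-in
barrier()/smp_mb() definitions so the fragment is self-contained (the exact
barrier placement is precisely what would need review):

```c
#include <assert.h>

#define barrier() __asm__ __volatile__("" ::: "memory")
/* Stand-in full barrier; a real port would use the arch's smp_mb(). */
#define smp_mb()  __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Builtin release store with an extra compiler barrier, so earlier
 * code cannot be moved around it in program order beyond what the
 * builtin alone guarantees. */
static inline void lkmm_release_store(int *p, int val)
{
	barrier();
	__atomic_store_n(p, val, __ATOMIC_RELEASE);
}

/* "Full barrier" exchange: the leading smp_mb() closes the roach
 * motel, i.e. an earlier plain store can no longer be reordered past
 * the RMW. */
static inline int lkmm_xchg_full(int *p, int val)
{
	smp_mb();
	return __atomic_exchange_n(p, val, __ATOMIC_SEQ_CST);
}
```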

But I currently don't see that this implementation would be noticeably 
faster than paying the overhead of lack of inline.

Best wishes, jonas



* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-27 14:34     ` Peter Zijlstra
  2023-01-27 22:09       ` Boqun Feng
@ 2023-01-30 18:15       ` Jules Maselbas
  1 sibling, 0 replies; 11+ messages in thread
From: Jules Maselbas @ 2023-01-30 18:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Will Deacon, Mark Rutland, Arnd Bergmann, linux-arch,
	linux-kernel, Alan Stern, Andrea Parri, Nicholas Piggin,
	David Howells, Jade Alglave, Luc Maranget, Paul E. McKenney,
	Akira Yokosawa, Daniel Lustig, Joel Fernandes, Jonas Oberhauser,
	Hernan Ponce de Leon, Paul Heidekrüger, Marco Elver,
	Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho, Gary Guo,
	Björn Roy Baron, Yann Sionneau

Hi Peter,

On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 27, 2023 at 02:49:46PM +0100, Jules Maselbas wrote:
> > Hi Peter,
> > 
> > On Fri, Jan 27, 2023 at 12:18:13PM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 26, 2023 at 06:33:54PM +0100, Jules Maselbas wrote:
> > > 
> > > > @@ -58,9 +61,11 @@ static inline int generic_atomic_fetch_##op(int i, atomic_t *v)		\
> > > >  static inline void generic_atomic_##op(int i, atomic_t *v)		\
> > > >  {									\
> > > >  	unsigned long flags;						\
> > > > +	int c;								\
> > > >  									\
> > > >  	raw_local_irq_save(flags);					\
> > > > -	v->counter = v->counter c_op i;					\
> > > > +	c = arch_atomic_read(v);					\
> > > > +	arch_atomic_set(v, c c_op i);					\
> > > >  	raw_local_irq_restore(flags);					\
> > > >  }
> > > 
> > > This and the others like it are a bit sad, it explicitly dis-allows the
> > > compiler from using memops and forces a load-store.
> > Good point, I don't know much about atomic memops but this is indeed a
> > bit sad to prevent such instructions to be used.
> 
> Depends on the platform, x86,s390 etc. have then, RISC like things
> typically don't.
> 
> > > The alternative is writing it like:
> > > 
> > > 	*(volatile int *)&v->counter c_op i;
> > I wonder if it could be possible to write something like:
> > 
> >         *(volatile int *)&v->counter += i;
> 
> Should work, but give it a try, see what it does :-)
> 

I've made a quick test on godbolt [1] and I don't see a major difference
between the old version and the new version I propose. I am not very
familiar with either x86 or s390 and I might have missed a gcc option
to automagically generate "memops" instructions.

[1] https://godbolt.org/z/nrvvMs9b6

From my understanding s390 has instructions to read a value from memory
and add to it, but the result still needs to be written back by another
instruction.

x86 is not using the generic atomic code, but has its own implementation
of atomic memory operations using lock {add,...} instructions.

The goal of the proposed patch is to make the generic code more correct:
| I don't think that's true; without READ_ONCE() the compiler could (but
| is very unlikely to) read multiple times, and that could cause problems.
explained by Mark Rutland here:
https://lore.kernel.org/lkml/Y71LoCIl+IFdy9D8@FVFF77S0Q05N/

I still have some open questions:

 - Maybe in SMP the generic_atomic_* functions should use READ_ONCE
instead of arch_atomic_read, since only the "once" part is needed; the
atomicity is provided by the cmpxchg.

 - I have the feeling that in non-SMP we do not need the atomicity at all.
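The first question would amount to something like the following shape for the
SMP side (a sketch with a stand-in cmpxchg; arch_cmpxchg() itself supplies
the atomicity and ordering in the real code):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int counter; } atomic_t;

#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Stand-in for arch_cmpxchg(): returns the value found in memory. */
static inline int cmpxchg_stub(int *p, int old, int new)
{
	__atomic_compare_exchange_n(p, &old, new, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old; /* updated to the current value on failure */
}

/* SMP flavour using a bare READ_ONCE() for the initial snapshot:
 * only the read-once property matters here, the cmpxchg provides
 * the atomicity. */
static inline void generic_atomic_add(int i, atomic_t *v)
{
	int c, old;

	c = READ_ONCE(v->counter);
	while ((old = cmpxchg_stub(&v->counter, c, c + i)) != c)
		c = old;
}
```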


Thanks
-- Jules

> > I also noticed that GCC has some builtin/extension to do such things,
> > __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
> > can be used in the kernel.
> 
> On a per-architecture basis only, the C/C++ memory model does not match
> the Linux Kernel memory model so using the compiler to generate the
> atomic ops is somewhat tricky and needs architecture audits.

* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-30 12:23         ` Jonas Oberhauser
@ 2023-01-30 18:38           ` Boqun Feng
  2023-01-31 15:08             ` Jonas Oberhauser
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2023-01-30 18:38 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Peter Zijlstra, Jules Maselbas, Will Deacon, Mark Rutland,
	Arnd Bergmann, linux-arch, linux-kernel, Alan Stern,
	Andrea Parri, Nicholas Piggin, David Howells, Jade Alglave,
	Luc Maranget, Paul E. McKenney, Akira Yokosawa, Daniel Lustig,
	Joel Fernandes, Hernan Ponce de Leon, Paul Heidekrüger,
	Marco Elver, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
	Gary Guo, Björn Roy Baron

On Mon, Jan 30, 2023 at 01:23:28PM +0100, Jonas Oberhauser wrote:
> 
> 
> On 1/27/2023 11:09 PM, Boqun Feng wrote:
> > On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
> > > > I also noticed that GCC has some builtin/extension to do such things,
> > > > __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
> > > > can be used in the kernel.
> > > On a per-architecture basis only, the C/C++ memory model does not match
> > > the Linux Kernel memory model so using the compiler to generate the
> > > atomic ops is somewhat tricky and needs architecture audits.
> > Hijack this thread a little bit, but while we are at it, do you think it
> > makes sense that we have a config option that allows archs to
> > implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
> > between two memory models, but the option is only for fallback/generic
> > implementation so we can put extra barriers/orderings to make things
> > guaranteed to work.
> > 
> > It'll be a code version of this document:
> > 
> > 	https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0124r7.html
> > 
> > (although I realise there may be a few mistakes in that doc since I
> > wasn't familiar with C11 memory model when I wrote part of the doc, but
> > these can be fixed)
> > 
> > Another reason I ask is that since Rust is coming, we need to provide
> > our LKMM atomics in Rust so that C code and Rust code can talk via same
> > atomic variables, since both sides need to use the same memory model.
> > My choices are:
> > 
> > 1.	Using FFI to call Linux atomic APIs: not inline therefore not
> > 	efficient.
> > 
> > 2.	Implementing Rust LKMM atomics in asm: much more work although
> > 	I'm OK if we have to do it.
> > 
> > 3.	Implementing Rust LKMM atomics with standard atomics (i.e. C/C++
> > 	atomics):
> > 
> > 	*	Requires Rust has "volatile" atomics, which is WIP but
> > 		looks promising
> > 	
> > 	*	Less efficient compared to choice #2 but more efficient
> > 		compared to choice #1
> > 
> > Ideally, choice #2 is the best option for all architectures, however, if
> > we have the generic implementation based on choice #3, for some archs it
> > may be good enough.
> > 
> > Thoughts?
> 
> Thanks for adding me to the discussion!
> 
> One reason not to rely on C11 is that old compilers don't support it, and
> there may be application scenarios in which new compilers haven't been
> certified.
> I don't know if this is something that affects linux, but linux is so big
> and versatile I'd be surprised if that's irrelevant.
> 

We have been gnu11 since last year:

	e8c07082a810 ("Kbuild: move to -std=gnu11")

, so at least for mainline I don't see a problem with using C11 atomics.

> Another is that the C11 model is more about atomic locations than atomic
> accesses, and there are several places in the kernel where a location is
> accessed both atomically and non-atomically. This API mismatch is more
> severe than the semantic differences in my opinion, since you don't have
> guarantees of what the layout of atomics is going to be.
> 

True, but the same problem exists for our asm-implemented atomics,
right? My plan is to do (volatile atomic_int *) casts on these locations.

> Perhaps you could instead rely on the compiler builtins? Note that this may

These are less formally defined to me, and I assume there has not been
much research on them; I'd rather not use them.

> invalidate some progress properties, e.g., ticket locks become unfair if the
> increment (for taking a ticket) is implemented with a CAS loop (because a
> thread can fail forever to get a ticket if the ticket counter is contended,
> and thus starve). There may be some linux atomics that don't map to any
> compiler builtins and need to implemented with such CAS loops, potentially
> leading to such problems.
> 
> I'm also curious whether link time optimization can resolve the inlining
> issue?
> 

For the Rust case, cross-language LTO is needed I think, and last time
I tried, it didn't work.

> I think another big question for me is to which extent it makes sense
> anyways to have shared memory concurrency between the Rust code and the C
> code. It seems all the bad concurrency stuff from the C world would flow
> into the Rust world, right?

What do you mean by "bad" ;-) ;-) ;-)

> If you can live without shared Rust & C concurrency, then perhaps you can
> get away without using LKMM in Rust at all, and just rely on its (C11-like)
> memory model internally and talk to the C code through synchronous, safer
> ways.
> 

First, I don't think I can avoid using LKMM in Rust; besides the
communication between the two sides, what if kernel developers just
want to use the memory model they have learned and understand (i.e.
LKMM) in a new Rust driver? They probably already have a working
parallel algorithm based on LKMM.

Further, let's say we make C and Rust talk without shared memory
concurrency: what would that be? Would it be more defined/formal than
the LKMM? What would the cost be if we use synchronous ways? I
personally think there are places in the core kernel where Rust can be
tried, and whatever mechanism is used, performance cannot be sacrificed.

> I'm not against having a fallback builtin-based implementation of LKMM, and
> I don't think that it really needs architecture audits. What it needs is

Fun fact: there exist some "optimizations" that don't generate the asm
code as you want:

	https://github.com/llvm/llvm-project/issues/56450

Needless to say, they are bugs and will be fixed; besides, making the
atomics volatile seems to avoid these "optimizations".

> some additional compiler barriers and memory barriers, to ensure that the
> arguments about dependencies and non-atomics still hold. E.g., a release
> store may not just be "builtin release store" but may need to have a
> compiler barrier to prevent the release store being moved in program order.
> And a "full barrier" exchange may need an mb() infront of the operation to
> avoid "roach motel ordering" (i.e.,  x=1 ; "full barrier exchange"; y = 1
> allows y=1 to execute before x=1 in the compiler builtins as far as I
> remember). And there may be some other cases like this.
> 

Agreed. And this is another reason I want to do it: I'm curious how
far apart the C11 memory model and LKMM are, whether there is a way to
implement one with the other, what the gaps are (theoretical and
practical), and whether the ordering we have in LKMM can be
implemented by compilers (mostly the dependencies). More importantly,
could we improve both to get something better? With the ability to
exactly express the programmers' intention yet still allow
optimization by the compilers.

> But I currently don't see that this implementation would be noticeably
> faster than paying the overhead of lack of inline.
> 

You are not wrong; surely we will need a real benchmark to know. But
my rationale is 1) in theory this is faster, and 2) we also get a
chance to try out code based on LKMM with C11 atomics and see where it
hurts. Therefore I asked ;-)

Regards,
Boqun

> Best wishes, jonas
> 


* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-30 18:38           ` Boqun Feng
@ 2023-01-31 15:08             ` Jonas Oberhauser
  2023-01-31 22:03               ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Jonas Oberhauser @ 2023-01-31 15:08 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Jules Maselbas, Will Deacon, Mark Rutland,
	Arnd Bergmann, linux-arch, linux-kernel, Alan Stern,
	Andrea Parri, Nicholas Piggin, David Howells, Jade Alglave,
	Luc Maranget, Paul E. McKenney, Akira Yokosawa, Daniel Lustig,
	Joel Fernandes, Hernan Ponce de Leon, Paul Heidekrüger,
	Marco Elver, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
	Gary Guo, Björn Roy Baron



On 1/30/2023 7:38 PM, Boqun Feng wrote:
> On Mon, Jan 30, 2023 at 01:23:28PM +0100, Jonas Oberhauser wrote:
>>
>> On 1/27/2023 11:09 PM, Boqun Feng wrote:
>>> On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
>>>>> I also noticed that GCC has some builtin/extension to do such things,
>>>>> __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
>>>>> can be used in the kernel.
>>>> On a per-architecture basis only, the C/C++ memory model does not match
>>>> the Linux Kernel memory model so using the compiler to generate the
>>>> atomic ops is somewhat tricky and needs architecture audits.
>>> Hijack this thread a little bit, but while we are at it, do you think it
>>> makes sense that we have a config option that allows archs to
>>> implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
>>> between two memory models, but the option is only for fallback/generic
>>> implementation so we can put extra barriers/orderings to make things
>>> guaranteed to work.
>> Another is that the C11 model is more about atomic locations than atomic
>> accesses, and there are several places in the kernel where a location is
>> accessed both atomically and non-atomically. This API mismatch is more
>> severe than the semantic differences in my opinion, since you don't have
>> guarantees of what the layout of atomics is going to be.
>>
> True, but the same problem for our asm implemented atomics, right? My
> plan is to do (volatile atomic_int *) casts on these locations.

Do you? I think LKMM atomic types are always exactly as big as the
underlying types.
With C you might get into a case where the atomic_int is actually
[lock; non-atomic int], and when you access the location in a mixed
way, you will non-atomically read the lock part while the atomic
accesses modify the int part (protected by the lock).
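If such a fallback ever materialises, the layout concern at least admits a
build-time guard; an illustrative (hypothetical) check, not an existing
kernel option:

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdatomic.h>

/* Refuse configurations where C11 atomic_int is not just a plain
 * lock-free int (e.g. a [lock; int] pair), since mixed atomic and
 * non-atomic access to the same location would then be broken. */
static_assert(sizeof(atomic_int) == sizeof(int),
	      "atomic_int must have plain int layout");
static_assert(ATOMIC_INT_LOCK_FREE == 2,
	      "atomic_int must be always lock-free");
```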

>
>> Perhaps you could instead rely on the compiler builtins? Note that this may
> These are less formal/defined to me, and not much research on them I
> assume, I'd rather not use them.

I think that it's easy enough to define a formal model of these that is 
a bit conservative, and then just add mb()s to make them safe.

>
>> invalidate some progress properties, e.g., ticket locks become unfair if the
>> increment (for taking a ticket) is implemented with a CAS loop (because a
>> thread can fail forever to get a ticket if the ticket counter is contended,
>> and thus starve). There may be some Linux atomics that don't map to any
>> compiler builtins and need to be implemented with such CAS loops, potentially
>> leading to such problems.
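For illustration, here is roughly what the two increments look like in
C11 (a sketch only; the function names are made up for this example):

```c
#include <assert.h>
#include <stdatomic.h>

/* CAS-loop increment: under contention a thread can lose the race
 * forever and never obtain a ticket, so fairness is lost. */
static int take_ticket_cas(atomic_int *next)
{
	int c = atomic_load_explicit(next, memory_order_relaxed);

	/* on failure, c is reloaded with the current value; this loop
	 * has no bound if other threads keep winning */
	while (!atomic_compare_exchange_weak_explicit(next, &c, c + 1,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
	return c;
}

/* Single hardware RMW (e.g. LOCK XADD on x86, LDADD on arm64): every
 * thread eventually succeeds, preserving ticket-lock fairness. */
static int take_ticket_add(atomic_int *next)
{
	return atomic_fetch_add_explicit(next, 1, memory_order_relaxed);
}
```

Whether a given LKMM op ends up on the CAS-loop path of course depends
on which builtins the target architecture actually provides.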
>>
>> I'm also curious whether link time optimization can resolve the inlining
>> issue?
>>
> For Rust case, cross-language LTO is needed I think, and last time I
> tried, it didn't work.

In German we say "Was noch nicht ist kann ja noch werden", translated as
"what isn't can yet become". I don't feel like putting too much effort
into something that hardly affects performance and will hopefully become
obsolete at some point in the near future.

>
>> I think another big question for me is to which extent it makes sense
>> anyways to have shared memory concurrency between the Rust code and the C
>> code. It seems all the bad concurrency stuff from the C world would flow
>> into the Rust world, right?
> What do you mean by "bad" ;-) ;-) ;-)

Uh oh. Let's pretend I didn't say anything :D

>> If you can live without shared Rust & C concurrency, then perhaps you can
>> get away without using LKMM in Rust at all, and just rely on its (C11-like)
>> memory model internally and talk to the C code through synchronous, safer
>> ways.
>>
> First I don't think I can avoid using LKMM in Rust, besides the
> communication from two sides, what if kernel developers just want to
> use the memory model they learn and understand (i.e. LKMM) in a new Rust
> driver?

I'd rather people think 10 times before relying on atomics to write Rust 
code.
There may be cases where it can't be avoided because of performance 
reasons, but Rust has a much more convenient concurrency model to offer 
than atomics.
I think a lot more people understand Rust mutexes or channels compared 
to atomics.
Unfortunately I haven't written much driver code so I don't have 
experience to what extent it's generally necessary to rely on atomics :(


> They probably already have a working parallel algorithm based on
> LKMM.
>
> Further, let's say we make C and Rust talk without shared memory
> concurrency, what would that be? Will it be more defined/formal than LKMM?

Normal FFI with sequential shared memory (passing a buffer sequentially 
through FFI)?

> How's the cost if we use synchronous ways? I personally think there are
> places in the core kernel where Rust can be tried; whatever mechanism is
> used, performance cannot be sacrificed.

Note that if you turn a C atomics-based implementation into a Rust 
atomics-based implementation, you'll probably need some unsafe when you 
"consume" the data synchronized through atomics (Sebastian Humenda and 
Jonathan Schwender explained this to me recently), and any memory safety 
bugs coming from that unsafe part can be due to incorrect use of the 
atomics in the rest of the algorithm. So you don't really gain much 
memory safety compared to the C implementation.

Hopefully you can write a small safe Rust API around your mechanism, and 
then constrain its possibly unsafe atomics use inside the library.
But the motivation for porting that mechanism to Rust atomics rather 
than relying on FFI is low for me.

>> I'm not against having a fallback builtin-based implementation of LKMM, and
>> I don't think that it really needs architecture audits. What it needs is
> Fun fact, there exist some "optimizations" that don't generate the asm
> code as you want:
>
> 	https://github.com/llvm/llvm-project/issues/56450
>
> Needless to say, they are bugs and will be fixed; besides, making the
> atomics volatile seems to avoid these "optimizations".


Haha. Reminds me of a similar issue with failed CASes, which are usually 
only reads but are treated as writing back the same value in some 
papers, leading to incorrect results.


>> some additional compiler barriers and memory barriers, to ensure that the
>> arguments about dependencies and non-atomics still hold. E.g., a release
>> store may not just be "builtin release store" but may need to have a
>> compiler barrier to prevent the release store being moved in program order.
>> And a "full barrier" exchange may need an mb() in front of the operation to
>> avoid "roach motel ordering" (i.e., x = 1; "full barrier exchange"; y = 1
>> allows y=1 to execute before x=1 in the compiler builtins as far as I
>> remember). And there may be some other cases like this.
>>
> Agreed. And this is another reason I want to do it: I'm curious about
> how different the C11 memory model and LKMM are, whether there is a
> way to implement one with the other, what the gaps are (theoretical
> and practical), and whether the ordering we have in LKMM can be
> implemented by compilers (mostly dependencies).
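To make the barrier point concrete, a fallback "fully ordered" exchange
built on a C11 builtin might look like this (a sketch only; the name is
made up, and whether both fences are needed, or weaker ones suffice, is
exactly the per-architecture audit question):

```c
#include <assert.h>
#include <stdatomic.h>

#define mb() atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical LKMM-style xchg on top of a C11 builtin.  The leading
 * mb() orders plain accesses before the xchg (the x = 1 above) against
 * everything after the fence, restoring the x = 1 -> y = 1 guarantee:
 * the builtin exchange alone does not act as a full barrier for the
 * surrounding non-atomic accesses the way LKMM requires. */
static inline int lkmm_xchg_sketch(atomic_int *v, int newval)
{
	int old;

	mb();
	old = atomic_exchange_explicit(v, newval, memory_order_seq_cst);
	mb();	/* symmetric trailing fence for the fully ordered case */
	return old;
}
```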


I could imagine you'll find lots of differences, starting from SC 
ordering, dependencies, local ordering of barriers, propagation...
I generally categorize memory models in 4 tiers:
- po | rfe | coe | fre is acyclic  (SC)
- ppo | rfe | coe | fre is acyclic (Arm, x86)
- ppo | rfe | pb is acyclic (LKMM, Power)
- synchronize-with | po is acyclic (C11)

Just structurally, C11 and LKMM are extremely different. Because C11 has 
no notion of ppo, barriers have no local ordering effect, only a global 
one when matched with another barrier in another thread.
Bridging this gap will be hard!


> More importantly, could we improve both
> to get something better? Something with the ability to exactly express
> the programmers' intention yet still allow optimizations by the
> compilers.


In my humble opinion, programmers shouldn't have the intention of 
relying on dependency ordering :D
I've had a few times people ask me if some barrier could be relaxed due 
to dependency ordering, and I've said "I honestly don't care if it can 
be relaxed until there's some evidence it improves performance".
And I haven't yet seen any convincing evidence that relying on a 
dependency rather than an acquire improves end-to-end performance, which 
isn't surprising considering the hardware optimizations for acquire. 
(But we also haven't measured on PowerPC, where perhaps the lwsync might 
be a tad more expensive).

Nevertheless, ruling out OOTA has some nice practical properties for 
verification methodology.
I think most of the potential improvement is in identifying the cases in 
which compilers won't eliminate dependencies, and showing that this 
covers all the ones that are necessary to rule out OOTA. Namely the ones 
in which the store might be observed by other threads, and the values of 
the read that lead to this store being produced can only come from 
specific stores of other threads.
And then C can follow Viktor's suggestion to just explicitly rule out 
OOTA, without any concrete explanation of how (and without making 
dependencies provide order beyond that).

That said, for a huge code base like Linux where some code just *does* 
rely on dependency ordering, you can't go and kick out that ordering 
from the model. That's why I think it's important to keep it in LKMM, 
but not necessarily add it to C.


>> But I currently don't see that this implementation would be noticeably
>> faster than paying the overhead of lack of inline.
>>
> You are not wrong, surely we will need a real benchmark to know. But my
> rationale is 1) in theory this is faster, 2) we also get a chance to try
> out code based on LKMM with C11 atomics to see where it hurts. Therefore
> I asked ;-)

I think one beautiful thing about open source is that nobody can stop 
you from trying it out yourself :D
It's currently not something I would personally put resources into though.
You could pick a DPDK or CK algorithm and port it to a version using FFI 
instead of using atomics directly, and measure the impact on some 
microbenchmarks.
Then consider that the end-to-end impact on Linux will probably be at 
least one or two orders of magnitude less.

Have fun, jonas


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-31 15:08             ` Jonas Oberhauser
@ 2023-01-31 22:03               ` Boqun Feng
  2023-02-01 10:51                 ` Jonas Oberhauser
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2023-01-31 22:03 UTC (permalink / raw)
  To: Jonas Oberhauser
  Cc: Peter Zijlstra, Jules Maselbas, Will Deacon, Mark Rutland,
	Arnd Bergmann, linux-arch, linux-kernel, Alan Stern,
	Andrea Parri, Nicholas Piggin, David Howells, Jade Alglave,
	Luc Maranget, Paul E. McKenney, Akira Yokosawa, Daniel Lustig,
	Joel Fernandes, Hernan Ponce de Leon, Paul Heidekrüger,
	Marco Elver, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
	Gary Guo, Björn Roy Baron

On Tue, Jan 31, 2023 at 04:08:29PM +0100, Jonas Oberhauser wrote:
> 
> 
> On 1/30/2023 7:38 PM, Boqun Feng wrote:
> > On Mon, Jan 30, 2023 at 01:23:28PM +0100, Jonas Oberhauser wrote:
> > > 
> > > On 1/27/2023 11:09 PM, Boqun Feng wrote:
> > > > On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
> > > > > > I also noticed that GCC has some builtin/extension to do such things,
> > > > > > __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
> > > > > > can be used in the kernel.
> > > > > On a per-architecture basis only, the C/C++ memory model does not match
> > > > > the Linux Kernel memory model so using the compiler to generate the
> > > > > atomic ops is somewhat tricky and needs architecture audits.
> > > > Hijack this thread a little bit, but while we are at it, do you think it
> > > > makes sense that we have a config option that allows archs to
> > > > implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
> > > > between two memory models, but the option is only for fallback/generic
> > > > implementation so we can put extra barriers/orderings to make things
> > > > guaranteed to work.
> > > Another issue is that the C11 model is more about atomic locations than atomic
> > > accesses, and there are several places in the kernel where a location is
> > > accessed both atomically and non-atomically. This API mismatch is more
> > > severe than the semantic differences in my opinion, since you don't have
> > > guarantees of what the layout of atomics is going to be.
> > > 
> > True, but the same problem exists for our asm-implemented atomics, right? My
> > plan is to do (volatile atomic_int *) casts on these locations.
> 
> Do you? I think LKMM atomic types are always exactly as big as the
> underlying types.
> With C you might get into a case where the atomic_int is actually [lock ;
> non-atomic int] and when you access the location in a mixed way, you will
> non-atomically read the lock part but the atomic accesses modify the int
> part (protected by the lock).
> 

Well, my plan is to check ATOMIC_*_LOCK_FREE and have a _Static_assert() on
the size of atomic_int ;-)
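Something like this (a sketch of the checks I have in mind):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

/* Reject any configuration where C11 atomic_int is not a plain,
 * always-lock-free int underneath, i.e. where mixed atomic and
 * non-atomic access to the same location could hit a hidden lock
 * or padding. */
_Static_assert(ATOMIC_INT_LOCK_FREE == 2,
	       "atomic_int must be always lock-free");
_Static_assert(sizeof(atomic_int) == sizeof(int),
	       "atomic_int must have the same size as int");
_Static_assert(alignof(atomic_int) == alignof(int),
	       "atomic_int must have the same alignment as int");
```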

> > 
> > > Perhaps you could instead rely on the compiler builtins? Note that this may
> > These are less formal/well-defined to me, and I assume there's not much
> > research on them, so I'd rather not use them.
> 
> I think that it's easy enough to define a formal model of these that is a
> bit conservative, and then just add mb()s to make them safe.
> 

I actually have a similar idea about the communication between Rust and
kernel C with atomic variables: Rust can use its standard atomics (i.e.
C11/LLVM atomics) but with mb()s to make them safe when talking with C.

Of course, no problem about pure Rust code using pure LLVM atomics.

> > 
> > > invalidate some progress properties, e.g., ticket locks become unfair if the
> > > increment (for taking a ticket) is implemented with a CAS loop (because a
> > > thread can fail forever to get a ticket if the ticket counter is contended,
> > > and thus starve). There may be some Linux atomics that don't map to any
> > > compiler builtins and need to be implemented with such CAS loops, potentially
> > > leading to such problems.
> > > 
> > > I'm also curious whether link time optimization can resolve the inlining
> > > issue?
> > > 
> > For Rust case, cross-language LTO is needed I think, and last time I
> > tried, it didn't work.
> 
> In German we say "Was noch nicht ist kann ja noch werden", translated as
> "what isn't can yet become", I don't feel like putting too much effort into

Not too much compared to wrapping LKMM atomics with Rust using FFI,

Using FFI:

	impl Atomic {
		fn read_acquire(&self) -> i32 {
			// SAFETY:
			unsafe { atomic_read_acquire(self as _) }
		}
	}

Using standard atomics:

	impl Atomic {
		fn read_acquire(&self) -> i32 {
			// self.0 is a Rust AtomicI32
			compiler_fence(SeqCst); // Rust does not support volatile atomics yet
			self.0.load(Acquire)
		}
	}

Needless to say, if we really need LKMM atomics in Rust, it's kinda my
job to implement these, so not much different for me ;-) Of course, any
help is appreciated!

> something that hardly affects performance and will hopefully become obsolete
> at some point in the near future.
> 
> > 
> > > I think another big question for me is to which extent it makes sense
> > > anyways to have shared memory concurrency between the Rust code and the C
> > > code. It seems all the bad concurrency stuff from the C world would flow
> > > into the Rust world, right?
> > What do you mean by "bad" ;-) ;-) ;-)
> 
> Uh oh. Let's pretend I didn't say anything :D
> 
> > > If you can live without shared Rust & C concurrency, then perhaps you can
> > > get away without using LKMM in Rust at all, and just rely on its (C11-like)
> > > memory model internally and talk to the C code through synchronous, safer
> > > ways.
> > > 
> > First I don't think I can avoid using LKMM in Rust, besides the
> > communication from two sides, what if kernel developers just want to
> > use the memory model they learn and understand (i.e. LKMM) in a new Rust
> > driver?
> 
> I'd rather people think 10 times before relying on atomics to write Rust
> code.
> There may be cases where it can't be avoided because of performance reasons,
> but Rust has a much more convenient concurrency model to offer than atomics.
> I think a lot more people understand Rust mutexes or channels compared to
> atomics.

C also has more convenient concurrency tools in the kernel, and I'm happy
that people use them. But there are also people (including me) working
on building these tools/models, and inevitably we need to use atomics. And
when we use the atomics, the biggest question is which one to use?
Right, it seems that I'm responsible for the answer because I have
multiple hats on. Trust me, it's not something I like to think about but
I have to ;-)

> Unfortunately I haven't written much driver code so I don't have experience
> to what extent it's generally necessary to rely on atomics :(
> 
> 

[snip some good tech inputs, I will reply later]

> 
> > > But I currently don't see that this implementation would be noticeably
> > > faster than paying the overhead of lack of inline.
> > > 
> > You are not wrong, surely we will need a real benchmark to know. But my
> > rationale is 1) in theory this is faster, 2) we also get a chance to try
> > out code based on LKMM with C11 atomics to see where it hurts. Therefore
> > I asked ;-)
> 
> I think one beautiful thing about open source is that nobody can stop you
> from trying it out yourself :D

I wish this were a personal side project that I could give up whenever I
want ;-)

The key question is that when building something in Rust for Linux using
atomics, and also talking with the C side with atomics, how do we handle
the difference between the two memory models? A few options I can think of:

1.	Use Rust standard atomics and pretend different memory models
	work together (do we have model tools to handle code in
	different models communicating with each other?)

2.	Use Rust standard atomics and add extra mb()s to enforce more
	ordering guarantee.

3.	Implement LKMM atomics in Rust and use them with caution when
	it comes to implicit ordering guarantees such as ppo. In fact lots
	of implicit ordering guarantees are available since the compiler
	won't exploit the potential reordering to "optimize", we also
	kinda have tools to check:

		https://lpc.events/event/16/contributions/1174/attachments/1108/2121/Status%20Report%20-%20Broken%20Dependency%20Orderings%20in%20the%20Linux%20Kernel.pdf

	A good part of using Rust is that we may try out a few tricks
	(with proc-macros, compiler plugins, etc.) to express some ordering
	expectations, e.g. control dependencies.

	Two suboptions are:

	3.1	Implement LKMM atomics in Rust with FFI
	3.2	Implement LKMM atomics in Rust with Rust standard
		atomics
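For example, a control dependency in kernel-style C looks like this
(READ_ONCE/WRITE_ONCE stubbed out here; the variable names are made up):

```c
#include <assert.h>

/* Minimal stand-ins for the kernel macros. */
#define READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

static int flag, data;

/* The store to data only executes on a branch whose condition depends
 * on the value loaded from flag, so the hardware cannot make the store
 * visible before the load completes: the load->store pair is ordered
 * without any explicit barrier.  The risk is on the compiler side: if
 * it can prove the branch is redundant, it may collapse it and destroy
 * the ordering, which is what the dependency-checking tooling above is
 * about. */
static void consumer(void)
{
	if (READ_ONCE(flag))
		WRITE_ONCE(data, 1);
}
```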

I'm happy to figure out pros and cons behind each option.

Regards,
Boqun

> It's currently not something I would personally put resources into though.
> You could pick a DPDK or CK algorithm and port it to a version using FFI
> instead of using atomics directly, and measure the impact on some
> microbenchmarks.
> Then consider that the end-to-end impact on Linux will probably be at least
> one or two orders of magnitude less.
> 
> Have fun, jonas
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops
  2023-01-31 22:03               ` Boqun Feng
@ 2023-02-01 10:51                 ` Jonas Oberhauser
  0 siblings, 0 replies; 11+ messages in thread
From: Jonas Oberhauser @ 2023-02-01 10:51 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Jules Maselbas, Will Deacon, Mark Rutland,
	Arnd Bergmann, linux-arch, linux-kernel, Alan Stern,
	Andrea Parri, Nicholas Piggin, David Howells, Jade Alglave,
	Luc Maranget, Paul E. McKenney, Akira Yokosawa, Daniel Lustig,
	Joel Fernandes, Hernan Ponce de Leon, Paul Heidekrüger,
	Marco Elver, Miguel Ojeda, Alex Gaynor, Wedson Almeida Filho,
	Gary Guo, Björn Roy Baron



On 1/31/2023 11:03 PM, Boqun Feng wrote:
> On Tue, Jan 31, 2023 at 04:08:29PM +0100, Jonas Oberhauser wrote:
>>
>> On 1/30/2023 7:38 PM, Boqun Feng wrote:
>>> On Mon, Jan 30, 2023 at 01:23:28PM +0100, Jonas Oberhauser wrote:
>>>> On 1/27/2023 11:09 PM, Boqun Feng wrote:
>>>>> On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
>>>>>>> I also noticed that GCC has some builtin/extension to do such things,
>>>>>>> __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
>>>>>>> can be used in the kernel.
>>>>>> On a per-architecture basis only, the C/C++ memory model does not match
>>>>>> the Linux Kernel memory model so using the compiler to generate the
>>>>>> atomic ops is somewhat tricky and needs architecture audits.
>>>>> Hijack this thread a little bit, but while we are at it, do you think it
>>>>> makes sense that we have a config option that allows archs to
>>>>> implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
>>>>> between two memory models, but the option is only for fallback/generic
>>>>> implementation so we can put extra barriers/orderings to make things
>>>>> guaranteed to work.
>>>>>

>>>> [...]
>>>> I'm also curious whether link time optimization can resolve the inlining
>>>> issue?
>>>>
>>> For Rust case, cross-language LTO is needed I think, and last time I
>>> tried, it didn't work.
>> In German we say "Was noch nicht ist kann ja noch werden", translated as
>> "what isn't can yet become", I don't feel like putting too much effort into
> Not too much compared to wrapping LKMM atomics with Rust using FFI,
>
> Using FFI:
>
> 	impl Atomic {
> 		fn read_acquire(&self) -> i32 {
> 			// SAFETY:
> 			unsafe { atomic_read_acquire(self as _) }
> 		}
> 	}
>
> Using standard atomics:
>
> 	impl Atomic {
> 		fn read_acquire(&self) -> i32 {
> 			// self.0 is a Rust AtomicI32
> 			compiler_fence(SeqCst); // Rust does not support volatile atomics yet
> 			self.0.load(Acquire)
> 		}
> 	}
>
> Needless to say, if we really need LKMM atomics in Rust, it's kinda my
> job to implement these, so not much different for me ;-) Of course, any
> help is appreciated!

I think a lot more mental effort goes into figuring out which barriers 
need to go where.
But of course if there's curiosity driving you, then that small 
trade-off may be acceptable : )

>> something that hardly affects performance and will hopefully become obsolete
>> at some point in the near future.
>>
>>>> I think another big question for me is to which extent it makes sense
>>>> anyways to have shared memory concurrency between the Rust code and the C
>>>> code. It seems all the bad concurrency stuff from the C world would flow
>>>> into the Rust world, right?
>>> What do you mean by "bad" ;-) ;-) ;-)
>> Uh oh. Let's pretend I didn't say anything :D
>>
>>>> If you can live without shared Rust & C concurrency, then perhaps you can
>>>> get away without using LKMM in Rust at all, and just rely on its (C11-like)
>>>> memory model internally and talk to the C code through synchronous, safer
>>>> ways.
>>>>
>>> First I don't think I can avoid using LKMM in Rust, besides the
>>> communication from two sides, what if kernel developers just want to
>>> use the memory model they learn and understand (i.e. LKMM) in a new Rust
>>> driver?
>> I'd rather people think 10 times before relying on atomics to write Rust
>> code.
>> There may be cases where it can't be avoided because of performance reasons,
>> but Rust has a much more convenient concurrency model to offer than atomics.
>> I think a lot more people understand Rust mutexes or channels compared to
>> atomics.
> C also has more convenient concurrency tools in kernel, and I'm happy
> that people use them. But there are also people (including me) working
> on building these tools/models, inevitably we need to use atomics.

:D

> 1.	Use Rust standard atomics and pretend different memory models
> 	work together (do we have model tools to handle code in
> 	different models communicating with each other?)

I'm not aware of any generic tools, and in particular for Rust and LKMM 
it will take some thought to create interoperability.
This is because the po | sw and ppo | rfe | pb styles are so different.

I did some previous work on letting SC and x86 talk to each other 
through shared memory, which is much easier because both can be 
understood through the ppo | rfe | coe | fre lens, just that on SC 
everything is preserved; it still wasn't completely trivial because the 
SC part was actually implemented efficiently on x86 as well, so it was a 
"partial-DRF-partial-SC" kind of deal.


>
> 2.	Use Rust standard atomics and add extra mb()s to enforce more
> 	ordering guarantee.
>
> 3.	Implement LKMM atomics in Rust and use them with caution when
> 	it comes to implicit ordering guarantees such as ppo. In fact lots
> 	of implicit ordering guarantees are available since the compiler
> 	won't exploit the potential reordering to "optimize", we also
> 	kinda have tools to check:
>
> 		https://lpc.events/event/16/contributions/1174/attachments/1108/2121/Status%20Report%20-%20Broken%20Dependency%20Orderings%20in%20the%20Linux%20Kernel.pdf
>
> 	A good part of using Rust is that we may try out a few tricks
> 	(with proc-macros, compiler plugins, etc.) to express some ordering
> 	expectations, e.g. control dependencies.
>
> 	Two suboptions are:
>
> 	3.1	Implement LKMM atomics in Rust with FFI

I'd target this one ; ) It seems the most likely to work the way people 
want, with perhaps the least effort, and a good chance of not having 
overhead at some point in the future.

> 	3.2	Implement LKMM atomics in Rust with Rust standard
> 		atomics
>
> I'm happy to figure out pros and cons behind each option.
>


Best wishes, jonas


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-02-01 10:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-26 17:33 [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops Jules Maselbas
2023-01-27 11:18 ` Peter Zijlstra
2023-01-27 13:49   ` Jules Maselbas
2023-01-27 14:34     ` Peter Zijlstra
2023-01-27 22:09       ` Boqun Feng
2023-01-30 12:23         ` Jonas Oberhauser
2023-01-30 18:38           ` Boqun Feng
2023-01-31 15:08             ` Jonas Oberhauser
2023-01-31 22:03               ` Boqun Feng
2023-02-01 10:51                 ` Jonas Oberhauser
2023-01-30 18:15       ` Jules Maselbas
