* smp_mb__after_spinlock requirement too strong?
@ 2018-03-11  7:55 焦晓冬
From: 焦晓冬 @ 2018-03-11  7:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, stern, will.deacon, torvalds, npiggin, mingo, mpe, oleg,
	benh, paulmck

Peter pointed out in this patch https://patchwork.kernel.org/patch/9771921/
that the spinlock used in __schedule() should be RCsc to ensure
visibility of writes prior to __schedule() when the task is to be migrated to
another CPU.

And this is emphasized in the comment of the newly introduced
smp_mb__after_spinlock():

 * This barrier must provide two things:
 *
 *   - it must guarantee a STORE before the spin_lock() is ordered against a
 *     LOAD after it, see the comments at its two usage sites.
 *
 *   - it must ensure the critical section is RCsc.
 *
 * The latter is important for cases where we observe values written by other
 * CPUs in spin-loops, without barriers, while being subject to scheduling.
 *
 * CPU0         CPU1            CPU2
 *
 *          for (;;) {
 *            if (READ_ONCE(X))
 *              break;
 *          }
 * X=1
 *          <sched-out>
 *                      <sched-in>
 *                      r = X;
 *
 * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
 * we get migrated and CPU2 sees X==0.

which is used at,

__schedule(bool preempt) {
    ...
    rq_lock(rq, &rf);
    smp_mb__after_spinlock();
    ...
}
.
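
For reference (this is paraphrased from memory rather than copied from the
tree, so please check include/linux/spinlock.h and the powerpc headers for
the authoritative definitions), the barrier itself boils down to something
like:

/* generic fallback: a no-op where spin_lock() already provides the ordering */
#ifndef smp_mb__after_spinlock
#define smp_mb__after_spinlock()	do { } while (0)
#endif

/* powerpc, whose lock acquisition is not a full barrier, overrides it */
#define smp_mb__after_spinlock()	smp_mb()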

Unless I missed something, this kind of visibility does __not__ necessarily
depend on the spinlock at __schedule() being RCsc.

In fact, for a runnable task A, the migration would be:

 CPU0         CPU1            CPU2

<ACCESS before schedule out A>

lock(rq0)
schedule out A
unlock(rq0)

              lock(rq0)
              remove A from rq0
              unlock(rq0)

              lock(rq2)
              add A into rq2
              unlock(rq2)
                                        lock(rq2)
                                        schedule in A
                                        unlock(rq2)

                                        <ACCESS after schedule in A>

<ACCESS before schedule out A> happens-before
unlock(rq0) happens-before
lock(rq0) happens-before
unlock(rq2) happens-before
lock(rq2) happens-before
<ACCESS after schedule in A>
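
To make this chain checkable, here is a litmus-style sketch of it. The
variable and lock names are mine (x stands for the <ACCESS ...> location,
y and z for the rq0/rq2 runqueue state), and the verdict should come from
running herd7 against the kernel memory model rather than from this sketch:

C migrate0

{}

P0(int *x, int *y, spinlock_t *rq0)
{
	WRITE_ONCE(*x, 1);	/* <ACCESS before schedule out A> */
	spin_lock(rq0);
	WRITE_ONCE(*y, 1);	/* schedule out A */
	spin_unlock(rq0);
}

P1(int *y, int *z, spinlock_t *rq0, spinlock_t *rq2)
{
	int r0;

	spin_lock(rq0);
	r0 = READ_ONCE(*y);	/* remove A from rq0 */
	spin_unlock(rq0);
	spin_lock(rq2);
	WRITE_ONCE(*z, 1);	/* add A into rq2 */
	spin_unlock(rq2);
}

P2(int *x, int *z, spinlock_t *rq2)
{
	int r0;
	int r1;

	spin_lock(rq2);
	r0 = READ_ONCE(*z);	/* schedule in A */
	r1 = READ_ONCE(*x);	/* <ACCESS after schedule in A> */
	spin_unlock(rq2);
}

exists (1:r0=1 /\ 2:r0=1 /\ 2:r1=0)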

And for stopped tasks,

 CPU0         CPU1            CPU2

<ACCESS before schedule out A>

lock(rq0)
schedule out A
remove A from rq0
store-release(A->on_cpu)
unlock(rq0)

              load_acquire(A->on_cpu)
              set_task_cpu(A, 2)

              lock(rq2)
              add A into rq2
              unlock(rq2)

                                        lock(rq2)
                                        schedule in A
                                        unlock(rq2)

                                        <ACCESS after schedule in A>

<ACCESS before schedule out A> happens-before
store-release(A->on_cpu)  happens-before
load_acquire(A->on_cpu)  happens-before
unlock(rq2) happens-before
lock(rq2) happens-before
<ACCESS after schedule in A>
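
And similarly for the stopped-task case, with c standing in for A->on_cpu
(treated as a plain flag here) and the rq0 lock omitted, since the chain
above goes through A->on_cpu. Again, the names are mine and herd7 has the
final say:

C migrate1

{}

P0(int *x, int *c)
{
	WRITE_ONCE(*x, 1);		/* <ACCESS before schedule out A> */
	smp_store_release(c, 1);	/* store-release(A->on_cpu) */
}

P1(int *c, int *z, spinlock_t *rq2)
{
	int r0;

	r0 = smp_load_acquire(c);	/* load_acquire(A->on_cpu) */
	spin_lock(rq2);
	WRITE_ONCE(*z, 1);		/* add A into rq2 */
	spin_unlock(rq2);
}

P2(int *x, int *z, spinlock_t *rq2)
{
	int r0;
	int r1;

	spin_lock(rq2);
	r0 = READ_ONCE(*z);		/* schedule in A */
	r1 = READ_ONCE(*x);		/* <ACCESS after schedule in A> */
	spin_unlock(rq2);
}

exists (1:r0=1 /\ 2:r0=1 /\ 2:r1=0)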

So, I think the only requirement on smp_mb__after_spinlock() is
to guarantee that a STORE before the spin_lock() is ordered
against a LOAD after it, and so we could drop the RCsc requirement
to allow a more efficient implementation.

Did I miss something, or does this RCsc requirement not really matter?


* Re: smp_mb__after_spinlock requirement too strong?
From: Boqun Feng @ 2018-03-12  5:44 UTC (permalink / raw)
  To: 焦晓冬
  Cc: linux-kernel, peterz, stern, will.deacon, torvalds, npiggin,
	mingo, mpe, oleg, benh, paulmck

On Sun, Mar 11, 2018 at 03:55:41PM +0800, 焦晓冬 wrote:
> Peter pointed out in this patch https://patchwork.kernel.org/patch/9771921/
> that the spinning-lock used at __schedule() should be RCsc to ensure
> visibility of writes prior to __schedule when the task is to be migrated to
> another CPU.
> 
> And this is emphasized at the comment of the newly introduced
> smp_mb__after_spinlock(),
> 
>  * This barrier must provide two things:
>  *
>  *   - it must guarantee a STORE before the spin_lock() is ordered against a
>  *     LOAD after it, see the comments at its two usage sites.
>  *
>  *   - it must ensure the critical section is RCsc.
>  *
>  * The latter is important for cases where we observe values written by other
>  * CPUs in spin-loops, without barriers, while being subject to scheduling.
>  *
>  * CPU0         CPU1            CPU2
>  *
>  *          for (;;) {
>  *            if (READ_ONCE(X))
>  *              break;
>  *          }
>  * X=1
>  *          <sched-out>
>  *                      <sched-in>
>  *                      r = X;
>  *
>  * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
>  * we get migrated and CPU2 sees X==0.
> 
> which is used at,
> 
> __schedule(bool preempt) {
>     ...
>     rq_lock(rq, &rf);
>     smp_mb__after_spinlock();
>     ...
> }
> .
> 
> If I didn't miss something, I found this kind of visibility is __not__
> necessarily
> depends on the spinning-lock at __schedule being RCsc.
> 
> In fact, as for runnable task A, the migration would be,
> 
>  CPU0         CPU1            CPU2
> 
> <ACCESS before schedule out A>
> 
> lock(rq0)
> schedule out A
> unock(rq0)
> 
>               lock(rq0)
>               remove A from rq0
>               unlock(rq0)
> 
>               lock(rq2)
>               add A into rq2
>               unlock(rq2)
>                                         lock(rq2)
>                                         schedule in A
>                                         unlock(rq2)
> 
>                                         <ACCESS after schedule in A>
> 
> <ACCESS before schedule out A> happens-before
> unlock(rq0) happends-before
> lock(rq0) happends-before
> unlock(rq2) happens-before
> lock(rq2) happens-before
> <ACCESS after schedule in A>
> 

But without an RCsc lock, you cannot guarantee that a write propagates to
CPU 0 and CPU 2 at the same time, so the same write may propagate to
CPU 0 before <ACCESS before schedule out A> but propagate to CPU 2 after
<ACCESS after schedule in A>. So..

Regards,
Boqun

> And for stopped tasks,
> 
>  CPU0         CPU1            CPU2
> 
> <ACCESS before schedule out A>
> 
> lock(rq0)
> schedule out A
> remove A from rq0
> store-release(A->on_cpu)
> unock(rq0)
> 
>               load_acquire(A->on_cpu)
>               set_task_cpu(A, 2)
> 
>               lock(rq2)
>               add A into rq2
>               unlock(rq2)
> 
>                                         lock(rq2)
>                                         schedule in A
>                                         unlock(rq2)
> 
>                                         <ACCESS after schedule in A>
> 
> <ACCESS before schedule out A> happens-before
> store-release(A->on_cpu)  happens-before
> load_acquire(A->on_cpu)  happens-before
> unlock(rq2) happens-before
> lock(rq2) happens-before
> <ACCESS after schedule in A>
> 
> So, I think the only requirement to smp_mb__after_spinlock is
> to guarantee a STORE before the spin_lock() is ordered
> against a LOAD after it. So we could remove the RCsc requirement
> to allow more efficient implementation.
> 
> Did I miss something or this RCsc requirement does not really matter?



* Re: smp_mb__after_spinlock requirement too strong?
From: 焦晓冬 @ 2018-03-12  8:18 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, peterz, Alan Stern, will.deacon, torvalds, npiggin,
	mingo, mpe, oleg, benh, Paul McKenney

>> Peter pointed out in this patch https://patchwork.kernel.org/patch/9771921/
>> that the spinning-lock used at __schedule() should be RCsc to ensure
>> visibility of writes prior to __schedule when the task is to be migrated to
>> another CPU.
>>
>> And this is emphasized at the comment of the newly introduced
>> smp_mb__after_spinlock(),
>>
>>  * This barrier must provide two things:
>>  *
>>  *   - it must guarantee a STORE before the spin_lock() is ordered against a
>>  *     LOAD after it, see the comments at its two usage sites.
>>  *
>>  *   - it must ensure the critical section is RCsc.
>>  *
>>  * The latter is important for cases where we observe values written by other
>>  * CPUs in spin-loops, without barriers, while being subject to scheduling.
>>  *
>>  * CPU0         CPU1            CPU2
>>  *
>>  *          for (;;) {
>>  *            if (READ_ONCE(X))
>>  *              break;
>>  *          }
>>  * X=1
>>  *          <sched-out>
>>  *                      <sched-in>
>>  *                      r = X;
>>  *
>>  * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
>>  * we get migrated and CPU2 sees X==0.
>>
>> which is used at,
>>
>> __schedule(bool preempt) {
>>     ...
>>     rq_lock(rq, &rf);
>>     smp_mb__after_spinlock();
>>     ...
>> }
>> .
>>
>> If I didn't miss something, I found this kind of visibility is __not__
>> necessarily
>> depends on the spinning-lock at __schedule being RCsc.
>>
>> In fact, as for runnable task A, the migration would be,
>>
>>  CPU0         CPU1            CPU2
>>
>> <ACCESS before schedule out A>
>>
>> lock(rq0)
>> schedule out A
>> unock(rq0)
>>
>>               lock(rq0)
>>               remove A from rq0
>>               unlock(rq0)
>>
>>               lock(rq2)
>>               add A into rq2
>>               unlock(rq2)
>>                                         lock(rq2)
>>                                         schedule in A
>>                                         unlock(rq2)
>>
>>                                         <ACCESS after schedule in A>
>>
>> <ACCESS before schedule out A> happens-before
>> unlock(rq0) happends-before
>> lock(rq0) happends-before
>> unlock(rq2) happens-before
>> lock(rq2) happens-before
>> <ACCESS after schedule in A>
>>
>
> But without RCsc lock, you cannot guarantee that a write propagates to
> CPU 0 and CPU 2 at the same time, so the same write may propagate to
> CPU0 before <ACCESS before schedule out A> but propagate to CPU 2 after
> <ACCESS after scheduler in A>. So..
>
> Regards,
> Boqun

Thank you for pointing out this case, Boqun.
But this is just a special case of what acquire-release chains promise us.

A=B=0 as initial

  CPU0                CPU1                CPU2                CPU3
 write A=1
                           read A=1
                           write B=1
                           release X
                                                 acquire X
                                                 read A=?
                                                 release Y
                                                                    acquire Y
                                                                    read B=?

assurance 1: CPU3 will surely see the write B=1 by CPU1, and
assurance 2: CPU2 will also see the write A=1 by CPU0, as a special case

The second assurance holds both in theory and on real hardware.

As for theory, the C++11 memory model, which is a potential formal model
for the kernel memory model as
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
describes, states that:

If a value computation A of an atomic object M happens before a value
computation B of M, and A takes its value from a side effect X on M, then
the value computed by B shall either be the value stored by X or the value
stored by a side effect Y on M, where Y follows X in the modification
order of M.

at
§1.10 rule 18, on page 14
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf

As for real hardware, Luc provided detailed tests and explanations for
ARM and POWER in section 5.1, "Cumulative Barriers for WRC", on page 19
in this paper:

A Tutorial Introduction to the ARM and POWER Relaxed Memory Models
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf

So, I think we may remove the RCsc requirement from smp_mb__after_spinlock(),
which is really confusing.

Best Regards,
Trol

>
>> And for stopped tasks,
>>
>>  CPU0         CPU1            CPU2
>>
>> <ACCESS before schedule out A>
>>
>> lock(rq0)
>> schedule out A
>> remove A from rq0
>> store-release(A->on_cpu)
>> unock(rq0)
>>
>>               load_acquire(A->on_cpu)
>>               set_task_cpu(A, 2)
>>
>>               lock(rq2)
>>               add A into rq2
>>               unlock(rq2)
>>
>>                                         lock(rq2)
>>                                         schedule in A
>>                                         unlock(rq2)
>>
>>                                         <ACCESS after schedule in A>
>>
>> <ACCESS before schedule out A> happens-before
>> store-release(A->on_cpu)  happens-before
>> load_acquire(A->on_cpu)  happens-before
>> unlock(rq2) happens-before
>> lock(rq2) happens-before
>> <ACCESS after schedule in A>
>>
>> So, I think the only requirement to smp_mb__after_spinlock is
>> to guarantee a STORE before the spin_lock() is ordered
>> against a LOAD after it. So we could remove the RCsc requirement
>> to allow more efficient implementation.
>>
>> Did I miss something or this RCsc requirement does not really matter?


* Re: smp_mb__after_spinlock requirement too strong?
From: Boqun Feng @ 2018-03-12  8:56 UTC (permalink / raw)
  To: 焦晓冬
  Cc: linux-kernel, peterz, Alan Stern, will.deacon, torvalds, npiggin,
	mingo, mpe, oleg, benh, Paul McKenney

On Mon, Mar 12, 2018 at 04:18:00PM +0800, 焦晓冬 wrote:
> >> Peter pointed out in this patch https://patchwork.kernel.org/patch/9771921/
> >> that the spinning-lock used at __schedule() should be RCsc to ensure
> >> visibility of writes prior to __schedule when the task is to be migrated to
> >> another CPU.
> >>
> >> And this is emphasized at the comment of the newly introduced
> >> smp_mb__after_spinlock(),
> >>
> >>  * This barrier must provide two things:
> >>  *
> >>  *   - it must guarantee a STORE before the spin_lock() is ordered against a
> >>  *     LOAD after it, see the comments at its two usage sites.
> >>  *
> >>  *   - it must ensure the critical section is RCsc.
> >>  *
> >>  * The latter is important for cases where we observe values written by other
> >>  * CPUs in spin-loops, without barriers, while being subject to scheduling.
> >>  *
> >>  * CPU0         CPU1            CPU2
> >>  *
> >>  *          for (;;) {
> >>  *            if (READ_ONCE(X))
> >>  *              break;
> >>  *          }
> >>  * X=1
> >>  *          <sched-out>
> >>  *                      <sched-in>
> >>  *                      r = X;
> >>  *
> >>  * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> >>  * we get migrated and CPU2 sees X==0.
> >>
> >> which is used at,
> >>
> >> __schedule(bool preempt) {
> >>     ...
> >>     rq_lock(rq, &rf);
> >>     smp_mb__after_spinlock();
> >>     ...
> >> }
> >> .
> >>
> >> If I didn't miss something, I found this kind of visibility is __not__
> >> necessarily
> >> depends on the spinning-lock at __schedule being RCsc.
> >>
> >> In fact, as for runnable task A, the migration would be,
> >>
> >>  CPU0         CPU1            CPU2
> >>
> >> <ACCESS before schedule out A>
> >>
> >> lock(rq0)
> >> schedule out A
> >> unock(rq0)
> >>
> >>               lock(rq0)
> >>               remove A from rq0
> >>               unlock(rq0)
> >>
> >>               lock(rq2)
> >>               add A into rq2
> >>               unlock(rq2)
> >>                                         lock(rq2)
> >>                                         schedule in A
> >>                                         unlock(rq2)
> >>
> >>                                         <ACCESS after schedule in A>
> >>
> >> <ACCESS before schedule out A> happens-before
> >> unlock(rq0) happends-before
> >> lock(rq0) happends-before
> >> unlock(rq2) happens-before
> >> lock(rq2) happens-before
> >> <ACCESS after schedule in A>
> >>
> >
> > But without RCsc lock, you cannot guarantee that a write propagates to
> > CPU 0 and CPU 2 at the same time, so the same write may propagate to
> > CPU0 before <ACCESS before schedule out A> but propagate to CPU 2 after
> > <ACCESS after scheduler in A>. So..
> >
> > Regards,
> > Boqun
> 
> Thank you for pointing out this case, Boqun.
> But this is just one special case that acquire-release chains promise us.
> 

Ah.. right, because of A-Cumulative.

> A=B=0 as initial
> 
>   CPU0                CPU1                CPU2                CPU3
>  write A=1
>                            read A=1
>                            write B=1
>                            release X
>                                                  acquire X
>                                                  read A=?
>                                                  release Y
> 
>     acquire Y
> 
>     read B=?
> 
> assurance 1: CPU3 will surely see B=1 writing by CPU1, and
> assurance 2: CPU2 will also see A=1 writing by CPU0 as a special case
> 
> The second assurance is both in theory and implemented by real hardware.
> 
> As for theory, the C++11 memory model, which is a potential formal model
> for kernel memory model as
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
> descripes, states that:
> 
> If a value computation A of an atomic object M happens before a value
> computation B of M, and A takes its value from a side effect X on M, then
> the value computed by B shall either be the value stored by X or the value
> stored by a side effect Y on M, where Y follows X in the modification
> order of M.
> 
> at
> $1.10 rule 18, on page 14
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
> 
> As for real hardware, Luc provided detailed test and explanation on
> ARM and POWER in 5.1 Cumulative Barriers for WRC  on page 19
> in this paper:
> 
> A Tutorial Introduction to the ARM and POWER Relaxed Memory Models
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> 
> So, I think we may remove RCsc from smp_mb__after_spinlock which is
> really confusing.
> 

So I think the purpose of smp_mb__after_spinlock() is to provide RCsc
locks; it's just that the comment above it may be misleading. We want
RCsc locks in the scheduler code because we want writes in different critical
sections to be ordered even outside the critical sections, for cases like:

	CPU 0		CPU 1		CPU 2

	{A = 0, B = 0}
	lock(rq0);
	write A=1;
	unlock(rq0);

			lock(rq0);
			read A=1;
			write B=2;
			unlock(rq0);

					read B=2;
					smp_rmb();
					read A=1;
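
In litmus form, the case above is roughly the following (the lock and
variable names are mine, and whether the r1==0 outcome is forbidden should
be checked with herd7 against the LKMM rather than read off this sketch):

C rcsc0

{}

P0(int *a, spinlock_t *rq0)
{
	spin_lock(rq0);
	WRITE_ONCE(*a, 1);
	spin_unlock(rq0);
}

P1(int *a, int *b, spinlock_t *rq0)
{
	int r0;

	spin_lock(rq0);
	r0 = READ_ONCE(*a);
	WRITE_ONCE(*b, 2);
	spin_unlock(rq0);
}

P2(int *a, int *b)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*b);
	smp_rmb();
	r1 = READ_ONCE(*a);
}

exists (1:r0=1 /\ 2:r0=2 /\ 2:r1=0)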

I think we need to fix the comments rather than loosen the requirement.
Peter?

Regards,
Boqun

> Best Regards,
> Trol
> 
> >
> >> And for stopped tasks,
> >>
> >>  CPU0         CPU1            CPU2
> >>
> >> <ACCESS before schedule out A>
> >>
> >> lock(rq0)
> >> schedule out A
> >> remove A from rq0
> >> store-release(A->on_cpu)
> >> unock(rq0)
> >>
> >>               load_acquire(A->on_cpu)
> >>               set_task_cpu(A, 2)
> >>
> >>               lock(rq2)
> >>               add A into rq2
> >>               unlock(rq2)
> >>
> >>                                         lock(rq2)
> >>                                         schedule in A
> >>                                         unlock(rq2)
> >>
> >>                                         <ACCESS after schedule in A>
> >>
> >> <ACCESS before schedule out A> happens-before
> >> store-release(A->on_cpu)  happens-before
> >> load_acquire(A->on_cpu)  happens-before
> >> unlock(rq2) happens-before
> >> lock(rq2) happens-before
> >> <ACCESS after schedule in A>
> >>
> >> So, I think the only requirement to smp_mb__after_spinlock is
> >> to guarantee a STORE before the spin_lock() is ordered
> >> against a LOAD after it. So we could remove the RCsc requirement
> >> to allow more efficient implementation.
> >>
> >> Did I miss something or this RCsc requirement does not really matter?



* Re: smp_mb__after_spinlock requirement too strong?
From: Peter Zijlstra @ 2018-03-12  8:56 UTC (permalink / raw)
  To: Boqun Feng
  Cc: 焦晓冬,
	linux-kernel, Alan Stern, will.deacon, torvalds, npiggin, mingo,
	mpe, oleg, benh, Paul McKenney

On Mon, Mar 12, 2018 at 04:56:00PM +0800, Boqun Feng wrote:
> So I think the purpose of smp_mb__after_spinlock() is to provide RCsc
> locks, it's just the comments before that may be misleading. We want
> RCsc locks in schedule code because we want writes in different critical
> section are ordered even outside the critical sections, for case like:
> 
> 	CPU 0		CPU 1		CPU 2
> 
> 	{A =0 , B = 0}
> 	lock(rq0);
> 	write A=1;
> 	unlock(rq0);
> 
> 			lock(rq0);
> 			read A=1;
> 			write B=2;
> 			unlock(rq0);
> 
> 					read B=2;
> 					smp_rmb();
> 					read A=1;
> 
> I think we need to fix the comments rather than loose the requirement.
> Peter?

Yes, ISTR people relying on schedule() being RCsc, and I just picked a
bad example.


* Re: smp_mb__after_spinlock requirement too strong?
From: 焦晓冬 @ 2018-03-12  9:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, linux-kernel, Alan Stern, will.deacon, torvalds,
	npiggin, mingo, mpe, oleg, benh, Paul McKenney

On Mon, Mar 12, 2018 at 4:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 12, 2018 at 04:56:00PM +0800, Boqun Feng wrote:
>> So I think the purpose of smp_mb__after_spinlock() is to provide RCsc
>> locks, it's just the comments before that may be misleading. We want
>> RCsc locks in schedule code because we want writes in different critical
>> section are ordered even outside the critical sections, for case like:
>>
>>       CPU 0           CPU 1           CPU 2
>>
>>       {A =0 , B = 0}
>>       lock(rq0);
>>       write A=1;
>>       unlock(rq0);
>>
>>                       lock(rq0);
>>                       read A=1;
>>                       write B=2;
>>                       unlock(rq0);
>>
>>                                       read B=2;
>>                                       smp_rmb();
>>                                       read A=1;
>>
>> I think we need to fix the comments rather than loose the requirement.
>> Peter?
>
> Yes, ISTR people relying on schedule() being RCsc, and I just picked a
> bad exmaple.

Hi, Peter,
If the fixed comment could point out where this RCsc is relied upon, that would be great.


* Re: smp_mb__after_spinlock requirement too strong?
From: Andrea Parri @ 2018-03-12 13:24 UTC (permalink / raw)
  To: 焦晓冬
  Cc: Boqun Feng, linux-kernel, peterz, Alan Stern, will.deacon,
	torvalds, npiggin, mingo, mpe, oleg, benh, Paul McKenney

Hi Trol,

[...]


> But this is just one special case that acquire-release chains promise us.
> 
> A=B=0 as initial
> 
>   CPU0                CPU1                CPU2                CPU3
>  write A=1
>                            read A=1
>                            write B=1
>                            release X
>                                                  acquire X
>                                                  read A=?
>                                                  release Y
> 
>     acquire Y
> 
>     read B=?
> 
> assurance 1: CPU3 will surely see B=1 writing by CPU1, and
> assurance 2: CPU2 will also see A=1 writing by CPU0 as a special case
> 
> The second assurance is both in theory and implemented by real hardware.
> 
> As for theory, the C++11 memory model, which is a potential formal model
> for kernel memory model as
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
> descripes, states that:
> 
> If a value computation A of an atomic object M happens before a value
> computation B of M, and A takes its value from a side effect X on M, then
> the value computed by B shall either be the value stored by X or the value
> stored by a side effect Y on M, where Y follows X in the modification
> order of M.

A formal memory consistency model for the Linux kernel is now available at:

 git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git lkmm

Commit

  1c27b644c0fdbc61e113b8faee14baeb8df32486
  ("Automate memory-barriers.txt; provide Linux-kernel memory model")

provides some information (and references) on the development of this work.

---

You can check the above observation against this model: unless I mis-typed
your snippet,

andrea@andrea:~/linux-rcu/tools/memory-model$ cat trol0.litmus
C trol0

{}

P0(int *a)
{
	WRITE_ONCE(*a, 1);
}

P1(int *a, int *b, int *x)
{
	int r0;

	r0 = READ_ONCE(*a);
	WRITE_ONCE(*b, 1);
	smp_store_release(x, 1);
}

P2(int *a, int *x, int *y)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(x);
	r1 = READ_ONCE(*a);
	smp_store_release(y, 1);
}

P3(int *b, int *y)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(y);
	r1 = READ_ONCE(*b);
}

exists (1:r0=1 /\ 2:r0=1 /\ 3:r0=1 /\ (2:r1=0 \/ 3:r1=0))

andrea@andrea:~/linux-rcu/tools/memory-model$ herd7 -conf linux-kernel.cfg trol0.litmus
Test trol0 Allowed
States 25
1:r0=0; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=0;
1:r0=0; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=1;
1:r0=0; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=0;
1:r0=0; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=1;
1:r0=0; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=0;
1:r0=0; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=1;
1:r0=0; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=0;
1:r0=0; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=1;
1:r0=0; 2:r0=1; 2:r1=0; 3:r0=0; 3:r1=0;
1:r0=0; 2:r0=1; 2:r1=0; 3:r0=0; 3:r1=1;
1:r0=0; 2:r0=1; 2:r1=0; 3:r0=1; 3:r1=1;
1:r0=0; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=0;
1:r0=0; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=1;
1:r0=0; 2:r0=1; 2:r1=1; 3:r0=1; 3:r1=1;
1:r0=1; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=0;
1:r0=1; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=1;
1:r0=1; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=0;
1:r0=1; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=1;
1:r0=1; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=0;
1:r0=1; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=1;
1:r0=1; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=0;
1:r0=1; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=1;
1:r0=1; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=0;
1:r0=1; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=1;
1:r0=1; 2:r0=1; 2:r1=1; 3:r0=1; 3:r1=1;
No
Witnesses
Positive: 0 Negative: 25
Condition exists (1:r0=1 /\ 2:r0=1 /\ 3:r0=1 /\ (2:r1=0 \/ 3:r1=0))
Observation trol0 Never 0 25
Time trol0 0.03
Hash=21369772c98e442dd382bd84b43067ee

Please see "tools/memory-model/README" or "tools/memory-model/Documentation/"
for further information about these tools/model.

Best,
  Andrea


> 
> at
> $1.10 rule 18, on page 14
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
> 
> As for real hardware, Luc provided detailed test and explanation on
> ARM and POWER in 5.1 Cumulative Barriers for WRC  on page 19
> in this paper:
> 
> A Tutorial Introduction to the ARM and POWER Relaxed Memory Models
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> 
> So, I think we may remove RCsc from smp_mb__after_spinlock which is
> really confusing.
> 
> Best Regards,
> Trol
> 
> >
> >> And for stopped tasks,
> >>
> >>  CPU0         CPU1            CPU2
> >>
> >> <ACCESS before schedule out A>
> >>
> >> lock(rq0)
> >> schedule out A
> >> remove A from rq0
> >> store-release(A->on_cpu)
> >> unock(rq0)
> >>
> >>               load_acquire(A->on_cpu)
> >>               set_task_cpu(A, 2)
> >>
> >>               lock(rq2)
> >>               add A into rq2
> >>               unlock(rq2)
> >>
> >>                                         lock(rq2)
> >>                                         schedule in A
> >>                                         unlock(rq2)
> >>
> >>                                         <ACCESS after schedule in A>
> >>
> >> <ACCESS before schedule out A> happens-before
> >> store-release(A->on_cpu)  happens-before
> >> load_acquire(A->on_cpu)  happens-before
> >> unlock(rq2) happens-before
> >> lock(rq2) happens-before
> >> <ACCESS after schedule in A>
> >>
> >> So, I think the only requirement to smp_mb__after_spinlock is
> >> to guarantee a STORE before the spin_lock() is ordered
> >> against a LOAD after it. So we could remove the RCsc requirement
> >> to allow more efficient implementation.
> >>
> >> Did I miss something or this RCsc requirement does not really matter?


* Re: smp_mb__after_spinlock requirement too strong?
From: Peter Zijlstra @ 2018-03-12 13:31 UTC (permalink / raw)
  To: 焦晓冬
  Cc: Boqun Feng, linux-kernel, Alan Stern, will.deacon, torvalds,
	npiggin, mingo, mpe, oleg, benh, Paul McKenney

On Mon, Mar 12, 2018 at 05:13:03PM +0800, 焦晓冬 wrote:

> If the fixed comment could point out where this RCsc is used, it will be great.

The one that comes to mind first is RCU-sched, which relies on a context
switch implying a full smp_mb(). But I think there's more..


* Re: smp_mb__after_spinlock requirement too strong?
From: 焦晓冬 @ 2018-03-12 14:10 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Boqun Feng, linux-kernel, peterz, Alan Stern, will.deacon,
	torvalds, npiggin, mingo, mpe, oleg, benh, Paul McKenney

On Mon, Mar 12, 2018 at 9:24 PM, Andrea Parri <parri.andrea@gmail.com> wrote:
> Hi Trol,
>
> [...]
>
>
>> But this is just one special case that acquire-release chains promise us.
>>
>> A=B=0 as initial
>>
>>   CPU0                CPU1                CPU2                CPU3
>>  write A=1
>>                            read A=1
>>                            write B=1
>>                            release X
>>                                                  acquire X
>>                                                  read A=?
>>                                                  release Y
>>
>>     acquire Y
>>
>>     read B=?
>>
>> assurance 1: CPU3 will surely see B=1 writing by CPU1, and
>> assurance 2: CPU2 will also see A=1 writing by CPU0 as a special case
>>
>> The second assurance is both in theory and implemented by real hardware.
>>
>> As for theory, the C++11 memory model, which is a potential formal model
>> for kernel memory model as
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
>> descripes, states that:
>>
>> If a value computation A of an atomic object M happens before a value
>> computation B of M, and A takes its value from a side effect X on M, then
>> the value computed by B shall either be the value stored by X or the value
>> stored by a side effect Y on M, where Y follows X in the modification
>> order of M.
>
> A formal memory consistency model for the Linux kernel is now available at:
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git lkmm
>
> Commit
>
>   1c27b644c0fdbc61e113b8faee14baeb8df32486
>   ("Automate memory-barriers.txt; provide Linux-kernel memory model")
>
> provides some information (and references) on the development of this work.
>
> ---
>
> You can check the above observation against this model: unless I mis-typed
> your snippet,
>
> andrea@andrea:~/linux-rcu/tools/memory-model$ cat trol0.litmus
> C trol0
>
> {}
>
> P0(int *a)
> {
>         WRITE_ONCE(*a, 1);
> }
>
> P1(int *a, int *b, int *x)
> {
>         int r0;
>
>         r0 = READ_ONCE(*a);
>         WRITE_ONCE(*b, 1);
>         smp_store_release(x, 1);
> }
>
> P2(int *a, int *x, int *y)
> {
>         int r0;
>         int r1;
>
>         r0 = smp_load_acquire(x);
>         r1 = READ_ONCE(*a);
>         smp_store_release(y, 1);
> }
>
> P3(int *b, int *y)
> {
>         int r0;
>         int r1;
>
>         r0 = smp_load_acquire(y);
>         r1 = READ_ONCE(*b);
> }
>
> exists (1:r0=1 /\ 2:r0=1 /\ 3:r0=1 /\ (2:r1=0 \/ 3:r1=0))
>
> andrea@andrea:~/linux-rcu/tools/memory-model$ herd7 -conf linux-kernel.cfg trol0.litmus
> Test trol0 Allowed
> States 25
> 1:r0=0; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=0;
> 1:r0=0; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=1;
> 1:r0=0; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=0;
> 1:r0=0; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=1;
> 1:r0=0; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=0;
> 1:r0=0; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=1;
> 1:r0=0; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=0;
> 1:r0=0; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=1;
> 1:r0=0; 2:r0=1; 2:r1=0; 3:r0=0; 3:r1=0;
> 1:r0=0; 2:r0=1; 2:r1=0; 3:r0=0; 3:r1=1;
> 1:r0=0; 2:r0=1; 2:r1=0; 3:r0=1; 3:r1=1;
> 1:r0=0; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=0;
> 1:r0=0; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=1;
> 1:r0=0; 2:r0=1; 2:r1=1; 3:r0=1; 3:r1=1;
> 1:r0=1; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=0;
> 1:r0=1; 2:r0=0; 2:r1=0; 3:r0=0; 3:r1=1;
> 1:r0=1; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=0;
> 1:r0=1; 2:r0=0; 2:r1=0; 3:r0=1; 3:r1=1;
> 1:r0=1; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=0;
> 1:r0=1; 2:r0=0; 2:r1=1; 3:r0=0; 3:r1=1;
> 1:r0=1; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=0;
> 1:r0=1; 2:r0=0; 2:r1=1; 3:r0=1; 3:r1=1;
> 1:r0=1; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=0;
> 1:r0=1; 2:r0=1; 2:r1=1; 3:r0=0; 3:r1=1;
> 1:r0=1; 2:r0=1; 2:r1=1; 3:r0=1; 3:r1=1;
> No
> Witnesses
> Positive: 0 Negative: 25
> Condition exists (1:r0=1 /\ 2:r0=1 /\ 3:r0=1 /\ (2:r1=0 \/ 3:r1=0))
> Observation trol0 Never 0 25
> Time trol0 0.03
> Hash=21369772c98e442dd382bd84b43067ee
>
> Please see "tools/memory-model/README" or "tools/memory-model/Documentation/"
> for further information about these tools/model.
>
> Best,
>   Andrea
>

This work is amazingly great, Andrea.
I'd like to study it.

>
>>
>> at
>> $1.10 rule 18, on page 14
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
>>
>> As for real hardware, Luc provided detailed test and explanation on
>> ARM and POWER in 5.1 Cumulative Barriers for WRC  on page 19
>> in this paper:
>>
>> A Tutorial Introduction to the ARM and POWER Relaxed Memory Models
>> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>
>> So, I think we may remove RCsc from smp_mb__after_spinlock which is
>> really confusing.
>>
>> Best Regards,
>> Trol
>>
>> >
>> >> And for stopped tasks,
>> >>
>> >>  CPU0         CPU1            CPU2
>> >>
>> >> <ACCESS before schedule out A>
>> >>
>> >> lock(rq0)
>> >> schedule out A
>> >> remove A from rq0
>> >> store-release(A->on_cpu)
>> >> unock(rq0)
>> >>
>> >>               load_acquire(A->on_cpu)
>> >>               set_task_cpu(A, 2)
>> >>
>> >>               lock(rq2)
>> >>               add A into rq2
>> >>               unlock(rq2)
>> >>
>> >>                                         lock(rq2)
>> >>                                         schedule in A
>> >>                                         unlock(rq2)
>> >>
>> >>                                         <ACCESS after schedule in A>
>> >>
>> >> <ACCESS before schedule out A> happens-before
>> >> store-release(A->on_cpu)  happens-before
>> >> load_acquire(A->on_cpu)  happens-before
>> >> unlock(rq2) happens-before
>> >> lock(rq2) happens-before
>> >> <ACCESS after schedule in A>
>> >>
>> >> So, I think the only requirement to smp_mb__after_spinlock is
>> >> to guarantee a STORE before the spin_lock() is ordered
>> >> against a LOAD after it. So we could remove the RCsc requirement
>> >> to allow more efficient implementation.
>> >>
>> >> Did I miss something or this RCsc requirement does not really matter?

