From: Boqun Feng <boqun.feng@gmail.com>
To: 焦晓冬 <milestonejxd@gmail.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	Alan Stern <stern@rowland.harvard.edu>,
	will.deacon@arm.com, torvalds@linux-foundation.org,
	npiggin@gmail.com, mingo@kernel.org, mpe@ellerman.id.au,
	oleg@redhat.com, benh@kernel.crashing.org,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: smp_mb__after_spinlock requirement too strong?
Date: Mon, 12 Mar 2018 16:56:00 +0800
Message-ID: <20180312085600.aczjkpn73axzs2sb@tardis>
In-Reply-To: <CAJDTihxxhy7zmhTJ-ky4wvMby_o9y8UOcs9R1dABN7NccekAiQ@mail.gmail.com>

On Mon, Mar 12, 2018 at 04:18:00PM +0800, 焦晓冬 wrote:
> >> Peter pointed out in this patch https://patchwork.kernel.org/patch/9771921/
> >> that the spinning-lock used at __schedule() should be RCsc to ensure
> >> visibility of writes prior to __schedule when the task is to be migrated to
> >> another CPU.
> >>
> >> And this is emphasized in the comment on the newly introduced
> >> smp_mb__after_spinlock(),
> >>
> >>  * This barrier must provide two things:
> >>  *
> >>  *   - it must guarantee a STORE before the spin_lock() is ordered against a
> >>  *     LOAD after it, see the comments at its two usage sites.
> >>  *
> >>  *   - it must ensure the critical section is RCsc.
> >>  *
> >>  * The latter is important for cases where we observe values written by other
> >>  * CPUs in spin-loops, without barriers, while being subject to scheduling.
> >>  *
> >>  * CPU0         CPU1            CPU2
> >>  *
> >>  *          for (;;) {
> >>  *            if (READ_ONCE(X))
> >>  *              break;
> >>  *          }
> >>  * X=1
> >>  *          <sched-out>
> >>  *                      <sched-in>
> >>  *                      r = X;
> >>  *
> >>  * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> >>  * we get migrated and CPU2 sees X==0.
> >>
> >> which is used at,
> >>
> >> __schedule(bool preempt) {
> >>     ...
> >>     rq_lock(rq, &rf);
> >>     smp_mb__after_spinlock();
> >>     ...
> >> }
> >> .
> >>
> >> Unless I missed something, this kind of visibility does __not__
> >> necessarily depend on the spinlock at __schedule() being RCsc.
> >>
> >> In fact, as for runnable task A, the migration would be,
> >>
> >>  CPU0         CPU1            CPU2
> >>
> >> <ACCESS before schedule out A>
> >>
> >> lock(rq0)
> >> schedule out A
> >> unlock(rq0)
> >>
> >>               lock(rq0)
> >>               remove A from rq0
> >>               unlock(rq0)
> >>
> >>               lock(rq2)
> >>               add A into rq2
> >>               unlock(rq2)
> >>                                         lock(rq2)
> >>                                         schedule in A
> >>                                         unlock(rq2)
> >>
> >>                                         <ACCESS after schedule in A>
> >>
> >> <ACCESS before schedule out A> happens-before
> >> unlock(rq0) happens-before
> >> lock(rq0) happens-before
> >> unlock(rq2) happens-before
> >> lock(rq2) happens-before
> >> <ACCESS after schedule in A>
> >>
> >
> > But without an RCsc lock, you cannot guarantee that a write propagates to
> > CPU 0 and CPU 2 at the same time, so the same write may propagate to
> > CPU 0 before <ACCESS before schedule out A> but propagate to CPU 2 after
> > <ACCESS after schedule in A>. So..
> >
> > Regards,
> > Boqun
> 
> Thank you for pointing out this case, Boqun.
> But this is just one special case that acquire-release chains promise us.
> 

Ah.. right, because of A-cumulativity.

> A=B=0 as initial
> 
>   CPU0          CPU1          CPU2          CPU3
>   write A=1
>                 read A=1
>                 write B=1
>                 release X
>                               acquire X
>                               read A=?
>                               release Y
>                                             acquire Y
>                                             read B=?
> 
> assurance 1: CPU3 will surely see B=1 written by CPU1, and
> assurance 2: CPU2 will also see A=1 written by CPU0, as a special case.
> 
> The second assurance holds both in theory and on real hardware.
> 
> As for theory, the C++11 memory model, which is a potential formal model
> for the kernel memory model, as
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
> describes, states that:
> 
> If a value computation A of an atomic object M happens before a value
> computation B of M, and A takes its value from a side effect X on M, then
> the value computed by B shall either be the value stored by X or the value
> stored by a side effect Y on M, where Y follows X in the modification
> order of M.
> 
> at
> $1.10 rule 18, on page 14
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
> 
> As for real hardware, Luc provided detailed tests and explanations for
> ARM and POWER in section 5.1, "Cumulative Barriers for WRC", on page 19
> of this paper:
> 
> A Tutorial Introduction to the ARM and POWER Relaxed Memory Models
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> 
> So, I think we may remove RCsc from smp_mb__after_spinlock which is
> really confusing.
> 
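
For reference, the four-CPU chain quoted above can be sketched in userspace with C11 atomics and pthreads. This is only an illustration under the C11 model, not the LKMM: the spin loops stand in for "read A=1" and the two acquires actually observing the earlier writes, and every name here (chain_cpu0() .. chain_cpu3(), run_chain(), struct chain_result) is made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Shared variables, named after the diagram. All start at 0. */
static atomic_int A, B, X, Y;

struct chain_result {
	int r_a;	/* value of A read by "CPU2" */
	int r_b;	/* value of B read by "CPU3" */
};

static struct chain_result chain_seen;

static void *chain_cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&A, 1, memory_order_relaxed);	/* write A=1 */
	return NULL;
}

static void *chain_cpu1(void *unused)
{
	(void)unused;
	/* read A=1: spin until the write from "CPU0" is visible */
	while (atomic_load_explicit(&A, memory_order_relaxed) != 1)
		sched_yield();
	atomic_store_explicit(&B, 1, memory_order_relaxed);	/* write B=1 */
	atomic_store_explicit(&X, 1, memory_order_release);	/* release X */
	return NULL;
}

static void *chain_cpu2(void *unused)
{
	(void)unused;
	/* acquire X: spin until the release from "CPU1" is observed */
	while (atomic_load_explicit(&X, memory_order_acquire) != 1)
		sched_yield();
	chain_seen.r_a = atomic_load_explicit(&A, memory_order_relaxed);
	atomic_store_explicit(&Y, 1, memory_order_release);	/* release Y */
	return NULL;
}

static void *chain_cpu3(void *unused)
{
	(void)unused;
	/* acquire Y: spin until the release from "CPU2" is observed */
	while (atomic_load_explicit(&Y, memory_order_acquire) != 1)
		sched_yield();
	chain_seen.r_b = atomic_load_explicit(&B, memory_order_relaxed);
	return NULL;
}

/* Run the four threads once and return what "CPU2" and "CPU3" read. */
struct chain_result run_chain(void)
{
	void *(*fn[4])(void *) = {
		chain_cpu0, chain_cpu1, chain_cpu2, chain_cpu3
	};
	pthread_t tid[4];
	int i, rc = 0;

	atomic_store(&A, 0);
	atomic_store(&B, 0);
	atomic_store(&X, 0);
	atomic_store(&Y, 0);
	chain_seen.r_a = chain_seen.r_b = -1;

	for (i = 0; i < 4; i++) {
		rc = pthread_create(&tid[i], NULL, fn[i], NULL);
		assert(rc == 0);
	}
	for (i = 0; i < 4; i++)
		pthread_join(tid[i], NULL);
	(void)rc;
	return chain_seen;
}
```

Under C11, both reads should come back as 1: B=1 through the plain release/acquire chain (assurance 1), and A=1 because CPU1's read of A happens before CPU2's read of A, which is the coherence rule quoted just above (assurance 2). Build with -pthread.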

So I think the purpose of smp_mb__after_spinlock() is to provide RCsc
locks; it's just that the comment above it may be misleading. We want
RCsc locks in the scheduler code because we want writes in different
critical sections to be ordered even for observers outside the critical
sections, for a case like:

	CPU 0		CPU 1		CPU 2

	{A = 0, B = 0}
	lock(rq0);
	write A=1;
	unlock(rq0);

			lock(rq0);
			read A=1;
			write B=2;
			unlock(rq0);

					read B=2;
					smp_rmb();
					read A=1;
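
A rough userspace rendering of the case above, assuming a pthread mutex as a stand-in for the rq0 lock and an acquire fence as an approximation of smp_rmb(). All names (rq_cpu0() .. rq_cpu2(), rq_run_once(), rq_needs_rcsc()) are hypothetical. The harness only records what CPU 2 saw; it does not prove anything about a given lock implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t rq0 = PTHREAD_MUTEX_INITIALIZER;	/* stand-in for rq0 */
static atomic_int A, B;

struct rq_observed {
	int r_b;	/* first read on "CPU 2" */
	int r_a;	/* second read on "CPU 2", after the fence */
};

static struct rq_observed rq_seen;

static void *rq_cpu0(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&rq0);
	atomic_store_explicit(&A, 1, memory_order_relaxed);	/* write A=1 */
	pthread_mutex_unlock(&rq0);
	return NULL;
}

static void *rq_cpu1(void *unused)
{
	int done = 0;

	(void)unused;
	/* Retry until the critical section runs after the one on "CPU 0". */
	while (!done) {
		pthread_mutex_lock(&rq0);
		if (atomic_load_explicit(&A, memory_order_relaxed) == 1) {
			atomic_store_explicit(&B, 2, memory_order_relaxed);
			done = 1;
		}
		pthread_mutex_unlock(&rq0);
	}
	return NULL;
}

static void *rq_cpu2(void *unused)
{
	(void)unused;
	/* No lock is taken here: this is the outside observer. */
	rq_seen.r_b = atomic_load_explicit(&B, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* approximates smp_rmb() */
	rq_seen.r_a = atomic_load_explicit(&A, memory_order_relaxed);
	return NULL;
}

/* Returns 1 for the outcome that only an RCsc lock is expected to forbid. */
int rq_needs_rcsc(struct rq_observed o)
{
	return o.r_b == 2 && o.r_a == 0;
}

/* Run the three threads once and return what "CPU 2" observed. */
struct rq_observed rq_run_once(void)
{
	void *(*fn[3])(void *) = { rq_cpu0, rq_cpu1, rq_cpu2 };
	pthread_t tid[3];
	int i, rc = 0;

	atomic_store(&A, 0);
	atomic_store(&B, 0);
	rq_seen.r_b = rq_seen.r_a = -1;

	for (i = 0; i < 3; i++) {
		rc = pthread_create(&tid[i], NULL, fn[i], NULL);
		assert(rc == 0);
	}
	for (i = 0; i < 3; i++)
		pthread_join(tid[i], NULL);
	(void)rc;
	return rq_seen;
}
```

The outcome of interest is r_b == 2 together with r_a == 0. As far as I can tell, C11 and POSIX only promise acquire/release semantics for the mutex, so the language alone does not rule that outcome out for the unlocked observer; not seeing it on a strongly ordered machine says little about weaker ones. Build with -pthread.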

I think we need to fix the comment rather than loosen the requirement.
Peter?

Regards,
Boqun

> Best Regards,
> Trol
> 
> >
> >> And for stopped tasks,
> >>
> >>  CPU0         CPU1            CPU2
> >>
> >> <ACCESS before schedule out A>
> >>
> >> lock(rq0)
> >> schedule out A
> >> remove A from rq0
> >> store-release(A->on_cpu)
> >> unlock(rq0)
> >>
> >>               load_acquire(A->on_cpu)
> >>               set_task_cpu(A, 2)
> >>
> >>               lock(rq2)
> >>               add A into rq2
> >>               unlock(rq2)
> >>
> >>                                         lock(rq2)
> >>                                         schedule in A
> >>                                         unlock(rq2)
> >>
> >>                                         <ACCESS after schedule in A>
> >>
> >> <ACCESS before schedule out A> happens-before
> >> store-release(A->on_cpu)  happens-before
> >> load_acquire(A->on_cpu)  happens-before
> >> unlock(rq2) happens-before
> >> lock(rq2) happens-before
> >> <ACCESS after schedule in A>
> >>
> >> So, I think the only requirement on smp_mb__after_spinlock() is
> >> to guarantee that a STORE before the spin_lock() is ordered
> >> against a LOAD after it. So we could remove the RCsc requirement
> >> to allow a more efficient implementation.
> >>
> >> Did I miss something, or does this RCsc requirement not really matter?

