All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: paulmck@kernel.org
Cc: Frederic Weisbecker <frederic@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	linux-kernel@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier
Date: Fri, 23 Dec 2022 15:10:40 -0500	[thread overview]
Message-ID: <CAEXW_YQ5ZfpFmMcsiyenK9XePz3jLiDYYUMdpuxbyNbnaH2Xhg@mail.gmail.com> (raw)
In-Reply-To: <20221223181514.GZ4001@paulmck-ThinkPad-P17-Gen-1>

On Fri, Dec 23, 2022 at 1:15 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Dec 23, 2022 at 11:12:06AM -0500, Joel Fernandes wrote:
> > On Thu, Dec 22, 2022 at 11:43 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > [...]
> > > > > >>>> On Dec 22, 2022, at 11:43 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >>>
> > > > > >>> On Thu, Dec 22, 2022 at 01:40:10PM +0100, Frederic Weisbecker wrote:
> > > > > >>>>> On Wed, Dec 21, 2022 at 12:11:42PM -0500, Mathieu Desnoyers wrote:
> > > > > >>>>> On 2022-12-21 06:59, Frederic Weisbecker wrote:
> > > > > >>>>>>> On Tue, Dec 20, 2022 at 10:34:19PM -0500, Mathieu Desnoyers wrote:
> > > > > >>>>> [...]
> > > > > >>>>>>>
> > > > > >>>>>>> The memory ordering constraint I am concerned about here is:
> > > > > >>>>>>>
> > > > > >>>>>>> * [...] In addition,
> > > > > >>>>>>> * each CPU having an SRCU read-side critical section that extends beyond
> > > > > >>>>>>> * the return from synchronize_srcu() is guaranteed to have executed a
> > > > > >>>>>>> * full memory barrier after the beginning of synchronize_srcu() and before
> > > > > >>>>>>> * the beginning of that SRCU read-side critical section. [...]
> > > > > >>>>>>>
> > > > > >>>>>>> So if we have a SRCU read-side critical section that begins after the beginning
> > > > > >>>>>>> of synchronize_srcu, but before its first memory barrier, it would miss the
> > > > > >>>>>>> guarantee that the full memory barrier is issued before the beginning of that
> > > > > >>>>>>> SRCU read-side critical section. IOW, that memory barrier needs to be at the
> > > > > >>>>>>> very beginning of the grace period.
> > > > > >>>>>>
> > > > > >>>>>> I'm confused, what's wrong with this ?
> > > > > >>>>>>
> > > > > >>>>>> UPDATER                  READER
> > > > > >>>>>> -------                  ------
> > > > > >>>>>> STORE X = 1              STORE srcu_read_lock++
> > > > > >>>>>> // rcu_seq_snap()        smp_mb()
> > > > > >>>>>> smp_mb()                 READ X
> > > > > >>>>>> // scans
> > > > > >>>>>> READ srcu_read_lock
> > > > > >>>>>
> > > > > >>>>> What you refer to here is only memory ordering of the store to X and load
> > > > > >>>>> from X wrt loading/increment of srcu_read_lock, which is internal to the
> > > > > >>>>> srcu implementation. If we really want to model the provided high-level
> > > > > >>>>> memory ordering guarantees, we should consider a scenario where SRCU is used
> > > > > >>>>> for its memory ordering properties to synchronize other variables.
> > > > > >>>>>
> > > > > >>>>> I'm concerned about the following Dekker scenario, where synchronize_srcu()
> > > > > >>>>> and srcu_read_lock/unlock would be used instead of memory barriers:
> > > > > >>>>>
> > > > > >>>>> Initial state: X = 0, Y = 0
> > > > > >>>>>
> > > > > >>>>> Thread A                   Thread B
> > > > > >>>>> ---------------------------------------------
> > > > > >>>>> STORE X = 1                STORE Y = 1
> > > > > >>>>> synchronize_srcu()
> > > > > >>>>>                          srcu_read_lock()
> > > > > >>>>>                          r1 = LOAD X
> > > > > >>>>>                          srcu_read_unlock()
> > > > > >>>>> r0 = LOAD Y
> > > > > >>>>>
> > > > > >>>>> BUG_ON(!r0 && !r1)
> > > > > >>>>>
> > > > > >>>>> So in the synchronize_srcu implementation, there appears to be two
> > > > > >>>>> major scenarios: either srcu_gp_start_if_needed starts a gp or expedited gp,
> > > > > >>>>> or it uses an already started gp/expedited gp. When snapshotting with
> > > > > >>>>> rcu_seq_snap, the fact that the memory barrier is after the ssp->srcu_gp_seq
> > > > > >>>>> load means that it does not order prior memory accesses before that load.
> > > > > >>>>> This sequence value is then used to identify which gp_seq to wait for when
> > > > > >>>>> piggy-backing on another already-started gp. I worry about reordering
> > > > > >>>>> between STORE X = 1 and load of ssp->srcu_gp_seq, which is then used to
> > > > > >>>>> piggy-back on an already-started gp.
> > > > > >>>>>
> > > > > >>>>> I suspect that the implicit barrier in srcu_read_lock() invoked at the
> > > > > >>>>> beginning of srcu_gp_start_if_needed() is really the barrier that makes
> > > > > >>>>> all this behave as expected. But without documentation it's rather hard to
> > > > > >>>>> follow.
> > > > > >>>>
> > > > > >>>> Oh ok I see now. It might be working that way by accident or on forgotten
> > > > > >>>> purpose. In any case, we really want to add a comment above that
> > > > > >>>> __srcu_read_lock_nmisafe() call.
> > > > > >>>
> > > > > >>> Another test for the safety (or not) of removing either D or E is
> > > > > >>> to move that WRITE_ONCE() to follow (or, respectively, precede) the
> > > > > >>> adjacent scans.
> > > > > >>
> > > > > >> Good idea, though I believe the MBs that the above talk about are not the flip ones. They are the ones in synchronize_srcu() beginning and end, that order with respect to grace period start and end.
> > > > > >>
> > > > > >> So that (flipping MBs) is unrelated, or did I miss something?
> > > > > >
> > > > > > The thought is to manually similate in the source code the maximum
> > > > > > memory-reference reordering that a maximally hostile compiler and CPU
> > > > > > would be permitted to carry out.  So yes, given that there are other
> > > > > > memory barriers before and after, these other memory barriers limit how
> > > > > > far the flip may be moved in the source code.
> > > > > >
> > > > > > Here I am talking about the memory barriers associated with the flip,
> > > > > > but the same trick can of course be applied to other memory barriers.
> > > > > > In general, remove a given memory barrier and (in the source code)
> > > > > > maximally rearrange the memory references that were previously ordered
> > > > > > by the memory barrier in question.
> > > > > >
> > > > > > Again, the presence of other memory barriers will limit the permitted
> > > > > > maximal source-code rearrangement.
> > > > >
> > > > >
> > > > > Makes sense if the memory barrier is explicit. In this case, the memory barriers are implicit apparently, with a srcu_read_lock() in the beginning of synchronize_rcu() having the implicit / indirect memory barrier. So I am not sure if that can be implemented without breaking SRCU readers.
> > > >
> > > > First, are we talking about the same barrier?  I am talking about E.
>
> Apologies.  I am a bit fixated on E because it is the one you guys
> proposed eliminating.  ;-)

Ah ok, no worries. :-)

> > > No, in the last part you replied to above, Mathieu and Frederic were
> > > talking about the need for memory barriers in synchronize_srcu(). That
> > > has nothing to do with E:
> > > <quote>
> > >  I suspect that the implicit barrier in srcu_read_lock() invoked at the
> > >  beginning of srcu_gp_start_if_needed() is really the barrier that makes
> > >  all this behave as expected.
> > > </quote>
> > >
> > > We need to order code prior to synchronize_srcu() wrt the start of the
> > > grace period, so that readers that started after the grace period
> > > started see those side-effects as they may not be waited on (they are
> > > too late).
> >
> > My thought is this is achieved by the srcu_read_lock() before the
> > grace period is started, and the start of the grace period (which is
> > basically the smp_mb() in the first scan).
> >
> > So from memory ordering PoV, if synchronize_rcu() spans the GP, and
> > readers don't span the GP, that means the reader does not span the
> > synchronize_rcu() which is the GP guarantee.
> >
> > But I could be missing something. I will dig more on my side. Thanks.
>
> Could you please specifically identify that srcu_read_lock()?
>
> Also which version you are looking at.  ;-)

Should be this one in current -rcu:
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/tree/kernel/rcu/srcutree.c#n1669

Thanks,

 - Joel

  reply	other threads:[~2022-12-23 20:10 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-18 19:13 [RFC 0/2] srcu: Remove pre-flip memory barrier Joel Fernandes (Google)
2022-12-18 19:13 ` [RFC 1/2] srcu: Remove comment about prior read lock counts Joel Fernandes (Google)
2022-12-18 21:08   ` Mathieu Desnoyers
2022-12-18 21:19     ` Joel Fernandes
2022-12-18 19:13 ` [RFC 2/2] srcu: Remove memory barrier "E" as it is not required Joel Fernandes (Google)
2022-12-18 21:42   ` Frederic Weisbecker
2022-12-18 23:26     ` Joel Fernandes
2022-12-19  0:30       ` Joel Fernandes
2022-12-18 20:57 ` [RFC 0/2] srcu: Remove pre-flip memory barrier Mathieu Desnoyers
2022-12-18 21:30   ` Joel Fernandes
2022-12-18 23:26     ` Paul E. McKenney
2022-12-18 23:38     ` Mathieu Desnoyers
2022-12-19  0:04       ` Joel Fernandes
2022-12-19  0:24         ` Joel Fernandes
2022-12-19  1:50           ` Mathieu Desnoyers
2022-12-20  0:55             ` Joel Fernandes
2022-12-20  1:04               ` Joel Fernandes
2022-12-20 17:00                 ` Mathieu Desnoyers
2022-12-20 18:05                   ` Joel Fernandes
2022-12-20 18:14                     ` Mathieu Desnoyers
2022-12-20 18:29                       ` Joel Fernandes
2022-12-20 19:01                         ` Mathieu Desnoyers
2022-12-20 19:06                           ` Joel Fernandes
2022-12-20 23:05                             ` Frederic Weisbecker
2022-12-20 23:46                               ` Joel Fernandes
2022-12-21  0:27                                 ` Frederic Weisbecker
2022-12-20 22:57                           ` Frederic Weisbecker
2022-12-21  3:34                             ` Mathieu Desnoyers
2022-12-21 11:59                               ` Frederic Weisbecker
2022-12-21 17:11                                 ` Mathieu Desnoyers
2022-12-22 12:40                                   ` Frederic Weisbecker
2022-12-22 13:19                                     ` Joel Fernandes
2022-12-22 16:43                                     ` Paul E. McKenney
2022-12-22 18:19                                       ` Joel Fernandes
2022-12-22 18:53                                         ` Paul E. McKenney
2022-12-22 18:56                                           ` Joel Fernandes
2022-12-22 19:45                                             ` Paul E. McKenney
2022-12-23  4:43                                               ` Joel Fernandes
2022-12-23 16:12                                                 ` Joel Fernandes
2022-12-23 18:15                                                   ` Paul E. McKenney
2022-12-23 20:10                                                     ` Joel Fernandes [this message]
2022-12-23 20:52                                                       ` Paul E. McKenney
2022-12-20 20:55                         ` Joel Fernandes
2022-12-21  3:52                           ` Mathieu Desnoyers
2022-12-21  5:02                             ` Joel Fernandes
2022-12-21  0:07                   ` Frederic Weisbecker
2022-12-21  3:47                     ` Mathieu Desnoyers
2022-12-20  4:07 ` Joel Fernandes
2022-12-20 12:34   ` Frederic Weisbecker
2022-12-20 12:40     ` Frederic Weisbecker
2022-12-20 13:44       ` Joel Fernandes
2022-12-20 14:07         ` Frederic Weisbecker
2022-12-20 14:20           ` Joel Fernandes
2022-12-20 22:44             ` Frederic Weisbecker
2022-12-21  0:15               ` Joel Fernandes
2022-12-21  0:49                 ` Frederic Weisbecker
2022-12-21  0:58                   ` Frederic Weisbecker
2022-12-21  3:43                     ` Mathieu Desnoyers
2022-12-21  4:26                       ` Joel Fernandes
2022-12-21 14:04                         ` Frederic Weisbecker
2022-12-21 16:30                         ` Mathieu Desnoyers
2022-12-21 12:11                       ` Frederic Weisbecker
2022-12-21 17:20                         ` Mathieu Desnoyers
2022-12-21 18:18                           ` Joel Fernandes
2022-12-21  2:41                   ` Joel Fernandes
2022-12-21 11:26                     ` Frederic Weisbecker
2022-12-21 16:02                       ` Boqun Feng
2022-12-21 17:30                         ` Frederic Weisbecker
2022-12-21 19:33                           ` Joel Fernandes
2022-12-21 19:57                             ` Joel Fernandes
2022-12-21 20:19                           ` Boqun Feng
2022-12-22 12:16                             ` Frederic Weisbecker
2022-12-22 12:24                               ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEXW_YQ5ZfpFmMcsiyenK9XePz3jLiDYYUMdpuxbyNbnaH2Xhg@mail.gmail.com \
    --to=joel@joelfernandes.org \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.