linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: paulmck <paulmck@linux.ibm.com>, rcu <rcu@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	dipankar <dipankar@in.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	rostedt <rostedt@goodmis.org>,
	David Howells <dhowells@redhat.com>,
	Eric Dumazet <edumazet@google.com>, fweisbec <fweisbec@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	amd-gfx <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH RFC tip/core/rcu 0/4] Forbid static SRCU use in modules
Date: Tue, 9 Apr 2019 11:40:12 -0400	[thread overview]
Message-ID: <20190409154012.GC248418@google.com> (raw)
In-Reply-To: <1489474416.1465.1554744287985.JavaMail.zimbra@efficios.com>

On Mon, Apr 08, 2019 at 01:24:47PM -0400, Mathieu Desnoyers wrote:
> ----- On Apr 8, 2019, at 11:46 AM, paulmck paulmck@linux.ibm.com wrote:
> 
> > On Mon, Apr 08, 2019 at 10:49:32AM -0400, Mathieu Desnoyers wrote:
> >> ----- On Apr 8, 2019, at 10:22 AM, paulmck paulmck@linux.ibm.com wrote:
> >> 
> >> > On Mon, Apr 08, 2019 at 09:05:34AM -0400, Mathieu Desnoyers wrote:
> >> >> ----- On Apr 7, 2019, at 10:27 PM, paulmck paulmck@linux.ibm.com wrote:
> >> >> 
> >> >> > On Sun, Apr 07, 2019 at 09:07:18PM +0000, Joel Fernandes wrote:
> >> >> >> On Sun, Apr 07, 2019 at 04:41:36PM -0400, Mathieu Desnoyers wrote:
> >> >> >> > 
> >> >> >> > ----- On Apr 7, 2019, at 3:32 PM, Joel Fernandes, Google joel@joelfernandes.org
> >> >> >> > wrote:
> >> >> >> > 
> >> >> >> > > On Sun, Apr 07, 2019 at 03:26:16PM -0400, Mathieu Desnoyers wrote:
> >> >> >> > >> ----- On Apr 7, 2019, at 9:59 AM, paulmck paulmck@linux.ibm.com wrote:
> >> >> >> > >> 
> >> >> >> > >> > On Sun, Apr 07, 2019 at 06:39:41AM -0700, Paul E. McKenney wrote:
> >> >> >> > >> >> On Sat, Apr 06, 2019 at 07:06:13PM -0400, Joel Fernandes wrote:
> >> >> >> > >> > 
> >> >> >> > >> > [ . . . ]
> >> >> >> > >> > 
> >> >> >> > >> >> > > diff --git a/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > b/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > index f8f6f04c4453..c2d919a1566e 100644
> >> >> >> > >> >> > > --- a/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > +++ b/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > @@ -338,6 +338,10 @@
> >> >> >> > >> >> > >  		KEEP(*(__tracepoints_ptrs)) /* Tracepoints: pointer array */ \
> >> >> >> > >> >> > >  		__stop___tracepoints_ptrs = .;				\
> >> >> >> > >> >> > >  		*(__tracepoints_strings)/* Tracepoints: strings */	\
> >> >> >> > >> >> > > +		. = ALIGN(8);						\
> >> >> >> > >> >> > > +		__start___srcu_struct = .;				\
> >> >> >> > >> >> > > +		*(___srcu_struct_ptrs)					\
> >> >> >> > >> >> > > +		__end___srcu_struct = .;				\
> >> >> >> > >> >> > >  	}								\
> >> >> >> > >> >> > 
> >> >> >> > >> >> > This vmlinux linker modification is not needed. I tested without it and srcu
> >> >> >> > >> >> > torture works fine with rcutorture built as a module. Putting further prints
> >> >> >> > >> >> > in kernel/module.c verified that the kernel is able to find the srcu structs
> >> >> >> > >> >> > just fine. You could squash the below patch into this one or apply it on top
> >> >> >> > >> >> > of the dev branch.
> >> >> >> > >> >> 
> >> >> >> > >> >> Good point, given that otherwise FORTRAN named common blocks would not
> >> >> >> > >> >> work.
> >> >> >> > >> >> 
> >> >> >> > >> >> But isn't one advantage of leaving that stuff in the RO_DATA_SECTION()
> >> >> >> > >> >> macro that it can be mapped read-only?  Or am I suffering from excessive
> >> >> >> > >> >> optimism?
> >> >> >> > >> > 
> >> >> >> > >> > And to answer the other question, in the case where I am suffering from
> >> >> >> > >> > excessive optimism, it should be a separate commit.  Please see below
> >> >> >> > >> > for the updated original commit thus far.
> >> >> >> > >> > 
> >> >> >> > >> > And may I have your Tested-by?
> >> >> >> > >> 
> >> >> >> > >> Just to confirm: does the cleanup performed in the modules going
> >> >> >> > >> notifier end up acting as a barrier first before freeing the memory ?
> >> >> >> > >> If not, is it explicitly stated that a barrier must be issued before
> >> >> >> > >> module unload ?
> >> >> >> > >> 
> >> >> >> > > 
> >> >> >> > > You mean rcu_barrier? It is mentioned in the documentation that this is the
> >> >> >> > > responsibility of the module writer to prevent delays for all modules.
> >> >> >> > 
> >> >> >> > It's a srcu barrier yes. Considering it would be a barrier specific to the
> >> >> >> > srcu domain within that module, I don't see how it would cause delays for
> >> >> >> > "all" modules if we implicitly issue the barrier on module unload. What
> >> >> >> > am I missing ?
> >> >> >> 
> >> >> >> Yes you are right. I thought of this after I just sent my email. I think it
> >> >> >> makes sense for srcu case to do and could avoid a class of bugs.
> >> >> > 
> >> >> > If there are call_srcu() callbacks outstanding, the module writer still
> >> >> > needs the srcu_barrier() because otherwise callbacks arrive after
> >> >> > the module text has gone, which will be disappoint the CPU when it
> >> >> > tries fetching instructions that are no longer mapped.  If there are
> >> >> > no call_srcu() callbacks from that module, then there is no need for
> >> >> > srcu_barrier() either way.
> >> >> > 
> >> >> > So if an srcu_barrier() is needed, the module developer needs to
> >> >> > supply it.
> >> >> 
> >> >> When you say "callbacks arrive after the module text has gone",
> >> >> I think you assume that free_module() is invoked before the
> >> >> MODULE_STATE_GOING notifiers are called. But it's done in the
> >> >> opposite order: going notifiers are called first, and then
> >> >> free_module() is invoked.
> >> >> 
> >> >> So AFAIU it would be safe to issue the srcu_barrier() from the module
> >> >> going notifier.
> >> >> 
> >> >> Or am I missing something ?
> >> > 
> >> > We do seem to be talking past each other.  ;-)
> >> > 
> >> > This has nothing to do with the order of events at module-unload time.
> >> > 
> >> > So please let me try again.
> >> > 
> >> > If a given srcu_struct in a module never has call_srcu() invoked, there
> >> > is no need to invoke rcu_barrier() at any time, whether at module-unload
> >> > time or not.  Adding rcu_barrier() in this case adds overhead and latency
> >> > for no good reason.
> >> 
> >> Not if we invoke srcu_barrier() for that specific domain. If
> >> call_srcu was never invoked for a srcu domain, I don't see why
> >> srcu_barrier() should be more expensive than a simple check that
> >> the domain does not have any srcu work queued.
> > 
> > But that simple check does involve a cache miss for each possible CPU (not
> > just each online CPU), so it is non-trivial, especially on large systems.
> > 
> >> > If a given srcu_struct in a module does have at least one call_srcu()
> >> > invoked, it is already that module's responsibility to make sure that
> >> > the code sticks around long enough for the callback to be invoked.
> >> 
> >> I understand that when users do explicit dynamic allocation/cleanup of
> >> srcu domains, they indeed need to take care of doing explicit srcu_barrier().
> >> However, if they do static definition of srcu domains, it would be nice
> >> if we can handle the barriers under the hood.
> > 
> > All else being equal, of course.  But...
> > 
> >> > This means that correct SRCU users that invoke call_srcu() already
> >> > have srcu_barrier() at module-unload time.  Incorrect SRCU users, with
> >> > reasonable probability, now get a WARN_ON() at module-unload time, with
> >> > the per-CPU state getting leaked.  Before this change, they would (also
> >> > with reasonable probability) instead get an instruction-fetch fault when
> >> > the SRCU callback was invoked after the completion of the module unload.
> >> > Furthermore, in all cases where they would previously have gotten the
> >> > instruction-fetch fault, they now get the WARN_ON(), like this:
> >> > 
> >> >	if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))
> >> >		return; /* Forgot srcu_barrier(), so just leak it! */
> >> > 
> >> > So this change already represents an improvement in usability.
> >> 
> >> Considering that we can do a srcu_barrier() for the specific domain,
> >> and that it should add no noticeable overhead if there is no queued
> >> callbacks, I don't see a good reason for leaving the srcu_barrier
> >> invocation to the user rather than implicitly doing it from the
> >> module going notifier.
> > 
> > Now, I could automatically add an indicator of whether or not a
> > call_srcu() had happened, but then again, that would either add a
> > call_srcu() scalability bottleneck or again require a scan of all possible
> > CPUs...  to figure out if it was necessary to scan all possible CPUs.
> > 
> > Or is scanning all possible CPUs down in the noise in this case?  Or
> > am I missing a trick that would reduce the overhead?
> 
> Module unloading implicitly does a synchronize_rcu (for RCU-sched), and
> a stop_machine. So I would be tempted to say that overhead of iteration
> over all CPUs might not matter that much considering the rest.
> 
> About notifying that a call_srcu has happened for the srcu domain in a
> scalable fashion, let's see... We could have a flag "call_srcu_used"
> for each call_srcu domain. Whenever call_srcu is invoked, it would
> load that flag. It sets it on first use.
> 
> The idea here is to only use that flag when srcu_barrier is performed
> right before the srcu domain cleanup (it could become part of that
> cleanup). Else, using it in all srcu_barrier() might be tricky, because
> we may then need to add memory barriers or locking to the call_srcu
> fast-path, which is an overhead we try to avoid.
> 
> However, if we only use that flag as part of the srcu domain cleanup,
> it's already prohibited to invoke call_srcu concurrently with the
> cleanup of the same domain, so I don't think we would need any
> memory barriers in call_srcu.

About the last part of your email, it seems to that if after call_srcu has
returned, if the module could be unloaded on some other CPU - then it would
need to see the flag stored by the preceding call_srcu, so I believe there
would be a memory barrier between the two opreations (call_srcu and module
unload).

Also about doing the unconditional srcu_barrier, since a module could be
unloaded at any time - don't all SRCU using modules need to invoke
srcu_barrier() during their clean up anyway so we are incurring the barrier
overhead anyway? Or, am I missing a design pattern here? It seems to me
rcutorture module definitely calls srcu_barrier() before it is unloaded.

thanks,

 - Joel

> Thoughts ?
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

  reply	other threads:[~2019-04-09 15:40 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-02 14:28 [PATCH RFC tip/core/rcu 0/4] Forbid static SRCU use in modules Paul E. McKenney
2019-04-02 14:29 ` [PATCH RFC tip/core/rcu 1/4] dax/super: Dynamically allocate dax_srcu Paul E. McKenney
2019-04-03 18:31   ` Dan Williams
2019-04-04 21:04     ` Paul E. McKenney
2019-04-02 14:29 ` [PATCH RFC tip/core/rcu 2/4] drivers/gpu/drm: Dynamically allocate drm_unplug_srcu Paul E. McKenney
2019-04-02 16:14   ` Daniel Vetter
2019-04-02 14:29 ` [PATCH RFC tip/core/rcu 3/4] drivers/gpu/drm/amd: Dynamically allocate kfd_processes_srcu Paul E. McKenney
2019-04-02 17:40   ` Kuehling, Felix
2019-04-04 21:16     ` Paul E. McKenney
2019-04-02 14:29 ` [PATCH RFC tip/core/rcu 4/4] rcu: Forbid DEFINE{,_STATIC}_SRCU() from modules Paul E. McKenney
2019-04-02 15:14 ` [PATCH RFC tip/core/rcu 0/4] Forbid static SRCU use in modules Mathieu Desnoyers
2019-04-02 15:23   ` Paul E. McKenney
2019-04-02 15:34     ` Mathieu Desnoyers
2019-04-03 13:32       ` Paul E. McKenney
2019-04-03 14:27         ` Mathieu Desnoyers
2019-04-03 16:20           ` Paul E. McKenney
2019-04-03 19:30             ` Joel Fernandes
2019-04-05 23:28             ` Paul E. McKenney
2019-04-06 13:33               ` Joel Fernandes
2019-04-07 13:48                 ` Paul E. McKenney
2019-04-06 23:06               ` Joel Fernandes
2019-04-07 13:39                 ` Paul E. McKenney
2019-04-07 13:59                   ` Paul E. McKenney
2019-04-07 15:46                     ` Joel Fernandes
2019-04-07 17:05                       ` Paul E. McKenney
2019-04-08  0:36                         ` Joel Fernandes
2019-04-08  2:28                           ` Paul E. McKenney
2019-04-07 19:26                     ` Mathieu Desnoyers
2019-04-07 19:32                       ` Joel Fernandes
2019-04-07 20:41                         ` Mathieu Desnoyers
2019-04-07 21:07                           ` Joel Fernandes
2019-04-08  2:27                             ` Paul E. McKenney
2019-04-08 13:05                               ` Mathieu Desnoyers
2019-04-08 14:22                                 ` Paul E. McKenney
2019-04-08 14:49                                   ` Mathieu Desnoyers
2019-04-08 15:46                                     ` Paul E. McKenney
2019-04-08 17:24                                       ` Mathieu Desnoyers
2019-04-09 15:40                                         ` Joel Fernandes [this message]
2019-04-09 15:56                                           ` Mathieu Desnoyers
2019-04-09 16:18                                             ` Joel Fernandes
2019-04-09 16:40                                             ` Paul E. McKenney
2019-04-09 16:45                                               ` Mathieu Desnoyers
2019-04-09 17:55                                                 ` Paul E. McKenney
2019-04-09 18:04                                                   ` Mathieu Desnoyers
2019-04-09 19:14                                                     ` Paul E. McKenney
2019-04-02 18:40     ` Joel Fernandes
2019-04-03 13:19       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190409154012.GC248418@google.com \
    --to=joel@joelfernandes.org \
    --cc=akpm@linux-foundation.org \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).