From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 9 Apr 2019 11:40:12 -0400
From: Joel Fernandes
To: Mathieu Desnoyers
Cc: paulmck, rcu, linux-kernel, Ingo Molnar, Lai Jiangshan, dipankar,
	Andrew Morton, Josh Triplett, Thomas Gleixner, Peter Zijlstra, rostedt,
	David Howells, Eric Dumazet, fweisbec, Oleg Nesterov, linux-nvdimm,
	dri-devel, amd-gfx
Subject: Re: [PATCH RFC tip/core/rcu 0/4] Forbid static SRCU use in modules
Message-ID: <20190409154012.GC248418@google.com>
References: <20190402142816.GA13084@linux.ibm.com>
	<1632568795.549.1554669696728.JavaMail.zimbra@efficios.com>
	<20190407210718.GA6656@localhost>
	<20190408022728.GF14111@linux.ibm.com>
	<1504296005.857.1554728734661.JavaMail.zimbra@efficios.com>
	<20190408142230.GJ14111@linux.ibm.com>
	<1447252022.1166.1554734972823.JavaMail.zimbra@efficios.com>
	<20190408154616.GO14111@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
 <1489474416.1465.1554744287985.JavaMail.zimbra@efficios.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: rcu@vger.kernel.org

On Mon, Apr 08, 2019 at 01:24:47PM -0400, Mathieu Desnoyers wrote:
> ----- On Apr 8, 2019, at 11:46 AM, paulmck paulmck@linux.ibm.com wrote:
> 
> > On Mon, Apr 08, 2019 at 10:49:32AM -0400, Mathieu Desnoyers wrote:
> >> ----- On Apr 8, 2019, at 10:22 AM, paulmck paulmck@linux.ibm.com wrote:
> >> 
> >> > On Mon, Apr 08, 2019 at 09:05:34AM -0400, Mathieu Desnoyers wrote:
> >> >> ----- On Apr 7, 2019, at 10:27 PM, paulmck paulmck@linux.ibm.com wrote:
> >> >> 
> >> >> > On Sun, Apr 07, 2019 at 09:07:18PM +0000, Joel Fernandes wrote:
> >> >> >> On Sun, Apr 07, 2019 at 04:41:36PM -0400, Mathieu Desnoyers wrote:
> >> >> >> > 
> >> >> >> > ----- On Apr 7, 2019, at 3:32 PM, Joel Fernandes, Google joel@joelfernandes.org
> >> >> >> > wrote:
> >> >> >> > 
> >> >> >> > > On Sun, Apr 07, 2019 at 03:26:16PM -0400, Mathieu Desnoyers wrote:
> >> >> >> > >> ----- On Apr 7, 2019, at 9:59 AM, paulmck paulmck@linux.ibm.com wrote:
> >> >> >> > >> 
> >> >> >> > >> > On Sun, Apr 07, 2019 at 06:39:41AM -0700, Paul E. McKenney wrote:
> >> >> >> > >> >> On Sat, Apr 06, 2019 at 07:06:13PM -0400, Joel Fernandes wrote:
> >> >> >> > >> > 
> >> >> >> > >> > [ . . . ]
> >> >> >> > >> > 
> >> >> >> > >> >> > > diff --git a/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > b/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > index f8f6f04c4453..c2d919a1566e 100644
> >> >> >> > >> >> > > --- a/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > +++ b/include/asm-generic/vmlinux.lds.h
> >> >> >> > >> >> > > @@ -338,6 +338,10 @@
> >> >> >> > >> >> > > KEEP(*(__tracepoints_ptrs)) /* Tracepoints: pointer array */ \
> >> >> >> > >> >> > > __stop___tracepoints_ptrs = .; \
> >> >> >> > >> >> > > *(__tracepoints_strings)/* Tracepoints: strings */ \
> >> >> >> > >> >> > > + . = ALIGN(8); \
> >> >> >> > >> >> > > + __start___srcu_struct = .; \
> >> >> >> > >> >> > > + *(___srcu_struct_ptrs) \
> >> >> >> > >> >> > > + __end___srcu_struct = .; \
> >> >> >> > >> >> > > } \
> >> >> >> > >> >> > 
> >> >> >> > >> >> > This vmlinux linker modification is not needed. I tested without it and srcu
> >> >> >> > >> >> > torture works fine with rcutorture built as a module. Putting further prints
> >> >> >> > >> >> > in kernel/module.c verified that the kernel is able to find the srcu structs
> >> >> >> > >> >> > just fine. You could squash the below patch into this one or apply it on top
> >> >> >> > >> >> > of the dev branch.
> >> >> >> > >> >> 
> >> >> >> > >> >> Good point, given that otherwise FORTRAN named common blocks would not
> >> >> >> > >> >> work.
> >> >> >> > >> >> 
> >> >> >> > >> >> But isn't one advantage of leaving that stuff in the RO_DATA_SECTION()
> >> >> >> > >> >> macro that it can be mapped read-only? Or am I suffering from excessive
> >> >> >> > >> >> optimism?
> >> >> >> > >> > 
> >> >> >> > >> > And to answer the other question, in the case where I am suffering from
> >> >> >> > >> > excessive optimism, it should be a separate commit. Please see below
> >> >> >> > >> > for the updated original commit thus far.
> >> >> >> > >> > 
> >> >> >> > >> > And may I have your Tested-by?
> >> >> >> > >> 
> >> >> >> > >> Just to confirm: does the cleanup performed in the modules going
> >> >> >> > >> notifier end up acting as a barrier first before freeing the memory ?
> >> >> >> > >> If not, is it explicitly stated that a barrier must be issued before
> >> >> >> > >> module unload ?
> >> >> >> > >> 
> >> >> >> > > 
> >> >> >> > > You mean rcu_barrier? It is mentioned in the documentation that this is the
> >> >> >> > > responsibility of the module writer to prevent delays for all modules.
> >> >> >> > 
> >> >> >> > It's a srcu barrier, yes. Considering it would be a barrier specific to the
> >> >> >> > srcu domain within that module, I don't see how it would cause delays for
> >> >> >> > "all" modules if we implicitly issue the barrier on module unload. What
> >> >> >> > am I missing ?
> >> >> >> 
> >> >> >> Yes, you are right. I thought of this after I just sent my email. I think it
> >> >> >> makes sense for the srcu case to do this, and it could avoid a class of bugs.
> >> >> > 
> >> >> > If there are call_srcu() callbacks outstanding, the module writer still
> >> >> > needs the srcu_barrier() because otherwise callbacks arrive after
> >> >> > the module text has gone, which will disappoint the CPU when it
> >> >> > tries fetching instructions that are no longer mapped. If there are
> >> >> > no call_srcu() callbacks from that module, then there is no need for
> >> >> > srcu_barrier() either way.
> >> >> > 
> >> >> > So if an srcu_barrier() is needed, the module developer needs to
> >> >> > supply it.
> >> >> 
> >> >> When you say "callbacks arrive after the module text has gone",
> >> >> I think you assume that free_module() is invoked before the
> >> >> MODULE_STATE_GOING notifiers are called. But it's done in the
> >> >> opposite order: going notifiers are called first, and then
> >> >> free_module() is invoked.
> >> >> 
> >> >> So AFAIU it would be safe to issue the srcu_barrier() from the module
> >> >> going notifier.
> >> >> 
> >> >> Or am I missing something ?
> >> > 
> >> > We do seem to be talking past each other. ;-)
> >> > 
> >> > This has nothing to do with the order of events at module-unload time.
> >> > 
> >> > So please let me try again.
> >> > 
> >> > If a given srcu_struct in a module never has call_srcu() invoked, there
> >> > is no need to invoke srcu_barrier() at any time, whether at module-unload
> >> > time or not. Adding srcu_barrier() in this case adds overhead and latency
> >> > for no good reason.
> >> 
> >> Not if we invoke srcu_barrier() for that specific domain. If
> >> call_srcu was never invoked for a srcu domain, I don't see why
> >> srcu_barrier() should be more expensive than a simple check that
> >> the domain does not have any srcu work queued.
> > 
> > But that simple check does involve a cache miss for each possible CPU (not
> > just each online CPU), so it is non-trivial, especially on large systems.
> > 
> >> > If a given srcu_struct in a module does have at least one call_srcu()
> >> > invoked, it is already that module's responsibility to make sure that
> >> > the code sticks around long enough for the callback to be invoked.
> >> 
> >> I understand that when users do explicit dynamic allocation/cleanup of
> >> srcu domains, they indeed need to take care of doing explicit srcu_barrier().
> >> However, if they do static definition of srcu domains, it would be nice
> >> if we can handle the barriers under the hood.
> > 
> > All else being equal, of course. But...
> > 
> >> > This means that correct SRCU users that invoke call_srcu() already
> >> > have srcu_barrier() at module-unload time. Incorrect SRCU users, with
> >> > reasonable probability, now get a WARN_ON() at module-unload time, with
> >> > the per-CPU state getting leaked. Before this change, they would (also
> >> > with reasonable probability) instead get an instruction-fetch fault when
> >> > the SRCU callback was invoked after the completion of the module unload.
> >> > Furthermore, in all cases where they would previously have gotten the
> >> > instruction-fetch fault, they now get the WARN_ON(), like this:
> >> > 
> >> > 	if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))
> >> > 		return; /* Forgot srcu_barrier(), so just leak it! */
> >> > 
> >> > So this change already represents an improvement in usability.
> >> 
> >> Considering that we can do a srcu_barrier() for the specific domain,
> >> and that it should add no noticeable overhead if there are no queued
> >> callbacks, I don't see a good reason for leaving the srcu_barrier
> >> invocation to the user rather than implicitly doing it from the
> >> module going notifier.
> > 
> > Now, I could automatically add an indicator of whether or not a
> > call_srcu() had happened, but then again, that would either add a
> > call_srcu() scalability bottleneck or again require a scan of all possible
> > CPUs... to figure out if it was necessary to scan all possible CPUs.
> > 
> > Or is scanning all possible CPUs down in the noise in this case? Or
> > am I missing a trick that would reduce the overhead?
> 
> Module unloading implicitly does a synchronize_rcu (for RCU-sched), and
> a stop_machine. So I would be tempted to say that the overhead of iterating
> over all CPUs might not matter that much considering the rest.
> 
> About notifying that a call_srcu has happened for the srcu domain in a
> scalable fashion, let's see... We could have a flag "call_srcu_used"
> for each srcu domain. Whenever call_srcu is invoked, it would
> load that flag. It sets it on first use.
> 
> The idea here is to only use that flag when srcu_barrier is performed
> right before the srcu domain cleanup (it could become part of that
> cleanup). Else, using it in all srcu_barrier() calls might be tricky, because
> we may then need to add memory barriers or locking to the call_srcu
> fast path, which is an overhead we try to avoid.
> 
> However, if we only use that flag as part of the srcu domain cleanup,
> it's already prohibited to invoke call_srcu concurrently with the
> cleanup of the same domain, so I don't think we would need any
> memory barriers in call_srcu.
About the last part of your email: if the module can be unloaded on some
other CPU after call_srcu() has returned, then that CPU would need to see
the flag stored by the preceding call_srcu(), so I believe there would have
to be a memory barrier between the two operations (call_srcu() and module
unload) anyway.

Also, about doing the unconditional srcu_barrier(): since a module can be
unloaded at any time, don't all SRCU-using modules need to invoke
srcu_barrier() during their cleanup anyway, so that we are incurring the
barrier overhead regardless? Or am I missing a design pattern here? The
rcutorture module certainly calls srcu_barrier() before it is unloaded.

thanks,

 - Joel

> Thoughts ?
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com