All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Neeraj Upadhyay <neeraj.iitr10@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Kadam, Rushikesh S" <rushikesh.s.kadam@intel.com>,
	rcu <rcu@vger.kernel.org>
Subject: Re: [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes
Date: Tue, 20 Sep 2022 17:55:53 +0000	[thread overview]
Message-ID: <Yyn+qSZiYI02WxOv@google.com> (raw)
In-Reply-To: <20220920123208.GL4196@paulmck-ThinkPad-P17-Gen-1>

Sorry for slightly delayed reply, today was crazy at work.

On Tue, Sep 20, 2022 at 05:32:08AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 20, 2022 at 07:40:01AM -0400, Joel Fernandes wrote:
> > +rcu@vger for archives.
> > On Tue, Sep 20, 2022 at 7:37 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > On 9/19/2022 7:40 PM, Joel Fernandes wrote:
> > > >> On 9/19/2022 6:32 PM, Paul E. McKenney wrote:
> > > >>> On Mon, Sep 19, 2022 at 05:38:19PM -0400, Joel Fernandes wrote:
> > > >>>> On Mon, Sep 19, 2022 at 9:11 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >>>>>
> > > >>>>> On Mon, Sep 19, 2022 at 10:08:28AM +0200, Uladzislau Rezki wrote:
> > > >>>>>> On Wed, Sep 14, 2022 at 04:22:51AM -0700, Paul E. McKenney wrote:
> > > >>>>>>> On Wed, Sep 14, 2022 at 11:38:32AM +0100, Uladzislau Rezki wrote:
> > > >>>>>>>> On Wed, Sep 14, 2022, 11:23 Frederic Weisbecker <frederic@kernel.org> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> On Wed, Sep 14, 2022 at 02:34:23AM -0700, Paul E. McKenney wrote:
> > > >>>>>>>>>> Agreed.  There could be a list of functions in call_rcu() that are
> > > >>>>>>>>>> treated specially.
> > > >>>>>>>>>>
> > > >>>>>>>>>> But please be careful.  Maintainers of a given function might have
> > > >>>>>>>>>> something to say about that function being added to that list.  ;-)
> > > >>>>>>>>>>
> > > >>>>>>>>>> But one big advantage of this is that the list of functions could be
> > > >>>>>>>>>> private to ChromeOS and/or Android.  Maintainers might still be annoyed
> > > >>>>>>>>>> about bugs sent to them whose root cause was addition of their function
> > > >>>>>>>>>> to the call_rcu() list, though.  ;-)
> > > >>>>>>>>>
> > > >>>>>>>>> Are you sure you want this feature to be usable and debuggable by Android
> > > >>>>>>>>> only?
> > > >>>>>>>>>
> > > >>>>>>>> Adding on top of Frederic sentence. Sorry for that 😀
> > > >>>>>>>>
> > > >>>>>>>> Do you mean to have the function names internally to be able to detect what
> > > >>>>>>>> is called and based on that func list place them in a lazy track?
> > > >>>>>>>
> > > >>>>>>> Yes, if needed.  But in addition to using call_rcu_lazy() where the
> > > >>>>>>> callback functions' maintainers are good with it.
> > > >>>>>>>
> > > >>>>>>> Of course, if all relevant maintainers are OK with all the callback
> > > >>>>>>> functions being passed to call_rcu_lazy(), then their is no need for
> > > >>>>>>> the internal switching.
> > > >>>>>>>
> > > >>>>>> Then we would need map function names internally. I think what we got as
> > > >>>>>> a feedback from the conference session is it makes sense to improve the
> > > >>>>>> batching in general of call_rcu(). Doing it we will improve and cover
> > > >>>>>> all current users. So we do not need to distinguish lazy variant and a
> > > >>>>>> normal one from each other. Also we do not need to explain people the
> > > >>>>>> difference and switch to _lazy() across the kernel code. As we also
> > > >>>>>> discussed one regular non-lazy call will revert everything back to a
> > > >>>>>> problem that is in question.
> > > >>>>>>
> > > >>>>>> As Thomas said, look at it as: delay of invocation and if needed in
> > > >>>>>> corner case do kind of expedited or sync support. For example call_rcu_sync().
> > > >>>>>>
> > > >>>>>> It is not up to me but... I just share my view :)
> > > >>>>>
> > > >>>>> Me, I am a bit pessimistic about the probability of us getting away with
> > > >>>>> making the file-close RCU callback be lazy.
> > > >>>>>
> > > >>>>> After all, this thing is used across all filesystems along with a great
> > > >>>>> many things that are not filesystems.  It doesn't really matter whether
> > > >>>>> this laziness comes from a conversion to call_rcu_lazy() as in Joel's most
> > > >>>>> recent patch, lack of a conversion to call_rcu_expedited() as in Thomas
> > > >>>>> Gleixner's suggestion, or a mapping within call_rcu() as I suggested.
> > > >>>>>
> > > >>>>> Don't get me wrong, it would be great if this callback really can be
> > > >>>>> always lazy, non-expedited, or whatever, but we should not take this
> > > >>>>> for granted.
> > > >>>>
> > > >>>> Are you saying we should not make call_rcu() default-lazy for
> > > >>>> everyone? According to Thomas, anyone expecting call_rcu() to finish
> > > >>>> in any reasonable amount of time is a bug, and that usecase needs to
> > > >>>> be fixed to use synchronize_rcu(). Are we saying now that that's not
> > > >>>> our direction?
> > > >>>
> > > >>> I am saying that Thomas can sometimes get quite enthusiastic about
> > > >>> a particular direction.  Plus I strongly suspect that Thomas has not
> > > >>> actually reviewed all of the RCU callbacks.
> > > >>>
> > > >>> Of course, I agree that it won't hurt to try out Thomas's suggestion,
> > > >>> which is why I did not object at the time.  But we should be fully
> > > >>> prepared for it to blow up in our faces.  ;-)
> > > >>
> > > >> Makes sense, thanks for clarifying. My impression was he does expect some pain
> > > >> in the short-term but the pain is worth not introduction a new opt-in API.
> > > >>
> > > >> I am wondering if putting it behind a CONFIG option will delay the pain too
> > > >> much. Thoughts? I am assuming that we want the CONFIG to turn it off by default.
> > > >
> > > > I tried a poor man's version of Thomas's idea by doing
> > > > jiffies_till_first_fqs=1000 and rcu.rcu_expedited=1 boot parameters. Of course
> > > > this just for trying has no regard for memory pressure and such so it is in the
> > > > 'do not try at home' category.
> > > >
> > > > Then I tried jiffies_till_first_fqs=128 and compare before/after kernel logs.
> > > >
> > > > These are on ChromeOS hardware (5.19-rc4 kernel)
> > > > For jiffies_till_first_fqs=1000 Everything is fine until this point:
> > > >     8.383646] Freeing unused kernel image (initmem) memory: 1676K
> > > >    11.378880] Write protecting the kernel read-only data: 28672k
> > > >
> > > > I do not see this 3 second delay with jiffies_till_first_fqs=128.
> > > >     8.411866] Freeing unused kernel image (initmem) memory: 1676K
> > > >     8.656701] Write protecting the kernel read-only data: 28672k
> > >
> > > At least this could be because of the rodata write protection stuff calling
> > > rcu_barrier():
> > >
> > > From init/main.c
> > >
> > > #ifdef CONFIG_STRICT_KERNEL_RWX
> > > static void mark_readonly(void)
> > > {
> > >         if (rodata_enabled) {
> > >                 /*
> > >                  * load_module() results in W+X mappings, which are cleaned
> > >                  * up with call_rcu().  Let's make sure that queued work is
> > >                  * flushed so that we don't hit false positives looking for
> > >                  * insecure pages which are W+X.
> > >                  */
> > >                 rcu_barrier();
> > >                 mark_rodata_ro();
> > >                 rodata_test();
> > >         } else
> > >                 pr_info("Kernel memory protection disabled.\n");
> > > }
> > >
> > >
> > > I think rcu_barrier() does not play well with jiffiest_till_first_fqs (probably
> > > because of the missing wake ups I found in the existing code, I should try that
> > > patch on this test..).
> > >
> > > Is it expected that a sleep involving jiffies_till_first_fqs does not get
> > > disturbed due to rcu_barrier() ? That seems like a bug to me. At least for
> > > !NOCB, I don't see any call the rcu_gp_kthread_wake() from rcu_barrier()!
> 
> Another approach is to take your current stack of patches, but only those
> to RCU, and then as Thomas suggests, make call_rcu() behave lazily and
> make a call_rcu_expedited() that retains the current behavior.  Keep the
> current grace-period timings.

Yes these sound good. But one issue is, we do not know all the cases that
needed call_rcu_expedited(). There is the per-cpu refcount which needs it,
but I'm not aware of any others. If you take the patches, can we add convert
to call_rcu_expedited() as regressions happen?

> Then make synchronize_rcu() and friends use call_rcu_expedited() and see
> if you can get away with only a small number of call_rcu_expedited()
> "upgrades".

Right. It is a bit of a chicken-and-egg problem, if we have to whitelist
everything, then we have to blacklist as we go. If we blacklist everything,
then we may miss whitelisting something.

Thoughts?

Meanwhile just to clarify- I am very much invested in seeing the current
stack of patches merged soon give or take the _expedited() conversions,
however I did the above jiffies_till_first experiments in the hopes they
would show the ones that need such conversion quickly.

thanks,

 - Joel


  reply	other threads:[~2022-09-20 17:55 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-01 22:17 [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 01/18] mm/slub: perform free consistency checks before call_rcu Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 02/18] mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head Joel Fernandes (Google)
2022-09-02  9:26   ` Vlastimil Babka
2022-09-02  9:30     ` Vlastimil Babka
2022-09-02 15:09       ` Joel Fernandes
2022-09-03 13:53         ` Paul E. McKenney
2022-09-01 22:17 ` [PATCH v5 03/18] rcu/tree: Use READ_ONCE() for lockless read of rnp->qsmask Joel Fernandes (Google)
2022-09-06 22:26   ` Boqun Feng
2022-09-06 22:31     ` Joel Fernandes
2022-09-01 22:17 ` [PATCH v5 04/18] rcu: Fix late wakeup when flush of bypass cblist happens Joel Fernandes (Google)
2022-09-02 11:35   ` Frederic Weisbecker
2022-09-02 23:58     ` Joel Fernandes
2022-09-03 15:10       ` Paul E. McKenney
2022-09-04 21:13       ` Frederic Weisbecker
2022-09-03 14:04   ` Paul E. McKenney
2022-09-03 14:05     ` Joel Fernandes
2022-09-06  3:07   ` Joel Fernandes
2022-09-06  9:48     ` Frederic Weisbecker
2022-09-07  2:43       ` Joel Fernandes
2022-09-01 22:17 ` [PATCH v5 05/18] rcu: Move trace_rcu_callback() before bypassing Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 06/18] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
2022-09-02 15:21   ` Frederic Weisbecker
2022-09-02 23:09     ` Joel Fernandes
2022-09-05 12:59       ` Frederic Weisbecker
2022-09-05 20:18         ` Joel Fernandes
2022-09-05 20:32         ` Joel Fernandes
2022-09-06  8:55           ` Paul E. McKenney
2022-09-06 16:16             ` Joel Fernandes
2022-09-06 17:05               ` Paul E. McKenney
2022-09-03 22:00     ` Joel Fernandes
2022-09-04 21:01       ` Frederic Weisbecker
2022-09-05 20:20         ` Joel Fernandes
2022-09-06  3:05     ` Joel Fernandes
2022-09-06 15:17       ` Frederic Weisbecker
2022-09-06 16:15         ` Joel Fernandes
2022-09-06 16:31           ` Joel Fernandes
2022-09-06 16:38             ` Joel Fernandes
2022-09-06 16:43               ` Joel Fernandes
2022-09-06 19:11                 ` Frederic Weisbecker
2022-09-07  2:56                   ` Joel Fernandes
2022-09-07  9:56                     ` Frederic Weisbecker
2022-09-07 10:03           ` Frederic Weisbecker
2022-09-07 14:01             ` Joel Fernandes
2022-09-07  0:06         ` Joel Fernandes
2022-09-07  9:40           ` Frederic Weisbecker
2022-09-07 13:44             ` Joel Fernandes
2022-09-07 15:38               ` Frederic Weisbecker
2022-09-07 15:39                 ` Joel Fernandes
2022-09-21 23:52             ` Joel Fernandes
2022-09-22 14:39               ` Joel Fernandes
2022-09-06 18:16       ` Uladzislau Rezki
2022-09-06 18:21         ` Joel Fernandes
2022-09-07  8:52           ` Uladzislau Rezki
2022-09-07 15:23             ` Joel Fernandes
2022-09-03 14:03   ` Paul E. McKenney
2022-09-03 14:05     ` Joel Fernandes
2022-09-01 22:17 ` [PATCH v5 07/18] rcu: shrinker for lazy rcu Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 08/18] rcu: Add per-CB tracing for queuing, flush and invocation Joel Fernandes (Google)
2022-09-02 16:48   ` kernel test robot
2022-09-03 12:39     ` Paul E. McKenney
2022-09-03 12:39       ` Paul E. McKenney
2022-09-03 14:07       ` Joel Fernandes
2022-09-03 14:07         ` Joel Fernandes
2022-09-02 19:01   ` kernel test robot
2022-09-01 22:17 ` [PATCH v5 09/18] rcuscale: Add laziness and kfree tests Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 10/18] rcutorture: Add test code for call_rcu_lazy() Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 11/18] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 12/18] cred: Move call_rcu() to call_rcu_lazy() Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 13/18] security: " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 14/18] net/core: " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 15/18] kernel: Move various core kernel usages " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 16/18] lib: Move call_rcu() " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 17/18] i915: " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 18/18] fork: Move thread_stack_free_rcu() " Joel Fernandes (Google)
2022-09-03 15:22 ` [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes Paul E. McKenney
     [not found]   ` <CAEXW_YTxRWvUN3YuwdJ9WPmHH-v+Auozfc_OugpV_hpxC5K+4Q@mail.gmail.com>
     [not found]     ` <20220914093423.GS246308@paulmck-ThinkPad-P17-Gen-1>
     [not found]       ` <20220914102319.GA1936@lothringen>
     [not found]         ` <CA+KHdyUb122cCaeLutzK8Ti-dk4D=iXCtQau=ZXkPRO-1cnD1A@mail.gmail.com>
     [not found]           ` <20220914112251.GV246308@paulmck-ThinkPad-P17-Gen-1>
     [not found]             ` <YygjfJkoUzxy9xd/@pc636>
     [not found]               ` <20220919131152.GF4196@paulmck-ThinkPad-P17-Gen-1>
     [not found]                 ` <CAEXW_YQGPwkTQjxAb5p4OOhJkMA-L93JNTsa-WaXs0Ts7=Oakw@mail.gmail.com>
     [not found]                   ` <20220919223211.GJ4196@paulmck-ThinkPad-P17-Gen-1>
     [not found]                     ` <f061b798-e6f6-1523-52c3-12fe7966f809@joelfernandes.org>
     [not found]                       ` <a2882e80-1723-9004-f67a-cd8332aa701d@joelfernandes.org>
     [not found]                         ` <d72808e1-fc27-f158-37b7-ef23c23e74c1@joelfernandes.org>
2022-09-20 11:40                           ` Joel Fernandes
2022-09-20 12:32                             ` Paul E. McKenney
2022-09-20 17:55                               ` Joel Fernandes [this message]
2022-09-20 19:03                                 ` Paul E. McKenney
2022-09-20 22:27                                   ` Joel Fernandes
2022-09-20 22:35                                     ` Paul E. McKenney
2022-09-20 22:40                                       ` Joel Fernandes
2022-09-21  2:21                                         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yyn+qSZiYI02WxOv@google.com \
    --to=joel@joelfernandes.org \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=neeraj.iitr10@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=rushikesh.s.kadam@intel.com \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.