All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	linux-kernel@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-doc@vger.kernel.org, rcu@vger.kernel.org
Subject: Re: [PATCH v3] rcu: Add a minimum time for marking boot as completed
Date: Tue, 7 Mar 2023 18:54:43 +0000	[thread overview]
Message-ID: <20230307185443.GA516865@google.com> (raw)
In-Reply-To: <20230307173313.GJ1301832@paulmck-ThinkPad-P17-Gen-1>

On Tue, Mar 07, 2023 at 09:33:13AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 07, 2023 at 08:48:52AM -0500, Joel Fernandes wrote:
> > On Tue, Mar 7, 2023 at 8:40 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > >
> > > On Tue, Mar 07, 2023 at 02:01:54PM +0100, Frederic Weisbecker wrote:
> > > > On Fri, Mar 03, 2023 at 09:38:51PM +0000, Joel Fernandes (Google) wrote:
> > > > > On many systems, a great deal of boot (in userspace) happens after the
> > > > > kernel thinks the boot has completed. It is difficult to determine if
> > > > > the system has really booted from the kernel side. Some features like
> > > > > lazy-RCU can risk slowing down boot time if, say, a callback has been
> > > > > added that the boot synchronously depends on. Further expedited callbacks
> > > > > can get unexpedited way earlier than it should be, thus slowing down
> > > > > boot (as shown in the data below).
> > > > >
> > > > > For these reasons, this commit adds a config option
> > > > > 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay.
> > > > > Userspace can also make RCU's view of the system as booted, by writing the
> > > > > time in milliseconds to: /sys/module/rcupdate/parameters/rcu_boot_end_delay
> > > > > Or even just writing a value of 0 to this sysfs node.
> > > > > However, under no circumstance will the boot be allowed to end earlier
> > > > > than just before init is launched.
> > > > >
> > > > > The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This
> > > > > suites ChromeOS and also a PREEMPT_RT system below very well, which need
> > > > > no config or parameter changes, and just a simple application of this patch. A
> > > > > system designer can also choose a specific value here to keep RCU from marking
> > > > > boot completion.  As noted earlier, RCU's perspective of the system as booted
> > > > > will not be marker until at least rcu_boot_end_delay milliseconds have passed
> > > > > or an update is made via writing a small value (or 0) in milliseconds to:
> > > > > /sys/module/rcupdate/parameters/rcu_boot_end_delay.
> > > > >
> > > > > One side-effect of this patch is, there is a risk that a real-time workload
> > > > > launched just after the kernel boots will suffer interruptions due to expedited
> > > > > RCU, which previous ended just before init was launched. However, to mitigate
> > > > > such an issue (however unlikely), the user should either tune
> > > > > CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value
> > > > > of 0 to /sys/module/rcupdate/parameters/rcu_boot_end_delay, once userspace
> > > > > boots, and before launching the real-time workload.
> > > > >
> > > > > Qiuxu also noted impressive boot-time improvements with earlier version
> > > > > of patch. An excerpt from the data he shared:
> > > > >
> > > > > 1) Testing environment:
> > > > >     OS            : CentOS Stream 8 (non-RT OS)
> > > > >     Kernel     : v6.2
> > > > >     Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads)
> > > > >     Qemu  args  : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, …
> > > > >
> > > > > 2) OS boot time definition:
> > > > >     The time from the start of the kernel boot to the shell command line
> > > > >     prompt is shown from the console. [ Different people may have
> > > > >     different OS boot time definitions. ]
> > > > >
> > > > > 3) Measurement method (very rough method):
> > > > >     A timer in the kernel periodically prints the boot time every 100ms.
> > > > >     As soon as the shell command line prompt is shown from the console,
> > > > >     we record the boot time printed by the timer, then the printed boot
> > > > >     time is the OS boot time.
> > > > >
> > > > > 4) Measured OS boot time (in seconds)
> > > > >    a) Measured 10 times w/o this patch:
> > > > >         8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s
> > > > >         The average OS boot time was: ~8.7s
> > > > >
> > > > >    b) Measure 10 times w/ this patch:
> > > > >         8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s
> > > > >         The average OS boot time was: ~8.3s.
> > > > >
> > > > > Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > >
> > > > I still don't really like that:
> > > >
> > > > 1) It feels like we are curing a symptom for which we don't know the cause.
> > > >    Which RCU write side caller is the source of this slow boot? Some tracepoints
> > > >    reporting the wait duration within synchronize_rcu() calls between the end of
> > > >    the kernel boot and the end of userspace boot may be helpful.
> > > >
> > > > 2) The kernel boot was already covered before this patch so this is about
> > > >    userspace code calling into the kernel. Is that piece of code also called
> > > >    after the boot? In that case are we missing a conversion from
> > > >    synchronize_rcu() to synchronize_rcu_expedited() somewhere? Because then
> > > >    the problem is more general than just boot.
> > > >
> > > > This needs to be analyzed first and if it happens that the issue really
> > > > needs to be fixed with telling the kernel that userspace has completed
> > > > booting, eg: because the problem is not in a few callsites that need conversion
> > > > to expedited but instead in the accumulation of lots of calls that should stay
> > > > as is:
> > > >
> > > > 3) This arbitrary timeout looks dangerous to me as latency sensitive code
> > > >    may run right after the boot. Either you choose a value that is too low
> > > >    and you miss the optimization or the value is too high and you may break
> > > >    things.
> > > >
> > > > 4) This should be fixed the way you did:
> > > >    a) a kernel parameter like you did
> > > >    b) The init process (systemd?) tells the kernel when it judges that userspace
> > > >       has completed booting.
> > > >    c) Make these interfaces more generic, maybe that information will be useful
> > > >       outside RCU. For example the kernel parameter should be
> > > >       "user_booted_reported" and the sysfs (should be sysctl?):
> > > >       kernel.user_booted = 1
> > > >    d) But yuck, this means we must know if the init process supports that...
> > > >
> > > > For these reasons, let's make sure we know exactly what is going on first.
> > > >
> > > > Thanks.
> > > Just add some notes and thoughts. There is a rcupdate.rcu_expedited=1
> > > parameter that can be used during the boot. For example on our devices
> > > to speedup a boot we boot the kernel with rcu_expedited:
> > >
> > > XQ-DQ54:/ # cat /proc/cmdline
> > > XQ-DQ54:/ #
> > >
> > > then a user space can decides if it is needed or not:
> > >
> > > <snip>
> > > rcu_expedited  rcu_normal
> > > XQ-DQ54:/ # ls -al /sys/kernel/rcu_*
> > > -rw-r--r-- 1 root root 4096 2023-02-16 09:27 /sys/kernel/rcu_expedited
> > > -rw-r--r-- 1 root root 4096 2023-02-16 09:27 /sys/kernel/rcu_normal
> > > XQ-DQ54:/ #
> > > <snip>
> > >
> > > for lazy we can add "rcu_cb_lazy" parameter and boot the kernel with
> > > true or false. So we can follow and be aligned with rcu_expedited and
> > > rcu_normal parameters.
> > 
> > Speaking of aligning, there is also the automated
> > rcu_normal_after_boot boot option correct? I prefer the automated
> > option of doing this. So the approach here is not really unprecedented
> > and is much more robust than relying on userspace too much (I am ok
> > with adding your suggestion *on top* of the automated toggle, but I
> > probably would not have ChromeOS use it if the automated way exists).
> > Or did I miss something?
> 
> See this commit:
> 
> 3705b88db0d7cc ("rcu: Add a module parameter to force use of expedited RCU primitives")
> 
> Antti provided this commit precisely in order to allow Android devices
> to expedite the boot process and to shut off the expediting at a time of
> Android userspace's choosing.  So Android has been making this work for
> about ten years, which strikes me as an adequate proof of concept.  ;-)

Thanks for the pointer. That's true. Looking at Android sources, I find that
Android Mediatek devices at least are setting rcu_expedited to 1 at late
stage of their userspace boot (which is weird, it should be set to 1 as early
as possible), and interestingly I cannot find them resetting it back to 0!.
Maybe they set rcu_normal to 1? But I cannot find that either. Vlad? :P

> Of course, Android has a rather tightly controlled userspace, as do
> real-time embedded systems (I sure hope, anyway!).  Which is why your
> timeout-based fallback/backup makes a lot of sense.  And why someone might
> want an aggressive indication when that timeout-based backup is needed.

Or someone designs a system but is unaware of RCU behavior during boot. ;-)

thanks,

 - Joel


  reply	other threads:[~2023-03-07 19:10 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-03 21:38 [PATCH v3] rcu: Add a minimum time for marking boot as completed Joel Fernandes (Google)
2023-03-04  1:02 ` Paul E. McKenney
2023-03-04  4:51   ` Joel Fernandes
2023-03-11 20:44     ` Paul E. McKenney
2023-03-11 22:23       ` Joel Fernandes
2023-03-11 22:57         ` Paul E. McKenney
2023-03-06  8:24   ` Zhuo, Qiuxu
2023-03-06 14:49     ` Paul E. McKenney
2023-03-07  7:49       ` Zhuo, Qiuxu
2023-03-07 15:22         ` Paul E. McKenney
2023-03-09 15:17           ` Zhuo, Qiuxu
2023-03-09 21:53             ` Paul E. McKenney
2023-03-10  0:11               ` Akira Yokosawa
2023-03-10  1:47                 ` Paul E. McKenney
2023-03-10  2:35                 ` Zhuo, Qiuxu
2023-03-05 11:39 ` Uladzislau Rezki
2023-03-05 15:03   ` Joel Fernandes
2023-03-06  8:37     ` Zhuo, Qiuxu
2023-03-05 20:34   ` Paul E. McKenney
2023-03-07 13:01 ` Frederic Weisbecker
2023-03-07 13:40   ` Uladzislau Rezki
2023-03-07 13:48     ` Joel Fernandes
2023-03-07 17:33       ` Paul E. McKenney
2023-03-07 18:54         ` Joel Fernandes [this message]
2023-03-07 19:27           ` Paul E. McKenney
2023-03-08  9:41             ` Uladzislau Rezki
2023-03-08 14:45               ` Paul E. McKenney
2023-03-09 12:57                 ` Uladzislau Rezki
2023-03-09 22:10                   ` Joel Fernandes
2023-03-10  8:55                     ` Uladzislau Rezki
2023-03-11  6:24                       ` Paul E. McKenney
2023-03-11 17:19                         ` Joel Fernandes
2023-03-13  9:51                         ` Uladzislau Rezki
2023-03-13 12:27                           ` Uladzislau Rezki
2023-03-13 13:48                             ` Zhuo, Qiuxu
2023-03-13 15:28                               ` Uladzislau Rezki
2023-03-13 13:58                           ` Joel Fernandes
2023-03-13 15:32                             ` Uladzislau Rezki
2023-03-13 15:49                               ` Joel Fernandes
2023-03-13 18:12                                 ` Uladzislau Rezki
2023-03-13 18:56                                   ` Paul E. McKenney
2023-03-14 11:16                                     ` Uladzislau Rezki
2023-03-14 13:49                                       ` Paul E. McKenney
2023-03-14 22:44                               ` Joel Fernandes
2023-03-15 17:12                                 ` Uladzislau Rezki
2023-03-15 12:21                       ` Joel Fernandes
2023-03-15 16:12                         ` Paul E. McKenney
2023-03-08 10:14       ` Uladzislau Rezki
2023-03-15 12:18         ` Joel Fernandes
2023-03-07 13:41   ` Joel Fernandes
2023-03-07 17:19     ` Frederic Weisbecker
2023-03-07 18:19       ` Joel Fernandes
2023-03-08 13:52       ` Joel Fernandes
2023-03-08 15:01         ` Frederic Weisbecker
2023-03-08 15:09           ` Joel Fernandes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230307185443.GA516865@google.com \
    --to=joel@joelfernandes.org \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=rcu@vger.kernel.org \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.