* [RFC v1 00/14] Implement call_rcu_lazy() and miscellaneous fixes
From: Joel Fernandes (Google) @ 2022-05-12  3:04 UTC
  To: rcu
  Cc: rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, Joel Fernandes (Google)

Hello!
Please find the proof of concept version of call_rcu_lazy() attached. This
gives a lot of savings when the CPUs are relatively idle. Huge thanks to
Rushikesh Kadam from Intel for investigating it with me.

Some numbers below:

The following are the power savings we see on top of RCU_NOCB_CPU on an Intel
platform. The observation is that, due to a 'trickle down' effect of RCU
callbacks, the system is very lightly loaded but constantly runs a few RCU
callbacks very often. This misleads the power management hardware into treating
the system as active when it is in fact idle.

For example, when the ChromeOS screen is off and the user is not doing anything
on the system, we can see big power savings.
Before:
Pk%pc10 = 72.13
PkgWatt = 0.58
CorWatt = 0.04

After:
Pk%pc10 = 81.28
PkgWatt = 0.41
CorWatt = 0.03

Further, when the ChromeOS screen is on but the system is idle or lightly
loaded, we can see that the display pipeline is constantly queuing RCU
callbacks due to open/close of file descriptors associated with graphics
buffers. This is attributed to the file_free_rcu() path, which this patch
series also touches.

This patch series adds a simple but effective lockless implementation of RCU
callback batching. On memory pressure, on timeout, or when a queue grows too
big, we initiate a flush of one or more per-CPU lists.
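
To illustrate the three flush triggers described above, here is a minimal
userspace sketch in C. It is not the code from this series. The names
(lazy_list, lazy_enqueue, lazy_tick), the thresholds and the logical "tick"
clock are all hypothetical. The sketch uses a single list instead of per-CPU
lists, is single-threaded rather than lockless, and invokes callbacks directly
on flush, whereas in the kernel flushed callbacks would still have to wait for
an RCU grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables, chosen only for illustration. */
#define LAZY_MAX_QUEUED     4   /* flush once this many callbacks are queued */
#define LAZY_TIMEOUT_TICKS 10   /* flush once the oldest callback is this old */

struct lazy_cb {
	struct lazy_cb *next;
	void (*func)(struct lazy_cb *cb);
};

/* Stand-in for one per-CPU list of batched callbacks. */
struct lazy_list {
	struct lazy_cb *head;
	struct lazy_cb **tail;
	size_t len;
	unsigned long first_tick;   /* tick at which the oldest entry was queued */
};

static void lazy_list_init(struct lazy_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->len = 0;
	l->first_tick = 0;
}

/* Detach the whole batch, then invoke each callback. Returns the count. */
static size_t lazy_flush(struct lazy_list *l)
{
	struct lazy_cb *cb = l->head;
	size_t n = 0;

	lazy_list_init(l);          /* list is empty before callbacks run */
	while (cb) {
		struct lazy_cb *next = cb->next;

		cb->func(cb);
		cb = next;
		n++;
	}
	return n;
}

/* Queue a callback. Returns how many callbacks were flushed (0 if batched). */
static size_t lazy_enqueue(struct lazy_list *l, struct lazy_cb *cb,
			   void (*func)(struct lazy_cb *), unsigned long now)
{
	assert(cb && func);
	cb->func = func;
	cb->next = NULL;
	if (l->len == 0)
		l->first_tick = now;
	*l->tail = cb;
	l->tail = &cb->next;
	l->len++;

	if (l->len >= LAZY_MAX_QUEUED)  /* trigger: queue grew too big */
		return lazy_flush(l);
	return 0;
}

/* Periodic check. Triggers: timeout of the oldest entry, or memory pressure. */
static size_t lazy_tick(struct lazy_list *l, unsigned long now, bool mem_pressure)
{
	if (l->len == 0)
		return 0;
	if (mem_pressure || now - l->first_tick >= LAZY_TIMEOUT_TICKS)
		return lazy_flush(l);
	return 0;
}

/* Demo callback that only counts invocations. */
static int lazy_invoked;

static void count_cb(struct lazy_cb *cb)
{
	(void)cb;
	lazy_invoked++;
}
```

A caller would initialize a list with lazy_list_init(), queue callbacks with
lazy_enqueue(), and call lazy_tick() from a timer or a memory-pressure
notification. With these thresholds, callbacks sit in the batch until the
fourth one arrives, ten ticks pass, or memory pressure is signalled.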

Similar results can be achieved by increasing jiffies_till_first_fqs; however,
that also has the effect of slowing down RCU. In particular, I saw a huge
slowdown of the function graph tracer when increasing it.

One drawback of this series is that if another frequent, non-lazy RCU callback
creeps in in the future, it will again hurt power. However, I believe
identifying and fixing those is a more reasonable approach than slowing RCU
down for the whole system.

NOTE: A debug patch is added in the series to toggle /proc/sys/kernel/rcu_lazy
at runtime, turning it on or off globally. It defaults to on. Further, please
use the sysctls in lazy.c for further tuning of the parameters that affect
flushing.

Disclaimer 1: Don't boot your personal system on it yet anticipating power
savings, as TREE07 still causes RCU stalls and I am looking more into that, but
I believe this series should be good for general testing.

Disclaimer 2: I have intentionally not CC'd other subsystem maintainers (like
net, fs) to keep noise low and will CC them in the future after 1 or 2 rounds
of review and agreements.

Joel Fernandes (Google) (14):
  rcu: Add a lock-less lazy RCU implementation
  workqueue: Add a lazy version of queue_rcu_work()
  block/blk-ioc: Move call_rcu() to call_rcu_lazy()
  cred: Move call_rcu() to call_rcu_lazy()
  fs: Move call_rcu() to call_rcu_lazy() in some paths
  kernel: Move various core kernel usages to call_rcu_lazy()
  security: Move call_rcu() to call_rcu_lazy()
  net/core: Move call_rcu() to call_rcu_lazy()
  lib: Move call_rcu() to call_rcu_lazy()
  kfree/rcu: Queue RCU work via queue_rcu_work_lazy()
  i915: Move call_rcu() to call_rcu_lazy()
  rcu/kfree: remove useless monitor_todo flag
  rcu/kfree: Fix kfree_rcu_shrink_count() return value
  DEBUG: Toggle rcu_lazy and tune at runtime

 block/blk-ioc.c                            |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c |   2 +-
 fs/dcache.c                                |   4 +-
 fs/eventpoll.c                             |   2 +-
 fs/file_table.c                            |   3 +-
 fs/inode.c                                 |   2 +-
 include/linux/rcupdate.h                   |   6 +
 include/linux/sched/sysctl.h               |   4 +
 include/linux/workqueue.h                  |   1 +
 kernel/cred.c                              |   2 +-
 kernel/exit.c                              |   2 +-
 kernel/pid.c                               |   2 +-
 kernel/rcu/Kconfig                         |   8 ++
 kernel/rcu/Makefile                        |   1 +
 kernel/rcu/lazy.c                          | 153 +++++++++++++++++++++
 kernel/rcu/rcu.h                           |   5 +
 kernel/rcu/tree.c                          |  28 ++--
 kernel/sysctl.c                            |  23 ++++
 kernel/time/posix-timers.c                 |   2 +-
 kernel/workqueue.c                         |  25 ++++
 lib/radix-tree.c                           |   2 +-
 lib/xarray.c                               |   2 +-
 net/core/dst.c                             |   2 +-
 security/security.c                        |   2 +-
 security/selinux/avc.c                     |   4 +-
 25 files changed, 255 insertions(+), 34 deletions(-)
 create mode 100644 kernel/rcu/lazy.c

-- 
2.36.0.550.gb090851708-goog


Thread overview: 73+ messages
2022-05-12  3:04 [RFC v1 00/14] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
2022-05-12  3:04 ` [RFC v1 01/14] rcu: Add a lock-less lazy RCU implementation Joel Fernandes (Google)
2022-05-12 23:56   ` Paul E. McKenney
2022-05-14 15:08     ` Joel Fernandes
2022-05-14 16:34       ` Paul E. McKenney
2022-05-27 23:12         ` Joel Fernandes
2022-05-28 17:57           ` Paul E. McKenney
2022-05-30 14:48             ` Joel Fernandes
2022-05-30 16:42               ` Paul E. McKenney
2022-05-31  2:12                 ` Joel Fernandes
2022-05-31  4:26                   ` Paul E. McKenney
2022-05-31 16:11                     ` Joel Fernandes
2022-05-31 16:45                       ` Paul E. McKenney
2022-05-31 18:51                         ` Joel Fernandes
2022-05-31 19:25                           ` Paul E. McKenney
2022-05-31 21:29                             ` Joel Fernandes
2022-05-31 22:44                               ` Joel Fernandes
2022-06-01 14:24     ` Frederic Weisbecker
2022-06-01 16:17       ` Paul E. McKenney
2022-06-01 19:09       ` Joel Fernandes
2022-05-17  9:07   ` Uladzislau Rezki
2022-05-30 14:54     ` Joel Fernandes
2022-06-01 14:12       ` Frederic Weisbecker
2022-06-01 19:10         ` Joel Fernandes
2022-05-12  3:04 ` [RFC v1 02/14] workqueue: Add a lazy version of queue_rcu_work() Joel Fernandes (Google)
2022-05-12 23:58   ` Paul E. McKenney
2022-05-14 14:44     ` Joel Fernandes
2022-05-12  3:04 ` [RFC v1 03/14] block/blk-ioc: Move call_rcu() to call_rcu_lazy() Joel Fernandes (Google)
2022-05-13  0:00   ` Paul E. McKenney
2022-05-12  3:04 ` [RFC v1 04/14] cred: " Joel Fernandes (Google)
2022-05-13  0:02   ` Paul E. McKenney
2022-05-14 14:41     ` Joel Fernandes
2022-05-12  3:04 ` [RFC v1 05/14] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
2022-05-13  0:07   ` Paul E. McKenney
2022-05-14 14:40     ` Joel Fernandes
2022-05-12  3:04 ` [RFC v1 06/14] kernel: Move various core kernel usages to call_rcu_lazy() Joel Fernandes (Google)
2022-05-12  3:04 ` [RFC v1 07/14] security: Move call_rcu() " Joel Fernandes (Google)
2022-05-12  3:04 ` [RFC v1 08/14] net/core: " Joel Fernandes (Google)
2022-05-12  3:04 ` [RFC v1 09/14] lib: " Joel Fernandes (Google)
2022-05-12  3:04 ` [RFC v1 10/14] kfree/rcu: Queue RCU work via queue_rcu_work_lazy() Joel Fernandes (Google)
2022-05-13  0:12   ` Paul E. McKenney
2022-05-13 14:55     ` Uladzislau Rezki
2022-05-14 14:33       ` Joel Fernandes
2022-05-14 19:10         ` Uladzislau Rezki
2022-05-12  3:04 ` [RFC v1 11/14] i915: Move call_rcu() to call_rcu_lazy() Joel Fernandes (Google)
2022-05-12  3:04 ` [RFC v1 12/14] rcu/kfree: remove useless monitor_todo flag Joel Fernandes (Google)
2022-05-13 14:53   ` Uladzislau Rezki
2022-05-14 14:35     ` Joel Fernandes
2022-05-14 19:48       ` Uladzislau Rezki
2022-05-12  3:04 ` [RFC v1 13/14] rcu/kfree: Fix kfree_rcu_shrink_count() return value Joel Fernandes (Google)
2022-05-13 14:54   ` Uladzislau Rezki
2022-05-14 14:34     ` Joel Fernandes
2022-05-12  3:04 ` [RFC v1 14/14] DEBUG: Toggle rcu_lazy and tune at runtime Joel Fernandes (Google)
2022-05-13  0:16   ` Paul E. McKenney
2022-05-14 14:38     ` Joel Fernandes
2022-05-14 16:21       ` Paul E. McKenney
2022-05-12  3:17 ` [RFC v1 00/14] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes
2022-05-12 13:09   ` Uladzislau Rezki
2022-05-12 13:56     ` Uladzislau Rezki
2022-05-12 14:03       ` Joel Fernandes
2022-05-12 14:37         ` Uladzislau Rezki
2022-05-12 16:09           ` Joel Fernandes
2022-05-12 16:32             ` Uladzislau Rezki
     [not found]               ` <Yn5e7w8NWzThUARb@pc638.lan>
2022-05-13 14:51                 ` Joel Fernandes
2022-05-13 15:43                   ` Uladzislau Rezki
2022-05-14 14:25                     ` Joel Fernandes
2022-05-14 19:01                       ` Uladzislau Rezki
2022-08-09  2:25                       ` Joel Fernandes
2022-05-13  0:23   ` Paul E. McKenney
2022-05-13 14:45     ` Joel Fernandes
2022-06-13 18:53 ` Joel Fernandes
2022-06-13 22:48   ` Paul E. McKenney
2022-06-16 16:26     ` Joel Fernandes
