* [PATCH v4 00/10] sched/psi: some optimizations and extensions
@ 2022-08-25 16:41 ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

Hi all,

This patch series contains some optimizations and extensions for PSI.

Patch 1/10 fixes a periodic aggregation shut-off problem introduced by the
earlier commit 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups").

Patches 2-4 are miscellaneous optimizations, so they are placed at the front
of the series.

Patch 5/10 optimizes task switching inside shared cgroups when the
in_memstall status of the prev and next tasks differs.

Patch 6/10 removes NR_ONCPU task accounting to free up 4 bytes in the first
cacheline, which are then used by patch 7/10 to introduce the new PSI
resource PSI_IRQ for tracking IRQ/SOFTIRQ pressure stall information.

Patches 8-9 cache the parent psi_group in struct psi_group to speed up
the hot iteration path; a rough sketch of the idea follows.
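
The sketch below is an illustration only, since those patches are not
quoted in this excerpt: each psi_group would gain a parent pointer so
the hot path can walk straight up instead of resolving ancestors
through the cgroup hierarchy on every state change. task_groups() is
a hypothetical stand-in for whatever helper the real patches use:

	struct psi_group {
		struct psi_group *parent;	/* assumed new field */
		/* ... */
	};

	/* hot path: the task's own group, then all its ancestors */
	for (group = task_groups(task); group; group = group->parent)
		psi_group_change(group, cpu, clear, set, now, wake_clock);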

Patch 10/10 introduces a per-cgroup interface "cgroup.pressure" to
disable or re-enable PSI at the cgroup level. We implement hiding and
unhiding of the pressure files per Tejun's suggestion[1], which depends
on his work[2].

[1] https://lore.kernel.org/all/YvqjhqJQi2J8RG3X@slm.duckdns.org/
[2] https://lore.kernel.org/all/20220820000550.367085-1-tj@kernel.org/
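
As an illustration, here is a minimal userspace sketch of how the new
knob might be toggled. The "0"/"1" values and the set_cgroup_psi()
helper are assumptions based on this cover letter; see patch 10/10 for
the authoritative interface:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Hypothetical: write "0" to disable PSI accounting for a
	 * cgroup (hiding its pressure files), "1" to re-enable it. */
	static int set_cgroup_psi(const char *cgrp_dir, int enable)
	{
		char path[256];
		ssize_t ret;
		int fd;

		snprintf(path, sizeof(path), "%s/cgroup.pressure", cgrp_dir);
		fd = open(path, O_WRONLY);
		if (fd < 0)
			return -1;
		ret = write(fd, enable ? "1" : "0", 1);
		close(fd);
		return ret == 1 ? 0 : -1;
	}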

Performance test using mmtests/config-scheduler-perfpipe in
/user.slice/user-0.slice/session-4.scope:

                                 next                patched       patched/only-leaf
Min       Time        8.82 (   0.00%)        8.49 (   3.74%)        8.00 (   9.32%)
1st-qrtle Time        8.90 (   0.00%)        8.58 (   3.63%)        8.05 (   9.58%)
2nd-qrtle Time        8.94 (   0.00%)        8.61 (   3.65%)        8.09 (   9.50%)
3rd-qrtle Time        8.99 (   0.00%)        8.65 (   3.75%)        8.15 (   9.35%)
Max-1     Time        8.82 (   0.00%)        8.49 (   3.74%)        8.00 (   9.32%)
Max-5     Time        8.82 (   0.00%)        8.49 (   3.74%)        8.00 (   9.32%)
Max-10    Time        8.84 (   0.00%)        8.55 (   3.20%)        8.04 (   9.05%)
Max-90    Time        9.04 (   0.00%)        8.67 (   4.10%)        8.18 (   9.51%)
Max-95    Time        9.04 (   0.00%)        8.68 (   4.03%)        8.20 (   9.26%)
Max-99    Time        9.07 (   0.00%)        8.73 (   3.82%)        8.25 (   9.11%)
Max       Time        9.12 (   0.00%)        8.89 (   2.54%)        8.27 (   9.29%)
Amean     Time        8.95 (   0.00%)        8.62 *   3.67%*        8.11 *   9.43%*

Big thanks to Johannes Weiner, Tejun Heo and Michal Koutný for your
suggestions and review!


Changes in v4:
 - Collect Acked-by tags from Johannes Weiner.
 - Add clearer comments and changelog text per Johannes Weiner's
   feedback.
 - Replace for_each_psi_group() with clearer open-coded iteration.
 - Use the better names cgroup_pressure_show() and
   cgroup_pressure_write().
 - Use the better name psi_cgroup_restart() and only call it
   on enabling.

Changes in v3:
 - Rebase on linux-next and reorder patches to put the misc
   optimization patches at the front of the series.
 - Drop patch "sched/psi: don't change task psi_flags when migrate CPU/group"
   since it caused a small performance regression and was only code
   refactoring.
 - Don't define PSI_IRQ and PSI_IRQ_FULL when !CONFIG_IRQ_TIME_ACCOUNTING,
   in which case they are not used.
 - Add patch 8/10 "sched/psi: consolidate cgroup_psi()" so that
   cgroup_psi() can handle all cgroups including the root cgroup,
   which makes patch 9/10 simpler.
 - Rename the interface to "cgroup.pressure" and add some explanation
   per Michal's suggestion.
 - Hide and unhide the pressure files when disabling/re-enabling
   cgroup PSI; this depends on Tejun's work.

Changes in v2:
 - Add Acked-by tags from Johannes Weiner. Thanks for the review!
 - Fix periodic aggregation wakeup for common ancestors in
   psi_task_switch().
 - Add patch 7/10 from Johannes Weiner, which removes NR_ONCPU
   task accounting to save 4 bytes in the first cacheline.
 - Remove the "psi_irq=" kernel cmdline parameter from the last version.
 - Add the per-cgroup interface "cgroup.psi" to disable/re-enable
   PSI stats accounting at the cgroup level.


Chengming Zhou (9):
  sched/psi: fix periodic aggregation shut off
  sched/psi: don't create cgroup PSI files when psi_disabled
  sched/psi: save percpu memory when !psi_cgroups_enabled
  sched/psi: move private helpers to sched/stats.h
  sched/psi: optimize task switch inside shared cgroups again
  sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure
  sched/psi: consolidate cgroup_psi()
  sched/psi: cache parent psi_group to speed up groups iterate
  sched/psi: per-cgroup PSI accounting disable/re-enable interface

Johannes Weiner (1):
  sched/psi: remove NR_ONCPU task accounting

 Documentation/admin-guide/cgroup-v2.rst |  23 ++
 include/linux/cgroup-defs.h             |   3 +
 include/linux/cgroup.h                  |   5 -
 include/linux/psi.h                     |  12 +-
 include/linux/psi_types.h               |  29 ++-
 kernel/cgroup/cgroup.c                  | 106 ++++++++-
 kernel/sched/core.c                     |   1 +
 kernel/sched/psi.c                      | 280 +++++++++++++++++-------
 kernel/sched/stats.h                    |   6 +
 9 files changed, 362 insertions(+), 103 deletions(-)

-- 
2.37.2


* [PATCH v4 01/10] sched/psi: fix periodic aggregation shut off
@ 2022-08-25 16:41   ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

We don't want to wake periodic aggregation work back up if the
task change is the aggregation worker itself going to sleep, or
we'll ping-pong forever.

Previously, we would use psi_task_change() in psi_dequeue() when a
task was going to sleep, so this check was put in psi_task_change().

But commit 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
deferred task sleep handling to psi_task_switch(), so it no longer goes
through psi_task_change().

So this patch moves the check to psi_task_switch().
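
For context, a simplified sketch of what psi_dequeue() in
kernel/sched/stats.h looks like after 4117cebf1a9f (lightly edited
here; see the tree for the exact code). On voluntary sleep it does
nothing and leaves the work to psi_task_switch(), which is why the
worker check has to live there now:

	static inline void psi_dequeue(struct task_struct *p, bool sleep)
	{
		int clear = TSK_RUNNING;

		if (static_branch_likely(&psi_disabled))
			return;

		/*
		 * A voluntary sleep is a dequeue followed by a task
		 * switch. To avoid walking all ancestors twice,
		 * psi_task_switch() handles TSK_RUNNING and TSK_IOWAIT
		 * there - psi_task_change() is never reached.
		 */
		if (sleep)
			return;

		if (p->in_memstall)
			clear |= (TSK_MEMSTALL | TSK_MEMSTALL_RUNNING);

		psi_task_change(p, clear, 0);
	}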

Fixes: 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/sched/psi.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index ecb4b4ff4ce0..39463dcc16bb 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -796,7 +796,6 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 {
 	int cpu = task_cpu(task);
 	struct psi_group *group;
-	bool wake_clock = true;
 	void *iter = NULL;
 	u64 now;
 
@@ -806,19 +805,9 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 	psi_flags_change(task, clear, set);
 
 	now = cpu_clock(cpu);
-	/*
-	 * Periodic aggregation shuts off if there is a period of no
-	 * task changes, so we wake it back up if necessary. However,
-	 * don't do this if the task change is the aggregation worker
-	 * itself going to sleep, or we'll ping-pong forever.
-	 */
-	if (unlikely((clear & TSK_RUNNING) &&
-		     (task->flags & PF_WQ_WORKER) &&
-		     wq_worker_last_func(task) == psi_avgs_work))
-		wake_clock = false;
 
 	while ((group = iterate_groups(task, &iter)))
-		psi_group_change(group, cpu, clear, set, now, wake_clock);
+		psi_group_change(group, cpu, clear, set, now, true);
 }
 
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
@@ -854,6 +843,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 
 	if (prev->pid) {
 		int clear = TSK_ONCPU, set = 0;
+		bool wake_clock = true;
 
 		/*
 		 * When we're going to sleep, psi_dequeue() lets us
@@ -867,13 +857,23 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 				clear |= TSK_MEMSTALL_RUNNING;
 			if (prev->in_iowait)
 				set |= TSK_IOWAIT;
+
+			/*
+			 * Periodic aggregation shuts off if there is a period of no
+			 * task changes, so we wake it back up if necessary. However,
+			 * don't do this if the task change is the aggregation worker
+			 * itself going to sleep, or we'll ping-pong forever.
+			 */
+			if (unlikely((prev->flags & PF_WQ_WORKER) &&
+				     wq_worker_last_func(prev) == psi_avgs_work))
+				wake_clock = false;
 		}
 
 		psi_flags_change(prev, clear, set);
 
 		iter = NULL;
 		while ((group = iterate_groups(prev, &iter)) && group != common)
-			psi_group_change(group, cpu, clear, set, now, true);
+			psi_group_change(group, cpu, clear, set, now, wake_clock);
 
 		/*
 		 * TSK_ONCPU is handled up to the common ancestor. If we're tasked
@@ -882,7 +882,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		if (sleep) {
 			clear &= ~TSK_ONCPU;
 			for (; group; group = iterate_groups(prev, &iter))
-				psi_group_change(group, cpu, clear, set, now, true);
+				psi_group_change(group, cpu, clear, set, now, wake_clock);
 		}
 	}
 }
-- 
2.37.2


* [PATCH v4 02/10] sched/psi: don't create cgroup PSI files when psi_disabled
@ 2022-08-25 16:41 ` Chengming Zhou
  2022-09-09 14:00   ` [tip: sched/psi] sched/psi: Don't " tip-bot2 for Chengming Zhou
  -1 siblings, 1 reply; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

Commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable")
made it possible to configure PSI to skip per-cgroup stall accounting,
in which case the PSI files are not exposed in the cgroup hierarchy.

This patch does the same when psi_disabled.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/cgroup/cgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 03dbbf8a8c28..2f79ddf9a85d 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3748,6 +3748,9 @@ static void cgroup_pressure_release(struct kernfs_open_file *of)
 
 bool cgroup_psi_enabled(void)
 {
+	if (static_branch_likely(&psi_disabled))
+		return false;
+
 	return (cgroup_feature_disable_mask & (1 << OPT_FEATURE_PRESSURE)) == 0;
 }
 
-- 
2.37.2


* [PATCH v4 03/10] sched/psi: save percpu memory when !psi_cgroups_enabled
@ 2022-08-25 16:41   ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

We won't use the cgroup psi_group when !psi_cgroups_enabled, so don't
bother allocating and initializing its percpu memory.

We also don't need to migrate task PSI stats between cgroups in
cgroup_move_task().
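
For reference, psi_cgroups_enabled is a pre-existing static key added
by commit 3958e2d0c34e; from memory of the base tree, its setup looks
roughly like this (simplified, shown only for context):

	/* kernel/sched/psi.c, pre-existing */
	static DEFINE_STATIC_KEY_TRUE(psi_cgroups_enabled);

	void __init psi_init(void)
	{
		/* ... */
		/* cleared when booted with cgroup_disable=pressure */
		if (!cgroup_psi_enabled())
			static_branch_disable(&psi_cgroups_enabled);
		/* ... */
	}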

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 kernel/sched/psi.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 39463dcc16bb..77d53c03a76f 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -201,6 +201,7 @@ void __init psi_init(void)
 {
 	if (!psi_enable) {
 		static_branch_enable(&psi_disabled);
+		static_branch_disable(&psi_cgroups_enabled);
 		return;
 	}
 
@@ -950,7 +951,7 @@ void psi_memstall_leave(unsigned long *flags)
 #ifdef CONFIG_CGROUPS
 int psi_cgroup_alloc(struct cgroup *cgroup)
 {
-	if (static_branch_likely(&psi_disabled))
+	if (!static_branch_likely(&psi_cgroups_enabled))
 		return 0;
 
 	cgroup->psi = kzalloc(sizeof(struct psi_group), GFP_KERNEL);
@@ -968,7 +969,7 @@ int psi_cgroup_alloc(struct cgroup *cgroup)
 
 void psi_cgroup_free(struct cgroup *cgroup)
 {
-	if (static_branch_likely(&psi_disabled))
+	if (!static_branch_likely(&psi_cgroups_enabled))
 		return;
 
 	cancel_delayed_work_sync(&cgroup->psi->avgs_work);
@@ -996,7 +997,7 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
 	struct rq_flags rf;
 	struct rq *rq;
 
-	if (static_branch_likely(&psi_disabled)) {
+	if (!static_branch_likely(&psi_cgroups_enabled)) {
 		/*
 		 * Lame to do this here, but the scheduler cannot be locked
 		 * from the outside, so we move cgroups from inside sched/.
-- 
2.37.2


* [PATCH v4 04/10] sched/psi: move private helpers to sched/stats.h
@ 2022-08-25 16:41 ` Chengming Zhou
  2022-09-09 14:00   ` [tip: sched/psi] sched/psi: Move " tip-bot2 for Chengming Zhou
  -1 siblings, 1 reply; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

This patch moves the psi_task_change()/psi_task_switch() declarations
out of the public PSI header, since they are only needed to implement
the PSI stats tracking in sched/stats.h.

psi_task_switch() is obviously scheduler-internal; psi_task_change()
can't be a public helper since it doesn't check the psi_disabled
static key, and it has no external users now, so put it in
sched/stats.h too.
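
To illustrate the distinction (a sketch, lightly edited from the
public helpers in kernel/sched/psi.c): exported PSI entry points check
the static key themselves, which psi_task_change() deliberately does
not:

	/* public helper: safe to call from anywhere */
	void psi_memstall_enter(unsigned long *flags)
	{
		if (static_branch_likely(&psi_disabled))
			return;
		/* ... account the stall ... */
	}

	/* psi_task_change() assumes the caller (scheduler code using
	 * the wrappers in sched/stats.h) has already checked the key. */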

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/psi.h  | 4 ----
 kernel/sched/stats.h | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/psi.h b/include/linux/psi.h
index dd74411ac21d..fffd229fbf19 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -18,10 +18,6 @@ extern struct psi_group psi_system;
 
 void psi_init(void);
 
-void psi_task_change(struct task_struct *task, int clear, int set);
-void psi_task_switch(struct task_struct *prev, struct task_struct *next,
-		     bool sleep);
-
 void psi_memstall_enter(unsigned long *flags);
 void psi_memstall_leave(unsigned long *flags);
 
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index baa839c1ba96..c39b467ece43 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -107,6 +107,10 @@ __schedstats_from_se(struct sched_entity *se)
 }
 
 #ifdef CONFIG_PSI
+void psi_task_change(struct task_struct *task, int clear, int set);
+void psi_task_switch(struct task_struct *prev, struct task_struct *next,
+		     bool sleep);
+
 /*
  * PSI tracks state that persists across sleeps, such as iowaits and
  * memory stalls. As a result, it has to distinguish between sleeps,
-- 
2.37.2


* [PATCH v4 05/10] sched/psi: optimize task switch inside shared cgroups again
@ 2022-08-25 16:41 ` Chengming Zhou
  2022-08-25 17:16     ` Johannes Weiner
  2022-09-09 14:00   ` [tip: sched/psi] sched/psi: Optimize " tip-bot2 for Chengming Zhou
  -1 siblings, 2 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

Way back when PSI_MEM_FULL was accounted from the timer tick, task
switching could simply iterate next and prev to the common ancestor to
update TSK_ONCPU and be done.

Then memstall ticks were replaced with checking curr->in_memstall
directly in psi_group_change(). That meant that now if the task switch
was between a memstall and a !memstall task, we had to iterate through
the common ancestors at least ONCE to fix up their state_masks.

We added the identical_state filter to make sure the common ancestor
elimination was skipped in that case. It seems that was always a
little too eager, because it caused us to walk the common ancestors
*twice* instead of the required once: the iteration for next could
have stopped at the common ancestor; prev could have updated TSK_ONCPU
up to the common ancestor, then finish to the root without changing
any flags, just to get the new curr->in_memstall into the state_masks.

This patch recognizes this and makes it so that we walk to the root
exactly once if state_mask needs updating, which is simply catching up
on a missed optimization that could have been done in commit 7fae6c8171d2
("psi: Use ONCPU state tracking machinery to detect reclaim") directly.

Apart from this, it's also necessary for the next patch "sched/psi: remove
NR_ONCPU task accounting". Suppose we walk the common ancestors twice:

(1) psi_group_change(.clear = 0, .set = TSK_ONCPU)
(2) psi_group_change(.clear = TSK_ONCPU, .set = 0)

We previously used tasks[NR_ONCPU] to record TSK_ONCPU: tasks[NR_ONCPU]++
in (1), then tasks[NR_ONCPU]-- in (2), so tasks[NR_ONCPU] remains correct.

The next patch changes this to use one bit in the state mask to record
TSK_ONCPU: the PSI_ONCPU bit would be set in (1) but then cleared in (2),
leaving the psi_group_cpu with a task running on the CPU but without the
PSI_ONCPU bit set!

With this patch, we never walk the common ancestors twice, so the above
problem doesn't arise.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/psi.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 77d53c03a76f..d71dbc2356ff 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -820,20 +820,15 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 	u64 now = cpu_clock(cpu);
 
 	if (next->pid) {
-		bool identical_state;
-
 		psi_flags_change(next, 0, TSK_ONCPU);
 		/*
-		 * When switching between tasks that have an identical
-		 * runtime state, the cgroup that contains both tasks
-		 * we reach the first common ancestor. Iterate @next's
-		 * ancestors only until we encounter @prev's ONCPU.
+		 * Set TSK_ONCPU on @next's cgroups. If @next shares any
+		 * ancestors with @prev, those will already have @prev's
+		 * TSK_ONCPU bit set, and we can stop the iteration there.
 		 */
-		identical_state = prev->psi_flags == next->psi_flags;
 		iter = NULL;
 		while ((group = iterate_groups(next, &iter))) {
-			if (identical_state &&
-			    per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
+			if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
 				common = group;
 				break;
 			}
@@ -877,10 +872,12 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 			psi_group_change(group, cpu, clear, set, now, wake_clock);
 
 		/*
-		 * TSK_ONCPU is handled up to the common ancestor. If we're tasked
-		 * with dequeuing too, finish that for the rest of the hierarchy.
+		 * TSK_ONCPU is handled up to the common ancestor. If there are
+		 * any other differences between the two tasks (e.g. prev goes
+		 * to sleep, or only one task is memstall), finish propagating
+		 * those differences all the way up to the root.
 		 */
-		if (sleep) {
+		if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) {
 			clear &= ~TSK_ONCPU;
 			for (; group; group = iterate_groups(prev, &iter))
 				psi_group_change(group, cpu, clear, set, now, wake_clock);
-- 
2.37.2


* [PATCH v4 06/10] sched/psi: remove NR_ONCPU task accounting
@ 2022-08-25 16:41 ` Chengming Zhou
  2022-09-09 14:00   ` [tip: sched/psi] sched/psi: Remove " tip-bot2 for Johannes Weiner
  -1 siblings, 1 reply; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

From: Johannes Weiner <hannes@cmpxchg.org>

We put all fields updated by the scheduler in the first cacheline of
struct psi_group_cpu for performance.

Since we want to add another state, PSI_IRQ_FULL, to track IRQ/SOFTIRQ
pressure, we need to reclaim space first. This patch removes NR_ONCPU
task accounting from struct psi_group_cpu and uses one bit in
state_mask to track it instead.
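
The resulting flag layout, as implied by the enum changes in the hunks
below (the numeric values follow directly from them):

	TSK_IOWAIT           = 1 << NR_IOWAIT            /* task-count flags */
	TSK_MEMSTALL         = 1 << NR_MEMSTALL
	TSK_RUNNING          = 1 << NR_RUNNING
	TSK_MEMSTALL_RUNNING = 1 << NR_MEMSTALL_RUNNING
	TSK_ONCPU            = 1 << NR_PSI_TASK_COUNTS   /* 1 << 4, no counter */
	PSI_ONCPU            = 1 << NR_PSI_STATES        /* 1 << 7, in state_mask */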

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/psi_types.h | 16 +++++++--------
 kernel/sched/psi.c        | 41 ++++++++++++++++++++++++++++-----------
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index c7fe7c089718..54cb74946db4 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -15,13 +15,6 @@ enum psi_task_count {
 	NR_IOWAIT,
 	NR_MEMSTALL,
 	NR_RUNNING,
-	/*
-	 * This can't have values other than 0 or 1 and could be
-	 * implemented as a bit flag. But for now we still have room
-	 * in the first cacheline of psi_group_cpu, and this way we
-	 * don't have to special case any state tracking for it.
-	 */
-	NR_ONCPU,
 	/*
 	 * For IO and CPU stalls the presence of running/oncpu tasks
 	 * in the domain means a partial rather than a full stall.
@@ -32,16 +25,18 @@ enum psi_task_count {
 	 * threads and memstall ones.
 	 */
 	NR_MEMSTALL_RUNNING,
-	NR_PSI_TASK_COUNTS = 5,
+	NR_PSI_TASK_COUNTS = 4,
 };
 
 /* Task state bitmasks */
 #define TSK_IOWAIT	(1 << NR_IOWAIT)
 #define TSK_MEMSTALL	(1 << NR_MEMSTALL)
 #define TSK_RUNNING	(1 << NR_RUNNING)
-#define TSK_ONCPU	(1 << NR_ONCPU)
 #define TSK_MEMSTALL_RUNNING	(1 << NR_MEMSTALL_RUNNING)
 
+/* Only one task can be scheduled, no corresponding task count */
+#define TSK_ONCPU	(1 << NR_PSI_TASK_COUNTS)
+
 /* Resources that workloads could be stalled on */
 enum psi_res {
 	PSI_IO,
@@ -68,6 +63,9 @@ enum psi_states {
 	NR_PSI_STATES = 7,
 };
 
+/* Use one bit in the state mask to track TSK_ONCPU */
+#define PSI_ONCPU	(1 << NR_PSI_STATES)
+
 enum psi_aggregators {
 	PSI_AVGS = 0,
 	PSI_POLL,
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index d71dbc2356ff..4702a770e272 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -212,7 +212,7 @@ void __init psi_init(void)
 	group_init(&psi_system);
 }
 
-static bool test_state(unsigned int *tasks, enum psi_states state)
+static bool test_state(unsigned int *tasks, enum psi_states state, bool oncpu)
 {
 	switch (state) {
 	case PSI_IO_SOME:
@@ -225,9 +225,9 @@ static bool test_state(unsigned int *tasks, enum psi_states state)
 		return unlikely(tasks[NR_MEMSTALL] &&
 			tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
 	case PSI_CPU_SOME:
-		return unlikely(tasks[NR_RUNNING] > tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] > oncpu);
 	case PSI_CPU_FULL:
-		return unlikely(tasks[NR_RUNNING] && !tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] && !oncpu);
 	case PSI_NONIDLE:
 		return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
 			tasks[NR_RUNNING];
@@ -689,9 +689,9 @@ static void psi_group_change(struct psi_group *group, int cpu,
 			     bool wake_clock)
 {
 	struct psi_group_cpu *groupc;
-	u32 state_mask = 0;
 	unsigned int t, m;
 	enum psi_states s;
+	u32 state_mask;
 
 	groupc = per_cpu_ptr(group->pcpu, cpu);
 
@@ -707,17 +707,36 @@ static void psi_group_change(struct psi_group *group, int cpu,
 
 	record_times(groupc, now);
 
+	/*
+	 * Start with TSK_ONCPU, which doesn't have a corresponding
+	 * task count - it's just a boolean flag directly encoded in
+	 * the state mask. Clear, set, or carry the current state if
+	 * no changes are requested.
+	 */
+	if (unlikely(clear & TSK_ONCPU)) {
+		state_mask = 0;
+		clear &= ~TSK_ONCPU;
+	} else if (unlikely(set & TSK_ONCPU)) {
+		state_mask = PSI_ONCPU;
+		set &= ~TSK_ONCPU;
+	} else {
+		state_mask = groupc->state_mask & PSI_ONCPU;
+	}
+
+	/*
+	 * The rest of the state mask is calculated based on the task
+	 * counts. Update those first, then construct the mask.
+	 */
 	for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
 		if (!(m & (1 << t)))
 			continue;
 		if (groupc->tasks[t]) {
 			groupc->tasks[t]--;
 		} else if (!psi_bug) {
-			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u %u] clear=%x set=%x\n",
+			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u] clear=%x set=%x\n",
 					cpu, t, groupc->tasks[0],
 					groupc->tasks[1], groupc->tasks[2],
-					groupc->tasks[3], groupc->tasks[4],
-					clear, set);
+					groupc->tasks[3], clear, set);
 			psi_bug = 1;
 		}
 	}
@@ -726,9 +745,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;
 
-	/* Calculate state mask representing active states */
 	for (s = 0; s < NR_PSI_STATES; s++) {
-		if (test_state(groupc->tasks, s))
+		if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
 			state_mask |= (1 << s);
 	}
 
@@ -740,7 +758,7 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	 * task in a cgroup is in_memstall, the corresponding groupc
 	 * on that cpu is in PSI_MEM_FULL state.
 	 */
-	if (unlikely(groupc->tasks[NR_ONCPU] && cpu_curr(cpu)->in_memstall))
+	if (unlikely((state_mask & PSI_ONCPU) && cpu_curr(cpu)->in_memstall))
 		state_mask |= (1 << PSI_MEM_FULL);
 
 	groupc->state_mask = state_mask;
@@ -828,7 +846,8 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 */
 		iter = NULL;
 		while ((group = iterate_groups(next, &iter))) {
-			if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
+			if (per_cpu_ptr(group->pcpu, cpu)->state_mask &
+			    PSI_ONCPU) {
 				common = group;
 				break;
 			}
-- 
2.37.2


* [PATCH v4 07/10] sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure
@ 2022-08-25 16:41   ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

PSI already tracks workload pressure stall information for CPU,
memory and IO. Apart from these, IRQ/SOFTIRQ can also have an
obvious impact on the productivity of some workloads, such as web
services.

With CONFIG_IRQ_TIME_ACCOUNTING, we can get the IRQ/SOFTIRQ delta
time from update_rq_clock_task(), where we record that delta against
the CPU's current task's cgroups as PSI_IRQ_FULL time.

Note that we don't use PSI_IRQ_SOME: IRQ/SOFTIRQ time is always taken
from the current task on the CPU, so nothing productive can run even
if it were runnable; hence we only use PSI_IRQ_FULL.
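
Going by the psi_show() format string in the hunks below, reading the
new pressure/irq file (or a cgroup's irq.pressure) should then yield a
single "full" line and no "some" line, e.g. on an idle system:

	full avg10=0.00 avg60=0.00 avg300=0.00 total=0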

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++
 include/linux/psi_types.h               | 10 +++-
 kernel/cgroup/cgroup.c                  | 27 +++++++++
 kernel/sched/core.c                     |  1 +
 kernel/sched/psi.c                      | 74 ++++++++++++++++++++++++-
 kernel/sched/stats.h                    |  2 +
 6 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index be4a77baf784..971c418bc778 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -976,6 +976,12 @@ All cgroup core files are prefixed with "cgroup."
 	killing cgroups is a process directed operation, i.e. it affects
 	the whole thread-group.
 
+  irq.pressure
+	A read-write nested-keyed file.
+
+	Shows pressure stall information for IRQ/SOFTIRQ. See
+	:ref:`Documentation/accounting/psi.rst <psi>` for details.
+
 Controllers
 ===========
 
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 54cb74946db4..40c28171cd91 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -42,7 +42,10 @@ enum psi_res {
 	PSI_IO,
 	PSI_MEM,
 	PSI_CPU,
-	NR_PSI_RESOURCES = 3,
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	PSI_IRQ,
+#endif
+	NR_PSI_RESOURCES,
 };
 
 /*
@@ -58,9 +61,12 @@ enum psi_states {
 	PSI_MEM_FULL,
 	PSI_CPU_SOME,
 	PSI_CPU_FULL,
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	PSI_IRQ_FULL,
+#endif
 	/* Only per-CPU, to weigh the CPU in the global average: */
 	PSI_NONIDLE,
-	NR_PSI_STATES = 7,
+	NR_PSI_STATES,
 };
 
 /* Use one bit in the state mask to track TSK_ONCPU */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 2f79ddf9a85d..371131a8b6f8 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3731,6 +3731,23 @@ static ssize_t cgroup_cpu_pressure_write(struct kernfs_open_file *of,
 	return cgroup_pressure_write(of, buf, nbytes, PSI_CPU);
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static int cgroup_irq_pressure_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+
+	return psi_show(seq, psi, PSI_IRQ);
+}
+
+static ssize_t cgroup_irq_pressure_write(struct kernfs_open_file *of,
+					 char *buf, size_t nbytes,
+					 loff_t off)
+{
+	return cgroup_pressure_write(of, buf, nbytes, PSI_IRQ);
+}
+#endif
+
 static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
 					  poll_table *pt)
 {
@@ -5150,6 +5167,16 @@ static struct cftype cgroup_base_files[] = {
 		.poll = cgroup_pressure_poll,
 		.release = cgroup_pressure_release,
 	},
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	{
+		.name = "irq.pressure",
+		.flags = CFTYPE_PRESSURE,
+		.seq_show = cgroup_irq_pressure_show,
+		.write = cgroup_irq_pressure_write,
+		.poll = cgroup_pressure_poll,
+		.release = cgroup_pressure_release,
+	},
+#endif
 #endif /* CONFIG_PSI */
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 61436b8e0337..178f9836ae96 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -708,6 +708,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
 	rq->prev_irq_time += irq_delta;
 	delta -= irq_delta;
+	psi_account_irqtime(rq->curr, irq_delta);
 #endif
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
 	if (static_key_false((&paravirt_steal_rq_enabled))) {
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 4702a770e272..2545a78f82d8 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -904,6 +904,36 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 	}
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+void psi_account_irqtime(struct task_struct *task, u32 delta)
+{
+	int cpu = task_cpu(task);
+	void *iter = NULL;
+	struct psi_group *group;
+	struct psi_group_cpu *groupc;
+	u64 now;
+
+	if (!task->pid)
+		return;
+
+	now = cpu_clock(cpu);
+
+	while ((group = iterate_groups(task, &iter))) {
+		groupc = per_cpu_ptr(group->pcpu, cpu);
+
+		write_seqcount_begin(&groupc->seq);
+
+		record_times(groupc, now);
+		groupc->times[PSI_IRQ_FULL] += delta;
+
+		write_seqcount_end(&groupc->seq);
+
+		if (group->poll_states & (1 << PSI_IRQ_FULL))
+			psi_schedule_poll_work(group, 1);
+	}
+}
+#endif
+
 /**
  * psi_memstall_enter - mark the beginning of a memory stall section
  * @flags: flags to handle nested sections
@@ -1065,6 +1095,7 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
 
 int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 {
+	bool only_full = false;
 	int full;
 	u64 now;
 
@@ -1079,7 +1110,11 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 		group->avg_next_update = update_averages(group, now);
 	mutex_unlock(&group->avgs_lock);
 
-	for (full = 0; full < 2; full++) {
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	only_full = res == PSI_IRQ;
+#endif
+
+	for (full = 0; full < 2 - only_full; full++) {
 		unsigned long avg[3] = { 0, };
 		u64 total = 0;
 		int w;
@@ -1093,7 +1128,7 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 		}
 
 		seq_printf(m, "%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
-			   full ? "full" : "some",
+			   full || only_full ? "full" : "some",
 			   LOAD_INT(avg[0]), LOAD_FRAC(avg[0]),
 			   LOAD_INT(avg[1]), LOAD_FRAC(avg[1]),
 			   LOAD_INT(avg[2]), LOAD_FRAC(avg[2]),
@@ -1121,6 +1156,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	else
 		return ERR_PTR(-EINVAL);
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	if (res == PSI_IRQ && --state != PSI_IRQ_FULL)
+		return ERR_PTR(-EINVAL);
+#endif
+
 	if (state >= PSI_NONIDLE)
 		return ERR_PTR(-EINVAL);
 
@@ -1405,6 +1445,33 @@ static const struct proc_ops psi_cpu_proc_ops = {
 	.proc_release	= psi_fop_release,
 };
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static int psi_irq_show(struct seq_file *m, void *v)
+{
+	return psi_show(m, &psi_system, PSI_IRQ);
+}
+
+static int psi_irq_open(struct inode *inode, struct file *file)
+{
+	return psi_open(file, psi_irq_show);
+}
+
+static ssize_t psi_irq_write(struct file *file, const char __user *user_buf,
+			     size_t nbytes, loff_t *ppos)
+{
+	return psi_write(file, user_buf, nbytes, PSI_IRQ);
+}
+
+static const struct proc_ops psi_irq_proc_ops = {
+	.proc_open	= psi_irq_open,
+	.proc_read	= seq_read,
+	.proc_lseek	= seq_lseek,
+	.proc_write	= psi_irq_write,
+	.proc_poll	= psi_fop_poll,
+	.proc_release	= psi_fop_release,
+};
+#endif
+
 static int __init psi_proc_init(void)
 {
 	if (psi_enable) {
@@ -1412,6 +1479,9 @@ static int __init psi_proc_init(void)
 		proc_create("pressure/io", 0666, NULL, &psi_io_proc_ops);
 		proc_create("pressure/memory", 0666, NULL, &psi_memory_proc_ops);
 		proc_create("pressure/cpu", 0666, NULL, &psi_cpu_proc_ops);
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+		proc_create("pressure/irq", 0666, NULL, &psi_irq_proc_ops);
+#endif
 	}
 	return 0;
 }
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index c39b467ece43..84a188913cc9 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -110,6 +110,7 @@ __schedstats_from_se(struct sched_entity *se)
 void psi_task_change(struct task_struct *task, int clear, int set);
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		     bool sleep);
+void psi_account_irqtime(struct task_struct *task, u32 delta);
 
 /*
  * PSI tracks state that persists across sleeps, such as iowaits and
@@ -205,6 +206,7 @@ static inline void psi_ttwu_dequeue(struct task_struct *p) {}
 static inline void psi_sched_switch(struct task_struct *prev,
 				    struct task_struct *next,
 				    bool sleep) {}
+static inline void psi_account_irqtime(struct task_struct *task, u32 delta) {}
 #endif /* CONFIG_PSI */
 
 #ifdef CONFIG_SCHED_INFO
-- 
2.37.2


* [PATCH v4 08/10] sched/psi: consolidate cgroup_psi()
@ 2022-08-25 16:41   ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

cgroup_psi() can't return the psi_group for the root cgroup, so we have
many open-coded instances of
"psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi".

This patch moves the cgroup_psi() definition to <linux/psi.h>, where it
can return psi_system for the root cgroup and thus handle all cgroups.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/cgroup.h |  5 -----
 include/linux/psi.h    |  6 ++++++
 kernel/cgroup/cgroup.c | 10 +++++-----
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 7ed1fa7a6fc8..3c48753f2949 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -682,11 +682,6 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp)
 	pr_cont_kernfs_path(cgrp->kn);
 }
 
-static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
-{
-	return cgrp->psi;
-}
-
 bool cgroup_psi_enabled(void);
 
 static inline void cgroup_init_kthreadd(void)
diff --git a/include/linux/psi.h b/include/linux/psi.h
index fffd229fbf19..362a74ca1d3b 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -7,6 +7,7 @@
 #include <linux/sched.h>
 #include <linux/poll.h>
 #include <linux/cgroup-defs.h>
+#include <linux/cgroup.h>
 
 struct seq_file;
 struct css_set;
@@ -30,6 +31,11 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
 			poll_table *wait);
 
 #ifdef CONFIG_CGROUPS
+static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
+{
+	return cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+}
+
 int psi_cgroup_alloc(struct cgroup *cgrp);
 void psi_cgroup_free(struct cgroup *cgrp);
 void cgroup_move_task(struct task_struct *p, struct css_set *to);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 371131a8b6f8..1d392c91ef95 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3657,21 +3657,21 @@ static int cpu_stat_show(struct seq_file *seq, void *v)
 static int cgroup_io_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_IO);
 }
 static int cgroup_memory_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_MEM);
 }
 static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_CPU);
 }
@@ -3697,7 +3697,7 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
 		return -EBUSY;
 	}
 
-	psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	psi = cgroup_psi(cgrp);
 	new = psi_trigger_create(psi, buf, res);
 	if (IS_ERR(new)) {
 		cgroup_put(cgrp);
@@ -3735,7 +3735,7 @@ static ssize_t cgroup_cpu_pressure_write(struct kernfs_open_file *of,
 static int cgroup_irq_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_IRQ);
 }
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v4 09/10] sched/psi: cache parent psi_group to speed up groups iterate
@ 2022-08-25 16:41   ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

We use iterate_groups() to walk the psi_group at each level of the
hierarchy when updating PSI stats, which is a very hot path.

In the current code, iterate_groups() has to go through multiple branches
and call cgroup_parent() to find the parent psi_group at each level,
which is not very efficient.

This patch caches the parent psi_group in struct psi_group: we only need
to look up the task's own psi_group once, then follow group->parent to
iterate up the hierarchy.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/psi_types.h |  2 ++
 kernel/sched/psi.c        | 49 +++++++++++++++------------------------
 2 files changed, 21 insertions(+), 30 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 40c28171cd91..a0b746258c68 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -151,6 +151,8 @@ struct psi_trigger {
 };
 
 struct psi_group {
+	struct psi_group *parent;
+
 	/* Protects data used by the aggregator */
 	struct mutex avgs_lock;
 
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 2545a78f82d8..9a8aee80a087 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -772,27 +772,12 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		schedule_delayed_work(&group->avgs_work, PSI_FREQ);
 }
 
-static struct psi_group *iterate_groups(struct task_struct *task, void **iter)
+static inline struct psi_group *task_psi_group(struct task_struct *task)
 {
-	if (*iter == &psi_system)
-		return NULL;
-
 #ifdef CONFIG_CGROUPS
-	if (static_branch_likely(&psi_cgroups_enabled)) {
-		struct cgroup *cgroup = NULL;
-
-		if (!*iter)
-			cgroup = task->cgroups->dfl_cgrp;
-		else
-			cgroup = cgroup_parent(*iter);
-
-		if (cgroup && cgroup_parent(cgroup)) {
-			*iter = cgroup;
-			return cgroup_psi(cgroup);
-		}
-	}
+	if (static_branch_likely(&psi_cgroups_enabled))
+		return cgroup_psi(task_dfl_cgroup(task));
 #endif
-	*iter = &psi_system;
 	return &psi_system;
 }
 
@@ -815,7 +800,6 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 {
 	int cpu = task_cpu(task);
 	struct psi_group *group;
-	void *iter = NULL;
 	u64 now;
 
 	if (!task->pid)
@@ -825,8 +809,10 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 
 	now = cpu_clock(cpu);
 
-	while ((group = iterate_groups(task, &iter)))
+	group = task_psi_group(task);
+	do {
 		psi_group_change(group, cpu, clear, set, now, true);
+	} while ((group = group->parent));
 }
 
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
@@ -834,7 +820,6 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 {
 	struct psi_group *group, *common = NULL;
 	int cpu = task_cpu(prev);
-	void *iter;
 	u64 now = cpu_clock(cpu);
 
 	if (next->pid) {
@@ -844,8 +829,8 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 * ancestors with @prev, those will already have @prev's
 		 * TSK_ONCPU bit set, and we can stop the iteration there.
 		 */
-		iter = NULL;
-		while ((group = iterate_groups(next, &iter))) {
+		group = task_psi_group(next);
+		do {
 			if (per_cpu_ptr(group->pcpu, cpu)->state_mask &
 			    PSI_ONCPU) {
 				common = group;
@@ -853,7 +838,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 			}
 
 			psi_group_change(group, cpu, 0, TSK_ONCPU, now, true);
-		}
+		} while ((group = group->parent));
 	}
 
 	if (prev->pid) {
@@ -886,9 +871,12 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 
 		psi_flags_change(prev, clear, set);
 
-		iter = NULL;
-		while ((group = iterate_groups(prev, &iter)) && group != common)
+		group = task_psi_group(prev);
+		do {
+			if (group == common)
+				break;
 			psi_group_change(group, cpu, clear, set, now, wake_clock);
+		} while ((group = group->parent));
 
 		/*
 		 * TSK_ONCPU is handled up to the common ancestor. If there are
@@ -898,7 +886,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 */
 		if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) {
 			clear &= ~TSK_ONCPU;
-			for (; group; group = iterate_groups(prev, &iter))
+			for (; group; group = group->parent)
 				psi_group_change(group, cpu, clear, set, now, wake_clock);
 		}
 	}
@@ -908,7 +896,6 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 void psi_account_irqtime(struct task_struct *task, u32 delta)
 {
 	int cpu = task_cpu(task);
-	void *iter = NULL;
 	struct psi_group *group;
 	struct psi_group_cpu *groupc;
 	u64 now;
@@ -918,7 +905,8 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 	now = cpu_clock(cpu);
 
-	while ((group = iterate_groups(task, &iter))) {
+	group = task_psi_group(task);
+	do {
 		groupc = per_cpu_ptr(group->pcpu, cpu);
 
 		write_seqcount_begin(&groupc->seq);
@@ -930,7 +918,7 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 		if (group->poll_states & (1 << PSI_IRQ_FULL))
 			psi_schedule_poll_work(group, 1);
-	}
+	} while ((group = group->parent));
 }
 #endif
 
@@ -1010,6 +998,7 @@ int psi_cgroup_alloc(struct cgroup *cgroup)
 		return -ENOMEM;
 	}
 	group_init(cgroup->psi);
+	cgroup->psi->parent = cgroup_psi(cgroup_parent(cgroup));
 	return 0;
 }
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread
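
The structural win of this patch is replacing a stateful iterator with an
intrusive parent pointer that is set once at cgroup creation. A minimal
standalone sketch of the same walk pattern (hypothetical names, not
kernel code):

#include <stdio.h>

struct group {
	struct group *parent;	/* cached once, like psi_group->parent */
	const char *name;
};

static void group_change(struct group *g)
{
	printf("update %s\n", g->name);	/* stand-in for psi_group_change() */
}

int main(void)
{
	struct group root = { NULL, "root" };
	struct group a    = { &root, "A" };
	struct group b    = { &a, "A/B" };
	struct group *g   = &b;	/* task's own group, looked up once */

	do {	/* branch-free climb to the root */
		group_change(g);
	} while ((g = g->parent));
	return 0;
}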

* [PATCH v4 10/10] sched/psi: per-cgroup PSI accounting disable/re-enable interface
  2022-08-25 16:41 ` Chengming Zhou
                   ` (9 preceding siblings ...)
  (?)
@ 2022-08-25 16:41 ` Chengming Zhou
  2022-09-07  9:03   ` [PATCH] sched/psi: Per-cgroup " Chengming Zhou
  -1 siblings, 1 reply; 38+ messages in thread
From: Chengming Zhou @ 2022-08-25 16:41 UTC (permalink / raw)
  To: hannes, tj, mkoutny, surenb
  Cc: mingo, peterz, gregkh, corbet, cgroups, linux-doc, linux-kernel,
	songmuchun, Chengming Zhou

PSI accounts stalls for each cgroup separately and aggregates them
at each level of the hierarchy. This may cause non-negligible overhead
for some workloads when nested deep in the hierarchy.

commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable")
made PSI skip per-cgroup stall accounting and account only system-wide
to avoid this per-level overhead.

But for our use case, we also want leaf cgroup PSI stats accounted, so
that userspace can make adjustments on that cgroup and not only
system-wide ones.

So this patch introduces a per-cgroup PSI accounting disable/re-enable
interface "cgroup.pressure", a read-write single value file whose
allowed values are "0" and "1". The default is "1", so per-cgroup
PSI stats are enabled by default.

Implementation details:

It should be relatively straightforward to disable and re-enable
state aggregation, time tracking and averaging on a per-cgroup level,
if we can live with losing history from while it was disabled,
i.e. the avgs will restart from 0 and total= will have gaps.

But it's hard or complex to stop/restart groupc->tasks[] updates,
which is not implemented in this patch. So we always update
groupc->tasks[] and the PSI_ONCPU bit in psi_group_change() even when
the cgroup's PSI stats are disabled.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/admin-guide/cgroup-v2.rst | 17 ++++++
 include/linux/cgroup-defs.h             |  3 ++
 include/linux/psi.h                     |  2 +
 include/linux/psi_types.h               |  1 +
 kernel/cgroup/cgroup.c                  | 70 ++++++++++++++++++++++---
 kernel/sched/psi.c                      | 70 ++++++++++++++++++++++---
 6 files changed, 150 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 971c418bc778..4cad4e2b31ec 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -976,6 +976,23 @@ All cgroup core files are prefixed with "cgroup."
 	killing cgroups is a process directed operation, i.e. it affects
 	the whole thread-group.
 
+  cgroup.pressure
+	A read-write single value file that allowed values are "0" and "1".
+	The default is "1".
+
+	Writing "0" to the file will disable the cgroup PSI accounting.
+	Writing "1" to the file will re-enable the cgroup PSI accounting.
+
+	This control attribute is not hierarchical, so disable or enable PSI
+	accounting in a cgroup does not affect PSI accounting in descendants
+	and doesn't need pass enablement via ancestors from root.
+
+	The reason this control attribute exists is that PSI accounts stalls for
+	each cgroup separately and aggregates it at each level of the hierarchy.
+	This may cause non-negligible overhead for some workloads when under
+	deep level of the hierarchy, in which case this control attribute can
+	be used to disable PSI accounting in the non-leaf cgroups.
+
   irq.pressure
 	A read-write nested-keyed file.
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1283993d7ea8..cfdb74a89c5c 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -428,6 +428,9 @@ struct cgroup {
 	struct cgroup_file procs_file;	/* handle for "cgroup.procs" */
 	struct cgroup_file events_file;	/* handle for "cgroup.events" */
 
+	/* handles for "{cpu,memory,io,irq}.pressure" */
+	struct cgroup_file psi_files[NR_PSI_RESOURCES];
+
 	/*
 	 * The bitmask of subsystems enabled on the child cgroups.
 	 * ->subtree_control is the one configured through
diff --git a/include/linux/psi.h b/include/linux/psi.h
index 362a74ca1d3b..b029a847def1 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -39,6 +39,7 @@ static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
 int psi_cgroup_alloc(struct cgroup *cgrp);
 void psi_cgroup_free(struct cgroup *cgrp);
 void cgroup_move_task(struct task_struct *p, struct css_set *to);
+void psi_cgroup_restart(struct psi_group *group);
 #endif
 
 #else /* CONFIG_PSI */
@@ -60,6 +61,7 @@ static inline void cgroup_move_task(struct task_struct *p, struct css_set *to)
 {
 	rcu_assign_pointer(p->cgroups, to);
 }
+static inline void psi_cgroup_restart(struct psi_group *group) {}
 #endif
 
 #endif /* CONFIG_PSI */
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index a0b746258c68..ab1f9b463df9 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -152,6 +152,7 @@ struct psi_trigger {
 
 struct psi_group {
 	struct psi_group *parent;
+	bool enabled;
 
 	/* Protects data used by the aggregator */
 	struct mutex avgs_lock;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1d392c91ef95..454c9e64e079 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3676,8 +3676,8 @@ static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 	return psi_show(seq, psi, PSI_CPU);
 }
 
-static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
-					  size_t nbytes, enum psi_res res)
+static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
+			      size_t nbytes, enum psi_res res)
 {
 	struct cgroup_file_ctx *ctx = of->priv;
 	struct psi_trigger *new;
@@ -3714,21 +3714,21 @@ static ssize_t cgroup_io_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_IO);
+	return pressure_write(of, buf, nbytes, PSI_IO);
 }
 
 static ssize_t cgroup_memory_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_MEM);
+	return pressure_write(of, buf, nbytes, PSI_MEM);
 }
 
 static ssize_t cgroup_cpu_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_CPU);
+	return pressure_write(of, buf, nbytes, PSI_CPU);
 }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
@@ -3744,10 +3744,58 @@ static ssize_t cgroup_irq_pressure_write(struct kernfs_open_file *of,
 					 char *buf, size_t nbytes,
 					 loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_IRQ);
+	return pressure_write(of, buf, nbytes, PSI_IRQ);
 }
 #endif
 
+static int cgroup_pressure_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_psi(cgrp);
+
+	seq_printf(seq, "%d\n", psi->enabled);
+
+	return 0;
+}
+
+static ssize_t cgroup_pressure_write(struct kernfs_open_file *of,
+				     char *buf, size_t nbytes,
+				     loff_t off)
+{
+	ssize_t ret;
+	int enable;
+	struct cgroup *cgrp;
+	struct psi_group *psi;
+
+	ret = kstrtoint(strstrip(buf), 0, &enable);
+	if (ret)
+		return ret;
+
+	if (enable < 0 || enable > 1)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	psi = cgroup_psi(cgrp);
+	if (psi->enabled != enable) {
+		int i;
+
+		/* show or hide {cpu,memory,io,irq}.pressure files */
+		for (i = 0; i < NR_PSI_RESOURCES; i++)
+			cgroup_file_show(&cgrp->psi_files[i], enable);
+
+		psi->enabled = enable;
+		if (enable)
+			psi_cgroup_restart(psi);
+	}
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
 static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
 					  poll_table *pt)
 {
@@ -5146,6 +5194,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "io.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_IO]),
 		.seq_show = cgroup_io_pressure_show,
 		.write = cgroup_io_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5154,6 +5203,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "memory.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]),
 		.seq_show = cgroup_memory_pressure_show,
 		.write = cgroup_memory_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5162,6 +5212,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "cpu.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]),
 		.seq_show = cgroup_cpu_pressure_show,
 		.write = cgroup_cpu_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5171,12 +5222,19 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "irq.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]),
 		.seq_show = cgroup_irq_pressure_show,
 		.write = cgroup_irq_pressure_write,
 		.poll = cgroup_pressure_poll,
 		.release = cgroup_pressure_release,
 	},
 #endif
+	{
+		.name = "cgroup.pressure",
+		.flags = CFTYPE_PRESSURE,
+		.seq_show = cgroup_pressure_show,
+		.write = cgroup_pressure_write,
+	},
 #endif /* CONFIG_PSI */
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 9a8aee80a087..9711827e31e5 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -181,6 +181,7 @@ static void group_init(struct psi_group *group)
 {
 	int cpu;
 
+	group->enabled = true;
 	for_each_possible_cpu(cpu)
 		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
 	group->avg_last_update = sched_clock();
@@ -696,17 +697,16 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	groupc = per_cpu_ptr(group->pcpu, cpu);
 
 	/*
-	 * First we assess the aggregate resource states this CPU's
-	 * tasks have been in since the last change, and account any
-	 * SOME and FULL time these may have resulted in.
-	 *
-	 * Then we update the task counts according to the state
+	 * First we update the task counts according to the state
 	 * change requested through the @clear and @set bits.
+	 *
+	 * Then if the cgroup PSI stats accounting enabled, we
+	 * assess the aggregate resource states this CPU's tasks
+	 * have been in since the last change, and account any
+	 * SOME and FULL time these may have resulted in.
 	 */
 	write_seqcount_begin(&groupc->seq);
 
-	record_times(groupc, now);
-
 	/*
 	 * Start with TSK_ONCPU, which doesn't have a corresponding
 	 * task count - it's just a boolean flag directly encoded in
@@ -745,6 +745,23 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;
 
+	if (!group->enabled) {
+		/*
+		 * On the first group change after disabling PSI, conclude
+		 * the current state and flush its time. This is unlikely
+		 * to matter to the user, but aggregation (get_recent_times)
+		 * may have already incorporated the live state into times_prev;
+		 * avoid a delta sample underflow when PSI is later re-enabled.
+		 */
+		if (unlikely(groupc->state_mask & (1 << PSI_NONIDLE)))
+			record_times(groupc, now);
+
+		groupc->state_mask = state_mask;
+
+		write_seqcount_end(&groupc->seq);
+		return;
+	}
+
 	for (s = 0; s < NR_PSI_STATES; s++) {
 		if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
 			state_mask |= (1 << s);
@@ -761,6 +778,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	if (unlikely((state_mask & PSI_ONCPU) && cpu_curr(cpu)->in_memstall))
 		state_mask |= (1 << PSI_MEM_FULL);
 
+	record_times(groupc, now);
+
 	groupc->state_mask = state_mask;
 
 	write_seqcount_end(&groupc->seq);
@@ -907,6 +926,9 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 	group = task_psi_group(task);
 	do {
+		if (!group->enabled)
+			continue;
+
 		groupc = per_cpu_ptr(group->pcpu, cpu);
 
 		write_seqcount_begin(&groupc->seq);
@@ -1080,6 +1102,40 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
 
 	task_rq_unlock(rq, task, &rf);
 }
+
+void psi_cgroup_restart(struct psi_group *group)
+{
+	int cpu;
+
+	/*
+	 * After we disable psi_group->enabled, we don't actually
+	 * stop percpu tasks accounting in each psi_group_cpu,
+	 * instead only stop test_state() loop, record_times()
+	 * and averaging worker, see psi_group_change() for details.
+	 *
+	 * When disable cgroup PSI, this function has nothing to sync
+	 * since cgroup pressure files are hidden and percpu psi_group_cpu
+	 * would see !psi_group->enabled and only do task accounting.
+	 *
+	 * When re-enable cgroup PSI, this function use psi_group_change()
+	 * to get correct state mask from test_state() loop on tasks[],
+	 * and restart groupc->state_start from now, use .clear = .set = 0
+	 * here since no task status really changed.
+	 */
+	if (!group->enabled)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct rq *rq = cpu_rq(cpu);
+		struct rq_flags rf;
+		u64 now;
+
+		rq_lock_irq(rq, &rf);
+		now = cpu_clock(cpu);
+		psi_group_change(group, cpu, 0, 0, now, true);
+		rq_unlock_irq(rq, &rf);
+	}
+}
 #endif /* CONFIG_CGROUPS */
 
 int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread
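
For completeness, a hedged userspace sketch of toggling the new knob (the
cgroup2 mount point /sys/fs/cgroup and the group name "workload" are
assumptions for illustration, not from the patch):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical group whose per-cgroup accounting we switch off. */
	const char *path = "/sys/fs/cgroup/workload/cgroup.pressure";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/*
	 * "0" disables per-cgroup PSI accounting and hides the
	 * {cpu,memory,io,irq}.pressure files; writing "1" re-enables it
	 * via psi_cgroup_restart(), with avgs restarting from zero.
	 */
	if (write(fd, "0", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}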

* Re: [PATCH v4 05/10] sched/psi: optimize task switch inside shared cgroups again
@ 2022-08-25 17:16     ` Johannes Weiner
  0 siblings, 0 replies; 38+ messages in thread
From: Johannes Weiner @ 2022-08-25 17:16 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: tj, mkoutny, surenb, mingo, peterz, gregkh, corbet, cgroups,
	linux-doc, linux-kernel, songmuchun

On Fri, Aug 26, 2022 at 12:41:06AM +0800, Chengming Zhou wrote:
> Way back when PSI_MEM_FULL was accounted from the timer tick, task
> switching could simply iterate next and prev to the common ancestor to
> update TSK_ONCPU and be done.
> 
> Then memstall ticks were replaced with checking curr->in_memstall
> directly in psi_group_change(). That meant that now if the task switch
> was between a memstall and a !memstall task, we had to iterate through
> the common ancestors at least ONCE to fix up their state_masks.
> 
> We added the identical_state filter to make sure the common ancestor
> elimination was skipped in that case. It seems that was always a
> little too eager, because it caused us to walk the common ancestors
> *twice* instead of the required once: the iteration for next could
> have stopped at the common ancestor; prev could have updated TSK_ONCPU
> up to the common ancestor, then finish to the root without changing
> any flags, just to get the new curr->in_memstall into the state_masks.
> 
> This patch recognizes this and makes it so that we walk to the root
> exactly once if state_mask needs updating, which is simply catching up
> on a missed optimization that could have been done in commit 7fae6c8171d2
> ("psi: Use ONCPU state tracking machinery to detect reclaim") directly.
> 
> Apart from this, it's also necessary for the next patch "sched/psi: remove
> NR_ONCPU task accounting". Suppose we walk the common ancestors twice:
> 
> (1) psi_group_change(.clear = 0, .set = TSK_ONCPU)
> (2) psi_group_change(.clear = TSK_ONCPU, .set = 0)
> 
> We previously used tasks[NR_ONCPU] to record TSK_ONCPU, tasks[NR_ONCPU]++
> in (1) then tasks[NR_ONCPU]-- in (2), so tasks[NR_ONCPU] still be correct.
> 
> The next patch change to use one bit in state mask to record TSK_ONCPU,
> PSI_ONCPU bit will be set in (1), but then be cleared in (2), which cause
> the psi_group_cpu has task running on CPU but without PSI_ONCPU bit set!
> 
> With this patch, we will never walk the common ancestors twice, so won't
> have above problem.
> 
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 38+ messages in thread
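
To make the single-walk argument concrete, take this hypothetical
hierarchy (an illustration, not from the patch):

            root
             |
             A        <- common ancestor of prev and next
            / \
          B1   B2     prev runs in B1, next in B2

The walk for next sets TSK_ONCPU on B2 and stops at A, which still has
prev's TSK_ONCPU set. The walk for prev then clears TSK_ONCPU up to, but
not through, A; only when prev->in_memstall differs from next's does it
continue once from A to the root to refresh the state_masks, instead of
walking A..root for both tasks.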

* Re: [PATCH v4 07/10] sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure
@ 2022-08-25 17:17     ` Johannes Weiner
  0 siblings, 0 replies; 38+ messages in thread
From: Johannes Weiner @ 2022-08-25 17:17 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: tj, mkoutny, surenb, mingo, peterz, gregkh, corbet, cgroups,
	linux-doc, linux-kernel, songmuchun

On Fri, Aug 26, 2022 at 12:41:08AM +0800, Chengming Zhou wrote:
> Now PSI already tracked workload pressure stall information for
> CPU, memory and IO. Apart from these, IRQ/SOFTIRQ could have
> obvious impact on some workload productivity, such as web service
> workload.
> 
> When CONFIG_IRQ_TIME_ACCOUNTING, we can get IRQ/SOFTIRQ delta time
> from update_rq_clock_task(), in which we can record that delta
> to CPU curr task's cgroups as PSI_IRQ_FULL status.
> 
> Note we don't use PSI_IRQ_SOME since IRQ/SOFTIRQ always happen in
> the current task on the CPU, make nothing productive could run
> even if it were runnable, so we only use PSI_IRQ_FULL.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 00/10] sched/psi: some optimizations and extensions
@ 2022-09-06 13:13   ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-09-06 13:13 UTC (permalink / raw)
  To: hannes, tj
  Cc: surenb, mkoutny, mingo, peterz, gregkh, corbet, cgroups,
	linux-doc, linux-kernel, songmuchun

Hello,

Could this series be merged into linux-next?

Thanks.


On 2022/8/26 00:41, Chengming Zhou wrote:
> Hi all,
> 
> This patch series are some optimizations and extensions for PSI.
> 
> patch 1/10 fix periodic aggregation shut off problem introduced by earlier
> commit 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups").
> 
> patch 2-4 are some misc optimizations, so put them in front of this series.
> 
> patch 5/10 optimize task switch inside shared cgroups when in_memstall status
> of prev task and next task are different.
> 
> patch 6/10 remove NR_ONCPU task accounting to save 4 bytes in the first
> cacheline to be used by the following patch 7/10, which introduce new
> PSI resource PSI_IRQ to track IRQ/SOFTIRQ pressure stall information.
> 
> patch 8-9 cache parent psi_group in struct psi_group to speed up the
> hot iteration path.
> 
> patch 10/10 introduce a per-cgroup interface "cgroup.pressure" to disable
> or re-enable PSI in the cgroup level, and we implement hiding and unhiding
> the pressure files per Tejun's suggestion[1], which depends on his work[2].
> 
> [1] https://lore.kernel.org/all/YvqjhqJQi2J8RG3X@slm.duckdns.org/
> [2] https://lore.kernel.org/all/20220820000550.367085-1-tj@kernel.org/
> 
> Performance test using mmtests/config-scheduler-perfpipe in
> /user.slice/user-0.slice/session-4.scope:
> 
>                                  next                patched       patched/only-leaf
> Min       Time        8.82 (   0.00%)        8.49 (   3.74%)        8.00 (   9.32%)
> 1st-qrtle Time        8.90 (   0.00%)        8.58 (   3.63%)        8.05 (   9.58%)
> 2nd-qrtle Time        8.94 (   0.00%)        8.61 (   3.65%)        8.09 (   9.50%)
> 3rd-qrtle Time        8.99 (   0.00%)        8.65 (   3.75%)        8.15 (   9.35%)
> Max-1     Time        8.82 (   0.00%)        8.49 (   3.74%)        8.00 (   9.32%)
> Max-5     Time        8.82 (   0.00%)        8.49 (   3.74%)        8.00 (   9.32%)
> Max-10    Time        8.84 (   0.00%)        8.55 (   3.20%)        8.04 (   9.05%)
> Max-90    Time        9.04 (   0.00%)        8.67 (   4.10%)        8.18 (   9.51%)
> Max-95    Time        9.04 (   0.00%)        8.68 (   4.03%)        8.20 (   9.26%)
> Max-99    Time        9.07 (   0.00%)        8.73 (   3.82%)        8.25 (   9.11%)
> Max       Time        9.12 (   0.00%)        8.89 (   2.54%)        8.27 (   9.29%)
> Amean     Time        8.95 (   0.00%)        8.62 *   3.67%*        8.11 *   9.43%*
> 
> Big thanks to Johannes Weiner, Tejun Heo and Michal Koutný for your
> suggestions and review!
> 
> 
> Changes in v4:
>  - Collect Acked-by tags from Johannes Weiner.
>  - Add many clear comments and changelogs per Johannes Weiner.
>  - Replace for_each_psi_group() with better open-code.
>  - Change to use better names cgroup_pressure_show() and
>    cgroup_pressure_write().
>  - Change to use better name psi_cgroup_restart() and only
>    call it on enabling.
> 
> Changes in v3:
>  - Rebase on linux-next and reorder patches to put misc optimizations
>    patches in the front of this series.
>  - Drop patch "sched/psi: don't change task psi_flags when migrate CPU/group"
>    since it caused a little performance regression and it's just
>    code refactoring, so drop it.
>  - Don't define PSI_IRQ and PSI_IRQ_FULL when !CONFIG_IRQ_TIME_ACCOUNTING,
>    in which case they are not used.
>  - Add patch 8/10 "sched/psi: consolidate cgroup_psi()" make cgroup_psi()
>    can handle all cgroups including root cgroup, make patch 9/10 simpler.
>  - Rename interface to "cgroup.pressure" and add some explanation
>    per Michal's suggestion.
>  - Hide and unhide pressure files when disable/re-enable cgroup PSI,
>    depends on Tejun's work.
> 
> Changes in v2:
>  - Add Acked-by tags from Johannes Weiner. Thanks for review!
>  - Fix periodic aggregation wakeup for common ancestors in
>    psi_task_switch().
>  - Add patch 7/10 from Johannes Weiner, which remove NR_ONCPU
>    task accounting to save 4 bytes in the first cacheline.
>  - Remove "psi_irq=" kernel cmdline parameter in last version.
>  - Add per-cgroup interface "cgroup.psi" to disable/re-enable
>    PSI stats accounting in the cgroup level.
> 
> 
> Chengming Zhou (9):
>   sched/psi: fix periodic aggregation shut off
>   sched/psi: don't create cgroup PSI files when psi_disabled
>   sched/psi: save percpu memory when !psi_cgroups_enabled
>   sched/psi: move private helpers to sched/stats.h
>   sched/psi: optimize task switch inside shared cgroups again
>   sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure
>   sched/psi: consolidate cgroup_psi()
>   sched/psi: cache parent psi_group to speed up groups iterate
>   sched/psi: per-cgroup PSI accounting disable/re-enable interface
> 
> Johannes Weiner (1):
>   sched/psi: remove NR_ONCPU task accounting
> 
>  Documentation/admin-guide/cgroup-v2.rst |  23 ++
>  include/linux/cgroup-defs.h             |   3 +
>  include/linux/cgroup.h                  |   5 -
>  include/linux/psi.h                     |  12 +-
>  include/linux/psi_types.h               |  29 ++-
>  kernel/cgroup/cgroup.c                  | 106 ++++++++-
>  kernel/sched/core.c                     |   1 +
>  kernel/sched/psi.c                      | 280 +++++++++++++++++-------
>  kernel/sched/stats.h                    |   6 +
>  9 files changed, 362 insertions(+), 103 deletions(-)
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 00/10] sched/psi: some optimizations and extensions
@ 2022-09-06 14:43     ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2022-09-06 14:43 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: hannes, tj, surenb, mkoutny, mingo, gregkh, corbet, cgroups,
	linux-doc, linux-kernel, songmuchun

On Tue, Sep 06, 2022 at 09:13:27PM +0800, Chengming Zhou wrote:

Ah, I see Johannes has acked them all, I missed that.

> > Chengming Zhou (9):
> >   sched/psi: fix periodic aggregation shut off
> >   sched/psi: don't create cgroup PSI files when psi_disabled
> >   sched/psi: save percpu memory when !psi_cgroups_enabled
> >   sched/psi: move private helpers to sched/stats.h
> >   sched/psi: optimize task switch inside shared cgroups again
> >   sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure
> >   sched/psi: consolidate cgroup_psi()
> >   sched/psi: cache parent psi_group to speed up groups iterate
> >   sched/psi: per-cgroup PSI accounting disable/re-enable interface
> > 
> > Johannes Weiner (1):
> >   sched/psi: remove NR_ONCPU task accounting

For future reference:

  https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

Note all patches violate 1.2.2 for not starting the patch description
with an uppercase letter. I'll manually fix them up this time.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v4 00/10] sched/psi: some optimizations and extensions
@ 2022-09-07  1:55       ` Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: Chengming Zhou @ 2022-09-07  1:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: hannes, tj, surenb, mkoutny, mingo, gregkh, corbet, cgroups,
	linux-doc, linux-kernel, songmuchun

On 2022/9/6 22:43, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 09:13:27PM +0800, Chengming Zhou wrote:
> 
> Ah, I see Johannes has acked them all, I missed that.
> 
>>> Chengming Zhou (9):
>>>   sched/psi: fix periodic aggregation shut off
>>>   sched/psi: don't create cgroup PSI files when psi_disabled
>>>   sched/psi: save percpu memory when !psi_cgroups_enabled
>>>   sched/psi: move private helpers to sched/stats.h
>>>   sched/psi: optimize task switch inside shared cgroups again
>>>   sched/psi: add PSI_IRQ to track IRQ/SOFTIRQ pressure
>>>   sched/psi: consolidate cgroup_psi()
>>>   sched/psi: cache parent psi_group to speed up groups iterate
>>>   sched/psi: per-cgroup PSI accounting disable/re-enable interface
>>>
>>> Johannes Weiner (1):
>>>   sched/psi: remove NR_ONCPU task accounting
> 
> For future reference:
> 
>   https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
> 
> Note all patches violate 1.2.2 for not starting the patch description
> with a uppercase letter. I'll go manually fix up this time.

Sorry about that, thanks for the reference and your manual fix up!


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] sched/psi: Per-cgroup PSI accounting disable/re-enable interface
  2022-08-25 16:41 ` [PATCH v4 10/10] sched/psi: per-cgroup PSI accounting disable/re-enable interface Chengming Zhou
@ 2022-09-07  9:03   ` Chengming Zhou
  2022-09-09 14:00     ` [tip: sched/psi] " tip-bot2 for Chengming Zhou
  0 siblings, 1 reply; 38+ messages in thread
From: Chengming Zhou @ 2022-09-07  9:03 UTC (permalink / raw)
  To: peterz
  Cc: hannes, tj, surenb, mkoutny, mingo, gregkh, cgroups, linux-doc,
	linux-kernel, Chengming Zhou

PSI accounts stalls for each cgroup separately and aggregates them
at each level of the hierarchy. This may cause non-negligible overhead
for some workloads in deep hierarchies.

Commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable")
allows PSI to skip per-cgroup stall accounting and only account
system-wide, to avoid this per-level overhead.

But for our use case, we also want leaf cgroup PSI stats accounted, so
userspace can make adjustments on that cgroup, apart from system-wide
adjustments.

So this patch introduces a per-cgroup PSI accounting disable/re-enable
interface, "cgroup.pressure": a read-write single value file whose
allowed values are "0" and "1". The default is "1", so per-cgroup
PSI stats are enabled by default.

Implementation details:

It is relatively straightforward to disable and re-enable state
aggregation, time tracking and averaging on a per-cgroup level, if we
can live with losing history from while it was disabled, i.e. the avgs
will restart from 0 and total= will have gaps.

But stopping/restarting groupc->tasks[] updates is hard and complex,
and is not implemented in this patch. So we always update
groupc->tasks[] and the PSI_ONCPU bit in psi_group_change(), even when
the cgroup's PSI stats are disabled.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
Updated version to fix build error when !CONFIG_PSI.
---
 Documentation/admin-guide/cgroup-v2.rst | 17 ++++++
 include/linux/cgroup-defs.h             |  3 ++
 include/linux/psi.h                     |  2 +
 include/linux/psi_types.h               |  3 ++
 kernel/cgroup/cgroup.c                  | 70 ++++++++++++++++++++++---
 kernel/sched/psi.c                      | 70 ++++++++++++++++++++++---
 6 files changed, 152 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 971c418bc778..4cad4e2b31ec 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -976,6 +976,23 @@ All cgroup core files are prefixed with "cgroup."
 	killing cgroups is a process directed operation, i.e. it affects
 	the whole thread-group.
 
+  cgroup.pressure
+	A read-write single value file whose allowed values are "0" and "1".
+	The default is "1".
+
+	Writing "0" to the file will disable the cgroup PSI accounting.
+	Writing "1" to the file will re-enable the cgroup PSI accounting.
+
+	This control attribute is not hierarchical, so disabling or enabling
+	PSI accounting in a cgroup does not affect PSI accounting in its
+	descendants, and enablement does not need to be passed down from root.
+
+	The reason this control attribute exists is that PSI accounts stalls
+	for each cgroup separately and aggregates them at each level of the
+	hierarchy. This may cause non-negligible overhead for some workloads
+	in deep hierarchies, in which case this control attribute can be
+	used to disable PSI accounting in the non-leaf cgroups.
+
   irq.pressure
 	A read-write nested-keyed file.
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 4bcf56b3491c..7df76b318245 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -428,6 +428,9 @@ struct cgroup {
 	struct cgroup_file procs_file;	/* handle for "cgroup.procs" */
 	struct cgroup_file events_file;	/* handle for "cgroup.events" */
 
+	/* handles for "{cpu,memory,io,irq}.pressure" */
+	struct cgroup_file psi_files[NR_PSI_RESOURCES];
+
 	/*
 	 * The bitmask of subsystems enabled on the child cgroups.
 	 * ->subtree_control is the one configured through
diff --git a/include/linux/psi.h b/include/linux/psi.h
index 362a74ca1d3b..b029a847def1 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -39,6 +39,7 @@ static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
 int psi_cgroup_alloc(struct cgroup *cgrp);
 void psi_cgroup_free(struct cgroup *cgrp);
 void cgroup_move_task(struct task_struct *p, struct css_set *to);
+void psi_cgroup_restart(struct psi_group *group);
 #endif
 
 #else /* CONFIG_PSI */
@@ -60,6 +61,7 @@ static inline void cgroup_move_task(struct task_struct *p, struct css_set *to)
 {
 	rcu_assign_pointer(p->cgroups, to);
 }
+static inline void psi_cgroup_restart(struct psi_group *group) {}
 #endif
 
 #endif /* CONFIG_PSI */
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index a0b746258c68..6e4372735068 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -152,6 +152,7 @@ struct psi_trigger {
 
 struct psi_group {
 	struct psi_group *parent;
+	bool enabled;
 
 	/* Protects data used by the aggregator */
 	struct mutex avgs_lock;
@@ -194,6 +195,8 @@ struct psi_group {
 
 #else /* CONFIG_PSI */
 
+#define NR_PSI_RESOURCES	0
+
 struct psi_group { };
 
 #endif /* CONFIG_PSI */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 4f72a71820db..0bca8f29361d 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3708,8 +3708,8 @@ static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 	return psi_show(seq, psi, PSI_CPU);
 }
 
-static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
-					  size_t nbytes, enum psi_res res)
+static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
+			      size_t nbytes, enum psi_res res)
 {
 	struct cgroup_file_ctx *ctx = of->priv;
 	struct psi_trigger *new;
@@ -3746,21 +3746,21 @@ static ssize_t cgroup_io_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_IO);
+	return pressure_write(of, buf, nbytes, PSI_IO);
 }
 
 static ssize_t cgroup_memory_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_MEM);
+	return pressure_write(of, buf, nbytes, PSI_MEM);
 }
 
 static ssize_t cgroup_cpu_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_CPU);
+	return pressure_write(of, buf, nbytes, PSI_CPU);
 }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
@@ -3776,10 +3776,58 @@ static ssize_t cgroup_irq_pressure_write(struct kernfs_open_file *of,
 					 char *buf, size_t nbytes,
 					 loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_IRQ);
+	return pressure_write(of, buf, nbytes, PSI_IRQ);
 }
 #endif
 
+static int cgroup_pressure_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_psi(cgrp);
+
+	seq_printf(seq, "%d\n", psi->enabled);
+
+	return 0;
+}
+
+static ssize_t cgroup_pressure_write(struct kernfs_open_file *of,
+				     char *buf, size_t nbytes,
+				     loff_t off)
+{
+	ssize_t ret;
+	int enable;
+	struct cgroup *cgrp;
+	struct psi_group *psi;
+
+	ret = kstrtoint(strstrip(buf), 0, &enable);
+	if (ret)
+		return ret;
+
+	if (enable < 0 || enable > 1)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	psi = cgroup_psi(cgrp);
+	if (psi->enabled != enable) {
+		int i;
+
+		/* show or hide {cpu,memory,io,irq}.pressure files */
+		for (i = 0; i < NR_PSI_RESOURCES; i++)
+			cgroup_file_show(&cgrp->psi_files[i], enable);
+
+		psi->enabled = enable;
+		if (enable)
+			psi_cgroup_restart(psi);
+	}
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
 static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
 					  poll_table *pt)
 {
@@ -5155,6 +5203,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "io.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_IO]),
 		.seq_show = cgroup_io_pressure_show,
 		.write = cgroup_io_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5163,6 +5212,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "memory.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]),
 		.seq_show = cgroup_memory_pressure_show,
 		.write = cgroup_memory_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5171,6 +5221,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "cpu.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]),
 		.seq_show = cgroup_cpu_pressure_show,
 		.write = cgroup_cpu_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5180,12 +5231,19 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "irq.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]),
 		.seq_show = cgroup_irq_pressure_show,
 		.write = cgroup_irq_pressure_write,
 		.poll = cgroup_pressure_poll,
 		.release = cgroup_pressure_release,
 	},
 #endif
+	{
+		.name = "cgroup.pressure",
+		.flags = CFTYPE_PRESSURE,
+		.seq_show = cgroup_pressure_show,
+		.write = cgroup_pressure_write,
+	},
 #endif /* CONFIG_PSI */
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 9a8aee80a087..9711827e31e5 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -181,6 +181,7 @@ static void group_init(struct psi_group *group)
 {
 	int cpu;
 
+	group->enabled = true;
 	for_each_possible_cpu(cpu)
 		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
 	group->avg_last_update = sched_clock();
@@ -696,17 +697,16 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	groupc = per_cpu_ptr(group->pcpu, cpu);
 
 	/*
-	 * First we assess the aggregate resource states this CPU's
-	 * tasks have been in since the last change, and account any
-	 * SOME and FULL time these may have resulted in.
-	 *
-	 * Then we update the task counts according to the state
+	 * First we update the task counts according to the state
 	 * change requested through the @clear and @set bits.
+	 *
+	 * Then, if cgroup PSI stats accounting is enabled, we
+	 * assess the aggregate resource states this CPU's tasks
+	 * have been in since the last change, and account any
+	 * SOME and FULL time these may have resulted in.
 	 */
 	write_seqcount_begin(&groupc->seq);
 
-	record_times(groupc, now);
-
 	/*
 	 * Start with TSK_ONCPU, which doesn't have a corresponding
 	 * task count - it's just a boolean flag directly encoded in
@@ -745,6 +745,23 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;
 
+	if (!group->enabled) {
+		/*
+		 * On the first group change after disabling PSI, conclude
+		 * the current state and flush its time. This is unlikely
+		 * to matter to the user, but aggregation (get_recent_times)
+		 * may have already incorporated the live state into times_prev;
+		 * avoid a delta sample underflow when PSI is later re-enabled.
+		 */
+		if (unlikely(groupc->state_mask & (1 << PSI_NONIDLE)))
+			record_times(groupc, now);
+
+		groupc->state_mask = state_mask;
+
+		write_seqcount_end(&groupc->seq);
+		return;
+	}
+
 	for (s = 0; s < NR_PSI_STATES; s++) {
 		if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
 			state_mask |= (1 << s);
@@ -761,6 +778,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	if (unlikely((state_mask & PSI_ONCPU) && cpu_curr(cpu)->in_memstall))
 		state_mask |= (1 << PSI_MEM_FULL);
 
+	record_times(groupc, now);
+
 	groupc->state_mask = state_mask;
 
 	write_seqcount_end(&groupc->seq);
@@ -907,6 +926,9 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 	group = task_psi_group(task);
 	do {
+		if (!group->enabled)
+			continue;
+
 		groupc = per_cpu_ptr(group->pcpu, cpu);
 
 		write_seqcount_begin(&groupc->seq);
@@ -1080,6 +1102,40 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
 
 	task_rq_unlock(rq, task, &rf);
 }
+
+void psi_cgroup_restart(struct psi_group *group)
+{
+	int cpu;
+
+	/*
+	 * After psi_group->enabled is cleared, we don't actually stop
+	 * percpu task accounting in each psi_group_cpu; we only stop
+	 * the test_state() loop, record_times() and the averaging
+	 * worker, see psi_group_change() for details.
+	 *
+	 * When disabling cgroup PSI, this function has nothing to sync
+	 * since the cgroup pressure files are hidden and each percpu
+	 * psi_group_cpu sees !psi_group->enabled and only does task accounting.
+	 *
+	 * When re-enabling cgroup PSI, this function uses psi_group_change()
+	 * to get the correct state mask from the test_state() loop on tasks[],
+	 * and restarts groupc->state_start from now; pass .clear = .set = 0
+	 * since no task state really changed.
+	 */
+	if (!group->enabled)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct rq *rq = cpu_rq(cpu);
+		struct rq_flags rf;
+		u64 now;
+
+		rq_lock_irq(rq, &rf);
+		now = cpu_clock(cpu);
+		psi_group_change(group, cpu, 0, 0, now, true);
+		rq_unlock_irq(rq, &rf);
+	}
+}
 #endif /* CONFIG_CGROUPS */
 
 int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread
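
As a usage illustration for the interface above (editor-written, not
part of the patch; the cgroup path "/sys/fs/cgroup/mygroup" and the
helper name set_cgroup_psi() are hypothetical), a minimal C sketch
that toggles cgroup.pressure:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Toggle per-cgroup PSI accounting via "cgroup.pressure".
	 * Writing "0" disables accounting and hides the cgroup's
	 * {cpu,memory,io,irq}.pressure files; "1" re-enables both.
	 */
	static int set_cgroup_psi(const char *cgrp, int enable)
	{
		char path[256];
		int fd, ret;

		snprintf(path, sizeof(path), "%s/cgroup.pressure", cgrp);
		fd = open(path, O_WRONLY);
		if (fd < 0)
			return -1;
		ret = (write(fd, enable ? "1" : "0", 1) == 1) ? 0 : -1;
		close(fd);
		return ret;
	}

	int main(void)
	{
		/* Disable PSI accounting in a hypothetical non-leaf cgroup
		 * to avoid the per-level aggregation overhead.
		 */
		return set_cgroup_psi("/sys/fs/cgroup/mygroup", 0) ? 1 : 0;
	}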

* [tip: sched/psi] sched/psi: Per-cgroup PSI accounting disable/re-enable interface
  2022-09-07  9:03   ` [PATCH] sched/psi: Per-cgroup " Chengming Zhou
@ 2022-09-09 14:00     ` tip-bot2 for Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Johannes Weiner, Tejun Heo, Chengming Zhou,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     34f26a15611afb03c33df6819359d36f5b382589
Gitweb:        https://git.kernel.org/tip/34f26a15611afb03c33df6819359d36f5b382589
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Wed, 07 Sep 2022 17:03:32 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:33 +02:00

sched/psi: Per-cgroup PSI accounting disable/re-enable interface

PSI accounts stalls for each cgroup separately and aggregates them
at each level of the hierarchy. This may cause non-negligible overhead
for some workloads in deep hierarchies.

Commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable")
allows PSI to skip per-cgroup stall accounting and only account
system-wide, to avoid this per-level overhead.

But for our use case, we also want leaf cgroup PSI stats accounted, so
userspace can make adjustments on that cgroup, apart from system-wide
adjustments.

So this patch introduces a per-cgroup PSI accounting disable/re-enable
interface, "cgroup.pressure": a read-write single value file whose
allowed values are "0" and "1". The default is "1", so per-cgroup
PSI stats are enabled by default.

Implementation details:

It is relatively straightforward to disable and re-enable state
aggregation, time tracking and averaging on a per-cgroup level, if we
can live with losing history from while it was disabled, i.e. the avgs
will restart from 0 and total= will have gaps.

But stopping/restarting groupc->tasks[] updates is hard and complex,
and is not implemented in this patch. So we always update
groupc->tasks[] and the PSI_ONCPU bit in psi_group_change(), even when
the cgroup's PSI stats are disabled.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20220907090332.2078-1-zhouchengming@bytedance.com
---
 Documentation/admin-guide/cgroup-v2.rst | 17 ++++++-
 include/linux/cgroup-defs.h             |  3 +-
 include/linux/psi.h                     |  2 +-
 include/linux/psi_types.h               |  3 +-
 kernel/cgroup/cgroup.c                  | 70 +++++++++++++++++++++---
 kernel/sched/psi.c                      | 70 +++++++++++++++++++++---
 6 files changed, 152 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 971c418..4cad4e2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -976,6 +976,23 @@ All cgroup core files are prefixed with "cgroup."
 	killing cgroups is a process directed operation, i.e. it affects
 	the whole thread-group.
 
+  cgroup.pressure
+	A read-write single value file whose allowed values are "0" and "1".
+	The default is "1".
+
+	Writing "0" to the file will disable the cgroup PSI accounting.
+	Writing "1" to the file will re-enable the cgroup PSI accounting.
+
+	This control attribute is not hierarchical, so disabling or enabling
+	PSI accounting in a cgroup does not affect PSI accounting in its
+	descendants, and enablement does not need to be passed down from root.
+
+	The reason this control attribute exists is that PSI accounts stalls
+	for each cgroup separately and aggregates them at each level of the
+	hierarchy. This may cause non-negligible overhead for some workloads
+	in deep hierarchies, in which case this control attribute can be
+	used to disable PSI accounting in the non-leaf cgroups.
+
   irq.pressure
 	A read-write nested-keyed file.
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 4bcf56b..7df76b3 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -428,6 +428,9 @@ struct cgroup {
 	struct cgroup_file procs_file;	/* handle for "cgroup.procs" */
 	struct cgroup_file events_file;	/* handle for "cgroup.events" */
 
+	/* handles for "{cpu,memory,io,irq}.pressure" */
+	struct cgroup_file psi_files[NR_PSI_RESOURCES];
+
 	/*
 	 * The bitmask of subsystems enabled on the child cgroups.
 	 * ->subtree_control is the one configured through
diff --git a/include/linux/psi.h b/include/linux/psi.h
index 362a74c..b029a84 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -39,6 +39,7 @@ static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
 int psi_cgroup_alloc(struct cgroup *cgrp);
 void psi_cgroup_free(struct cgroup *cgrp);
 void cgroup_move_task(struct task_struct *p, struct css_set *to);
+void psi_cgroup_restart(struct psi_group *group);
 #endif
 
 #else /* CONFIG_PSI */
@@ -60,6 +61,7 @@ static inline void cgroup_move_task(struct task_struct *p, struct css_set *to)
 {
 	rcu_assign_pointer(p->cgroups, to);
 }
+static inline void psi_cgroup_restart(struct psi_group *group) {}
 #endif
 
 #endif /* CONFIG_PSI */
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index a0b7462..6e43727 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -152,6 +152,7 @@ struct psi_trigger {
 
 struct psi_group {
 	struct psi_group *parent;
+	bool enabled;
 
 	/* Protects data used by the aggregator */
 	struct mutex avgs_lock;
@@ -194,6 +195,8 @@ struct psi_group {
 
 #else /* CONFIG_PSI */
 
+#define NR_PSI_RESOURCES	0
+
 struct psi_group { };
 
 #endif /* CONFIG_PSI */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 772b35d..fa1cf83 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3708,8 +3708,8 @@ static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 	return psi_show(seq, psi, PSI_CPU);
 }
 
-static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
-					  size_t nbytes, enum psi_res res)
+static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
+			      size_t nbytes, enum psi_res res)
 {
 	struct cgroup_file_ctx *ctx = of->priv;
 	struct psi_trigger *new;
@@ -3746,21 +3746,21 @@ static ssize_t cgroup_io_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_IO);
+	return pressure_write(of, buf, nbytes, PSI_IO);
 }
 
 static ssize_t cgroup_memory_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_MEM);
+	return pressure_write(of, buf, nbytes, PSI_MEM);
 }
 
 static ssize_t cgroup_cpu_pressure_write(struct kernfs_open_file *of,
 					  char *buf, size_t nbytes,
 					  loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_CPU);
+	return pressure_write(of, buf, nbytes, PSI_CPU);
 }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
@@ -3776,10 +3776,58 @@ static ssize_t cgroup_irq_pressure_write(struct kernfs_open_file *of,
 					 char *buf, size_t nbytes,
 					 loff_t off)
 {
-	return cgroup_pressure_write(of, buf, nbytes, PSI_IRQ);
+	return pressure_write(of, buf, nbytes, PSI_IRQ);
 }
 #endif
 
+static int cgroup_pressure_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_psi(cgrp);
+
+	seq_printf(seq, "%d\n", psi->enabled);
+
+	return 0;
+}
+
+static ssize_t cgroup_pressure_write(struct kernfs_open_file *of,
+				     char *buf, size_t nbytes,
+				     loff_t off)
+{
+	ssize_t ret;
+	int enable;
+	struct cgroup *cgrp;
+	struct psi_group *psi;
+
+	ret = kstrtoint(strstrip(buf), 0, &enable);
+	if (ret)
+		return ret;
+
+	if (enable < 0 || enable > 1)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	psi = cgroup_psi(cgrp);
+	if (psi->enabled != enable) {
+		int i;
+
+		/* show or hide {cpu,memory,io,irq}.pressure files */
+		for (i = 0; i < NR_PSI_RESOURCES; i++)
+			cgroup_file_show(&cgrp->psi_files[i], enable);
+
+		psi->enabled = enable;
+		if (enable)
+			psi_cgroup_restart(psi);
+	}
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
 static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
 					  poll_table *pt)
 {
@@ -5175,6 +5223,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "io.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_IO]),
 		.seq_show = cgroup_io_pressure_show,
 		.write = cgroup_io_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5183,6 +5232,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "memory.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]),
 		.seq_show = cgroup_memory_pressure_show,
 		.write = cgroup_memory_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5191,6 +5241,7 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "cpu.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]),
 		.seq_show = cgroup_cpu_pressure_show,
 		.write = cgroup_cpu_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5200,12 +5251,19 @@ static struct cftype cgroup_base_files[] = {
 	{
 		.name = "irq.pressure",
 		.flags = CFTYPE_PRESSURE,
+		.file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]),
 		.seq_show = cgroup_irq_pressure_show,
 		.write = cgroup_irq_pressure_write,
 		.poll = cgroup_pressure_poll,
 		.release = cgroup_pressure_release,
 	},
 #endif
+	{
+		.name = "cgroup.pressure",
+		.flags = CFTYPE_PRESSURE,
+		.seq_show = cgroup_pressure_show,
+		.write = cgroup_pressure_write,
+	},
 #endif /* CONFIG_PSI */
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 9a8aee8..9711827 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -181,6 +181,7 @@ static void group_init(struct psi_group *group)
 {
 	int cpu;
 
+	group->enabled = true;
 	for_each_possible_cpu(cpu)
 		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
 	group->avg_last_update = sched_clock();
@@ -696,17 +697,16 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	groupc = per_cpu_ptr(group->pcpu, cpu);
 
 	/*
-	 * First we assess the aggregate resource states this CPU's
-	 * tasks have been in since the last change, and account any
-	 * SOME and FULL time these may have resulted in.
-	 *
-	 * Then we update the task counts according to the state
+	 * First we update the task counts according to the state
 	 * change requested through the @clear and @set bits.
+	 *
+	 * Then, if cgroup PSI stats accounting is enabled, we
+	 * assess the aggregate resource states this CPU's tasks
+	 * have been in since the last change, and account any
+	 * SOME and FULL time these may have resulted in.
 	 */
 	write_seqcount_begin(&groupc->seq);
 
-	record_times(groupc, now);
-
 	/*
 	 * Start with TSK_ONCPU, which doesn't have a corresponding
 	 * task count - it's just a boolean flag directly encoded in
@@ -745,6 +745,23 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;
 
+	if (!group->enabled) {
+		/*
+		 * On the first group change after disabling PSI, conclude
+		 * the current state and flush its time. This is unlikely
+		 * to matter to the user, but aggregation (get_recent_times)
+		 * may have already incorporated the live state into times_prev;
+		 * avoid a delta sample underflow when PSI is later re-enabled.
+		 */
+		if (unlikely(groupc->state_mask & (1 << PSI_NONIDLE)))
+			record_times(groupc, now);
+
+		groupc->state_mask = state_mask;
+
+		write_seqcount_end(&groupc->seq);
+		return;
+	}
+
 	for (s = 0; s < NR_PSI_STATES; s++) {
 		if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
 			state_mask |= (1 << s);
@@ -761,6 +778,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	if (unlikely((state_mask & PSI_ONCPU) && cpu_curr(cpu)->in_memstall))
 		state_mask |= (1 << PSI_MEM_FULL);
 
+	record_times(groupc, now);
+
 	groupc->state_mask = state_mask;
 
 	write_seqcount_end(&groupc->seq);
@@ -907,6 +926,9 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 	group = task_psi_group(task);
 	do {
+		if (!group->enabled)
+			continue;
+
 		groupc = per_cpu_ptr(group->pcpu, cpu);
 
 		write_seqcount_begin(&groupc->seq);
@@ -1080,6 +1102,40 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
 
 	task_rq_unlock(rq, task, &rf);
 }
+
+void psi_cgroup_restart(struct psi_group *group)
+{
+	int cpu;
+
+	/*
+	 * After psi_group->enabled is cleared, we don't actually stop
+	 * percpu task accounting in each psi_group_cpu; we only stop
+	 * the test_state() loop, record_times() and the averaging
+	 * worker, see psi_group_change() for details.
+	 *
+	 * When disabling cgroup PSI, this function has nothing to sync
+	 * since the cgroup pressure files are hidden and each percpu
+	 * psi_group_cpu sees !psi_group->enabled and only does task accounting.
+	 *
+	 * When re-enabling cgroup PSI, this function uses psi_group_change()
+	 * to get the correct state mask from the test_state() loop on tasks[],
+	 * and restarts groupc->state_start from now; pass .clear = .set = 0
+	 * since no task state really changed.
+	 */
+	if (!group->enabled)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct rq *rq = cpu_rq(cpu);
+		struct rq_flags rf;
+		u64 now;
+
+		rq_lock_irq(rq, &rf);
+		now = cpu_clock(cpu);
+		psi_group_change(group, cpu, 0, 0, now, true);
+		rq_unlock_irq(rq, &rf);
+	}
+}
 #endif /* CONFIG_CGROUPS */
 
 int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip: sched/psi] sched/psi: Cache parent psi_group to speed up group iteration
  2022-08-25 16:41   ` Chengming Zhou
  (?)
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  -1 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chengming Zhou, Peter Zijlstra (Intel),
	Johannes Weiner, x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     dc86aba751e2867244411adda1562f6664747019
Gitweb:        https://git.kernel.org/tip/dc86aba751e2867244411adda1562f6664747019
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:10 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:33 +02:00

sched/psi: Cache parent psi_group to speed up group iteration

We use iterate_groups() to iterate over each level's psi_group to
update PSI stats, which is a very hot path.

In the current code, iterate_groups() has to use multiple branches and
cgroup_parent() to get the parent psi_group for each level, which is
not very efficient.

This patch caches the parent psi_group in struct psi_group, so we only
need to look up the task's own psi_group once, then just follow
group->parent to iterate.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-10-zhouchengming@bytedance.com
---
 include/linux/psi_types.h |  2 ++-
 kernel/sched/psi.c        | 49 ++++++++++++++------------------------
 2 files changed, 21 insertions(+), 30 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 40c2817..a0b7462 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -151,6 +151,8 @@ struct psi_trigger {
 };
 
 struct psi_group {
+	struct psi_group *parent;
+
 	/* Protects data used by the aggregator */
 	struct mutex avgs_lock;
 
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 2545a78..9a8aee8 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -772,27 +772,12 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		schedule_delayed_work(&group->avgs_work, PSI_FREQ);
 }
 
-static struct psi_group *iterate_groups(struct task_struct *task, void **iter)
+static inline struct psi_group *task_psi_group(struct task_struct *task)
 {
-	if (*iter == &psi_system)
-		return NULL;
-
 #ifdef CONFIG_CGROUPS
-	if (static_branch_likely(&psi_cgroups_enabled)) {
-		struct cgroup *cgroup = NULL;
-
-		if (!*iter)
-			cgroup = task->cgroups->dfl_cgrp;
-		else
-			cgroup = cgroup_parent(*iter);
-
-		if (cgroup && cgroup_parent(cgroup)) {
-			*iter = cgroup;
-			return cgroup_psi(cgroup);
-		}
-	}
+	if (static_branch_likely(&psi_cgroups_enabled))
+		return cgroup_psi(task_dfl_cgroup(task));
 #endif
-	*iter = &psi_system;
 	return &psi_system;
 }
 
@@ -815,7 +800,6 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 {
 	int cpu = task_cpu(task);
 	struct psi_group *group;
-	void *iter = NULL;
 	u64 now;
 
 	if (!task->pid)
@@ -825,8 +809,10 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 
 	now = cpu_clock(cpu);
 
-	while ((group = iterate_groups(task, &iter)))
+	group = task_psi_group(task);
+	do {
 		psi_group_change(group, cpu, clear, set, now, true);
+	} while ((group = group->parent));
 }
 
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
@@ -834,7 +820,6 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 {
 	struct psi_group *group, *common = NULL;
 	int cpu = task_cpu(prev);
-	void *iter;
 	u64 now = cpu_clock(cpu);
 
 	if (next->pid) {
@@ -844,8 +829,8 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 * ancestors with @prev, those will already have @prev's
 		 * TSK_ONCPU bit set, and we can stop the iteration there.
 		 */
-		iter = NULL;
-		while ((group = iterate_groups(next, &iter))) {
+		group = task_psi_group(next);
+		do {
 			if (per_cpu_ptr(group->pcpu, cpu)->state_mask &
 			    PSI_ONCPU) {
 				common = group;
@@ -853,7 +838,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 			}
 
 			psi_group_change(group, cpu, 0, TSK_ONCPU, now, true);
-		}
+		} while ((group = group->parent));
 	}
 
 	if (prev->pid) {
@@ -886,9 +871,12 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 
 		psi_flags_change(prev, clear, set);
 
-		iter = NULL;
-		while ((group = iterate_groups(prev, &iter)) && group != common)
+		group = task_psi_group(prev);
+		do {
+			if (group == common)
+				break;
 			psi_group_change(group, cpu, clear, set, now, wake_clock);
+		} while ((group = group->parent));
 
 		/*
 		 * TSK_ONCPU is handled up to the common ancestor. If there are
@@ -898,7 +886,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 */
 		if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) {
 			clear &= ~TSK_ONCPU;
-			for (; group; group = iterate_groups(prev, &iter))
+			for (; group; group = group->parent)
 				psi_group_change(group, cpu, clear, set, now, wake_clock);
 		}
 	}
@@ -908,7 +896,6 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 void psi_account_irqtime(struct task_struct *task, u32 delta)
 {
 	int cpu = task_cpu(task);
-	void *iter = NULL;
 	struct psi_group *group;
 	struct psi_group_cpu *groupc;
 	u64 now;
@@ -918,7 +905,8 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 	now = cpu_clock(cpu);
 
-	while ((group = iterate_groups(task, &iter))) {
+	group = task_psi_group(task);
+	do {
 		groupc = per_cpu_ptr(group->pcpu, cpu);
 
 		write_seqcount_begin(&groupc->seq);
@@ -930,7 +918,7 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 		if (group->poll_states & (1 << PSI_IRQ_FULL))
 			psi_schedule_poll_work(group, 1);
-	}
+	} while ((group = group->parent));
 }
 #endif
 
@@ -1010,6 +998,7 @@ int psi_cgroup_alloc(struct cgroup *cgroup)
 		return -ENOMEM;
 	}
 	group_init(cgroup->psi);
+	cgroup->psi->parent = cgroup_psi(cgroup_parent(cgroup));
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread
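
To make the caching idea concrete, here is a standalone, editor-written
C sketch; the struct and names are hypothetical stand-ins, not kernel
code. The parent chain is resolved once at setup, and the hot path is a
pure pointer chase in the same do/while shape as the patched code:

	#include <stdio.h>

	struct group {
		struct group *parent;	/* cached once, like psi_group->parent */
		const char *name;
	};

	static void account(struct group *g)
	{
		printf("account %s\n", g->name);
	}

	int main(void)
	{
		/* Hierarchy: leaf -> mid -> system (root has no parent). */
		struct group root = { .parent = NULL,  .name = "system" };
		struct group mid  = { .parent = &root, .name = "mid" };
		struct group leaf = { .parent = &mid,  .name = "leaf" };
		struct group *g = &leaf;

		/* One lookup for the task's own group, then walk the
		 * ancestors by pointer until the NULL-parent root.
		 */
		do {
			account(g);
		} while ((g = g->parent));

		return 0;
	}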

* [tip: sched/psi] sched/psi: Consolidate cgroup_psi()
  2022-08-25 16:41   ` Chengming Zhou
  (?)
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  -1 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chengming Zhou, Peter Zijlstra (Intel),
	Johannes Weiner, x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     57899a6610e67ba26fa3251ebbef4a5ed21efc5d
Gitweb:        https://git.kernel.org/tip/57899a6610e67ba26fa3251ebbef4a5ed21efc5d
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:09 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:33 +02:00

sched/psi: Consolidate cgroup_psi()

cgroup_psi() can't return the psi_group for the root cgroup, so we have
many open-coded instances of
"psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi".

This patch moves the cgroup_psi() definition to <linux/psi.h>, where it
can return psi_system for the root cgroup, and so can handle all cgroups.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-9-zhouchengming@bytedance.com
---
 include/linux/cgroup.h |  5 -----
 include/linux/psi.h    |  6 ++++++
 kernel/cgroup/cgroup.c | 10 +++++-----
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b0914aa..80cb970 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -673,11 +673,6 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp)
 	pr_cont_kernfs_path(cgrp->kn);
 }
 
-static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
-{
-	return cgrp->psi;
-}
-
 bool cgroup_psi_enabled(void);
 
 static inline void cgroup_init_kthreadd(void)
diff --git a/include/linux/psi.h b/include/linux/psi.h
index fffd229..362a74c 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -7,6 +7,7 @@
 #include <linux/sched.h>
 #include <linux/poll.h>
 #include <linux/cgroup-defs.h>
+#include <linux/cgroup.h>
 
 struct seq_file;
 struct css_set;
@@ -30,6 +31,11 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
 			poll_table *wait);
 
 #ifdef CONFIG_CGROUPS
+static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
+{
+	return cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+}
+
 int psi_cgroup_alloc(struct cgroup *cgrp);
 void psi_cgroup_free(struct cgroup *cgrp);
 void cgroup_move_task(struct task_struct *p, struct css_set *to);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b46d39b..772b35d 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3689,21 +3689,21 @@ static int cpu_stat_show(struct seq_file *seq, void *v)
 static int cgroup_io_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_IO);
 }
 static int cgroup_memory_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_MEM);
 }
 static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_CPU);
 }
@@ -3729,7 +3729,7 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
 		return -EBUSY;
 	}
 
-	psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	psi = cgroup_psi(cgrp);
 	new = psi_trigger_create(psi, buf, res);
 	if (IS_ERR(new)) {
 		cgroup_put(cgrp);
@@ -3767,7 +3767,7 @@ static ssize_t cgroup_cpu_pressure_write(struct kernfs_open_file *of,
 static int cgroup_irq_pressure_show(struct seq_file *seq, void *v)
 {
 	struct cgroup *cgrp = seq_css(seq)->cgroup;
-	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+	struct psi_group *psi = cgroup_psi(cgrp);
 
 	return psi_show(seq, psi, PSI_IRQ);
 }

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip: sched/psi] sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure
  2022-08-25 16:41   ` Chengming Zhou
  (?)
  (?)
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  -1 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chengming Zhou, Peter Zijlstra (Intel),
	Johannes Weiner, x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     52b1364ba0b105122d6de0e719b36db705011ac1
Gitweb:        https://git.kernel.org/tip/52b1364ba0b105122d6de0e719b36db705011ac1
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:08 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:32 +02:00

sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure

PSI already tracks workload pressure stall information for CPU,
memory and IO. Apart from these, IRQ/SOFTIRQ can have an obvious
impact on the productivity of some workloads, such as web service
workloads.

With CONFIG_IRQ_TIME_ACCOUNTING, we can get the IRQ/SOFTIRQ delta time
from update_rq_clock_task(), where we can record that delta against the
cgroups of the CPU's current task as PSI_IRQ_FULL status.

Note we don't use PSI_IRQ_SOME, since IRQ/SOFTIRQ time always preempts
the current task on the CPU: nothing productive can run even if it
were runnable, so we only use PSI_IRQ_FULL.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-8-zhouchengming@bytedance.com
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++-
 include/linux/psi_types.h               | 10 ++-
 kernel/cgroup/cgroup.c                  | 27 +++++++++-
 kernel/sched/core.c                     |  1 +-
 kernel/sched/psi.c                      | 74 +++++++++++++++++++++++-
 kernel/sched/stats.h                    |  2 +-
 6 files changed, 116 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index be4a77b..971c418 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -976,6 +976,12 @@ All cgroup core files are prefixed with "cgroup."
 	killing cgroups is a process directed operation, i.e. it affects
 	the whole thread-group.
 
+  irq.pressure
+	A read-write nested-keyed file.
+
+	Shows pressure stall information for IRQ/SOFTIRQ. See
+	:ref:`Documentation/accounting/psi.rst <psi>` for details.
+
 Controllers
 ===========
 
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 54cb749..40c2817 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -42,7 +42,10 @@ enum psi_res {
 	PSI_IO,
 	PSI_MEM,
 	PSI_CPU,
-	NR_PSI_RESOURCES = 3,
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	PSI_IRQ,
+#endif
+	NR_PSI_RESOURCES,
 };
 
 /*
@@ -58,9 +61,12 @@ enum psi_states {
 	PSI_MEM_FULL,
 	PSI_CPU_SOME,
 	PSI_CPU_FULL,
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	PSI_IRQ_FULL,
+#endif
 	/* Only per-CPU, to weigh the CPU in the global average: */
 	PSI_NONIDLE,
-	NR_PSI_STATES = 7,
+	NR_PSI_STATES,
 };
 
 /* Use one bit in the state mask to track TSK_ONCPU */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 96aefdb..b46d39b 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3763,6 +3763,23 @@ static ssize_t cgroup_cpu_pressure_write(struct kernfs_open_file *of,
 	return cgroup_pressure_write(of, buf, nbytes, PSI_CPU);
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static int cgroup_irq_pressure_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+	struct psi_group *psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+
+	return psi_show(seq, psi, PSI_IRQ);
+}
+
+static ssize_t cgroup_irq_pressure_write(struct kernfs_open_file *of,
+					 char *buf, size_t nbytes,
+					 loff_t off)
+{
+	return cgroup_pressure_write(of, buf, nbytes, PSI_IRQ);
+}
+#endif
+
 static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
 					  poll_table *pt)
 {
@@ -5179,6 +5196,16 @@ static struct cftype cgroup_base_files[] = {
 		.poll = cgroup_pressure_poll,
 		.release = cgroup_pressure_release,
 	},
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	{
+		.name = "irq.pressure",
+		.flags = CFTYPE_PRESSURE,
+		.seq_show = cgroup_irq_pressure_show,
+		.write = cgroup_irq_pressure_write,
+		.poll = cgroup_pressure_poll,
+		.release = cgroup_pressure_release,
+	},
+#endif
 #endif /* CONFIG_PSI */
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee28253..7d1ea92 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -708,6 +708,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
 	rq->prev_irq_time += irq_delta;
 	delta -= irq_delta;
+	psi_account_irqtime(rq->curr, irq_delta);
 #endif
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
 	if (static_key_false((&paravirt_steal_rq_enabled))) {
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 4702a77..2545a78 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -904,6 +904,36 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 	}
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+void psi_account_irqtime(struct task_struct *task, u32 delta)
+{
+	int cpu = task_cpu(task);
+	void *iter = NULL;
+	struct psi_group *group;
+	struct psi_group_cpu *groupc;
+	u64 now;
+
+	if (!task->pid)
+		return;
+
+	now = cpu_clock(cpu);
+
+	while ((group = iterate_groups(task, &iter))) {
+		groupc = per_cpu_ptr(group->pcpu, cpu);
+
+		write_seqcount_begin(&groupc->seq);
+
+		record_times(groupc, now);
+		groupc->times[PSI_IRQ_FULL] += delta;
+
+		write_seqcount_end(&groupc->seq);
+
+		if (group->poll_states & (1 << PSI_IRQ_FULL))
+			psi_schedule_poll_work(group, 1);
+	}
+}
+#endif
+
 /**
  * psi_memstall_enter - mark the beginning of a memory stall section
  * @flags: flags to handle nested sections
@@ -1065,6 +1095,7 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
 
 int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 {
+	bool only_full = false;
 	int full;
 	u64 now;
 
@@ -1079,7 +1110,11 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 		group->avg_next_update = update_averages(group, now);
 	mutex_unlock(&group->avgs_lock);
 
-	for (full = 0; full < 2; full++) {
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	only_full = res == PSI_IRQ;
+#endif
+
+	for (full = 0; full < 2 - only_full; full++) {
 		unsigned long avg[3] = { 0, };
 		u64 total = 0;
 		int w;
@@ -1093,7 +1128,7 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 		}
 
 		seq_printf(m, "%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
-			   full ? "full" : "some",
+			   full || only_full ? "full" : "some",
 			   LOAD_INT(avg[0]), LOAD_FRAC(avg[0]),
 			   LOAD_INT(avg[1]), LOAD_FRAC(avg[1]),
 			   LOAD_INT(avg[2]), LOAD_FRAC(avg[2]),
@@ -1121,6 +1156,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	else
 		return ERR_PTR(-EINVAL);
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	if (res == PSI_IRQ && --state != PSI_IRQ_FULL)
+		return ERR_PTR(-EINVAL);
+#endif
+
 	if (state >= PSI_NONIDLE)
 		return ERR_PTR(-EINVAL);
 
@@ -1405,6 +1445,33 @@ static const struct proc_ops psi_cpu_proc_ops = {
 	.proc_release	= psi_fop_release,
 };
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static int psi_irq_show(struct seq_file *m, void *v)
+{
+	return psi_show(m, &psi_system, PSI_IRQ);
+}
+
+static int psi_irq_open(struct inode *inode, struct file *file)
+{
+	return psi_open(file, psi_irq_show);
+}
+
+static ssize_t psi_irq_write(struct file *file, const char __user *user_buf,
+			     size_t nbytes, loff_t *ppos)
+{
+	return psi_write(file, user_buf, nbytes, PSI_IRQ);
+}
+
+static const struct proc_ops psi_irq_proc_ops = {
+	.proc_open	= psi_irq_open,
+	.proc_read	= seq_read,
+	.proc_lseek	= seq_lseek,
+	.proc_write	= psi_irq_write,
+	.proc_poll	= psi_fop_poll,
+	.proc_release	= psi_fop_release,
+};
+#endif
+
 static int __init psi_proc_init(void)
 {
 	if (psi_enable) {
@@ -1412,6 +1479,9 @@ static int __init psi_proc_init(void)
 		proc_create("pressure/io", 0666, NULL, &psi_io_proc_ops);
 		proc_create("pressure/memory", 0666, NULL, &psi_memory_proc_ops);
 		proc_create("pressure/cpu", 0666, NULL, &psi_cpu_proc_ops);
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+		proc_create("pressure/irq", 0666, NULL, &psi_irq_proc_ops);
+#endif
 	}
 	return 0;
 }
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index c39b467..84a1889 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -110,6 +110,7 @@ __schedstats_from_se(struct sched_entity *se)
 void psi_task_change(struct task_struct *task, int clear, int set);
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		     bool sleep);
+void psi_account_irqtime(struct task_struct *task, u32 delta);
 
 /*
  * PSI tracks state that persists across sleeps, such as iowaits and
@@ -205,6 +206,7 @@ static inline void psi_ttwu_dequeue(struct task_struct *p) {}
 static inline void psi_sched_switch(struct task_struct *prev,
 				    struct task_struct *next,
 				    bool sleep) {}
+static inline void psi_account_irqtime(struct task_struct *task, u32 delta) {}
 #endif /* CONFIG_PSI */
 
 #ifdef CONFIG_SCHED_INFO

^ permalink raw reply related	[flat|nested] 38+ messages in thread
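
The system-wide counterpart is readable from the new /proc/pressure/irq
file. A small editor-written sketch, assuming a kernel built with
CONFIG_IRQ_TIME_ACCOUNTING and PSI enabled so that the file exists;
note the output carries only a "full" line, per the changelog:

	#include <stdio.h>

	int main(void)
	{
		/* Expected shape of the output (values illustrative):
		 *   full avg10=0.00 avg60=0.00 avg300=0.00 total=12345
		 */
		char line[256];
		FILE *f = fopen("/proc/pressure/irq", "r");

		if (!f) {
			perror("/proc/pressure/irq");
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
		return 0;
	}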

* [tip: sched/psi] sched/psi: Remove NR_ONCPU task accounting
  2022-08-25 16:41 ` [PATCH v4 06/10] sched/psi: remove NR_ONCPU task accounting Chengming Zhou
@ 2022-09-09 14:00   ` tip-bot2 for Johannes Weiner
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Johannes Weiner @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Johannes Weiner, Chengming Zhou, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     71dbdde7914d32e86f01ac1f6e54e964c9dfdbd9
Gitweb:        https://git.kernel.org/tip/71dbdde7914d32e86f01ac1f6e54e964c9dfdbd9
Author:        Johannes Weiner <hannes@cmpxchg.org>
AuthorDate:    Fri, 26 Aug 2022 00:41:07 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:32 +02:00

sched/psi: Remove NR_ONCPU task accounting

We put all fields updated by the scheduler in the first cacheline of
struct psi_group_cpu for performance.

Since we want to add another state, PSI_IRQ_FULL, to track IRQ/SOFTIRQ
pressure, we need to reclaim space first. This patch removes the
NR_ONCPU task accounting in struct psi_group_cpu and uses one bit in
state_mask to track it instead.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20220825164111.29534-7-zhouchengming@bytedance.com
---
 include/linux/psi_types.h | 16 ++++++---------
 kernel/sched/psi.c        | 41 +++++++++++++++++++++++++++-----------
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index c7fe7c0..54cb749 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -16,13 +16,6 @@ enum psi_task_count {
 	NR_MEMSTALL,
 	NR_RUNNING,
 	/*
-	 * This can't have values other than 0 or 1 and could be
-	 * implemented as a bit flag. But for now we still have room
-	 * in the first cacheline of psi_group_cpu, and this way we
-	 * don't have to special case any state tracking for it.
-	 */
-	NR_ONCPU,
-	/*
 	 * For IO and CPU stalls the presence of running/oncpu tasks
 	 * in the domain means a partial rather than a full stall.
 	 * For memory it's not so simple because of page reclaimers:
@@ -32,16 +25,18 @@ enum psi_task_count {
 	 * threads and memstall ones.
 	 */
 	NR_MEMSTALL_RUNNING,
-	NR_PSI_TASK_COUNTS = 5,
+	NR_PSI_TASK_COUNTS = 4,
 };
 
 /* Task state bitmasks */
 #define TSK_IOWAIT	(1 << NR_IOWAIT)
 #define TSK_MEMSTALL	(1 << NR_MEMSTALL)
 #define TSK_RUNNING	(1 << NR_RUNNING)
-#define TSK_ONCPU	(1 << NR_ONCPU)
 #define TSK_MEMSTALL_RUNNING	(1 << NR_MEMSTALL_RUNNING)
 
+/* Only one task can be scheduled, no corresponding task count */
+#define TSK_ONCPU	(1 << NR_PSI_TASK_COUNTS)
+
 /* Resources that workloads could be stalled on */
 enum psi_res {
 	PSI_IO,
@@ -68,6 +63,9 @@ enum psi_states {
 	NR_PSI_STATES = 7,
 };
 
+/* Use one bit in the state mask to track TSK_ONCPU */
+#define PSI_ONCPU	(1 << NR_PSI_STATES)
+
 enum psi_aggregators {
 	PSI_AVGS = 0,
 	PSI_POLL,
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index d71dbc2..4702a77 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -212,7 +212,7 @@ void __init psi_init(void)
 	group_init(&psi_system);
 }
 
-static bool test_state(unsigned int *tasks, enum psi_states state)
+static bool test_state(unsigned int *tasks, enum psi_states state, bool oncpu)
 {
 	switch (state) {
 	case PSI_IO_SOME:
@@ -225,9 +225,9 @@ static bool test_state(unsigned int *tasks, enum psi_states state)
 		return unlikely(tasks[NR_MEMSTALL] &&
 			tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
 	case PSI_CPU_SOME:
-		return unlikely(tasks[NR_RUNNING] > tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] > oncpu);
 	case PSI_CPU_FULL:
-		return unlikely(tasks[NR_RUNNING] && !tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] && !oncpu);
 	case PSI_NONIDLE:
 		return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
 			tasks[NR_RUNNING];
@@ -689,9 +689,9 @@ static void psi_group_change(struct psi_group *group, int cpu,
 			     bool wake_clock)
 {
 	struct psi_group_cpu *groupc;
-	u32 state_mask = 0;
 	unsigned int t, m;
 	enum psi_states s;
+	u32 state_mask;
 
 	groupc = per_cpu_ptr(group->pcpu, cpu);
 
@@ -707,17 +707,36 @@ static void psi_group_change(struct psi_group *group, int cpu,
 
 	record_times(groupc, now);
 
+	/*
+	 * Start with TSK_ONCPU, which doesn't have a corresponding
+	 * task count - it's just a boolean flag directly encoded in
+	 * the state mask. Clear, set, or carry the current state if
+	 * no changes are requested.
+	 */
+	if (unlikely(clear & TSK_ONCPU)) {
+		state_mask = 0;
+		clear &= ~TSK_ONCPU;
+	} else if (unlikely(set & TSK_ONCPU)) {
+		state_mask = PSI_ONCPU;
+		set &= ~TSK_ONCPU;
+	} else {
+		state_mask = groupc->state_mask & PSI_ONCPU;
+	}
+
+	/*
+	 * The rest of the state mask is calculated based on the task
+	 * counts. Update those first, then construct the mask.
+	 */
 	for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
 		if (!(m & (1 << t)))
 			continue;
 		if (groupc->tasks[t]) {
 			groupc->tasks[t]--;
 		} else if (!psi_bug) {
-			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u %u] clear=%x set=%x\n",
+			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u] clear=%x set=%x\n",
 					cpu, t, groupc->tasks[0],
 					groupc->tasks[1], groupc->tasks[2],
-					groupc->tasks[3], groupc->tasks[4],
-					clear, set);
+					groupc->tasks[3], clear, set);
 			psi_bug = 1;
 		}
 	}
@@ -726,9 +745,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;
 
-	/* Calculate state mask representing active states */
 	for (s = 0; s < NR_PSI_STATES; s++) {
-		if (test_state(groupc->tasks, s))
+		if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
 			state_mask |= (1 << s);
 	}
 
@@ -740,7 +758,7 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	 * task in a cgroup is in_memstall, the corresponding groupc
 	 * on that cpu is in PSI_MEM_FULL state.
 	 */
-	if (unlikely(groupc->tasks[NR_ONCPU] && cpu_curr(cpu)->in_memstall))
+	if (unlikely((state_mask & PSI_ONCPU) && cpu_curr(cpu)->in_memstall))
 		state_mask |= (1 << PSI_MEM_FULL);
 
 	groupc->state_mask = state_mask;
@@ -828,7 +846,8 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 */
 		iter = NULL;
 		while ((group = iterate_groups(next, &iter))) {
-			if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
+			if (per_cpu_ptr(group->pcpu, cpu)->state_mask &
+			    PSI_ONCPU) {
 				common = group;
 				break;
 			}
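
To summarize the new encoding (an illustrative sketch, not the real
header; values match the hunks above, with the actual enums living in
include/linux/psi_types.h):

	/* Sketch only: the four remaining task counts, ONCPU as a flag */
	enum { NR_IOWAIT, NR_MEMSTALL, NR_RUNNING, NR_MEMSTALL_RUNNING,
	       NR_PSI_TASK_COUNTS };			/* == 4 */
	#define TSK_ONCPU	(1 << NR_PSI_TASK_COUNTS)	/* never counted */

	#define NR_PSI_STATES	7
	#define PSI_ONCPU	(1 << NR_PSI_STATES)	/* bit 7 of state_mask */

	/* CPU pressure then tests the flag instead of a task counter:
	 * some: tasks[NR_RUNNING] > !!(state_mask & PSI_ONCPU)
	 * full: tasks[NR_RUNNING] && !(state_mask & PSI_ONCPU)
	 */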


* [tip: sched/psi] sched/psi: Optimize task switch inside shared cgroups again
  2022-08-25 16:41 ` [PATCH v4 05/10] sched/psi: optimize task switch inside shared cgroups again Chengming Zhou
  2022-08-25 17:16     ` Johannes Weiner
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Johannes Weiner, Chengming Zhou, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     65176f59a18d888684525658a1d0b8bf749d24f3
Gitweb:        https://git.kernel.org/tip/65176f59a18d888684525658a1d0b8bf749d24f3
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:06 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:32 +02:00

sched/psi: Optimize task switch inside shared cgroups again

Way back when PSI_MEM_FULL was accounted from the timer tick, task
switching could simply iterate next and prev to the common ancestor to
update TSK_ONCPU and be done.

Then memstall ticks were replaced with checking curr->in_memstall
directly in psi_group_change(). That meant that now if the task switch
was between a memstall and a !memstall task, we had to iterate through
the common ancestors at least ONCE to fix up their state_masks.

We added the identical_state filter to make sure the common ancestor
elimination was skipped in that case. It seems that was always a
little too eager, because it caused us to walk the common ancestors
*twice* instead of the required once: the iteration for next could
have stopped at the common ancestor; prev could have updated TSK_ONCPU
up to the common ancestor, then finished to the root without changing
any flags, just to get the new curr->in_memstall into the state_masks.

This patch recognizes this and makes it so that we walk to the root
exactly once if state_mask needs updating, which is simply catching up
on a missed optimization that could have been done in commit 7fae6c8171d2
("psi: Use ONCPU state tracking machinery to detect reclaim") directly.

Apart from this, it's also necessary for the next patch "sched/psi: remove
NR_ONCPU task accounting". Suppose we walk the common ancestors twice:

(1) psi_group_change(.clear = 0, .set = TSK_ONCPU)
(2) psi_group_change(.clear = TSK_ONCPU, .set = 0)

We previously used tasks[NR_ONCPU] to record TSK_ONCPU: tasks[NR_ONCPU]++
in (1), then tasks[NR_ONCPU]-- in (2), so tasks[NR_ONCPU] remained correct.

The next patch changes this to use one bit in the state mask to record
TSK_ONCPU: the PSI_ONCPU bit would be set in (1), but then cleared in (2),
leaving the psi_group_cpu with a task running on the CPU but without the
PSI_ONCPU bit set!

With this patch, we never walk the common ancestors twice, so we won't
have the above problem.
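
The hazard is easy to model in a standalone sketch (plain userspace C;
the bit position of PSI_ONCPU is arbitrary here):

	#include <stdio.h>

	#define PSI_ONCPU (1 << 7)	/* illustrative bit position */

	int main(void)
	{
		unsigned int nr_oncpu = 1;		/* old: tasks[NR_ONCPU] counter */
		unsigned int state_mask = PSI_ONCPU;	/* new: a single flag bit */

		/* (1) @next sets TSK_ONCPU on the shared ancestor */
		nr_oncpu++;			/* 1 -> 2 */
		state_mask |= PSI_ONCPU;	/* already set, no change */

		/* (2) @prev clears TSK_ONCPU on the same ancestor */
		nr_oncpu--;			/* 2 -> 1: still correct */
		state_mask &= ~PSI_ONCPU;	/* bit lost although @next runs! */

		printf("counter=%u oncpu-bit=%d\n",
		       nr_oncpu, !!(state_mask & PSI_ONCPU));
		return 0;
	}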

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-6-zhouchengming@bytedance.com
---
 kernel/sched/psi.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 77d53c0..d71dbc2 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -820,20 +820,15 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 	u64 now = cpu_clock(cpu);
 
 	if (next->pid) {
-		bool identical_state;
-
 		psi_flags_change(next, 0, TSK_ONCPU);
 		/*
-		 * When switching between tasks that have an identical
-		 * runtime state, the cgroup that contains both tasks
-		 * we reach the first common ancestor. Iterate @next's
-		 * ancestors only until we encounter @prev's ONCPU.
+		 * Set TSK_ONCPU on @next's cgroups. If @next shares any
+		 * ancestors with @prev, those will already have @prev's
+		 * TSK_ONCPU bit set, and we can stop the iteration there.
 		 */
-		identical_state = prev->psi_flags == next->psi_flags;
 		iter = NULL;
 		while ((group = iterate_groups(next, &iter))) {
-			if (identical_state &&
-			    per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
+			if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
 				common = group;
 				break;
 			}
@@ -877,10 +872,12 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 			psi_group_change(group, cpu, clear, set, now, wake_clock);
 
 		/*
-		 * TSK_ONCPU is handled up to the common ancestor. If we're tasked
-		 * with dequeuing too, finish that for the rest of the hierarchy.
+		 * TSK_ONCPU is handled up to the common ancestor. If there are
+		 * any other differences between the two tasks (e.g. prev goes
+		 * to sleep, or only one task is memstall), finish propagating
+		 * those differences all the way up to the root.
 		 */
-		if (sleep) {
+		if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) {
 			clear &= ~TSK_ONCPU;
 			for (; group; group = iterate_groups(prev, &iter))
 				psi_group_change(group, cpu, clear, set, now, wake_clock);
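
For the new stop/continue condition above, a tiny standalone sketch
(illustrative flag values, not the kernel's):

	#include <stdbool.h>

	#define TSK_MEMSTALL	(1 << 1)	/* illustrative */
	#define TSK_ONCPU	(1 << 4)	/* illustrative */

	/* Does anything besides ONCPU differ between prev and next? */
	static bool needs_full_walk(unsigned int prev_flags,
				    unsigned int next_flags)
	{
		return (prev_flags ^ next_flags) & ~TSK_ONCPU;
	}

	/* needs_full_walk(TSK_MEMSTALL, 0) -> true: finish walking to the root.
	 * needs_full_walk(TSK_ONCPU, 0)   -> false: stop at the common ancestor. */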


* [tip: sched/psi] sched/psi: Move private helpers to sched/stats.h
  2022-08-25 16:41 ` [PATCH v4 04/10] sched/psi: move private helpers to sched/stats.h Chengming Zhou
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chengming Zhou, Peter Zijlstra (Intel),
	Johannes Weiner, x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     d79ddb069c5257a924456eb99b53fc1ea715c0a3
Gitweb:        https://git.kernel.org/tip/d79ddb069c5257a924456eb99b53fc1ea715c0a3
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:05 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:31 +02:00

sched/psi: Move private helpers to sched/stats.h

This patch moves the psi_task_change/psi_task_switch declarations out of
the public PSI header, since they are only needed to implement the PSI
stats tracking in sched/stats.h.

psi_task_switch is obviously private; psi_task_change can't be a public
helper either, since it doesn't check the psi_disabled static key. And it
has no external users now, so put it in sched/stats.h too.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-5-zhouchengming@bytedance.com
---
 include/linux/psi.h  | 4 ----
 kernel/sched/stats.h | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/psi.h b/include/linux/psi.h
index dd74411..fffd229 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -18,10 +18,6 @@ extern struct psi_group psi_system;
 
 void psi_init(void);
 
-void psi_task_change(struct task_struct *task, int clear, int set);
-void psi_task_switch(struct task_struct *prev, struct task_struct *next,
-		     bool sleep);
-
 void psi_memstall_enter(unsigned long *flags);
 void psi_memstall_leave(unsigned long *flags);
 
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index baa839c..c39b467 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -107,6 +107,10 @@ __schedstats_from_se(struct sched_entity *se)
 }
 
 #ifdef CONFIG_PSI
+void psi_task_change(struct task_struct *task, int clear, int set);
+void psi_task_switch(struct task_struct *prev, struct task_struct *next,
+		     bool sleep);
+
 /*
  * PSI tracks state that persists across sleeps, such as iowaits and
  * memory stalls. As a result, it has to distinguish between sleeps,


* [tip: sched/psi] sched/psi: Save percpu memory when !psi_cgroups_enabled
  2022-08-25 16:41   ` Chengming Zhou
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  -1 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chengming Zhou, Peter Zijlstra (Intel),
	Johannes Weiner, x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     e2ad8ab04c5cdfc8dc2f382c45d248ab01dee991
Gitweb:        https://git.kernel.org/tip/e2ad8ab04c5cdfc8dc2f382c45d248ab01dee991
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:04 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:31 +02:00

sched/psi: Save percpu memory when !psi_cgroups_enabled

We won't use the cgroup psi_group when !psi_cgroups_enabled, so don't
bother allocating and initializing percpu memory for it.

We also don't need to migrate task PSI stats between cgroups in
cgroup_move_task().
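
For reference, a minimal sketch of the key this gates on (assumed
definition; in the actual patch it lives next to psi_enable in
kernel/sched/psi.c):

	#include <linux/jump_label.h>

	/* Default-true jump label: per-cgroup PSI accounting stays enabled
	 * unless psi_init() disables it, after which every branch guarded by
	 * static_branch_likely(&psi_cgroups_enabled) is patched out. */
	static DEFINE_STATIC_KEY_TRUE(psi_cgroups_enabled);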

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-4-zhouchengming@bytedance.com
---
 kernel/sched/psi.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 39463dc..77d53c0 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -201,6 +201,7 @@ void __init psi_init(void)
 {
 	if (!psi_enable) {
 		static_branch_enable(&psi_disabled);
+		static_branch_disable(&psi_cgroups_enabled);
 		return;
 	}
 
@@ -950,7 +951,7 @@ void psi_memstall_leave(unsigned long *flags)
 #ifdef CONFIG_CGROUPS
 int psi_cgroup_alloc(struct cgroup *cgroup)
 {
-	if (static_branch_likely(&psi_disabled))
+	if (!static_branch_likely(&psi_cgroups_enabled))
 		return 0;
 
 	cgroup->psi = kzalloc(sizeof(struct psi_group), GFP_KERNEL);
@@ -968,7 +969,7 @@ int psi_cgroup_alloc(struct cgroup *cgroup)
 
 void psi_cgroup_free(struct cgroup *cgroup)
 {
-	if (static_branch_likely(&psi_disabled))
+	if (!static_branch_likely(&psi_cgroups_enabled))
 		return;
 
 	cancel_delayed_work_sync(&cgroup->psi->avgs_work);
@@ -996,7 +997,7 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
 	struct rq_flags rf;
 	struct rq *rq;
 
-	if (static_branch_likely(&psi_disabled)) {
+	if (!static_branch_likely(&psi_cgroups_enabled)) {
 		/*
 		 * Lame to do this here, but the scheduler cannot be locked
 		 * from the outside, so we move cgroups from inside sched/.


* [tip: sched/psi] sched/psi: Don't create cgroup PSI files when psi_disabled
  2022-08-25 16:41 ` [PATCH v4 02/10] sched/psi: don't create cgroup PSI files when psi_disabled Chengming Zhou
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chengming Zhou, Peter Zijlstra (Intel),
	Johannes Weiner, x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     58d8c2586cedb8a67f6f0dffa5eaed0f89135b39
Gitweb:        https://git.kernel.org/tip/58d8c2586cedb8a67f6f0dffa5eaed0f89135b39
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:03 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:31 +02:00

sched/psi: Don't create cgroup PSI files when psi_disabled

Commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable")
made it possible to configure PSI to skip per-cgroup stall accounting,
in which case the PSI files are not exposed in the cgroup hierarchy.

This patch does the same when psi_disabled.
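
Concretely, either of these kernel command lines now leads to no cgroup
pressure files being created (cgroup_disable=pressure is the knob from
the commit above; psi=0 is the case this patch covers):

	linux ... cgroup_disable=pressure
	linux ... psi=0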

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-3-zhouchengming@bytedance.com
---
 kernel/cgroup/cgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 718a70c..96aefdb 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3780,6 +3780,9 @@ static void cgroup_pressure_release(struct kernfs_open_file *of)
 
 bool cgroup_psi_enabled(void)
 {
+	if (static_branch_likely(&psi_disabled))
+		return false;
+
 	return (cgroup_feature_disable_mask & (1 << OPT_FEATURE_PRESSURE)) == 0;
 }
 


* [tip: sched/psi] sched/psi: Fix periodic aggregation shut off
  2022-08-25 16:41   ` Chengming Zhou
@ 2022-09-09 14:00   ` tip-bot2 for Chengming Zhou
  -1 siblings, 0 replies; 38+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-09-09 14:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chengming Zhou, Peter Zijlstra (Intel),
	Johannes Weiner, x86, linux-kernel

The following commit has been merged into the sched/psi branch of tip:

Commit-ID:     c530a3c716b963625e43aa915e0de6b4d1ce8ad9
Gitweb:        https://git.kernel.org/tip/c530a3c716b963625e43aa915e0de6b4d1ce8ad9
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Fri, 26 Aug 2022 00:41:02 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 09 Sep 2022 11:08:30 +02:00

sched/psi: Fix periodic aggregation shut off

We don't want to wake periodic aggregation work back up if the
task change is the aggregation worker itself going to sleep, or
we'll ping-pong forever.

Previously, we would use psi_task_change() in psi_dequeue() when a task
was going to sleep, so this check was put in psi_task_change().

But commit 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
deferred the sleep handling to psi_task_switch(), which doesn't go
through psi_task_change() anymore.

So this patch moves the check to psi_task_switch().

Fixes: 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-2-zhouchengming@bytedance.com
---
 kernel/sched/psi.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index ecb4b4f..39463dc 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -796,7 +796,6 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 {
 	int cpu = task_cpu(task);
 	struct psi_group *group;
-	bool wake_clock = true;
 	void *iter = NULL;
 	u64 now;
 
@@ -806,19 +805,9 @@ void psi_task_change(struct task_struct *task, int clear, int set)
 	psi_flags_change(task, clear, set);
 
 	now = cpu_clock(cpu);
-	/*
-	 * Periodic aggregation shuts off if there is a period of no
-	 * task changes, so we wake it back up if necessary. However,
-	 * don't do this if the task change is the aggregation worker
-	 * itself going to sleep, or we'll ping-pong forever.
-	 */
-	if (unlikely((clear & TSK_RUNNING) &&
-		     (task->flags & PF_WQ_WORKER) &&
-		     wq_worker_last_func(task) == psi_avgs_work))
-		wake_clock = false;
 
 	while ((group = iterate_groups(task, &iter)))
-		psi_group_change(group, cpu, clear, set, now, wake_clock);
+		psi_group_change(group, cpu, clear, set, now, true);
 }
 
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
@@ -854,6 +843,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 
 	if (prev->pid) {
 		int clear = TSK_ONCPU, set = 0;
+		bool wake_clock = true;
 
 		/*
 		 * When we're going to sleep, psi_dequeue() lets us
@@ -867,13 +857,23 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 				clear |= TSK_MEMSTALL_RUNNING;
 			if (prev->in_iowait)
 				set |= TSK_IOWAIT;
+
+			/*
+			 * Periodic aggregation shuts off if there is a period of no
+			 * task changes, so we wake it back up if necessary. However,
+			 * don't do this if the task change is the aggregation worker
+			 * itself going to sleep, or we'll ping-pong forever.
+			 */
+			if (unlikely((prev->flags & PF_WQ_WORKER) &&
+				     wq_worker_last_func(prev) == psi_avgs_work))
+				wake_clock = false;
 		}
 
 		psi_flags_change(prev, clear, set);
 
 		iter = NULL;
 		while ((group = iterate_groups(prev, &iter)) && group != common)
-			psi_group_change(group, cpu, clear, set, now, true);
+			psi_group_change(group, cpu, clear, set, now, wake_clock);
 
 		/*
 		 * TSK_ONCPU is handled up to the common ancestor. If we're tasked
@@ -882,7 +882,7 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		if (sleep) {
 			clear &= ~TSK_ONCPU;
 			for (; group; group = iterate_groups(prev, &iter))
-				psi_group_change(group, cpu, clear, set, now, true);
+				psi_group_change(group, cpu, clear, set, now, wake_clock);
 		}
 	}
 }
