* [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus
From: Leonardo Bras @ 2022-11-02  2:02 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Frederic Weisbecker, Leonardo Bras, Phil Auld,
	Marcelo Tosatti
  Cc: linux-kernel, cgroups, linux-mm

Patch #1 expands housekeeping_any_cpu() so we can find housekeeping CPUs
closer (NUMA-wise) to any desired CPU, instead of only the current CPU.
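
To make the idea concrete, here is a rough userspace model of the lookup.
It is not the kernel implementation: the topology and housekeeping arrays
are made up for illustration, model_housekeeping_any_cpu_from() is a
hypothetical name, and "closest" is simplified to "same NUMA node".

```c
#include <stdbool.h>

#define NR_CPUS 8

/* Assumed topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int cpu_node[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Assumed setup: CPUs 0 and 4 are housekeeping, the rest are isolated. */
static const bool cpu_housekeeping[NR_CPUS] = {
	true, false, false, false, true, false, false, false
};

/*
 * Prefer a housekeeping CPU on the same NUMA node as @cpu; otherwise
 * fall back to the first housekeeping CPU found. Returns -1 if there
 * is no housekeeping CPU at all.
 */
static int model_housekeeping_any_cpu_from(int cpu)
{
	int i, fallback = -1;

	for (i = 0; i < NR_CPUS; i++) {
		if (!cpu_housekeeping[i])
			continue;
		if (cpu_node[i] == cpu_node[cpu])
			return i;
		if (fallback < 0)
			fallback = i;
	}
	return fallback;
}
```

With this made-up topology, asking for a housekeeping CPU near isolated
CPU 6 yields CPU 4 (same node), no matter which CPU does the asking.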

### Performance argument that motivated the change:
One could ask why this is needed, since the current CPU is probably
accessing the current cacheline, so picking a CPU close to the current
one would seem to always be the best choice, as the cache invalidation
takes less time. OTOH, there are cases like this one, which use per-CPU
variables, where up to 3 different CPUs can touch the cacheline:

C1 - Isolated CPU: The perCPU data 'belongs' to this one
C2 - Scheduling CPU: The current CPU, which schedules work to be done elsewhere
C3 - Housekeeping CPU: This one will do the work

Most of the time the cacheline is touched, it should be by C1. Sometimes
C2 will schedule work to run on C3, since C1 is isolated.

If C1 and C2 are in different NUMA nodes, we could have C3 either in
C2's NUMA node (housekeeping_any_cpu()) or in C1's NUMA node
(housekeeping_any_cpu_from(C1)).

If C3 is in C2's NUMA node, there will be a faster invalidation when C3
tries to get cacheline exclusivity, and then a slower invalidation when
C1 does the same while working on its data.

If C3 is in C1's NUMA node, there will be a slower invalidation when C3
tries to get cacheline exclusivity, and then a faster invalidation when
C1 does the same.

The thing is: it should be better to wait less when doing kernel work
on an isolated CPU, even at the cost of some housekeeping CPU waiting
a few more cycles.
###

Patch #2 changes the locking strategy of memcg_stock_pcp->stock_lock from
local_lock to spinlock, so it can later be used to do remote per-CPU
cache draining in patch #3. Most performance concerns should be pointed
out in the commit log.
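
As a rough illustration of why the lock type matters: a local_lock only
protects against concurrency on the owning CPU, while a spinlock can be
taken from any CPU, which is what a remote drain needs. The sketch below
is a userspace model using C11 atomics, not the memcontrol code;
struct model_stock and all model_* helpers are hypothetical names.

```c
#include <stdatomic.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the per-CPU stock; not memcg_stock_pcp. */
struct model_stock {
	atomic_flag lock;      /* spinlock: can be taken from any CPU */
	unsigned int nr_pages; /* pre-charged pages cached for this CPU */
};

static struct model_stock stocks[NR_CPUS];

static void model_stock_init(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		atomic_flag_clear(&stocks[i].lock);
		stocks[i].nr_pages = 0;
	}
}

static void model_stock_lock(struct model_stock *s)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;
}

static void model_stock_unlock(struct model_stock *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Owner-side path: cache some pre-charged pages for @cpu. */
static void model_refill_stock(int cpu, unsigned int nr_pages)
{
	model_stock_lock(&stocks[cpu]);
	stocks[cpu].nr_pages += nr_pages;
	model_stock_unlock(&stocks[cpu]);
}

/* Remote drain: any CPU may empty @cpu's stock; returns pages released. */
static unsigned int model_drain_remote_stock(int cpu)
{
	unsigned int drained;

	model_stock_lock(&stocks[cpu]);
	drained = stocks[cpu].nr_pages;
	stocks[cpu].nr_pages = 0;
	model_stock_unlock(&stocks[cpu]);
	return drained;
}
```

The trade-off in this model is that the owner-side path now pays for an
atomic operation on every access, even when nobody drains remotely.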

Patch #3 implements the remote per-CPU cache drain, making use of both
patches #1 and #2. Performance-wise, in non-isolated scenarios, it should
introduce an extra function call and a single test to check if the CPU is
isolated.

In scenarios with isolation enabled at boot, it will also introduce an
extra test to check in the cpumask if the CPU is isolated. If it is,
there will also be an extra read of the cpumask to look for a
housekeeping CPU.
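
The extra checks described above can be sketched roughly as follows.
This is a userspace model under stated assumptions, not the patch
itself: isolation_enabled stands in for a boot-time switch,
cpu_isolated[] for the isolation cpumask, and model_drain_target() is a
hypothetical helper that returns the CPU where the drain should run. The
housekeeping lookup is simplified to "first non-isolated CPU".

```c
#include <stdbool.h>

#define NR_CPUS 8

static bool isolation_enabled;     /* models isolation being set at boot */
static bool cpu_isolated[NR_CPUS]; /* models the isolation cpumask */

/* Simplified lookup: first CPU that is not isolated, or -1 if none. */
static int model_first_housekeeping_cpu(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (!cpu_isolated[i])
			return i;
	return -1;
}

/* Returns the CPU that should drain the stock belonging to @cpu. */
static int model_drain_target(int cpu)
{
	/* Non-isolated boot: a single test, drain runs on @cpu itself. */
	if (!isolation_enabled)
		return cpu;

	/* Isolation enabled: one extra test against the cpumask. */
	if (!cpu_isolated[cpu])
		return cpu;

	/* Isolated CPU: one more cpumask read to find a housekeeping CPU. */
	return model_first_housekeeping_cpu();
}
```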

Please provide any feedback on this!
Thanks a lot for reading!

Leonardo Bras (3):
  sched/isolation: Add housekeeping_any_cpu_from()
  mm/memcontrol: Change stock_lock type from local_lock_t to spinlock_t
  mm/memcontrol: Add drain_remote_stock(), avoid drain_stock on isolated
    cpus

 include/linux/sched/isolation.h | 11 +++--
 kernel/sched/isolation.c        |  8 ++--
 mm/memcontrol.c                 | 83 ++++++++++++++++++++++-----------
 3 files changed, 69 insertions(+), 33 deletions(-)

-- 
2.38.1


Thread overview:
2022-11-02  2:02 [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus Leonardo Bras
2022-11-02  2:02 ` [PATCH v1 1/3] sched/isolation: Add housekeeping_any_cpu_from() Leonardo Bras
2022-11-02  2:02 ` [PATCH v1 2/3] mm/memcontrol: Change stock_lock type from local_lock_t to spinlock_t Leonardo Bras
2022-11-02  2:02 ` [PATCH v1 3/3] mm/memcontrol: Add drain_remote_stock(), avoid drain_stock on isolated cpus Leonardo Bras
2022-11-02  8:53 ` [PATCH v1 0/3] Avoid scheduling cache draining to " Michal Hocko
2022-11-03 14:59   ` Leonardo Brás
2022-11-03 15:31     ` Michal Hocko
2022-11-03 16:53       ` Leonardo Brás
2022-11-04  8:41         ` Michal Hocko
2022-11-05  1:45           ` Leonardo Brás
2022-11-07  8:10             ` Michal Hocko
2022-11-08 23:09               ` Leonardo Brás
2022-11-09  8:05                 ` Michal Hocko
2023-01-25  7:44                   ` Leonardo Brás
