linux-mm.kvack.org archive mirror
* [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads
@ 2023-03-17 13:44 Michal Hocko
  2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
  0 siblings, 2 replies; 20+ messages in thread
From: Michal Hocko @ 2023-03-17 13:44 UTC (permalink / raw)
  To: Andrew Morton, Leonardo Bras
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt,
	Muchun Song, LKML, linux-mm

Leonardo has reported [1] that pcp memcg charge draining can interfere
with cpu isolated workloads. The said draining is done from a WQ
context, with a pcp worker scheduled on each CPU that holds any cached
charges for a specific memcg hierarchy. The operation is not really
common [2]. It can be triggered from userspace though, so some care is
definitely due.

Leonardo has tried to address the issue by allowing remote charge
draining [3]. This approach requires additional locking to synchronize
pcp cache draining from a remote cpu with local pcp consumers. Even
though the proposed lock was per-cpu, there is still potential for
contention and less predictable behavior.

This patchset addresses the issue from a different angle. Rather than
dealing with a potential synchronization, cpus which are isolated are
simply never scheduled to be drained. This means that a small amount of
charges could be lying around waiting for a later use, or they are
flushed when a different memcg is charged from the same cpu. More
details are in patch 2. The first patch, from Frederic, implements an
abstraction to tell whether a specific cpu has been isolated and
therefore requires special treatment.

The patchset is based on top of Andrew's mm-unstable tree. I am not
sure which tree is the best to route both patches through, but unless
there are any special requirements for the cpu isolation parts, pushing
this via Andrew seems like the easiest choice.

Frederic Weisbecker (1):
      sched/isolation: Add cpu_is_isolated() API

Michal Hocko (1):
      memcg: do not drain charge pcp caches on remote isolated cpus

 include/linux/sched/isolation.h | 12 ++++++++++++
 mm/memcontrol.c                 |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

[1] https://lore.kernel.org/all/20221102020243.522358-1-leobras@redhat.com/T/#u
[2] https://lore.kernel.org/all/Y9LQ615H13RmG7wL@dhcp22.suse.cz/T/#u
[3] https://lore.kernel.org/all/20230125073502.743446-1-leobras@redhat.com/T/#u





* [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-17 13:44 [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads Michal Hocko
@ 2023-03-17 13:44 ` Michal Hocko
  2023-03-17 18:33   ` Marcelo Tosatti
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2023-03-17 13:44 UTC (permalink / raw)
  To: Andrew Morton, Leonardo Bras
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt,
	Muchun Song, LKML, linux-mm, Frederic Weisbecker, Michal Hocko

From: Frederic Weisbecker <frederic@kernel.org>

Provide a new API to check whether a CPU has been isolated, either
through the isolcpus= or the nohz_full= kernel parameter.

It aims at avoiding kernel load, such as pcp cache draining, on CPUs
running sensitive workloads that can't bear any disturbance.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/sched/isolation.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 8c15abd67aed..fe1a46f30d24 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -46,6 +46,12 @@ static inline bool housekeeping_enabled(enum hk_type type)
 
 static inline void housekeeping_affine(struct task_struct *t,
 				       enum hk_type type) { }
+
+static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
+{
+	return true;
+}
+
 static inline void housekeeping_init(void) { }
 #endif /* CONFIG_CPU_ISOLATION */
 
@@ -58,4 +64,10 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type type)
 	return true;
 }
 
+static inline bool cpu_is_isolated(int cpu)
+{
+	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
+		 !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
+}
+
 #endif /* _LINUX_SCHED_ISOLATION_H */
-- 
2.30.2




* [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-17 13:44 [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads Michal Hocko
  2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko
@ 2023-03-17 13:44 ` Michal Hocko
  2023-03-17 20:08   ` Shakeel Butt
                     ` (3 more replies)
  1 sibling, 4 replies; 20+ messages in thread
From: Michal Hocko @ 2023-03-17 13:44 UTC (permalink / raw)
  To: Andrew Morton, Leonardo Bras
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt,
	Muchun Song, LKML, linux-mm, Michal Hocko, Frederic Weisbecker

From: Michal Hocko <mhocko@suse.com>

Leonardo Bras has noticed that pcp charge cache draining might be
disruptive on workloads relying on 'isolated cpus', a feature commonly
used on workloads that are sensitive to interruption and context
switching such as vRAN and Industrial Control Systems.

There are essentially two ways to approach the issue. We can either
allow the pcp cache to be drained on a cpu other than the local one,
or avoid remote flushing on isolated cpus.

The current pcp charge cache is really optimized for high performance
and always sticks with its cpu. That means it only requires local_lock
(preempt_disable on !RT) and draining is handed over to the pcp WQ to
drain locally again.

The former solution (remote draining) would require adding locking to
prevent local charges from racing with the draining. This adds an
atomic operation to an otherwise simple arithmetic fast path in the
try_charge path. Another concern is that the remote draining can cause
lock contention for the isolated workloads and therefore interfere with
them indirectly via user space interfaces.

Another option is to avoid scheduling the draining on isolated cpus
altogether. That means that those remote cpus would keep their charges
even after drain_all_stock returns. This is certainly not optimal
either, but it shouldn't really cause any major problems. In the worst
case (many isolated cpus with charges - each of them with
MEMCG_CHARGE_BATCH, i.e. 64 pages) the memory consumption of a memcg
would be artificially higher than what can be immediately used from
other cpus.

Theoretically a memcg OOM killer could be triggered prematurely.
Currently it is not really clear whether this is a practical problem
though. A tight memcg limit would be really counterproductive for cpu
isolated workloads pretty much by definition, because any memory
reclaim induced by the memcg limit could break user space timing
expectations, as those workloads usually expect to execute in userspace
most of the time.

Also, charges could be left behind on memcg removal. Any future charge
on those isolated cpus will drain the pcp cache, so this won't be a
permanent leak.

Considering the cons and pros of both approaches, this patch implements
the second option and simply does not schedule remote draining if the
target cpu is isolated. This solution is much simpler. It doesn't add
any new locking and it is more predictable from the user space POV.
Should the premature memcg OOM become a real life problem, we can
revisit this decision.

Cc: Leonardo Brás <leobras@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0524add35cae..12559c08d976 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2366,7 +2366,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
-			else
+			else if (!cpu_is_isolated(cpu))
 				schedule_work_on(cpu, &stock->work);
 		}
 	}
-- 
2.30.2




* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko
@ 2023-03-17 18:33   ` Marcelo Tosatti
  2023-03-17 18:35     ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2023-03-17 18:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin,
	Shakeel Butt, Muchun Song, LKML, linux-mm, Frederic Weisbecker,
	Michal Hocko

On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote:
> From: Frederic Weisbecker <frederic@kernel.org>
> 
> Provide a new API to check whether a CPU has been isolated, either
> through the isolcpus= or the nohz_full= kernel parameter.
> 
> It aims at avoiding kernel load, such as pcp cache draining, on CPUs
> running sensitive workloads that can't bear any disturbance.

Hi Michal,

This makes no sense to me.

HK_TYPE_DOMAIN is set when isolcpus=domain is configured.
HK_TYPE_TICK is set when nohz_full= is configured.

The use-cases I am aware of use either:

isolcpus=managed_irq,... nohz_full=
OR
isolcpus=domain,managed_irq,... nohz_full=

So what is the point of this function again?

Perhaps it made sense along with the patch below, but now it does not
make sense anymore:

Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag

The individual isolation features turned on by nohz_full were initially
split in order for each of them to be tunable through cpusets. However
plans have changed in favour of an interface (be it cpusets or sysctl)
grouping all these features to be turned on/off altogether. Then should
the need ever arise, the interface can still be expanded to handle the
individual isolation features.

But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK), and
the conversion of nohz_full features into a common housekeeping flag
can convert that to something else later?



> 
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/sched/isolation.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 8c15abd67aed..fe1a46f30d24 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -46,6 +46,12 @@ static inline bool housekeeping_enabled(enum hk_type type)
>  
>  static inline void housekeeping_affine(struct task_struct *t,
>  				       enum hk_type type) { }
> +
> +static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
> +{
> +	return true;
> +}
> +
>  static inline void housekeeping_init(void) { }
>  #endif /* CONFIG_CPU_ISOLATION */
>  
> @@ -58,4 +64,10 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type type)
>  	return true;
>  }
>  
> +static inline bool cpu_is_isolated(int cpu)
> +{
> +	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
> +		 !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
> +}
> +
>  #endif /* _LINUX_SCHED_ISOLATION_H */
> -- 
> 2.30.2
> 
> 




* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-17 18:33   ` Marcelo Tosatti
@ 2023-03-17 18:35     ` Marcelo Tosatti
  2023-03-18  8:04       ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2023-03-17 18:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin,
	Shakeel Butt, Muchun Song, LKML, linux-mm, Frederic Weisbecker,
	Michal Hocko

On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote:
> On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote:
> > From: Frederic Weisbecker <frederic@kernel.org>
> > 
> > Provide a new API to check whether a CPU has been isolated, either
> > through the isolcpus= or the nohz_full= kernel parameter.
> > 
> > It aims at avoiding kernel load, such as pcp cache draining, on CPUs
> > running sensitive workloads that can't bear any disturbance.
> 
> Hi Michal,
> 
> This makes no sense to me.
> 
> HK_TYPE_DOMAIN is set when isolcpus=domain is configured.
> HK_TYPE_TICK is set when nohz_full= is configured.
> 
> The use-cases I am aware of use either:
> 
> isolcpus=managed_irq,... nohz_full=
> OR
> isolcpus=domain,managed_irq,... nohz_full=
> 
> So what is the point of this function again?
> 
> Perhaps it made sense along with the patch below, but now it does not
> make sense anymore:
> 
> Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag
> 
> The individual isolation features turned on by nohz_full were initially
> split in order for each of them to be tunable through cpusets. However
> plans have changed in favour of an interface (be it cpusets or sysctl)
> grouping all these features to be turned on/off altogether. Then should
> the need ever arise, the interface can still be expanded to handle the
> individual isolation features.
> 
> But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK), and
> the conversion of nohz_full features into a common housekeeping flag
> can convert that to something else later?

Actually introducing cpu_is_isolated() seems fine, but it can call
housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.





* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
@ 2023-03-17 20:08   ` Shakeel Butt
  2023-03-17 21:51   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Shakeel Butt @ 2023-03-17 20:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti,
	Johannes Weiner, Roman Gushchin, Muchun Song, LKML, linux-mm,
	Michal Hocko, Frederic Weisbecker

On Fri, Mar 17, 2023 at 6:44 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> Leonardo Bras has noticed that pcp charge cache draining might be
> disruptive on workloads relying on 'isolated cpus', a feature commonly
> used on workloads that are sensitive to interruption and context
> switching such as vRAN and Industrial Control Systems.
>
> There are essentially two ways to approach the issue. We can either
> allow the pcp cache to be drained on a cpu other than the local one,
> or avoid remote flushing on isolated cpus.
>
> The current pcp charge cache is really optimized for high performance
> and always sticks with its cpu. That means it only requires local_lock
> (preempt_disable on !RT) and draining is handed over to the pcp WQ to
> drain locally again.
>
> The former solution (remote draining) would require adding locking to
> prevent local charges from racing with the draining. This adds an
> atomic operation to an otherwise simple arithmetic fast path in the
> try_charge path. Another concern is that the remote draining can cause
> lock contention for the isolated workloads and therefore interfere with
> them indirectly via user space interfaces.
>
> Another option is to avoid scheduling the draining on isolated cpus
> altogether. That means that those remote cpus would keep their charges
> even after drain_all_stock returns. This is certainly not optimal
> either, but it shouldn't really cause any major problems. In the worst
> case (many isolated cpus with charges - each of them with
> MEMCG_CHARGE_BATCH, i.e. 64 pages) the memory consumption of a memcg
> would be artificially higher than what can be immediately used from
> other cpus.
>
> Theoretically a memcg OOM killer could be triggered prematurely.
> Currently it is not really clear whether this is a practical problem
> though. A tight memcg limit would be really counterproductive for cpu
> isolated workloads pretty much by definition, because any memory
> reclaim induced by the memcg limit could break user space timing
> expectations, as those workloads usually expect to execute in userspace
> most of the time.
>
> Also, charges could be left behind on memcg removal. Any future charge
> on those isolated cpus will drain the pcp cache, so this won't be a
> permanent leak.
>
> Considering the cons and pros of both approaches, this patch implements
> the second option and simply does not schedule remote draining if the
> target cpu is isolated. This solution is much simpler. It doesn't add
> any new locking and it is more predictable from the user space POV.
> Should the premature memcg OOM become a real life problem, we can
> revisit this decision.
>
> Cc: Leonardo Brás <leobras@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Reported-by: Leonardo Bras <leobras@redhat.com>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Shakeel Butt <shakeelb@google.com>



* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
  2023-03-17 20:08   ` Shakeel Butt
@ 2023-03-17 21:51   ` kernel test robot
  2023-03-17 22:22   ` kernel test robot
  2023-03-18  3:23   ` Hillf Danton
  3 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2023-03-17 21:51 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton, Leonardo Bras
  Cc: llvm, oe-kbuild-all, Linux Memory Management List,
	Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
	Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt,
	Muchun Song, LKML, Michal Hocko

Hi Michal,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org
patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230318/202303180520.5lKAJwrg-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
        git checkout b376cfcf40a43276e1280950bb926fdf3521940a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303180520.5lKAJwrg-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/memcontrol.c:2369:14: error: implicit declaration of function 'cpu_is_isolated' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                           else if (!cpu_is_isolated(cpu))
                                     ^
   1 error generated.


vim +/cpu_is_isolated +2369 mm/memcontrol.c

  2331	
  2332	/*
  2333	 * Drains all per-CPU charge caches for given root_memcg resp. subtree
  2334	 * of the hierarchy under it.
  2335	 */
  2336	static void drain_all_stock(struct mem_cgroup *root_memcg)
  2337	{
  2338		int cpu, curcpu;
  2339	
  2340		/* If someone's already draining, avoid adding running more workers. */
  2341		if (!mutex_trylock(&percpu_charge_mutex))
  2342			return;
  2343		/*
  2344		 * Notify other cpus that system-wide "drain" is running
  2345		 * We do not care about races with the cpu hotplug because cpu down
  2346		 * as well as workers from this path always operate on the local
  2347		 * per-cpu data. CPU up doesn't touch memcg_stock at all.
  2348		 */
  2349		migrate_disable();
  2350		curcpu = smp_processor_id();
  2351		for_each_online_cpu(cpu) {
  2352			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
  2353			struct mem_cgroup *memcg;
  2354			bool flush = false;
  2355	
  2356			rcu_read_lock();
  2357			memcg = stock->cached;
  2358			if (memcg && stock->nr_pages &&
  2359			    mem_cgroup_is_descendant(memcg, root_memcg))
  2360				flush = true;
  2361			else if (obj_stock_flush_required(stock, root_memcg))
  2362				flush = true;
  2363			rcu_read_unlock();
  2364	
  2365			if (flush &&
  2366			    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
  2367				if (cpu == curcpu)
  2368					drain_local_stock(&stock->work);
> 2369				else if (!cpu_is_isolated(cpu))
  2370					schedule_work_on(cpu, &stock->work);
  2371			}
  2372		}
  2373		migrate_enable();
  2374		mutex_unlock(&percpu_charge_mutex);
  2375	}
  2376	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
  2023-03-17 20:08   ` Shakeel Butt
  2023-03-17 21:51   ` kernel test robot
@ 2023-03-17 22:22   ` kernel test robot
  2023-03-17 23:32     ` Andrew Morton
  2023-03-18  3:23   ` Hillf Danton
  3 siblings, 1 reply; 20+ messages in thread
From: kernel test robot @ 2023-03-17 22:22 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton, Leonardo Bras
  Cc: oe-kbuild-all, Linux Memory Management List, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti,
	Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML,
	Michal Hocko

Hi Michal,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org
patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20230318/202303180617.7E3aIlHf-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
        git checkout b376cfcf40a43276e1280950bb926fdf3521940a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303180617.7E3aIlHf-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/memcontrol.c: In function 'drain_all_stock':
>> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
    2369 |                         else if (!cpu_is_isolated(cpu))
         |                                   ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/cpu_is_isolated +2369 mm/memcontrol.c

  2331	
  2332	/*
  2333	 * Drains all per-CPU charge caches for given root_memcg resp. subtree
  2334	 * of the hierarchy under it.
  2335	 */
  2336	static void drain_all_stock(struct mem_cgroup *root_memcg)
  2337	{
  2338		int cpu, curcpu;
  2339	
  2340		/* If someone's already draining, avoid adding running more workers. */
  2341		if (!mutex_trylock(&percpu_charge_mutex))
  2342			return;
  2343		/*
  2344		 * Notify other cpus that system-wide "drain" is running
  2345		 * We do not care about races with the cpu hotplug because cpu down
  2346		 * as well as workers from this path always operate on the local
  2347		 * per-cpu data. CPU up doesn't touch memcg_stock at all.
  2348		 */
  2349		migrate_disable();
  2350		curcpu = smp_processor_id();
  2351		for_each_online_cpu(cpu) {
  2352			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
  2353			struct mem_cgroup *memcg;
  2354			bool flush = false;
  2355	
  2356			rcu_read_lock();
  2357			memcg = stock->cached;
  2358			if (memcg && stock->nr_pages &&
  2359			    mem_cgroup_is_descendant(memcg, root_memcg))
  2360				flush = true;
  2361			else if (obj_stock_flush_required(stock, root_memcg))
  2362				flush = true;
  2363			rcu_read_unlock();
  2364	
  2365			if (flush &&
  2366			    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
  2367				if (cpu == curcpu)
  2368					drain_local_stock(&stock->work);
> 2369				else if (!cpu_is_isolated(cpu))
  2370					schedule_work_on(cpu, &stock->work);
  2371			}
  2372		}
  2373		migrate_enable();
  2374		mutex_unlock(&percpu_charge_mutex);
  2375	}
  2376	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-17 22:22   ` kernel test robot
@ 2023-03-17 23:32     ` Andrew Morton
  2023-03-18  8:03       ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2023-03-17 23:32 UTC (permalink / raw)
  To: kernel test robot
  Cc: Michal Hocko, Leonardo Bras, oe-kbuild-all,
	Linux Memory Management List, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti,
	Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML,
	Michal Hocko

On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote:

>    mm/memcontrol.c: In function 'drain_all_stock':
> >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
>     2369 |                         else if (!cpu_is_isolated(cpu))
>          |                                   ^~~~~~~~~~~~~~~

Thanks.

--- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix
+++ a/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/sched/isolation.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
_




* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
                     ` (2 preceding siblings ...)
  2023-03-17 22:22   ` kernel test robot
@ 2023-03-18  3:23   ` Hillf Danton
  2023-03-18  8:08     ` Michal Hocko
  3 siblings, 1 reply; 20+ messages in thread
From: Hillf Danton @ 2023-03-18  3:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker, LKML,
	linux-mm, Michal Hocko, Frederic Weisbecker

On 17 Mar 2023 14:44:48 +0100 Michal Hocko <mhocko@suse.com>
> Leonardo Bras has noticed that pcp charge cache draining might be
> disruptive on workloads relying on 'isolated cpus', a feature commonly
> used on workloads that are sensitive to interruption and context
> switching such as vRAN and Industrial Control Systems.
> 
> There are essentially two ways to approach the issue. We can either
> allow the pcp cache to be drained on a cpu other than the local one,
> or avoid remote flushing on isolated cpus.
> 
> The current pcp charge cache is really optimized for high performance
> and always sticks with its cpu. That means it only requires local_lock
> (preempt_disable on !RT) and draining is handed over to the pcp WQ to
> drain locally again.
> 
> The former solution (remote draining) would require adding locking to
> prevent local charges from racing with the draining. This adds an
> atomic operation to an otherwise simple arithmetic fast path in the
> try_charge path. Another concern is that the remote draining can cause
> lock contention for the isolated workloads and therefore interfere with
> them indirectly via user space interfaces.
> 
> Another option is to avoid scheduling the draining on isolated cpus
> altogether. That means that those remote cpus would keep their charges
> even after drain_all_stock returns. This is certainly not optimal
> either, but it shouldn't really cause any major problems. In the worst
> case (many isolated cpus with charges - each of them with
> MEMCG_CHARGE_BATCH, i.e. 64 pages) the memory consumption of a memcg
> would be artificially higher than what can be immediately used from
> other cpus.
> 
> Theoretically a memcg OOM killer could be triggered prematurely.
> Currently it is not really clear whether this is a practical problem
> though. A tight memcg limit would be really counterproductive for cpu
> isolated workloads pretty much by definition, because any memory
> reclaim induced by the memcg limit could break user space timing
> expectations, as those workloads usually expect to execute in userspace
> most of the time.
> 
> Also, charges could be left behind on memcg removal. Any future charge
> on those isolated cpus will drain the pcp cache, so this won't be a
> permanent leak.
> 
> Considering the cons and pros of both approaches, this patch implements
> the second option and simply does not schedule remote draining if the
> target cpu is isolated. This solution is much simpler. It doesn't add
> any new locking and it is more predictable from the user space POV.
> Should the premature memcg OOM become a real life problem, we can
> revisit this decision.

JFYI feel free to take a look at the non-housekeeping CPUs [1].

[1] https://lore.kernel.org/lkml/20230223150624.GA29739@lst.de/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-17 23:32     ` Andrew Morton
@ 2023-03-18  8:03       ` Michal Hocko
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2023-03-18  8:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kernel test robot, Leonardo Bras, oe-kbuild-all,
	Linux Memory Management List, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti,
	Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML

On Fri 17-03-23 16:32:29, Andrew Morton wrote:
> On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote:
> 
> >    mm/memcontrol.c: In function 'drain_all_stock':
> > >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
> >     2369 |                         else if (!cpu_is_isolated(cpu))
> >          |                                   ^~~~~~~~~~~~~~~
> 
> Thanks.
> 
> --- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix
> +++ a/mm/memcontrol.c
> @@ -63,6 +63,7 @@
>  #include <linux/resume_user_mode.h>
>  #include <linux/psi.h>
>  #include <linux/seq_buf.h>
> +#include <linux/sched/isolation.h>
>  #include "internal.h"
>  #include <net/sock.h>
>  #include <net/ip.h>

Thanks a lot Andrew!
> _

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-17 18:35     ` Marcelo Tosatti
@ 2023-03-18  8:04       ` Michal Hocko
  2023-03-24 22:35         ` Frederic Weisbecker
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2023-03-18  8:04 UTC (permalink / raw)
  To: Marcelo Tosatti, Frederic Weisbecker
  Cc: Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner,
	Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML,
	linux-mm, Frederic Weisbecker

On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
> On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote:
> > On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote:
> > > From: Frederic Weisbecker <frederic@kernel.org>
> > > 
> > > Provide this new API to check if a CPU has been isolated either through
> > > isolcpus= or nohz_full= kernel parameter.
> > > 
> > > It aims at avoiding kernel load deemed to be safely spared on CPUs
> > > running a sensitive workload that cannot bear any disturbance, such
> > > as pcp cache draining.
> > 
> > Hi Michal,
> > 
> > This makes no sense to me.
> > 
> > HK_TYPE_DOMAIN is set when isolcpus=domain is configured.
> > HK_TYPE_TICK is set when nohz_full= is configured.
> > 
> > The use-cases I am aware of use either:
> > 
> > isolcpus=managed_irq,... nohz_full=
> > OR
> > isolcpus=domain,managed_irq,... nohz_full=
> > 
> > So what is the point of this function again?
> > 
> > Perhaps it made sense along with the patch below, but now it does
> > not make sense anymore:
> > 
> > Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag
> > 
> > The individual isolation features turned on by nohz_full were initially
> > split in order for each of them to be tunable through cpusets. However
> > plans have changed in favour of an interface (be it cpusets or sysctl)
> > grouping all these features to be turned on/off altogether. Then should
> > the need ever arise, the interface can still be expanded to handle the
> > individual isolation features.
> > 
> > But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK), and
> > the conversion of nohz_full features into a common housekeeping flag
> > can convert that to something else later?
> 
> Actually introducing cpu_is_isolated() seems fine, but it can call
> housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
 
This is not really my area. Frederic, could you have a look please?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
  2023-03-18  3:23   ` Hillf Danton
@ 2023-03-18  8:08     ` Michal Hocko
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2023-03-18  8:08 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker, LKML,
	linux-mm, Frederic Weisbecker

On Sat 18-03-23 11:23:50, Hillf Danton wrote:
> On 17 Mar 2023 14:44:48 +0100 Michal Hocko <mhocko@suse.com> wrote:
> > [...]
> 
> JFYI feel free to take a look at the non-housekeeping CPUs [1].
> 
> [1] https://lore.kernel.org/lkml/20230223150624.GA29739@lst.de/

Such an approach would require remote draining, and I hope I have
explained why that is not the preferred way in this case. Other than
that, I do agree with Christoph that a generic approach would be really
nice.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-18  8:04       ` Michal Hocko
@ 2023-03-24 22:35         ` Frederic Weisbecker
  2023-03-27 10:24           ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-03-24 22:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Marcelo Tosatti, Frederic Weisbecker, Andrew Morton,
	Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm

Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
> > On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote:
> > > [...]
> > > 
> > > So what is the point of this function again?
> > > 
> > > [...]
> > 
> > Actually introducing cpu_is_isolated() seems fine, but it can call
> > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
>  
> This is not really my area. Frederic, could you have a look please?

The point is to have a function that tells if either nohz_full= or
isolcpus=[domain] has been passed for the given CPU.

Because I assumed that both would be interested in avoiding that flush
noise, wouldn't it be the case?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-24 22:35         ` Frederic Weisbecker
@ 2023-03-27 10:24           ` Marcelo Tosatti
  2023-03-28 11:38             ` Frederic Weisbecker
  2023-03-28 11:48             ` Michal Hocko
  0 siblings, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2023-03-27 10:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Michal Hocko, Frederic Weisbecker, Andrew Morton, Leonardo Bras,
	Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin,
	Shakeel Butt, Muchun Song, LKML, linux-mm

On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote:
> Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
> > > [...]
> > > 
> > > Actually introducing cpu_is_isolated() seems fine, but it can call
> > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
> >  
> > This is not really my area. Frederic, could you have a look please?
> 
> The point is to have a function that tells if either nohz_full= or
> isolcpus=[domain] has been passed for the given CPU.
> 
> Because I assumed that both would be interested in avoiding that flush
> noise, wouldn't it be the case?

Yes, that is the case. But as a note: for the two main types of
configuration performed (one uses isolcpus=[domain] and the other uses
cgroups for isolating processes), nohz_full= is always set.

So just testing for nohz_full= would be sufficient (which perhaps would
make the code simpler).



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-27 10:24           ` Marcelo Tosatti
@ 2023-03-28 11:38             ` Frederic Weisbecker
  2023-03-28 11:48             ` Michal Hocko
  1 sibling, 0 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2023-03-28 11:38 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Michal Hocko, Frederic Weisbecker, Andrew Morton, Leonardo Bras,
	Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin,
	Shakeel Butt, Muchun Song, LKML, linux-mm

On Mon, Mar 27, 2023 at 07:24:54AM -0300, Marcelo Tosatti wrote:
> On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote:
> > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> > > [...]
> > 
> > The point is to have a function that tells if either nohz_full= or
> > isolcpus=[domain] has been passed for the given CPU.
> > 
> > Because I assumed that both would be interested in avoiding that flush
> > noise, wouldn't it be the case?
> 
> Yes, that is the case. But as a note: for the two main types of
> configuration performed (one uses isolcpus=[domain] and the other
> cgroups, for isolating processes) nohz_full= is always set.
> 
> So just testing for nohz_full= would be sufficient (which perhaps would
> make the code simpler).

Ok then all that is needed is to test tick_nohz_full_cpu(target), right?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-27 10:24           ` Marcelo Tosatti
  2023-03-28 11:38             ` Frederic Weisbecker
@ 2023-03-28 11:48             ` Michal Hocko
  2023-03-29 14:20               ` Marcelo Tosatti
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2023-03-28 11:48 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton,
	Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm

On Mon 27-03-23 07:24:54, Marcelo Tosatti wrote:
> On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote:
> > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
[...]
> > > > Actually introducing cpu_is_isolated() seems fine, but it can call
> > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
> > >  
> > > This is not really my area. Frederic, could you have a look please?
> > 
> > The point is to have a function that tells if either nohz_full= or
> > isolcpus=[domain] has been passed for the given CPU.
> > 
> > Because I assumed that both would be interested in avoiding that flush
> > noise, wouldn't it be the case?
> 
> Yes, that is the case. But as a note: for the two main types of
> configuration performed (one uses isolcpus=[domain] and the other
> cgroups, for isolating processes) nohz_full= is always set.
> 
> So just testing for nohz_full= would be sufficient (which perhaps would
> make the code simpler).

I do not see any mention of that assumption under Documentation/. Is
this a best practice documented anywhere, or does it just happen to be
the case with the workloads you deal with?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-28 11:48             ` Michal Hocko
@ 2023-03-29 14:20               ` Marcelo Tosatti
  2023-03-30 13:28                 ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2023-03-29 14:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton,
	Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm

On Tue, Mar 28, 2023 at 01:48:02PM +0200, Michal Hocko wrote:
> On Mon 27-03-23 07:24:54, Marcelo Tosatti wrote:
> > On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote:
> > > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> > > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
> [...]
> > > > > Actually introducing cpu_is_isolated() seems fine, but it can call
> > > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
> > > >  
> > > > This is not really my area. Frederic, could you have a look please?
> > > 
> > > The point is to have a function that tells if either nohz_full= or
> > > isolcpus=[domain] has been passed for the given CPU.
> > > 
> > > Because I assumed that both would be interested in avoiding that flush
> > > noise, wouldn't it be the case?
> > 
> > Yes, that is the case. But as a note: for the two main types of
> > configuration performed (one uses isolcpus=[domain] and the other
> > cgroups, for isolating processes) nohz_full= is always set.
> > 
> > So just testing for nohz_full= would be sufficient (which perhaps would
> > make the code simpler).
> 
> I do not see any mention about that assumption under Documentation/. 

Documentation/admin-guide/kernel-per-CPU-kthreads.rst

SCHED_SOFTIRQ
-------------

Do all of the following:

1.      Avoid sending scheduler IPIs to the CPU to be de-jittered,
        for example, ensure that at most one runnable kthread is present
        on that CPU.  If a thread that expects to run on the de-jittered
        CPU awakens, the scheduler will send an IPI that can result in
        a subsequent SCHED_SOFTIRQ.
2.      CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
        is marked as an adaptive-ticks CPU using the "nohz_full="
        boot parameter.  This reduces the number of scheduler-clock
        interrupts that the de-jittered CPU receives, minimizing its
        chances of being selected to do the load balancing work that
        runs in SCHED_SOFTIRQ context.

> Is this a best practice documented anywhere or it just happens to be
> the case with workloads you deal with?

Option 2. However, Frederic seems interested in matching the exported
toggles with the known use-case classes.

For example, for this guide:
http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index

Using nohz_full= would be a benefit (and it is not currently set,
perhaps due to not knowing all the options?).

AFAIU the workloads for which disabling nohz_full= is a benefit are
those where switching between nohz_full mode and sched-tick-enabled
mode (which involves reprogramming the local timer) happens often and
is therefore worth avoiding? For example, switching between 1 runnable
task and more than 1 runnable task (and vice versa).



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-29 14:20               ` Marcelo Tosatti
@ 2023-03-30 13:28                 ` Michal Hocko
  2023-03-30 15:21                   ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2023-03-30 13:28 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton,
	Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm

On Wed 29-03-23 11:20:21, Marcelo Tosatti wrote:
> On Tue, Mar 28, 2023 at 01:48:02PM +0200, Michal Hocko wrote:
> > On Mon 27-03-23 07:24:54, Marcelo Tosatti wrote:
> > > On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote:
> > > > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> > > > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
> > [...]
> > > > > > Actually introducing cpu_is_isolated() seems fine, but it can call
> > > > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
> > > > >  
> > > > > This is not really my area. Frederic, could you have a look please?
> > > > 
> > > > The point is to have a function that tells if either nohz_full= or
> > > > isolcpus=[domain] has been passed for the given CPU.
> > > > 
> > > > Because I assumed that both would be interested in avoiding that flush
> > > > noise, wouldn't it be the case?
> > > 
> > > Yes, that is the case. But as a note: for the two main types of
> > > configuration performed (one uses isolcpus=[domain] and the other
> > > cgroups, for isolating processes) nohz_full= is always set.
> > > 
> > > So just testing for nohz_full= would be sufficient (which perhaps would
> > > make the code simpler).
> > 
> > I do not see any mention about that assumption under Documentation/. 
> 
> Documentation/admin-guide/kernel-per-CPU-kthreads.rst
> 
> [...]

Quite hidden and easy to miss if you are only aware of isolcpus.

> > Is this a best practice documented anywhere or it just happens to be
> > the case with workloads you deal with?
> 
> Option 2. However Frederic seems interested in matching the exported
> toggles with the known use-cases classes.
> 
> For example, for this guide:
> http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index
> 
> Using nohz_full= would be a benefit (and it is not currently set,
> perhaps due to not knowing all the options?).
> 
> 
> AFAIU the workloads for which disabling nohz_full= is a benefit are
> those where switching between nohz_full mode and sched-tick-enabled
> mode (which involves reprogramming the local timer) happens often and
> is therefore worth avoiding? For example, switching between 1 runnable
> task and more than 1 runnable task (and vice versa).

The patch from Frederic is testing for both. You seem to be arguing for
reducing the test, and I still do not understand why. Sure, some
workloads (following the above) will likely use nohz_full= as well, but
does it make sense to build that expectation into the higher-level
logic? What is the actual benefit?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-30 13:28                 ` Michal Hocko
@ 2023-03-30 15:21                   ` Marcelo Tosatti
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2023-03-30 15:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton,
	Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm

On Thu, Mar 30, 2023 at 03:28:52PM +0200, Michal Hocko wrote:
> [...]
> 
> The patch from Frederic is testing for both. You seem to be arguing to
> reduce the test and I still do not understand why. Sure some workloads
> (following the above) will likely use nohz_full= as well but does it
> make sense to build that expectation into the higher level logic? What
> is an actual benefit?

Just thinking of simpler code. Feel free to maintain the patch as-is if
you see fit.



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-03-30 18:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17 13:44 [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads Michal Hocko
2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko
2023-03-17 18:33   ` Marcelo Tosatti
2023-03-17 18:35     ` Marcelo Tosatti
2023-03-18  8:04       ` Michal Hocko
2023-03-24 22:35         ` Frederic Weisbecker
2023-03-27 10:24           ` Marcelo Tosatti
2023-03-28 11:38             ` Frederic Weisbecker
2023-03-28 11:48             ` Michal Hocko
2023-03-29 14:20               ` Marcelo Tosatti
2023-03-30 13:28                 ` Michal Hocko
2023-03-30 15:21                   ` Marcelo Tosatti
2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
2023-03-17 20:08   ` Shakeel Butt
2023-03-17 21:51   ` kernel test robot
2023-03-17 22:22   ` kernel test robot
2023-03-17 23:32     ` Andrew Morton
2023-03-18  8:03       ` Michal Hocko
2023-03-18  3:23   ` Hillf Danton
2023-03-18  8:08     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).