* [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads
@ 2023-03-17 13:44 Michal Hocko
  2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
  0 siblings, 2 replies; 20+ messages in thread
From: Michal Hocko @ 2023-03-17 13:44 UTC (permalink / raw)
  To: Andrew Morton, Leonardo Bras
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
      Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt,
      Muchun Song, LKML, linux-mm

Leonardo has reported [1] that pcp memcg charge draining can interfere
with cpu isolated workloads. The said draining is done from a WQ context
with a pcp worker scheduled on each CPU which holds any cached charges
for a specific memcg hierarchy. Draining is not a common operation [2],
but it can be triggered from userspace, so some care is definitely due.

Leonardo has tried to address the issue by allowing remote charge
draining [3]. This approach requires additional locking to synchronize
the pcp cache flushing from a remote cpu with local pcp consumers. Even
though the proposed lock was per-cpu, there is still potential for
contention and less predictable behavior.

This patchset addresses the issue from a different angle. Rather than
dealing with potential synchronization, cpus which are isolated are
simply never scheduled to be drained. This means that a small amount of
charges could be lying around waiting for a later use, or they are
flushed when a different memcg is charged from the same cpu. More
details are in patch 2.

The first patch, from Frederic, implements an abstraction to tell
whether a specific cpu has been isolated and therefore requires special
treatment.

The patchset is on top of Andrew's mm-unstable tree.
I am not sure which tree is the best to route both of them but unless
there are any special requirements for the cpu isolation parts then
pushing this via Andrew seems like the easiest choice.

Frederic Weisbecker (1):
      sched/isolation: Add cpu_is_isolated() API

Michal Hocko (1):
      memcg: do not drain charge pcp caches on remote isolated cpus

 include/linux/sched/isolation.h | 12 ++++++++++++
 mm/memcontrol.c                 |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

[1] https://lore.kernel.org/all/20221102020243.522358-1-leobras@redhat.com/T/#u
[2] https://lore.kernel.org/all/Y9LQ615H13RmG7wL@dhcp22.suse.cz/T/#u
[3] https://lore.kernel.org/all/20230125073502.743446-1-leobras@redhat.com/T/#u

^ permalink raw reply	[flat|nested] 20+ messages in thread
* [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-17 13:44 [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads Michal Hocko
@ 2023-03-17 13:44 ` Michal Hocko
  2023-03-17 18:33   ` Marcelo Tosatti
  2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2023-03-17 13:44 UTC (permalink / raw)
  To: Andrew Morton, Leonardo Bras
  Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner,
      Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt,
      Muchun Song, LKML, linux-mm, Frederic Weisbecker, Michal Hocko

From: Frederic Weisbecker <frederic@kernel.org>

Provide this new API to check if a CPU has been isolated either through
isolcpus= or nohz_full= kernel parameter.

It aims at avoiding kernel load deemed to be safely spared on CPUs
running sensitive workloads that can't bear any disturbance, such as
pcp cache draining.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/sched/isolation.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 8c15abd67aed..fe1a46f30d24 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -46,6 +46,12 @@ static inline bool housekeeping_enabled(enum hk_type type)
 
 static inline void housekeeping_affine(struct task_struct *t,
 				       enum hk_type type) { }
+
+static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
+{
+	return true;
+}
+
 static inline void housekeeping_init(void) { }
 #endif /* CONFIG_CPU_ISOLATION */
 
@@ -58,4 +64,10 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type type)
 	return true;
 }
 
+static inline bool cpu_is_isolated(int cpu)
+{
+	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
+	       !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
+}
+
 #endif /* _LINUX_SCHED_ISOLATION_H */
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko
@ 2023-03-17 18:33   ` Marcelo Tosatti
  2023-03-17 18:35     ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2023-03-17 18:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker, Peter Zijlstra,
      Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt,
      Muchun Song, LKML, linux-mm, Frederic Weisbecker, Michal Hocko

On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote:
> From: Frederic Weisbecker <frederic@kernel.org>
> 
> Provide this new API to check if a CPU has been isolated either through
> isolcpus= or nohz_full= kernel parameter.
> 
> It aims at avoiding kernel load deemed to be safely spared on CPUs
> running sensitive workload that can't bear any disturbance, such as
> pcp cache draining.

Hi Michal,

This makes no sense to me.

HK_TYPE_DOMAIN is set when isolcpus=domain is configured.
HK_TYPE_TICK is set when nohz_full= is configured.

The use-cases I am aware of use either:

isolcpus=managed_irq,... nohz_full=
OR
isolcpus=domain,managed_irq,... nohz_full=

So what is the point of this function again?

Perhaps it made sense alongside the following patch, but it no longer
does:

Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag

    The individual isolation features turned on by nohz_full were initially
    split in order for each of them to be tunable through cpusets. However
    plans have changed in favour of an interface (be it cpusets or sysctl)
    grouping all these features to be turned on/off altogether. Then should
    the need ever arise, the interface can still be expanded to handle the
    individual isolation features.

But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK) and
the conversion of nohz_full features into a common housekeeping flag
can convert that to something else later?
> 
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/sched/isolation.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 8c15abd67aed..fe1a46f30d24 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -46,6 +46,12 @@ static inline bool housekeeping_enabled(enum hk_type type)
>  
>  static inline void housekeeping_affine(struct task_struct *t,
>  				       enum hk_type type) { }
> +
> +static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
> +{
> +	return true;
> +}
> +
>  static inline void housekeeping_init(void) { }
>  #endif /* CONFIG_CPU_ISOLATION */
> 
> @@ -58,4 +64,10 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type type)
>  	return true;
>  }
> 
> +static inline bool cpu_is_isolated(int cpu)
> +{
> +	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
> +	       !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
> +}
> +
>  #endif /* _LINUX_SCHED_ISOLATION_H */
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-17 18:33 ` Marcelo Tosatti @ 2023-03-17 18:35 ` Marcelo Tosatti 2023-03-18 8:04 ` Michal Hocko 0 siblings, 1 reply; 20+ messages in thread From: Marcelo Tosatti @ 2023-03-17 18:35 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm, Frederic Weisbecker, Michal Hocko On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote: > On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote: > > From: Frederic Weisbecker <frederic@kernel.org> > > > > Provide this new API to check if a CPU has been isolated either through > > isolcpus= or nohz_full= kernel parameter. > > > > It aims at avoiding kernel load deemed to be safely spared on CPUs > > running sensitive workload that can't bear any disturbance, such as > > pcp cache draining. > > Hi Michal, > > This makes no sense to me. > > HK_TYPE_DOMAIN is set when isolcpus=domain is configured. > HK_TYPE_TICK is set when nohz_full= is configured. > > The use-cases i am aware of use either: > > isolcpus=managed_irq,... nohz_full= > OR > isolcpus=domain,managed_irq,... nohz_full= > > So what is the point of this function again? > > Perhaps it made sense along with, but now does not make sense > anymore: > > Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag > > The individual isolation features turned on by nohz_full were initially > split in order for each of them to be tunable through cpusets. However > plans have changed in favour of an interface (be it cpusets or sysctl) > grouping all these features to be turned on/off altogether. Then should > the need ever arise, the interface can still be expanded to handle the > individual isolation features. 
> 
> But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK) and
> the convertion of nohz_full features into a common housekeeping flag
> can convert that to something else later?

Actually introducing cpu_is_isolated() seems fine, but it can call
housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-17 18:35 ` Marcelo Tosatti @ 2023-03-18 8:04 ` Michal Hocko 2023-03-24 22:35 ` Frederic Weisbecker 0 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2023-03-18 8:04 UTC (permalink / raw) To: Marcelo Tosatti, Frederic Weisbecker Cc: Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm, Frederic Weisbecker On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote: > On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote: > > On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote: > > > From: Frederic Weisbecker <frederic@kernel.org> > > > > > > Provide this new API to check if a CPU has been isolated either through > > > isolcpus= or nohz_full= kernel parameter. > > > > > > It aims at avoiding kernel load deemed to be safely spared on CPUs > > > running sensitive workload that can't bear any disturbance, such as > > > pcp cache draining. > > > > Hi Michal, > > > > This makes no sense to me. > > > > HK_TYPE_DOMAIN is set when isolcpus=domain is configured. > > HK_TYPE_TICK is set when nohz_full= is configured. > > > > The use-cases i am aware of use either: > > > > isolcpus=managed_irq,... nohz_full= > > OR > > isolcpus=domain,managed_irq,... nohz_full= > > > > So what is the point of this function again? > > > > Perhaps it made sense along with, but now does not make sense > > anymore: > > > > Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag > > > > The individual isolation features turned on by nohz_full were initially > > split in order for each of them to be tunable through cpusets. However > > plans have changed in favour of an interface (be it cpusets or sysctl) > > grouping all these features to be turned on/off altogether. 
Then should > > the need ever arise, the interface can still be expanded to handle the > > individual isolation features. > > > > But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK) and > > the convertion of nohz_full features into a common housekeeping flag > > can convert that to something else later? > > Actually introducing cpu_is_isolated() seems fine, but it can call > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS. This is not really my area. Frederic, could you have a look please? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-18 8:04 ` Michal Hocko @ 2023-03-24 22:35 ` Frederic Weisbecker 2023-03-27 10:24 ` Marcelo Tosatti 0 siblings, 1 reply; 20+ messages in thread From: Frederic Weisbecker @ 2023-03-24 22:35 UTC (permalink / raw) To: Michal Hocko Cc: Marcelo Tosatti, Frederic Weisbecker, Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit : > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote: > > On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote: > > > On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote: > > > > From: Frederic Weisbecker <frederic@kernel.org> > > > > > > > > Provide this new API to check if a CPU has been isolated either through > > > > isolcpus= or nohz_full= kernel parameter. > > > > > > > > It aims at avoiding kernel load deemed to be safely spared on CPUs > > > > running sensitive workload that can't bear any disturbance, such as > > > > pcp cache draining. > > > > > > Hi Michal, > > > > > > This makes no sense to me. > > > > > > HK_TYPE_DOMAIN is set when isolcpus=domain is configured. > > > HK_TYPE_TICK is set when nohz_full= is configured. > > > > > > The use-cases i am aware of use either: > > > > > > isolcpus=managed_irq,... nohz_full= > > > OR > > > isolcpus=domain,managed_irq,... nohz_full= > > > > > > So what is the point of this function again? > > > > > > Perhaps it made sense along with, but now does not make sense > > > anymore: > > > > > > Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag > > > > > > The individual isolation features turned on by nohz_full were initially > > > split in order for each of them to be tunable through cpusets. 
However > > > plans have changed in favour of an interface (be it cpusets or sysctl) > > > grouping all these features to be turned on/off altogether. Then should > > > the need ever arise, the interface can still be expanded to handle the > > > individual isolation features. > > > > > > But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK) and > > > the convertion of nohz_full features into a common housekeeping flag > > > can convert that to something else later? > > > > Actually introducing cpu_is_isolated() seems fine, but it can call > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS. > > This is not really my area. Frederic, could you have a look please? The point is to have a function that tells if either nohz_full= or isolcpus=[domain] has been passed for the given CPU. Because I assumed that both would be interested in avoiding that flush noise, wouldn't it be the case? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-24 22:35 ` Frederic Weisbecker @ 2023-03-27 10:24 ` Marcelo Tosatti 2023-03-28 11:38 ` Frederic Weisbecker 2023-03-28 11:48 ` Michal Hocko 0 siblings, 2 replies; 20+ messages in thread From: Marcelo Tosatti @ 2023-03-27 10:24 UTC (permalink / raw) To: Frederic Weisbecker Cc: Michal Hocko, Frederic Weisbecker, Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote: > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit : > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote: > > > On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote: > > > > On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote: > > > > > From: Frederic Weisbecker <frederic@kernel.org> > > > > > > > > > > Provide this new API to check if a CPU has been isolated either through > > > > > isolcpus= or nohz_full= kernel parameter. > > > > > > > > > > It aims at avoiding kernel load deemed to be safely spared on CPUs > > > > > running sensitive workload that can't bear any disturbance, such as > > > > > pcp cache draining. > > > > > > > > Hi Michal, > > > > > > > > This makes no sense to me. > > > > > > > > HK_TYPE_DOMAIN is set when isolcpus=domain is configured. > > > > HK_TYPE_TICK is set when nohz_full= is configured. > > > > > > > > The use-cases i am aware of use either: > > > > > > > > isolcpus=managed_irq,... nohz_full= > > > > OR > > > > isolcpus=domain,managed_irq,... nohz_full= > > > > > > > > So what is the point of this function again? 
> > > > > > > > Perhaps it made sense along with, but now does not make sense > > > > anymore: > > > > > > > > Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag > > > > > > > > The individual isolation features turned on by nohz_full were initially > > > > split in order for each of them to be tunable through cpusets. However > > > > plans have changed in favour of an interface (be it cpusets or sysctl) > > > > grouping all these features to be turned on/off altogether. Then should > > > > the need ever arise, the interface can still be expanded to handle the > > > > individual isolation features. > > > > > > > > But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK) and > > > > the convertion of nohz_full features into a common housekeeping flag > > > > can convert that to something else later? > > > > > > Actually introducing cpu_is_isolated() seems fine, but it can call > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS. > > > > This is not really my area. Frederic, could you have a look please? > > The point is to have a function that tells if either nohz_full= or > isolcpus=[domain] has been passed for the given CPU. > > Because I assumed that both would be interested in avoiding that flush > noise, wouldn't it be the case? Yes, that is the case. But as a note: for the two main types of configuration performed (one uses isolcpus=[domain] and the other cgroups, for isolating processes) nohz_full= is always set. So just testing for nohz_full= would be sufficient (which perhaps would make the code simpler). ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-27 10:24 ` Marcelo Tosatti @ 2023-03-28 11:38 ` Frederic Weisbecker 2023-03-28 11:48 ` Michal Hocko 1 sibling, 0 replies; 20+ messages in thread From: Frederic Weisbecker @ 2023-03-28 11:38 UTC (permalink / raw) To: Marcelo Tosatti Cc: Michal Hocko, Frederic Weisbecker, Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm On Mon, Mar 27, 2023 at 07:24:54AM -0300, Marcelo Tosatti wrote: > On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote: > > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit : > > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote: > > > > On Fri, Mar 17, 2023 at 03:33:13PM -0300, Marcelo Tosatti wrote: > > > > > On Fri, Mar 17, 2023 at 02:44:47PM +0100, Michal Hocko wrote: > > > > > > From: Frederic Weisbecker <frederic@kernel.org> > > > > > > > > > > > > Provide this new API to check if a CPU has been isolated either through > > > > > > isolcpus= or nohz_full= kernel parameter. > > > > > > > > > > > > It aims at avoiding kernel load deemed to be safely spared on CPUs > > > > > > running sensitive workload that can't bear any disturbance, such as > > > > > > pcp cache draining. > > > > > > > > > > Hi Michal, > > > > > > > > > > This makes no sense to me. > > > > > > > > > > HK_TYPE_DOMAIN is set when isolcpus=domain is configured. > > > > > HK_TYPE_TICK is set when nohz_full= is configured. > > > > > > > > > > The use-cases i am aware of use either: > > > > > > > > > > isolcpus=managed_irq,... nohz_full= > > > > > OR > > > > > isolcpus=domain,managed_irq,... nohz_full= > > > > > > > > > > So what is the point of this function again? 
> > > > > > > > > > Perhaps it made sense along with, but now does not make sense > > > > > anymore: > > > > > > > > > > Subject: [PATCH 1/2] sched/isolation: Merge individual nohz_full features into a common housekeeping flag > > > > > > > > > > The individual isolation features turned on by nohz_full were initially > > > > > split in order for each of them to be tunable through cpusets. However > > > > > plans have changed in favour of an interface (be it cpusets or sysctl) > > > > > grouping all these features to be turned on/off altogether. Then should > > > > > the need ever arise, the interface can still be expanded to handle the > > > > > individual isolation features. > > > > > > > > > > But Michal can just use housekeeping_test_cpu(cpu, HK_TYPE_TICK) and > > > > > the convertion of nohz_full features into a common housekeeping flag > > > > > can convert that to something else later? > > > > > > > > Actually introducing cpu_is_isolated() seems fine, but it can call > > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS. > > > > > > This is not really my area. Frederic, could you have a look please? > > > > The point is to have a function that tells if either nohz_full= or > > isolcpus=[domain] has been passed for the given CPU. > > > > Because I assumed that both would be interested in avoiding that flush > > noise, wouldn't it be the case? > > Yes, that is the case. But as a note: for the two main types of > configuration performed (one uses isolcpus=[domain] and the other > cgroups, for isolating processes) nohz_full= is always set. > > So just testing for nohz_full= would be sufficient (which perhaps would > make the code simpler). Ok then all is needed is to test tick_nohz_full_cpu(target), right? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-27 10:24 ` Marcelo Tosatti 2023-03-28 11:38 ` Frederic Weisbecker @ 2023-03-28 11:48 ` Michal Hocko 2023-03-29 14:20 ` Marcelo Tosatti 1 sibling, 1 reply; 20+ messages in thread From: Michal Hocko @ 2023-03-28 11:48 UTC (permalink / raw) To: Marcelo Tosatti Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm On Mon 27-03-23 07:24:54, Marcelo Tosatti wrote: > On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote: > > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit : > > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote: [...] > > > > Actually introducing cpu_is_isolated() seems fine, but it can call > > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS. > > > > > > This is not really my area. Frederic, could you have a look please? > > > > The point is to have a function that tells if either nohz_full= or > > isolcpus=[domain] has been passed for the given CPU. > > > > Because I assumed that both would be interested in avoiding that flush > > noise, wouldn't it be the case? > > Yes, that is the case. But as a note: for the two main types of > configuration performed (one uses isolcpus=[domain] and the other > cgroups, for isolating processes) nohz_full= is always set. > > So just testing for nohz_full= would be sufficient (which perhaps would > make the code simpler). I do not see any mention about that assumption under Documentation/. Is this a best practice documented anywhere or it just happens to be the case with workloads you deal with? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API
  2023-03-28 11:48       ` Michal Hocko
@ 2023-03-29 14:20         ` Marcelo Tosatti
  2023-03-30 13:28           ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2023-03-29 14:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton,
      Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
      Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm

On Tue, Mar 28, 2023 at 01:48:02PM +0200, Michal Hocko wrote:
> On Mon 27-03-23 07:24:54, Marcelo Tosatti wrote:
> > On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote:
> > > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit :
> > > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote:
> [...]
> > > > > Actually introducing cpu_is_isolated() seems fine, but it can call
> > > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS.
> > > >
> > > > This is not really my area. Frederic, could you have a look please?
> > >
> > > The point is to have a function that tells if either nohz_full= or
> > > isolcpus=[domain] has been passed for the given CPU.
> > >
> > > Because I assumed that both would be interested in avoiding that flush
> > > noise, wouldn't it be the case?
> >
> > Yes, that is the case. But as a note: for the two main types of
> > configuration performed (one uses isolcpus=[domain] and the other
> > cgroups, for isolating processes) nohz_full= is always set.
> >
> > So just testing for nohz_full= would be sufficient (which perhaps would
> > make the code simpler).
>
> I do not see any mention about that assumption under Documentation/.

Documentation/admin-guide/kernel-per-CPU-kthreads.rst

SCHED_SOFTIRQ
-------------

Do all of the following:

1.	Avoid sending scheduler IPIs to the CPU to be de-jittered,
	for example, ensure that at most one runnable kthread is present
	on that CPU.  If a thread that expects to run on the de-jittered
	CPU awakens, the scheduler will send an IPI that can result in
	a subsequent SCHED_SOFTIRQ.
2.	CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
	is marked as an adaptive-ticks CPU using the "nohz_full="
	boot parameter.  This reduces the number of scheduler-clock
	interrupts that the de-jittered CPU receives, minimizing its
	chances of being selected to do the load balancing work that
	runs in SCHED_SOFTIRQ context.

> Is this a best practice documented anywhere or it just happens to be
> the case with workloads you deal with?

Option 2. However Frederic seems interested in matching the exported
toggles with the known use-case classes.

For example, for this guide:
http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index

Using nohz_full= would be a benefit (and it's not currently set,
perhaps due to not knowing all the options?).

AFAIU the workloads for which disabling nohz_full= is a benefit are
those where the switching between nohz full mode and sched tick enabled
mode and vice-versa (which involves programming the local timer) happens
often and is therefore avoidable? For example switching between 1
runnable task and more than 1 runnable task (and vice versa).

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-29 14:20 ` Marcelo Tosatti @ 2023-03-30 13:28 ` Michal Hocko 2023-03-30 15:21 ` Marcelo Tosatti 0 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2023-03-30 13:28 UTC (permalink / raw) To: Marcelo Tosatti Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm On Wed 29-03-23 11:20:21, Marcelo Tosatti wrote: > On Tue, Mar 28, 2023 at 01:48:02PM +0200, Michal Hocko wrote: > > On Mon 27-03-23 07:24:54, Marcelo Tosatti wrote: > > > On Fri, Mar 24, 2023 at 11:35:35PM +0100, Frederic Weisbecker wrote: > > > > Le Sat, Mar 18, 2023 at 09:04:38AM +0100, Michal Hocko a écrit : > > > > > On Fri 17-03-23 15:35:05, Marcelo Tosatti wrote: > > [...] > > > > > > Actually introducing cpu_is_isolated() seems fine, but it can call > > > > > > housekeeping_test_cpu(cpu, HK_TYPE_TICK) AFAICS. > > > > > > > > > > This is not really my area. Frederic, could you have a look please? > > > > > > > > The point is to have a function that tells if either nohz_full= or > > > > isolcpus=[domain] has been passed for the given CPU. > > > > > > > > Because I assumed that both would be interested in avoiding that flush > > > > noise, wouldn't it be the case? > > > > > > Yes, that is the case. But as a note: for the two main types of > > > configuration performed (one uses isolcpus=[domain] and the other > > > cgroups, for isolating processes) nohz_full= is always set. > > > > > > So just testing for nohz_full= would be sufficient (which perhaps would > > > make the code simpler). > > > > I do not see any mention about that assumption under Documentation/. > > Documentation/admin-guide/kernel-per-CPU-kthreads.rst > > SCHED_SOFTIRQ > ------------- > > Do all of the following: > > 1. 
Avoid sending scheduler IPIs to the CPU to be de-jittered, > for example, ensure that at most one runnable kthread is present > on that CPU. If a thread that expects to run on the de-jittered > CPU awakens, the scheduler will send an IPI that can result in > a subsequent SCHED_SOFTIRQ. > 2. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered > is marked as an adaptive-ticks CPU using the "nohz_full=" > boot parameter. This reduces the number of scheduler-clock > interrupts that the de-jittered CPU receives, minimizing its > chances of being selected to do the load balancing work that > runs in SCHED_SOFTIRQ context. Quite hidden and easy to miss if you are only aware of isolcpus. > > Is this a best practice documented anywhere or it just happens to be > > the case with workloads you deal with? > > Option 2. However Frederic seems interested in matching the exported > toggles with the known use-cases classes. > > For example, for this guide: > http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index > > Using nohz_full= would be a benefit (and its not being currently set, > perhaps due to not knowing all the options?). > > http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index > > > AFAIU the workloads for which disabling nohz_full= is a benefit are those > where the switching between nohz full mode and sched tick enabled mode > and vice-versa (which involve programming the local timer) happens > often and is therefore avoidable? For example switching between 1 > runnable task and more than 1 runnable task (and vice versa). The patch from Frederic is testing for both. You seem to be arguing to reduce the test and I still do not understand why. Sure some workloads (following the above) will likely use nohz_full= as well but does it make sense to build that expectation into the higher level logic? What is an actual benefit? 
-- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API 2023-03-30 13:28 ` Michal Hocko @ 2023-03-30 15:21 ` Marcelo Tosatti 0 siblings, 0 replies; 20+ messages in thread From: Marcelo Tosatti @ 2023-03-30 15:21 UTC (permalink / raw) To: Michal Hocko Cc: Frederic Weisbecker, Frederic Weisbecker, Andrew Morton, Leonardo Bras, Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm On Thu, Mar 30, 2023 at 03:28:52PM +0200, Michal Hocko wrote: > > > Is this a best practice documented anywhere or it just happens to be > > > the case with workloads you deal with? > > > > Option 2. However Frederic seems interested in matching the exported > > toggles with the known use-cases classes. > > > > For example, for this guide: > > http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index > > > > Using nohz_full= would be a benefit (and its not being currently set, > > perhaps due to not knowing all the options?). > > > > http://www.comfilewiki.co.kr/en/doku.php?id=comfilepi:improving_real-time_performance:index > > > > > > AFAIU the workloads for which disabling nohz_full= is a benefit are those > > where the switching between nohz full mode and sched tick enabled mode > > and vice-versa (which involve programming the local timer) happens > > often and is therefore avoidable? For example switching between 1 > > runnable task and more than 1 runnable task (and vice versa). > > The patch from Frederic is testing for both. You seem to be arguing to > reduce the test and I still do not understand why. Sure some workloads > (following the above) will likely use nohz_full= as well but does it > make sense to build that expectation into the higher level logic? What > is an actual benefit? Just thinking of simpler code. Feel free to maintain the patch as-is if you see fit. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-17 13:44 [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads Michal Hocko 2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko @ 2023-03-17 13:44 ` Michal Hocko 2023-03-17 20:08 ` Shakeel Butt ` (3 more replies) 1 sibling, 4 replies; 20+ messages in thread From: Michal Hocko @ 2023-03-17 13:44 UTC (permalink / raw) To: Andrew Morton, Leonardo Bras Cc: Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, linux-mm, Michal Hocko, Frederic Weisbecker From: Michal Hocko <mhocko@suse.com>

Leonardo Bras has noticed that pcp charge cache draining might be disruptive on workloads relying on 'isolated cpus', a feature commonly used by workloads that are sensitive to interruption and context switching, such as vRAN and Industrial Control Systems.

There are essentially two ways to approach the issue. We can either allow the pcp cache to be drained on a different cpu than the local one, or avoid remote flushing on isolated cpus.

The current pcp charge cache is heavily optimized for performance and always sticks to its cpu. That means it only requires a local_lock (preempt_disable on !RT), and draining is handed over to a pcp WQ worker so that it again happens locally.

The former solution (remote draining) would require adding additional locking to prevent local charges from racing with the draining. This adds an atomic operation to an otherwise simple arithmetic fast path in try_charge. Another concern is that the remote draining can cause lock contention for the isolated workloads and therefore interfere with them indirectly via user space interfaces.

The other option is to avoid scheduling the draining on isolated cpus altogether. That means those remote cpus would keep their charges even after drain_all_stock returns. This is certainly not optimal either, but it shouldn't cause any major problems. In the worst case (many isolated cpus with charges, each of them holding up to MEMCG_CHARGE_BATCH, i.e. 64 pages) the memory consumption of a memcg would be artificially higher than what can be immediately used from other cpus.

Theoretically a memcg OOM killer could be triggered prematurely. It is currently unclear whether this is a practical problem. A tight memcg limit would be counterproductive for cpu isolated workloads pretty much by definition, because any memory reclaim induced by the memcg limit could break user space timing expectations, as those workloads usually expect to execute in userspace most of the time.

Charges could also be left behind on memcg removal. Any future charge on those isolated cpus will drain that pcp cache, so this won't be a permanent leak.

Considering the pros and cons of both approaches, this patch implements the second option and simply does not schedule remote draining if the target cpu is isolated. This solution is much simpler. It doesn't add any new locking and it is more predictable from the user space POV. Should the premature memcg OOM become a real life problem, we can revisit this decision.
Cc: Leonardo Brás <leobras@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Frederic Weisbecker <frederic@kernel.org> Reported-by: Leonardo Bras <leobras@redhat.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Suggested-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 0524add35cae..12559c08d976 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2366,7 +2366,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { if (cpu == curcpu) drain_local_stock(&stock->work); - else + else if (!cpu_is_isolated(cpu)) schedule_work_on(cpu, &stock->work); } } -- 2.30.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko @ 2023-03-17 20:08 ` Shakeel Butt 2023-03-17 21:51 ` kernel test robot ` (2 subsequent siblings) 3 siblings, 0 replies; 20+ messages in thread From: Shakeel Butt @ 2023-03-17 20:08 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Muchun Song, LKML, linux-mm, Michal Hocko, Frederic Weisbecker On Fri, Mar 17, 2023 at 6:44 AM Michal Hocko <mhocko@kernel.org> wrote: > > From: Michal Hocko <mhocko@suse.com> > > Leonardo Bras has noticed that pcp charge cache draining might be > disruptive on workloads relying on 'isolated cpus', a feature commonly > used on workloads that are sensitive to interruption and context > switching such as vRAN and Industrial Control Systems. > > There are essentially two ways how to approach the issue. We can either > allow the pcp cache to be drained on a different rather than a local cpu > or avoid remote flushing on isolated cpus. > > The current pcp charge cache is really optimized for high performance > and it always relies to stick with its cpu. That means it only requires > local_lock (preempt_disable on !RT) and draining is handed over to pcp > WQ to drain locally again. > > The former solution (remote draining) would require to add an additional > locking to prevent local charges from racing with the draining. This > adds an atomic operation to otherwise simple arithmetic fast path in the > try_charge path. Another concern is that the remote draining can cause a > lock contention for the isolated workloads and therefore interfere with > it indirectly via user space interfaces. > > Another option is to avoid draining scheduling on isolated cpus > altogether. 
That means that those remote cpus would keep their charges > even after drain_all_stock returns. This is certainly not optimal either > but it shouldn't really cause any major problems. In the worst case > (many isolated cpus with charges - each of them with MEMCG_CHARGE_BATCH > i.e 64 page) the memory consumption of a memcg would be artificially > higher than can be immediately used from other cpus. > > Theoretically a memcg OOM killer could be triggered pre-maturely. > Currently it is not really clear whether this is a practical problem > though. Tight memcg limit would be really counter productive to cpu > isolated workloads pretty much by definition because any memory > reclaimed induced by memcg limit could break user space timing > expectations as those usually expect execution in the userspace most of > the time. > > Also charges could be left behind on memcg removal. Any future charge on > those isolated cpus will drain that pcp cache so this won't be a > permanent leak. > > Considering cons and pros of both approaches this patch is implementing > the second option and simply do not schedule remote draining if the > target cpu is isolated. This solution is much more simpler. It doesn't > add any new locking and it is more more predictable from the user space > POV. Should the pre-mature memcg OOM become a real life problem, we can > revisit this decision. > > Cc: Leonardo Brás <leobras@redhat.com> > Cc: Marcelo Tosatti <mtosatti@redhat.com> > Cc: Shakeel Butt <shakeelb@google.com> > Cc: Muchun Song <muchun.song@linux.dev> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Frederic Weisbecker <frederic@kernel.org> > Reported-by: Leonardo Bras <leobras@redhat.com> > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> > Suggested-by: Roman Gushchin <roman.gushchin@linux.dev> > Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeelb@google.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko 2023-03-17 20:08 ` Shakeel Butt @ 2023-03-17 21:51 ` kernel test robot 2023-03-17 22:22 ` kernel test robot 2023-03-18 3:23 ` Hillf Danton 3 siblings, 0 replies; 20+ messages in thread From: kernel test robot @ 2023-03-17 21:51 UTC (permalink / raw) To: Michal Hocko, Andrew Morton, Leonardo Bras Cc: llvm, oe-kbuild-all, Linux Memory Management List, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, Michal Hocko Hi Michal, I love your patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on linus/master v6.3-rc2 next-20230317] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230318/202303180520.5lKAJwrg-lkp@intel.com/config) compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch 
--no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621 git checkout b376cfcf40a43276e1280950bb926fdf3521940a # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> | Link: https://lore.kernel.org/oe-kbuild-all/202303180520.5lKAJwrg-lkp@intel.com/ All errors (new ones prefixed by >>): >> mm/memcontrol.c:2369:14: error: implicit declaration of function 'cpu_is_isolated' is invalid in C99 [-Werror,-Wimplicit-function-declaration] else if (!cpu_is_isolated(cpu)) ^ 1 error generated. vim +/cpu_is_isolated +2369 mm/memcontrol.c 2331 2332 /* 2333 * Drains all per-CPU charge caches for given root_memcg resp. subtree 2334 * of the hierarchy under it. 2335 */ 2336 static void drain_all_stock(struct mem_cgroup *root_memcg) 2337 { 2338 int cpu, curcpu; 2339 2340 /* If someone's already draining, avoid adding running more workers. */ 2341 if (!mutex_trylock(&percpu_charge_mutex)) 2342 return; 2343 /* 2344 * Notify other cpus that system-wide "drain" is running 2345 * We do not care about races with the cpu hotplug because cpu down 2346 * as well as workers from this path always operate on the local 2347 * per-cpu data. CPU up doesn't touch memcg_stock at all. 
2348 */ 2349 migrate_disable(); 2350 curcpu = smp_processor_id(); 2351 for_each_online_cpu(cpu) { 2352 struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); 2353 struct mem_cgroup *memcg; 2354 bool flush = false; 2355 2356 rcu_read_lock(); 2357 memcg = stock->cached; 2358 if (memcg && stock->nr_pages && 2359 mem_cgroup_is_descendant(memcg, root_memcg)) 2360 flush = true; 2361 else if (obj_stock_flush_required(stock, root_memcg)) 2362 flush = true; 2363 rcu_read_unlock(); 2364 2365 if (flush && 2366 !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { 2367 if (cpu == curcpu) 2368 drain_local_stock(&stock->work); > 2369 else if (!cpu_is_isolated(cpu)) 2370 schedule_work_on(cpu, &stock->work); 2371 } 2372 } 2373 migrate_enable(); 2374 mutex_unlock(&percpu_charge_mutex); 2375 } 2376 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko 2023-03-17 20:08 ` Shakeel Butt 2023-03-17 21:51 ` kernel test robot @ 2023-03-17 22:22 ` kernel test robot 2023-03-17 23:32 ` Andrew Morton 2023-03-18 3:23 ` Hillf Danton 3 siblings, 1 reply; 20+ messages in thread From: kernel test robot @ 2023-03-17 22:22 UTC (permalink / raw) To: Michal Hocko, Andrew Morton, Leonardo Bras Cc: oe-kbuild-all, Linux Memory Management List, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, Michal Hocko Hi Michal, I love your patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on linus/master v6.3-rc2 next-20230317] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20230318/202303180617.7E3aIlHf-lkp@intel.com/config) compiler: gcc-11 (Debian 11.3.0-8) 11.3.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621 git checkout b376cfcf40a43276e1280950bb926fdf3521940a # save the config file 
mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 olddefconfig make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> | Link: https://lore.kernel.org/oe-kbuild-all/202303180617.7E3aIlHf-lkp@intel.com/ All errors (new ones prefixed by >>): mm/memcontrol.c: In function 'drain_all_stock': >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration] 2369 | else if (!cpu_is_isolated(cpu)) | ^~~~~~~~~~~~~~~ cc1: some warnings being treated as errors vim +/cpu_is_isolated +2369 mm/memcontrol.c 2331 2332 /* 2333 * Drains all per-CPU charge caches for given root_memcg resp. subtree 2334 * of the hierarchy under it. 2335 */ 2336 static void drain_all_stock(struct mem_cgroup *root_memcg) 2337 { 2338 int cpu, curcpu; 2339 2340 /* If someone's already draining, avoid adding running more workers. */ 2341 if (!mutex_trylock(&percpu_charge_mutex)) 2342 return; 2343 /* 2344 * Notify other cpus that system-wide "drain" is running 2345 * We do not care about races with the cpu hotplug because cpu down 2346 * as well as workers from this path always operate on the local 2347 * per-cpu data. CPU up doesn't touch memcg_stock at all. 
2348 */ 2349 migrate_disable(); 2350 curcpu = smp_processor_id(); 2351 for_each_online_cpu(cpu) { 2352 struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); 2353 struct mem_cgroup *memcg; 2354 bool flush = false; 2355 2356 rcu_read_lock(); 2357 memcg = stock->cached; 2358 if (memcg && stock->nr_pages && 2359 mem_cgroup_is_descendant(memcg, root_memcg)) 2360 flush = true; 2361 else if (obj_stock_flush_required(stock, root_memcg)) 2362 flush = true; 2363 rcu_read_unlock(); 2364 2365 if (flush && 2366 !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { 2367 if (cpu == curcpu) 2368 drain_local_stock(&stock->work); > 2369 else if (!cpu_is_isolated(cpu)) 2370 schedule_work_on(cpu, &stock->work); 2371 } 2372 } 2373 migrate_enable(); 2374 mutex_unlock(&percpu_charge_mutex); 2375 } 2376 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-17 22:22 ` kernel test robot @ 2023-03-17 23:32 ` Andrew Morton 2023-03-18 8:03 ` Michal Hocko 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2023-03-17 23:32 UTC (permalink / raw) To: kernel test robot Cc: Michal Hocko, Leonardo Bras, oe-kbuild-all, Linux Memory Management List, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML, Michal Hocko On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote: > mm/memcontrol.c: In function 'drain_all_stock': > >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration] > 2369 | else if (!cpu_is_isolated(cpu)) > | ^~~~~~~~~~~~~~~ Thanks. --- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix +++ a/mm/memcontrol.c @@ -63,6 +63,7 @@ #include <linux/resume_user_mode.h> #include <linux/psi.h> #include <linux/seq_buf.h> +#include <linux/sched/isolation.h> #include "internal.h" #include <net/sock.h> #include <net/ip.h> _ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-17 23:32 ` Andrew Morton @ 2023-03-18 8:03 ` Michal Hocko 0 siblings, 0 replies; 20+ messages in thread From: Michal Hocko @ 2023-03-18 8:03 UTC (permalink / raw) To: Andrew Morton Cc: kernel test robot, Leonardo Bras, oe-kbuild-all, Linux Memory Management List, Frederic Weisbecker, Peter Zijlstra, Thomas Gleixner, Marcelo Tosatti, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, LKML On Fri 17-03-23 16:32:29, Andrew Morton wrote: > On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote: > > > mm/memcontrol.c: In function 'drain_all_stock': > > >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration] > > 2369 | else if (!cpu_is_isolated(cpu)) > > | ^~~~~~~~~~~~~~~ > > Thanks. > > --- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix > +++ a/mm/memcontrol.c > @@ -63,6 +63,7 @@ > #include <linux/resume_user_mode.h> > #include <linux/psi.h> > #include <linux/seq_buf.h> > +#include <linux/sched/isolation.h> > #include "internal.h" > #include <net/sock.h> > #include <net/ip.h> Thanks a lot Andrew! > _ -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko ` (2 preceding siblings ...) 2023-03-17 22:22 ` kernel test robot @ 2023-03-18 3:23 ` Hillf Danton 2023-03-18 8:08 ` Michal Hocko 3 siblings, 1 reply; 20+ messages in thread From: Hillf Danton @ 2023-03-18 3:23 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker, LKML, linux-mm, Michal Hocko, Frederic Weisbecker On 17 Mar 2023 14:44:48 +0100 Michal Hocko <mhocko@suse.com> > Leonardo Bras has noticed that pcp charge cache draining might be > disruptive on workloads relying on 'isolated cpus', a feature commonly > used on workloads that are sensitive to interruption and context > switching such as vRAN and Industrial Control Systems. > > There are essentially two ways how to approach the issue. We can either > allow the pcp cache to be drained on a different rather than a local cpu > or avoid remote flushing on isolated cpus. > > The current pcp charge cache is really optimized for high performance > and it always relies to stick with its cpu. That means it only requires > local_lock (preempt_disable on !RT) and draining is handed over to pcp > WQ to drain locally again. > > The former solution (remote draining) would require to add an additional > locking to prevent local charges from racing with the draining. This > adds an atomic operation to otherwise simple arithmetic fast path in the > try_charge path. Another concern is that the remote draining can cause a > lock contention for the isolated workloads and therefore interfere with > it indirectly via user space interfaces. > > Another option is to avoid draining scheduling on isolated cpus > altogether. That means that those remote cpus would keep their charges > even after drain_all_stock returns. This is certainly not optimal either > but it shouldn't really cause any major problems. 
In the worst case > (many isolated cpus with charges - each of them with MEMCG_CHARGE_BATCH > i.e 64 page) the memory consumption of a memcg would be artificially > higher than can be immediately used from other cpus. > > Theoretically a memcg OOM killer could be triggered pre-maturely. > Currently it is not really clear whether this is a practical problem > though. Tight memcg limit would be really counter productive to cpu > isolated workloads pretty much by definition because any memory > reclaimed induced by memcg limit could break user space timing > expectations as those usually expect execution in the userspace most of > the time. > > Also charges could be left behind on memcg removal. Any future charge on > those isolated cpus will drain that pcp cache so this won't be a > permanent leak. > > Considering cons and pros of both approaches this patch is implementing > the second option and simply do not schedule remote draining if the > target cpu is isolated. This solution is much more simpler. It doesn't > add any new locking and it is more more predictable from the user space > POV. Should the pre-mature memcg OOM become a real life problem, we can > revisit this decision. JFYI feel free to take a look at the non-housekeeping CPUs [1]. [1] https://lore.kernel.org/lkml/20230223150624.GA29739@lst.de/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus 2023-03-18 3:23 ` Hillf Danton @ 2023-03-18 8:08 ` Michal Hocko 0 siblings, 0 replies; 20+ messages in thread From: Michal Hocko @ 2023-03-18 8:08 UTC (permalink / raw) To: Hillf Danton Cc: Andrew Morton, Leonardo Bras, Frederic Weisbecker, LKML, linux-mm, Frederic Weisbecker On Sat 18-03-23 11:23:50, Hillf Danton wrote: > On 17 Mar 2023 14:44:48 +0100 Michal Hocko <mhocko@suse.com> > > Leonardo Bras has noticed that pcp charge cache draining might be > > disruptive on workloads relying on 'isolated cpus', a feature commonly > > used on workloads that are sensitive to interruption and context > > switching such as vRAN and Industrial Control Systems. > > > > There are essentially two ways how to approach the issue. We can either > > allow the pcp cache to be drained on a different rather than a local cpu > > or avoid remote flushing on isolated cpus. > > > > The current pcp charge cache is really optimized for high performance > > and it always relies to stick with its cpu. That means it only requires > > local_lock (preempt_disable on !RT) and draining is handed over to pcp > > WQ to drain locally again. > > > > The former solution (remote draining) would require to add an additional > > locking to prevent local charges from racing with the draining. This > > adds an atomic operation to otherwise simple arithmetic fast path in the > > try_charge path. Another concern is that the remote draining can cause a > > lock contention for the isolated workloads and therefore interfere with > > it indirectly via user space interfaces. > > > > Another option is to avoid draining scheduling on isolated cpus > > altogether. That means that those remote cpus would keep their charges > > even after drain_all_stock returns. This is certainly not optimal either > > but it shouldn't really cause any major problems. 
In the worst case > > (many isolated cpus with charges - each of them with MEMCG_CHARGE_BATCH > > i.e 64 page) the memory consumption of a memcg would be artificially > > higher than can be immediately used from other cpus. > > > > Theoretically a memcg OOM killer could be triggered pre-maturely. > > Currently it is not really clear whether this is a practical problem > > though. Tight memcg limit would be really counter productive to cpu > > isolated workloads pretty much by definition because any memory > > reclaimed induced by memcg limit could break user space timing > > expectations as those usually expect execution in the userspace most of > > the time. > > > > Also charges could be left behind on memcg removal. Any future charge on > > those isolated cpus will drain that pcp cache so this won't be a > > permanent leak. > > > > Considering cons and pros of both approaches this patch is implementing > > the second option and simply do not schedule remote draining if the > > target cpu is isolated. This solution is much more simpler. It doesn't > > add any new locking and it is more more predictable from the user space > > POV. Should the pre-mature memcg OOM become a real life problem, we can > > revisit this decision. > > JFYI feel free to take a look at the non-housekeeping CPUs [1]. > > [1] https://lore.kernel.org/lkml/20230223150624.GA29739@lst.de/ Such an approach would require remote draining and I hope I have explained why that is not a preferred way in this case. Other than that I do agree with Christoph that a generic approach would be really nice. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2023-03-30 18:53 UTC | newest]

Thread overview: 20+ messages
2023-03-17 13:44 [PATCH 0/2] memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads Michal Hocko
2023-03-17 13:44 ` [PATCH 1/2] sched/isolation: Add cpu_is_isolated() API Michal Hocko
2023-03-17 18:33   ` Marcelo Tosatti
2023-03-17 18:35     ` Marcelo Tosatti
2023-03-18  8:04     ` Michal Hocko
2023-03-24 22:35   ` Frederic Weisbecker
2023-03-27 10:24     ` Marcelo Tosatti
2023-03-28 11:38       ` Frederic Weisbecker
2023-03-28 11:48         ` Michal Hocko
2023-03-29 14:20           ` Marcelo Tosatti
2023-03-30 13:28             ` Michal Hocko
2023-03-30 15:21               ` Marcelo Tosatti
2023-03-17 13:44 ` [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus Michal Hocko
2023-03-17 20:08   ` Shakeel Butt
2023-03-17 21:51   ` kernel test robot
2023-03-17 22:22   ` kernel test robot
2023-03-17 23:32     ` Andrew Morton
2023-03-18  8:03       ` Michal Hocko
2023-03-18  3:23   ` Hillf Danton
2023-03-18  8:08     ` Michal Hocko