linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] mm: Sync percpu mm RSS counters before querying
@ 2023-06-08 17:12 Michal Koutný
  2023-06-08 17:38 ` Shakeel Butt
  0 siblings, 1 reply; 2+ messages in thread
From: Michal Koutný @ 2023-06-08 17:12 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Christian Brauner, Liam R . Howlett,
	Suren Baghdasaryan, Michael S . Tsirkin, Mike Christie,
	Andrei Vagin, Mathieu Desnoyers, Nicholas Piggin, Peter Zijlstra,
	Shakeel Butt, Adam Majer, Jan Kara, Michal Hocko

An issue was observed with stats collected in struct rusage on ppc64le
with 64kB pages. The percpu counters use batching with
	percpu_counter_batch = max(32, nr*2) # in PAGE_SIZE
i.e. with larger pages but similar RSS consumption (bytes), there'll be
less flushes and error more noticeable.

In this given case (getting consumption of exited child), we can request
percpu counter's flush without worrying about contention with updaters.

Fortunately, the commit f1a7941243c1 ("mm: convert mm's rss stats into
percpu_counter") didn't eradicate all traces of SPLIT_RSS_COUNTING and
this mechanism already provided some synchronization points before
reading stats.
Therefore, use sync_mm_rss as carrier for percpu counters refreshes and
forget SPLIT_RSS_COUNTING macro for good.

Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
Reported-by: Adam Majer <amajer@suse.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 include/linux/mm.h | 6 ++----
 kernel/fork.c      | 4 ----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..30cfde88d5b2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2547,13 +2547,11 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
 		*maxrss = hiwater_rss;
 }
 
-#if defined(SPLIT_RSS_COUNTING)
-void sync_mm_rss(struct mm_struct *mm);
-#else
 static inline void sync_mm_rss(struct mm_struct *mm)
 {
+	for (int i = 0; i < NR_MM_COUNTERS; ++i)
+		percpu_counter_sum(&mm->rss_stat[i]);
 }
-#endif
 
 #ifndef CONFIG_ARCH_HAS_PTE_SPECIAL
 static inline int pte_special(pte_t pte)
diff --git a/kernel/fork.c b/kernel/fork.c
index 81cba91f30bb..e030eb902e4b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2412,10 +2412,6 @@ __latent_entropy struct task_struct *copy_process(
 	p->io_uring = NULL;
 #endif
 
-#if defined(SPLIT_RSS_COUNTING)
-	memset(&p->rss_stat, 0, sizeof(p->rss_stat));
-#endif
-
 	p->default_timer_slack_ns = current->timer_slack_ns;
 
 #ifdef CONFIG_PSI
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH] mm: Sync percpu mm RSS counters before querying
  2023-06-08 17:12 [RFC PATCH] mm: Sync percpu mm RSS counters before querying Michal Koutný
@ 2023-06-08 17:38 ` Shakeel Butt
  0 siblings, 0 replies; 2+ messages in thread
From: Shakeel Butt @ 2023-06-08 17:38 UTC (permalink / raw)
  To: Michal Koutný
  Cc: linux-mm, linux-kernel, Andrew Morton, Christian Brauner,
	Liam R . Howlett, Suren Baghdasaryan, Michael S . Tsirkin,
	Mike Christie, Andrei Vagin, Mathieu Desnoyers, Nicholas Piggin,
	Peter Zijlstra, Adam Majer, Jan Kara, Michal Hocko

On Thu, Jun 08, 2023 at 07:12:56PM +0200, Michal Koutný wrote:
> An issue was observed with stats collected in struct rusage on ppc64le
> with 64kB pages. The percpu counters use batching with
> 	percpu_counter_batch = max(32, nr*2) # in PAGE_SIZE
> i.e. with larger pages but similar RSS consumption (bytes), there'll be
> less flushes and error more noticeable.
> 
> In this given case (getting consumption of exited child), we can request
> percpu counter's flush without worrying about contention with updaters.
> 
> Fortunately, the commit f1a7941243c1 ("mm: convert mm's rss stats into
> percpu_counter") didn't eradicate all traces of SPLIT_RSS_COUNTING and
> this mechanism already provided some synchronization points before
> reading stats.
> Therefore, use sync_mm_rss as carrier for percpu counters refreshes and
> forget SPLIT_RSS_COUNTING macro for good.
> 
> Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
> Reported-by: Adam Majer <amajer@suse.com>
> Signed-off-by: Michal Koutný <mkoutny@suse.com>

The patch seems reasonable to me. Are any of the callsites of
sync_mm_rss performance sensitive?

> ---
>  include/linux/mm.h | 6 ++----
>  kernel/fork.c      | 4 ----
>  2 files changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 27ce77080c79..30cfde88d5b2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2547,13 +2547,11 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
>  		*maxrss = hiwater_rss;
>  }
>  
> -#if defined(SPLIT_RSS_COUNTING)
> -void sync_mm_rss(struct mm_struct *mm);
> -#else
>  static inline void sync_mm_rss(struct mm_struct *mm)
>  {
> +	for (int i = 0; i < NR_MM_COUNTERS; ++i)
> +		percpu_counter_sum(&mm->rss_stat[i]);
>  }
> -#endif
>  
>  #ifndef CONFIG_ARCH_HAS_PTE_SPECIAL
>  static inline int pte_special(pte_t pte)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 81cba91f30bb..e030eb902e4b 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2412,10 +2412,6 @@ __latent_entropy struct task_struct *copy_process(
>  	p->io_uring = NULL;
>  #endif
>  
> -#if defined(SPLIT_RSS_COUNTING)
> -	memset(&p->rss_stat, 0, sizeof(p->rss_stat));
> -#endif
> -
>  	p->default_timer_slack_ns = current->timer_slack_ns;
>  
>  #ifdef CONFIG_PSI
> -- 
> 2.40.1
> 

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-06-08 17:38 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-08 17:12 [RFC PATCH] mm: Sync percpu mm RSS counters before querying Michal Koutný
2023-06-08 17:38 ` Shakeel Butt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).