All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Mike Christie <michael.christie@oracle.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Andrei Vagin <avagin@gmail.com>,
	Shakeel Butt <shakeelb@google.com>, Adam Majer <amajer@suse.com>,
	Jan Kara <jack@suse.cz>, Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH] mm: Sync percpu mm RSS counters before querying
Date: Fri, 16 Jun 2023 12:31:44 -0700	[thread overview]
Message-ID: <20230616123144.bd2a8120dab25736c5c37297@linux-foundation.org> (raw)
In-Reply-To: <20230616180718.17725-1-mkoutny@suse.com>

On Fri, 16 Jun 2023 20:07:18 +0200 Michal Koutný <mkoutny@suse.com> wrote:

> An issue was observed with stats collected in struct rusage on ppc64le
> with 64kB pages. The percpu counters use batching with
> 	percpu_counter_batch = max(32, nr*2) # in PAGE_SIZE
> i.e. with larger pages but similar RSS consumption (bytes), there'll be
> less flushes and error more noticeable.

A fully detailed description of the issue would be helpful.  Obviously
"inaccuracy", but how bad?

> In this given case (getting consumption of exited child), we can request
> percpu counter's flush without worrying about contention with updaters.
> 
> Fortunately, the commit f1a7941243c1 ("mm: convert mm's rss stats into
> percpu_counter") didn't eradicate all traces of SPLIT_RSS_COUNTING and
> this mechanism already provided some synchronization points before
> reading stats.
> Therefore, use sync_mm_rss as carrier for percpu counters refreshes and
> forget SPLIT_RSS_COUNTING macro for good.
> 
> Impact of summing on a 8 CPU machine:
> Benchmark 1: taskset -c 1 ./shell-bench.sh
> 
> Before
>   Time (mean ± σ):      9.950 s ±  0.052 s    [User: 7.773 s, System: 2.023 s]
> 
> After
>   Time (mean ± σ):      9.990 s ±  0.070 s    [User: 7.825 s, System: 2.011 s]
> 
> cat >shell-bench.sh <<EOD
> for (( i = 0; i < 20000; i++ )); do
> 	/bin/true
> done
> EOD
> 
> The script is meant to stress fork-exit path (exit is where sync_mm_rss
> is most called, add_mm_rss_vec should be covered in fork).
> 
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2547,13 +2547,12 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
>  		*maxrss = hiwater_rss;
>  }
>  
> -#if defined(SPLIT_RSS_COUNTING)
> -void sync_mm_rss(struct mm_struct *mm);
> -#else
>  static inline void sync_mm_rss(struct mm_struct *mm)
>  {
> +	for (int i = 0; i < NR_MM_COUNTERS; ++i)
> +		percpu_counter_set(&mm->rss_stat[i],
> +				   percpu_counter_sum(&mm->rss_stat[i]));
>  }
> -#endif

Far too large to be inlined!  For six callsites it adds 1kb of text.

Why even modify the counter?  Can't <whatever this issue is> be
addressed by using percpu_counter_sum() in an appropriate place?

For unknown reasons percpu_counter_set() uses for_each_possible_cpu(). 
Probably just a mistake - percpu_counters are hotplug-aware and
for_each_online_cpu should suffice.

I'm really not liking percpu_counter_set().  It's only safe in
situations where the caller knows that no other CPU can be modifying
the counter.  I wonder if all the callers know that.  This situation
isn't aided by the lack of any documentation.

  reply	other threads:[~2023-06-16 19:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-16 18:07 [PATCH] mm: Sync percpu mm RSS counters before querying Michal Koutný
2023-06-16 19:31 ` Andrew Morton [this message]
2023-06-19 12:53   ` Michal Koutný
2023-06-19 13:10 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230616123144.bd2a8120dab25736c5c37297@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=amajer@suse.com \
    --cc=avagin@gmail.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhocko@kernel.org \
    --cc=michael.christie@oracle.com \
    --cc=mkoutny@suse.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.