linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads
Date: Wed, 9 Aug 2023 10:51:13 +0200	[thread overview]
Message-ID: <ZNNTgZVPZipTL/UM@dhcp22.suse.cz> (raw)
In-Reply-To: <20230809045810.1659356-1-yosryahmed@google.com>

On Wed 09-08-23 04:58:10, Yosry Ahmed wrote:
> Over time, the memcg code added multiple optimizations to the stats
> flushing path that introduce a tradeoff between accuracy and
> performance. In some contexts (e.g. dirty throttling, refaults, etc), a
> full rstat flush of the stats in the tree can be too expensive. Such
> optimizations include [1]:
> (a) Introducing a periodic background flusher to keep the size of the
> update tree from growing unbounded.
> (b) Allowing only one thread to flush at a time, and other concurrent
> flushers just skip the flush. This avoids a thundering herd problem
> when multiple reclaim/refault threads attempt to flush the stats at
> once.
> (c) Only executing a flush if the magnitude of the stats updates exceeds
> a certain threshold.
> 
> These optimizations were necessary to make flushing feasible in
> performance-critical paths, and they come at the cost of some accuracy
> that we choose to live without. On the other hand, for flushes invoked
> when userspace is reading the stats, the tradeoff is less appealing
> This code path is not performance-critical, and the inaccuracies can
> affect userspace behavior. For example, skipping flushing when there is
> another ongoing flush is essentially a coin flip. We don't know if the
> ongoing flush is done with the subtree of interest or not.

I am not convinced by this much TBH. What kind of precision do you
really need and how much off is what we provide?

More expensive read of stats from userspace is quite easy to notice
and usually reported as a regression. So you should have a convincing
argument that an extra time spent is really worth it. AFAIK there are
many monitoring (top like) tools which simply read those files regularly
just to show numbers and they certainly do not need a high level of
precision.

[...]
> @@ -639,17 +639,24 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
>  	}
>  }
>  
> -static void do_flush_stats(void)
> +static void do_flush_stats(bool full)
>  {
> +	if (!atomic_read(&stats_flush_ongoing) &&
> +	    !atomic_xchg(&stats_flush_ongoing, 1))
> +		goto flush;
> +
>  	/*
> -	 * We always flush the entire tree, so concurrent flushers can just
> -	 * skip. This avoids a thundering herd problem on the rstat global lock
> -	 * from memcg flushers (e.g. reclaim, refault, etc).
> +	 * We always flush the entire tree, so concurrent flushers can choose to
> +	 * skip if accuracy is not critical. Otherwise, wait for the ongoing
> +	 * flush to complete. This avoids a thundering herd problem on the rstat
> +	 * global lock from memcg flushers (e.g. reclaim, refault, etc).
>  	 */
> -	if (atomic_read(&stats_flush_ongoing) ||
> -	    atomic_xchg(&stats_flush_ongoing, 1))
> -		return;
> -
> +	while (full && atomic_read(&stats_flush_ongoing) == 1) {
> +		if (!cond_resched())
> +			cpu_relax();

You are reinveting a mutex with spinning waiter. Why don't you simply
make stats_flush_ongoing a real mutex and make use try_lock for !full
flush and normal lock otherwise?

> +	}
> +	return;
> +flush:
>  	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
>  
>  	cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
[...]
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2023-08-09  8:51 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09  4:58 [PATCH] mm: memcg: provide accurate stats for userspace reads Yosry Ahmed
2023-08-09  8:51 ` Michal Hocko [this message]
2023-08-09 12:31   ` Yosry Ahmed
2023-08-09 12:58     ` Michal Hocko
2023-08-09 13:13       ` Yosry Ahmed
2023-08-09 13:31         ` Michal Hocko
2023-08-09 18:33           ` Yosry Ahmed
2023-08-11 12:21             ` Michal Hocko
2023-08-11 19:02               ` Yosry Ahmed
2023-08-11 19:25                 ` Michal Hocko
2023-08-11 20:39                   ` Yosry Ahmed
2023-08-12  2:08                     ` Shakeel Butt
2023-08-12  2:11                       ` Yosry Ahmed
2023-08-12  2:29                         ` Shakeel Butt
2023-08-12  2:35                           ` Yosry Ahmed
2023-08-12  2:48                             ` Shakeel Butt
2023-08-12  8:35                               ` Michal Hocko
2023-08-12 11:04                                 ` Yosry Ahmed
2023-08-15  0:14                                   ` Tejun Heo
2023-08-15  0:28                                     ` Yosry Ahmed
2023-08-15  0:35                                       ` Tejun Heo
2023-08-15  0:39                                         ` Yosry Ahmed
2023-08-15  0:47                                           ` Tejun Heo
2023-08-15  0:50                                             ` Yosry Ahmed
2023-08-16  0:23                                               ` Shakeel Butt
2023-08-16  0:29                                                 ` Yosry Ahmed
2023-08-16  1:14                                                   ` Shakeel Butt
2023-08-16  2:19                                                     ` Yosry Ahmed
2023-08-16 17:11                                                       ` Shakeel Butt
2023-08-16 19:08                                                         ` Tejun Heo
2023-08-16 22:35                                                           ` Yosry Ahmed
2023-08-18 21:40                                                             ` Yosry Ahmed
2023-08-21 20:58                                                               ` Yosry Ahmed
2023-08-15 15:44                                         ` Waiman Long
2023-08-09 13:17       ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZNNTgZVPZipTL/UM@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).