linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Ivan Babrou <ivan@cloudflare.com>
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads
Date: Tue, 15 Aug 2023 17:29:03 -0700	[thread overview]
Message-ID: <CAJD7tkaUzhvZPohpo1F8TUKRPuXH7bjDeg9VCzN2CbywQbRutQ@mail.gmail.com> (raw)
In-Reply-To: <CALvZod6KRxiDzrppCgx+=SHg2+96nFE5crwXCKwe9PZbWM_6cQ@mail.gmail.com>

On Tue, Aug 15, 2023 at 5:23 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> +Ivan
>
> On Mon, Aug 14, 2023 at 5:51 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Mon, Aug 14, 2023 at 5:48 PM Tejun Heo <tj@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > On Mon, Aug 14, 2023 at 05:39:15PM -0700, Yosry Ahmed wrote:
> > > > I believe dropping unified flushing, if possible of course, may fix
> > > > both problems.
> > >
> > > Yeah, flushing the whole tree for every stat read will push up the big O
> > > complexity of the operation. It shouldn't be too bad because only what's
> > > updated since the last read will need flushing but if you have a really big
> > > machine with a lot of constantly active cgroups, you're still gonna feel it.
> > > So, yeah, drop that and switch the global lock to mutex and we should all be
> > > good?
> >
> > I hope so, but I am not sure.
> >
> > The unified flushing was added initially to mitigate a thundering herd
> > problem from concurrent in-kernel flushers (e.g. concurrent reclaims),
> > but back then flushing was atomic so we had to keep the spinlock held
> > for a long time. I think it should be better now, but I am hoping
> > Shakeel will chime in since he added the unified flushing originally.
> >
> > We also need to agree on what to do about stats_flushing_threshold and
> > flush_next_time since they're both global now (since all flushing is
> > global).
> >
>
> I thought we already reached the decision on how to proceed here. Let
> me summarize what I think we should do:
>
> 1. Completely remove the sync flush from stat files read from userspace.
> 2. Provide a separate way/interface to explicitly flush stats for
> users who want more accurate stats and can pay the cost. This is
> similar to the stat_refresh interface.
> 3. Keep the 2 sec periodic stats flusher.

I think this solution is suboptimal to be honest, I think we can do better.

With recent improvements to spinlocks/mutexes, and flushers becoming
sleepable, I think a better solution would be to remove unified
flushing and let everyone only flush the subtree they care about. Sync
flushing becomes much better (unless you're flushing root ofc), and
concurrent flushing wouldn't cause too many problems (ideally no
thundering herd, and rstat lock can be dropped at cpu boundaries in
cgroup_rstat_flush_locked()).

If we do this, stat reads can be much faster as Ivan demonstrated with
his patch that only flushes the cgroup being read, and we do not
sacrifice accuracy as we never skip flushing. We also do not need a
separate interface for explicit refresh.

In all cases, we need to keep the 2 sec periodic flusher. What we need
to figure out if we remove unified flushing is:

1. Handling stats_flush_threshold.
2. Handling flush_next_time.

Both of these are global now, and will need to be adapted to
non-unified non-global flushing.

  reply	other threads:[~2023-08-16  0:35 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09  4:58 [PATCH] mm: memcg: provide accurate stats for userspace reads Yosry Ahmed
2023-08-09  8:51 ` Michal Hocko
2023-08-09 12:31   ` Yosry Ahmed
2023-08-09 12:58     ` Michal Hocko
2023-08-09 13:13       ` Yosry Ahmed
2023-08-09 13:31         ` Michal Hocko
2023-08-09 18:33           ` Yosry Ahmed
2023-08-11 12:21             ` Michal Hocko
2023-08-11 19:02               ` Yosry Ahmed
2023-08-11 19:25                 ` Michal Hocko
2023-08-11 20:39                   ` Yosry Ahmed
2023-08-12  2:08                     ` Shakeel Butt
2023-08-12  2:11                       ` Yosry Ahmed
2023-08-12  2:29                         ` Shakeel Butt
2023-08-12  2:35                           ` Yosry Ahmed
2023-08-12  2:48                             ` Shakeel Butt
2023-08-12  8:35                               ` Michal Hocko
2023-08-12 11:04                                 ` Yosry Ahmed
2023-08-15  0:14                                   ` Tejun Heo
2023-08-15  0:28                                     ` Yosry Ahmed
2023-08-15  0:35                                       ` Tejun Heo
2023-08-15  0:39                                         ` Yosry Ahmed
2023-08-15  0:47                                           ` Tejun Heo
2023-08-15  0:50                                             ` Yosry Ahmed
2023-08-16  0:23                                               ` Shakeel Butt
2023-08-16  0:29                                                 ` Yosry Ahmed [this message]
2023-08-16  1:14                                                   ` Shakeel Butt
2023-08-16  2:19                                                     ` Yosry Ahmed
2023-08-16 17:11                                                       ` Shakeel Butt
2023-08-16 19:08                                                         ` Tejun Heo
2023-08-16 22:35                                                           ` Yosry Ahmed
2023-08-18 21:40                                                             ` Yosry Ahmed
2023-08-21 20:58                                                               ` Yosry Ahmed
2023-08-15 15:44                                         ` Waiman Long
2023-08-09 13:17       ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkaUzhvZPohpo1F8TUKRPuXH7bjDeg9VCzN2CbywQbRutQ@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=ivan@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).