linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads
Date: Sat, 12 Aug 2023 04:04:32 -0700	[thread overview]
Message-ID: <CAJD7tkaFHgc3eN1K1wYsQFWMLu4+Frf9DJ-5HOja2nC20Es9Dw@mail.gmail.com> (raw)
In-Reply-To: <ZNdEaw2nktq1NfmH@dhcp22.suse.cz>

On Sat, Aug 12, 2023 at 1:35 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 11-08-23 19:48:14, Shakeel Butt wrote:
> > On Fri, Aug 11, 2023 at 7:36 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > On Fri, Aug 11, 2023 at 7:29 PM Shakeel Butt <shakeelb@google.com> wrote:
> > > >
> > > > On Fri, Aug 11, 2023 at 7:12 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > > >
> > > > [...]
> > > > >
> > > > > I am worried that writing to a stat for flushing then reading will
> > > > > increase the staleness window which we are trying to reduce here.
> > > > > Would it be acceptable to add a separate interface to explicitly read
> > > > > flushed stats without having to write first? If the distinction
> > > > > disappears in the future we can just short-circuit both interfaces.
> > > >
> > > > What is the acceptable staleness time window for your case? It is hard
> > > > to imagine that a write+read will always be worse than just a read.
> > > > Even the proposed patch can have an unintended and larger than
> > > > expected staleness window due to some processing on
> > > > return-to-userspace or some scheduling delay.
> > >
> > > Maybe I am worrying too much, we can just go for writing to
> > > memory.stat for explicit stats refresh.
> > >
> > > Do we still want to go with the mutex approach Michal suggested for
> > > do_flush_stats() to support either waiting for ongoing flushes
> > > (mutex_lock) or skipping (mutex_trylock)?
> >
> > I would say keep that as a separate patch.
>
> Separate patches would be better but please make the mutex conversion
> first. We really do not want to have any busy waiting depending on a
> sleep exported to the userspace. That is just no-go.

+tj@kernel.org

That makes sense.

Taking a step back though, and considering there have been other
complaints about unified flushing causing expensive reads from
memory.stat [1], I am wondering if we should tackle the fundamental
problem.

We have a single global rstat lock for flushing, which protects the
global per-cgroup counters as far as I understand. A single lock means
a lot of contention, which is why we implemented unified flushing on
the memcg side in the first place, where we only let one flusher
operate and everyone else skip, but that flusher needs to flush the
entire tree.

This can be unnecessarily expensive (see [1]), and to avoid how
expensive it is we sacrifice accuracy (what this patch is about). I am
exploring breaking down that lock into per-cgroup locks, where a
flusher acquires locks in a top down fashion. This allows for some
concurrency in flushing, and makes unified flushing unnecessary. If we
retire unified flushing we fix both accuracy and expensive reads at
the same time, while not sacrificing performance for concurrent
in-kernel flushers.

What do you think? I am prototyping something now and running some
tests, it seems promising and simple-ish (unless I am missing a big
correctness issue).

[1] https://lore.kernel.org/lkml/CABWYdi3YNwtPDwwJWmCO-ER50iP7CfbXkCep5TKb-9QzY-a40A@mail.gmail.com/

>
> Thanks!
> --
> Michal Hocko
> SUSE Labs

  reply	other threads:[~2023-08-12 11:05 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09  4:58 [PATCH] mm: memcg: provide accurate stats for userspace reads Yosry Ahmed
2023-08-09  8:51 ` Michal Hocko
2023-08-09 12:31   ` Yosry Ahmed
2023-08-09 12:58     ` Michal Hocko
2023-08-09 13:13       ` Yosry Ahmed
2023-08-09 13:31         ` Michal Hocko
2023-08-09 18:33           ` Yosry Ahmed
2023-08-11 12:21             ` Michal Hocko
2023-08-11 19:02               ` Yosry Ahmed
2023-08-11 19:25                 ` Michal Hocko
2023-08-11 20:39                   ` Yosry Ahmed
2023-08-12  2:08                     ` Shakeel Butt
2023-08-12  2:11                       ` Yosry Ahmed
2023-08-12  2:29                         ` Shakeel Butt
2023-08-12  2:35                           ` Yosry Ahmed
2023-08-12  2:48                             ` Shakeel Butt
2023-08-12  8:35                               ` Michal Hocko
2023-08-12 11:04                                 ` Yosry Ahmed [this message]
2023-08-15  0:14                                   ` Tejun Heo
2023-08-15  0:28                                     ` Yosry Ahmed
2023-08-15  0:35                                       ` Tejun Heo
2023-08-15  0:39                                         ` Yosry Ahmed
2023-08-15  0:47                                           ` Tejun Heo
2023-08-15  0:50                                             ` Yosry Ahmed
2023-08-16  0:23                                               ` Shakeel Butt
2023-08-16  0:29                                                 ` Yosry Ahmed
2023-08-16  1:14                                                   ` Shakeel Butt
2023-08-16  2:19                                                     ` Yosry Ahmed
2023-08-16 17:11                                                       ` Shakeel Butt
2023-08-16 19:08                                                         ` Tejun Heo
2023-08-16 22:35                                                           ` Yosry Ahmed
2023-08-18 21:40                                                             ` Yosry Ahmed
2023-08-21 20:58                                                               ` Yosry Ahmed
2023-08-15 15:44                                         ` Waiman Long
2023-08-09 13:17       ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkaFHgc3eN1K1wYsQFWMLu4+Frf9DJ-5HOja2nC20Es9Dw@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).