bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Tejun Heo" <tj@kernel.org>, "Josef Bacik" <josef@toxicpanda.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Zefan Li" <lizefan.x@bytedance.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Vasily Averin" <vasily.averin@linux.dev>,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org
Subject: Re: [PATCH v1 6/9] memcg: sleep during flushing stats in safe contexts
Date: Tue, 28 Mar 2023 11:45:19 -0700	[thread overview]
Message-ID: <CAJD7tkZxEEcVZ9G7NSM56q_uOyL7e353NT06kD0mY5DyNmKTpw@mail.gmail.com> (raw)
In-Reply-To: <ZCMzfQuo9IhWVzRA@cmpxchg.org>

On Tue, Mar 28, 2023 at 11:35 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Mar 28, 2023 at 06:16:35AM +0000, Yosry Ahmed wrote:
> > @@ -642,24 +642,57 @@ static void __mem_cgroup_flush_stats(void)
> >        * from memcg flushers (e.g. reclaim, refault, etc).
> >        */
> >       if (atomic_xchg(&stats_flush_ongoing, 1))
> > -             return;
> > +             return false;
> >
> >       WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
> > -     cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
> > +     return true;
> > +}
> > +
> > +static void mem_cgroup_post_stats_flush(void)
> > +{
> >       atomic_set(&stats_flush_threshold, 0);
> >       atomic_set(&stats_flush_ongoing, 0);
> >  }
> >
> > -void mem_cgroup_flush_stats(void)
> > +static bool mem_cgroup_should_flush_stats(void)
> >  {
> > -     if (atomic_read(&stats_flush_threshold) > num_online_cpus())
> > -             __mem_cgroup_flush_stats();
> > +     return atomic_read(&stats_flush_threshold) > num_online_cpus();
> > +}
> > +
> > +/* atomic functions, safe to call from any context */
> > +static void __mem_cgroup_flush_stats_atomic(void)
> > +{
> > +     if (mem_cgroup_pre_stats_flush()) {
> > +             cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
> > +             mem_cgroup_post_stats_flush();
> > +     }
> > +}
>
> I'm afraid I wasn't very nuanced with my complaint about the bool
> parameter in the previous version. In this case, when you can do a
> common helper for a couple of API functions defined right below it,
> and the callers don't spread throughout the codebase, using bools
> makes things simpler while still being easily understandable:

Looking at your suggestion now, it seems fairly obvious that this is
what I should have gone for. Will do that for v2. Thanks!

>
> static void do_flush_stats(bool may_sleep)
> {
>         if (atomic_xchg(&stats_flush_ongoing, 1))
>                 return;
>
>         WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
>         atomic_set(&stats_flush_threshold, 0);
>
>         if (!may_sleep)
>                 cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
>         else
>                 cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
>
>         atomic_set(&stats_flush_ongoing, 0);
> }
>
> void mem_cgroup_flush_stats(void)
> {
>         if (atomic_read(&stats_flush_threshold) > num_online_cpus())
>                 do_flush_stats(true);
> }
>
> void mem_cgroup_flush_stats_atomic(void)
> {
>         if (atomic_read(&stats_flush_threshold) > num_online_cpus())
>                 do_flush_stats(false);
> }
>
> >  void mem_cgroup_flush_stats_ratelimited(void)
> >  {
> >       if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
> > -             mem_cgroup_flush_stats();
> > +             mem_cgroup_flush_stats_atomic();
> > +}
>
> This should probably be mem_cgroup_flush_stats_atomic_ratelimited().
>
> (Whee, kinda long, but that's alright. Very specialized caller...)

It should, but the following patch makes it non-atomic anyway, so I
thought I wouldn't clutter the diff by renaming it here and then
reverting it back in the next patch.

There is an argument for maintaining a clean history tho in case the
next patch is reverted separately (which is the reason I put it in a
separate patch to begin with) -- so perhaps I should rename it here to
mem_cgroup_flush_stats_atomic_ratelimited () and back to
mem_cgroup_flush_stats_ratelimited() in the next patch, just for
consistency?

>
> Btw, can you guys think of a reason against moving the threshold check
> into the common function? It would then apply to the time-limited
> flushes as well, but that shouldn't hurt anything. This would make the
> code even simpler:

I think the point of having the threshold check outside the common
function is that the periodic flusher always flushes, regardless of
the threshold, to keep rstat flushing from critical contexts as cheap
as possible.

If you think it's not worth it, I can make that change. It is a
separate functional change tho, so maybe in a separate patch.

>
> static void do_flush_stats(bool may_sleep)
> {
>         if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
>                 return;
>
>         if (atomic_xchg(&stats_flush_ongoing, 1))
>                 return;
>
>         WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
>         atomic_set(&stats_flush_threshold, 0);
>
>         if (!may_sleep)
>                 cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
>         else
>                 cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
>
>         atomic_set(&stats_flush_ongoing, 0);
> }
>
> void mem_cgroup_flush_stats(void)
> {
>         do_flush_stats(true);
> }
>
> void mem_cgroup_flush_stats_atomic(void)
> {
>         do_flush_stats(false);
> }
>
> void mem_cgroup_flush_stats_atomic_ratelimited(void)
> {
>         if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
>                 do_flush_stats(false);
> }
>
> > @@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc)
> >        * Flush the memory cgroup stats, so that we read accurate per-memcg
> >        * lruvec stats for heuristics.
> >        */
> > -     mem_cgroup_flush_stats();
> > +     mem_cgroup_flush_stats_atomic();
>
> I'm thinking this one could be non-atomic as well. It's called fairly
> high up in reclaim without any locks held.

A later patch does exactly that. I put making the reclaim and refault
paths non-atomic in separate patches to easily revert them if we see a
regression. Let me know if this is too defensive and if you'd rather
have them squashed.

Thanks!

  reply	other threads:[~2023-03-28 18:46 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-28  6:16 [PATCH v1 0/9] memcg: make rstat flushing irq and sleep friendly Yosry Ahmed
2023-03-28  6:16 ` [PATCH v1 1/9] cgroup: rename cgroup_rstat_flush_"irqsafe" to "atomic" Yosry Ahmed
2023-03-28 13:24   ` Shakeel Butt
2023-03-28 17:42   ` Johannes Weiner
2023-03-28  6:16 ` [PATCH v1 2/9] memcg: rename mem_cgroup_flush_stats_"delayed" to "ratelimited" Yosry Ahmed
2023-03-28 13:25   ` Shakeel Butt
2023-03-28 17:42   ` Johannes Weiner
2023-03-28  6:16 ` [PATCH v1 3/9] memcg: do not flush stats in irq context Yosry Ahmed
2023-03-28 13:26   ` Shakeel Butt
2023-03-28 17:43   ` Johannes Weiner
2023-03-28  6:16 ` [PATCH v1 4/9] cgroup: rstat: add WARN_ON_ONCE() if flushing outside task context Yosry Ahmed
2023-03-28 14:59   ` Shakeel Butt
2023-03-28 17:49   ` Johannes Weiner
2023-03-28 18:59     ` Yosry Ahmed
2023-03-28 22:18       ` Yosry Ahmed
2023-03-28  6:16 ` [PATCH v1 5/9] memcg: replace stats_flush_lock with an atomic Yosry Ahmed
2023-03-28 14:15   ` Shakeel Butt
2023-03-28 18:52     ` Yosry Ahmed
2023-03-28 19:28       ` Shakeel Butt
2023-03-28 19:34         ` Yosry Ahmed
2023-03-28 19:42           ` Yosry Ahmed
2023-03-28 17:53   ` Johannes Weiner
2023-03-28  6:16 ` [PATCH v1 6/9] memcg: sleep during flushing stats in safe contexts Yosry Ahmed
2023-03-28 15:09   ` Shakeel Butt
2023-03-28 18:35   ` Johannes Weiner
2023-03-28 18:45     ` Yosry Ahmed [this message]
2023-03-28 19:06       ` Johannes Weiner
2023-03-28 19:26         ` Yosry Ahmed
2023-03-28  6:16 ` [PATCH v1 7/9] workingset: memcg: sleep when flushing stats in workingset_refault() Yosry Ahmed
2023-03-28 15:18   ` Shakeel Butt
2023-03-28 18:47     ` Johannes Weiner
2023-03-28 19:25     ` Yosry Ahmed
2023-03-28 18:43   ` Johannes Weiner
2023-03-28  6:16 ` [PATCH v1 8/9] vmscan: memcg: sleep when flushing stats during reclaim Yosry Ahmed
2023-03-28 15:19   ` Shakeel Butt
2023-03-28 19:01     ` Yosry Ahmed
2023-03-28 19:29       ` Shakeel Butt
2023-03-28 18:49   ` Johannes Weiner
2023-03-28  6:16 ` [PATCH v1 9/9] memcg: do not modify rstat tree for zero updates Yosry Ahmed
2023-03-28 15:20   ` Shakeel Butt
2023-03-28 18:50   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkZxEEcVZ9G7NSM56q_uOyL7e353NT06kD0mY5DyNmKTpw@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=bpf@vger.kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=vasily.averin@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).