bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>, Josef Bacik <josef@toxicpanda.com>,
	Jens Axboe <axboe@kernel.dk>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vasily Averin <vasily.averin@linux.dev>,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org
Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock
Date: Thu, 23 Mar 2023 09:17:33 -0700	[thread overview]
Message-ID: <CAJD7tkbGCgk9VkGdec0=AdHErds4XQs1LzJMhqVryXdjY5PVAg@mail.gmail.com> (raw)
In-Reply-To: <CALvZod7-6F84POkNetA2XJB-24wms=5q_s495NEthO8b63rL4A@mail.gmail.com>

On Thu, Mar 23, 2023 at 9:10 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Thu, Mar 23, 2023 at 8:46 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Thu, Mar 23, 2023 at 8:43 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > On Thu, Mar 23, 2023 at 8:40 AM Shakeel Butt <shakeelb@google.com> wrote:
> > > >
> > > > On Thu, Mar 23, 2023 at 6:36 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > > >
> > > > [...]
> > > > > > >
> > > > > > > > 2. Are we really calling rstat flush in irq context?
> > > > > > >
> > > > > > > I think it is possible through the charge/uncharge path:
> > > > > > > memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage(). I
> > > > > > > added the protection against flushing in an interrupt context for
> > > > > > > future callers as well, as it may cause a deadlock if we don't disable
> > > > > > > interrupts when acquiring cgroup_rstat_lock.
> > > > > > >
> > > > > > > > 3. The mem_cgroup_flush_stats() call in mem_cgroup_usage() is only
> > > > > > > > done for root memcg. Why is mem_cgroup_threshold() interested in root
> > > > > > > > memcg usage? Why not ignore root memcg in mem_cgroup_threshold() ?
> > > > > > >
> > > > > > > I am not sure, but the code looks like event notifications may be set
> > > > > > > up on root memcg, which is why we need to check thresholds.
> > > > > >
> > > > > > This is something we should deprecate as root memcg's usage is ill defined.
> > > > >
> > > > > Right, but I think this would be orthogonal to this patch series.
> > > > >
> > > >
> > > > I don't think we can make cgroup_rstat_lock a non-irq-disabling lock
> > > > without either breaking a link between mem_cgroup_threshold and
> > > > cgroup_rstat_lock or make mem_cgroup_threshold work without disabling
> > > > irqs.
> > > >
> > > > So, this patch can not be applied before either of those two tasks are
> > > > done (and we may find more such scenarios).
> > >
> > >
> > > Could you elaborate why?
> > >
> > > My understanding is that with an in_task() check to make sure we only
> > > acquire cgroup_rstat_lock from non-irq context it should be fine to
> > > acquire cgroup_rstat_lock without disabling interrupts.
> >
> > From mem_cgroup_threshold() code path, cgroup_rstat_lock will be taken
> > with irq disabled while other code paths will take cgroup_rstat_lock
> > with irq enabled. This is a potential deadlock hazard unless
> > cgroup_rstat_lock is always taken with irq disabled.
>
> Oh you are making sure it is not taken in the irq context through
> should_skip_flush(). Hmm seems like a hack. Normally it is recommended
> to actually remove all such users instead of silently
> ignoring/bypassing the functionality.

It is a workaround, we simply accept to read stale stats in irq
context instead of the expensive flush operation.

>
> So, how about removing mem_cgroup_flush_stats() from
> mem_cgroup_usage(). It will break the known chain which is taking
> cgroup_rstat_lock with irq disabled and you can add
> WARN_ON_ONCE(!in_task()).

This changes the behavior in a more obvious way because:
1. The memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage()
path is also exercised in a lot of paths outside irq context, this
will change the behavior for any event thresholds on the root memcg.
With proposed skipped flushing in irq context we only change the
behavior in a small subset of cases.

I think we can skip flushing in irq context for now, and separately
deprecate threshold events for the root memcg. When that is done we
can come back and remove should_skip_flush() and add a VM_BUG_ON or
WARN_ON_ONCE instead. WDYT?

2. mem_cgroup_usage() is also used when reading usage from userspace.
This should be an easy workaround though.

  reply	other threads:[~2023-03-23 16:18 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-23  4:00 [RFC PATCH 0/7] Make rstat flushing IRQ and sleep friendly Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock Yosry Ahmed
2023-03-23  4:29   ` Shakeel Butt
2023-03-23  5:15     ` Yosry Ahmed
2023-03-23  6:33       ` Shakeel Butt
2023-03-23 13:35         ` Yosry Ahmed
2023-03-23 15:40           ` Shakeel Butt
2023-03-23 15:42             ` Yosry Ahmed
2023-03-23 15:46               ` Shakeel Butt
2023-03-23 16:09                 ` Shakeel Butt
2023-03-23 16:17                   ` Yosry Ahmed [this message]
2023-03-23 16:29                     ` Shakeel Butt
2023-03-23 16:36                       ` Yosry Ahmed
2023-03-23 16:45                         ` Shakeel Butt
2023-03-23 16:51                           ` Yosry Ahmed
2023-03-23 19:09                             ` Shakeel Butt
2023-03-23 17:33                     ` Johannes Weiner
2023-03-23 18:09                       ` Yosry Ahmed
2023-03-23 18:19                         ` Johannes Weiner
2023-03-24  1:39   ` Tejun Heo
2023-03-24  7:22     ` Yosry Ahmed
2023-03-24 14:12       ` Waiman Long
2023-03-24 22:50         ` Yosry Ahmed
2023-03-25  1:54       ` Tejun Heo
2023-03-25  2:17         ` Yosry Ahmed
2023-03-25  4:30           ` Shakeel Butt
2023-03-25  4:37             ` Yosry Ahmed
2023-03-25  4:46               ` Shakeel Butt
2023-03-27 23:23                 ` Yosry Ahmed
2023-03-29 18:53                   ` Tejun Heo
2023-03-29 19:22                     ` Hugh Dickins
2023-03-29 20:00                       ` Tejun Heo
2023-03-29 20:38                         ` Hugh Dickins
2023-03-30  4:26                           ` Yosry Ahmed
2023-03-31  1:51                           ` Tejun Heo
2023-03-23  4:00 ` [RFC PATCH 2/7] memcg: do not disable interrupts when holding stats_flush_lock Yosry Ahmed
2023-03-23  4:32   ` Shakeel Butt
2023-03-23  5:16     ` Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 3/7] cgroup: rstat: remove cgroup_rstat_flush_irqsafe() Yosry Ahmed
2023-03-23 15:43   ` Johannes Weiner
2023-03-23 15:45     ` Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts Yosry Ahmed
2023-03-23 15:56   ` Johannes Weiner
2023-03-23 16:01     ` Yosry Ahmed
2023-03-23 17:27       ` Johannes Weiner
2023-03-23 18:07         ` Yosry Ahmed
2023-03-23 19:35           ` Shakeel Butt
2023-03-23  4:00 ` [RFC PATCH 5/7] vmscan: memcg: sleep when flushing stats during reclaim Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 6/7] workingset: memcg: sleep when flushing stats in workingset_refault() Yosry Ahmed
2023-03-23 15:50   ` Johannes Weiner
2023-03-23 16:02     ` Yosry Ahmed
2023-03-23 16:00   ` Johannes Weiner
2023-03-23 16:02     ` Yosry Ahmed
2023-03-23  4:00 ` [RFC PATCH 7/7] memcg: do not modify rstat tree for zero updates Yosry Ahmed
2023-03-23  4:10 ` [RFC PATCH 0/7] Make rstat flushing IRQ and sleep friendly Shakeel Butt
2023-03-23  5:07   ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJD7tkbGCgk9VkGdec0=AdHErds4XQs1LzJMhqVryXdjY5PVAg@mail.gmail.com' \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=bpf@vger.kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=vasily.averin@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).