linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, Martin Lau <kafai@fb.com>,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
	john fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zefan Li <lizefan.x@bytedance.com>,
	Cgroups <cgroups@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH bpf-next v3 00/13] bpf: Introduce selectable memcg for bpf map
Date: Fri, 16 Sep 2022 09:53:42 -0700	[thread overview]
Message-ID: <YySqFtU9skPaJipV@P9FQF9L96D.corp.robot.car> (raw)
In-Reply-To: <CALOAHbAzi0s3N_5BOkLsnGfwWCDpUksvvhPejjj5jo4G2v3mGg@mail.gmail.com>

On Tue, Sep 13, 2022 at 02:15:20PM +0800, Yafang Shao wrote:
> On Fri, Sep 9, 2022 at 12:13 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Thu, Sep 08, 2022 at 10:37:02AM +0800, Yafang Shao wrote:
> > > On Thu, Sep 8, 2022 at 6:29 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Wed, Sep 07, 2022 at 05:43:31AM -1000, Tejun Heo wrote:
> > > > > Hello,
> > > > >
> > > > > On Fri, Sep 02, 2022 at 02:29:50AM +0000, Yafang Shao wrote:
> > > > > ...
> > > > > > This patchset tries to resolve the above two issues by introducing a
> > > > > > selectable memcg to limit the bpf memory. Currently we only allow to
> > > > > > select its ancestor to avoid breaking the memcg hierarchy further.
> > > > > > Possible use cases of the selectable memcg as follows,
> > > > >
> > > > > As discussed in the following thread, there are clear downsides to an
> > > > > interface which requires the users to specify the cgroups directly.
> > > > >
> > > > >  https://lkml.kernel.org/r/YwNold0GMOappUxc@slm.duckdns.org
> > > > >
> > > > > So, I don't really think this is an interface we wanna go for. I was hoping
> > > > > to hear more from memcg folks in the above thread. Maybe ping them in that
> > > > > thread and continue there?
> > > >
> > >
> > > Hi Roman,
> > >
> > > > As I said previously, I don't like it, because it's an attempt to solve a non
> > > > bpf-specific problem in a bpf-specific way.
> > > >
> > >
> > > Why do you still insist that bpf_map->memcg is not a bpf-specific
> > > issue after so many discussions?
> > > Do you charge the bpf-map's memory the same way as you charge the page
> > > caches or slabs ?
> > > No, you don't. You charge it in a bpf-specific way.
> >
> 
> Hi Roman,
> 
> Sorry for the late response.
> I've been on vacation in the past few days.
> 
> > The only difference is that we charge the cgroup of the processes who
> > created a map, not a process who is doing a specific allocation.
> 
> This means the bpf-map can be indepent of process, IOW, the memcg of
> bpf-map can be indepent of the memcg of the processes.
> This is the fundamental difference between bpf-map and page caches, then...
> 
> > Your patchset doesn't change this.
> 
> We can make this behavior reasonable by introducing an independent
> memcg, as what I did in the previous version.
> 
> > There are pros and cons with this approach, we've discussed it back
> > to the times when bpf memcg accounting was developed. If you want
> > to revisit this, it's maybe possible (given there is a really strong and likely
> > new motivation appears), but I haven't seen any complaints yet except from you.
> >
> 
> memcg-base bpf accounting is a new feature, which may not be used widely.
> 
> > >
> > > > Yes, memory cgroups are not great for accounting of shared resources, it's well
> > > > known. This patchset looks like an attempt to "fix" it specifically for bpf maps
> > > > in a particular cgroup setup. Honestly, I don't think it's worth the added
> > > > complexity. Especially because a similar behaviour can be achieved simple
> > > > by placing the task which creates the map into the desired cgroup.
> > >
> > > Are you serious ?
> > > Have you ever read the cgroup doc? Which clearly describe the "No
> > > Internal Process Constraint".[1]
> > > Obviously you can't place the task in the desired cgroup, i.e. the parent memcg.
> >
> > But you can place it into another leaf cgroup. You can delete this leaf cgroup
> > and your memcg will get reparented. You can attach this process and create
> > a bpf map to the parent cgroup before it gets child cgroups.
> 
> If the process doesn't exit after it created bpf-map, we have to
> migrate it around memcgs....
> The complexity in deployment can introduce unexpected issues easily.
> 
> > You can revisit the idea of shared bpf maps and outlive specific cgroups.
> > Lof of options.
> >
> > >
> > > [1] https://www.kernel.org/doc/Documentation/cgroup-v2.txt
> > >
> > > > Beatiful? Not. Neither is the proposed solution.
> > > >
> > >
> > > Is it really hard to admit a fault?
> >
> > Yafang, you posted several versions and so far I haven't seen much of support
> > or excitement from anyone (please, fix me if I'm wrong). It's not like I'm
> > nacking a patchset with many acks, reviews and supporters.
> >
> > Still think you're solving an important problem in a reasonable way?
> > It seems like not many are convinced yet. I'd recommend to focus on this instead
> > of blaming me.
> >
> 
> The best way so far is to introduce specific memcg for specific resources.
> Because not only the process owns its memcg, but also specific
> resources own their memcgs, for example bpf-map, or socket.
> 
> struct bpf_map {                                 <<<< memcg owner
>     struct memcg_cgroup *memcg;
> };
> 
> struct sock {                                       <<<< memcg owner
>     struct mem_cgroup *sk_memcg;
> };
> 
> These resources already have their own memcgs, so we should make this
> behavior formal.
> 
> The selectable memcg is just a variant of 'echo ${proc} > cgroup.procs'.

This is a fundamental change: cgroups were always hierarchical groups
of processes/threads. You're basically suggesting to extend it to
hierarchical groups of processes and some other objects (what's a good
definition?). It's a huge change and it's scope is definetely larger
than bpf and even memory cgroups. It will raise a lot of questions:
e.g. what does it mean for other controllers (cpu, io, etc)?
Which objects can have dedicated cgroups and which not? How the interface will
look like? How the oom handling will work? Etc.

The history showed that starting small with one controller and/or specific
use case isn't working well for cgroups, because different resources and
controllers are not living independently. So if you really want to go this way
you need to present the whole picture and convince many people that it's worth
it. I doubt this specific bpf map accounting thing can justify it.

Personally I know some examples where such functionality could be useful,
but I'm not yet convinced it's worth the effort and potential problems.

Thanks!


  reply	other threads:[~2022-09-16 16:54 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-02  2:29 [PATCH bpf-next v3 00/13] bpf: Introduce selectable memcg for bpf map Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 01/13] cgroup: Update the comment on cgroup_get_from_fd Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 02/13] bpf: Introduce new helper bpf_map_put_memcg() Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 03/13] bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 04/13] bpf: Call bpf_map_init_from_attr() immediately after map creation Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 05/13] bpf: Save memcg in bpf_map_init_from_attr() Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 06/13] bpf: Use scoped-based charge in bpf_map_area_alloc Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 07/13] bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free} Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 08/13] bpf: Use bpf_map_kzalloc in arraymap Yafang Shao
2022-09-02  2:29 ` [PATCH bpf-next v3 09/13] bpf: Use bpf_map_kvcalloc in bpf_local_storage Yafang Shao
2022-09-02  2:30 ` [PATCH bpf-next v3 10/13] mm, memcg: Add new helper get_obj_cgroup_from_cgroup Yafang Shao
2022-09-02  2:30 ` [PATCH bpf-next v3 11/13] mm, memcg: Add new helper task_under_memcg_hierarchy Yafang Shao
2022-09-02  2:30 ` [PATCH bpf-next v3 12/13] bpf: Add return value for bpf_map_init_from_attr Yafang Shao
2022-09-02  2:30 ` [PATCH bpf-next v3 13/13] bpf: Introduce selectable memcg for bpf map Yafang Shao
2022-09-07 15:43 ` [PATCH bpf-next v3 00/13] " Tejun Heo
2022-09-07 15:45   ` Tejun Heo
2022-09-07 16:13     ` Alexei Starovoitov
2022-09-07 16:18       ` Tejun Heo
2022-09-07 16:27         ` Alexei Starovoitov
2022-09-07 17:01           ` Tejun Heo
2022-09-08  2:44             ` Yafang Shao
2022-09-07 22:28   ` Roman Gushchin
2022-09-08  2:37     ` Yafang Shao
2022-09-08  2:43       ` Alexei Starovoitov
2022-09-08  2:48         ` Yafang Shao
2022-09-08 16:13       ` Roman Gushchin
2022-09-13  6:15         ` Yafang Shao
2022-09-16 16:53           ` Roman Gushchin [this message]
2022-09-18  3:44             ` Yafang Shao
2022-09-20  2:40               ` Roman Gushchin
2022-09-20 12:42                 ` Yafang Shao
2022-09-20 23:15                   ` Roman Gushchin
2022-09-21  9:36                     ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YySqFtU9skPaJipV@P9FQF9L96D.corp.robot.car \
    --to=roman.gushchin@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hannes@cmpxchg.org \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=shakeelb@google.com \
    --cc=songliubraving@fb.com \
    --cc=songmuchun@bytedance.com \
    --cc=tj@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).