linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>, Tejun Heo <tj@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next] bpf: fix cgroup bpf release synchronization
Date: Mon, 24 Jun 2019 04:02:20 +0000	[thread overview]
Message-ID: <20190624040211.GA10696@castle.dhcp.thefacebook.com> (raw)
In-Reply-To: <91017042-1b59-6110-dfdd-13cfbbec1ae1@fb.com>

On Sun, Jun 23, 2019 at 08:29:21PM -0700, Alexei Starovoitov wrote:
> On 6/23/19 7:30 PM, Roman Gushchin wrote:
> > Since commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf
> > from cgroup itself"), cgroup_bpf release occurs asynchronously
> > (from a worker context), and before the release of the cgroup itself.
> > 
> > This introduced a previously non-existing race between the release
> > and update paths. E.g. if a leaf's cgroup_bpf is released and a new
> > bpf program is attached to the one of ancestor cgroups at the same
> > time. The race may result in double-free and other memory corruptions.
> > 
> > To fix the problem, let's protect the body of cgroup_bpf_release()
> > with cgroup_mutex, as it was effectively previously, when all this
> > code was called from the cgroup release path with cgroup mutex held.
> > 
> > Also make sure, that we don't leave already freed pointers to the
> > effective prog arrays. Otherwise, they can be released again by
> > the update path. It wasn't necessary before, because previously
> > the update path couldn't see such a cgroup, as cgroup_bpf and cgroup
> > itself were released together.
> 
> I thought dying cgroup won't have any children cgroups ?

It's not completely true, a dying cgroup can't have living children.

> It should have been empty with no tasks inside it?

Right.

> Only some resources are still held?

Right.

> mutex and zero init are highly suspicious.
> It feels that cgroup_bpf_release is called too early.

An alternative solution is to bump the refcounter on
every update path, and explicitly skip de-bpf'ed cgroups.

> 
> Thinking from another angle... if child cgroups can still attach then
> this bpf_release is broken.

Hm, what do you mean under attach? It's not possible to attach
a new prog, but if a prog is attached to a parent cgroup,
a pointer can spill through "effective" array.

But I agree, it's broken. Update path should ignore such
cgroups (cgroups, which cgroup_bpf was released). I'll take a look.

> The code should be
> calling __cgroup_bpf_detach() one by one to make sure
> update_effective_progs() is called, since descendant are still
> sort-of alive and can attach?

Not sure I get you. Dying cgroup is a leaf cgroup.

> 
> My money is on 'too early'.
> May be cgroup is not dying ?
> Just cgroup_sk_free() is called on the last socket and
> this auto-detach logic got triggered incorrectly?

So, once again, what's my picture:

A
A/B
A/B/C

cpu1:                               cpu2:
rmdir C                             attach new prog to A
C got dying                         update A, update B, update C...
C's cgroup_bpf is released          C's effective progs is replaced with new one
                                    old is double freed

It looks like it can be reproduced without any sockets.

Thanks!

  reply	other threads:[~2019-06-24  4:02 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-24  2:30 [PATCH bpf-next] bpf: fix cgroup bpf release synchronization Roman Gushchin
2019-06-24  3:29 ` Alexei Starovoitov
2019-06-24  4:02   ` Roman Gushchin [this message]
2019-06-25 15:50     ` Alexei Starovoitov
2019-06-25 16:22       ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190624040211.GA10696@castle.dhcp.thefacebook.com \
    --to=guro@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=ast@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).