From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751491AbdAPBTH (ORCPT <rfc822;w@1wt.eu>);
        Sun, 15 Jan 2017 20:19:07 -0500
Received: from mail-it0-f67.google.com ([209.85.214.67]:34179 "EHLO
        mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751173AbdAPBTF (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 15 Jan 2017 20:19:05 -0500
Date: Sun, 15 Jan 2017 20:19:01 -0500
From: Tejun Heo <tj@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
        Andy Lutomirski <luto@amacapital.net>, David Ahern <dsahern@gmail.com>,
        Alexei Starovoitov <alexei.starovoitov@gmail.com>,
        Andy Lutomirski <luto@kernel.org>, Daniel Mack <daniel@zonque.org>,
        =?iso-8859-1?Q?Micka=EBl_Sala=FCn?= <mic@digikod.net>,
        Kees Cook <keescook@chromium.org>, Jann Horn <jann@thejh.net>,
        "David S. Miller" <davem@davemloft.net>, Thomas Graf <tgraf@suug.ch>,
        Michael Kerrisk <mtk.manpages@gmail.com>,
        Linux API <linux-api@vger.kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Network Development <netdev@vger.kernel.org>
Subject: Re: Potential issues (security and otherwise) with the current
 cgroup-bpf API
Message-ID: <20170116011901.GH14446@mtj.duckdns.org>
References: <CALCETrV81oFwq2AgeRsN54HA1jR=b5cOZfAgve8H8zhx83DTyA@mail.gmail.com>
 <20161219205631.GA31242@ast-mbp.thefacebook.com>
 <CALCETrWr5XMkexdGp7HdkiLkQV=P9ycj+sNO7xWSRoCVxihVZA@mail.gmail.com>
 <20161220000254.GA58895@ast-mbp.thefacebook.com>
 <CALCETrU1_bDVLfokQ7zasHVmeq7S-R+603GEw59V_wuj4eE1hw@mail.gmail.com>
 <2dbec775-6304-e44c-19c5-fbf07877e7b1@gmail.com>
 <CALCETrUW2jEYmjSsOrPj+MAjkDGGUCw_rdxQh+5Er0r4ReGLnA@mail.gmail.com>
 <20161220091150.GJ3124@twins.programming.kicks-ass.net>
 <20170103102559.GA30129@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170103102559.GA30129@dhcp22.suse.cz>
User-Agent: Mutt/1.7.1 (2016-10-04)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

Sorry about the delay.  Some fire fighting followed the holidays.

On Tue, Jan 03, 2017 at 11:25:59AM +0100, Michal Hocko wrote:
> > So from what I understand the proposed cgroup is not in fact
> > hierarchical at all.
> > 
> > @TJ, I thought you were enforcing all new cgroups to be properly
> > hierarchical, that would very much include this one.
> 
> I would be interested in that as well. We have made that mistake in
> memcg v1 where hierarchy could be disabled for performance reasons and
> that turned out to be major PITA in the end. Why do we want to repeat
> the same mistake here?

Across the different threads on this subject, there have been multiple
explanations but I'll try to sum it up more clearly.

The big issue here is whether this is a cgroup thing or a bpf thing.
I don't think there's anything inherently wrong with one approach or
the other.  Forget about the proposed cgroup bpf extensions and think
about how iptables does cgroups.  Whether it's the netcls/netprio in
v1 or direct membership matching in v2, it is the network side testing
for cgroup membership one way or the other.  The only part cgroup is
involved in is answering that test.

This also holds true for the perf controller.  While it is implemented
as a controller, it isn't visible to cgroup users in any way and the
only function it serves is providing the membership test to the perf
subsystem.  perf is the one which decides whether and how it is to be
used.  cgroup providing a membership test to other subsystems is
completely acceptable and established.

Now coming back to bpf, the current implementation is just that.
Sure, cgroup hosts the rules in its data structures but that isn't
something conceptually relevant.  We might as well implement it as a
prefixed hash table from bpf side.  Having pointers in struct cgroup
is just a more efficient and easier way of achieving the same result.
In fact, IIUC, this whole thing was born out of discussions around
implementing scalable cgroup membership matching from bpf programs.
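
A minimal sketch of the "prefixed hash table" alternative mentioned
above, assuming programs are keyed by cgroup path and a cgroup without
its own entry falls back to its nearest ancestor.  All names here
(prog_table, table_attach, table_lookup, the integer prog_id) are
hypothetical; this is userspace C for illustration, not kernel code and
not an existing bpf interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 64            /* hypothetical table size, power of two */
#define PATH_MAX_LEN 128    /* hypothetical limit on key length */

struct slot {
	char path[PATH_MAX_LEN]; /* cgroup path used as key, "" = empty */
	int prog_id;             /* hypothetical program handle */
};

struct prog_table {
	struct slot slots[SLOTS];
};

/* FNV-1a hash over the first len bytes of s. */
static uint32_t fnv1a(const char *s, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Linear probing: return the slot holding the key, or the first empty
 * slot on its probe path, or NULL if the table is full.
 */
static struct slot *find_slot(struct prog_table *t, const char *path,
			      size_t len)
{
	uint32_t start = fnv1a(path, len) & (SLOTS - 1);

	for (uint32_t i = 0; i < SLOTS; i++) {
		struct slot *s = &t->slots[(start + i) & (SLOTS - 1)];

		if (s->path[0] == '\0')
			return s;
		if (strlen(s->path) == len && !memcmp(s->path, path, len))
			return s;
	}
	return NULL;
}

/* Attach prog_id to a cgroup path.  Returns 0 on success, -1 on error. */
int table_attach(struct prog_table *t, const char *path, int prog_id)
{
	size_t len = strlen(path);
	struct slot *s;

	if (len == 0 || len >= PATH_MAX_LEN)
		return -1;
	s = find_slot(t, path, len);
	if (!s)
		return -1;
	memcpy(s->path, path, len + 1);
	s->prog_id = prog_id;
	return 0;
}

/*
 * Return the program attached to path or to its nearest ancestor,
 * -1 if nothing is attached anywhere up to the root.
 */
int table_lookup(struct prog_table *t, const char *path)
{
	size_t len = strlen(path);

	assert(len > 0 && path[0] == '/');
	for (;;) {
		struct slot *s = find_slot(t, path, len);

		if (s && s->path[0] != '\0')
			return s->prog_id;
		if (len == 1)
			return -1;      /* root already checked */
		/* Strip the last component: "/a/b" -> "/a" -> "/". */
		while (len > 1 && path[len - 1] != '/')
			len--;
		if (len > 1)
			len--;          /* drop the separator too */
	}
}
```

With programs attached at "/a" and "/a/b/c", a lookup for "/a/b"
resolves to the one at "/a" and a lookup for "/a/b/c/d" to the one at
"/a/b/c".  Each lookup costs one hash probe per path component, which
is the overhead that keeping pointers in struct cgroup avoids.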

So, what's proposed is a proper part of bpf.  In terms of
implementation, cgroup helps by hosting the pointers but that doesn't
necessarily affect the conceptual structure of it.  Given that, I
don't think it'd be a good idea to add anything to cgroup interface
for this feature.  Introspection is great to have but this should be
introspectable together with other bpf programs using the same
mechanism.  That's where it belongs.

None of the issues that people have been raising here is actually an
issue if one thinks of it as a part of bpf.  Its security model is
exactly the same as that of any other bpf program.  Recursive behavior
is exactly the same as how other external cgroup descendant membership
testing works.  There is no issue here whatsoever.

Now, I'm not claiming that a bpf mechanism which is a proper part of
cgroup isn't attractive.  It is, especially with delegation; however,
that is also where we don't quite know how to proceed.  This doesn't
have much to do with cgroup.  If something is delegatable to non-priv
users and scoped, cgroup's fine with it and if that's not possible it
simply isn't something which is delegatable and putting it on cgroup
doesn't change that.

I'm far from being a bpf expert, so I could be wrong here, but I don't
think there's anything fundamental which prevents bpf from being
delegatable.  At the same time, bpf is extremely flexible and nobody
has really thought about or worked much on delegating it.  If there's
enough need for it, I'm sure we'll eventually get there but from what
I hear it isn't something we can pull off in a restricted timeframe.

There's nothing which makes the currently implemented mechanism
mutually exclusive with a cgroup controller based one.  The hooks are
the expensive part but can be shared; the rest is just about which
programs to execute in what order and how they should be chained.
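
As a rough illustration of a shared hook with chained programs, here is
a userspace C sketch.  It assumes a chain where programs run in attach
order and the first drop verdict wins; that ordering policy, and every
name in it (struct hook, hook_add, hook_run, the two example filters),
is hypothetical and not taken from the proposed patches.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

/* A "program" here is just a callback inspecting a packet buffer. */
typedef enum verdict (*filter_fn)(const void *pkt, size_t len);

#define MAX_FILTERS 8   /* hypothetical per-hook limit */

struct hook {
	filter_fn fns[MAX_FILTERS]; /* programs in attach order */
	size_t n;
};

/* Append a program to the hook's chain.  Returns 0, or -1 if full. */
int hook_add(struct hook *h, filter_fn fn)
{
	assert(h && fn);
	if (h->n == MAX_FILTERS)
		return -1;
	h->fns[h->n++] = fn;
	return 0;
}

/*
 * The single hook call site: whoever attached the programs (the
 * existing mechanism or a future controller), they all run from here.
 * The first program that drops the packet ends the chain.
 */
enum verdict hook_run(const struct hook *h, const void *pkt, size_t len)
{
	assert(h);
	for (size_t i = 0; i < h->n; i++)
		if (h->fns[i](pkt, len) == VERDICT_DROP)
			return VERDICT_DROP;
	return VERDICT_PASS;
}

/* Example programs, purely illustrative. */
enum verdict pass_all(const void *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return VERDICT_PASS;
}

enum verdict drop_short(const void *pkt, size_t len)
{
	(void)pkt;
	return len < 4 ? VERDICT_DROP : VERDICT_PASS;
}
```

The cost sits in the call to hook_run() on the hot path; how the fns[]
array gets populated, and in what order, is policy that can be decided
separately.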

There are a lot of immediate use cases which can benefit from the
proposed cgroup bpf mechanism and they're all fine with it being a
part of bpf and behaving like any other network mechanism behaves in
terms of configuration and delegation.  I don't see a reason why we
would hold back on merging this.  All the raised issues come from
mistaking this for a part of cgroup.  It isn't.  It is a part of
bpf.  If we want a bpf cgroup controller, great, but that is a
separate thing.

Thanks.

-- 
tejun