From: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
To: Sargun Dhillon <sargun-GaZTRHToo+CzQB+pC5nmwQ@public.gmane.org>
Cc: Pablo Neira Ayuso <pablo-Cap9r6Oaw4JrovVCs/uTlw@public.gmane.org>,
	htejun-b10kYP2dOMg@public.gmane.org,
	daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org,
	ast-b10kYP2dOMg@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	kafai-b10kYP2dOMg@public.gmane.org,
	fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
	harald-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v5 0/6] Add eBPF hooks for cgroups
Date: Mon, 19 Sep 2016 18:34:28 +0200
Message-ID: <1918ef04-f22f-dce3-4bb4-1275936111d0@zonque.org>
In-Reply-To: <20160916195728.GA14736-I4sfFR6g6EicJoAdRrHjTrzMkBWIpU9tytq7g7fCXyjEk0E+pv7Png@public.gmane.org>

Hi,

On 09/16/2016 09:57 PM, Sargun Dhillon wrote:
> On Wed, Sep 14, 2016 at 01:13:16PM +0200, Daniel Mack wrote:

>> I have no idea what makes you think this is limited to systemd. As I
>> said, I provided an example for userspace that works from the command
>> line. The same limitations apply as for all other users of cgroups.
>>
> So, at least in my work, we have Mesos, but on nearly every machine that Mesos
> runs on, people also have systemd. Recently, there has been a bit of a battle
> over ownership of things like cgroups on these machines. We can usually solve it
> by nesting under systemd cgroups, and so far we've avoided making too many
> systemd-specific concessions.
> 
> The reason this (mostly) works is that everything we touch has a sense of
> nesting: we can apply policy at a lower place in the hierarchy, and yet
> systemd's monitoring and policy still stay in place.
> 
> Now, with this patch, we don't have that, but I think we can reasonably add some 
> flag like "no override" when applying policies, or alternatively something like 
> "no new privileges", to prevent children from applying policies that override 
> top-level policy.

Yes, but the API is already guarded by CAP_NET_ADMIN. Take that
capability away from your children, and they can't tamper with the
policy. Does that work for you?
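A minimal, Linux-only sketch of that suggestion, in Python with
hypothetical helper names: a launcher can check whether CAP_NET_ADMIN
(capability 12 in <linux/capability.h>) is set in a mask from
/proc/<pid>/status, and remove it from its bounding set with
prctl(PR_CAPBSET_DROP, ...) before starting children. This assumes the
launcher holds CAP_SETPCAP. Dropping from the bounding set only stops
the capability from being gained across execve(); the launcher would
also have to clear it from the effective and permitted sets that forked
children inherit.

```python
import ctypes
import ctypes.util
import os

CAP_NET_ADMIN = 12    # <linux/capability.h>
PR_CAPBSET_DROP = 24  # <linux/prctl.h>


def cap_in_mask(status_text: str, field: str, cap: int) -> bool:
    """Return True if bit `cap` is set in the hex mask of `field`
    ("CapEff", "CapPrm", "CapBnd", ...) in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool((mask >> cap) & 1)
    raise KeyError(field)


def drop_net_admin_from_bounding_set() -> None:
    """Hypothetical helper: remove CAP_NET_ADMIN from this process's
    bounding set so programs exec'd later cannot gain it.
    Needs CAP_SETPCAP; raises OSError otherwise."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.prctl(PR_CAPBSET_DROP, CAP_NET_ADMIN, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        status = f.read()
    print("CAP_NET_ADMIN in bounding set:",
          cap_in_mask(status, "CapBnd", CAP_NET_ADMIN))
```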

> I realize there is a speed concern as well, but I think for
> people who want nested policy, we're willing to make the tradeoff. For many
> of us, the cost of traversing a few extra pointers is still far smaller than
> the overhead of network namespaces, iptables, etc.

Not sure. Have you tried it?

> What do you think Daniel?

I think we should look at an implementation once we really need it, and
then revisit the performance impact. In any case, this can be changed
under the hood, without touching the userspace API (except for adding
flags if we need them).
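To make the "no override" idea concrete, here is a toy Python model;
it is not the patch's data structures. Each cgroup holds at most one
program, a cgroup without its own program uses the nearest ancestor's
(an assumption about the lookup), and a hypothetical no_override flag
given at attach time makes attaches further down the hierarchy fail.
Whether such a flag should reject the attach or silently keep the
parent's program is a design choice; this sketch rejects, with
PermissionError standing in for EPERM.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Cgroup:
    """Toy cgroup node with at most one attached program (a name here)."""
    name: str
    parent: Optional["Cgroup"] = None
    prog: Optional[str] = None
    no_override: bool = False  # hypothetical flag from the discussion

    def ancestors(self) -> Iterator["Cgroup"]:
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def attach(self, prog: str, no_override: bool = False) -> None:
        # Reject if any ancestor pinned its program with no_override.
        for anc in self.ancestors():
            if anc.prog is not None and anc.no_override:
                raise PermissionError(
                    f"{self.name}: policy pinned by {anc.name}")
        self.prog = prog
        self.no_override = no_override

    def effective_prog(self) -> Optional[str]:
        # Own program first, else the nearest ancestor's, else None.
        if self.prog is not None:
            return self.prog
        return next((a.prog for a in self.ancestors() if a.prog is not None),
                    None)


if __name__ == "__main__":
    root = Cgroup("/")
    system = Cgroup("system.slice", parent=root)
    mesos = Cgroup("mesos", parent=system)
    docker = Cgroup("docker", parent=root)

    system.attach("systemd-policy", no_override=True)
    try:
        mesos.attach("mesos-policy")
    except PermissionError as e:
        print("rejected:", e)
    print(mesos.effective_prog())   # systemd-policy
    docker.attach("docker-policy")  # allowed: not below system.slice
    print(docker.effective_prog())  # docker-policy
```

The pointer walk in effective_prog() is the "few extra pointers" cost
mentioned above; a kernel implementation could instead cache the
effective program per cgroup at attach time.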

>> Not necessarily. You can also do it the inetd way and pass the
>> socket to a process that is launched on demand, but do SO_ATTACH_FILTER
>> + SO_LOCK_FILTER in the middle. The launched binary has no visibility
>> into what happens with the payload on the socket. The proposed cgroup
>> eBPF solution implements very similar behavior in that regard.
>
> It would be nice to be able to see whether or not a filter is attached to a 
> cgroup, but given this is going through syscalls, at least introspection
> is possible as opposed to something like netlink.

Sure, there are many ways. At one point I implemented the bpf cgroup
logic as a dedicated cgroup controller, which made it possible to read
out the status. But since we agreed on attaching programs through the
bpf(2) system call, I moved back to the implementation that stores the
pointers directly in the cgroup.

Enabling the controller through the fs-backed cgroup interface first,
then attaching through the bpf(2) syscall, and then going back to the fs
interface to read out status values would be a bit weird.
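For reference, the inetd-style pattern quoted above (attach a filter
with SO_ATTACH_FILTER, then pin it with SO_LOCK_FILTER before handing
the socket to the launched binary) can be sketched in Python on Linux.
This is a minimal sketch under stated assumptions: the numeric option
values are hard-coded from <asm-generic/socket.h> (x86, arm, ...), and
the filter is a one-instruction classic BPF program that accepts every
packet. Once the filter is locked, the kernel refuses SO_DETACH_FILTER
with EPERM, which is what detach_is_refused() checks.

```python
import ctypes
import socket
import struct

# Values from <asm-generic/socket.h>; assumption: some architectures
# (e.g. sparc, parisc) use different numbers.
SO_ATTACH_FILTER = 26
SO_DETACH_FILTER = 27
SO_LOCK_FILTER = 44

BPF_RET_K = 0x06  # BPF_RET | BPF_K from <linux/filter.h>


def attach_and_lock(sock: socket.socket) -> None:
    """Attach an accept-all classic BPF filter, then lock it."""
    # struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
    # "ret #0xffffffff" accepts the whole packet.
    insns = ctypes.create_string_buffer(
        struct.pack("HBBI", BPF_RET_K, 0, 0, 0xFFFFFFFF))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    # Native struct alignment reproduces the C padding.
    fprog = struct.pack("HP", 1, ctypes.addressof(insns))
    # The kernel copies the instructions during this call.
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
    sock.setsockopt(socket.SOL_SOCKET, SO_LOCK_FILTER, 1)


def detach_is_refused(sock: socket.socket) -> bool:
    """True if the kernel rejects SO_DETACH_FILTER with EPERM."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_DETACH_FILTER, 0)
    except PermissionError:
        return True
    return False


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        attach_and_lock(s)
        # A launcher would now pass `s` to the on-demand process.
        print("detach refused:", detach_is_refused(s))
```

No privileges are needed for this; on kernels that support
SO_LOCK_FILTER it should print "detach refused: True".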

>> And FWIW, I agree with Thomas - there is nothing wrong with having
>> multiple options to use for such use-cases.
>
> Right now, for containers, we have netfilter and network namespaces.
> There's a lot of performance overhead that comes with this.

Out of curiosity: Could you express that in numbers? And how exactly are
you testing?

> Not only
> that, but iptables doesn't really lend itself to being driven by
> automated infrastructure. We (firewalld, systemd, dockerd, mesos)
> end up fighting with one another for ownership over firewall rules.

Yes, that's a common problem.

> Although I have problems with this approach, I think that it's
> a good baseline where we can have the top level owned by systemd,
> docker underneath that, and Mesos underneath that. We can add
> additional hooks for things like Checmate and Landlock, and
> with a little more work, we can do composition, solving
> all of our problems.

It is supposed to be just a baseline, yes.


Thanks for your feedback,
Daniel

Thread overview: 27+ messages
2016-09-12 16:12 [PATCH v5 0/6] Add eBPF hooks for cgroups Daniel Mack
2016-09-12 16:12 ` [PATCH v5 1/6] bpf: add new prog type for cgroup socket filtering Daniel Mack
2016-09-12 16:12 ` [PATCH v5 2/6] cgroup: add support for eBPF programs Daniel Mack
2016-09-12 16:12 ` [PATCH v5 3/6] bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands Daniel Mack
2016-09-12 16:12   ` [PATCH v5 4/6] net: filter: run cgroup eBPF ingress programs Daniel Mack
2016-09-12 16:12   ` [PATCH v5 5/6] net: core: run cgroup eBPF egress programs Daniel Mack
2016-09-12 16:12   ` [PATCH v5 6/6] samples: bpf: add userspace example for attaching eBPF programs to cgroups Daniel Mack
2016-09-13 11:56 ` [PATCH v5 0/6] Add eBPF hooks for cgroups Pablo Neira Ayuso
2016-09-13 13:31   ` Daniel Mack
2016-09-13 14:14       ` Daniel Borkmann
2016-09-13 17:24       ` Pablo Neira Ayuso
2016-09-14  4:42         ` Alexei Starovoitov
2016-09-14  9:03           ` Thomas Graf
2016-09-14 10:30             ` Pablo Neira Ayuso
2016-09-14 11:06               ` Thomas Graf
2016-09-14 11:36               ` Daniel Borkmann
2016-09-14 11:13         ` Daniel Mack
2016-09-14 11:42             ` Daniel Borkmann
2016-09-14 15:55                 ` Alexei Starovoitov
2016-09-16 19:57           ` Sargun Dhillon
2016-09-18 23:34               ` Sargun Dhillon
2016-09-19 16:34               ` Daniel Mack [this message]
2016-09-19 21:53                 ` Sargun Dhillon
2016-09-20 14:25                     ` Daniel Mack
2016-09-15  6:36 ` Vincent Bernat
2016-09-15  8:11     ` Daniel Mack
