LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: "Mickaël Salaün" <mic@digikod.net>
Cc: linux-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Daniel Mack <daniel@zonque.org>,
	"David S . Miller" <davem@davemloft.net>,
	Elena Reshetova <elena.reshetova@intel.com>,
	Kees Cook <keescook@chromium.org>,
	Sargun Dhillon <sargun@sargun.me>,
	kernel-hardening@lists.openwall.com, linux-api@vger.kernel.org,
	linux-security-module@vger.kernel.org, netdev@vger.kernel.org,
	anaravaram@google.com, tj@kernel.org
Subject: Re: [RFC v2 09/10] landlock: Handle cgroups
Date: Thu, 25 Aug 2016 19:14:34 -0700
Message-ID: <20160826021432.GA8291@ast-mbp.thefacebook.com> (raw)
In-Reply-To: <1472121165-29071-10-git-send-email-mic@digikod.net>

On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
> to compare the current process cgroup with a cgroup handle, The handle
> can match the current cgroup if it is the same or a child. This allows
> to make conditional rules according to the current cgroup.
> 
> A cgroup handle is a map entry created from a file descriptor referring
> a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
> 
> An unprivileged process can create and manipulate cgroups thanks to
> cgroup delegation.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
...
> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 r2_map,
> +		u64 r3_map_op, u64 r4, u64 r5)
> +{
> +	u8 option = (u8) r1_option;
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> +	enum bpf_map_array_op map_op = r3_map_op;
> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	struct cgroup *cg1, *cg2;
> +	struct map_landlock_handle *handle;
> +	int i;
> +
> +	/* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
> +	if (unlikely(!map)) {
> +		WARN_ON(1);
> +		return -EFAULT;
> +	}
> +	if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != _LANDLOCK_FLAG_OPT_MASK))
> +		return -EINVAL;
> +
> +	/* for now, only handle OP_OR */
> +	switch (map_op) {
> +	case BPF_MAP_ARRAY_OP_OR:
> +		break;
> +	case BPF_MAP_ARRAY_OP_UNSPEC:
> +	case BPF_MAP_ARRAY_OP_AND:
> +	case BPF_MAP_ARRAY_OP_XOR:
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	synchronize_rcu();
> +
> +	for (i = 0; i < array->n_entries; i++) {
> +		handle = (struct map_landlock_handle *)
> +				(array->value + array->elem_size * i);
> +
> +		/* protected by the proto types, should not happen */
> +		if (unlikely(handle->type != BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
> +			WARN_ON(1);
> +			return -EFAULT;
> +		}
> +		if (unlikely(!handle->css)) {
> +			WARN_ON(1);
> +			return -EFAULT;
> +		}
> +
> +		if (option & LANDLOCK_FLAG_OPT_REVERSE) {
> +			cg1 = handle->css->cgroup;
> +			cg2 = task_css_set(current)->dfl_cgrp;
> +		} else {
> +			cg1 = task_css_set(current)->dfl_cgrp;
> +			cg2 = handle->css->cgroup;
> +		}
> +
> +		if (cgroup_is_descendant(cg1, cg2))
> +			return 0;
> +	}
> +	return 1;
> +}

- please take a loook at exisiting bpf_current_task_under_cgroup and
reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array
is nothing but duplication of the code.

- I don't think such 'for' loop can scale. The solution needs to work
with thousands of containers and thousands of cgroups.
In the patch 06/10 the proposal is to use 'current' as holder of
the programs:
+   for (prog = current->seccomp.landlock_prog;
+                   prog; prog = prog->prev) {
+           if (prog->filter == landlock_ret->filter) {
+                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
+                   break;
+           }
+   }
imo that's the root of scalability issue.
I think to be able to scale the bpf programs have to be attached to
cgroups instead of tasks.
That would be very different api. seccomp doesn't need to be touched.
But that is the only way I see to be able to scale.
May be another way of thinking about it is 'lsm cgroup controller'
that Sargun is proposing.
The lsm hooks will provide stable execution points and the programs
will be called like:
prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
BPF_PROG_RUN(prog, ctx);
The delegation functionality and 'prog_effective' logic that
Daniel Mack is proposing will be fully reused here.
External container management software will be able to apply bpf
programs to control tasks under cgroup and such
bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
The user will be able to register different programs for different lsm hooks.
If I understand the patch 6/10 correctly, there is one (or a list) prog for
all lsm hooks per task which is not flexible enough.
Anoop Naravaram's use case is to control the ports the applications
under cgroup can bind and listen on.
Such use case can be solved by such 'lsm cgroup controller' by
attaching bpf program to security_socket_bind lsm hook and
filtering sockaddr.
Furthermore Sargun's use case is to allow further sockaddr rewrites
from the bpf program which can be done as natural extension
of such mechanism.

If I understood Daniel's Anoop's Sargun's and yours use cases
correctly the common piece of kernel infrastructure that can solve
them all can start from Daniel's current set of patches that
establish a mechanism of attaching bpf program to a cgroup.
Then adding lsm hooks to it and later allowing argument rewrite
(since they're already in the kernel and no ToCToU problems exist)

As far as safety and type checking that bpf programs has to do,
I like the approach of patch 06/10:
+LANDLOCK_HOOK2(file_open, FILE_OPEN,
+       PTR_TO_STRUCT_FILE, struct file *, file,
+       PTR_TO_STRUCT_CRED, const struct cred *, cred
+)
teaching verifier to recognize struct file, cred, sockaddr
will let bpf program access them naturally without any overhead.
Though:
@@ -102,6 +102,9 @@ enum bpf_prog_type {
        BPF_PROG_TYPE_SCHED_CLS,
        BPF_PROG_TYPE_SCHED_ACT,
        BPF_PROG_TYPE_TRACEPOINT,
+       BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
+       BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
+       BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
 };
is a bit of overkill.
I think it would be cleaner to have single
BPF_PROG_TYPE_LSM and at program load time pass
lsm_hook_id as well, so that verifier can do safety checks
based on type info provided in LANDLOCK_HOOKs

  parent reply index

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 01/10] landlock: Add Kconfig Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 02/10] bpf: Move u64_to_ptr() to BPF headers and inline it Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 03/10] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 04/10] seccomp: Split put_seccomp_filter() with put_seccomp() Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 05/10] seccomp: Handle Landlock Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 06/10] landlock: Add LSM hooks Mickaël Salaün
2016-08-30 18:56   ` Andy Lutomirski
2016-08-30 20:10     ` Mickaël Salaün
2016-08-30 20:18       ` Andy Lutomirski
2016-08-30 20:27         ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 07/10] landlock: Add errno check Mickaël Salaün
2016-08-25 11:13   ` Andy Lutomirski
2016-08-25 10:32 ` [RFC v2 08/10] landlock: Handle file system comparisons Mickaël Salaün
2016-08-25 11:12   ` Andy Lutomirski
2016-08-25 14:10     ` Mickaël Salaün
2016-08-26 14:57       ` Andy Lutomirski
2016-08-27 13:45         ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 09/10] landlock: Handle cgroups Mickaël Salaün
2016-08-25 11:09   ` Andy Lutomirski
2016-08-25 14:44     ` Mickaël Salaün
2016-08-26 12:55       ` Tejun Heo
2016-08-26 14:20       ` Andy Lutomirski
2016-08-26 15:50         ` Tejun Heo
2016-08-26  2:14   ` Alexei Starovoitov [this message]
2016-08-26 15:10     ` Mickaël Salaün
2016-08-26 23:05       ` Alexei Starovoitov
2016-08-27  7:30         ` Andy Lutomirski
2016-08-27 18:11           ` Alexei Starovoitov
2016-08-28  8:14             ` Andy Lutomirski
2016-08-27 14:06         ` [RFC v2 09/10] landlock: Handle cgroups (performance) Mickaël Salaün
2016-08-27 18:06           ` Alexei Starovoitov
2016-08-27 19:35             ` Mickaël Salaün
2016-08-27 20:43               ` Alexei Starovoitov
2016-08-27 21:14                 ` Mickaël Salaün
2016-08-28  8:13                   ` Andy Lutomirski
2016-08-28  9:42                     ` Mickaël Salaün
2016-08-30 18:55                       ` Andy Lutomirski
2016-08-30 20:20                         ` Mickaël Salaün
2016-08-30 20:23                           ` Andy Lutomirski
2016-08-30 20:33                             ` Mickaël Salaün
2016-08-30 20:55                               ` Alexei Starovoitov
2016-08-30 21:45                                 ` Andy Lutomirski
2016-08-31  1:36                                   ` Alexei Starovoitov
2016-08-31  3:29                                     ` Andy Lutomirski
2016-08-27 14:19         ` [RFC v2 09/10] landlock: Handle cgroups (netfilter match) Mickaël Salaün
2016-08-27 18:32           ` Alexei Starovoitov
2016-08-27 14:34         ` [RFC v2 09/10] landlock: Handle cgroups (program types) Mickaël Salaün
2016-08-27 18:19           ` Alexei Starovoitov
2016-08-27 19:55             ` Mickaël Salaün
2016-08-27 20:56               ` Alexei Starovoitov
2016-08-27 21:18                 ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 10/10] samples/landlock: Add sandbox example Mickaël Salaün
2016-08-25 11:05 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
2016-08-25 13:57   ` Mickaël Salaün
2016-08-27  7:40 ` Andy Lutomirski
2016-08-27 15:10   ` Mickaël Salaün
2016-08-27 15:21     ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing (cgroup delegation) Mickaël Salaün
2016-08-30 16:06 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
2016-08-30 19:51   ` Mickaël Salaün
2016-08-30 19:55     ` Andy Lutomirski
2016-09-15  9:19 ` Pavel Machek
2016-09-20 17:08   ` Mickaël Salaün
2016-09-24  7:45     ` Pavel Machek
2016-10-03 22:56     ` Kees Cook
2016-10-05 20:30       ` Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160826021432.GA8291@ast-mbp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=anaravaram@google.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=daniel@zonque.org \
    --cc=davem@davemloft.net \
    --cc=elena.reshetova@intel.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mic@digikod.net \
    --cc=netdev@vger.kernel.org \
    --cc=sargun@sargun.me \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git