LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: "Mickaël Salaün" <mic@digikod.net>
Cc: linux-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Daniel Mack <daniel@zonque.org>,
	"David S . Miller" <davem@davemloft.net>,
	Kees Cook <keescook@chromium.org>,
	Sargun Dhillon <sargun@sargun.me>,
	kernel-hardening@lists.openwall.com, linux-api@vger.kernel.org,
	linux-security-module@vger.kernel.org, netdev@vger.kernel.org,
	Tejun Heo <tj@kernel.org>,
	cgroups@vger.kernel.org
Subject: Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
Date: Sat, 27 Aug 2016 13:43:09 -0700
Message-ID: <20160827204307.GA43714@ast-mbp.thefacebook.com> (raw)
In-Reply-To: <57C1EB72.2050703@digikod.net>

On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >>
> >> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>>>
> >>>>>
> >>>>> - I don't think such 'for' loop can scale. The solution needs to work
> >>>>> with thousands of containers and thousands of cgroups.
> >>>>> In the patch 06/10 the proposal is to use 'current' as holder of
> >>>>> the programs:
> >>>>> +   for (prog = current->seccomp.landlock_prog;
> >>>>> +                   prog; prog = prog->prev) {
> >>>>> +           if (prog->filter == landlock_ret->filter) {
> >>>>> +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> >>>>> +                   break;
> >>>>> +           }
> >>>>> +   }
> >>>>> imo that's the root of scalability issue.
> >>>>> I think to be able to scale the bpf programs have to be attached to
> >>>>> cgroups instead of tasks.
> >>>>> That would be very different api. seccomp doesn't need to be touched.
> >>>>> But that is the only way I see to be able to scale.
> >>>>
> >>>> Landlock is inspired from seccomp which also use a BPF program per
> >>>> thread. For seccomp, each BPF programs are executed for each syscall.
> >>>> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> >>>> see why it is a scale issue for Landlock comparing to seccomp. I also
> >>>> don't see why storing the BPF program list pointer in the cgroup struct
> >>>> instead of the task struct change a lot here. The BPF programs execution
> >>>> will be the same anyway (for each LSM hook). Kees should probably have a
> >>>> better opinion on this.
> >>>
> >>> seccomp has its own issues and copying them doesn't make this lsm any better.
> >>> Like seccomp bpf programs are all gigantic switch statement that looks
> >>> for interesting syscall numbers. All syscalls of a task are paying
> >>> non-trivial seccomp penalty due to such design. If bpf was attached per
> >>> syscall it would have been much cheaper. Of course doing it this way
> >>> for seccomp is not easy, but for lsm such facility is already there.
> >>> Blank call of a single bpf prog for all lsm hooks is unnecessary
> >>> overhead that can and should be avoided.
> >>
> >> It's probably a misunderstanding. Contrary to seccomp which run all the
> >> thread's BPF programs for any syscall, Landlock only run eBPF programs
> >> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> >> multiple eBPF program types and contrary to seccomp, Landlock only run
> >> an eBPF program when needed. Landlock will have almost no performance
> >> overhead if the syscalls do not trigger the watched LSM hooks for the
> >> current process.
> > 
> > that's not what I see in the patch 06/10:
> > all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> > (which eventually means all lsm hooks) will call
> > static inline int landlock_hook_##NAME
> > which will call landlock_run_prog()
> > which does:
> > + for (landlock_ret = current->seccomp.landlock_ret;
> > +      landlock_ret; landlock_ret = landlock_ret->prev) {
> > +    if (landlock_ret->triggered) {
> > +       ctx.cookie = landlock_ret->cookie;
> > +       for (prog = current->seccomp.landlock_prog;
> > +            prog; prog = prog->prev) {
> > +               if (prog->filter == landlock_ret->filter) {
> > +                       cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> > +                       break;
> > +               }
> > +       }
> > 
> > that is unacceptable overhead and not a scalable design.
> > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> > as soon as more lsm hooks are added.
> 
> Good catch! I forgot to check the program (sub)type in the loop to only
> run the needed programs for the current hook. I will fix this.
> 
> 
> > 
> >> As said above, Landlock will not run an eBPF programs when not strictly
> >> needed. Attaching to a cgroup will have the same performance impact as
> >> attaching to a process hierarchy.
> > 
> > Having a prog per cgroup per lsm_hook is the only scalable way I
> > could come up with. If you see another way, please propose.
> > current->seccomp.landlock_prog is not the answer.
> 
> Hum, I don't see the difference from a performance point of view between
> a cgroup-based or a process hierarchy-based system.
> 
> Maybe a better option should be to use an array of pointers with N
> entries, one for each supported hook, instead of a unique pointer list?

yes, clearly array dereference is faster than link list walk.
Now the question is where to keep this prog_array[num_lsm_hooks] ?
Since we cannot keep it inside task_struct, we have to allocate it.
Every time the task is creted then. What to do on the fork? That
will require changes all over. Then the obvious optimization would be
to share this allocated array of prog pointers across multiple tasks...
and little by little this new facility will look like cgroup.
Hence the suggestion to put this array into cgroup from the start.

> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> to use a process hierarchy). The downside will be to handle an LSM hook
> program which is not triggered by a seccomp-filter, but this should be
> needed anyway to handle interruptions.

what do you mean 'not triggered by seccomp' ?
You're not suggesting that this lsm has to enable seccomp to be functional?
imo that's non starter due to overhead.

  reply index

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-25 10:32 [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 01/10] landlock: Add Kconfig Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 02/10] bpf: Move u64_to_ptr() to BPF headers and inline it Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 03/10] bpf,landlock: Add a new arraymap type to deal with (Landlock) handles Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 04/10] seccomp: Split put_seccomp_filter() with put_seccomp() Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 05/10] seccomp: Handle Landlock Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 06/10] landlock: Add LSM hooks Mickaël Salaün
2016-08-30 18:56   ` Andy Lutomirski
2016-08-30 20:10     ` Mickaël Salaün
2016-08-30 20:18       ` Andy Lutomirski
2016-08-30 20:27         ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 07/10] landlock: Add errno check Mickaël Salaün
2016-08-25 11:13   ` Andy Lutomirski
2016-08-25 10:32 ` [RFC v2 08/10] landlock: Handle file system comparisons Mickaël Salaün
2016-08-25 11:12   ` Andy Lutomirski
2016-08-25 14:10     ` Mickaël Salaün
2016-08-26 14:57       ` Andy Lutomirski
2016-08-27 13:45         ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 09/10] landlock: Handle cgroups Mickaël Salaün
2016-08-25 11:09   ` Andy Lutomirski
2016-08-25 14:44     ` Mickaël Salaün
2016-08-26 12:55       ` Tejun Heo
2016-08-26 14:20       ` Andy Lutomirski
2016-08-26 15:50         ` Tejun Heo
2016-08-26  2:14   ` Alexei Starovoitov
2016-08-26 15:10     ` Mickaël Salaün
2016-08-26 23:05       ` Alexei Starovoitov
2016-08-27  7:30         ` Andy Lutomirski
2016-08-27 18:11           ` Alexei Starovoitov
2016-08-28  8:14             ` Andy Lutomirski
2016-08-27 14:06         ` [RFC v2 09/10] landlock: Handle cgroups (performance) Mickaël Salaün
2016-08-27 18:06           ` Alexei Starovoitov
2016-08-27 19:35             ` Mickaël Salaün
2016-08-27 20:43               ` Alexei Starovoitov [this message]
2016-08-27 21:14                 ` Mickaël Salaün
2016-08-28  8:13                   ` Andy Lutomirski
2016-08-28  9:42                     ` Mickaël Salaün
2016-08-30 18:55                       ` Andy Lutomirski
2016-08-30 20:20                         ` Mickaël Salaün
2016-08-30 20:23                           ` Andy Lutomirski
2016-08-30 20:33                             ` Mickaël Salaün
2016-08-30 20:55                               ` Alexei Starovoitov
2016-08-30 21:45                                 ` Andy Lutomirski
2016-08-31  1:36                                   ` Alexei Starovoitov
2016-08-31  3:29                                     ` Andy Lutomirski
2016-08-27 14:19         ` [RFC v2 09/10] landlock: Handle cgroups (netfilter match) Mickaël Salaün
2016-08-27 18:32           ` Alexei Starovoitov
2016-08-27 14:34         ` [RFC v2 09/10] landlock: Handle cgroups (program types) Mickaël Salaün
2016-08-27 18:19           ` Alexei Starovoitov
2016-08-27 19:55             ` Mickaël Salaün
2016-08-27 20:56               ` Alexei Starovoitov
2016-08-27 21:18                 ` Mickaël Salaün
2016-08-25 10:32 ` [RFC v2 10/10] samples/landlock: Add sandbox example Mickaël Salaün
2016-08-25 11:05 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
2016-08-25 13:57   ` Mickaël Salaün
2016-08-27  7:40 ` Andy Lutomirski
2016-08-27 15:10   ` Mickaël Salaün
2016-08-27 15:21     ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing (cgroup delegation) Mickaël Salaün
2016-08-30 16:06 ` [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing Andy Lutomirski
2016-08-30 19:51   ` Mickaël Salaün
2016-08-30 19:55     ` Andy Lutomirski
2016-09-15  9:19 ` Pavel Machek
2016-09-20 17:08   ` Mickaël Salaün
2016-09-24  7:45     ` Pavel Machek
2016-10-03 22:56     ` Kees Cook
2016-10-05 20:30       ` Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160827204307.GA43714@ast-mbp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=daniel@zonque.org \
    --cc=davem@davemloft.net \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mic@digikod.net \
    --cc=netdev@vger.kernel.org \
    --cc=sargun@sargun.me \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git