On 27/08/2016 20:06, Alexei Starovoitov wrote: > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote: >> >> On 27/08/2016 01:05, Alexei Starovoitov wrote: >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote: >>>> >>>>> >>>>> - I don't think such 'for' loop can scale. The solution needs to work >>>>> with thousands of containers and thousands of cgroups. >>>>> In the patch 06/10 the proposal is to use 'current' as holder of >>>>> the programs: >>>>> + for (prog = current->seccomp.landlock_prog; >>>>> + prog; prog = prog->prev) { >>>>> + if (prog->filter == landlock_ret->filter) { >>>>> + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); >>>>> + break; >>>>> + } >>>>> + } >>>>> imo that's the root of scalability issue. >>>>> I think to be able to scale the bpf programs have to be attached to >>>>> cgroups instead of tasks. >>>>> That would be very different api. seccomp doesn't need to be touched. >>>>> But that is the only way I see to be able to scale. >>>> >>>> Landlock is inspired from seccomp which also use a BPF program per >>>> thread. For seccomp, each BPF programs are executed for each syscall. >>>> For Landlock, some BPF programs are executed for some LSM hooks. I don't >>>> see why it is a scale issue for Landlock comparing to seccomp. I also >>>> don't see why storing the BPF program list pointer in the cgroup struct >>>> instead of the task struct change a lot here. The BPF programs execution >>>> will be the same anyway (for each LSM hook). Kees should probably have a >>>> better opinion on this. >>> >>> seccomp has its own issues and copying them doesn't make this lsm any better. >>> Like seccomp bpf programs are all gigantic switch statement that looks >>> for interesting syscall numbers. All syscalls of a task are paying >>> non-trivial seccomp penalty due to such design. If bpf was attached per >>> syscall it would have been much cheaper. Of course doing it this way >>> for seccomp is not easy, but for lsm such facility is already there. >>> Blank call of a single bpf prog for all lsm hooks is unnecessary >>> overhead that can and should be avoided. >> >> It's probably a misunderstanding. Contrary to seccomp which run all the >> thread's BPF programs for any syscall, Landlock only run eBPF programs >> for the triggered LSM hooks, if their type match. Indeed, thanks to the >> multiple eBPF program types and contrary to seccomp, Landlock only run >> an eBPF program when needed. Landlock will have almost no performance >> overhead if the syscalls do not trigger the watched LSM hooks for the >> current process. > > that's not what I see in the patch 06/10: > all lsm_hooks in 'static struct security_hook_list landlock_hooks' > (which eventually means all lsm hooks) will call > static inline int landlock_hook_##NAME > which will call landlock_run_prog() > which does: > + for (landlock_ret = current->seccomp.landlock_ret; > + landlock_ret; landlock_ret = landlock_ret->prev) { > + if (landlock_ret->triggered) { > + ctx.cookie = landlock_ret->cookie; > + for (prog = current->seccomp.landlock_prog; > + prog; prog = prog->prev) { > + if (prog->filter == landlock_ret->filter) { > + cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx); > + break; > + } > + } > > that is unacceptable overhead and not a scalable design. > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale > as soon as more lsm hooks are added. Good catch! I forgot to check the program (sub)type in the loop to only run the needed programs for the current hook. I will fix this. > >> As said above, Landlock will not run an eBPF programs when not strictly >> needed. Attaching to a cgroup will have the same performance impact as >> attaching to a process hierarchy. > > Having a prog per cgroup per lsm_hook is the only scalable way I > could come up with. If you see another way, please propose. > current->seccomp.landlock_prog is not the answer. Hum, I don't see the difference from a performance point of view between a cgroup-based or a process hierarchy-based system. Maybe a better option should be to use an array of pointers with N entries, one for each supported hook, instead of a unique pointer list? Anyway, being able to attach an LSM hook program to a cgroup thanks to the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility to use a process hierarchy). The downside will be to handle an LSM hook program which is not triggered by a seccomp-filter, but this should be needed anyway to handle interruptions.