From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Kees Cook <keescook@chromium.org>
Cc: bpf@vger.kernel.org, ksummit <ksummit-discuss@lists.linuxfoundation.org>
Subject: Re: [Ksummit-discuss] [TECH TOPIC] seccomp feature development
Date: Wed, 20 May 2020 15:12:56 -0700
Message-ID: <20200520221256.tzqkjpeswv3d6ne2@ast-mbp.dhcp.thefacebook.com>
In-Reply-To: <202005201151.AFA3C9E@keescook>

On Wed, May 20, 2020 at 12:04:04PM -0700, Kees Cook wrote:
> On Wed, May 20, 2020 at 11:27:03AM -0700, Linus Torvalds wrote:
> > Don't make this some kind of abstract conceptual problem thing.
> > Because it's not.
> 
> I have no intention of making this abstract (the requests for expanding
> seccomp coverage have been for only a select class of syscalls, and
> specifically clone3 and openat2) nor more complicated than it needs to be
> (I regularly resist expanding the seccomp BPF dialect into eBPF).

Kees, since you've forked the thread, I'm adding the bpf mailing list back and
reiterating my point:
** Nack to cBPF extensions **
How is that relevant?
You're proposing to add copy_from_user() to selected syscalls, like clone3,
and present a large __u32 array to the cBPF program.
In other words, the existing fixed-size 'struct seccomp_data' will become
either variable length or a jumbo fixed size, like one page.
In the former case, cBPF would need to be extended with variable-length
logic, which in turn means it will suffer from Spectre v1 issues: every
bounds-checked load from that buffer can be speculatively executed with an
out-of-range offset.
We've spent a lot of time fixing Spectre v1 issues in eBPF, including
teaching the verifier to recognize speculative patterns inside programs
so that malicious bpf progs trying to exploit Spectre v1 are caught
at load time. There is no other tool (compiler or static analysis) that
can do a similar analysis. I suggest that you look into what eBPF
is actually doing instead of trying to reinvent the wheel.
If you go with the latter approach of presenting cBPF with a giant
'struct seccomp_data + page', that extra page would need to be zeroed out
before every invocation of the bpf program, which will make seccomp even
less usable than it is today. Currently it's slow and unusable in
production datacenters.
People have suggested for years adopting eBPF in seccomp to accelerate it,
but, as you confessed, you resisted, and now it sounds like you want to
implement a seccomp-specific syscall bitmask?
Which means more kernel code, more bugs, more security issues.
imo that's another reinvented wheel when eBPF can do it already. I don't think
it's a good idea to add kernel code when an eBPF-based solution exists and is
capable of examining any level of nested args.

> Perhaps the question is "how deeply does seccomp need to inspect?"
> and maybe it does not get to see anything beyond just the "top level"
> struct (i.e. struct clone_args) and all pointers within THAT become
> opaque? That certainly simplifies the design.

clone3's 'struct clone_args' already has a second level: the set_tid
pointer, which points to a separate array of pids.
I don't think that sticking to the first level of pointers for this particular
syscall will make seccomp filtering any more practical.
_______________________________________________
Ksummit-discuss mailing list
Ksummit-discuss@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss