On 2020-05-19, Alexei Starovoitov wrote:
> On Wed, May 20, 2020 at 11:20:45AM +1000, Aleksa Sarai wrote:
> > No it won't become copy_from_user(), nor will there be a TOCTOU race.
> >
> > The idea is that seccomp will proactively copy the struct (and
> > recursively any of the struct pointers inside) before the syscall runs
> > -- as this is done by seccomp it doesn't require any copy_from_user()
> > primitives in cBPF. We then run the cBPF filter on the copied struct,
> > just like how cBPF programs currently operate on seccomp_data (how this
> > would be exposed to the cBPF program as part of the seccomp ABI is the
> > topic of discussion here).
> >
> > Then, when the actual syscall code runs, the struct will have already
> > been copied and the syscall won't copy it again.
>
> Let's take bpf syscall as an example.
> Are you suggesting that all of syscall logic of conditionally parsing
> the arguments will be copy-pasted into seccomp-syscall infra, then
> it will do copy_from_user() all the data and replace all aligned_u64
> in "union bpf_attr" with kernel copied pointers instead of user pointers
> and make all of bpf syscall's copy_from_user() actions to be conditional ?
> If seccomp is on, use kernel pointers... if seccomp is off, do copy_from_user ?
> And the same idea will be replicated for all syscalls?

This would be done optionally per-syscall. Only syscalls which want to
opt-in to such a mechanism (such as clone3 and openat2) would be
affected. Also, bpf is possibly the least-friendly syscall to pick as an
example of these types of filters -- openat2/clone3 is much simpler to
consider.

The point is that if we both agree that seccomp needs to have a way to
do "deep argument inspection" (filtering based on the struct argument to
a syscall), then some sort of caching mechanism is simply necessary to
solve the problem. Otherwise there's a trivial TOCTOU and seccomp
filtering for such syscalls would be rendered almost useless.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH