On 05/05/2017 22:28, Eric W. Biederman wrote:
> Al Viro writes:
>
>> On Thu, May 04, 2017 at 08:46:49PM -0700, Linus Torvalds wrote:
>>> On Thu, May 4, 2017 at 7:47 PM, Jann Horn wrote:
>>>>
>>>> Thread 1 starts an AT_BENEATH path walk using an O_PATH fd
>>>> pointing to /srv/www/example.org/foo; the path given to the syscall is
>>>> "bar/../../../../etc/passwd". The path walk enters the "bar" directory.
>>>> Thread 2 moves /srv/www/example.org/foo/bar to
>>>> /srv/www/example.org/bar.
>>>> Thread 1 processes the rest of the path ("../../../../etc/passwd"), never
>>>> hitting /srv/www/example.org/foo in the process.
>>>>
>>>> I'm not really familiar with the VFS internals, but from a coarse look
>>>> at the patch, it seems like it wouldn't block this?
>>>
>>> I think you're right.
>>>
>>> I guess it would be safe for the RCU case due to the sequence number
>>> check, but not the non-RCU case.
>>
>> Yes and no... FWIW, to exclude that it would suffice to have
>> mount --rbind /src/www/example.org/foo /srv/www/example.org/foo done first.
>> Then this kind of race will end up with -ENOENT due to path_connected()
>> logics in follow_dotdot_rcu()/follow_dotdot(). I'm not sure about the
>> intended applications, though - is that thing supposed to be used along with
>> some horror like seccomp, or...?
>
> As I recall the general idea is that if you have an application like a
> tftp server or a web server that gets a path from a possibly dubious
> source. Instead of implementing an error prone validation logic in
> userspace you can use AT_BENEATH and be certain the path resolution
> stays in bounds.
>
> As you can do stronger things as root this seems mostly targeted at
> non-root applications.
>
> I seem to recall part of the idea was to sometimes pair this to seccomp
> to be certain your application can't escape a sandbox. That plays to
> seccomp limitations that it can inspect flags as they reside in
> registers but seccomp can't follow pointers.

Here are the code and tests from David Drysdale:
https://github.com/google/capsicum-linux/commits/openat-v2
...and the latest patch: https://lkml.org/lkml/2015/3/9/407

The O_BENEATH flag has also been discussed for FreeBSD to support Capsicum.

>
> Which all suggests that we would want something similar to is_subdir
> when AT_BENEATH is specified that we check every time we follow ..
> that would verify that on the same filesystem we stay below and
> that we also stay on a mount that is below. mount --move has
> all of the same challenges for enforcing you stay within bounds
> as rename does.

FYI, I'm working on a new LSM [1] to work around the limitations of
seccomp-bpf, especially the pointer checks. The idea is to enable some
filtering as seccomp-bpf can do, but instead of checking at the syscall
level, Landlock takes advantage of LSM hooks. I had a first PoC of an eBPF
function and map type to check if a file was beneath another [2]. I plan
to create a new one that records a "snapshot" of the current mount tree
into an eBPF map, to be able to check if a file is beneath or a parent of
another one.

[1] https://lkml.kernel.org/r/20170328234650.19695-1-mic@digikod.net
[2] https://lkml.kernel.org/r/20161026065654.19166-9-mic@digikod.net
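
To make the race Jann describes and the is_subdir-like check Eric suggests a bit more concrete, here is a toy userspace model in Python. It is only a sketch under stated assumptions, not kernel code: Node, rename(), is_beneath(), walk() and build_tree() are hypothetical names, mounts and symlinks are ignored entirely, and the "concurrent" rename is simulated by a hook that runs between two path components. With strict=False the walk only refuses ".." when it is standing on the starting directory, so a rename in the middle of the walk lets it slip past; with strict=True every ".." is followed by an ancestry check against the starting directory.

```python
class Node:
    """Toy directory entry with a parent pointer (hypothetical, not a dentry)."""

    def __init__(self, name, parent=None):
        self.name = name
        # The filesystem root is its own parent, so ".." at "/" stays at "/".
        self.parent = parent if parent is not None else self
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def rename(node, new_parent):
    """Move node under new_parent, standing in for a concurrent rename."""
    del node.parent.children[node.name]
    node.parent = new_parent
    new_parent.children[node.name] = node


def is_beneath(node, base):
    """Return True if node is base or a descendant of base."""
    while node is not base:
        if node.parent is node:  # reached "/" without meeting base
            return False
        node = node.parent
    return True


def walk(start, components, strict, hook=None):
    """Resolve components relative to start.

    hook(i), if given, runs just before component i is processed and is used
    here to simulate another thread modifying the tree during the walk.
    """
    cur = start
    for i, comp in enumerate(components):
        if hook is not None:
            hook(i)
        if comp == "..":
            # Check available in both modes: refuse ".." at the start point.
            if cur is start:
                raise PermissionError("'..' at the starting directory")
            cur = cur.parent
            # Stricter check: after every "..", verify we are still below start.
            if strict and not is_beneath(cur, start):
                raise PermissionError("walk left the starting directory")
        else:
            if comp not in cur.children:
                raise FileNotFoundError(comp)
            cur = cur.children[comp]
    return cur


def build_tree():
    """Build /etc/passwd and /srv/www/example.org/foo/bar."""
    root = Node("/")
    passwd = Node("passwd", Node("etc", root))
    site = Node("example.org", Node("www", Node("srv", root)))
    foo = Node("foo", site)
    bar = Node("bar", foo)
    return site, foo, bar, passwd


if __name__ == "__main__":
    path = "bar/../../../../etc/passwd".split("/")
    for strict in (False, True):
        site, foo, bar, passwd = build_tree()
        # "Thread 2": move foo/bar up to example.org/bar after "bar" is entered.
        mover = lambda i: rename(bar, site) if i == 1 else None
        try:
            result = walk(foo, path, strict, hook=mover)
            print("strict =", strict, "-> reached", result.name)
        except PermissionError as err:
            print("strict =", strict, "-> refused:", err)
```

The model walks parent pointers on every "..", which is cheap on a toy tree; whether an equivalent check is acceptable in the real lookup path, and how it would deal with mount --move, is exactly the open question in the thread.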