On 2019-05-12, Linus Torvalds wrote:
> On Sat, May 11, 2019 at 7:37 PM Andy Lutomirski wrote:
> > I bet this will break something that already exists. An execveat()
> > flag to turn off /proc/self/exe would do the trick, though.
>
> Thinking more about it, I suspect it is (once again) wrong to let the
> thing that does the execve() control that bit.
>
> Generally, the less we allow people to affect the lifetime and
> environment of a suid executable, the better off we are.
>
> But maybe we could limit /proc/*/exe to at least not honor suid'ness
> of the target? Or does chrome/runc depend on that too?

Speaking on the runc side, we don't depend on this. It's possible
someone depends on this for fexecve(3) -- but as mentioned before, on
newer kernels glibc uses execveat(AT_EMPTY_PATH).

I would like to point out though that I'm a little bit cautious about
/proc/self/exe-specific restrictions -- because a trivial way to get
around them would be to just open it with O_PATH (and you end up with a
/proc/self/fd/ which is equivalent). Unfortunately, blocking setuid exec
on all O_PATH descriptors would break even execveat(AT_EMPTY_PATH) of
setuid descriptors.

The patches I mentioned (which Andy and I discussed off-list) would
effectively make the magiclink modes in /proc/ affect how you can
operate on the path (no write bit in the mode, cannot re-open it for
write). One aspect of this is how to handle O_PATH, and in particular
how we handle an O_PATH re-open of an already-restricted magiclink.

Maybe we could make it so that setuid is disallowed if you are dealing
with an O_PATH fd which was a magiclink. Effectively, on O_PATH open you
get an fmode_t saying FMODE_SETUID_EXEC_ALLOWED *but* if the path is a
magiclink this fmode gets dropped, and when the fd is given to
execveat(AT_EMPTY_PATH) the fmode is checked and setuid-exec is not
allowed.

[I assume in this discussion "setuid" means "setuid + setcap", right?]

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
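
For illustration only, here is a rough userspace sketch of the two points
above. It is not kernel code and not taken from the patches under
discussion. FMODE_SETUID_EXEC_ALLOWED is the flag name floated in this
mail; the bit value and the helpers opath_fmode(), setuid_exec_allowed(),
opath_exe_is_same_file() and check_model() are hypothetical names made up
for the example. The first half models the proposed rule (an O_PATH open
grants the bit, a magiclink drops it, and the exec path checks it); the
second half uses real syscalls to show that an O_PATH open of
/proc/self/exe yields an fd for the same file as the magiclink target.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical bit: the name comes from the mail, the value is made up. */
#define FMODE_SETUID_EXEC_ALLOWED (1u << 0)

/* Model of the fmode an O_PATH open would hand out under the proposal. */
static unsigned int opath_fmode(bool through_magiclink)
{
	unsigned int fmode = FMODE_SETUID_EXEC_ALLOWED;

	/* Opening through a magiclink drops the permission bit. */
	if (through_magiclink)
		fmode &= ~FMODE_SETUID_EXEC_ALLOWED;
	return fmode;
}

/* Model of the check execveat(AT_EMPTY_PATH) would make on the fd. */
static bool setuid_exec_allowed(unsigned int fmode)
{
	return (fmode & FMODE_SETUID_EXEC_ALLOWED) != 0;
}

/*
 * Real syscalls: open /proc/self/exe with O_PATH and compare the fd with
 * the magiclink target. Returns 1 if both name the same inode, 0 if they
 * differ, -1 on error (for example if /proc is not mounted).
 */
static int opath_exe_is_same_file(void)
{
	struct stat via_link, via_fd;
	int fd, same;

	fd = open("/proc/self/exe", O_PATH | O_CLOEXEC);
	if (fd < 0)
		return -1;
	if (stat("/proc/self/exe", &via_link) < 0 || fstat(fd, &via_fd) < 0) {
		close(fd);
		return -1;
	}
	same = via_link.st_dev == via_fd.st_dev &&
	       via_link.st_ino == via_fd.st_ino;
	close(fd);
	return same;
}

/* Usage example for the model: a plain path keeps the bit, a magiclink loses it. */
static void check_model(void)
{
	assert(setuid_exec_allowed(opath_fmode(false)));
	assert(!setuid_exec_allowed(opath_fmode(true)));
}
```

The model deliberately leaves out how the kernel would detect that a path
walk went through a magiclink; that detail is the open question in the
mail, so the sketch takes it as a boolean input.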