From: Willy Tarreau
Subject: Re: /proc/pid/fd && anon_inode_fops
Date: Sun, 25 Aug 2013 08:50:39 +0200
Message-ID: <20130825065039.GB9299@1wt.eu>
References: <20130822185317.GI31117@1wt.eu> <20130822201530.GL31117@1wt.eu>
 <20130824182939.GA23630@redhat.com> <20130824212432.GA9299@1wt.eu>
 <20130825052317.GZ27005@ZenIV.linux.org.uk>
In-Reply-To: <20130825052317.GZ27005@ZenIV.linux.org.uk>
To: Al Viro
Cc: Oleg Nesterov, Andy Lutomirski, Linus Torvalds, "security@kernel.org",
 Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler
List-Id: linux-fsdevel.vger.kernel.org

On Sun, Aug 25, 2013 at 06:23:17AM +0100, Al Viro wrote:
> On Sat, Aug 24, 2013 at 11:24:32PM +0200, Willy Tarreau wrote:
>
> > I doubt it. It seems to me that most such entries are implemented
> > for completeness while most valid uses only concern /proc/self/fd.
> > Maybe if we had an option so that only /proc/self/fd would actually
> > allow to access the fds while all /proc/pid/fd would only show what
> > they map to, it would be a good step forward.
>
> How? The fundamental problem is not visibility of that stuff, it's
> new opened file for the same object (Linux behaviour) vs. new descriptor
> refering to the same opened file (*BSD and friends). We can't get
> anon_... sanely reopened in the former semantics and they are very
> visibly different for regular files, so switching to *BSD one is not
> feasible - too high odds of userland breakage. The difference in
> semantics, of course, is that on Linux opening /dev/stdin gives you
> a descriptor with independent current IO position; on *BSD you get
> a descriptor sharing the current IO position with stdin. IOW, it's
> independent open() of the same file vs. dup().
>
> We are really stuck with the current semantics here - switching to
> *BSD one would not only mean serious surgery on descriptor handling
> (it's one of the wartier areas in *BSD VFS, in large part because
> of magic-open-really-a-dup kludges they have to do), it would change
> a long-standing userland API that had been there for nearly 20 years
> _and_ one that tends to be used in corner cases of hell knows how many
> scripts.

Thanks for explaining Al, that really helps me understand.

However, there is still a difference between /proc/pid accessed by the
process itself (= /proc/self) and accessed by other processes, and that
difference seems to suit the situation:

  willy@eeepc:~$ ls -la /tmp/bash
  -r-x--x--x 1 root users 916852 2013-08-25 08:19 /tmp/bash*
  willy@eeepc:~$ exec /tmp/bash -i
  willy@eeepc:~$ echo $$
  22678
  willy@eeepc:~$ ls -la /proc/22678/fd
  ls: cannot open directory /proc/22678/fd: Permission denied
  willy@eeepc:~$ ls -la /proc/22678/exe
  ls: cannot read symbolic link /proc/22678/exe: Permission denied
  willy@eeepc:~$ cat /proc/22678/fd/0
  cat: /proc/22678/fd/0: Permission denied

but:

  willy@eeepc:~$ read < /proc/22678/fd/0
  azerazerazer
  willy@eeepc:~$ echo $REPLY
  azerazerazer

strace clearly shows that the process was allowed to inspect itself while
the other ones were not:

  willy@eeepc:~$ strace -p 22678
  open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = 3

  willy@eeepc:~$ strace cat /proc/22678/fd/0
  open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = -1 EACCES (Permission denied)

It looks like this difference was introduced by this patch (which also
fixes an issue we've been having for a very long time on 2.4 and early
2.6):

  8948e11 Allow access to /proc/$PID/fd after setuid()

Thus I'm wondering if something like this could help. The idea would be
that, with the appropriate mount option, a task could only look at its
own descriptors unless it's running with privileges:

  static int proc_fd_permission(struct inode *inode, int mask,
                                struct nameidata *nd)
  {
          /* a task may always look at its own descriptors */
          if (task_pid(current) == proc_pid(inode))
                  return 0;
          /* so may a privileged task */
          if (capable(CAP_DAC_OVERRIDE))
                  return 0;
          /* placeholder for the proposed mount option check */
          if (proc_mounted_with_strict_option)
                  return -EACCES;
          return generic_permission(inode, mask, NULL);
  }

Thus it would not change the default behaviour except for people who
would mount /proc with a special option.

Thanks,
Willy
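
The open()-versus-dup() distinction Al describes in the quoted text can be
checked from userspace. What follows is only a minimal sketch, assuming a
Linux system with /proc mounted and a writable /tmp. The function name
reopen_vs_dup and the scratch file name are made up for this example and
appear nowhere in the thread. It reads three bytes through a descriptor,
then obtains a second descriptor in two ways and reports the file offset
seen through each.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Illustration only: compare re-opening a descriptor through
 * /proc/self/fd/<n> with dup()ing it.  On success, returns 0 and stores
 * the current file offset seen through each new descriptor; returns -1
 * on any failure.
 */
int reopen_vs_dup(off_t *reopened_off, off_t *duped_off)
{
	char path[] = "/tmp/fd-demo-XXXXXX";	/* hypothetical scratch file */
	char procpath[64];
	char buf[3];
	int fd, fd_reopen = -1, fd_dup = -1, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* Fill the file, rewind, then consume 3 bytes: fd's offset becomes 3. */
	if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;

	/* Linux semantics: a fresh open of the same file, with its own offset. */
	snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);
	fd_reopen = open(procpath, O_RDONLY);

	/* dup(): a new descriptor for the same opened file, offset shared. */
	fd_dup = dup(fd);
	if (fd_reopen < 0 || fd_dup < 0)
		goto out;

	*reopened_off = lseek(fd_reopen, 0, SEEK_CUR);
	*duped_off = lseek(fd_dup, 0, SEEK_CUR);
	ret = 0;
out:
	if (fd_reopen >= 0)
		close(fd_reopen);
	if (fd_dup >= 0)
		close(fd_dup);
	close(fd);
	unlink(path);
	return ret;
}
```

On Linux the descriptor obtained through /proc/self/fd starts at offset 0,
while the dup()ed one reports offset 3, the position already reached
through the original descriptor. With the *BSD-style semantics described
in the quote, both would share the original position.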