* [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) @ 2013-08-21 19:14 Andy Lutomirski [not found] ` <CA+55aFxi-ps2f2M8xPhfbuQ0pToqupPrDsLi2+GPUK2sqdYfUw@mail.gmail.com> 2013-08-22 18:48 ` Linus Torvalds 0 siblings, 2 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-21 19:14 UTC (permalink / raw) To: Linus Torvalds, security Cc: Ingo Molnar, Willy Tarreau, linux-kernel, oleg, Al Viro, Linux FS Devel, spender, Andy Lutomirski There have long been two ways to ask the kernel to create a new hardlink to the inode represented by an fd: linkat(..., AT_EMPTY_PATH) and AT_SYMLINK_FOLLOW on /proc/self/fd/N. The latter has no particular security restrictions, but the former required privilege until: commit bb2314b47996491bbc5add73633905c3120b6268 Author: Andy Lutomirski <luto@amacapital.net> Date: Thu Aug 1 21:44:31 2013 -0700 fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink Spender pointed out that there could be code that passes an fd to an untrusted, chrooted process, and allowing that process to flink the fd could be dangerous. (I'm not aware of any actual known attack.) So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH) if the target is I_LINKABLE. I_LINKABLE is only set for inodes created by O_TMPFILE without O_EXCL, which is an explicit request for flinkability. Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Andy Lutomirski <luto@amacapital.net> --- Changes from v1: Removed an unnecessary spin_lock. fs/namei.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 89a612e..9802d51 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3652,6 +3652,26 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de } /* + * flink is dangerous. For now, only allow it when specifically requested + * or when the caller is privileged. + * + * NB: This is not currently enforced for AT_SYMLINK_FOLLOW on procfs. + * Fixing that would be intrusive and could break something. + */ +static bool may_flink(const struct path *path) +{ + struct inode *inode = path->dentry->d_inode; + + /* + * This is racy: I_LINKABLE could be cleared between this check + * and the actual link operation. This is odd but not a security + * problem: the caller could get the same effect by flinking once + * and then using normal link(2) to create a second link. + */ + return (inode->i_state & I_LINKABLE) || capable(CAP_DAC_READ_SEARCH); +} + +/* * Hardlinks are often used in delicate situations. We avoid * security-related surprises by not following symlinks on the * newname. --KAB @@ -3670,10 +3690,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname, if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) return -EINVAL; - /* - * Using empty names is equivalent to using AT_SYMLINK_FOLLOW - * on /proc/self/fd/<fd>. - */ + if (flags & AT_EMPTY_PATH) how = LOOKUP_EMPTY; @@ -3684,6 +3701,11 @@ retry: if (error) return error; + if ((flags & AT_EMPTY_PATH) && !may_flink(&old_path)) { + error = -EPERM; + goto out; + } + new_dentry = user_path_create(newdfd, newname, &new_path, (how & LOOKUP_REVAL)); error = PTR_ERR(new_dentry); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
[parent not found: <CA+55aFxi-ps2f2M8xPhfbuQ0pToqupPrDsLi2+GPUK2sqdYfUw@mail.gmail.com>]
[parent not found: <CALCETrW7+LcexA6v6RQDKhni_yJAduOmiSDneCpq3v8sPDvwUQ@mail.gmail.com>]
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) [not found] ` <CALCETrW7+LcexA6v6RQDKhni_yJAduOmiSDneCpq3v8sPDvwUQ@mail.gmail.com> @ 2013-08-21 20:16 ` Willy Tarreau 0 siblings, 0 replies; 66+ messages in thread From: Willy Tarreau @ 2013-08-21 20:16 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Oleg Nesterov, Brad Spengler, Al Viro, Ingo Molnar, security, linux-kernel, Linux FS Devel On Wed, Aug 21, 2013 at 12:33:24PM -0700, Andy Lutomirski wrote: > On Wed, Aug 21, 2013 at 12:29 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > On my phone, sorry. Removed lists due to HTML crud.. OK no HTML from me here, so re-adding the 3 lists if that can help reaching an end faster on this stuff. > > If the issue is just reopening a read-only file descriptor, then why isn't > > this an issue for just regular openat? > > > > Iow, I get the feeling this isn't about flink, but about AT_EMPTY_PATH in > > general. That's my understanding as well since it's what replaces /proc/self/fd/ which is he difference in this case. However AT_EMPTY_PATH is currently limited to linkat(2), fstatat(2) and name_to_handle_at(2) from what I'm seeing, so it looks like only linkat would permit the calling process to gain more permissions by using it when it has no /proc access. > Dunno, since I still haven't come up with a real-world attack here. > Maybe someone chroots, runs something in the chroot, kills that thing, > and then expects that it hasn't somehow retained access to the file > (by, say, linking it). In my understanding, it does not necessarily need to klil that thing to get an issue. Let's imagine a completely made up example. A malware scanner on a proxy uses sandboxes to analyse suspicious contents. For this it holds a read-only mmap to the signature database, forks and chroots the scanner process to avoid any risk of getting fooled by the malicious code. If the malicious code manages to execute some code in the context of the sandboxed scanner that still has access to the signature db, it could use linkat(AT_EMPTH_PATH) to this fd to recreate a file in the chroot, and reopen it R/W to modify it and remove any trace of its signature. Sure it's a bit far-fetched, but I think that between this and the most trivial case, there might be a few real world cases. Also I still feel concerned by the ability for an unprivileged process to create device nodes in a chroot using this. I don't have any immediate impact in mind but I tend to design my chroots so that I'm sure there is no way to get a /dev there, and this would leave me a bit of doubt. > It may pay to consider some (opt-in) hardening of procfs's follow_link. I agree. A mount option would be great for some constrained environments. > Sometimes I wish that hardlinks had never been invented and that > people had gone straight to cow links. > > --Andy :-) Willy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-21 19:14 [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Andy Lutomirski [not found] ` <CA+55aFxi-ps2f2M8xPhfbuQ0pToqupPrDsLi2+GPUK2sqdYfUw@mail.gmail.com> @ 2013-08-22 18:48 ` Linus Torvalds 2013-08-22 18:53 ` Willy Tarreau 1 sibling, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-22 18:48 UTC (permalink / raw) To: Andy Lutomirski Cc: security, Ingo Molnar, Willy Tarreau, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Wed, Aug 21, 2013 at 12:14 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH) > if the target is I_LINKABLE. So I really detest this just because it's such a special case. Now this is only useful for that one special case, and the thing very fundamentally checks that one special case in a place that is impossible to check for the /proc case, so the proc case remains totally separate. Which just bothers me. I think we could easily at least allow the "file->f_creds == current->creds" case (and yes, I literally mean comparing the pointers - not only is it cheaper, but it literally means "nothing odd has happened in between opening and the lookup"). And I'm wondering if we shouldn't actually do that at "path_init" time. Right now the code says: /* Caller must check execute permissions on the starting path component */ struct fd f = fdget_raw(dfd); and then uses the struct file mindlessly. I'm wondering if we should just do some validation in that place, and say: - for directories, we require exec permissions here - for everything else, we require that f->f_cred == current->cred check. I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm hacky" to me. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 18:48 ` Linus Torvalds @ 2013-08-22 18:53 ` Willy Tarreau 2013-08-22 19:05 ` Andy Lutomirski 0 siblings, 1 reply; 66+ messages in thread From: Willy Tarreau @ 2013-08-22 18:53 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 11:48:10AM -0700, Linus Torvalds wrote: > On Wed, Aug 21, 2013 at 12:14 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > > > So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH) > > if the target is I_LINKABLE. > > So I really detest this just because it's such a special case. Now > this is only useful for that one special case, and the thing very > fundamentally checks that one special case in a place that is > impossible to check for the /proc case, so the proc case remains > totally separate. > > Which just bothers me. > > I think we could easily at least allow the "file->f_creds == > current->creds" case (and yes, I literally mean comparing the pointers > - not only is it cheaper, but it literally means "nothing odd has > happened in between opening and the lookup"). > > And I'm wondering if we shouldn't actually do that at "path_init" > time. Right now the code says: > > /* Caller must check execute permissions on the > starting path component */ > struct fd f = fdget_raw(dfd); > > and then uses the struct file mindlessly. > > I'm wondering if we should just do some validation in that place, and say: > > - for directories, we require exec permissions here > - for everything else, we require that f->f_cred == current->cred check. > > I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm > hacky" to me. I agreed, simply because the condition here is different from the one in /proc. I have read some code last evening to try to understand how /proc/pid/fd entries were granted access to various processes, because I would love to see the same condition being used in both places. Unfortunately, it's beyond my skills, and I stopped after my random attempts gave me some panics. Willy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 18:53 ` Willy Tarreau @ 2013-08-22 19:05 ` Andy Lutomirski 2013-08-22 19:23 ` Linus Torvalds 2013-08-22 19:39 ` [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Willy Tarreau 0 siblings, 2 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-22 19:05 UTC (permalink / raw) To: Willy Tarreau Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 11:53 AM, Willy Tarreau <w@1wt.eu> wrote: > On Thu, Aug 22, 2013 at 11:48:10AM -0700, Linus Torvalds wrote: >> On Wed, Aug 21, 2013 at 12:14 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> > >> > So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH) >> > if the target is I_LINKABLE. >> >> So I really detest this just because it's such a special case. Now >> this is only useful for that one special case, and the thing very >> fundamentally checks that one special case in a place that is >> impossible to check for the /proc case, so the proc case remains >> totally separate. >> >> Which just bothers me. Agreed. It sucks. See below (the bottom). >> >> I think we could easily at least allow the "file->f_creds == >> current->creds" case (and yes, I literally mean comparing the pointers >> - not only is it cheaper, but it literally means "nothing odd has >> happened in between opening and the lookup"). I thought about that, but it won't catch chroot. (Admittedly, chroot without changing creds is asking for trouble.) >> >> And I'm wondering if we shouldn't actually do that at "path_init" >> time. Right now the code says: >> >> /* Caller must check execute permissions on the >> starting path component */ >> struct fd f = fdget_raw(dfd); >> >> and then uses the struct file mindlessly. >> >> I'm wondering if we should just do some validation in that place, and say: >> >> - for directories, we require exec permissions here >> - for everything else, we require that f->f_cred == current->cred check. Does this work for the procfs case? As far as I understand it (which isn't saying much), it goes through the symlink-following path. >> >> I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm >> hacky" to me. > > I agreed, simply because the condition here is different from the one in /proc. > > I have read some code last evening to try to understand how /proc/pid/fd > entries were granted access to various processes, because I would love to > see the same condition being used in both places. Unfortunately, it's beyond > my skills, and I stopped after my random attempts gave me some panics. What if we added another field to struct nameidata that's indicates what restrictions need to be enforced when following magical symlinks and then enforcing them when nd_jump_link gets used. (There are only two of these, both in procfs.) Then open could check that the original fd was opened for a superset of the requested permissions (or that the caller has CAP_DAC_OVERRIDE), linkat could check whatever it feels like checking, etc. This would allow all of these issues to be fixed for real (controlled by sysctl, presumably). --Andy > > Willy > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 19:05 ` Andy Lutomirski @ 2013-08-22 19:23 ` Linus Torvalds 2013-08-22 20:10 ` Andy Lutomirski 2013-08-22 19:39 ` [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Willy Tarreau 1 sibling, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-22 19:23 UTC (permalink / raw) To: Andy Lutomirski Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 12:05 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > Does this work for the procfs case? As far as I understand it (which > isn't saying much), it goes through the symlink-following path. Right. The /proc case is still separate, and we really should do something about that too. But again, I don't think I_LINKABLE is the thing to use there either. We probably should tighten up the magic /proc follow-link a lot. > What if we added another field to struct nameidata that's indicates > what restrictions need to be enforced when following magical symlinks > and then enforcing them when nd_jump_link gets used. (There are only > two of these, both in procfs.) Yes, I think that might be just the kind of thing to do. Except some tightening could well be quite regardless of any extra flags. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 19:23 ` Linus Torvalds @ 2013-08-22 20:10 ` Andy Lutomirski 2013-08-22 20:15 ` Willy Tarreau 0 siblings, 1 reply; 66+ messages in thread From: Andy Lutomirski @ 2013-08-22 20:10 UTC (permalink / raw) To: Linus Torvalds Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 12:23 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, Aug 22, 2013 at 12:05 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> Does this work for the procfs case? As far as I understand it (which >> isn't saying much), it goes through the symlink-following path. > > Right. The /proc case is still separate, and we really should do > something about that too. But again, I don't think I_LINKABLE is the > thing to use there either. We probably should tighten up the magic > /proc follow-link a lot. > >> What if we added another field to struct nameidata that's indicates >> what restrictions need to be enforced when following magical symlinks >> and then enforcing them when nd_jump_link gets used. (There are only >> two of these, both in procfs.) > > Yes, I think that might be just the kind of thing to do. Except some > tightening could well be quite regardless of any extra flags. > What's the point of nd_jump_link anyway? The only way I can think of for a magic symlink in /proc to point to another symlink is to open a symlink with O_PATH | O_NOFOLLOW. Actually trying to use the resulting link in /proc results in -ELOOP. (Even just trying to open a normal symlink with O_NOFOLLOW and without O_PATH results in -ELOOP.) It might be a lot simpler to rig up nd_jump_link to immediately terminate lookup and let the callers (or the outer level of lookup) deal with it. That way the checks could be (more) easily unified with AT_EMPTY_PATH. --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 20:10 ` Andy Lutomirski @ 2013-08-22 20:15 ` Willy Tarreau 2013-08-22 20:22 ` Andy Lutomirski 2013-08-24 18:29 ` /proc/pid/fd && anon_inode_fops Oleg Nesterov 0 siblings, 2 replies; 66+ messages in thread From: Willy Tarreau @ 2013-08-22 20:15 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 01:10:43PM -0700, Andy Lutomirski wrote: > What's the point of nd_jump_link anyway? The only way I can think of > for a magic symlink in /proc to point to another symlink is to open a > symlink with O_PATH | O_NOFOLLOW. Actually trying to use the > resulting link in /proc results in -ELOOP. (Even just trying to open > a normal symlink with O_NOFOLLOW and without O_PATH results in > -ELOOP.) It's not only that, it also supports sockets and pipes that you can access via /proc/pid/fd and not via a real symlink which would try to open eg "pipe:[23456]" instead of the real file. So you can't get rid of it without breaking existing apps (starting with your shell for which /dev/stdin is a link to /proc/self/fd/0 for example). Willy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 20:15 ` Willy Tarreau @ 2013-08-22 20:22 ` Andy Lutomirski 2013-08-22 20:44 ` Linus Torvalds 2013-08-24 18:29 ` /proc/pid/fd && anon_inode_fops Oleg Nesterov 1 sibling, 1 reply; 66+ messages in thread From: Andy Lutomirski @ 2013-08-22 20:22 UTC (permalink / raw) To: Willy Tarreau Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 1:15 PM, Willy Tarreau <w@1wt.eu> wrote: > On Thu, Aug 22, 2013 at 01:10:43PM -0700, Andy Lutomirski wrote: >> What's the point of nd_jump_link anyway? The only way I can think of >> for a magic symlink in /proc to point to another symlink is to open a >> symlink with O_PATH | O_NOFOLLOW. Actually trying to use the >> resulting link in /proc results in -ELOOP. (Even just trying to open >> a normal symlink with O_NOFOLLOW and without O_PATH results in >> -ELOOP.) > > It's not only that, it also supports sockets and pipes that you can access > via /proc/pid/fd and not via a real symlink which would try to open eg > "pipe:[23456]" instead of the real file. So you can't get rid of it > without breaking existing apps (starting with your shell for which > /dev/stdin is a link to /proc/self/fd/0 for example). > Let me rephrase that: why do we allow these types of lookup to recurse like normal symlinks? I'm proposing that these links immediately terminate lookup and return back to user_path_at_empty, which can, in turn, do an extra check to see if the inode that was found is one of these magic inodes (or checks to see if nd_jump_link was called). user_path_at_empty could then enforce LOOKUP_EMPTY-list restrictions (or the caller could). --Andy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 20:22 ` Andy Lutomirski @ 2013-08-22 20:44 ` Linus Torvalds 2013-08-22 20:48 ` Andy Lutomirski 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-22 20:44 UTC (permalink / raw) To: Andy Lutomirski Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 1:22 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > Let me rephrase that: why do we allow these types of lookup to recurse > like normal symlinks? I'm proposing that these links immediately > terminate lookup [..] It can nest *inside* a regular symlink. So there should not be any recursion of pure /proc style symlink jumps, but they live within the nesting that is normal symlink behavior. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 20:44 ` Linus Torvalds @ 2013-08-22 20:48 ` Andy Lutomirski 2013-08-22 20:54 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Andy Lutomirski @ 2013-08-22 20:48 UTC (permalink / raw) To: Linus Torvalds Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 1:44 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, Aug 22, 2013 at 1:22 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> Let me rephrase that: why do we allow these types of lookup to recurse >> like normal symlinks? I'm proposing that these links immediately >> terminate lookup [..] > > It can nest *inside* a regular symlink. So there should not be any > recursion of pure /proc style symlink jumps, but they live within the > nesting that is normal symlink behavior. > Sure. But aren't they always last? With the current code structure, trying to enforce some kind of security restriction in the middle of lookup seems really unpleasant. But if the final step is to verify one of these links and then either reject or end lookup right there, it'll be (IMO) much easier to understand. --Andy > Linus -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 20:48 ` Andy Lutomirski @ 2013-08-22 20:54 ` Linus Torvalds 2013-08-22 20:58 ` Andy Lutomirski 2013-08-23 1:07 ` Al Viro 0 siblings, 2 replies; 66+ messages in thread From: Linus Torvalds @ 2013-08-22 20:54 UTC (permalink / raw) To: Andy Lutomirski Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > Sure. But aren't they always last? What do you mean? I'd say that the /proc lookup is always *innermost*. Which means that it certainly cannot bail out, since there are many levels of nesting outside of it. > With the current code structure, trying to enforce some kind of > security restriction in the middle of lookup seems really unpleasant. If it's conditional (ie "linkat behaves differently from openat"), it certainly means that we'd have to pass in that info in annoying ways. But having unconditional rules like "you can only follow the proc link if you have CAP_SEARCH _or_ you match the file->f_cred" doesn't sound too painful. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 20:54 ` Linus Torvalds @ 2013-08-22 20:58 ` Andy Lutomirski 2013-08-23 1:07 ` Al Viro 1 sibling, 0 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-22 20:58 UTC (permalink / raw) To: Linus Torvalds Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 1:54 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> >> Sure. But aren't they always last? > > What do you mean? I'd say that the /proc lookup is always *innermost*. > Which means that it certainly cannot bail out, since there are many > levels of nesting outside of it. I was thinking iteratively, so I said "last" instead of "innermost". > >> With the current code structure, trying to enforce some kind of >> security restriction in the middle of lookup seems really unpleasant. > > If it's conditional (ie "linkat behaves differently from openat"), it > certainly means that we'd have to pass in that info in annoying ways. > Hmm. That depends on whether things like I_LINKABLE should be considered. We might also want to ban open("/proc/self/fd/3", O_RDWR) if !CAP_DAC_READ_SEARCH and fd wasn't opened with O_RDWR. Both of those will require passing information in or out. I'll see how nasty this ends up being. (This may take awhile -- I'm not at all familiar with this code, and this is at best a minor side project for me.) --Andy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 20:54 ` Linus Torvalds 2013-08-22 20:58 ` Andy Lutomirski @ 2013-08-23 1:07 ` Al Viro 2013-08-25 3:37 ` Al Viro 1 sibling, 1 reply; 66+ messages in thread From: Al Viro @ 2013-08-23 1:07 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 01:54:15PM -0700, Linus Torvalds wrote: > On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > > > Sure. But aren't they always last? > > What do you mean? I'd say that the /proc lookup is always *innermost*. > Which means that it certainly cannot bail out, since there are many > levels of nesting outside of it. > > > With the current code structure, trying to enforce some kind of > > security restriction in the middle of lookup seems really unpleasant. > > If it's conditional (ie "linkat behaves differently from openat"), it > certainly means that we'd have to pass in that info in annoying ways. Nope. All we need to pass is one more LOOKUP_... Add if (unlikely(nd->last_type == LAST_BIND)) { if ((nd->flags & LOOKUP_BLAH) && !may_flink(...)) { terminate_walk(nd); return -EINVAL; } } in the beginning of lookup_last() and pass LOOKUP_BLAH in flags when linkat() calls user_path_at(). That will affect *only* the terminal symlinks and cost nothing in all normal cases. The same check can bloody well go into path_init() - take if (*name) { if (!can_lookup(dentry->d_inode)) { fdput(f); return -ENOTDIR; } } in there and slap else { if ((flags & LOOKUP_BLAH) && !may_flink(...)) { fdput(f); return -EINVAL; } } after it. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-23 1:07 ` Al Viro @ 2013-08-25 3:37 ` Al Viro 2013-08-25 7:26 ` Andy Lutomirski 2013-08-25 19:57 ` Linus Torvalds 0 siblings, 2 replies; 66+ messages in thread From: Al Viro @ 2013-08-25 3:37 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Fri, Aug 23, 2013 at 02:07:26AM +0100, Al Viro wrote: > On Thu, Aug 22, 2013 at 01:54:15PM -0700, Linus Torvalds wrote: > > On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > > > > > Sure. But aren't they always last? > > > > What do you mean? I'd say that the /proc lookup is always *innermost*. > > Which means that it certainly cannot bail out, since there are many > > levels of nesting outside of it. > > > > > With the current code structure, trying to enforce some kind of > > > security restriction in the middle of lookup seems really unpleasant. > > > > If it's conditional (ie "linkat behaves differently from openat"), it > > certainly means that we'd have to pass in that info in annoying ways. > > Nope. All we need to pass is one more LOOKUP_... Add > if (unlikely(nd->last_type == LAST_BIND)) { > if ((nd->flags & LOOKUP_BLAH) && !may_flink(...)) { > terminate_walk(nd); > return -EINVAL; > } > } > in the beginning of lookup_last() and pass LOOKUP_BLAH in flags when > linkat() calls user_path_at(). That will affect *only* the terminal > symlinks and cost nothing in all normal cases. The same check can > bloody well go into path_init() - take > if (*name) { > if (!can_lookup(dentry->d_inode)) { > fdput(f); > return -ENOTDIR; > } > } > in there and slap > else { > if ((flags & LOOKUP_BLAH) && !may_flink(...)) { > fdput(f); > return -EINVAL; > } > } > after it. OK, let me summarize these threads so far: * restrictions for flink() are needed and they'd better be consistent for AT_SYMLINK_FOLLOW + /proc/<pid>/fd/<n> and simply passing the descriptor as dfd. * CAP_DAC_OVERRIDE is sufficient; so should be O_TMPFILE used to open that sucker. * lookup_last() is the natural place for catching the case of following a trailing procfs symlink - it can be done very cheaply there. FWIW, I'm tempted to try the following trick: * introduce FMODE_FLINK in file->f_mode; O_TMPFILE would set it, unless O_EXCL is present. * introduce LOOKUP_LINK, to be passed by sys_linkat() when resolving the target. * have path_init() called with empty pathname and LOOKUP_LINK in flags do checks for FMODE_FLINK or CAP_DAC_OVERRIDE * have ->proc_get_link() report whether the target is linkable (either as bool * or by returning 1 instead of 0). After the call of ->proc_get_link() check that and set nd->last_type to LAST_BIND_LINKABLE. Note that *all* places looking at ->last_type treat LAST_BIND as "none of the above" - we never compare with it, so splitting it in two wouldn't break anything. * have lookup_last() check if LOOKUP_LINK is present and ->last_type is LAST_BIND; fail unless we have CAP_DAC_OVERRIDE. AFAICS, it gets more or less sane behaviour; additionally, it makes possible to introduce explicit "I want that descriptor to be suitable for flink()" open(2) flag - that would require teaching do_last() about LOOKUP_LINK, making it check for CAP_DAC_OVERRIDE if it sees LAST_BIND / LOOKUP_LINK, same as lookup_last() above (we obviously want to avoid the possibility to take a non-flinkable descriptor and use it to reopen the sucker in flinkable way). Alternatively we can revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink" for the time being. flink() is certainly an awful mess and I seriously regret touching it ;-/ Comments? Hell, maybe somebody even has printable ones - stranger things have happened... ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-25 3:37 ` Al Viro @ 2013-08-25 7:26 ` Andy Lutomirski 2013-08-25 14:23 ` Al Viro 2013-08-25 19:57 ` Linus Torvalds 1 sibling, 1 reply; 66+ messages in thread From: Andy Lutomirski @ 2013-08-25 7:26 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Sat, Aug 24, 2013 at 8:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Fri, Aug 23, 2013 at 02:07:26AM +0100, Al Viro wrote: >> On Thu, Aug 22, 2013 at 01:54:15PM -0700, Linus Torvalds wrote: >> > On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> > > >> > > Sure. But aren't they always last? >> > >> > What do you mean? I'd say that the /proc lookup is always *innermost*. >> > Which means that it certainly cannot bail out, since there are many >> > levels of nesting outside of it. >> > >> > > With the current code structure, trying to enforce some kind of >> > > security restriction in the middle of lookup seems really unpleasant. >> > >> > If it's conditional (ie "linkat behaves differently from openat"), it >> > certainly means that we'd have to pass in that info in annoying ways. >> >> Nope. All we need to pass is one more LOOKUP_... Add >> if (unlikely(nd->last_type == LAST_BIND)) { >> if ((nd->flags & LOOKUP_BLAH) && !may_flink(...)) { >> terminate_walk(nd); >> return -EINVAL; >> } >> } >> in the beginning of lookup_last() and pass LOOKUP_BLAH in flags when >> linkat() calls user_path_at(). That will affect *only* the terminal >> symlinks and cost nothing in all normal cases. The same check can >> bloody well go into path_init() - take >> if (*name) { >> if (!can_lookup(dentry->d_inode)) { >> fdput(f); >> return -ENOTDIR; >> } >> } >> in there and slap >> else { >> if ((flags & LOOKUP_BLAH) && !may_flink(...)) { >> fdput(f); >> return -EINVAL; >> } >> } >> after it. > > OK, let me summarize these threads so far: > * restrictions for flink() are needed and they'd better be > consistent for AT_SYMLINK_FOLLOW + /proc/<pid>/fd/<n> and simply > passing the descriptor as dfd. > * CAP_DAC_OVERRIDE is sufficient; so should be O_TMPFILE used > to open that sucker. > * lookup_last() is the natural place for catching the case > of following a trailing procfs symlink - it can be done very cheaply > there. > > FWIW, I'm tempted to try the following trick: > * introduce FMODE_FLINK in file->f_mode; O_TMPFILE would set it, > unless O_EXCL is present. > * introduce LOOKUP_LINK, to be passed by sys_linkat() when > resolving the target. > * have path_init() called with empty pathname and LOOKUP_LINK in > flags do checks for FMODE_FLINK or CAP_DAC_OVERRIDE > * have ->proc_get_link() report whether the target is linkable > (either as bool * or by returning 1 instead of 0). After the call of > ->proc_get_link() check that and set nd->last_type to LAST_BIND_LINKABLE. > Note that *all* places looking at ->last_type treat LAST_BIND as "none > of the above" - we never compare with it, so splitting it in two wouldn't > break anything. > * have lookup_last() check if LOOKUP_LINK is present and ->last_type > is LAST_BIND; fail unless we have CAP_DAC_OVERRIDE. > > AFAICS, it gets more or less sane behaviour; additionally, it makes possible > to introduce explicit "I want that descriptor to be suitable for flink()" > open(2) flag - that would require teaching do_last() about LOOKUP_LINK, > making it check for CAP_DAC_OVERRIDE if it sees LAST_BIND / LOOKUP_LINK, > same as lookup_last() above (we obviously want to avoid the possibility > to take a non-flinkable descriptor and use it to reopen the sucker in > flinkable way). > > Alternatively we can revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) > aka flink" for the time being. flink() is certainly an awful mess and I > seriously regret touching it ;-/ > > Comments? Hell, maybe somebody even has printable ones - stranger things > have happened... I think this is more screwed up than just flink and open. For example: $ echo 'WTF' >test $ truncate -s 1 /proc/self/fd/3 3<test $ cat test W$ IMO that should have failed. In an ideal world (I think) ffrob(N), frobat(N, "", AT_EMPTY_PATH), and frobat(AT_FDCWD, "/proc/self/fd/N) should generally do the same thing. This includes flink (even though the flink variant doesn't exist). open is a bit special. Of course, we're rather inconsistent with AT_EMPTY_PATH. utimensat accepts a null filename instead of a blank filename, and it doesn't appear to check the file mode at all. Sigh. --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-25 7:26 ` Andy Lutomirski @ 2013-08-25 14:23 ` Al Viro 2013-08-25 17:04 ` Andy Lutomirski 0 siblings, 1 reply; 66+ messages in thread From: Al Viro @ 2013-08-25 14:23 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 12:26:34AM -0700, Andy Lutomirski wrote: > I think this is more screwed up than just flink and open. For example: > > $ echo 'WTF' >test > $ truncate -s 1 /proc/self/fd/3 3<test > $ cat test > W$ > > IMO that should have failed. Why? truncate() always follows links, so what's the problem with that one? That you get checks of truncate() and not ftruncate()? > In an ideal world (I think) ffrob(N), frobat(N, "", AT_EMPTY_PATH), > and frobat(AT_FDCWD, "/proc/self/fd/N) should generally do the same > thing. What about the cases where frob() and ffrob() check for different things? ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-25 14:23 ` Al Viro @ 2013-08-25 17:04 ` Andy Lutomirski 0 siblings, 0 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-25 17:04 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 7:23 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Sun, Aug 25, 2013 at 12:26:34AM -0700, Andy Lutomirski wrote: > >> I think this is more screwed up than just flink and open. For example: >> >> $ echo 'WTF' >test >> $ truncate -s 1 /proc/self/fd/3 3<test >> $ cat test >> W$ >> >> IMO that should have failed. > > Why? truncate() always follows links, so what's the problem with that > one? That you get checks of truncate() and not ftruncate()? The same as the issue with all these other things: the fd might have survived a privilege drop or been passed through exec or SCM_RIGHTS, and the holder of the fd might not be able to see the inode. For example, suppose a daemon creates a file with O_TMPFILE | O_RDWR. Then it does open("/proc/self/fd/N", O_RDONLY) to get a read-only fd for the same temporary file. It passes that fd to something else. It's rather surprising that the recipient would be able to truncate it using /proc/self/fd when it couldn't ftruncate it due to its being O_RDONLY. (Of course, this can be worked around by setting the mode to 0644, but I doubt that everyone will get that right.) > >> In an ideal world (I think) ffrob(N), frobat(N, "", AT_EMPTY_PATH), >> and frobat(AT_FDCWD, "/proc/self/fd/N) should generally do the same >> thing. > > What about the cases where frob() and ffrob() check for different things? I'll go out on a limb and say that every single case where ffrob has a check that frob("/proc/self/fd/N") doesn't is wrong. Maybe we're stuck with them for backwards compatibility, but that doesn't mean they're good ideas. --Andy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-25 3:37 ` Al Viro 2013-08-25 7:26 ` Andy Lutomirski @ 2013-08-25 19:57 ` Linus Torvalds 2013-08-25 20:06 ` Al Viro 1 sibling, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-25 19:57 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Sat, Aug 24, 2013 at 8:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > FWIW, I'm tempted to try the following trick: > * introduce FMODE_FLINK in file->f_mode; O_TMPFILE would set it, > unless O_EXCL is present. > * introduce LOOKUP_LINK, to be passed by sys_linkat() when > resolving the target. [ .. snipped .. ] Yes. I think we should do this, but I think we should also look at what _other_ LOOKUP_xyz we should do for the /proc case. For the read-only fd case, we should have a LOOKUP_WRITE flag, and return -EPERM if an operation is a write, and we terminate in that LAST_BIND case. That would catch the truncate() case, but also the "open a read-only fd for write or O_TRUNC" case. Anything else? What other path operations matter that follow links than truncate(), link() and open()? Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-25 19:57 ` Linus Torvalds @ 2013-08-25 20:06 ` Al Viro 2013-08-25 20:23 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Al Viro @ 2013-08-25 20:06 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 12:57:25PM -0700, Linus Torvalds wrote: > Yes. I think we should do this, but I think we should also look at > what _other_ LOOKUP_xyz we should do for the /proc case. > > For the read-only fd case, we should have a LOOKUP_WRITE flag, and > return -EPERM if an operation is a write, and we terminate in that > LAST_BIND case. > > That would catch the truncate() case, but also the "open a read-only > fd for write or O_TRUNC" case. > > Anything else? What other path operations matter that follow links > than truncate(), link() and open()? Timestamp updates, chmod/chown, xattr mess... ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-25 20:06 ` Al Viro @ 2013-08-25 20:23 ` Linus Torvalds 2013-08-26 17:37 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-25 20:23 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 1:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Timestamp updates, chmod/chown, xattr mess... Ok, so that's just too much details. So I'll just go back to square one, and wonder if we could/should just make the rule be that in order to be in that LAST_BIND case, you really have to have f_cred match your own credentials. Or have CAP_SEARCH. And just not have any new LOOKUP_xyz flags at all. No special cases, no different semantics for different ops, just check f_cred. Because if you had the permissions to do the original open (ie f_cred matches your current credentials), then that shows that you originally had all the pathname permissions, and you are still the same person. And yeah, you may have opened in for reading (so the file is technically read-only), but obviously we're re-doing all the inode permission checks anyway, so the only thing /proc/<pid>/fd/<n> really gave you was the path traversal. So we shouldn't bother with "the file descriptor is only readable", because that is simply *irrelevant*. So it means that the case Andi brought up (truncating or open-for-write a fd that we only had open for reading) would continue to be allowed, because while it "sounds odd", there is no actual problem. And CAP_SEARCH is very much about that path lookup again. So it's consistent with the notion that "ok, you may do odd things to file descriptors through /proc, but we check that you cannot avoid the pathname lookup rules". And then we do exactly the same to flink(). So then we're all consistent again. Not the consistency Andy worried about, but that's the consistency that was was the security worries with flink are all about. Because the issues with the "use the file descriptor, not the path to the file descriptor" really are *not* about the endpoint itself (since we will re-do the permission check for that particular inode anyway), but about the path leading up to that end-point. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-25 20:23 ` Linus Torvalds @ 2013-08-26 17:37 ` Linus Torvalds 2013-08-26 18:07 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-26 17:37 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 1:23 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So I'll just go back to square one, and wonder if we could/should just > make the rule be that in order to be in that LAST_BIND case, you > really have to have f_cred match your own credentials. Or have > CAP_SEARCH. Nope. That doesn't work. It breaks the chrome sandboxing. Right now, following a /proc fd symlink requires ptrace access to the process. Which is actually pretty strict, and makes sense. But it does mean that there are other capabilities than CAP_DAC_READ_SEARCH at play. I'm playing with a patch that then in addition to the ptrace check *also* requires that the file was opened with the same credentials as the follower _or_ the task being followed. I'll see if that works out. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-26 17:37 ` Linus Torvalds @ 2013-08-26 18:07 ` Linus Torvalds 2013-08-26 18:11 ` Andy Lutomirski 2013-08-27 19:16 ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski 0 siblings, 2 replies; 66+ messages in thread From: Linus Torvalds @ 2013-08-26 18:07 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler [-- Attachment #1: Type: text/plain, Size: 640 bytes --] On Mon, Aug 26, 2013 at 10:37 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'm playing with a patch that then in addition to the ptrace check > *also* requires that the file was opened with the same credentials as > the follower _or_ the task being followed. I'll see if that works out. Chrome sandboxing still _works_ with the attached patch, but there's a few audit messages about the sandbox being denied dac_read_search. And I'm really not very happy with this anyway. So I'm going to send it out (using email to archive things and see if anybody has any comments), and then forget about it. Linus [-- Attachment #2: patch.diff --] [-- Type: application/octet-stream, Size: 1592 bytes --] fs/proc/fd.c | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 0ff80f9b930f..7a97dfbb4217 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -137,6 +137,33 @@ static const struct dentry_operations tid_fd_dentry_operations = { .d_delete = pid_delete_dentry, }; +static inline int match_cred(const struct cred *a, const struct cred *b) +{ + return a->fsuid == b->fsuid && a->fsgid == b->fsgid; +} + +/* + * To get here, we have ptrace() access to the task in question. + * + * But we still require that the file itself was either opened by that + * task or by ourselves, or that we have path search capability. + */ +static int get_fd_path(struct task_struct *task, struct file *file, struct path *path) +{ + if (!match_cred(file->f_cred, current_cred())) { + int match; + + rcu_read_lock(); + match = match_cred(file->f_cred, __task_cred(task)); + rcu_read_unlock(); + if (!match && !capable(CAP_DAC_READ_SEARCH)) + return -EACCES; + } + *path = file->f_path; + path_get(&file->f_path); + return 0; +} + static int proc_fd_link(struct dentry *dentry, struct path *path) { struct files_struct *files = NULL; @@ -155,11 +182,8 @@ static int proc_fd_link(struct dentry *dentry, struct path *path) spin_lock(&files->file_lock); fd_file = fcheck_files(files, fd); - if (fd_file) { - *path = fd_file->f_path; - path_get(&fd_file->f_path); - ret = 0; - } + if (fd_file) + ret = get_fd_path(task, fd_file, path); spin_unlock(&files->file_lock); put_files_struct(files); } ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-26 18:07 ` Linus Torvalds @ 2013-08-26 18:11 ` Andy Lutomirski 2013-08-27 19:16 ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski 1 sibling, 0 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-26 18:11 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Mon, Aug 26, 2013 at 11:07 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Aug 26, 2013 at 10:37 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> I'm playing with a patch that then in addition to the ptrace check >> *also* requires that the file was opened with the same credentials as >> the follower _or_ the task being followed. I'll see if that works out. > > Chrome sandboxing still _works_ with the attached patch, but there's a > few audit messages about the sandbox being denied dac_read_search. And > I'm really not very happy with this anyway. > > So I'm going to send it out (using email to archive things and see if > anybody has any comments), and then forget about it. > I'm still not thrilled with this approach. I'll see if I can find a clean way to use fmode later today. --Andy > Linus -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 66+ messages in thread
* [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-26 18:07 ` Linus Torvalds 2013-08-26 18:11 ` Andy Lutomirski @ 2013-08-27 19:16 ` Andy Lutomirski 2013-08-27 19:32 ` Linus Torvalds 2013-08-27 23:08 ` Al Viro 1 sibling, 2 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-27 19:16 UTC (permalink / raw) To: Al Viro, Linus Torvalds Cc: Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler, Andy Lutomirski This is an experiment to see if we can get nice semantics for all syscalls that either follow symlinks or allow AT_EMPTY_PATH without jumping through enormous hoops. This converts truncate (although you can't tell using truncate from coreutils, because it actually uses open + ftruncate). The basic idea is that there's a new helper function user_file_or_path_at. It takes an fd and a path and, depending on flags, the emptiness of the path, and whether path is a magic /proc symlink (or a symlink to a magic /proc/symlink), it returns either a struct path or a struct file *. To make this genuinely useful, something similar will need to happen for open/openat. The lifetime rules for the contents of struct nameidata seem weird. If all users first memset the thing to zero and called nd_put or something when they were done, this would be cleaner. But they don't. Signed-off-by: Andy Lutomirski <luto@amacapital.net> --- fs/namei.c | 73 ++++++++++++++++++++++++++++++++++++++++ fs/open.c | 93 +++++++++++++++++++++++++++++---------------------- fs/proc/base.c | 44 +++++++++++++----------- fs/proc/fd.c | 10 +++--- fs/proc/internal.h | 3 +- include/linux/namei.h | 14 ++++++++ 6 files changed, 171 insertions(+), 66 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 89a612e..0d905ac 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -691,6 +691,28 @@ void nd_jump_link(struct nameidata *nd, struct path *path) nd->flags |= LOOKUP_JUMPED; } +/* + * Helper for /proc/pid/fd/N-style magic links. This is like nd_jump_link + * except that it will apply additional security checks. Caller must have + * taken a reference to file beforehand. + */ +void nd_jump_link_file(struct nameidata *nd, struct file *file) +{ + path_put(&nd->path); + + nd->path = file->f_path; + path_get(&nd->path); + if (nd->flags & LOOKUP_FILE) { + if (nd->last_symlink_file) + fput(nd->last_symlink_file); + nd->last_symlink_file = file; + } else { + fput(file); + } + nd->inode = nd->path.dentry->d_inode; + nd->flags |= LOOKUP_JUMPED; +} + static inline void put_link(struct nameidata *nd, struct path *link, void *cookie) { struct inode *inode = link->dentry->d_inode; @@ -842,6 +864,10 @@ follow_link(struct path *link, struct nameidata *nd, void **p) goto out_put_nd_path; nd->last_type = LAST_BIND; + if (unlikely((nd->flags & LOOKUP_FILE) && nd->last_symlink_file)) { + fput(nd->last_symlink_file); + nd->last_symlink_file = NULL; + } *p = dentry->d_inode->i_op->follow_link(dentry, nd); error = PTR_ERR(*p); if (IS_ERR(*p)) @@ -2156,6 +2182,53 @@ int user_path_at(int dfd, const char __user *name, unsigned flags, return user_path_at_empty(dfd, name, flags, path, NULL); } +int user_file_or_path_at(int dfd, const char __user *filename, + int lookup_flags, bool null_filename_okay, + struct file_or_path *out) +{ + struct nameidata nd; + struct filename *tmp; + int err; + int empty = 0; + + if (!filename && null_filename_okay) { + out->file = fget(dfd); + return out->file ? 0 : -EBADF; + } + + tmp = getname_flags(filename, lookup_flags, &empty); + if (IS_ERR(tmp)) + return PTR_ERR(tmp); + + if (empty) { + out->file = fget(dfd); + return out->file ? 0 : -EBADF; + } + + nd.last_symlink_file = NULL; + BUG_ON(lookup_flags & LOOKUP_PARENT); + + err = filename_lookup(dfd, tmp, lookup_flags | LOOKUP_FILE, &nd); + putname(tmp); + if (!err) { + if (nd.last_symlink_file && nd.last_type == LAST_BIND) { + out->file = nd.last_symlink_file; + memset(&out->path, 0, sizeof(out->path)); + path_put(&nd.path); + } else { + out->path = nd.path; + out->file = NULL; + if (nd.last_symlink_file) + fput(nd.last_symlink_file); + } + } else { + if (nd.last_symlink_file) + fput(nd.last_symlink_file); + } + + return err; +} + /* * NB: most callers don't do anything directly with the reference to the * to struct filename, but the nd->last pointer points into the name string diff --git a/fs/open.c b/fs/open.c index 7931f76..1349b43 100644 --- a/fs/open.c +++ b/fs/open.c @@ -114,20 +114,66 @@ out: } EXPORT_SYMBOL_GPL(vfs_truncate); +static long do_truncate_file(struct file *file, loff_t length, int small) +{ + struct inode *inode; + struct dentry *dentry; + int error; + + error = -EINVAL; + if (length < 0) + goto out; + + /* explicitly opened as large or we are on 64-bit box */ + if (file->f_flags & O_LARGEFILE) + small = 0; + + dentry = file->f_path.dentry; + inode = dentry->d_inode; + error = -EINVAL; + if (!S_ISREG(inode->i_mode) || !(file->f_mode & FMODE_WRITE)) + goto out; + + error = -EINVAL; + /* Cannot ftruncate over 2^31 bytes without large file support */ + if (small && length > MAX_NON_LFS) + goto out; + + error = -EPERM; + if (IS_APPEND(inode)) + goto out; + + sb_start_write(inode->i_sb); + error = locks_verify_truncate(inode, file, length); + if (!error) + error = security_path_truncate(&file->f_path); + if (!error) + error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file); + sb_end_write(inode->i_sb); +out: + return error; +} + static long do_sys_truncate(const char __user *pathname, loff_t length) { - unsigned int lookup_flags = LOOKUP_FOLLOW; - struct path path; + struct file_or_path obj; + int lookup_flags = LOOKUP_FOLLOW; int error; if (length < 0) /* sorry, but loff_t says... */ return -EINVAL; retry: - error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path); + error = user_file_or_path_at(AT_FDCWD, pathname, + LOOKUP_FOLLOW, false, &obj); if (!error) { - error = vfs_truncate(&path, length); - path_put(&path); + if (obj.file) { + error = do_truncate_file(obj.file, length, 0); + fput(obj.file); + } else { + error = vfs_truncate(&obj.path, length); + path_put(&obj.path); + } } if (retry_estale(error, lookup_flags)) { lookup_flags |= LOOKUP_REVAL; @@ -150,48 +196,15 @@ COMPAT_SYSCALL_DEFINE2(truncate, const char __user *, path, compat_off_t, length static long do_sys_ftruncate(unsigned int fd, loff_t length, int small) { - struct inode *inode; - struct dentry *dentry; struct fd f; int error; - error = -EINVAL; - if (length < 0) - goto out; error = -EBADF; f = fdget(fd); if (!f.file) - goto out; - - /* explicitly opened as large or we are on 64-bit box */ - if (f.file->f_flags & O_LARGEFILE) - small = 0; - - dentry = f.file->f_path.dentry; - inode = dentry->d_inode; - error = -EINVAL; - if (!S_ISREG(inode->i_mode) || !(f.file->f_mode & FMODE_WRITE)) - goto out_putf; - - error = -EINVAL; - /* Cannot ftruncate over 2^31 bytes without large file support */ - if (small && length > MAX_NON_LFS) - goto out_putf; - - error = -EPERM; - if (IS_APPEND(inode)) - goto out_putf; - - sb_start_write(inode->i_sb); - error = locks_verify_truncate(inode, f.file, length); - if (!error) - error = security_path_truncate(&f.file->f_path); - if (!error) - error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, f.file); - sb_end_write(inode->i_sb); -out_putf: + return error; + error = do_truncate_file(f.file, length, small); fdput(f); -out: return error; } diff --git a/fs/proc/base.c b/fs/proc/base.c index 1485e38..ac28ff3 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -171,7 +171,7 @@ static int get_task_root(struct task_struct *task, struct path *root) return result; } -static int proc_cwd_link(struct dentry *dentry, struct path *path) +static int proc_cwd_link(struct dentry *dentry, struct file_or_path *link) { struct task_struct *task = get_proc_task(dentry->d_inode); int result = -ENOENT; @@ -179,7 +179,8 @@ static int proc_cwd_link(struct dentry *dentry, struct path *path) if (task) { task_lock(task); if (task->fs) { - get_fs_pwd(task->fs, path); + link->file = NULL; + get_fs_pwd(task->fs, &link->path); result = 0; } task_unlock(task); @@ -188,13 +189,14 @@ static int proc_cwd_link(struct dentry *dentry, struct path *path) return result; } -static int proc_root_link(struct dentry *dentry, struct path *path) +static int proc_root_link(struct dentry *dentry, struct file_or_path *link) { struct task_struct *task = get_proc_task(dentry->d_inode); int result = -ENOENT; if (task) { - result = get_task_root(task, path); + link->file = NULL; + result = get_task_root(task, &link->path); put_task_struct(task); } return result; @@ -1430,11 +1432,10 @@ static const struct file_operations proc_pid_set_comm_operations = { .release = single_release, }; -static int proc_exe_link(struct dentry *dentry, struct path *exe_path) +static int proc_exe_link(struct dentry *dentry, struct file_or_path *link) { struct task_struct *task; struct mm_struct *mm; - struct file *exe_file; task = get_proc_task(dentry->d_inode); if (!task) @@ -1443,32 +1444,32 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path) put_task_struct(task); if (!mm) return -ENOENT; - exe_file = get_mm_exe_file(mm); + link->file = get_mm_exe_file(mm); mmput(mm); - if (exe_file) { - *exe_path = exe_file->f_path; - path_get(&exe_file->f_path); - fput(exe_file); + if (link->file) return 0; - } else + else return -ENOENT; } static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd) { struct inode *inode = dentry->d_inode; - struct path path; + struct file_or_path link; int error = -EACCES; /* Are we allowed to snoop on the tasks file descriptors? */ if (!proc_fd_access_allowed(inode)) goto out; - error = PROC_I(inode)->op.proc_get_link(dentry, &path); + error = PROC_I(inode)->op.proc_get_link(dentry, &link); if (error) goto out; - nd_jump_link(nd, &path); + if (link.file) + nd_jump_link_file(nd, link.file); + else + nd_jump_link(nd, &link.path); return NULL; out: return ERR_PTR(error); @@ -1502,18 +1503,23 @@ static int proc_pid_readlink(struct dentry * dentry, char __user * buffer, int b { int error = -EACCES; struct inode *inode = dentry->d_inode; - struct path path; + struct file_or_path link; /* Are we allowed to snoop on the tasks file descriptors? */ if (!proc_fd_access_allowed(inode)) goto out; - error = PROC_I(inode)->op.proc_get_link(dentry, &path); + error = PROC_I(inode)->op.proc_get_link(dentry, &link); if (error) goto out; - error = do_proc_readlink(&path, buffer, buflen); - path_put(&path); + if (link.file) { + error = do_proc_readlink(&link.file->f_path, buffer, buflen); + fput(link.file); + } else { + error = do_proc_readlink(&link.path, buffer, buflen); + path_put(&link.path); + } out: return error; } diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 0ff80f9..4c88fc2 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -137,7 +137,7 @@ static const struct dentry_operations tid_fd_dentry_operations = { .d_delete = pid_delete_dentry, }; -static int proc_fd_link(struct dentry *dentry, struct path *path) +static int proc_fd_link(struct dentry *dentry, struct file_or_path *link) { struct files_struct *files = NULL; struct task_struct *task; @@ -151,13 +151,11 @@ static int proc_fd_link(struct dentry *dentry, struct path *path) if (files) { int fd = proc_fd(dentry->d_inode); - struct file *fd_file; spin_lock(&files->file_lock); - fd_file = fcheck_files(files, fd); - if (fd_file) { - *path = fd_file->f_path; - path_get(&fd_file->f_path); + link->file = fcheck_files(files, fd); + if (link->file) { + get_file(link->file); ret = 0; } spin_unlock(&files->file_lock); diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 651d09a..fc936d5 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -14,6 +14,7 @@ #include <linux/spinlock.h> #include <linux/atomic.h> #include <linux/binfmts.h> +#include <linux/namei.h> struct ctl_table_header; struct mempolicy; @@ -51,7 +52,7 @@ struct proc_dir_entry { }; union proc_op { - int (*proc_get_link)(struct dentry *, struct path *); + int (*proc_get_link)(struct dentry *, struct file_or_path *link); int (*proc_read)(struct task_struct *task, char *page); int (*proc_show)(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, diff --git a/include/linux/namei.h b/include/linux/namei.h index 5a5ff57..77f585e 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -15,6 +15,7 @@ struct nameidata { struct qstr last; struct path root; struct inode *inode; /* path.dentry.d_inode */ + struct file *last_symlink_file; unsigned int flags; unsigned seq; int last_type; @@ -56,9 +57,21 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_ROOT 0x2000 #define LOOKUP_EMPTY 0x4000 +#define LOOKUP_FILE 0x8000 + extern int user_path_at(int, const char __user *, unsigned, struct path *); extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); +struct file_or_path +{ + struct file *file; + struct path path; +}; + +extern int user_file_or_path_at(int dfd, const char __user *filename, + int lookup_flags, bool null_filename_okay, + struct file_or_path *out); + #define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path) #define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path) #define user_path_dir(name, path) \ @@ -83,6 +96,7 @@ extern struct dentry *lock_rename(struct dentry *, struct dentry *); extern void unlock_rename(struct dentry *, struct dentry *); extern void nd_jump_link(struct nameidata *nd, struct path *path); +extern void nd_jump_link_file(struct nameidata *nd, struct file *file); static inline void nd_set_link(struct nameidata *nd, char *path) { -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-27 19:16 ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski @ 2013-08-27 19:32 ` Linus Torvalds 2013-08-27 20:28 ` Andy Lutomirski 2013-08-27 23:08 ` Al Viro 1 sibling, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-27 19:32 UTC (permalink / raw) To: Andy Lutomirski Cc: Al Viro, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 12:16 PM, Andy Lutomirski <luto@amacapital.net> wrote: > This is an experiment to see if we can get nice semantics for all syscalls > that either follow symlinks or allow AT_EMPTY_PATH without jumping through > enormous hoops. This converts truncate (although you can't tell using > truncate from coreutils, because it actually uses open + ftruncate). So this seems *way* too complex. I'd much rather see "nd->flags" get a LOOKUP_READONLY flag, for example, that gets set by proc_pid_follow_link() when it hits a read-only file descriptor (and gets cleared by other lookups). Wouldn't that be *much* more straightforward? Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-27 19:32 ` Linus Torvalds @ 2013-08-27 20:28 ` Andy Lutomirski 2013-08-28 6:16 ` Al Viro 0 siblings, 1 reply; 66+ messages in thread From: Andy Lutomirski @ 2013-08-27 20:28 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 12:32 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Aug 27, 2013 at 12:16 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> This is an experiment to see if we can get nice semantics for all syscalls >> that either follow symlinks or allow AT_EMPTY_PATH without jumping through >> enormous hoops. This converts truncate (although you can't tell using >> truncate from coreutils, because it actually uses open + ftruncate). > > So this seems *way* too complex. I'd much rather see "nd->flags" get a > LOOKUP_READONLY flag, for example, that gets set by > proc_pid_follow_link() when it hits a read-only file descriptor (and > gets cleared by other lookups). > > Wouldn't that be *much* more straightforward? It would if it works. It certainly would for truncate, setxattr, etc. There are funny cases, though. For example, execing an O_WRONLY fd should probably fail, as should opening an O_WRONLY fd as O_RDWR. O_APPEND is also funny. flink will (I suspect) always want to be a bit special. There are also O_PATH fds, and I'm not sure what the semantics of O_PATH fds are or should be when they refer to something other than a directory. The benefit of my approach is that it's really obvious that truncate("/proc/self/fd/N") and ftruncate(N) do exactly the same thing. The downside is that the namei code is a bit gross and there was more rearranging of ftruncate than I would have liked. (I also lost the benefit of fget_light, but I could fix that by passing a struct fd around instead of a struct file *.) --Andy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-27 20:28 ` Andy Lutomirski @ 2013-08-28 6:16 ` Al Viro 2013-08-28 16:24 ` Linus Torvalds 2013-08-28 19:04 ` Andy Lutomirski 0 siblings, 2 replies; 66+ messages in thread From: Al Viro @ 2013-08-28 6:16 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote: > It would if it works. It certainly would for truncate, setxattr, etc. > > There are funny cases, though. For example, execing an O_WRONLY fd > should probably fail, as should opening an O_WRONLY fd as O_RDWR. > O_APPEND is also funny. flink will (I suspect) always want to be a > bit special. > > There are also O_PATH fds, and I'm not sure what the semantics of > O_PATH fds are or should be when they refer to something other than a > directory. O_PATH file just points to specific location in the tree, no more and no less. > The benefit of my approach is that it's really obvious that > truncate("/proc/self/fd/N") and ftruncate(N) do exactly the same > thing. The downside is that the namei code is a bit gross and there > was more rearranging of ftruncate than I would have liked. (I also > lost the benefit of fget_light, but I could fix that by passing a > struct fd around instead of a struct file *.) With that thing namei code becomes _really_ gross and we'll have to live with it afterwards. No. Moreover, grabbing references to struct file is the wrong thing to do - we need far less information than that. Let's look at that zoo first. a) linkat() target. flink(2) doesn't exist, so we can choose whatever we bloody please for that case. And frankly, I would gladly go for O_TMPFILE sans O_EXCL or sufficient caps. See upthread for proposed solution. b) chdir(). Checks are identical in chdir() and fchdir(), don't bloody care how the file had been opened and don't need to care - anything we could've accessed after such chdir() we could've accessed by slapping a relative path to the end of the path we'd used to reach that thing. c) chroot(). fchroot() doesn't exist and realistically anyone who can do chroot(2) is well past caring about anything. We really don't care how the damn thing had been opened. d) fchownat(). Neither it nor fchown() give a flying fuch about the way file had been opened. e) fchmodat(). Same story. f) faccessat(). IMO we don't care how the damn thing had been opened. g) name_to_handle_at(). Not sure if we need any capability checks, but we definitely don't give a damn about the way file had been opened. h) one case in quotactl(). With CAP_SYS_ADMIN required. If attacker got that, it's too fucking late anyway. i) fstatat() and friends, statfs(), utimes() et.al, {set,get,list,remove}xattr() - no difference in checks between pathname- and descriptor-based calls anyway. Please, note that for fsetxattr() we do *NOT* require the file to be opened for write; adding such requirement would be a user-visible API change, and one fairly likely to break stuff, at that. j) umount() and pivot_root(). We really don't care how the file had been opened, and attacker capable of playing with that can do a lot more. k) inotify_add_watch()/fanotify_mark(). No descriptor-based versions, no permission checks other than "may read whatever we ended up with". I really doubt that we care about the way fd had been opened in case of /proc/<pid>/fd/<fd>. l) truncate() mess. m) open() mess. AFAICS, the *only* cases when we might possibly care are linkat() target, truncate() and open(). Note, BTW, that right now we *do* allow an attempt to reopen a file via procfs symlink r/w, even when file had been r/o. It's subject to permissions on the object being opened, but that's it. I'm not sure we can change that - again, it's a user-visible API, and the change is very likely to break some scripts. In fact, it's about as dangerous as a full-blown switch to dup-style semantics for procfs opens, and it's a lot less attractive. For truncate() we would only need to have FMODE_WRITE reported, more or less the same way as FMODE_FLINK. And without open() changes it doesn't buy us anything at all... I've no problem with unrolling the user_path_at() in do_sys_truncate() into an explicit loop by trailing symlinks and checking for indication left by proc_pid_follow_link(), more or less the same way as with LOOKUP_LINK in lookup_last(). It's _far_ less invasive than playing with "oh, here we fill a struct path or maybe a struct file" horrors, pinning struct file for no reason, etc. AFAICS, the real question is whether we dare to change open() behaviour on /proc/*/fd/*. I've played with that a bit and I believe that I can do the switch to dup-style with very localized changes in fs/namei.c and fs/proc/{base,fd}.c. Will be even binary compatible kernel-side - ->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error, not that we had many instances to convert. *IF* that variant is not out of consideration for userland API stability reasons, I would certainly prefer to go that way; turns out that these days we can pull it off without black magic in descriptor handling, etc. Linus? ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-28 6:16 ` Al Viro @ 2013-08-28 16:24 ` Linus Torvalds 2013-08-28 19:04 ` Andy Lutomirski 1 sibling, 0 replies; 66+ messages in thread From: Linus Torvalds @ 2013-08-28 16:24 UTC (permalink / raw) To: Al Viro Cc: Andy Lutomirski, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler Pseudo-related: I just reverted the flink() commit in my tree, so that at least wrt that whole thing we'll be back to the original behavior, broken or not. Clearly we're not done with this discussion, and the patches flying around aren't appropriate for 3.11, so.. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-28 6:16 ` Al Viro 2013-08-28 16:24 ` Linus Torvalds @ 2013-08-28 19:04 ` Andy Lutomirski 2013-08-28 19:59 ` Al Viro 1 sibling, 1 reply; 66+ messages in thread From: Andy Lutomirski @ 2013-08-28 19:04 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 11:16 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote: >> There are also O_PATH fds, and I'm not sure what the semantics of >> O_PATH fds are or should be when they refer to something other than a >> directory. > > O_PATH file just points to specific location in the tree, no more and > no less. I don't know whether ftruncate(some O_PATH fd) should work. But this probably barely matters. > > g) name_to_handle_at(). Not sure if we need any capability checks, but > we definitely don't give a damn about the way file had been opened. open_by_handle(name_to_handle(read-only fd)) is dangerous for the same reason that open("/proc/self/fd/N", O_RDWR) is, but the capability checks have that covered. > > h) one case in quotactl(). With CAP_SYS_ADMIN required. If attacker got > that, it's too fucking late anyway. > > i) fstatat() and friends, statfs(), utimes() et.al, {set,get,list,remove}xattr() > - no difference in checks between pathname- and descriptor-based calls > anyway. Please, note that for fsetxattr() we do *NOT* require the file > to be opened for write; adding such requirement would be a user-visible > API change, and one fairly likely to break stuff, at that. Sigh. I wonder if that was deliberate. Anyway, it is what it is. > > j) umount() and pivot_root(). We really don't care how the file had been > opened, and attacker capable of playing with that can do a lot more. > > k) inotify_add_watch()/fanotify_mark(). No descriptor-based versions, > no permission checks other than "may read whatever we ended up with". > I really doubt that we care about the way fd had been opened in case > of /proc/<pid>/fd/<fd>. > > l) truncate() mess. > > m) open() mess. > > AFAICS, the *only* cases when we might possibly care are linkat() target, > truncate() and open(). Note, BTW, that right now we *do* allow an attempt > to reopen a file via procfs symlink r/w, even when file had been r/o. > It's subject to permissions on the object being opened, but that's it. > > I'm not sure we can change that - again, it's a user-visible API, and > the change is very likely to break some scripts. In fact, it's about > as dangerous as a full-blown switch to dup-style semantics for procfs > opens, and it's a lot less attractive. > > For truncate() we would only need to have FMODE_WRITE reported, more or > less the same way as FMODE_FLINK. And without open() changes it doesn't > buy us anything at all... > > I've no problem with unrolling the user_path_at() in do_sys_truncate() > into an explicit loop by trailing symlinks and checking for indication > left by proc_pid_follow_link(), more or less the same way as with > LOOKUP_LINK in lookup_last(). It's _far_ less invasive than playing > with "oh, here we fill a struct path or maybe a struct file" horrors, > pinning struct file for no reason, etc. > > AFAICS, the real question is whether we dare to change open() behaviour on > /proc/*/fd/*. I've played with that a bit and I believe that I can do > the switch to dup-style with very localized changes in fs/namei.c and > fs/proc/{base,fd}.c. Will be even binary compatible kernel-side - > ->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error, > not that we had many instances to convert. *IF* that variant is not > out of consideration for userland API stability reasons, I would certainly > prefer to go that way; turns out that these days we can pull it off without > black magic in descriptor handling, etc. Linus? I personally find the check-mode-but-get-a-new-struct-file version to be less weird that the dup approach. Either approach will break scripts that try to write to /dev/stdin (which is the whole point). --Andy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-28 19:04 ` Andy Lutomirski @ 2013-08-28 19:59 ` Al Viro 2013-08-28 21:07 ` Andy Lutomirski 0 siblings, 1 reply; 66+ messages in thread From: Al Viro @ 2013-08-28 19:59 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Wed, Aug 28, 2013 at 12:04:43PM -0700, Andy Lutomirski wrote: > On Tue, Aug 27, 2013 at 11:16 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote: > >> There are also O_PATH fds, and I'm not sure what the semantics of > >> O_PATH fds are or should be when they refer to something other than a > >> directory. > > > > O_PATH file just points to specific location in the tree, no more and > > no less. > > I don't know whether ftruncate(some O_PATH fd) should work. But this > probably barely matters. It shouldn't. No IO on these guys at all. > > AFAICS, the *only* cases when we might possibly care are linkat() target, > > truncate() and open(). Note, BTW, that right now we *do* allow an attempt > > to reopen a file via procfs symlink r/w, even when file had been r/o. > > It's subject to permissions on the object being opened, but that's it. > > > > I'm not sure we can change that - again, it's a user-visible API, and > > the change is very likely to break some scripts. In fact, it's about > > as dangerous as a full-blown switch to dup-style semantics for procfs > > opens, and it's a lot less attractive. > > > > For truncate() we would only need to have FMODE_WRITE reported, more or > > less the same way as FMODE_FLINK. And without open() changes it doesn't > > buy us anything at all... > > > > I've no problem with unrolling the user_path_at() in do_sys_truncate() > > into an explicit loop by trailing symlinks and checking for indication > > left by proc_pid_follow_link(), more or less the same way as with > > LOOKUP_LINK in lookup_last(). It's _far_ less invasive than playing > > with "oh, here we fill a struct path or maybe a struct file" horrors, > > pinning struct file for no reason, etc. > > > > AFAICS, the real question is whether we dare to change open() behaviour on > > /proc/*/fd/*. I've played with that a bit and I believe that I can do > > the switch to dup-style with very localized changes in fs/namei.c and > > fs/proc/{base,fd}.c. Will be even binary compatible kernel-side - > > ->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error, > > not that we had many instances to convert. *IF* that variant is not > > out of consideration for userland API stability reasons, I would certainly > > prefer to go that way; turns out that these days we can pull it off without > > black magic in descriptor handling, etc. Linus? > > I personally find the check-mode-but-get-a-new-struct-file version to > be less weird that the dup approach. Either approach will break > scripts that try to write to /dev/stdin (which is the whole point). What, breaking existing userland? IMO that's a thing to avoid, unless we have really, really strong reasons not to. And yes, it goes for both variants... FWIW, I'm not convinced that the reasons you are giving for it are strong enough - passing somebody a read-only file descriptor to a file they could open for write and relying on their inability to truncate the fscker just because it's not reachable via any path they've got search permissions to looks like a Bloody Bad Idea(tm), and not only because it won't do what you hope it'll do on existing kernels. It's very easy to fuck up and end up with a searchable path to the damn thing; e.g. /proc/<pid>/cwd/foo will bypass the grandparent of foo not being searchable for you, etc. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-28 19:59 ` Al Viro @ 2013-08-28 21:07 ` Andy Lutomirski 0 siblings, 0 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-28 21:07 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Wed, Aug 28, 2013 at 12:59 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Wed, Aug 28, 2013 at 12:04:43PM -0700, Andy Lutomirski wrote: >> On Tue, Aug 27, 2013 at 11:16 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> > On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote: >> >> There are also O_PATH fds, and I'm not sure what the semantics of >> >> O_PATH fds are or should be when they refer to something other than a >> >> directory. >> > >> > O_PATH file just points to specific location in the tree, no more and >> > no less. >> >> I don't know whether ftruncate(some O_PATH fd) should work. But this >> probably barely matters. > > It shouldn't. No IO on these guys at all. > >> > AFAICS, the *only* cases when we might possibly care are linkat() target, >> > truncate() and open(). Note, BTW, that right now we *do* allow an attempt >> > to reopen a file via procfs symlink r/w, even when file had been r/o. >> > It's subject to permissions on the object being opened, but that's it. >> > >> > I'm not sure we can change that - again, it's a user-visible API, and >> > the change is very likely to break some scripts. In fact, it's about >> > as dangerous as a full-blown switch to dup-style semantics for procfs >> > opens, and it's a lot less attractive. >> > >> > For truncate() we would only need to have FMODE_WRITE reported, more or >> > less the same way as FMODE_FLINK. And without open() changes it doesn't >> > buy us anything at all... >> > >> > I've no problem with unrolling the user_path_at() in do_sys_truncate() >> > into an explicit loop by trailing symlinks and checking for indication >> > left by proc_pid_follow_link(), more or less the same way as with >> > LOOKUP_LINK in lookup_last(). It's _far_ less invasive than playing >> > with "oh, here we fill a struct path or maybe a struct file" horrors, >> > pinning struct file for no reason, etc. >> > >> > AFAICS, the real question is whether we dare to change open() behaviour on >> > /proc/*/fd/*. I've played with that a bit and I believe that I can do >> > the switch to dup-style with very localized changes in fs/namei.c and >> > fs/proc/{base,fd}.c. Will be even binary compatible kernel-side - >> > ->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error, >> > not that we had many instances to convert. *IF* that variant is not >> > out of consideration for userland API stability reasons, I would certainly >> > prefer to go that way; turns out that these days we can pull it off without >> > black magic in descriptor handling, etc. Linus? >> >> I personally find the check-mode-but-get-a-new-struct-file version to >> be less weird that the dup approach. Either approach will break >> scripts that try to write to /dev/stdin (which is the whole point). > > What, breaking existing userland? IMO that's a thing to avoid, unless we > have really, really strong reasons not to. And yes, it goes for both > variants... FWIW, I'm not convinced that the reasons you are giving for > it are strong enough - passing somebody a read-only file descriptor to > a file they could open for write and relying on their inability to truncate > the fscker just because it's not reachable via any path they've got search > permissions to looks like a Bloody Bad Idea(tm), and not only because it won't > do what you hope it'll do on existing kernels. It's very easy to fuck up > and end up with a searchable path to the damn thing; e.g. /proc/<pid>/cwd/foo > will bypass the grandparent of foo not being searchable for you, etc. > This affects O_TMPFILE, for example -- create a file with O_TMPFILE | O_RDWR and mode 0666 (by accident), write something, then open("/proc/self/fd/N", O_RDONLY) and send the resulting fd to someone. They can't directly write it, but they can reopen it O_RDWR. I agree that flink is the main issue here, FWIW. I think that the current semantics are just too screwed up to make it really safe to pass around fds with reduced access modes and expect those modes to stick. --Andy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-27 19:16 ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski 2013-08-27 19:32 ` Linus Torvalds @ 2013-08-27 23:08 ` Al Viro 2013-08-27 23:13 ` Andy Lutomirski 1 sibling, 1 reply; 66+ messages in thread From: Al Viro @ 2013-08-27 23:08 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 12:16:34PM -0700, Andy Lutomirski wrote: > This is an experiment to see if we can get nice semantics for all syscalls > that either follow symlinks or allow AT_EMPTY_PATH without jumping through > enormous hoops. This converts truncate (although you can't tell using > truncate from coreutils, because it actually uses open + ftruncate). > > The basic idea is that there's a new helper function > user_file_or_path_at. It takes an fd and a path and, depending on > flags, the emptiness of the path, and whether path is a magic /proc > symlink (or a symlink to a magic /proc/symlink), it returns either a > struct path or a struct file *. No. > + path_get(&nd->path); > + if (nd->flags & LOOKUP_FILE) { > + if (nd->last_symlink_file) > + fput(nd->last_symlink_file); > + nd->last_symlink_file = file; This is ugly (and costs quite a bit of overhead) > -static int proc_cwd_link(struct dentry *dentry, struct path *path) > +static int proc_cwd_link(struct dentry *dentry, struct file_or_path *link) ... and this is even more vile. Vetoed, for being too ugly to live. I think I've a saner approach, not involving anything that ugly; I'll post a writeup later tonight. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate 2013-08-27 23:08 ` Al Viro @ 2013-08-27 23:13 ` Andy Lutomirski 0 siblings, 0 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-27 23:13 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 4:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Aug 27, 2013 at 12:16:34PM -0700, Andy Lutomirski wrote: >> This is an experiment to see if we can get nice semantics for all syscalls >> that either follow symlinks or allow AT_EMPTY_PATH without jumping through >> enormous hoops. This converts truncate (although you can't tell using >> truncate from coreutils, because it actually uses open + ftruncate). >> >> The basic idea is that there's a new helper function >> user_file_or_path_at. It takes an fd and a path and, depending on >> flags, the emptiness of the path, and whether path is a magic /proc >> symlink (or a symlink to a magic /proc/symlink), it returns either a >> struct path or a struct file *. > > No. I'm curious what's wrong with the general concept. (I agree that the implementation is heinous.) If I ever wanted to add a new *at syscall, I'd like to be able to do: int sys_whateverat(int dfd, const char __user *name, int flags) { resolve_the_thing(dfd, name, flags); if (it's a file) { check fmode; } else { inode_permission(); } actually_do_something(inode); unreference_whatever_i_got(); return ret; } thereby killing the fwhatever and whateverat birds with one stone. > >> + path_get(&nd->path); >> + if (nd->flags & LOOKUP_FILE) { >> + if (nd->last_symlink_file) >> + fput(nd->last_symlink_file); >> + nd->last_symlink_file = file; > > This is ugly (and costs quite a bit of overhead) > >> -static int proc_cwd_link(struct dentry *dentry, struct path *path) >> +static int proc_cwd_link(struct dentry *dentry, struct file_or_path *link) > > ... and this is even more vile. Vetoed, for being too ugly to live. ...phew. I wasn't looking forward to testing and debugging my crap :) --Andy ^ permalink raw reply [flat|nested] 66+ messages in thread
* /proc/pid/fd && anon_inode_fops 2013-08-22 20:15 ` Willy Tarreau 2013-08-22 20:22 ` Andy Lutomirski @ 2013-08-24 18:29 ` Oleg Nesterov 2013-08-24 21:24 ` Willy Tarreau 1 sibling, 1 reply; 66+ messages in thread From: Oleg Nesterov @ 2013-08-24 18:29 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Al Viro, Linux FS Devel, Brad Spengler Sorry for off-topic, I am just curios. On 08/22, Willy Tarreau wrote: > > It's not only that, it also supports sockets and pipes that you can access > via /proc/pid/fd and not via a real symlink which would try to open eg > "pipe:[23456]" instead of the real file. But sock_no_open() disallows this, and for good reason I guess. I am wondering, perhaps anon_inode should do the same? I do not see any problem, but it looks pointless and misleading to allow to open a file you can do nothing with. Or is there any reason why, say, open("anon_inode:[perf_event]") should succeed? Thanks, Oleg. --- x/fs/anon_inodes.c +++ x/fs/anon_inodes.c @@ -24,7 +24,15 @@ static struct vfsmount *anon_inode_mnt __read_mostly; static struct inode *anon_inode_inode; -static const struct file_operations anon_inode_fops; + +static int anon_open(struct inode *inode, struct file *file) +{ + return -ENXIO; +} + +static const struct file_operations anon_inode_fops = { + .open = anon_open, +}; /* * anon_inodefs_dname() is called from d_path(). ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-24 18:29 ` /proc/pid/fd && anon_inode_fops Oleg Nesterov @ 2013-08-24 21:24 ` Willy Tarreau 2013-08-25 5:23 ` Al Viro 2013-08-25 15:45 ` Oleg Nesterov 0 siblings, 2 replies; 66+ messages in thread From: Willy Tarreau @ 2013-08-24 21:24 UTC (permalink / raw) To: Oleg Nesterov Cc: Andy Lutomirski, Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Al Viro, Linux FS Devel, Brad Spengler Hi Oleg, On Sat, Aug 24, 2013 at 08:29:39PM +0200, Oleg Nesterov wrote: > Sorry for off-topic, I am just curios. > > On 08/22, Willy Tarreau wrote: > > > > It's not only that, it also supports sockets and pipes that you can access > > via /proc/pid/fd and not via a real symlink which would try to open eg > > "pipe:[23456]" instead of the real file. > > But sock_no_open() disallows this, and for good reason I guess. Hmmm not exactly, it works for a pipe but not for a socket. It even breaks /dev/stdin (/proc/self/fd/0) for processes running attached to a socket (eg: shell script) : sh-3.1$ ls -la /proc/$$/fd/ total 0 dr-x------ 2 willy users 0 Aug 24 23:03 . dr-xr-xr-x 6 willy users 0 Aug 24 23:03 .. lrwx------ 1 willy users 64 Aug 24 23:03 0 -> socket:[1247293] lrwx------ 1 willy users 64 Aug 24 23:03 1 -> socket:[1247293] lrwx------ 1 willy users 64 Aug 24 23:03 2 -> socket:[1247293] lrwx------ 1 willy users 64 Aug 24 23:03 255 -> socket:[1247293] sh-3.1$ read < /dev/stdin sh: /dev/stdin: No such device or address sh-3.1$ read < /dev/fd/0 sh: /dev/fd/0: No such device or address But : window 1: willy@pcw:~$ ssh 0 sh -i sh-3.1$ echo $$ 9832 sh-3.1$ ls -la /proc/$$/fd/ total 0 dr-x------ 2 willy users 0 Aug 24 23:16 . dr-xr-xr-x 6 willy users 0 Aug 24 23:16 .. lr-x------ 1 willy users 64 Aug 24 23:16 0 -> pipe:[1247914] l-wx------ 1 willy users 64 Aug 24 23:16 1 -> pipe:[1247915] l-wx------ 1 willy users 64 Aug 24 23:16 2 -> pipe:[1247916] l-wx------ 1 willy users 64 Aug 24 23:17 255 -> pipe:[1247916] sh-3.1$ window 2: willy@pcw:~$ echo hello > /proc/9832/fd/1 willy@pcw:~$ echo whoami > /proc/9832/fd/0 window 1: sh-3.1$ hello willy sh-3.1$ > I am wondering, perhaps anon_inode should do the same? I do not > see any problem, but it looks pointless and misleading to allow > to open a file you can do nothing with. I don't know, I'm often a bit confused by entries in /proc/pid/fd because I don't always know which ones might safely be used or not. > Or is there any reason why, say, open("anon_inode:[perf_event]") > should succeed? I doubt it. It seems to me that most such entries are implemented for completeness while most valid uses only concern /proc/self/fd. Maybe if we had an option so that only /proc/self/fd would actually allow to access the fds while all /proc/pid/fd would only show what they map to, it would be a good step forward. > Thanks, > > Oleg. > > --- x/fs/anon_inodes.c > +++ x/fs/anon_inodes.c > @@ -24,7 +24,15 @@ > > static struct vfsmount *anon_inode_mnt __read_mostly; > static struct inode *anon_inode_inode; > -static const struct file_operations anon_inode_fops; > + > +static int anon_open(struct inode *inode, struct file *file) > +{ > + return -ENXIO; > +} > + > +static const struct file_operations anon_inode_fops = { > + .open = anon_open, > +}; > > /* > * anon_inodefs_dname() is called from d_path(). regards, willy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-24 21:24 ` Willy Tarreau @ 2013-08-25 5:23 ` Al Viro 2013-08-25 6:50 ` Willy Tarreau 2013-08-25 18:32 ` /proc/pid/fd && anon_inode_fops Linus Torvalds 2013-08-25 15:45 ` Oleg Nesterov 1 sibling, 2 replies; 66+ messages in thread From: Al Viro @ 2013-08-25 5:23 UTC (permalink / raw) To: Willy Tarreau Cc: Oleg Nesterov, Andy Lutomirski, Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Sat, Aug 24, 2013 at 11:24:32PM +0200, Willy Tarreau wrote: > I doubt it. It seems to me that most such entries are implemented > for completeness while most valid uses only concern /proc/self/fd. > Maybe if we had an option so that only /proc/self/fd would actually > allow to access the fds while all /proc/pid/fd would only show what > they map to, it would be a good step forward. How? The fundamental problem is not visibility of that stuff, it's new opened file for the same object (Linux behaviour) vs. new descriptor refering to the same opened file (*BSD and friends). We can't get anon_... sanely reopened in the former semantics and they are very visibly different for regular files, so switching to *BSD one is not feasible - too high odds of userland breakage. The difference in semantics, of course, is that on Linux opening /dev/stdin gives you a descriptor with independent current IO position; on *BSD you get a descriptor sharing the current IO position with stdin. IOW, it's independent open() of the same file vs. dup(). We are really stuck with the current semantics here - switching to *BSD one would not only mean serious surgery on descriptor handling (it's one of the wartier areas in *BSD VFS, in large part because of magic-open-really-a-dup kludges they have to do), it would change a long-standing userland API that had been there for nearly 20 years _and_ one that tends to be used in corner cases of hell knows how many scripts. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 5:23 ` Al Viro @ 2013-08-25 6:50 ` Willy Tarreau 2013-08-25 18:51 ` Linus Torvalds 2013-08-25 18:32 ` /proc/pid/fd && anon_inode_fops Linus Torvalds 1 sibling, 1 reply; 66+ messages in thread From: Willy Tarreau @ 2013-08-25 6:50 UTC (permalink / raw) To: Al Viro Cc: Oleg Nesterov, Andy Lutomirski, Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 06:23:17AM +0100, Al Viro wrote: > On Sat, Aug 24, 2013 at 11:24:32PM +0200, Willy Tarreau wrote: > > > I doubt it. It seems to me that most such entries are implemented > > for completeness while most valid uses only concern /proc/self/fd. > > Maybe if we had an option so that only /proc/self/fd would actually > > allow to access the fds while all /proc/pid/fd would only show what > > they map to, it would be a good step forward. > > How? The fundamental problem is not visibility of that stuff, it's > new opened file for the same object (Linux behaviour) vs. new descriptor > refering to the same opened file (*BSD and friends). We can't get > anon_... sanely reopened in the former semantics and they are very > visibly different for regular files, so switching to *BSD one is not > feasible - too high odds of userland breakage. The difference in > semantics, of course, is that on Linux opening /dev/stdin gives you > a descriptor with independent current IO position; on *BSD you get > a descriptor sharing the current IO position with stdin. IOW, it's > independent open() of the same file vs. dup(). > > We are really stuck with the current semantics here - switching to > *BSD one would not only mean serious surgery on descriptor handling > (it's one of the wartier areas in *BSD VFS, in large part because > of magic-open-really-a-dup kludges they have to do), it would change > a long-standing userland API that had been there for nearly 20 years > _and_ one that tends to be used in corner cases of hell knows how many > scripts. Thanks for explaining Al, that really helps me understand. However there's still a difference between /proc/pid called from the process itself (=/proc/self) and called from other processes that seems to suit the situation : willy@eeepc:~$ ls -la /tmp/bash -r-x--x--x 1 root users 916852 2013-08-25 08:19 /tmp/bash* willy@eeepc:~$ exec /tmp/bash -i willy@eeepc:~$ echo $$ 22678 willy@eeepc:~$ ls -la /proc/22678/fd ls: cannot open directory /proc/22678/fd: Permission denied willy@eeepc:~$ ls -la /proc/22678/exe ls: cannot read symbolic link /proc/22678/exe: Permission denied willy@eeepc:~$ cat /proc/22678/fd/0 cat: /proc/22678/fd/0: Permission denied but : willy@eeepc:~$ read < /proc/22678/fd/0 azerazerazer willy@eeepc:~$ echo $REPLY azerazerazer strace clearly shows that the process was allowed to inspect itself and the other ones were not : willy@eeepc:~$ strace -p 22678 open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = 3 willy@eeepc:~$ strace cat /proc/22678/fd/0 open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = -1 EACCES (Permission denied) It looks like this difference was introduced by this patch (which also fixes this issue we've been having for a very long time on 2.4 and early 2.6) : 8948e11 Allow access to /proc/$PID/fd after setuid() Thus I'm wondering if something like this could help, the idea would be that a with the appropriate mount option, a task could only look at its own descriptors unless it's running with privileges : static int proc_fd_permission(struct inode *inode, int mask, struct nameidata *nd) { if (task_pid(current) == proc_pid(inode)) return 0; if (capable(CAP_DAC_OVERRIDE)) return 0; if (proc_mounted_with_strict_option) return -EACCES; return generic_permission(inode, mask, NULL); } Thus it would not change the default behaviour except for people who would mount /proc with a special option. Thanks, Willy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 6:50 ` Willy Tarreau @ 2013-08-25 18:51 ` Linus Torvalds 2013-08-25 19:48 ` Oleg Nesterov 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-25 18:51 UTC (permalink / raw) To: Willy Tarreau Cc: Al Viro, Oleg Nesterov, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Sat, Aug 24, 2013 at 11:50 PM, Willy Tarreau <w@1wt.eu> wrote: > > Thanks for explaining Al, that really helps me understand. However > there's still a difference between /proc/pid called from the process > itself (=/proc/self) and called from other processes that seems to > suit the situation : /proc/self has magic special properties, as you noticed. > Thus I'm wondering if something like this could help, the idea would be > that a with the appropriate mount option, a task could only look at its > own descriptors unless it's running with privileges : I'd much rather try to do it in general, and use "file->f_cred" more aggressively for /proc/<pid>/fd/ security. We don't use f_cred at all in /proc, but that's because /proc predates that whole thing. So instead we use the credentials of the task when we want to look at credentials of the file, because that was the closest approximation we used to have. Look at the code that creates the fd stat information, for example. It's in tid_fd_revalidate(), and it really doesn't make much sense to use the task credentials for it. I wonder if we should do something like the appended (whitespace-damaged and totally untested) patch. Linus --- diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 75f2890abbd8..2a5a53cc7a0a 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -74,7 +74,6 @@ static int tid_fd_revalidate(struct dentry *dentry, unsigned int flags) { struct files_struct *files; struct task_struct *task; - const struct cred *cred; struct inode *inode; int fd; @@ -95,19 +94,17 @@ static int tid_fd_revalidate(struct dentry *dentry, unsigned int flags) if (file) { unsigned f_mode = file->f_mode; - rcu_read_unlock(); - put_files_struct(files); - if (task_dumpable(task)) { - rcu_read_lock(); - cred = __task_cred(task); + const struct cred *cred = file->f_cred; inode->i_uid = cred->euid; inode->i_gid = cred->egid; - rcu_read_unlock(); } else { inode->i_uid = GLOBAL_ROOT_UID; inode->i_gid = GLOBAL_ROOT_GID; } + rcu_read_unlock(); + put_files_struct(files); + if (S_ISLNK(inode->i_mode)) { unsigned i_mode = S_IFLNK; ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 18:51 ` Linus Torvalds @ 2013-08-25 19:48 ` Oleg Nesterov 2013-08-25 20:05 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Oleg Nesterov @ 2013-08-25 19:48 UTC (permalink / raw) To: Linus Torvalds Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler Cough. I am going off-topic again, but I can't resist... On 08/25, Linus Torvalds wrote: > > Look at the code that creates the fd stat information, for example. > It's in tid_fd_revalidate(), and it really doesn't make much sense to > use the task credentials for it. Or pid_revalidate(), but my concern is task_dumpable() logic. pid_revalidate() does inode->i_*id = GLOBAL_ROOT_*ID if task_dumpable() fails, but it can fail simply because ->mm = NULL. This means that almost everything in /proc/zombie-pid/ becomes root. Doesn't really hurt, but for what? Looks a bit strange imho. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 19:48 ` Oleg Nesterov @ 2013-08-25 20:05 ` Linus Torvalds 2013-08-26 15:33 ` Oleg Nesterov 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-25 20:05 UTC (permalink / raw) To: Oleg Nesterov Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 12:48 PM, Oleg Nesterov <oleg@redhat.com> wrote: > > Or pid_revalidate(), but my concern is task_dumpable() logic. > > pid_revalidate() does inode->i_*id = GLOBAL_ROOT_*ID if task_dumpable() > fails, but it can fail simply because ->mm = NULL. > > This means that almost everything in /proc/zombie-pid/ becomes root. > Doesn't really hurt, but for what? Looks a bit strange imho. The zombie case shouldn't be relevant, because a zombie will have closed all the file descriptors anyway, so they no longer exist. That said, task_dumpable isn't wonderful, and I suspect we could drop that logic entirely in the tid-fd case if we just use f_cred. The reason we have task_dumpable is exactly because we use the task credentials, and they may not really be relevant to the file credentials. IOW, it's there to protect against execve'ing a suid program that opens some protected file and then in setuid()'s back the the original user after having done the critical stuff. But file->f_cred is exactly about the credentials at the time of the open, so it should make things like that irrelevant. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 20:05 ` Linus Torvalds @ 2013-08-26 15:33 ` Oleg Nesterov 2013-08-26 16:37 ` Oleg Nesterov 0 siblings, 1 reply; 66+ messages in thread From: Oleg Nesterov @ 2013-08-26 15:33 UTC (permalink / raw) To: Linus Torvalds Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On 08/25, Linus Torvalds wrote: > > On Sun, Aug 25, 2013 at 12:48 PM, Oleg Nesterov <oleg@redhat.com> wrote: > > > > pid_revalidate() does inode->i_*id = GLOBAL_ROOT_*ID if task_dumpable() > > fails, but it can fail simply because ->mm = NULL. > > > > This means that almost everything in /proc/zombie-pid/ becomes root. > > Doesn't really hurt, but for what? Looks a bit strange imho. > > The zombie case shouldn't be relevant, because a zombie will have > closed all the file descriptors anyway, so they no longer exist. I specially mentioned that this is off-topic ;) > That said, task_dumpable isn't wonderful, and I suspect we could drop > that logic entirely in the tid-fd case if we just use f_cred. Probably yes, but I do not understand this S_IFLNK && uid/chmod magic in tid_fd_revalidate(). And afaics this should not affect readlink() anyway. So yes, ->f_cred makes more sense to me, but I can't comment. But, afaics, speaking of task_dumpable() this doesn't matter. Please forget about /proc/fd or zombies. I can't even understand proc_pid_make_inode() or pid_revalidate(). $ id uid=1000(tst) gid=100(users) groups=100(users) $ cp `which ls` ls $ chmod a-r ./ls $ $ ./ls -l /proc/self/ total 0 -r-------- 1 root root 0 Aug 26 06:35 auxv -r--r--r-- 1 root root 0 Aug 26 06:35 cgroup --w------- 1 root root 0 Aug 26 06:35 clear_refs -r--r--r-- 1 root root 0 Aug 26 06:35 cmdline -rw-r--r-- 1 root root 0 Aug 26 06:35 comm -rw-r--r-- 1 root root 0 Aug 26 06:35 coredump_filter lrwxrwxrwx 1 root root 0 Aug 26 06:35 cwd -> /home/tst -r-------- 1 root root 0 Aug 26 06:35 environ lrwxrwxrwx 1 root root 0 Aug 26 06:35 exe -> /home/tst/ls dr-x------ 2 root root 0 Aug 26 06:35 fd dr-x------ 2 root root 0 Aug 26 06:35 fdinfo -rw-r--r-- 1 root root 0 Aug 26 06:35 gid_map -r--r--r-- 1 root root 0 Aug 26 06:35 limits -r--r--r-- 1 root root 0 Aug 26 06:35 maps -rw------- 1 root root 0 Aug 26 06:35 mem -r--r--r-- 1 root root 0 Aug 26 06:35 mountinfo -r--r--r-- 1 root root 0 Aug 26 06:35 mounts -r-------- 1 root root 0 Aug 26 06:35 mountstats dr-xr-xr-x 4 tst users 0 Aug 26 06:35 net dr-x--x--x 2 root root 0 Aug 26 06:35 ns -rw-r--r-- 1 root root 0 Aug 26 06:35 oom_adj -r--r--r-- 1 root root 0 Aug 26 06:35 oom_score -rw-r--r-- 1 root root 0 Aug 26 06:35 oom_score_adj -r--r--r-- 1 root root 0 Aug 26 06:35 pagemap -r--r--r-- 1 root root 0 Aug 26 06:35 personality -rw-r--r-- 1 root root 0 Aug 26 06:35 projid_map lrwxrwxrwx 1 root root 0 Aug 26 06:35 root -> / -r--r--r-- 1 root root 0 Aug 26 06:35 smaps -r--r--r-- 1 root root 0 Aug 26 06:35 stack -r--r--r-- 1 root root 0 Aug 26 06:35 stat -r--r--r-- 1 root root 0 Aug 26 06:35 statm -r--r--r-- 1 root root 0 Aug 26 06:35 status -r--r--r-- 1 root root 0 Aug 26 06:35 syscall dr-xr-xr-x 3 tst users 0 Aug 26 06:35 task -rw-r--r-- 1 root root 0 Aug 26 06:35 uid_map -r--r--r-- 1 root root 0 Aug 26 06:35 wchan For what? Say, -r--r--r-- 1 root root 0 Aug 26 06:35 status but it is S_IRUGO anyway, why do we need to change the owner? dr-x------ 2 root root 0 Aug 26 06:35 fd OK, this means that I can't access this dir from another process. Not sure we really want this in this case but $ ./ls /proc/self/fd 0 1 2 3 still works, I guess thanks to proc_fd_permission(). However, say, -r-------- 1 root root 0 Aug 26 06:35 mountstats actually becomes unreadable even via /proc/self/. Imho, this all is confusing. Perhaps it makes sense to "chmod", say, /proc/pid/maps if !task_dumpable(), but "chown" looks strange. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-26 15:33 ` Oleg Nesterov @ 2013-08-26 16:37 ` Oleg Nesterov 2013-08-26 17:54 ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov 0 siblings, 1 reply; 66+ messages in thread From: Oleg Nesterov @ 2013-08-26 16:37 UTC (permalink / raw) To: Linus Torvalds Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On 08/26, Oleg Nesterov wrote: > > Not sure we really want this in this case but > > $ ./ls /proc/self/fd > 0 1 2 3 > > still works, I guess thanks to proc_fd_permission(). And btw. Whatever we do, shouldn't we change proc_fd_permission()? /proc/self is actually /proc/tgid, this means that the task_pid() check can't help if a sub-thread uses /proc/self. Oleg. --- x/fs/proc/fd.c +++ x/fs/proc/fd.c @@ -288,7 +288,7 @@ int proc_fd_permission(struct inode *ino int rv = generic_permission(inode, mask); if (rv == 0) return 0; - if (task_pid(current) == proc_pid(inode)) + if (task_tgid(current) == proc_pid(inode)) rv = 0; return rv; } ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 16:37 ` Oleg Nesterov @ 2013-08-26 17:54 ` Oleg Nesterov 2013-08-26 18:09 ` Linus Torvalds 2013-08-26 18:32 ` Eric W. Biederman 0 siblings, 2 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-26 17:54 UTC (permalink / raw) To: Andrew Morton Cc: Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Linus Torvalds, Eric W. Biederman proc_fd_permission() says "process can still access /proc/self/fd after it has executed a setuid()", but the "task_pid() = proc_pid() check only helps if the task is group leader, /proc/self points to /proc/leader-pid. Change this check to use task_tgid() so that the whole process can access /proc/self/fd. Note: CLONE_THREAD doesn't require CLONE_FILES so task->files can differ, but I don't think this can lead to any security problem. Test-case: void *tfunc(void *arg) { assert(opendir("/proc/self/fd")); return NULL; } int main(void) { pthread_t t; pthread_create(&t, NULL, tfunc, NULL); pthread_join(t, NULL); return 0; } fails if, say, this executable is not readable and suid_dumpable = 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/proc/fd.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 0ff80f9..985ea88 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -286,7 +286,7 @@ int proc_fd_permission(struct inode *inode, int mask) int rv = generic_permission(inode, mask); if (rv == 0) return 0; - if (task_pid(current) == proc_pid(inode)) + if (task_tgid(current) == proc_pid(inode)) rv = 0; return rv; } -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 17:54 ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov @ 2013-08-26 18:09 ` Linus Torvalds 2013-08-26 19:35 ` Linus Torvalds 2013-08-26 18:32 ` Eric W. Biederman 1 sibling, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-26 18:09 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Eric W. Biederman On Mon, Aug 26, 2013 at 10:54 AM, Oleg Nesterov <oleg@redhat.com> wrote: > proc_fd_permission() says "process can still access /proc/self/fd > after it has executed a setuid()", but the "task_pid() = proc_pid() > check only helps if the task is group leader, /proc/self points to > /proc/leader-pid. Patch looks ok to me, but since this has never worked and nobody has actually complained, I can't really convince myself that this is critical. So I'm not going to apply it for 3.11, and it had better wait for the next merge window. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 18:09 ` Linus Torvalds @ 2013-08-26 19:35 ` Linus Torvalds 2013-08-26 20:20 ` Willy Tarreau ` (2 more replies) 0 siblings, 3 replies; 66+ messages in thread From: Linus Torvalds @ 2013-08-26 19:35 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Eric W. Biederman On Mon, Aug 26, 2013 at 11:09 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Patch looks ok to me, but since this has never worked and nobody has > actually complained, I can't really convince myself that this is > critical. Actually, let's back-track.. Did you try the other approach? Make /proc/self point to the thread instead of the task? The thread-group leader seems to have these extra files: - autogroup, coredump_filter, mountstats, net, task but quite frankly, at least "net" and "task" look like they should exist there - with "task" pointing back to the actual task (it would make more sense for "/proc/<pid>/task" itself to be named "threads", but whatever). Yes, it would be semantically different, but it would mean that "/proc/self/fd/" would actually make sense in a way that it currently does *not* - which would seem fairly important, since the primary use for it tends to be /dev/stdin. And the other semantic differences might be much harder to notice. Worth testing? Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 19:35 ` Linus Torvalds @ 2013-08-26 20:20 ` Willy Tarreau 2013-08-27 15:05 ` Oleg Nesterov 2013-08-27 14:39 ` [PATCH 0/1] proc: make /proc/self point to thread Oleg Nesterov [not found] ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com> 2 siblings, 1 reply; 66+ messages in thread From: Willy Tarreau @ 2013-08-26 20:20 UTC (permalink / raw) To: Linus Torvalds Cc: Oleg Nesterov, Andrew Morton, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Eric W. Biederman On Mon, Aug 26, 2013 at 12:35:26PM -0700, Linus Torvalds wrote: (...) > Yes, it would be semantically different, but it would mean that > "/proc/self/fd/" would actually make sense in a way that it currently > does *not* - which would seem fairly important, since the primary use > for it tends to be /dev/stdin. I remember another user, don't know if that has changed. UPX used to build self-extract binaries that opened /proc/self/fd/3. It might be worth checking this as well if doing important changes, because it's typically a usage that could be discovered to be broken months after the change! Willy ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 20:20 ` Willy Tarreau @ 2013-08-27 15:05 ` Oleg Nesterov 0 siblings, 0 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-27 15:05 UTC (permalink / raw) To: Willy Tarreau Cc: Linus Torvalds, Andrew Morton, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Eric W. Biederman On 08/26, Willy Tarreau wrote: > > On Mon, Aug 26, 2013 at 12:35:26PM -0700, Linus Torvalds wrote: > (...) > > Yes, it would be semantically different, but it would mean that > > "/proc/self/fd/" would actually make sense in a way that it currently > > does *not* - which would seem fairly important, since the primary use > > for it tends to be /dev/stdin. > > I remember another user, don't know if that has changed. UPX used to build > self-extract binaries that opened /proc/self/fd/3. But /proc/<tgid>/fd and /proc/<tid>/fd should be the same. Unless it plays with unshare() or clone(CLONE_THREAD /* no CLONE_FILES */). Unlike, but: > because it's typically a usage that > could be discovered to be broken months after the change! Oh yes, I agree. This change is trivial but nasty, god knows what people actually do with /proc/self. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 0/1] proc: make /proc/self point to thread 2013-08-26 19:35 ` Linus Torvalds 2013-08-26 20:20 ` Willy Tarreau @ 2013-08-27 14:39 ` Oleg Nesterov 2013-08-27 14:40 ` [PATCH 1/1] " Oleg Nesterov [not found] ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com> 2 siblings, 1 reply; 66+ messages in thread From: Oleg Nesterov @ 2013-08-27 14:39 UTC (permalink / raw) To: Linus Torvalds, Andrew Morton, Eric W. Biederman Cc: Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On 08/26, Linus Torvalds wrote: > > On Mon, Aug 26, 2013 at 11:09 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Patch looks ok to me, but since this has never worked and nobody has > > actually complained, I can't really convince myself that this is > > critical. > > Actually, let's back-track.. > > Did you try the other approach? Make /proc/self point to the thread > instead of the task? Yes, I thought about this. But I agree with Eric, we probably need another magic link, /proc/thread or whatever. And. I think that s/task_pid/task_tgid/ in proc_fd_permission() makes sense anyway. It is not only for /proc/self, why we should restrict the access to /proc/<sub-thread>/fd ? > The thread-group leader seems to have these extra files: > > - autogroup, coredump_filter, mountstats, net, task > Note really afaics. Yes, tgid_base_stuff and tid_base_stuff differ, but proc_root_lookup() uses tgid_base_stuff in any case, so /proc/<tid>/ also has task,mountstats,etc even if it is not leader. > Yes, it would be semantically different, And I am afraid this can break things. But I leave this to you and Eric. Personally I think that /proc/self pointing to "current" is better, and in fact I was surprised when I recently found that this is not true. But perhaps it is too late to change this old behaviour. > but it would mean that > "/proc/self/fd/" would actually make sense in a way that it currently > does *not* - which would seem fairly important, since the primary use > for it tends to be /dev/stdin. I think this doesn't matter "in practice", normally all threads have the same ->files. Who needs CLONE_THREAD without CLONE_FILES ? > And the other semantic differences might be much harder to notice. > Worth testing? Perhaps... Well, if Andrew takes this patch (assuming you and Eric ack it), we can see if we have any bug reports. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 1/1] proc: make /proc/self point to thread 2013-08-27 14:39 ` [PATCH 0/1] proc: make /proc/self point to thread Oleg Nesterov @ 2013-08-27 14:40 ` Oleg Nesterov 2013-08-27 16:39 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Oleg Nesterov @ 2013-08-27 14:40 UTC (permalink / raw) To: Linus Torvalds, Andrew Morton, Eric W. Biederman Cc: Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler /proc/self points to /proc/<tgid>, not to /proc/<tid>, and thus "self" actually means "group leader". Change fs/proc/self.c to print the caller's tid, this is probably what the users actually expect. Note: this is obviously semantically different and the difference is subtle, but most (all?) users should not notice this change. Lets see if it breaks something. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/proc/self.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/proc/self.c b/fs/proc/self.c index 6b6a993..7da7154 100644 --- a/fs/proc/self.c +++ b/fs/proc/self.c @@ -11,26 +11,26 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer, int buflen) { struct pid_namespace *ns = dentry->d_sb->s_fs_info; - pid_t tgid = task_tgid_nr_ns(current, ns); + pid_t tid = task_pid_nr_ns(current, ns); char tmp[PROC_NUMBUF]; - if (!tgid) + if (!tid) return -ENOENT; - sprintf(tmp, "%d", tgid); + sprintf(tmp, "%d", tid); return vfs_readlink(dentry,buffer,buflen,tmp); } static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd) { struct pid_namespace *ns = dentry->d_sb->s_fs_info; - pid_t tgid = task_tgid_nr_ns(current, ns); + pid_t tid = task_pid_nr_ns(current, ns); char *name = ERR_PTR(-ENOENT); - if (tgid) { + if (tid) { /* 11 for max length of signed int in decimal + NULL term */ name = kmalloc(12, GFP_KERNEL); if (!name) name = ERR_PTR(-ENOMEM); else - sprintf(name, "%d", tgid); + sprintf(name, "%d", tid); } nd_set_link(nd, name); return NULL; @@ -57,7 +57,7 @@ int proc_setup_self(struct super_block *s) struct inode *root_inode = s->s_root->d_inode; struct pid_namespace *ns = s->s_fs_info; struct dentry *self; - + mutex_lock(&root_inode->i_mutex); self = d_alloc_name(s->s_root, "self"); if (self) { -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 1/1] proc: make /proc/self point to thread 2013-08-27 14:40 ` [PATCH 1/1] " Oleg Nesterov @ 2013-08-27 16:39 ` Linus Torvalds 2013-08-27 17:49 ` Oleg Nesterov 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-27 16:39 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 7:40 AM, Oleg Nesterov <oleg@redhat.com> wrote: > /proc/self points to /proc/<tgid>, not to /proc/<tid>, and thus > "self" actually means "group leader". > > Change fs/proc/self.c to print the caller's tid, this is probably > what the users actually expect. Yeah, thinking more about this, this can't work. I think you'd need to use "/proc/<tgid>/task/<tid>" wouldn't you? Or perhaps even more complicated, and say "if it's a task group leader, keep the old /proc/<tgid>, otherwise do the full thread path". Anyway, does not not sound worth it. Your previous simple patch is probably the right thing to do. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 1/1] proc: make /proc/self point to thread 2013-08-27 16:39 ` Linus Torvalds @ 2013-08-27 17:49 ` Oleg Nesterov 2013-08-27 18:15 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Oleg Nesterov @ 2013-08-27 17:49 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On 08/27, Linus Torvalds wrote: > > On Tue, Aug 27, 2013 at 7:40 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > /proc/self points to /proc/<tgid>, not to /proc/<tid>, and thus > > "self" actually means "group leader". > > > > Change fs/proc/self.c to print the caller's tid, this is probably > > what the users actually expect. > > Yeah, thinking more about this, this can't work. I think you'd need to > use "/proc/<tgid>/task/<tid>" wouldn't you? Why? To me /proc/self == /proc/$((sys_gettid)) looks more natural. Say, /proc/self/task... But this is subjective. Not to mention, that would be much more visible change. > Anyway, does not not sound worth it. Your previous simple patch is > probably the right thing to do. OK, lets forget it. Although to be honest, I was seduced by "Worth testing". I mean I am just curious, who can suffer from this change? Nevermind, please ignore. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 1/1] proc: make /proc/self point to thread 2013-08-27 17:49 ` Oleg Nesterov @ 2013-08-27 18:15 ` Linus Torvalds 2013-08-27 18:28 ` Oleg Nesterov 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2013-08-27 18:15 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Tue, Aug 27, 2013 at 10:49 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > Why? To me /proc/self == /proc/$((sys_gettid)) looks more natural. > Say, /proc/self/task... But this is subjective. Actually, you're right - I incorrectly thought we had removed the thread id lookup from the top level in /proc. But we never actually did that. We only removed them from readdir. So while you won't see thread id's in the directory listing, we *do* successfully look up thread id's when specified explicitly. It's confusing, but it happens to work. So you can do ls -l /proc/<tid>/ and get the expected result, but if you do ls -l /proc | grep <tid> it won't actually show up unless the thread ID is also the thread group leader. > Although to be honest, I was seduced by "Worth testing". I mean I am > just curious, who can suffer from this change? Nevermind, please > ignore. Yeah, if we were to redesign /proc I'd do it differently, but I think we should just accept that it works "well enough" and there's just too much risk from making changes that aren't strictly required. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 1/1] proc: make /proc/self point to thread 2013-08-27 18:15 ` Linus Torvalds @ 2013-08-27 18:28 ` Oleg Nesterov 0 siblings, 0 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-27 18:28 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On 08/27, Linus Torvalds wrote: > > On Tue, Aug 27, 2013 at 10:49 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > > Although to be honest, I was seduced by "Worth testing". I mean I am > > just curious, who can suffer from this change? Nevermind, please > > ignore. > > Yeah, if we were to redesign /proc I'd do it differently, but I think > we should just accept that it works "well enough" and there's just too > much risk from making changes that aren't strictly required. Yes, agreed. To all: just in case, I will be completely offline till Sep 10, vacation. So, Eric, if you still disagree with s/task_pid/task_tgid/ change please nack that (minor) patch, we can return to this later. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com>]
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly [not found] ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com> @ 2013-08-27 15:45 ` Oleg Nesterov 0 siblings, 0 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-27 15:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, linux-kernel, Linux FS Devel, Eric W. Biederman, Al Viro, Andrew Morton, Willy Tarreau, Brad Spengler, Ingo Molnar On 08/26, Andy Lutomirski wrote: > > On Aug 26, 2013 12:35 PM, "Linus Torvalds" <torvalds@linux-foundation.org> > wrote: > > > > Yes, it would be semantically different, but it would mean that > > "/proc/self/fd/" would actually make sense in a way that it currently > > does *not* - which would seem fairly important, since the primary use > > for it tends to be /dev/stdin. > > > > And the other semantic differences might be much harder to notice. > > Worth testing? > > The "children" subdirectory will be different. But it's already screwed up. No, it lives in tid_base_stuff, it should be only visible in /proc/*/task/tid/ dir. (and perhaps it also makes sense to remove mem/exe-like things from tid_base_stuff...) Anyway I agree, this change is risky. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 17:54 ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov 2013-08-26 18:09 ` Linus Torvalds @ 2013-08-26 18:32 ` Eric W. Biederman 2013-08-26 18:46 ` Oleg Nesterov 1 sibling, 1 reply; 66+ messages in thread From: Eric W. Biederman @ 2013-08-26 18:32 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Linus Torvalds Oleg Nesterov <oleg@redhat.com> writes: > proc_fd_permission() says "process can still access /proc/self/fd > after it has executed a setuid()", but the "task_pid() = proc_pid() > check only helps if the task is group leader, /proc/self points to > /proc/leader-pid. > > Change this check to use task_tgid() so that the whole process can > access /proc/self/fd. There is at least a semantic goofiness here. There is /proc/<tgid>/fd and /proc/<tgid>/task/<pid>/fd, and the same permission check is used by both. We might just want to have a /proc/thread symlink as well so people don't have this issue. Of course that would require people to use it, and I think the common case if people care is call gettid() and build the path themselves. Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 18:32 ` Eric W. Biederman @ 2013-08-26 18:46 ` Oleg Nesterov 2013-08-26 18:56 ` Oleg Nesterov 2013-08-26 19:10 ` Eric W. Biederman 0 siblings, 2 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-26 18:46 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Linus Torvalds On 08/26, Eric W. Biederman wrote: > > Oleg Nesterov <oleg@redhat.com> writes: > > > proc_fd_permission() says "process can still access /proc/self/fd > > after it has executed a setuid()", but the "task_pid() = proc_pid() > > check only helps if the task is group leader, /proc/self points to > > /proc/leader-pid. > > > > Change this check to use task_tgid() so that the whole process can > > access /proc/self/fd. > > There is at least a semantic goofiness here. > > There is /proc/<tgid>/fd and /proc/<tgid>/task/<pid>/fd, and the same > permission check is used by both. Yes, and we have /proc/<tid>/ which includes fd as well. > We might just want to have a /proc/thread symlink as well so people > don't have this issue. Yes! I agree. In particular, from the changelog: Note: CLONE_THREAD doesn't require CLONE_FILES so task->files can differ, so /proc/self/fd doesn't necessarily mean current->files, this can confuse the application. And I also assume that you agree with this change ;) Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 18:46 ` Oleg Nesterov @ 2013-08-26 18:56 ` Oleg Nesterov 2013-08-26 19:10 ` Eric W. Biederman 1 sibling, 0 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-26 18:56 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Linus Torvalds On 08/26, Oleg Nesterov wrote: > > And I also assume that you agree with this change ;) At least I hope... Because I think this is another issue. And because with or without a /proc/thread symlink, I believe that people don't expect that /proc/self can restrict a sub-thread, and CLONE_THREAS without CLONE_FILES is exotic. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 18:46 ` Oleg Nesterov 2013-08-26 18:56 ` Oleg Nesterov @ 2013-08-26 19:10 ` Eric W. Biederman 2013-08-27 14:53 ` Oleg Nesterov 1 sibling, 1 reply; 66+ messages in thread From: Eric W. Biederman @ 2013-08-26 19:10 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Linus Torvalds Oleg Nesterov <oleg@redhat.com> writes: > On 08/26, Eric W. Biederman wrote: >> >> Oleg Nesterov <oleg@redhat.com> writes: >> >> > proc_fd_permission() says "process can still access /proc/self/fd >> > after it has executed a setuid()", but the "task_pid() = proc_pid() >> > check only helps if the task is group leader, /proc/self points to >> > /proc/leader-pid. >> > >> > Change this check to use task_tgid() so that the whole process can >> > access /proc/self/fd. >> >> There is at least a semantic goofiness here. >> >> There is /proc/<tgid>/fd and /proc/<tgid>/task/<pid>/fd, and the same >> permission check is used by both. > > Yes, and we have /proc/<tid>/ which includes fd as well. > >> We might just want to have a /proc/thread symlink as well so people >> don't have this issue. > > Yes! I agree. > > In particular, from the changelog: > > Note: CLONE_THREAD doesn't require CLONE_FILES so task->files can > differ, > > so /proc/self/fd doesn't necessarily mean current->files, this can confuse > the application. > > And I also assume that you agree with this change ;) I don't disagree. Comparing tgid to pids is goofy and my brain is elsewhere so I have no thought through the implications. Actually thinking I think the check should really be. In which case we are comparing what we really care about. int proc_fd_permission(struct inode *inode, int mask) { int rv = generic_permission(inode, mask); if (rv == 0) return 0; rcu_read_lock(); struct task *task = pid_task(proc_pid(inode)); if (task && (current->files == task->files)) rv = 0; rcu_read_unlock(); return rv; } Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] proc: make proc_fd_permission() thread-friendly 2013-08-26 19:10 ` Eric W. Biederman @ 2013-08-27 14:53 ` Oleg Nesterov 0 siblings, 0 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-27 14:53 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler, Linus Torvalds On 08/26, Eric W. Biederman wrote: > > Oleg Nesterov <oleg@redhat.com> writes: > > > > And I also assume that you agree with this change ;) > > I don't disagree. Comparing tgid to pids is goofy and my brain is > elsewhere so I have no thought through the implications. > > Actually thinking I think the check should really be. In which case we > are comparing what we really care about. > > int proc_fd_permission(struct inode *inode, int mask) > { > int rv = generic_permission(inode, mask); > if (rv == 0) > return 0; > > rcu_read_lock(); > struct task *task = pid_task(proc_pid(inode)); > if (task && (current->files == task->files)) But for what? To me, this looks like the unnecessary semantic complication. It looks as if we actually need to restrict the access to /proc/self/fd or /proc/<tgid>/fd or /proc/<subthread-tid>/fd. I do not think there is any security reason to deny this. They share ->mm, a sub-thread can do "everything" with its leader or vice versa. same_thread_group() looks more simple and natural to me. And note that __ptrace_may_access() was recently changed (in -mm) to use same_thread_group() instead of "task == current". So personally I'd prefer to not change this patch and I think it makes sense even with "make /proc/self point to thread" I sent. But. please tell me if you really dislike it. You are maintainer, I won't argue. Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 5:23 ` Al Viro 2013-08-25 6:50 ` Willy Tarreau @ 2013-08-25 18:32 ` Linus Torvalds 2013-08-25 19:11 ` Al Viro ` (2 more replies) 1 sibling, 3 replies; 66+ messages in thread From: Linus Torvalds @ 2013-08-25 18:32 UTC (permalink / raw) To: Al Viro Cc: Willy Tarreau, Oleg Nesterov, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Sat, Aug 24, 2013 at 10:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > We are really stuck with the current semantics here - switching to > *BSD one would not only mean serious surgery on descriptor handling > (it's one of the wartier areas in *BSD VFS, in large part because > of magic-open-really-a-dup kludges they have to do), it would change > a long-standing userland API that had been there for nearly 20 years > _and_ one that tends to be used in corner cases of hell knows how many > scripts. Actually, I'm pretty sure we did have the "dup" semantics at one point (long ago), and they were really nice (because you could use them to see where in the stream the fd was etc). It just fit so horribly badly into the VFS semantics that it got changed into the current "new file descriptor" one. Afaik, nothing broke. So I'm not really sure about the "we're stuck with it" for semantic reasons, and it turns out that very few programs/scripts actually use /proc/<pid>/fd/<nr> at all (random use of /dev/stdin is likely the most common case). But I agree about the "serious surgery on descriptor handling" part. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 18:32 ` /proc/pid/fd && anon_inode_fops Linus Torvalds @ 2013-08-25 19:11 ` Al Viro 2013-08-25 19:17 ` Andy Lutomirski 2013-09-03 15:58 ` Pavel Machek 2 siblings, 0 replies; 66+ messages in thread From: Al Viro @ 2013-08-25 19:11 UTC (permalink / raw) To: Linus Torvalds Cc: Willy Tarreau, Oleg Nesterov, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 11:32:45AM -0700, Linus Torvalds wrote: > On Sat, Aug 24, 2013 at 10:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > We are really stuck with the current semantics here - switching to > > *BSD one would not only mean serious surgery on descriptor handling > > (it's one of the wartier areas in *BSD VFS, in large part because > > of magic-open-really-a-dup kludges they have to do), it would change > > a long-standing userland API that had been there for nearly 20 years > > _and_ one that tends to be used in corner cases of hell knows how many > > scripts. > > Actually, I'm pretty sure we did have the "dup" semantics at one point > (long ago), and they were really nice (because you could use them to > see where in the stream the fd was etc). It just fit so horribly badly > into the VFS semantics that it got changed into the current "new file > descriptor" one. Afaik, nothing broke. > > So I'm not really sure about the "we're stuck with it" for semantic > reasons, and it turns out that very few programs/scripts actually use > /proc/<pid>/fd/<nr> at all (random use of /dev/stdin is likely the > most common case). But I agree about the "serious surgery on > descriptor handling" part. Well... We are actually in better position for that these days; right now we have very few instances of ->atomic_open(), so we could change the calling conventions for it. It returns 0 or -error and we could turn that into NULL, ERR_PTR(-error) or a reference to already opened struct file. It's not _that_ far to propagate from that point - atomic_open() <- lookup_open() <- do_last() <- path_openat(). So the amount of surgery is nowhere near the horrors we used to need (and *BSD actually does). We could try that, but I'm really afraid that semantics changes will break stuff; worse yet, that it'll happen to stuff in dusty corners of random admin scripts nobody able to debug anymore ;-/ ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 18:32 ` /proc/pid/fd && anon_inode_fops Linus Torvalds 2013-08-25 19:11 ` Al Viro @ 2013-08-25 19:17 ` Andy Lutomirski 2013-09-03 15:58 ` Pavel Machek 2 siblings, 0 replies; 66+ messages in thread From: Andy Lutomirski @ 2013-08-25 19:17 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Willy Tarreau, Oleg Nesterov, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler On Sun, Aug 25, 2013 at 11:32 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Sat, Aug 24, 2013 at 10:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> >> We are really stuck with the current semantics here - switching to >> *BSD one would not only mean serious surgery on descriptor handling >> (it's one of the wartier areas in *BSD VFS, in large part because >> of magic-open-really-a-dup kludges they have to do), it would change >> a long-standing userland API that had been there for nearly 20 years >> _and_ one that tends to be used in corner cases of hell knows how many >> scripts. > > Actually, I'm pretty sure we did have the "dup" semantics at one point > (long ago), and they were really nice (because you could use them to > see where in the stream the fd was etc). It just fit so horribly badly > into the VFS semantics that it got changed into the current "new file > descriptor" one. Afaik, nothing broke. > We have fdinfo now, which is IMO much less scary. Programs can find the stream position, but they can't change it. OTOH... > So I'm not really sure about the "we're stuck with it" for semantic > reasons, and it turns out that very few programs/scripts actually use > /proc/<pid>/fd/<nr> at all (random use of /dev/stdin is likely the > most common case). But I agree about the "serious surgery on > descriptor handling" part. .../dev/stdin doesn't actually do what you expect if input comes from something seekable. $ cat /proc/self/fd/3 test $ cat /proc/self/fd/3 test $ cat /proc/self/fd/3 test $ cat <&3 est $ cat /proc/self/fd/3 test $ cat <&3 (I'm not going to advocate for changing this.) --Andy > > Linus -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-25 18:32 ` /proc/pid/fd && anon_inode_fops Linus Torvalds 2013-08-25 19:11 ` Al Viro 2013-08-25 19:17 ` Andy Lutomirski @ 2013-09-03 15:58 ` Pavel Machek 2 siblings, 0 replies; 66+ messages in thread From: Pavel Machek @ 2013-09-03 15:58 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Willy Tarreau, Oleg Nesterov, Andy Lutomirski, security, Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel, Brad Spengler Hi! > > We are really stuck with the current semantics here - switching to > > *BSD one would not only mean serious surgery on descriptor handling > > (it's one of the wartier areas in *BSD VFS, in large part because > > of magic-open-really-a-dup kludges they have to do), it would change > > a long-standing userland API that had been there for nearly 20 years > > _and_ one that tends to be used in corner cases of hell knows how many > > scripts. > > Actually, I'm pretty sure we did have the "dup" semantics at one point > (long ago), and they were really nice (because you could use them to > see where in the stream the fd was etc). It just fit so horribly badly > into the VFS semantics that it got changed into the current "new file > descriptor" one. Afaik, nothing broke. Hmm, are you going to break my exploit? http://www.exploit-db.com/exploits/10038/ I'd like that, because I don't think /proc should allow people to bypass directory permissions. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: /proc/pid/fd && anon_inode_fops 2013-08-24 21:24 ` Willy Tarreau 2013-08-25 5:23 ` Al Viro @ 2013-08-25 15:45 ` Oleg Nesterov 1 sibling, 0 replies; 66+ messages in thread From: Oleg Nesterov @ 2013-08-25 15:45 UTC (permalink / raw) To: Willy Tarreau Cc: Andy Lutomirski, Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Al Viro, Linux FS Devel, Brad Spengler Hi Willy, On 08/24, Willy Tarreau wrote: > > On Sat, Aug 24, 2013 at 08:29:39PM +0200, Oleg Nesterov wrote: > > > > On 08/22, Willy Tarreau wrote: > > > > > > It's not only that, it also supports sockets and pipes that you can access > > > via /proc/pid/fd and not via a real symlink which would try to open eg > > > "pipe:[23456]" instead of the real file. > > > > But sock_no_open() disallows this, and for good reason I guess. > > Hmmm not exactly, it works for a pipe but not for a socket. But this is what I meant, sorry for confusion. Let me try to explain. Just in case, this has nothing to do with security and I do not see any problem, still I think there is something wrong (but harmless). Suppose that you are trying to open(/proc/pid/$pipe-or-socket-fd). nd_jump_link() sets nd->inode correctly, then dentry_open() does the rest. Everything is correct at this stage, the new file gets the correct f_inode/f_op. However, unlike fifo_open(), socket_file_ops->open() can not actually create the file/sock connection, so sock_no_open() just fails and nothing bad happens. But if you open an anon_inodefs file via proc, you get the "bogus" file. There is a single anon_inode_inode, its ->i_fop points to the empty anon_inode_fops, this has nothing to do with ->f_op of the actual file you tried to open. Nothing bad happens, still I think this is simply wrong and misleading, and thus I think it would be better to disallow this via anon_open(). Oleg. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) 2013-08-22 19:05 ` Andy Lutomirski 2013-08-22 19:23 ` Linus Torvalds @ 2013-08-22 19:39 ` Willy Tarreau 1 sibling, 0 replies; 66+ messages in thread From: Willy Tarreau @ 2013-08-22 19:39 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List, Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler On Thu, Aug 22, 2013 at 12:05:50PM -0700, Andy Lutomirski wrote: > On Thu, Aug 22, 2013 at 11:53 AM, Willy Tarreau <w@1wt.eu> wrote: > > On Thu, Aug 22, 2013 at 11:48:10AM -0700, Linus Torvalds wrote: > >> And I'm wondering if we shouldn't actually do that at "path_init" > >> time. Right now the code says: > >> > >> /* Caller must check execute permissions on the > >> starting path component */ > >> struct fd f = fdget_raw(dfd); > >> > >> and then uses the struct file mindlessly. > >> > >> I'm wondering if we should just do some validation in that place, and say: > >> > >> - for directories, we require exec permissions here > >> - for everything else, we require that f->f_cred == current->cred check. > > Does this work for the procfs case? As far as I understand it (which > isn't saying much), it goes through the symlink-following path. Indeed I checked yesterday that it seems to use proc_pid_follow_link() for fd/, cwd, root and exe, which means the same tests are used everywhere. > >> I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm > >> hacky" to me. > > > > I agreed, simply because the condition here is different from the one in /proc. > > > > I have read some code last evening to try to understand how /proc/pid/fd > > entries were granted access to various processes, because I would love to > > see the same condition being used in both places. Unfortunately, it's beyond > > my skills, and I stopped after my random attempts gave me some panics. > > What if we added another field to struct nameidata that's indicates > what restrictions need to be enforced when following magical symlinks > and then enforcing them when nd_jump_link gets used. (There are only > two of these, both in procfs.) I tried to add a test based on a mount option before this call to nd_jump_link() when I realized my attempt was a total disaster because I'm a noob. But what I think would be nice (at least as an opt-in) would be : - processes which don't share the same root should not be allowed to access files through /proc/pid/{root,cwd,exe,fd/*} - keep the current restrictions as well of course - the exact same restrictions should apply to AT_EMPTY_PATH I might be totally wrong, but as a user that's what I find natural and what I tend to expect. > Then open could check that the original fd was opened for a superset > of the requested permissions (or that the caller has > CAP_DAC_OVERRIDE), linkat could check whatever it feels like checking, > etc. Do not forget that 2 other syscalls seem to support AT_EMPTH_PATH as well if that makes a difference. > This would allow all of these issues to be fixed for real (controlled > by sysctl, presumably). If needed for backwards compatibility, possibly, though I doubt that there are apps that *rely* on the current lack of isolation between chroots. But at the same time I hate to break existing setups :-) > --Andy Willy ^ permalink raw reply [flat|nested] 66+ messages in thread
end of thread, other threads:[~2013-09-03 15:58 UTC | newest] Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-08-21 19:14 [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Andy Lutomirski [not found] ` <CA+55aFxi-ps2f2M8xPhfbuQ0pToqupPrDsLi2+GPUK2sqdYfUw@mail.gmail.com> [not found] ` <CALCETrW7+LcexA6v6RQDKhni_yJAduOmiSDneCpq3v8sPDvwUQ@mail.gmail.com> 2013-08-21 20:16 ` Willy Tarreau 2013-08-22 18:48 ` Linus Torvalds 2013-08-22 18:53 ` Willy Tarreau 2013-08-22 19:05 ` Andy Lutomirski 2013-08-22 19:23 ` Linus Torvalds 2013-08-22 20:10 ` Andy Lutomirski 2013-08-22 20:15 ` Willy Tarreau 2013-08-22 20:22 ` Andy Lutomirski 2013-08-22 20:44 ` Linus Torvalds 2013-08-22 20:48 ` Andy Lutomirski 2013-08-22 20:54 ` Linus Torvalds 2013-08-22 20:58 ` Andy Lutomirski 2013-08-23 1:07 ` Al Viro 2013-08-25 3:37 ` Al Viro 2013-08-25 7:26 ` Andy Lutomirski 2013-08-25 14:23 ` Al Viro 2013-08-25 17:04 ` Andy Lutomirski 2013-08-25 19:57 ` Linus Torvalds 2013-08-25 20:06 ` Al Viro 2013-08-25 20:23 ` Linus Torvalds 2013-08-26 17:37 ` Linus Torvalds 2013-08-26 18:07 ` Linus Torvalds 2013-08-26 18:11 ` Andy Lutomirski 2013-08-27 19:16 ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski 2013-08-27 19:32 ` Linus Torvalds 2013-08-27 20:28 ` Andy Lutomirski 2013-08-28 6:16 ` Al Viro 2013-08-28 16:24 ` Linus Torvalds 2013-08-28 19:04 ` Andy Lutomirski 2013-08-28 19:59 ` Al Viro 2013-08-28 21:07 ` Andy Lutomirski 2013-08-27 23:08 ` Al Viro 2013-08-27 23:13 ` Andy Lutomirski 2013-08-24 18:29 ` /proc/pid/fd && anon_inode_fops Oleg Nesterov 2013-08-24 21:24 ` Willy Tarreau 2013-08-25 5:23 ` Al Viro 2013-08-25 6:50 ` Willy Tarreau 2013-08-25 18:51 ` Linus Torvalds 2013-08-25 19:48 ` Oleg Nesterov 2013-08-25 20:05 ` Linus Torvalds 2013-08-26 15:33 ` Oleg Nesterov 2013-08-26 16:37 ` Oleg Nesterov 2013-08-26 17:54 ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov 2013-08-26 18:09 ` Linus Torvalds 2013-08-26 19:35 ` Linus Torvalds 2013-08-26 20:20 ` Willy Tarreau 2013-08-27 15:05 ` Oleg Nesterov 2013-08-27 14:39 ` [PATCH 0/1] proc: make /proc/self point to thread Oleg Nesterov 2013-08-27 14:40 ` [PATCH 1/1] " Oleg Nesterov 2013-08-27 16:39 ` Linus Torvalds 2013-08-27 17:49 ` Oleg Nesterov 2013-08-27 18:15 ` Linus Torvalds 2013-08-27 18:28 ` Oleg Nesterov [not found] ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com> 2013-08-27 15:45 ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov 2013-08-26 18:32 ` Eric W. Biederman 2013-08-26 18:46 ` Oleg Nesterov 2013-08-26 18:56 ` Oleg Nesterov 2013-08-26 19:10 ` Eric W. Biederman 2013-08-27 14:53 ` Oleg Nesterov 2013-08-25 18:32 ` /proc/pid/fd && anon_inode_fops Linus Torvalds 2013-08-25 19:11 ` Al Viro 2013-08-25 19:17 ` Andy Lutomirski 2013-09-03 15:58 ` Pavel Machek 2013-08-25 15:45 ` Oleg Nesterov 2013-08-22 19:39 ` [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Willy Tarreau
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).