linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
@ 2013-08-21 19:14 Andy Lutomirski
       [not found] ` <CA+55aFxi-ps2f2M8xPhfbuQ0pToqupPrDsLi2+GPUK2sqdYfUw@mail.gmail.com>
  2013-08-22 18:48 ` Linus Torvalds
  0 siblings, 2 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-21 19:14 UTC (permalink / raw)
  To: Linus Torvalds, security
  Cc: Ingo Molnar, Willy Tarreau, linux-kernel, oleg, Al Viro,
	Linux FS Devel, spender, Andy Lutomirski

There have long been two ways to ask the kernel to create a new
hardlink to the inode represented by an fd: linkat(..., AT_EMPTY_PATH)
and AT_SYMLINK_FOLLOW on /proc/self/fd/N.  The latter has no
particular security restrictions, but the former required privilege
until:

    commit bb2314b47996491bbc5add73633905c3120b6268
    Author: Andy Lutomirski <luto@amacapital.net>
    Date:   Thu Aug 1 21:44:31 2013 -0700

        fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink

Spender pointed out that there could be code that passes an fd to an
untrusted, chrooted process, and allowing that process to flink the
fd could be dangerous.  (I'm not aware of any actual known attack.)

So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH)
if the target is I_LINKABLE.  I_LINKABLE is only set for inodes
created by O_TMPFILE without O_EXCL, which is an explicit request
for flinkability.

Reported-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---

Changes from v1: Removed an unnecessary spin_lock.

 fs/namei.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 89a612e..9802d51 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3652,6 +3652,26 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 }
 
 /*
+ * flink is dangerous.  For now, only allow it when specifically requested
+ * or when the caller is privileged.
+ *
+ * NB: This is not currently enforced for AT_SYMLINK_FOLLOW on procfs.
+ *     Fixing that would be intrusive and could break something.
+ */
+static bool may_flink(const struct path *path)
+{
+	struct inode *inode = path->dentry->d_inode;
+
+	/*
+	 * This is racy: I_LINKABLE could be cleared between this check
+	 * and the actual link operation.  This is odd but not a security
+	 * problem: the caller could get the same effect by flinking once
+	 * and then using normal link(2) to create a second link.
+	 */
+	return (inode->i_state & I_LINKABLE) || capable(CAP_DAC_READ_SEARCH);
+}
+
+/*
  * Hardlinks are often used in delicate situations.  We avoid
  * security-related surprises by not following symlinks on the
  * newname.  --KAB
@@ -3670,10 +3690,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 
 	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0)
 		return -EINVAL;
-	/*
-	 * Using empty names is equivalent to using AT_SYMLINK_FOLLOW
-	 * on /proc/self/fd/<fd>.
-	 */
+
 	if (flags & AT_EMPTY_PATH)
 		how = LOOKUP_EMPTY;
 
@@ -3684,6 +3701,11 @@ retry:
 	if (error)
 		return error;
 
+	if ((flags & AT_EMPTY_PATH) && !may_flink(&old_path)) {
+		error = -EPERM;
+		goto out;
+	}
+
 	new_dentry = user_path_create(newdfd, newname, &new_path,
 					(how & LOOKUP_REVAL));
 	error = PTR_ERR(new_dentry);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
       [not found]   ` <CALCETrW7+LcexA6v6RQDKhni_yJAduOmiSDneCpq3v8sPDvwUQ@mail.gmail.com>
@ 2013-08-21 20:16     ` Willy Tarreau
  0 siblings, 0 replies; 66+ messages in thread
From: Willy Tarreau @ 2013-08-21 20:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Oleg Nesterov, Brad Spengler, Al Viro,
	Ingo Molnar, security, linux-kernel, Linux FS Devel

On Wed, Aug 21, 2013 at 12:33:24PM -0700, Andy Lutomirski wrote:
> On Wed, Aug 21, 2013 at 12:29 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On my phone, sorry. Removed lists due to HTML crud..

OK no HTML from me here, so re-adding the 3 lists if that can help reaching
an end faster on this stuff.

> > If the issue is just reopening a read-only file descriptor, then why isn't
> > this an issue for just regular openat?
> >
> > Iow, I get the feeling this isn't about flink, but about AT_EMPTY_PATH in
> > general.

That's my understanding as well since it's what replaces /proc/self/fd/ which
is he difference in this case. However AT_EMPTY_PATH is currently limited to
linkat(2), fstatat(2) and name_to_handle_at(2) from what I'm seeing, so it
looks like only linkat would permit the calling process to gain more permissions
by using it when it has no /proc access.

> Dunno, since I still haven't come up with a real-world attack here.
> Maybe someone chroots, runs something in the chroot, kills that thing,
> and then expects that it hasn't somehow retained access to the file
> (by, say, linking it).

In my understanding, it does not necessarily need to klil that thing to
get an issue. Let's imagine a completely made up example. A malware
scanner on a proxy uses sandboxes to analyse suspicious contents. For
this it holds a read-only mmap to the signature database, forks and
chroots the scanner process to avoid any risk of getting fooled by the
malicious code. If the malicious code manages to execute some code in
the context of the sandboxed scanner that still has access to the
signature db, it could use linkat(AT_EMPTH_PATH) to this fd to recreate
a file in the chroot, and reopen it R/W to modify it and remove any trace
of its signature.

Sure it's a bit far-fetched, but I think that between this and the most
trivial case, there might be a few real world cases. Also I still feel
concerned by the ability for an unprivileged process to create device
nodes in a chroot using this. I don't have any immediate impact in mind
but I tend to design my chroots so that I'm sure there is no way to get
a /dev there, and this would leave me a bit of doubt.

> It may pay to consider some (opt-in) hardening of procfs's follow_link.

I agree. A mount option would be great for some constrained environments.

> Sometimes I wish that hardlinks had never been invented and that
> people had gone straight to cow links.
>
> --Andy

:-)

Willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-21 19:14 [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Andy Lutomirski
       [not found] ` <CA+55aFxi-ps2f2M8xPhfbuQ0pToqupPrDsLi2+GPUK2sqdYfUw@mail.gmail.com>
@ 2013-08-22 18:48 ` Linus Torvalds
  2013-08-22 18:53   ` Willy Tarreau
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-22 18:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: security, Ingo Molnar, Willy Tarreau, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Wed, Aug 21, 2013 at 12:14 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH)
> if the target is I_LINKABLE.

So I really detest this just because it's such a special case. Now
this is only useful for that one special case, and the thing very
fundamentally checks that one special case in a place that is
impossible to check for the /proc case, so the proc case remains
totally separate.

Which just bothers me.

I think we could easily at least allow the "file->f_creds ==
current->creds" case (and yes, I literally mean comparing the pointers
- not only is it cheaper, but it literally means "nothing odd has
happened in between opening and the lookup").

And I'm wondering if we shouldn't actually do that at "path_init"
time. Right now the code says:

                /* Caller must check execute permissions on the
starting path component */
                struct fd f = fdget_raw(dfd);

and then uses the struct file mindlessly.

I'm wondering if we should just do some validation in that place, and say:

 - for directories, we require exec permissions here
 - for everything else, we require that f->f_cred == current->cred check.

I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm
hacky" to me.

               Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 18:48 ` Linus Torvalds
@ 2013-08-22 18:53   ` Willy Tarreau
  2013-08-22 19:05     ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Willy Tarreau @ 2013-08-22 18:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Al Viro,
	Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 11:48:10AM -0700, Linus Torvalds wrote:
> On Wed, Aug 21, 2013 at 12:14 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH)
> > if the target is I_LINKABLE.
> 
> So I really detest this just because it's such a special case. Now
> this is only useful for that one special case, and the thing very
> fundamentally checks that one special case in a place that is
> impossible to check for the /proc case, so the proc case remains
> totally separate.
> 
> Which just bothers me.
> 
> I think we could easily at least allow the "file->f_creds ==
> current->creds" case (and yes, I literally mean comparing the pointers
> - not only is it cheaper, but it literally means "nothing odd has
> happened in between opening and the lookup").
> 
> And I'm wondering if we shouldn't actually do that at "path_init"
> time. Right now the code says:
> 
>                 /* Caller must check execute permissions on the
> starting path component */
>                 struct fd f = fdget_raw(dfd);
> 
> and then uses the struct file mindlessly.
> 
> I'm wondering if we should just do some validation in that place, and say:
> 
>  - for directories, we require exec permissions here
>  - for everything else, we require that f->f_cred == current->cred check.
> 
> I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm
> hacky" to me.

I agreed, simply because the condition here is different from the one in /proc.

I have read some code last evening to try to understand how /proc/pid/fd
entries were granted access to various processes, because I would love to
see the same condition being used in both places. Unfortunately, it's beyond
my skills, and I stopped after my random attempts gave me some panics.

Willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 18:53   ` Willy Tarreau
@ 2013-08-22 19:05     ` Andy Lutomirski
  2013-08-22 19:23       ` Linus Torvalds
  2013-08-22 19:39       ` [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Willy Tarreau
  0 siblings, 2 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-22 19:05 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 11:53 AM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Aug 22, 2013 at 11:48:10AM -0700, Linus Torvalds wrote:
>> On Wed, Aug 21, 2013 at 12:14 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> >
>> > So let's be careful for now: only allow linkat(..., AT_EMPTY_PATH)
>> > if the target is I_LINKABLE.
>>
>> So I really detest this just because it's such a special case. Now
>> this is only useful for that one special case, and the thing very
>> fundamentally checks that one special case in a place that is
>> impossible to check for the /proc case, so the proc case remains
>> totally separate.
>>
>> Which just bothers me.

Agreed.  It sucks.  See below (the bottom).

>>
>> I think we could easily at least allow the "file->f_creds ==
>> current->creds" case (and yes, I literally mean comparing the pointers
>> - not only is it cheaper, but it literally means "nothing odd has
>> happened in between opening and the lookup").

I thought about that, but it won't catch chroot.  (Admittedly, chroot
without changing creds is asking for trouble.)

>>
>> And I'm wondering if we shouldn't actually do that at "path_init"
>> time. Right now the code says:
>>
>>                 /* Caller must check execute permissions on the
>> starting path component */
>>                 struct fd f = fdget_raw(dfd);
>>
>> and then uses the struct file mindlessly.
>>
>> I'm wondering if we should just do some validation in that place, and say:
>>
>>  - for directories, we require exec permissions here
>>  - for everything else, we require that f->f_cred == current->cred check.

Does this work for the procfs case?  As far as I understand it (which
isn't saying much), it goes through the symlink-following path.

>>
>> I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm
>> hacky" to me.
>
> I agreed, simply because the condition here is different from the one in /proc.
>
> I have read some code last evening to try to understand how /proc/pid/fd
> entries were granted access to various processes, because I would love to
> see the same condition being used in both places. Unfortunately, it's beyond
> my skills, and I stopped after my random attempts gave me some panics.

What if we added another field to struct nameidata that's indicates
what restrictions need to be enforced when following magical symlinks
and then enforcing them when nd_jump_link gets used.  (There are only
two of these, both in procfs.)

Then open could check that the original fd was opened for a superset
of the requested permissions (or that the caller has
CAP_DAC_OVERRIDE), linkat could check whatever it feels like checking,
etc.

This would allow all of these issues to be fixed for real (controlled
by sysctl, presumably).

--Andy

>
> Willy
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 19:05     ` Andy Lutomirski
@ 2013-08-22 19:23       ` Linus Torvalds
  2013-08-22 20:10         ` Andy Lutomirski
  2013-08-22 19:39       ` [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Willy Tarreau
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-22 19:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 12:05 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Does this work for the procfs case?  As far as I understand it (which
> isn't saying much), it goes through the symlink-following path.

Right. The /proc case is still separate, and we really should do
something about that too. But again, I don't think I_LINKABLE is the
thing to use there either. We probably should tighten up the magic
/proc follow-link a lot.

> What if we added another field to struct nameidata that's indicates
> what restrictions need to be enforced when following magical symlinks
> and then enforcing them when nd_jump_link gets used.  (There are only
> two of these, both in procfs.)

Yes, I think that might be just the kind of thing to do. Except some
tightening could well be quite regardless of any extra flags.

            Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 19:05     ` Andy Lutomirski
  2013-08-22 19:23       ` Linus Torvalds
@ 2013-08-22 19:39       ` Willy Tarreau
  1 sibling, 0 replies; 66+ messages in thread
From: Willy Tarreau @ 2013-08-22 19:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 12:05:50PM -0700, Andy Lutomirski wrote:
> On Thu, Aug 22, 2013 at 11:53 AM, Willy Tarreau <w@1wt.eu> wrote:
> > On Thu, Aug 22, 2013 at 11:48:10AM -0700, Linus Torvalds wrote:
> >> And I'm wondering if we shouldn't actually do that at "path_init"
> >> time. Right now the code says:
> >>
> >>                 /* Caller must check execute permissions on the
> >> starting path component */
> >>                 struct fd f = fdget_raw(dfd);
> >>
> >> and then uses the struct file mindlessly.
> >>
> >> I'm wondering if we should just do some validation in that place, and say:
> >>
> >>  - for directories, we require exec permissions here
> >>  - for everything else, we require that f->f_cred == current->cred check.
> 
> Does this work for the procfs case?  As far as I understand it (which
> isn't saying much), it goes through the symlink-following path.

Indeed I checked yesterday that it seems to use proc_pid_follow_link() for
fd/, cwd, root and exe, which means the same tests are used everywhere.

> >> I dunno. But that I_LINKABLE thing just bothers me. It screams "I'm
> >> hacky" to me.
> >
> > I agreed, simply because the condition here is different from the one in /proc.
> >
> > I have read some code last evening to try to understand how /proc/pid/fd
> > entries were granted access to various processes, because I would love to
> > see the same condition being used in both places. Unfortunately, it's beyond
> > my skills, and I stopped after my random attempts gave me some panics.
> 
> What if we added another field to struct nameidata that's indicates
> what restrictions need to be enforced when following magical symlinks
> and then enforcing them when nd_jump_link gets used.  (There are only
> two of these, both in procfs.)

I tried to add a test based on a mount option before this call to
nd_jump_link() when I realized my attempt was a total disaster because
I'm a noob. But what I think would be nice (at least as an opt-in) would
be :

  - processes which don't share the same root should not be allowed to
    access files through /proc/pid/{root,cwd,exe,fd/*}

  - keep the current restrictions as well of course

  - the exact same restrictions should apply to AT_EMPTY_PATH

I might be totally wrong, but as a user that's what I find natural and
what I tend to expect.

> Then open could check that the original fd was opened for a superset
> of the requested permissions (or that the caller has
> CAP_DAC_OVERRIDE), linkat could check whatever it feels like checking,
> etc.

Do not forget that 2 other syscalls seem to support AT_EMPTH_PATH as
well if that makes a difference.

> This would allow all of these issues to be fixed for real (controlled
> by sysctl, presumably).

If needed for backwards compatibility, possibly, though I doubt that
there are apps that *rely* on the current lack of isolation between
chroots. But at the same time I hate to break existing setups :-)

> --Andy

Willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 19:23       ` Linus Torvalds
@ 2013-08-22 20:10         ` Andy Lutomirski
  2013-08-22 20:15           ` Willy Tarreau
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-22 20:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 12:23 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 22, 2013 at 12:05 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Does this work for the procfs case?  As far as I understand it (which
>> isn't saying much), it goes through the symlink-following path.
>
> Right. The /proc case is still separate, and we really should do
> something about that too. But again, I don't think I_LINKABLE is the
> thing to use there either. We probably should tighten up the magic
> /proc follow-link a lot.
>
>> What if we added another field to struct nameidata that's indicates
>> what restrictions need to be enforced when following magical symlinks
>> and then enforcing them when nd_jump_link gets used.  (There are only
>> two of these, both in procfs.)
>
> Yes, I think that might be just the kind of thing to do. Except some
> tightening could well be quite regardless of any extra flags.
>

What's the point of nd_jump_link anyway?  The only way I can think of
for a magic symlink in /proc to point to another symlink is to open a
symlink with O_PATH | O_NOFOLLOW.  Actually trying to use the
resulting link in /proc results in -ELOOP.  (Even just trying to open
a normal symlink with O_NOFOLLOW and without O_PATH results in
-ELOOP.)

It might be a lot simpler to rig up nd_jump_link to immediately
terminate lookup and let the callers (or the outer level of lookup)
deal with it.  That way the checks could be (more) easily unified with
AT_EMPTY_PATH.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 20:10         ` Andy Lutomirski
@ 2013-08-22 20:15           ` Willy Tarreau
  2013-08-22 20:22             ` Andy Lutomirski
  2013-08-24 18:29             ` /proc/pid/fd && anon_inode_fops Oleg Nesterov
  0 siblings, 2 replies; 66+ messages in thread
From: Willy Tarreau @ 2013-08-22 20:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 01:10:43PM -0700, Andy Lutomirski wrote:
> What's the point of nd_jump_link anyway?  The only way I can think of
> for a magic symlink in /proc to point to another symlink is to open a
> symlink with O_PATH | O_NOFOLLOW.  Actually trying to use the
> resulting link in /proc results in -ELOOP.  (Even just trying to open
> a normal symlink with O_NOFOLLOW and without O_PATH results in
> -ELOOP.)

It's not only that, it also supports sockets and pipes that you can access
via /proc/pid/fd and not via a real symlink which would try to open eg
"pipe:[23456]" instead of the real file. So you can't get rid of it
without breaking existing apps (starting with your shell for which
/dev/stdin is a link to /proc/self/fd/0 for example).

Willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 20:15           ` Willy Tarreau
@ 2013-08-22 20:22             ` Andy Lutomirski
  2013-08-22 20:44               ` Linus Torvalds
  2013-08-24 18:29             ` /proc/pid/fd && anon_inode_fops Oleg Nesterov
  1 sibling, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-22 20:22 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 1:15 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Aug 22, 2013 at 01:10:43PM -0700, Andy Lutomirski wrote:
>> What's the point of nd_jump_link anyway?  The only way I can think of
>> for a magic symlink in /proc to point to another symlink is to open a
>> symlink with O_PATH | O_NOFOLLOW.  Actually trying to use the
>> resulting link in /proc results in -ELOOP.  (Even just trying to open
>> a normal symlink with O_NOFOLLOW and without O_PATH results in
>> -ELOOP.)
>
> It's not only that, it also supports sockets and pipes that you can access
> via /proc/pid/fd and not via a real symlink which would try to open eg
> "pipe:[23456]" instead of the real file. So you can't get rid of it
> without breaking existing apps (starting with your shell for which
> /dev/stdin is a link to /proc/self/fd/0 for example).
>

Let me rephrase that: why do we allow these types of lookup to recurse
like normal symlinks?  I'm proposing that these links immediately
terminate lookup and return back to user_path_at_empty, which can, in
turn, do an extra check to see if the inode that was found is one of
these magic inodes (or checks to see if nd_jump_link was called).
user_path_at_empty could then enforce LOOKUP_EMPTY-list restrictions
(or the caller could).

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 20:22             ` Andy Lutomirski
@ 2013-08-22 20:44               ` Linus Torvalds
  2013-08-22 20:48                 ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-22 20:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 1:22 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Let me rephrase that: why do we allow these types of lookup to recurse
> like normal symlinks?  I'm proposing that these links immediately
> terminate lookup  [..]

It can nest *inside* a regular symlink. So there should not be any
recursion of pure /proc style symlink jumps, but they live within the
nesting that is normal symlink behavior.

            Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 20:44               ` Linus Torvalds
@ 2013-08-22 20:48                 ` Andy Lutomirski
  2013-08-22 20:54                   ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-22 20:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 1:44 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 22, 2013 at 1:22 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Let me rephrase that: why do we allow these types of lookup to recurse
>> like normal symlinks?  I'm proposing that these links immediately
>> terminate lookup  [..]
>
> It can nest *inside* a regular symlink. So there should not be any
> recursion of pure /proc style symlink jumps, but they live within the
> nesting that is normal symlink behavior.
>

Sure.  But aren't they always last?

With the current code structure, trying to enforce some kind of
security restriction in the middle of lookup seems really unpleasant.
But if the final step is to verify one of these links and then either
reject or end lookup right there, it'll be (IMO) much easier to
understand.

--Andy


>             Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 20:48                 ` Andy Lutomirski
@ 2013-08-22 20:54                   ` Linus Torvalds
  2013-08-22 20:58                     ` Andy Lutomirski
  2013-08-23  1:07                     ` Al Viro
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2013-08-22 20:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Sure.  But aren't they always last?

What do you mean? I'd say that the /proc lookup is always *innermost*.
Which means that it certainly cannot bail out, since there are many
levels of nesting outside of it.

> With the current code structure, trying to enforce some kind of
> security restriction in the middle of lookup seems really unpleasant.

If it's conditional (ie "linkat behaves differently from openat"), it
certainly means that we'd have to pass in that info in annoying ways.

But having unconditional rules like "you can only follow the proc link
if you have CAP_SEARCH _or_ you match the file->f_cred" doesn't sound
too painful.

              Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 20:54                   ` Linus Torvalds
@ 2013-08-22 20:58                     ` Andy Lutomirski
  2013-08-23  1:07                     ` Al Viro
  1 sibling, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-22 20:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, security, Ingo Molnar, Linux Kernel Mailing List,
	Oleg Nesterov, Al Viro, Linux FS Devel, Brad Spengler

On Thu, Aug 22, 2013 at 1:54 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Sure.  But aren't they always last?
>
> What do you mean? I'd say that the /proc lookup is always *innermost*.
> Which means that it certainly cannot bail out, since there are many
> levels of nesting outside of it.

I was thinking iteratively, so I said "last" instead of "innermost".

>
>> With the current code structure, trying to enforce some kind of
>> security restriction in the middle of lookup seems really unpleasant.
>
> If it's conditional (ie "linkat behaves differently from openat"), it
> certainly means that we'd have to pass in that info in annoying ways.
>

Hmm.  That depends on whether things like I_LINKABLE should be
considered.  We might also want to ban open("/proc/self/fd/3", O_RDWR)
if !CAP_DAC_READ_SEARCH and fd wasn't opened with O_RDWR.  Both of
those will require passing information in or out.

I'll see how nasty this ends up being.  (This may take awhile -- I'm
not at all familiar with this code, and this is at best a minor side
project for me.)

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-22 20:54                   ` Linus Torvalds
  2013-08-22 20:58                     ` Andy Lutomirski
@ 2013-08-23  1:07                     ` Al Viro
  2013-08-25  3:37                       ` Al Viro
  1 sibling, 1 reply; 66+ messages in thread
From: Al Viro @ 2013-08-23  1:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Thu, Aug 22, 2013 at 01:54:15PM -0700, Linus Torvalds wrote:
> On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > Sure.  But aren't they always last?
> 
> What do you mean? I'd say that the /proc lookup is always *innermost*.
> Which means that it certainly cannot bail out, since there are many
> levels of nesting outside of it.
> 
> > With the current code structure, trying to enforce some kind of
> > security restriction in the middle of lookup seems really unpleasant.
> 
> If it's conditional (ie "linkat behaves differently from openat"), it
> certainly means that we'd have to pass in that info in annoying ways.

Nope.  All we need to pass is one more LOOKUP_...  Add
	if (unlikely(nd->last_type == LAST_BIND)) {
		if ((nd->flags & LOOKUP_BLAH) && !may_flink(...)) {
			terminate_walk(nd);
			return -EINVAL;
		}
	}
in the beginning of lookup_last() and pass LOOKUP_BLAH in flags when
linkat() calls user_path_at().  That will affect *only* the terminal
symlinks and cost nothing in all normal cases.  The same check can
bloody well go into path_init() - take
                if (*name) {
                        if (!can_lookup(dentry->d_inode)) {
                                fdput(f);
                                return -ENOTDIR;
                        }
                }
in there and slap
		else {
			if ((flags & LOOKUP_BLAH) && !may_flink(...)) {
                                fdput(f);
				return -EINVAL;
			}
		}
after it.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* /proc/pid/fd && anon_inode_fops
  2013-08-22 20:15           ` Willy Tarreau
  2013-08-22 20:22             ` Andy Lutomirski
@ 2013-08-24 18:29             ` Oleg Nesterov
  2013-08-24 21:24               ` Willy Tarreau
  1 sibling, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-24 18:29 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andy Lutomirski, Linus Torvalds, security, Ingo Molnar,
	Linux Kernel Mailing List, Al Viro, Linux FS Devel,
	Brad Spengler

Sorry for off-topic, I am just curios.

On 08/22, Willy Tarreau wrote:
>
> It's not only that, it also supports sockets and pipes that you can access
> via /proc/pid/fd and not via a real symlink which would try to open eg
> "pipe:[23456]" instead of the real file.

But sock_no_open() disallows this, and for good reason I guess.

I am wondering, perhaps anon_inode should do the same? I do not
see any problem, but it looks pointless and misleading to allow
to open a file you can do nothing with.

Or is there any reason why, say, open("anon_inode:[perf_event]")
should succeed?

Thanks,

Oleg.

--- x/fs/anon_inodes.c
+++ x/fs/anon_inodes.c
@@ -24,7 +24,15 @@
 
 static struct vfsmount *anon_inode_mnt __read_mostly;
 static struct inode *anon_inode_inode;
-static const struct file_operations anon_inode_fops;
+
+static int anon_open(struct inode *inode, struct file *file)
+{
+	return -ENXIO;
+}
+
+static const struct file_operations anon_inode_fops = {
+	.open = anon_open,
+};
 
 /*
  * anon_inodefs_dname() is called from d_path().

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-24 18:29             ` /proc/pid/fd && anon_inode_fops Oleg Nesterov
@ 2013-08-24 21:24               ` Willy Tarreau
  2013-08-25  5:23                 ` Al Viro
  2013-08-25 15:45                 ` Oleg Nesterov
  0 siblings, 2 replies; 66+ messages in thread
From: Willy Tarreau @ 2013-08-24 21:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Linus Torvalds, security, Ingo Molnar,
	Linux Kernel Mailing List, Al Viro, Linux FS Devel,
	Brad Spengler

Hi Oleg,

On Sat, Aug 24, 2013 at 08:29:39PM +0200, Oleg Nesterov wrote:
> Sorry for off-topic, I am just curios.
> 
> On 08/22, Willy Tarreau wrote:
> >
> > It's not only that, it also supports sockets and pipes that you can access
> > via /proc/pid/fd and not via a real symlink which would try to open eg
> > "pipe:[23456]" instead of the real file.
> 
> But sock_no_open() disallows this, and for good reason I guess.

Hmmm not exactly, it works for a pipe but not for a socket. It even
breaks /dev/stdin (/proc/self/fd/0) for processes running attached
to a socket (eg: shell script) :

sh-3.1$ ls -la /proc/$$/fd/
total 0
dr-x------ 2 willy users  0 Aug 24 23:03 .
dr-xr-xr-x 6 willy users  0 Aug 24 23:03 ..
lrwx------ 1 willy users 64 Aug 24 23:03 0 -> socket:[1247293]
lrwx------ 1 willy users 64 Aug 24 23:03 1 -> socket:[1247293]
lrwx------ 1 willy users 64 Aug 24 23:03 2 -> socket:[1247293]
lrwx------ 1 willy users 64 Aug 24 23:03 255 -> socket:[1247293]
sh-3.1$ read < /dev/stdin
sh: /dev/stdin: No such device or address
sh-3.1$ read < /dev/fd/0
sh: /dev/fd/0: No such device or address

But :

window 1:
  willy@pcw:~$ ssh 0 sh -i
  sh-3.1$ echo $$
  9832
  sh-3.1$ ls -la /proc/$$/fd/
  total 0
  dr-x------ 2 willy users  0 Aug 24 23:16 .
  dr-xr-xr-x 6 willy users  0 Aug 24 23:16 ..
  lr-x------ 1 willy users 64 Aug 24 23:16 0 -> pipe:[1247914]
  l-wx------ 1 willy users 64 Aug 24 23:16 1 -> pipe:[1247915]
  l-wx------ 1 willy users 64 Aug 24 23:16 2 -> pipe:[1247916]
  l-wx------ 1 willy users 64 Aug 24 23:17 255 -> pipe:[1247916]
  sh-3.1$ 

window 2:
  willy@pcw:~$ echo hello > /proc/9832/fd/1
  willy@pcw:~$ echo whoami > /proc/9832/fd/0

window 1:
  sh-3.1$ hello
  willy
  sh-3.1$ 

> I am wondering, perhaps anon_inode should do the same? I do not
> see any problem, but it looks pointless and misleading to allow
> to open a file you can do nothing with.

I don't know, I'm often a bit confused by entries in /proc/pid/fd
because I don't always know which ones might safely be used or not.

> Or is there any reason why, say, open("anon_inode:[perf_event]")
> should succeed?

I doubt it. It seems to me that most such entries are implemented
for completeness while most valid uses only concern /proc/self/fd.
Maybe if we had an option so that only /proc/self/fd would actually
allow to access the fds while all /proc/pid/fd would only show what
they map to, it would be a good step forward.

> Thanks,
> 
> Oleg.
> 
> --- x/fs/anon_inodes.c
> +++ x/fs/anon_inodes.c
> @@ -24,7 +24,15 @@
>  
>  static struct vfsmount *anon_inode_mnt __read_mostly;
>  static struct inode *anon_inode_inode;
> -static const struct file_operations anon_inode_fops;
> +
> +static int anon_open(struct inode *inode, struct file *file)
> +{
> +	return -ENXIO;
> +}
> +
> +static const struct file_operations anon_inode_fops = {
> +	.open = anon_open,
> +};
>  
>  /*
>   * anon_inodefs_dname() is called from d_path().

regards,
willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-23  1:07                     ` Al Viro
@ 2013-08-25  3:37                       ` Al Viro
  2013-08-25  7:26                         ` Andy Lutomirski
  2013-08-25 19:57                         ` Linus Torvalds
  0 siblings, 2 replies; 66+ messages in thread
From: Al Viro @ 2013-08-25  3:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Fri, Aug 23, 2013 at 02:07:26AM +0100, Al Viro wrote:
> On Thu, Aug 22, 2013 at 01:54:15PM -0700, Linus Torvalds wrote:
> > On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > >
> > > Sure.  But aren't they always last?
> > 
> > What do you mean? I'd say that the /proc lookup is always *innermost*.
> > Which means that it certainly cannot bail out, since there are many
> > levels of nesting outside of it.
> > 
> > > With the current code structure, trying to enforce some kind of
> > > security restriction in the middle of lookup seems really unpleasant.
> > 
> > If it's conditional (ie "linkat behaves differently from openat"), it
> > certainly means that we'd have to pass in that info in annoying ways.
> 
> Nope.  All we need to pass is one more LOOKUP_...  Add
> 	if (unlikely(nd->last_type == LAST_BIND)) {
> 		if ((nd->flags & LOOKUP_BLAH) && !may_flink(...)) {
> 			terminate_walk(nd);
> 			return -EINVAL;
> 		}
> 	}
> in the beginning of lookup_last() and pass LOOKUP_BLAH in flags when
> linkat() calls user_path_at().  That will affect *only* the terminal
> symlinks and cost nothing in all normal cases.  The same check can
> bloody well go into path_init() - take
>                 if (*name) {
>                         if (!can_lookup(dentry->d_inode)) {
>                                 fdput(f);
>                                 return -ENOTDIR;
>                         }
>                 }
> in there and slap
> 		else {
> 			if ((flags & LOOKUP_BLAH) && !may_flink(...)) {
>                                 fdput(f);
> 				return -EINVAL;
> 			}
> 		}
> after it.

OK, let me summarize these threads so far:
	* restrictions for flink() are needed and they'd better be
consistent for AT_SYMLINK_FOLLOW + /proc/<pid>/fd/<n> and simply
passing the descriptor as dfd.
	* CAP_DAC_OVERRIDE is sufficient; so should be O_TMPFILE used
to open that sucker.
	* lookup_last() is the natural place for catching the case
of following a trailing procfs symlink - it can be done very cheaply
there.

FWIW, I'm tempted to try the following trick:
	* introduce FMODE_FLINK in file->f_mode; O_TMPFILE would set it,
unless O_EXCL is present.
	* introduce LOOKUP_LINK, to be passed by sys_linkat() when
resolving the target.
	* have path_init() called with empty pathname and LOOKUP_LINK in
flags do checks for FMODE_FLINK or CAP_DAC_OVERRIDE
	* have ->proc_get_link() report whether the target is linkable
(either as bool * or by returning 1 instead of 0).  After the call of
->proc_get_link() check that and set nd->last_type to LAST_BIND_LINKABLE.
Note that *all* places looking at ->last_type treat LAST_BIND as "none
of the above" - we never compare with it, so splitting it in two wouldn't
break anything.
	* have lookup_last() check if LOOKUP_LINK is present and ->last_type
is LAST_BIND; fail unless we have CAP_DAC_OVERRIDE.

AFAICS, it gets more or less sane behaviour; additionally, it makes possible
to introduce explicit "I want that descriptor to be suitable for flink()"
open(2) flag - that would require teaching do_last() about LOOKUP_LINK,
making it check for CAP_DAC_OVERRIDE if it sees LAST_BIND / LOOKUP_LINK,
same as lookup_last() above (we obviously want to avoid the possibility
to take a non-flinkable descriptor and use it to reopen the sucker in
flinkable way).

Alternatively we can revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH)
aka flink" for the time being.  flink() is certainly an awful mess and I
seriously regret touching it ;-/

Comments?  Hell, maybe somebody even has printable ones - stranger things
have happened...

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-24 21:24               ` Willy Tarreau
@ 2013-08-25  5:23                 ` Al Viro
  2013-08-25  6:50                   ` Willy Tarreau
  2013-08-25 18:32                   ` /proc/pid/fd && anon_inode_fops Linus Torvalds
  2013-08-25 15:45                 ` Oleg Nesterov
  1 sibling, 2 replies; 66+ messages in thread
From: Al Viro @ 2013-08-25  5:23 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Oleg Nesterov, Andy Lutomirski, Linus Torvalds, security,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler

On Sat, Aug 24, 2013 at 11:24:32PM +0200, Willy Tarreau wrote:

> I doubt it. It seems to me that most such entries are implemented
> for completeness while most valid uses only concern /proc/self/fd.
> Maybe if we had an option so that only /proc/self/fd would actually
> allow to access the fds while all /proc/pid/fd would only show what
> they map to, it would be a good step forward.

How?  The fundamental problem is not visibility of that stuff, it's
new opened file for the same object (Linux behaviour) vs. new descriptor
refering to the same opened file (*BSD and friends).  We can't get
anon_... sanely reopened in the former semantics and they are very
visibly different for regular files, so switching to *BSD one is not
feasible - too high odds of userland breakage.  The difference in
semantics, of course, is that on Linux opening /dev/stdin gives you
a descriptor with independent current IO position; on *BSD you get
a descriptor sharing the current IO position with stdin.  IOW, it's
independent open() of the same file vs. dup().

We are really stuck with the current semantics here - switching to
*BSD one would not only mean serious surgery on descriptor handling
(it's one of the wartier areas in *BSD VFS, in large part because
of magic-open-really-a-dup kludges they have to do), it would change
a long-standing userland API that had been there for nearly 20 years
_and_ one that tends to be used in corner cases of hell knows how many
scripts.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25  5:23                 ` Al Viro
@ 2013-08-25  6:50                   ` Willy Tarreau
  2013-08-25 18:51                     ` Linus Torvalds
  2013-08-25 18:32                   ` /proc/pid/fd && anon_inode_fops Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Willy Tarreau @ 2013-08-25  6:50 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Nesterov, Andy Lutomirski, Linus Torvalds, security,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler

On Sun, Aug 25, 2013 at 06:23:17AM +0100, Al Viro wrote:
> On Sat, Aug 24, 2013 at 11:24:32PM +0200, Willy Tarreau wrote:
> 
> > I doubt it. It seems to me that most such entries are implemented
> > for completeness while most valid uses only concern /proc/self/fd.
> > Maybe if we had an option so that only /proc/self/fd would actually
> > allow to access the fds while all /proc/pid/fd would only show what
> > they map to, it would be a good step forward.
> 
> How?  The fundamental problem is not visibility of that stuff, it's
> new opened file for the same object (Linux behaviour) vs. new descriptor
> refering to the same opened file (*BSD and friends).  We can't get
> anon_... sanely reopened in the former semantics and they are very
> visibly different for regular files, so switching to *BSD one is not
> feasible - too high odds of userland breakage.  The difference in
> semantics, of course, is that on Linux opening /dev/stdin gives you
> a descriptor with independent current IO position; on *BSD you get
> a descriptor sharing the current IO position with stdin.  IOW, it's
> independent open() of the same file vs. dup().
> 
> We are really stuck with the current semantics here - switching to
> *BSD one would not only mean serious surgery on descriptor handling
> (it's one of the wartier areas in *BSD VFS, in large part because
> of magic-open-really-a-dup kludges they have to do), it would change
> a long-standing userland API that had been there for nearly 20 years
> _and_ one that tends to be used in corner cases of hell knows how many
> scripts.

Thanks for explaining Al, that really helps me understand. However
there's still a difference between /proc/pid called from the process
itself (=/proc/self) and called from other processes that seems to
suit the situation :

  willy@eeepc:~$ ls -la /tmp/bash 
  -r-x--x--x 1 root users 916852 2013-08-25 08:19 /tmp/bash*
  willy@eeepc:~$ exec /tmp/bash -i
  willy@eeepc:~$ echo $$
  22678
  willy@eeepc:~$ ls -la /proc/22678/fd
  ls: cannot open directory /proc/22678/fd: Permission denied
  willy@eeepc:~$ ls -la /proc/22678/exe 
  ls: cannot read symbolic link /proc/22678/exe: Permission denied
  willy@eeepc:~$ cat /proc/22678/fd/0 
  cat: /proc/22678/fd/0: Permission denied

but :
  willy@eeepc:~$ read < /proc/22678/fd/0 
  azerazerazer
  willy@eeepc:~$ echo $REPLY
  azerazerazer

strace clearly shows that the process was allowed to inspect itself
and the other ones were not :

  willy@eeepc:~$ strace -p 22678
  open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = 3

  willy@eeepc:~$ strace cat /proc/22678/fd/0 
  open("/proc/22678/fd/0", O_RDONLY|O_LARGEFILE) = -1 EACCES (Permission denied)

It looks like this difference was introduced by this patch (which also fixes
this issue we've been having for a very long time on 2.4 and early 2.6) :

    8948e11 Allow access to /proc/$PID/fd after setuid()

Thus I'm wondering if something like this could help, the idea would be
that a with the appropriate mount option, a task could only look at its
own descriptors unless it's running with privileges :

    static int proc_fd_permission(struct inode *inode, int mask,
                                  struct nameidata *nd)
    {
       if (task_pid(current) == proc_pid(inode))
             return 0;
       if (capable(CAP_DAC_OVERRIDE))
             return 0;
       if (proc_mounted_with_strict_option)
             return -EACCES;
       return generic_permission(inode, mask, NULL);
    }

Thus it would not change the default behaviour except for people who would
mount /proc with a special option.

Thanks,
Willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-25  3:37                       ` Al Viro
@ 2013-08-25  7:26                         ` Andy Lutomirski
  2013-08-25 14:23                           ` Al Viro
  2013-08-25 19:57                         ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-25  7:26 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Sat, Aug 24, 2013 at 8:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Aug 23, 2013 at 02:07:26AM +0100, Al Viro wrote:
>> On Thu, Aug 22, 2013 at 01:54:15PM -0700, Linus Torvalds wrote:
>> > On Thu, Aug 22, 2013 at 1:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > >
>> > > Sure.  But aren't they always last?
>> >
>> > What do you mean? I'd say that the /proc lookup is always *innermost*.
>> > Which means that it certainly cannot bail out, since there are many
>> > levels of nesting outside of it.
>> >
>> > > With the current code structure, trying to enforce some kind of
>> > > security restriction in the middle of lookup seems really unpleasant.
>> >
>> > If it's conditional (ie "linkat behaves differently from openat"), it
>> > certainly means that we'd have to pass in that info in annoying ways.
>>
>> Nope.  All we need to pass is one more LOOKUP_...  Add
>>       if (unlikely(nd->last_type == LAST_BIND)) {
>>               if ((nd->flags & LOOKUP_BLAH) && !may_flink(...)) {
>>                       terminate_walk(nd);
>>                       return -EINVAL;
>>               }
>>       }
>> in the beginning of lookup_last() and pass LOOKUP_BLAH in flags when
>> linkat() calls user_path_at().  That will affect *only* the terminal
>> symlinks and cost nothing in all normal cases.  The same check can
>> bloody well go into path_init() - take
>>                 if (*name) {
>>                         if (!can_lookup(dentry->d_inode)) {
>>                                 fdput(f);
>>                                 return -ENOTDIR;
>>                         }
>>                 }
>> in there and slap
>>               else {
>>                       if ((flags & LOOKUP_BLAH) && !may_flink(...)) {
>>                                 fdput(f);
>>                               return -EINVAL;
>>                       }
>>               }
>> after it.
>
> OK, let me summarize these threads so far:
>         * restrictions for flink() are needed and they'd better be
> consistent for AT_SYMLINK_FOLLOW + /proc/<pid>/fd/<n> and simply
> passing the descriptor as dfd.
>         * CAP_DAC_OVERRIDE is sufficient; so should be O_TMPFILE used
> to open that sucker.
>         * lookup_last() is the natural place for catching the case
> of following a trailing procfs symlink - it can be done very cheaply
> there.
>
> FWIW, I'm tempted to try the following trick:
>         * introduce FMODE_FLINK in file->f_mode; O_TMPFILE would set it,
> unless O_EXCL is present.
>         * introduce LOOKUP_LINK, to be passed by sys_linkat() when
> resolving the target.
>         * have path_init() called with empty pathname and LOOKUP_LINK in
> flags do checks for FMODE_FLINK or CAP_DAC_OVERRIDE
>         * have ->proc_get_link() report whether the target is linkable
> (either as bool * or by returning 1 instead of 0).  After the call of
> ->proc_get_link() check that and set nd->last_type to LAST_BIND_LINKABLE.
> Note that *all* places looking at ->last_type treat LAST_BIND as "none
> of the above" - we never compare with it, so splitting it in two wouldn't
> break anything.
>         * have lookup_last() check if LOOKUP_LINK is present and ->last_type
> is LAST_BIND; fail unless we have CAP_DAC_OVERRIDE.
>
> AFAICS, it gets more or less sane behaviour; additionally, it makes possible
> to introduce explicit "I want that descriptor to be suitable for flink()"
> open(2) flag - that would require teaching do_last() about LOOKUP_LINK,
> making it check for CAP_DAC_OVERRIDE if it sees LAST_BIND / LOOKUP_LINK,
> same as lookup_last() above (we obviously want to avoid the possibility
> to take a non-flinkable descriptor and use it to reopen the sucker in
> flinkable way).
>
> Alternatively we can revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH)
> aka flink" for the time being.  flink() is certainly an awful mess and I
> seriously regret touching it ;-/
>
> Comments?  Hell, maybe somebody even has printable ones - stranger things
> have happened...

I think this is more screwed up than just flink and open.  For example:

$ echo 'WTF' >test
$ truncate -s 1 /proc/self/fd/3 3<test
$ cat test
W$

IMO that should have failed.

In an ideal world (I think) ffrob(N), frobat(N, "", AT_EMPTY_PATH),
and frobat(AT_FDCWD, "/proc/self/fd/N) should generally do the same
thing.  This includes flink (even though the flink variant doesn't
exist).  open is a bit special.

Of course, we're rather inconsistent with AT_EMPTY_PATH.  utimensat
accepts a null filename instead of a blank filename, and it doesn't
appear to check the file mode at all.

Sigh.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-25  7:26                         ` Andy Lutomirski
@ 2013-08-25 14:23                           ` Al Viro
  2013-08-25 17:04                             ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Al Viro @ 2013-08-25 14:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Sun, Aug 25, 2013 at 12:26:34AM -0700, Andy Lutomirski wrote:

> I think this is more screwed up than just flink and open.  For example:
> 
> $ echo 'WTF' >test
> $ truncate -s 1 /proc/self/fd/3 3<test
> $ cat test
> W$
> 
> IMO that should have failed.

Why?  truncate() always follows links, so what's the problem with that
one?  That you get checks of truncate() and not ftruncate()?

> In an ideal world (I think) ffrob(N), frobat(N, "", AT_EMPTY_PATH),
> and frobat(AT_FDCWD, "/proc/self/fd/N) should generally do the same
> thing.

What about the cases where frob() and ffrob() check for different things?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-24 21:24               ` Willy Tarreau
  2013-08-25  5:23                 ` Al Viro
@ 2013-08-25 15:45                 ` Oleg Nesterov
  1 sibling, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-25 15:45 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andy Lutomirski, Linus Torvalds, security, Ingo Molnar,
	Linux Kernel Mailing List, Al Viro, Linux FS Devel,
	Brad Spengler

Hi Willy,

On 08/24, Willy Tarreau wrote:
>
> On Sat, Aug 24, 2013 at 08:29:39PM +0200, Oleg Nesterov wrote:
> >
> > On 08/22, Willy Tarreau wrote:
> > >
> > > It's not only that, it also supports sockets and pipes that you can access
> > > via /proc/pid/fd and not via a real symlink which would try to open eg
> > > "pipe:[23456]" instead of the real file.
> >
> > But sock_no_open() disallows this, and for good reason I guess.
>
> Hmmm not exactly, it works for a pipe but not for a socket.

But this is what I meant, sorry for confusion.

Let me try to explain. Just in case, this has nothing to do with security
and I do not see any problem, still I think there is something wrong (but
harmless).

Suppose that you are trying to open(/proc/pid/$pipe-or-socket-fd).
nd_jump_link() sets nd->inode correctly, then dentry_open() does the
rest. Everything is correct at this stage, the new file gets the correct
f_inode/f_op.

However, unlike fifo_open(), socket_file_ops->open() can not actually
create the file/sock connection, so sock_no_open() just fails and
nothing bad happens.

But if you open an anon_inodefs file via proc, you get the "bogus" file.
There is a single anon_inode_inode, its ->i_fop points to the empty
anon_inode_fops, this has nothing to do with ->f_op of the actual file
you tried to open.

Nothing bad happens, still I think this is simply wrong and misleading,
and thus I think it would be better to disallow this via anon_open().

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-25 14:23                           ` Al Viro
@ 2013-08-25 17:04                             ` Andy Lutomirski
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-25 17:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Sun, Aug 25, 2013 at 7:23 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Aug 25, 2013 at 12:26:34AM -0700, Andy Lutomirski wrote:
>
>> I think this is more screwed up than just flink and open.  For example:
>>
>> $ echo 'WTF' >test
>> $ truncate -s 1 /proc/self/fd/3 3<test
>> $ cat test
>> W$
>>
>> IMO that should have failed.
>
> Why?  truncate() always follows links, so what's the problem with that
> one?  That you get checks of truncate() and not ftruncate()?

The same as the issue with all these other things: the fd might have
survived a privilege drop or been passed through exec or SCM_RIGHTS,
and the holder of the fd might not be able to see the inode.

For example, suppose a daemon creates a file with O_TMPFILE | O_RDWR.
Then it does open("/proc/self/fd/N", O_RDONLY) to get a read-only fd
for the same temporary file.  It passes that fd to something else.
It's rather surprising that the recipient would be able to truncate it
using /proc/self/fd when it couldn't ftruncate it due to its being
O_RDONLY.

(Of course, this can be worked around by setting the mode to 0644, but
I doubt that everyone will get that right.)

>
>> In an ideal world (I think) ffrob(N), frobat(N, "", AT_EMPTY_PATH),
>> and frobat(AT_FDCWD, "/proc/self/fd/N) should generally do the same
>> thing.
>
> What about the cases where frob() and ffrob() check for different things?

I'll go out on a limb and say that every single case where ffrob has a
check that frob("/proc/self/fd/N") doesn't is wrong.  Maybe we're
stuck with them for backwards compatibility, but that doesn't mean
they're good ideas.

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25  5:23                 ` Al Viro
  2013-08-25  6:50                   ` Willy Tarreau
@ 2013-08-25 18:32                   ` Linus Torvalds
  2013-08-25 19:11                     ` Al Viro
                                       ` (2 more replies)
  1 sibling, 3 replies; 66+ messages in thread
From: Linus Torvalds @ 2013-08-25 18:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Willy Tarreau, Oleg Nesterov, Andy Lutomirski, security,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler

On Sat, Aug 24, 2013 at 10:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> We are really stuck with the current semantics here - switching to
> *BSD one would not only mean serious surgery on descriptor handling
> (it's one of the wartier areas in *BSD VFS, in large part because
> of magic-open-really-a-dup kludges they have to do), it would change
> a long-standing userland API that had been there for nearly 20 years
> _and_ one that tends to be used in corner cases of hell knows how many
> scripts.

Actually, I'm pretty sure we did have the "dup" semantics at one point
(long ago), and they were really nice (because you could use them to
see where in the stream the fd was etc). It just fit so horribly badly
into the VFS semantics that it got changed into the current "new file
descriptor" one. Afaik, nothing broke.

So I'm not really sure about the "we're stuck with it" for semantic
reasons, and it turns out that very few programs/scripts actually use
/proc/<pid>/fd/<nr> at all (random use of /dev/stdin is likely the
most common case). But I agree about the "serious surgery on
descriptor handling" part.

              Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25  6:50                   ` Willy Tarreau
@ 2013-08-25 18:51                     ` Linus Torvalds
  2013-08-25 19:48                       ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-25 18:51 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Al Viro, Oleg Nesterov, Andy Lutomirski, security, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

On Sat, Aug 24, 2013 at 11:50 PM, Willy Tarreau <w@1wt.eu> wrote:
>
> Thanks for explaining Al, that really helps me understand. However
> there's still a difference between /proc/pid called from the process
> itself (=/proc/self) and called from other processes that seems to
> suit the situation :

/proc/self has magic special properties, as you noticed.

> Thus I'm wondering if something like this could help, the idea would be
> that a with the appropriate mount option, a task could only look at its
> own descriptors unless it's running with privileges :

I'd much rather try to do it in general, and use "file->f_cred" more
aggressively for /proc/<pid>/fd/ security.

We don't use f_cred at all in /proc, but that's because /proc predates
that whole thing. So instead we use the credentials of the task when
we want to look at credentials of the file, because that was the
closest approximation we used to have.

Look at the code that creates the fd stat information, for example.
It's in tid_fd_revalidate(), and it really doesn't make much sense to
use the task credentials for it. I wonder if we should do something
like the appended (whitespace-damaged and totally untested) patch.

                Linus

---
  diff --git a/fs/proc/fd.c b/fs/proc/fd.c
  index 75f2890abbd8..2a5a53cc7a0a 100644
  --- a/fs/proc/fd.c
  +++ b/fs/proc/fd.c
  @@ -74,7 +74,6 @@ static int tid_fd_revalidate(struct dentry
*dentry, unsigned int flags)
   {
          struct files_struct *files;
          struct task_struct *task;
  -       const struct cred *cred;
          struct inode *inode;
          int fd;

  @@ -95,19 +94,17 @@ static int tid_fd_revalidate(struct dentry
*dentry, unsigned int flags)
                          if (file) {
                                  unsigned f_mode = file->f_mode;

  -                               rcu_read_unlock();
  -                               put_files_struct(files);
  -
                                  if (task_dumpable(task)) {
  -                                       rcu_read_lock();
  -                                       cred = __task_cred(task);
  +                                       const struct cred *cred =
file->f_cred;
                                          inode->i_uid = cred->euid;
                                          inode->i_gid = cred->egid;
  -                                       rcu_read_unlock();
                                  } else {
                                          inode->i_uid = GLOBAL_ROOT_UID;
                                          inode->i_gid = GLOBAL_ROOT_GID;
                                  }
  +                               rcu_read_unlock();
  +                               put_files_struct(files);
  +

                                  if (S_ISLNK(inode->i_mode)) {
                                          unsigned i_mode = S_IFLNK;

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25 18:32                   ` /proc/pid/fd && anon_inode_fops Linus Torvalds
@ 2013-08-25 19:11                     ` Al Viro
  2013-08-25 19:17                     ` Andy Lutomirski
  2013-09-03 15:58                     ` Pavel Machek
  2 siblings, 0 replies; 66+ messages in thread
From: Al Viro @ 2013-08-25 19:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Oleg Nesterov, Andy Lutomirski, security,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler

On Sun, Aug 25, 2013 at 11:32:45AM -0700, Linus Torvalds wrote:
> On Sat, Aug 24, 2013 at 10:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > We are really stuck with the current semantics here - switching to
> > *BSD one would not only mean serious surgery on descriptor handling
> > (it's one of the wartier areas in *BSD VFS, in large part because
> > of magic-open-really-a-dup kludges they have to do), it would change
> > a long-standing userland API that had been there for nearly 20 years
> > _and_ one that tends to be used in corner cases of hell knows how many
> > scripts.
> 
> Actually, I'm pretty sure we did have the "dup" semantics at one point
> (long ago), and they were really nice (because you could use them to
> see where in the stream the fd was etc). It just fit so horribly badly
> into the VFS semantics that it got changed into the current "new file
> descriptor" one. Afaik, nothing broke.
> 
> So I'm not really sure about the "we're stuck with it" for semantic
> reasons, and it turns out that very few programs/scripts actually use
> /proc/<pid>/fd/<nr> at all (random use of /dev/stdin is likely the
> most common case). But I agree about the "serious surgery on
> descriptor handling" part.

Well...  We are actually in better position for that these days;
right now we have very few instances of ->atomic_open(), so we could
change the calling conventions for it.  It returns 0 or -error and we
could turn that into NULL, ERR_PTR(-error) or a reference to already
opened struct file.  It's not _that_ far to propagate from that point -
atomic_open() <- lookup_open() <- do_last() <- path_openat().  So the amount
of surgery is nowhere near the horrors we used to need (and *BSD actually
does).

We could try that, but I'm really afraid that semantics changes will break
stuff; worse yet, that it'll happen to stuff in dusty corners of random admin
scripts nobody able to debug anymore ;-/

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25 18:32                   ` /proc/pid/fd && anon_inode_fops Linus Torvalds
  2013-08-25 19:11                     ` Al Viro
@ 2013-08-25 19:17                     ` Andy Lutomirski
  2013-09-03 15:58                     ` Pavel Machek
  2 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-25 19:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Willy Tarreau, Oleg Nesterov, security, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

On Sun, Aug 25, 2013 at 11:32 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Aug 24, 2013 at 10:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> We are really stuck with the current semantics here - switching to
>> *BSD one would not only mean serious surgery on descriptor handling
>> (it's one of the wartier areas in *BSD VFS, in large part because
>> of magic-open-really-a-dup kludges they have to do), it would change
>> a long-standing userland API that had been there for nearly 20 years
>> _and_ one that tends to be used in corner cases of hell knows how many
>> scripts.
>
> Actually, I'm pretty sure we did have the "dup" semantics at one point
> (long ago), and they were really nice (because you could use them to
> see where in the stream the fd was etc). It just fit so horribly badly
> into the VFS semantics that it got changed into the current "new file
> descriptor" one. Afaik, nothing broke.
>

We have fdinfo now, which is IMO much less scary.  Programs can find
the stream position, but they can't change it.  OTOH...

> So I'm not really sure about the "we're stuck with it" for semantic
> reasons, and it turns out that very few programs/scripts actually use
> /proc/<pid>/fd/<nr> at all (random use of /dev/stdin is likely the
> most common case). But I agree about the "serious surgery on
> descriptor handling" part.

.../dev/stdin doesn't actually do what you expect if input comes from
something seekable.

$ cat /proc/self/fd/3
test
$ cat /proc/self/fd/3
test
$ cat /proc/self/fd/3
test
$ cat <&3
est
$ cat /proc/self/fd/3
test
$ cat <&3

(I'm not going to advocate for changing this.)

--Andy

>
>               Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25 18:51                     ` Linus Torvalds
@ 2013-08-25 19:48                       ` Oleg Nesterov
  2013-08-25 20:05                         ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-25 19:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

Cough. I am going off-topic again, but I can't resist...

On 08/25, Linus Torvalds wrote:
>
> Look at the code that creates the fd stat information, for example.
> It's in tid_fd_revalidate(), and it really doesn't make much sense to
> use the task credentials for it.

Or pid_revalidate(), but my concern is task_dumpable() logic.

pid_revalidate() does inode->i_*id = GLOBAL_ROOT_*ID if task_dumpable()
fails, but it can fail simply because ->mm = NULL.

This means that almost everything in /proc/zombie-pid/ becomes root.
Doesn't really hurt, but for what? Looks a bit strange imho.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-25  3:37                       ` Al Viro
  2013-08-25  7:26                         ` Andy Lutomirski
@ 2013-08-25 19:57                         ` Linus Torvalds
  2013-08-25 20:06                           ` Al Viro
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-25 19:57 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Sat, Aug 24, 2013 at 8:37 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FWIW, I'm tempted to try the following trick:
>         * introduce FMODE_FLINK in file->f_mode; O_TMPFILE would set it,
> unless O_EXCL is present.
>         * introduce LOOKUP_LINK, to be passed by sys_linkat() when
> resolving the target.

[ .. snipped .. ]

Yes. I think we should do this, but I think we should also look at
what _other_ LOOKUP_xyz we should do for the /proc case.

For the read-only fd case, we should have a LOOKUP_WRITE flag, and
return -EPERM if an operation is a write, and we terminate in that
LAST_BIND case.

That would catch the truncate() case, but also the "open a read-only
fd for write or O_TRUNC" case.

Anything else? What other path operations matter that follow links
than truncate(), link() and open()?

                Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25 19:48                       ` Oleg Nesterov
@ 2013-08-25 20:05                         ` Linus Torvalds
  2013-08-26 15:33                           ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-25 20:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

On Sun, Aug 25, 2013 at 12:48 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Or pid_revalidate(), but my concern is task_dumpable() logic.
>
> pid_revalidate() does inode->i_*id = GLOBAL_ROOT_*ID if task_dumpable()
> fails, but it can fail simply because ->mm = NULL.
>
> This means that almost everything in /proc/zombie-pid/ becomes root.
> Doesn't really hurt, but for what? Looks a bit strange imho.

The zombie case shouldn't be relevant, because a zombie will have
closed all the file descriptors anyway, so they no longer exist.

That said, task_dumpable isn't wonderful, and I suspect we could drop
that logic entirely in the tid-fd case if we just use f_cred.

The reason we have task_dumpable is exactly because we use the task
credentials, and they may not really be relevant to the file
credentials. IOW, it's there to protect against execve'ing a suid
program that opens some protected file and then in setuid()'s back the
the original user after having done the critical stuff.  But
file->f_cred is exactly about the credentials at the time of the open,
so it should make things like that irrelevant.

            Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-25 19:57                         ` Linus Torvalds
@ 2013-08-25 20:06                           ` Al Viro
  2013-08-25 20:23                             ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Al Viro @ 2013-08-25 20:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Sun, Aug 25, 2013 at 12:57:25PM -0700, Linus Torvalds wrote:

> Yes. I think we should do this, but I think we should also look at
> what _other_ LOOKUP_xyz we should do for the /proc case.
> 
> For the read-only fd case, we should have a LOOKUP_WRITE flag, and
> return -EPERM if an operation is a write, and we terminate in that
> LAST_BIND case.
> 
> That would catch the truncate() case, but also the "open a read-only
> fd for write or O_TRUNC" case.
> 
> Anything else? What other path operations matter that follow links
> than truncate(), link() and open()?

Timestamp updates, chmod/chown, xattr mess...

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-25 20:06                           ` Al Viro
@ 2013-08-25 20:23                             ` Linus Torvalds
  2013-08-26 17:37                               ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-25 20:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Sun, Aug 25, 2013 at 1:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Timestamp updates, chmod/chown, xattr mess...

Ok, so that's just too much details.

So I'll just go back to square one, and wonder if we could/should just
make the rule be that in order to be in that LAST_BIND case, you
really have to have f_cred match your own credentials. Or have
CAP_SEARCH.

And just not have any new LOOKUP_xyz flags at all. No special cases,
no different semantics for different ops, just check f_cred. Because
if you had the permissions to do the original open (ie f_cred matches
your current credentials), then that shows that you originally had all
the pathname permissions, and you are still the same person.

And yeah, you may have opened in for reading (so the file is
technically read-only), but obviously we're re-doing all the inode
permission checks anyway, so the only thing /proc/<pid>/fd/<n> really
gave you was the path traversal. So we shouldn't bother with "the file
descriptor is only readable", because that is simply *irrelevant*.

So it means that the case Andi brought up (truncating or
open-for-write a fd that we only had open for reading) would continue
to be allowed, because while it "sounds odd", there is no actual
problem.

And CAP_SEARCH is very much about that path lookup again. So it's
consistent with the notion that "ok, you may do odd things to file
descriptors through /proc, but we check that you cannot avoid the
pathname lookup rules".

And then we do exactly the same to flink(). So then we're all
consistent again. Not the consistency Andy worried about, but that's
the consistency that was was the security worries with flink are all
about. Because the issues with the "use the file descriptor, not the
path to the file descriptor" really are *not* about the endpoint
itself (since we will re-do the permission check for that particular
inode anyway), but about the path leading up to that end-point.

                    Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25 20:05                         ` Linus Torvalds
@ 2013-08-26 15:33                           ` Oleg Nesterov
  2013-08-26 16:37                             ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-26 15:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

On 08/25, Linus Torvalds wrote:
>
> On Sun, Aug 25, 2013 at 12:48 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > pid_revalidate() does inode->i_*id = GLOBAL_ROOT_*ID if task_dumpable()
> > fails, but it can fail simply because ->mm = NULL.
> >
> > This means that almost everything in /proc/zombie-pid/ becomes root.
> > Doesn't really hurt, but for what? Looks a bit strange imho.
>
> The zombie case shouldn't be relevant, because a zombie will have
> closed all the file descriptors anyway, so they no longer exist.

I specially mentioned that this is off-topic ;)

> That said, task_dumpable isn't wonderful, and I suspect we could drop
> that logic entirely in the tid-fd case if we just use f_cred.

Probably yes, but I do not understand this S_IFLNK && uid/chmod magic
in tid_fd_revalidate(). And afaics this should not affect readlink()
anyway. So yes, ->f_cred makes more sense to me, but I can't comment.


But, afaics, speaking of task_dumpable() this doesn't matter. Please
forget about /proc/fd or zombies. I can't even understand
proc_pid_make_inode() or pid_revalidate().

	$ id
	uid=1000(tst) gid=100(users) groups=100(users)
	$ cp `which ls` ls
	$ chmod a-r ./ls
	$
	$ ./ls -l /proc/self/
	total 0
	-r-------- 1 root root  0 Aug 26 06:35 auxv
	-r--r--r-- 1 root root  0 Aug 26 06:35 cgroup
	--w------- 1 root root  0 Aug 26 06:35 clear_refs
	-r--r--r-- 1 root root  0 Aug 26 06:35 cmdline
	-rw-r--r-- 1 root root  0 Aug 26 06:35 comm
	-rw-r--r-- 1 root root  0 Aug 26 06:35 coredump_filter
	lrwxrwxrwx 1 root root  0 Aug 26 06:35 cwd -> /home/tst
	-r-------- 1 root root  0 Aug 26 06:35 environ
	lrwxrwxrwx 1 root root  0 Aug 26 06:35 exe -> /home/tst/ls
	dr-x------ 2 root root  0 Aug 26 06:35 fd
	dr-x------ 2 root root  0 Aug 26 06:35 fdinfo
	-rw-r--r-- 1 root root  0 Aug 26 06:35 gid_map
	-r--r--r-- 1 root root  0 Aug 26 06:35 limits
	-r--r--r-- 1 root root  0 Aug 26 06:35 maps
	-rw------- 1 root root  0 Aug 26 06:35 mem
	-r--r--r-- 1 root root  0 Aug 26 06:35 mountinfo
	-r--r--r-- 1 root root  0 Aug 26 06:35 mounts
	-r-------- 1 root root  0 Aug 26 06:35 mountstats
	dr-xr-xr-x 4 tst  users 0 Aug 26 06:35 net
	dr-x--x--x 2 root root  0 Aug 26 06:35 ns
	-rw-r--r-- 1 root root  0 Aug 26 06:35 oom_adj
	-r--r--r-- 1 root root  0 Aug 26 06:35 oom_score
	-rw-r--r-- 1 root root  0 Aug 26 06:35 oom_score_adj
	-r--r--r-- 1 root root  0 Aug 26 06:35 pagemap
	-r--r--r-- 1 root root  0 Aug 26 06:35 personality
	-rw-r--r-- 1 root root  0 Aug 26 06:35 projid_map
	lrwxrwxrwx 1 root root  0 Aug 26 06:35 root -> /
	-r--r--r-- 1 root root  0 Aug 26 06:35 smaps
	-r--r--r-- 1 root root  0 Aug 26 06:35 stack
	-r--r--r-- 1 root root  0 Aug 26 06:35 stat
	-r--r--r-- 1 root root  0 Aug 26 06:35 statm
	-r--r--r-- 1 root root  0 Aug 26 06:35 status
	-r--r--r-- 1 root root  0 Aug 26 06:35 syscall
	dr-xr-xr-x 3 tst  users 0 Aug 26 06:35 task
	-rw-r--r-- 1 root root  0 Aug 26 06:35 uid_map
	-r--r--r-- 1 root root  0 Aug 26 06:35 wchan

For what? Say,

	-r--r--r-- 1 root root  0 Aug 26 06:35 status

but it is S_IRUGO anyway, why do we need to change the owner?

	dr-x------ 2 root root  0 Aug 26 06:35 fd

OK, this means that I can't access this dir from another process.
Not sure we really want this in this case but

	$ ./ls /proc/self/fd
	0  1  2  3

still works, I guess thanks to proc_fd_permission().

However, say,

	-r-------- 1 root root  0 Aug 26 06:35 mountstats

actually becomes unreadable even via /proc/self/.

Imho, this all is confusing. Perhaps it makes sense to "chmod", say,
/proc/pid/maps if !task_dumpable(), but "chown" looks strange.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-26 15:33                           ` Oleg Nesterov
@ 2013-08-26 16:37                             ` Oleg Nesterov
  2013-08-26 17:54                               ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-26 16:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Willy Tarreau, Al Viro, Andy Lutomirski, security, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

On 08/26, Oleg Nesterov wrote:
>
> Not sure we really want this in this case but
>
> 	$ ./ls /proc/self/fd
> 	0  1  2  3
>
> still works, I guess thanks to proc_fd_permission().

And btw. Whatever we do, shouldn't we change proc_fd_permission()?

/proc/self is actually /proc/tgid, this means that the task_pid()
check can't help if a sub-thread uses /proc/self.

Oleg.

--- x/fs/proc/fd.c
+++ x/fs/proc/fd.c
@@ -288,7 +288,7 @@ int proc_fd_permission(struct inode *ino
 	int rv = generic_permission(inode, mask);
 	if (rv == 0)
 		return 0;
-	if (task_pid(current) == proc_pid(inode))
+	if (task_tgid(current) == proc_pid(inode))
 		rv = 0;
 	return rv;
 }

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-25 20:23                             ` Linus Torvalds
@ 2013-08-26 17:37                               ` Linus Torvalds
  2013-08-26 18:07                                 ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-26 17:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Sun, Aug 25, 2013 at 1:23 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So I'll just go back to square one, and wonder if we could/should just
> make the rule be that in order to be in that LAST_BIND case, you
> really have to have f_cred match your own credentials. Or have
> CAP_SEARCH.

Nope. That doesn't work. It breaks the chrome sandboxing.

Right now, following a /proc fd symlink requires ptrace access to the
process. Which is actually pretty strict, and makes sense. But it does
mean that there are other capabilities than CAP_DAC_READ_SEARCH at
play.

I'm playing with a patch that then in addition to the ptrace check
*also* requires that the file was opened with the same credentials as
the follower _or_ the task being followed. I'll see if that works out.

                Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 16:37                             ` Oleg Nesterov
@ 2013-08-26 17:54                               ` Oleg Nesterov
  2013-08-26 18:09                                 ` Linus Torvalds
  2013-08-26 18:32                                 ` Eric W. Biederman
  0 siblings, 2 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-26 17:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler,
	Linus Torvalds, Eric W. Biederman

proc_fd_permission() says "process can still access /proc/self/fd
after it has executed a setuid()", but the "task_pid() = proc_pid()
check only helps if the task is group leader, /proc/self points to
/proc/leader-pid.

Change this check to use task_tgid() so that the whole process can
access /proc/self/fd.

Note: CLONE_THREAD doesn't require CLONE_FILES so task->files can
differ, but I don't think this can lead to any security problem.

Test-case:

	void *tfunc(void *arg)
	{
		assert(opendir("/proc/self/fd"));
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		pthread_create(&t, NULL, tfunc, NULL);
		pthread_join(t, NULL);
		return 0;
	}

fails if, say, this executable is not readable and suid_dumpable = 0.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/fd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 0ff80f9..985ea88 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -286,7 +286,7 @@ int proc_fd_permission(struct inode *inode, int mask)
 	int rv = generic_permission(inode, mask);
 	if (rv == 0)
 		return 0;
-	if (task_pid(current) == proc_pid(inode))
+	if (task_tgid(current) == proc_pid(inode))
 		rv = 0;
 	return rv;
 }
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-26 17:37                               ` Linus Torvalds
@ 2013-08-26 18:07                                 ` Linus Torvalds
  2013-08-26 18:11                                   ` Andy Lutomirski
  2013-08-27 19:16                                   ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2013-08-26 18:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

On Mon, Aug 26, 2013 at 10:37 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm playing with a patch that then in addition to the ptrace check
> *also* requires that the file was opened with the same credentials as
> the follower _or_ the task being followed. I'll see if that works out.

Chrome sandboxing still _works_ with the attached patch, but there's a
few audit messages about the sandbox being denied dac_read_search. And
I'm really not very happy with this anyway.

So I'm going to send it out (using email to archive things and see if
anybody has any comments), and then forget about it.

                  Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 1592 bytes --]

 fs/proc/fd.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 0ff80f9b930f..7a97dfbb4217 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -137,6 +137,33 @@ static const struct dentry_operations tid_fd_dentry_operations = {
 	.d_delete	= pid_delete_dentry,
 };
 
+static inline int match_cred(const struct cred *a, const struct cred *b)
+{
+	return a->fsuid == b->fsuid && a->fsgid == b->fsgid;
+}
+
+/*
+ * To get here, we have ptrace() access to the task in question.
+ *
+ * But we still require that the file itself was either opened by that
+ * task or by ourselves, or that we have path search capability.
+ */
+static int get_fd_path(struct task_struct *task, struct file *file, struct path *path)
+{
+	if (!match_cred(file->f_cred, current_cred())) {
+		int match;
+
+		rcu_read_lock();
+		match = match_cred(file->f_cred, __task_cred(task));
+		rcu_read_unlock();
+		if (!match && !capable(CAP_DAC_READ_SEARCH))
+			return -EACCES;
+	}
+	*path = file->f_path;
+	path_get(&file->f_path);
+	return 0;
+}
+
 static int proc_fd_link(struct dentry *dentry, struct path *path)
 {
 	struct files_struct *files = NULL;
@@ -155,11 +182,8 @@ static int proc_fd_link(struct dentry *dentry, struct path *path)
 
 		spin_lock(&files->file_lock);
 		fd_file = fcheck_files(files, fd);
-		if (fd_file) {
-			*path = fd_file->f_path;
-			path_get(&fd_file->f_path);
-			ret = 0;
-		}
+		if (fd_file)
+			ret = get_fd_path(task, fd_file, path);
 		spin_unlock(&files->file_lock);
 		put_files_struct(files);
 	}

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 17:54                               ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov
@ 2013-08-26 18:09                                 ` Linus Torvalds
  2013-08-26 19:35                                   ` Linus Torvalds
  2013-08-26 18:32                                 ` Eric W. Biederman
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-26 18:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Eric W. Biederman

On Mon, Aug 26, 2013 at 10:54 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> proc_fd_permission() says "process can still access /proc/self/fd
> after it has executed a setuid()", but the "task_pid() = proc_pid()
> check only helps if the task is group leader, /proc/self points to
> /proc/leader-pid.

Patch looks ok to me, but since this has never worked and nobody has
actually complained, I can't really convince myself that this is
critical.

So I'm not going to apply it for 3.11, and it had better wait for the
next merge window.

                 Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH)
  2013-08-26 18:07                                 ` Linus Torvalds
@ 2013-08-26 18:11                                   ` Andy Lutomirski
  2013-08-27 19:16                                   ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski
  1 sibling, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-26 18:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Willy Tarreau, security, Ingo Molnar,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Mon, Aug 26, 2013 at 11:07 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Aug 26, 2013 at 10:37 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'm playing with a patch that then in addition to the ptrace check
>> *also* requires that the file was opened with the same credentials as
>> the follower _or_ the task being followed. I'll see if that works out.
>
> Chrome sandboxing still _works_ with the attached patch, but there's a
> few audit messages about the sandbox being denied dac_read_search. And
> I'm really not very happy with this anyway.
>
> So I'm going to send it out (using email to archive things and see if
> anybody has any comments), and then forget about it.
>

I'm still not thrilled with this approach.  I'll see if I can find a
clean way to use fmode later today.

--Andy

>                   Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 17:54                               ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov
  2013-08-26 18:09                                 ` Linus Torvalds
@ 2013-08-26 18:32                                 ` Eric W. Biederman
  2013-08-26 18:46                                   ` Oleg Nesterov
  1 sibling, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2013-08-26 18:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Linus Torvalds

Oleg Nesterov <oleg@redhat.com> writes:

> proc_fd_permission() says "process can still access /proc/self/fd
> after it has executed a setuid()", but the "task_pid() = proc_pid()
> check only helps if the task is group leader, /proc/self points to
> /proc/leader-pid.
>
> Change this check to use task_tgid() so that the whole process can
> access /proc/self/fd.

There is at least a semantic goofiness here.

There is /proc/<tgid>/fd and /proc/<tgid>/task/<pid>/fd, and the same
permission check is used by both.

We might just want to have a /proc/thread symlink as well so people
don't have this issue.  Of course that would require people to use it,
and I think the common case if people care is call gettid() and build
the path themselves.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 18:32                                 ` Eric W. Biederman
@ 2013-08-26 18:46                                   ` Oleg Nesterov
  2013-08-26 18:56                                     ` Oleg Nesterov
  2013-08-26 19:10                                     ` Eric W. Biederman
  0 siblings, 2 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-26 18:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Linus Torvalds

On 08/26, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > proc_fd_permission() says "process can still access /proc/self/fd
> > after it has executed a setuid()", but the "task_pid() = proc_pid()
> > check only helps if the task is group leader, /proc/self points to
> > /proc/leader-pid.
> >
> > Change this check to use task_tgid() so that the whole process can
> > access /proc/self/fd.
>
> There is at least a semantic goofiness here.
>
> There is /proc/<tgid>/fd and /proc/<tgid>/task/<pid>/fd, and the same
> permission check is used by both.

Yes, and we have /proc/<tid>/ which includes fd as well.

> We might just want to have a /proc/thread symlink as well so people
> don't have this issue.

Yes! I agree.

In particular, from the changelog:

	Note: CLONE_THREAD doesn't require CLONE_FILES so task->files can
	differ,

so /proc/self/fd doesn't necessarily mean current->files, this can confuse
the application.

And I also assume that you agree with this change ;)

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 18:46                                   ` Oleg Nesterov
@ 2013-08-26 18:56                                     ` Oleg Nesterov
  2013-08-26 19:10                                     ` Eric W. Biederman
  1 sibling, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-26 18:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Linus Torvalds

On 08/26, Oleg Nesterov wrote:
>
> And I also assume that you agree with this change ;)

At least I hope... Because I think this is another issue.

And because with or without a /proc/thread symlink, I believe that
people don't expect that /proc/self can restrict a sub-thread, and
CLONE_THREAS without CLONE_FILES is exotic.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 18:46                                   ` Oleg Nesterov
  2013-08-26 18:56                                     ` Oleg Nesterov
@ 2013-08-26 19:10                                     ` Eric W. Biederman
  2013-08-27 14:53                                       ` Oleg Nesterov
  1 sibling, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2013-08-26 19:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Linus Torvalds

Oleg Nesterov <oleg@redhat.com> writes:

> On 08/26, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@redhat.com> writes:
>>
>> > proc_fd_permission() says "process can still access /proc/self/fd
>> > after it has executed a setuid()", but the "task_pid() = proc_pid()
>> > check only helps if the task is group leader, /proc/self points to
>> > /proc/leader-pid.
>> >
>> > Change this check to use task_tgid() so that the whole process can
>> > access /proc/self/fd.
>>
>> There is at least a semantic goofiness here.
>>
>> There is /proc/<tgid>/fd and /proc/<tgid>/task/<pid>/fd, and the same
>> permission check is used by both.
>
> Yes, and we have /proc/<tid>/ which includes fd as well.
>
>> We might just want to have a /proc/thread symlink as well so people
>> don't have this issue.
>
> Yes! I agree.
>
> In particular, from the changelog:
>
> 	Note: CLONE_THREAD doesn't require CLONE_FILES so task->files can
> 	differ,
>
> so /proc/self/fd doesn't necessarily mean current->files, this can confuse
> the application.
>
> And I also assume that you agree with this change ;)

I don't disagree.  Comparing tgid to pids is goofy and my brain is
elsewhere so I have no thought through the implications.

Actually thinking I think the check should really be.  In which case we
are comparing what we really care about.

int proc_fd_permission(struct inode *inode, int mask)
{
	int rv = generic_permission(inode, mask);
	if (rv == 0)
		return 0;

        rcu_read_lock();
        struct task *task = pid_task(proc_pid(inode));
        if (task && (current->files == task->files))
		rv = 0;
        rcu_read_unlock();

	return rv;
}

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 18:09                                 ` Linus Torvalds
@ 2013-08-26 19:35                                   ` Linus Torvalds
  2013-08-26 20:20                                     ` Willy Tarreau
                                                       ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Linus Torvalds @ 2013-08-26 19:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Eric W. Biederman

On Mon, Aug 26, 2013 at 11:09 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Patch looks ok to me, but since this has never worked and nobody has
> actually complained, I can't really convince myself that this is
> critical.

Actually, let's back-track..

Did you try the other approach? Make /proc/self point to the thread
instead of the task?

The thread-group leader seems to have these extra files:

 - autogroup, coredump_filter, mountstats, net, task

but quite frankly, at least "net" and "task" look like they should
exist there - with "task" pointing back to the actual task (it would
make more sense for "/proc/<pid>/task" itself to be named "threads",
but whatever).

Yes, it would be semantically different, but it would mean that
"/proc/self/fd/" would actually make sense in a way that it currently
does *not* - which would seem fairly important, since the primary use
for it tends to be /dev/stdin.

And the other semantic differences might be much harder to notice.
Worth testing?

                 Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 19:35                                   ` Linus Torvalds
@ 2013-08-26 20:20                                     ` Willy Tarreau
  2013-08-27 15:05                                       ` Oleg Nesterov
  2013-08-27 14:39                                     ` [PATCH 0/1] proc: make /proc/self point to thread Oleg Nesterov
       [not found]                                     ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com>
  2 siblings, 1 reply; 66+ messages in thread
From: Willy Tarreau @ 2013-08-26 20:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Andrew Morton, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Eric W. Biederman

On Mon, Aug 26, 2013 at 12:35:26PM -0700, Linus Torvalds wrote:
(...)
> Yes, it would be semantically different, but it would mean that
> "/proc/self/fd/" would actually make sense in a way that it currently
> does *not* - which would seem fairly important, since the primary use
> for it tends to be /dev/stdin.

I remember another user, don't know if that has changed. UPX used to build
self-extract binaries that opened /proc/self/fd/3. It might be worth checking
this as well if doing important changes, because it's typically a usage that
could be discovered to be broken months after the change!

Willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 0/1] proc: make /proc/self point to thread
  2013-08-26 19:35                                   ` Linus Torvalds
  2013-08-26 20:20                                     ` Willy Tarreau
@ 2013-08-27 14:39                                     ` Oleg Nesterov
  2013-08-27 14:40                                       ` [PATCH 1/1] " Oleg Nesterov
       [not found]                                     ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com>
  2 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-27 14:39 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, Eric W. Biederman
  Cc: Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

On 08/26, Linus Torvalds wrote:
>
> On Mon, Aug 26, 2013 at 11:09 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Patch looks ok to me, but since this has never worked and nobody has
> > actually complained, I can't really convince myself that this is
> > critical.
>
> Actually, let's back-track..
>
> Did you try the other approach? Make /proc/self point to the thread
> instead of the task?

Yes, I thought about this. But I agree with Eric, we probably need
another magic link, /proc/thread or whatever.

And. I think that s/task_pid/task_tgid/ in proc_fd_permission()
makes sense anyway. It is not only for /proc/self, why we should
restrict the access to /proc/<sub-thread>/fd ?

> The thread-group leader seems to have these extra files:
>
>  - autogroup, coredump_filter, mountstats, net, task
>

Note really afaics. Yes, tgid_base_stuff and tid_base_stuff differ,
but proc_root_lookup() uses tgid_base_stuff in any case, so
/proc/<tid>/ also has task,mountstats,etc even if it is not leader.

> Yes, it would be semantically different,

And I am afraid this can break things. But I leave this to you and Eric.

Personally I think that /proc/self pointing to "current" is better, and
in fact I was surprised when I recently found that this is not true.
But perhaps it is too late to change this old behaviour.

> but it would mean that
> "/proc/self/fd/" would actually make sense in a way that it currently
> does *not* - which would seem fairly important, since the primary use
> for it tends to be /dev/stdin.

I think this doesn't matter "in practice", normally all threads have
the same ->files. Who needs CLONE_THREAD without CLONE_FILES ?

> And the other semantic differences might be much harder to notice.
> Worth testing?

Perhaps... Well, if Andrew takes this patch (assuming you and Eric
ack it), we can see if we have any bug reports.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 1/1] proc: make /proc/self point to thread
  2013-08-27 14:39                                     ` [PATCH 0/1] proc: make /proc/self point to thread Oleg Nesterov
@ 2013-08-27 14:40                                       ` Oleg Nesterov
  2013-08-27 16:39                                         ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-27 14:40 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, Eric W. Biederman
  Cc: Willy Tarreau, Al Viro, Andy Lutomirski, Ingo Molnar,
	Linux Kernel Mailing List, Linux FS Devel, Brad Spengler

/proc/self points to /proc/<tgid>, not to /proc/<tid>, and thus
"self" actually means "group leader".

Change fs/proc/self.c to print the caller's tid, this is probably
what the users actually expect.

Note: this is obviously semantically different and the difference
is subtle, but most (all?) users should not notice this change.
Lets see if it breaks something.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/self.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/proc/self.c b/fs/proc/self.c
index 6b6a993..7da7154 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -11,26 +11,26 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
 			      int buflen)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
-	pid_t tgid = task_tgid_nr_ns(current, ns);
+	pid_t tid = task_pid_nr_ns(current, ns);
 	char tmp[PROC_NUMBUF];
-	if (!tgid)
+	if (!tid)
 		return -ENOENT;
-	sprintf(tmp, "%d", tgid);
+	sprintf(tmp, "%d", tid);
 	return vfs_readlink(dentry,buffer,buflen,tmp);
 }
 
 static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
-	pid_t tgid = task_tgid_nr_ns(current, ns);
+	pid_t tid = task_pid_nr_ns(current, ns);
 	char *name = ERR_PTR(-ENOENT);
-	if (tgid) {
+	if (tid) {
 		/* 11 for max length of signed int in decimal + NULL term */
 		name = kmalloc(12, GFP_KERNEL);
 		if (!name)
 			name = ERR_PTR(-ENOMEM);
 		else
-			sprintf(name, "%d", tgid);
+			sprintf(name, "%d", tid);
 	}
 	nd_set_link(nd, name);
 	return NULL;
@@ -57,7 +57,7 @@ int proc_setup_self(struct super_block *s)
 	struct inode *root_inode = s->s_root->d_inode;
 	struct pid_namespace *ns = s->s_fs_info;
 	struct dentry *self;
-	
+
 	mutex_lock(&root_inode->i_mutex);
 	self = d_alloc_name(s->s_root, "self");
 	if (self) {
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 19:10                                     ` Eric W. Biederman
@ 2013-08-27 14:53                                       ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-27 14:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Willy Tarreau, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Linus Torvalds

On 08/26, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
> >
> > And I also assume that you agree with this change ;)
>
> I don't disagree.  Comparing tgid to pids is goofy and my brain is
> elsewhere so I have no thought through the implications.
>
> Actually thinking I think the check should really be.  In which case we
> are comparing what we really care about.
>
> int proc_fd_permission(struct inode *inode, int mask)
> {
> 	int rv = generic_permission(inode, mask);
> 	if (rv == 0)
> 		return 0;
>
>         rcu_read_lock();
>         struct task *task = pid_task(proc_pid(inode));
>         if (task && (current->files == task->files))

But for what?

To me, this looks like the unnecessary semantic complication. It
looks as if we actually need to restrict the access to /proc/self/fd
or /proc/<tgid>/fd or /proc/<subthread-tid>/fd.

I do not think there is any security reason to deny this. They share
->mm, a sub-thread can do "everything" with its leader or vice versa.

same_thread_group() looks more simple and natural to me. And note
that __ptrace_may_access() was recently changed (in -mm) to use
same_thread_group() instead of "task == current".

So personally I'd prefer to not change this patch and I think it
makes sense even with "make /proc/self point to thread" I sent.

But. please tell me if you really dislike it. You are maintainer,
I won't argue.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
  2013-08-26 20:20                                     ` Willy Tarreau
@ 2013-08-27 15:05                                       ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-27 15:05 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Linus Torvalds, Andrew Morton, Al Viro, Andy Lutomirski,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler, Eric W. Biederman

On 08/26, Willy Tarreau wrote:
>
> On Mon, Aug 26, 2013 at 12:35:26PM -0700, Linus Torvalds wrote:
> (...)
> > Yes, it would be semantically different, but it would mean that
> > "/proc/self/fd/" would actually make sense in a way that it currently
> > does *not* - which would seem fairly important, since the primary use
> > for it tends to be /dev/stdin.
>
> I remember another user, don't know if that has changed. UPX used to build
> self-extract binaries that opened /proc/self/fd/3.

But /proc/<tgid>/fd and /proc/<tid>/fd should be the same. Unless it plays
with unshare() or clone(CLONE_THREAD /* no CLONE_FILES */).

Unlike, but:

> because it's typically a usage that
> could be discovered to be broken months after the change!

Oh yes, I agree. This change is trivial but nasty, god knows what people
actually do with /proc/self.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] proc: make proc_fd_permission() thread-friendly
       [not found]                                     ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com>
@ 2013-08-27 15:45                                       ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-27 15:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, linux-kernel, Linux FS Devel, Eric W. Biederman,
	Al Viro, Andrew Morton, Willy Tarreau, Brad Spengler,
	Ingo Molnar

On 08/26, Andy Lutomirski wrote:
>
> On Aug 26, 2013 12:35 PM, "Linus Torvalds" <torvalds@linux-foundation.org>
> wrote:
> >
> > Yes, it would be semantically different, but it would mean that
> > "/proc/self/fd/" would actually make sense in a way that it currently
> > does *not* - which would seem fairly important, since the primary use
> > for it tends to be /dev/stdin.
> >
> > And the other semantic differences might be much harder to notice.
> > Worth testing?
>
> The "children" subdirectory will be different.  But it's already screwed up.

No, it lives in tid_base_stuff, it should be only visible in
/proc/*/task/tid/ dir.

(and perhaps it also makes sense to remove mem/exe-like things from
 tid_base_stuff...)

Anyway I agree, this change is risky.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/1] proc: make /proc/self point to thread
  2013-08-27 14:40                                       ` [PATCH 1/1] " Oleg Nesterov
@ 2013-08-27 16:39                                         ` Linus Torvalds
  2013-08-27 17:49                                           ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-27 16:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro,
	Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List,
	Linux FS Devel, Brad Spengler

On Tue, Aug 27, 2013 at 7:40 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> /proc/self points to /proc/<tgid>, not to /proc/<tid>, and thus
> "self" actually means "group leader".
>
> Change fs/proc/self.c to print the caller's tid, this is probably
> what the users actually expect.

Yeah, thinking more about this, this can't work. I think you'd need to
use "/proc/<tgid>/task/<tid>" wouldn't you?

Or perhaps even more complicated, and say "if it's a task group
leader, keep the old /proc/<tgid>, otherwise do the full thread path".

Anyway, does not not sound worth it. Your previous simple patch is
probably the right thing to do.

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/1] proc: make /proc/self point to thread
  2013-08-27 16:39                                         ` Linus Torvalds
@ 2013-08-27 17:49                                           ` Oleg Nesterov
  2013-08-27 18:15                                             ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-27 17:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro,
	Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List,
	Linux FS Devel, Brad Spengler

On 08/27, Linus Torvalds wrote:
>
> On Tue, Aug 27, 2013 at 7:40 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > /proc/self points to /proc/<tgid>, not to /proc/<tid>, and thus
> > "self" actually means "group leader".
> >
> > Change fs/proc/self.c to print the caller's tid, this is probably
> > what the users actually expect.
>
> Yeah, thinking more about this, this can't work. I think you'd need to
> use "/proc/<tgid>/task/<tid>" wouldn't you?

Why? To me /proc/self == /proc/$((sys_gettid)) looks more natural.
Say, /proc/self/task... But this is subjective.

Not to mention, that would be much more visible change.

> Anyway, does not not sound worth it. Your previous simple patch is
> probably the right thing to do.

OK, lets forget it.

Although to be honest, I was seduced by "Worth testing". I mean I am
just curious, who can suffer from this change? Nevermind, please
ignore.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/1] proc: make /proc/self point to thread
  2013-08-27 17:49                                           ` Oleg Nesterov
@ 2013-08-27 18:15                                             ` Linus Torvalds
  2013-08-27 18:28                                               ` Oleg Nesterov
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-27 18:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro,
	Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List,
	Linux FS Devel, Brad Spengler

On Tue, Aug 27, 2013 at 10:49 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Why? To me /proc/self == /proc/$((sys_gettid)) looks more natural.
> Say, /proc/self/task... But this is subjective.

Actually, you're right - I incorrectly thought we had removed the
thread id lookup from the top level in /proc. But we never actually
did that. We only removed them from readdir.

So while you won't see thread id's in the directory listing, we *do*
successfully look up thread id's when specified explicitly. It's
confusing, but it happens to work. So you can do

   ls -l /proc/<tid>/

and get the expected result, but if you do

   ls -l /proc | grep <tid>

it won't actually show up unless the thread ID is also the thread group leader.

> Although to be honest, I was seduced by "Worth testing". I mean I am
> just curious, who can suffer from this change? Nevermind, please
> ignore.

Yeah, if we were to redesign /proc I'd do it differently, but I think
we should just accept that it works "well enough" and there's just too
much risk from making changes that aren't strictly required.

            Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/1] proc: make /proc/self point to thread
  2013-08-27 18:15                                             ` Linus Torvalds
@ 2013-08-27 18:28                                               ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2013-08-27 18:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Eric W. Biederman, Willy Tarreau, Al Viro,
	Andy Lutomirski, Ingo Molnar, Linux Kernel Mailing List,
	Linux FS Devel, Brad Spengler

On 08/27, Linus Torvalds wrote:
>
> On Tue, Aug 27, 2013 at 10:49 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Although to be honest, I was seduced by "Worth testing". I mean I am
> > just curious, who can suffer from this change? Nevermind, please
> > ignore.
>
> Yeah, if we were to redesign /proc I'd do it differently, but I think
> we should just accept that it works "well enough" and there's just too
> much risk from making changes that aren't strictly required.

Yes, agreed.

To all: just in case, I will be completely offline till Sep 10, vacation.

So, Eric, if you still disagree with s/task_pid/task_tgid/ change please
nack that (minor) patch, we can return to this later.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-26 18:07                                 ` Linus Torvalds
  2013-08-26 18:11                                   ` Andy Lutomirski
@ 2013-08-27 19:16                                   ` Andy Lutomirski
  2013-08-27 19:32                                     ` Linus Torvalds
  2013-08-27 23:08                                     ` Al Viro
  1 sibling, 2 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-27 19:16 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Willy Tarreau, Ingo Molnar, security, Linux Kernel Mailing List,
	Oleg Nesterov, Linux FS Devel, Brad Spengler, Andy Lutomirski

This is an experiment to see if we can get nice semantics for all syscalls
that either follow symlinks or allow AT_EMPTY_PATH without jumping through
enormous hoops.  This converts truncate (although you can't tell using
truncate from coreutils, because it actually uses open + ftruncate).

The basic idea is that there's a new helper function
user_file_or_path_at.  It takes an fd and a path and, depending on
flags, the emptiness of the path, and whether path is a magic /proc
symlink (or a symlink to a magic /proc/symlink), it returns either a
struct path or a struct file *.

To make this genuinely useful, something similar will need to happen for
open/openat.

The lifetime rules for the contents of struct nameidata seem weird.
If all users first memset the thing to zero and called nd_put or
something when they were done, this would be cleaner.  But they don't.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 fs/namei.c            | 73 ++++++++++++++++++++++++++++++++++++++++
 fs/open.c             | 93 +++++++++++++++++++++++++++++----------------------
 fs/proc/base.c        | 44 +++++++++++++-----------
 fs/proc/fd.c          | 10 +++---
 fs/proc/internal.h    |  3 +-
 include/linux/namei.h | 14 ++++++++
 6 files changed, 171 insertions(+), 66 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 89a612e..0d905ac 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -691,6 +691,28 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
 	nd->flags |= LOOKUP_JUMPED;
 }
 
+/*
+ * Helper for /proc/pid/fd/N-style magic links.  This is like nd_jump_link
+ * except that it will apply additional security checks.  Caller must have
+ * taken a reference to file beforehand.
+ */
+void nd_jump_link_file(struct nameidata *nd, struct file *file)
+{
+	path_put(&nd->path);
+
+	nd->path = file->f_path;
+	path_get(&nd->path);
+	if (nd->flags & LOOKUP_FILE) {
+		if (nd->last_symlink_file)
+			fput(nd->last_symlink_file);
+		nd->last_symlink_file = file;
+	} else {
+		fput(file);
+	}
+	nd->inode = nd->path.dentry->d_inode;
+	nd->flags |= LOOKUP_JUMPED;
+}
+
 static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
 {
 	struct inode *inode = link->dentry->d_inode;
@@ -842,6 +864,10 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 		goto out_put_nd_path;
 
 	nd->last_type = LAST_BIND;
+	if (unlikely((nd->flags & LOOKUP_FILE) && nd->last_symlink_file)) {
+		fput(nd->last_symlink_file);
+		nd->last_symlink_file = NULL;
+	}
 	*p = dentry->d_inode->i_op->follow_link(dentry, nd);
 	error = PTR_ERR(*p);
 	if (IS_ERR(*p))
@@ -2156,6 +2182,53 @@ int user_path_at(int dfd, const char __user *name, unsigned flags,
 	return user_path_at_empty(dfd, name, flags, path, NULL);
 }
 
+int user_file_or_path_at(int dfd, const char __user *filename,
+			 int lookup_flags, bool null_filename_okay,
+			 struct file_or_path *out)
+{
+	struct nameidata nd;
+	struct filename *tmp;
+	int err;
+	int empty = 0;
+
+	if (!filename && null_filename_okay) {
+		out->file = fget(dfd);
+		return out->file ? 0 : -EBADF;
+	}
+
+	tmp = getname_flags(filename, lookup_flags, &empty);
+	if (IS_ERR(tmp))
+		return PTR_ERR(tmp);
+
+	if (empty) {
+		out->file = fget(dfd);
+		return out->file ? 0 : -EBADF;
+	}
+
+	nd.last_symlink_file = NULL;
+	BUG_ON(lookup_flags & LOOKUP_PARENT);
+
+	err = filename_lookup(dfd, tmp, lookup_flags | LOOKUP_FILE, &nd);
+	putname(tmp);
+	if (!err) {
+		if (nd.last_symlink_file && nd.last_type == LAST_BIND) {
+			out->file = nd.last_symlink_file;
+			memset(&out->path, 0, sizeof(out->path));
+			path_put(&nd.path);
+		} else {
+			out->path = nd.path;
+			out->file = NULL;
+			if (nd.last_symlink_file)
+				fput(nd.last_symlink_file);
+		}
+	} else {
+		if (nd.last_symlink_file)
+			fput(nd.last_symlink_file);
+	}
+
+	return err;
+}
+
 /*
  * NB: most callers don't do anything directly with the reference to the
  *     to struct filename, but the nd->last pointer points into the name string
diff --git a/fs/open.c b/fs/open.c
index 7931f76..1349b43 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -114,20 +114,66 @@ out:
 }
 EXPORT_SYMBOL_GPL(vfs_truncate);
 
+static long do_truncate_file(struct file *file, loff_t length, int small)
+{
+	struct inode *inode;
+	struct dentry *dentry;
+	int error;
+
+	error = -EINVAL;
+	if (length < 0)
+		goto out;
+
+	/* explicitly opened as large or we are on 64-bit box */
+	if (file->f_flags & O_LARGEFILE)
+		small = 0;
+
+	dentry = file->f_path.dentry;
+	inode = dentry->d_inode;
+	error = -EINVAL;
+	if (!S_ISREG(inode->i_mode) || !(file->f_mode & FMODE_WRITE))
+		goto out;
+
+	error = -EINVAL;
+	/* Cannot ftruncate over 2^31 bytes without large file support */
+	if (small && length > MAX_NON_LFS)
+		goto out;
+
+	error = -EPERM;
+	if (IS_APPEND(inode))
+		goto out;
+
+	sb_start_write(inode->i_sb);
+	error = locks_verify_truncate(inode, file, length);
+	if (!error)
+		error = security_path_truncate(&file->f_path);
+	if (!error)
+		error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
+	sb_end_write(inode->i_sb);
+out:
+	return error;
+}
+
 static long do_sys_truncate(const char __user *pathname, loff_t length)
 {
-	unsigned int lookup_flags = LOOKUP_FOLLOW;
-	struct path path;
+	struct file_or_path obj;
+	int lookup_flags = LOOKUP_FOLLOW;
 	int error;
 
 	if (length < 0)	/* sorry, but loff_t says... */
 		return -EINVAL;
 
 retry:
-	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
+	error = user_file_or_path_at(AT_FDCWD, pathname,
+				     LOOKUP_FOLLOW, false, &obj);
 	if (!error) {
-		error = vfs_truncate(&path, length);
-		path_put(&path);
+		if (obj.file) {
+			error = do_truncate_file(obj.file, length, 0);
+			fput(obj.file);
+		} else {
+			error = vfs_truncate(&obj.path, length);
+			path_put(&obj.path);
+		}
 	}
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
@@ -150,48 +196,15 @@ COMPAT_SYSCALL_DEFINE2(truncate, const char __user *, path, compat_off_t, length
 
 static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 {
-	struct inode *inode;
-	struct dentry *dentry;
 	struct fd f;
 	int error;
 
-	error = -EINVAL;
-	if (length < 0)
-		goto out;
 	error = -EBADF;
 	f = fdget(fd);
 	if (!f.file)
-		goto out;
-
-	/* explicitly opened as large or we are on 64-bit box */
-	if (f.file->f_flags & O_LARGEFILE)
-		small = 0;
-
-	dentry = f.file->f_path.dentry;
-	inode = dentry->d_inode;
-	error = -EINVAL;
-	if (!S_ISREG(inode->i_mode) || !(f.file->f_mode & FMODE_WRITE))
-		goto out_putf;
-
-	error = -EINVAL;
-	/* Cannot ftruncate over 2^31 bytes without large file support */
-	if (small && length > MAX_NON_LFS)
-		goto out_putf;
-
-	error = -EPERM;
-	if (IS_APPEND(inode))
-		goto out_putf;
-
-	sb_start_write(inode->i_sb);
-	error = locks_verify_truncate(inode, f.file, length);
-	if (!error)
-		error = security_path_truncate(&f.file->f_path);
-	if (!error)
-		error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, f.file);
-	sb_end_write(inode->i_sb);
-out_putf:
+		return error;
+	error = do_truncate_file(f.file, length, small);
 	fdput(f);
-out:
 	return error;
 }
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1485e38..ac28ff3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -171,7 +171,7 @@ static int get_task_root(struct task_struct *task, struct path *root)
 	return result;
 }
 
-static int proc_cwd_link(struct dentry *dentry, struct path *path)
+static int proc_cwd_link(struct dentry *dentry, struct file_or_path *link)
 {
 	struct task_struct *task = get_proc_task(dentry->d_inode);
 	int result = -ENOENT;
@@ -179,7 +179,8 @@ static int proc_cwd_link(struct dentry *dentry, struct path *path)
 	if (task) {
 		task_lock(task);
 		if (task->fs) {
-			get_fs_pwd(task->fs, path);
+			link->file = NULL;
+			get_fs_pwd(task->fs, &link->path);
 			result = 0;
 		}
 		task_unlock(task);
@@ -188,13 +189,14 @@ static int proc_cwd_link(struct dentry *dentry, struct path *path)
 	return result;
 }
 
-static int proc_root_link(struct dentry *dentry, struct path *path)
+static int proc_root_link(struct dentry *dentry, struct file_or_path *link)
 {
 	struct task_struct *task = get_proc_task(dentry->d_inode);
 	int result = -ENOENT;
 
 	if (task) {
-		result = get_task_root(task, path);
+		link->file = NULL;
+		result = get_task_root(task, &link->path);
 		put_task_struct(task);
 	}
 	return result;
@@ -1430,11 +1432,10 @@ static const struct file_operations proc_pid_set_comm_operations = {
 	.release	= single_release,
 };
 
-static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
+static int proc_exe_link(struct dentry *dentry, struct file_or_path *link)
 {
 	struct task_struct *task;
 	struct mm_struct *mm;
-	struct file *exe_file;
 
 	task = get_proc_task(dentry->d_inode);
 	if (!task)
@@ -1443,32 +1444,32 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 	put_task_struct(task);
 	if (!mm)
 		return -ENOENT;
-	exe_file = get_mm_exe_file(mm);
+	link->file = get_mm_exe_file(mm);
 	mmput(mm);
-	if (exe_file) {
-		*exe_path = exe_file->f_path;
-		path_get(&exe_file->f_path);
-		fput(exe_file);
+	if (link->file)
 		return 0;
-	} else
+	else
 		return -ENOENT;
 }
 
 static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	struct inode *inode = dentry->d_inode;
-	struct path path;
+	struct file_or_path link;
 	int error = -EACCES;
 
 	/* Are we allowed to snoop on the tasks file descriptors? */
 	if (!proc_fd_access_allowed(inode))
 		goto out;
 
-	error = PROC_I(inode)->op.proc_get_link(dentry, &path);
+	error = PROC_I(inode)->op.proc_get_link(dentry, &link);
 	if (error)
 		goto out;
 
-	nd_jump_link(nd, &path);
+	if (link.file)
+		nd_jump_link_file(nd, link.file);
+	else
+		nd_jump_link(nd, &link.path);
 	return NULL;
 out:
 	return ERR_PTR(error);
@@ -1502,18 +1503,23 @@ static int proc_pid_readlink(struct dentry * dentry, char __user * buffer, int b
 {
 	int error = -EACCES;
 	struct inode *inode = dentry->d_inode;
-	struct path path;
+	struct file_or_path link;
 
 	/* Are we allowed to snoop on the tasks file descriptors? */
 	if (!proc_fd_access_allowed(inode))
 		goto out;
 
-	error = PROC_I(inode)->op.proc_get_link(dentry, &path);
+	error = PROC_I(inode)->op.proc_get_link(dentry, &link);
 	if (error)
 		goto out;
 
-	error = do_proc_readlink(&path, buffer, buflen);
-	path_put(&path);
+	if (link.file) {
+		error = do_proc_readlink(&link.file->f_path, buffer, buflen);
+		fput(link.file);
+	} else {
+		error = do_proc_readlink(&link.path, buffer, buflen);
+		path_put(&link.path);
+	}
 out:
 	return error;
 }
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 0ff80f9..4c88fc2 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -137,7 +137,7 @@ static const struct dentry_operations tid_fd_dentry_operations = {
 	.d_delete	= pid_delete_dentry,
 };
 
-static int proc_fd_link(struct dentry *dentry, struct path *path)
+static int proc_fd_link(struct dentry *dentry, struct file_or_path *link)
 {
 	struct files_struct *files = NULL;
 	struct task_struct *task;
@@ -151,13 +151,11 @@ static int proc_fd_link(struct dentry *dentry, struct path *path)
 
 	if (files) {
 		int fd = proc_fd(dentry->d_inode);
-		struct file *fd_file;
 
 		spin_lock(&files->file_lock);
-		fd_file = fcheck_files(files, fd);
-		if (fd_file) {
-			*path = fd_file->f_path;
-			path_get(&fd_file->f_path);
+		link->file = fcheck_files(files, fd);
+		if (link->file) {
+			get_file(link->file);
 			ret = 0;
 		}
 		spin_unlock(&files->file_lock);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 651d09a..fc936d5 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -14,6 +14,7 @@
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
 #include <linux/binfmts.h>
+#include <linux/namei.h>
 
 struct ctl_table_header;
 struct mempolicy;
@@ -51,7 +52,7 @@ struct proc_dir_entry {
 };
 
 union proc_op {
-	int (*proc_get_link)(struct dentry *, struct path *);
+	int (*proc_get_link)(struct dentry *, struct file_or_path *link);
 	int (*proc_read)(struct task_struct *task, char *page);
 	int (*proc_show)(struct seq_file *m,
 		struct pid_namespace *ns, struct pid *pid,
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 5a5ff57..77f585e 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -15,6 +15,7 @@ struct nameidata {
 	struct qstr	last;
 	struct path	root;
 	struct inode	*inode; /* path.dentry.d_inode */
+	struct file	*last_symlink_file;
 	unsigned int	flags;
 	unsigned	seq;
 	int		last_type;
@@ -56,9 +57,21 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_ROOT		0x2000
 #define LOOKUP_EMPTY		0x4000
 
+#define LOOKUP_FILE		0x8000
+
 extern int user_path_at(int, const char __user *, unsigned, struct path *);
 extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
 
+struct file_or_path
+{
+	struct file *file;
+	struct path path;
+};
+
+extern int user_file_or_path_at(int dfd, const char __user *filename,
+				int lookup_flags, bool null_filename_okay,
+				struct file_or_path *out);
+
 #define user_path(name, path) user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW, path)
 #define user_lpath(name, path) user_path_at(AT_FDCWD, name, 0, path)
 #define user_path_dir(name, path) \
@@ -83,6 +96,7 @@ extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
 
 extern void nd_jump_link(struct nameidata *nd, struct path *path);
+extern void nd_jump_link_file(struct nameidata *nd, struct file *file);
 
 static inline void nd_set_link(struct nameidata *nd, char *path)
 {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-27 19:16                                   ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski
@ 2013-08-27 19:32                                     ` Linus Torvalds
  2013-08-27 20:28                                       ` Andy Lutomirski
  2013-08-27 23:08                                     ` Al Viro
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2013-08-27 19:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Tue, Aug 27, 2013 at 12:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> This is an experiment to see if we can get nice semantics for all syscalls
> that either follow symlinks or allow AT_EMPTY_PATH without jumping through
> enormous hoops.  This converts truncate (although you can't tell using
> truncate from coreutils, because it actually uses open + ftruncate).

So this seems *way* too complex. I'd much rather see "nd->flags" get a
LOOKUP_READONLY flag, for example, that gets set by
proc_pid_follow_link() when it hits a read-only file descriptor (and
gets cleared by other lookups).

Wouldn't that be *much* more straightforward?

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-27 19:32                                     ` Linus Torvalds
@ 2013-08-27 20:28                                       ` Andy Lutomirski
  2013-08-28  6:16                                         ` Al Viro
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-27 20:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Tue, Aug 27, 2013 at 12:32 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Aug 27, 2013 at 12:16 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> This is an experiment to see if we can get nice semantics for all syscalls
>> that either follow symlinks or allow AT_EMPTY_PATH without jumping through
>> enormous hoops.  This converts truncate (although you can't tell using
>> truncate from coreutils, because it actually uses open + ftruncate).
>
> So this seems *way* too complex. I'd much rather see "nd->flags" get a
> LOOKUP_READONLY flag, for example, that gets set by
> proc_pid_follow_link() when it hits a read-only file descriptor (and
> gets cleared by other lookups).
>
> Wouldn't that be *much* more straightforward?

It would if it works.  It certainly would for truncate, setxattr, etc.

There are funny cases, though.  For example, execing an O_WRONLY fd
should probably fail, as should opening an O_WRONLY fd as O_RDWR.
O_APPEND is also funny.  flink will (I suspect) always want to be a
bit special.

There are also O_PATH fds, and I'm not sure what the semantics of
O_PATH fds are or should be when they refer to something other than a
directory.

The benefit of my approach is that it's really obvious that
truncate("/proc/self/fd/N") and ftruncate(N) do exactly the same
thing.  The downside is that the namei code is a bit gross and there
was more rearranging of ftruncate than I would have liked.  (I also
lost the benefit of fget_light, but I could fix that by passing a
struct fd around instead of a struct file *.)

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-27 19:16                                   ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski
  2013-08-27 19:32                                     ` Linus Torvalds
@ 2013-08-27 23:08                                     ` Al Viro
  2013-08-27 23:13                                       ` Andy Lutomirski
  1 sibling, 1 reply; 66+ messages in thread
From: Al Viro @ 2013-08-27 23:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Tue, Aug 27, 2013 at 12:16:34PM -0700, Andy Lutomirski wrote:
> This is an experiment to see if we can get nice semantics for all syscalls
> that either follow symlinks or allow AT_EMPTY_PATH without jumping through
> enormous hoops.  This converts truncate (although you can't tell using
> truncate from coreutils, because it actually uses open + ftruncate).
> 
> The basic idea is that there's a new helper function
> user_file_or_path_at.  It takes an fd and a path and, depending on
> flags, the emptiness of the path, and whether path is a magic /proc
> symlink (or a symlink to a magic /proc/symlink), it returns either a
> struct path or a struct file *.

No.  

> +	path_get(&nd->path);
> +	if (nd->flags & LOOKUP_FILE) {
> +		if (nd->last_symlink_file)
> +			fput(nd->last_symlink_file);
> +		nd->last_symlink_file = file;

This is ugly (and costs quite a bit of overhead)

> -static int proc_cwd_link(struct dentry *dentry, struct path *path)
> +static int proc_cwd_link(struct dentry *dentry, struct file_or_path *link)

... and this is even more vile.  Vetoed, for being too ugly to live.

I think I've a saner approach, not involving anything that ugly; I'll post
a writeup later tonight.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-27 23:08                                     ` Al Viro
@ 2013-08-27 23:13                                       ` Andy Lutomirski
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-27 23:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Tue, Aug 27, 2013 at 4:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Aug 27, 2013 at 12:16:34PM -0700, Andy Lutomirski wrote:
>> This is an experiment to see if we can get nice semantics for all syscalls
>> that either follow symlinks or allow AT_EMPTY_PATH without jumping through
>> enormous hoops.  This converts truncate (although you can't tell using
>> truncate from coreutils, because it actually uses open + ftruncate).
>>
>> The basic idea is that there's a new helper function
>> user_file_or_path_at.  It takes an fd and a path and, depending on
>> flags, the emptiness of the path, and whether path is a magic /proc
>> symlink (or a symlink to a magic /proc/symlink), it returns either a
>> struct path or a struct file *.
>
> No.

I'm curious what's wrong with the general concept.  (I agree that the
implementation is heinous.)

If I ever wanted to add a new *at syscall, I'd like to be able to do:

int sys_whateverat(int dfd, const char __user *name, int flags)
{
  resolve_the_thing(dfd, name, flags);
  if (it's a file) {
    check fmode;
  } else {
    inode_permission();
  }

  actually_do_something(inode);

  unreference_whatever_i_got();

  return ret;
}

thereby killing the fwhatever and whateverat birds with one stone.


>
>> +     path_get(&nd->path);
>> +     if (nd->flags & LOOKUP_FILE) {
>> +             if (nd->last_symlink_file)
>> +                     fput(nd->last_symlink_file);
>> +             nd->last_symlink_file = file;
>
> This is ugly (and costs quite a bit of overhead)
>
>> -static int proc_cwd_link(struct dentry *dentry, struct path *path)
>> +static int proc_cwd_link(struct dentry *dentry, struct file_or_path *link)
>
> ... and this is even more vile.  Vetoed, for being too ugly to live.

...phew.  I wasn't looking forward to testing and debugging my crap :)

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-27 20:28                                       ` Andy Lutomirski
@ 2013-08-28  6:16                                         ` Al Viro
  2013-08-28 16:24                                           ` Linus Torvalds
  2013-08-28 19:04                                           ` Andy Lutomirski
  0 siblings, 2 replies; 66+ messages in thread
From: Al Viro @ 2013-08-28  6:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote:

> It would if it works.  It certainly would for truncate, setxattr, etc.
> 
> There are funny cases, though.  For example, execing an O_WRONLY fd
> should probably fail, as should opening an O_WRONLY fd as O_RDWR.
> O_APPEND is also funny.  flink will (I suspect) always want to be a
> bit special.
> 
> There are also O_PATH fds, and I'm not sure what the semantics of
> O_PATH fds are or should be when they refer to something other than a
> directory.

O_PATH file just points to specific location in the tree, no more and
no less.
 
> The benefit of my approach is that it's really obvious that
> truncate("/proc/self/fd/N") and ftruncate(N) do exactly the same
> thing.  The downside is that the namei code is a bit gross and there
> was more rearranging of ftruncate than I would have liked.  (I also
> lost the benefit of fget_light, but I could fix that by passing a
> struct fd around instead of a struct file *.)

With that thing namei code becomes _really_ gross and we'll have to live
with it afterwards.  No.  Moreover, grabbing references to struct file
is the wrong thing to do - we need far less information than that.

Let's look at that zoo first.

a) linkat() target.  flink(2) doesn't exist, so we can choose whatever we
bloody please for that case.  And frankly, I would gladly go for O_TMPFILE
sans O_EXCL or sufficient caps.  See upthread for proposed solution.

b) chdir().  Checks are identical in chdir() and fchdir(), don't bloody care
how the file had been opened and don't need to care - anything we could've
accessed after such chdir() we could've accessed by slapping a relative
path to the end of the path we'd used to reach that thing.

c) chroot().  fchroot() doesn't exist and realistically anyone who can do
chroot(2) is well past caring about anything.  We really don't care how the
damn thing had been opened.

d) fchownat().  Neither it nor fchown() give a flying fuch about the way
file had been opened.

e) fchmodat().  Same story.

f) faccessat().  IMO we don't care how the damn thing had been opened.

g) name_to_handle_at().  Not sure if we need any capability checks, but
we definitely don't give a damn about the way file had been opened.

h) one case in quotactl().  With CAP_SYS_ADMIN required.  If attacker got
that, it's too fucking late anyway.

i) fstatat() and friends, statfs(), utimes() et.al, {set,get,list,remove}xattr()
- no difference in checks between pathname- and descriptor-based calls
anyway.  Please, note that for fsetxattr() we do *NOT* require the file
to be opened for write; adding such requirement would be a user-visible
API change, and one fairly likely to break stuff, at that.

j) umount() and pivot_root().  We really don't care how the file had been
opened, and attacker capable of playing with that can do a lot more.

k) inotify_add_watch()/fanotify_mark().  No descriptor-based versions,
no permission checks other than "may read whatever we ended up with".
I really doubt that we care about the way fd had been opened in case
of /proc/<pid>/fd/<fd>.

l) truncate() mess.

m) open() mess.

AFAICS, the *only* cases when we might possibly care are linkat() target,
truncate() and open().  Note, BTW, that right now we *do* allow an attempt
to reopen a file via procfs symlink r/w, even when file had been r/o.
It's subject to permissions on the object being opened, but that's it.

I'm not sure we can change that - again, it's a user-visible API, and
the change is very likely to break some scripts.  In fact, it's about
as dangerous as a full-blown switch to dup-style semantics for procfs
opens, and it's a lot less attractive.

For truncate() we would only need to have FMODE_WRITE reported, more or
less the same way as FMODE_FLINK.  And without open() changes it doesn't
buy us anything at all...

I've no problem with unrolling the user_path_at() in do_sys_truncate()
into an explicit loop by trailing symlinks and checking for indication
left by proc_pid_follow_link(), more or less the same way as with
LOOKUP_LINK in lookup_last().  It's _far_ less invasive than playing
with "oh, here we fill a struct path or maybe a struct file" horrors,
pinning struct file for no reason, etc.

AFAICS, the real question is whether we dare to change open() behaviour on
/proc/*/fd/*.  I've played with that a bit and I believe that I can do
the switch to dup-style with very localized changes in fs/namei.c and
fs/proc/{base,fd}.c.  Will be even binary compatible kernel-side -
->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error,
not that we had many instances to convert.  *IF* that variant is not
out of consideration for userland API stability reasons, I would certainly
prefer to go that way; turns out that these days we can pull it off without
black magic in descriptor handling, etc.  Linus?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-28  6:16                                         ` Al Viro
@ 2013-08-28 16:24                                           ` Linus Torvalds
  2013-08-28 19:04                                           ` Andy Lutomirski
  1 sibling, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2013-08-28 16:24 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

Pseudo-related: I just reverted the flink() commit in my tree, so that
at least wrt that whole thing we'll be back to the original behavior,
broken or not. Clearly we're not done with this discussion, and the
patches flying around aren't appropriate for 3.11, so..

               Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-28  6:16                                         ` Al Viro
  2013-08-28 16:24                                           ` Linus Torvalds
@ 2013-08-28 19:04                                           ` Andy Lutomirski
  2013-08-28 19:59                                             ` Al Viro
  1 sibling, 1 reply; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-28 19:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Tue, Aug 27, 2013 at 11:16 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote:
>> There are also O_PATH fds, and I'm not sure what the semantics of
>> O_PATH fds are or should be when they refer to something other than a
>> directory.
>
> O_PATH file just points to specific location in the tree, no more and
> no less.

I don't know whether ftruncate(some O_PATH fd) should work.  But this
probably barely matters.

>
> g) name_to_handle_at().  Not sure if we need any capability checks, but
> we definitely don't give a damn about the way file had been opened.

open_by_handle(name_to_handle(read-only fd)) is dangerous for the same
reason that open("/proc/self/fd/N", O_RDWR) is, but the capability
checks have that covered.

>
> h) one case in quotactl().  With CAP_SYS_ADMIN required.  If attacker got
> that, it's too fucking late anyway.
>
> i) fstatat() and friends, statfs(), utimes() et.al, {set,get,list,remove}xattr()
> - no difference in checks between pathname- and descriptor-based calls
> anyway.  Please, note that for fsetxattr() we do *NOT* require the file
> to be opened for write; adding such requirement would be a user-visible
> API change, and one fairly likely to break stuff, at that.

Sigh.  I wonder if that was deliberate.  Anyway, it is what it is.

>
> j) umount() and pivot_root().  We really don't care how the file had been
> opened, and attacker capable of playing with that can do a lot more.
>
> k) inotify_add_watch()/fanotify_mark().  No descriptor-based versions,
> no permission checks other than "may read whatever we ended up with".
> I really doubt that we care about the way fd had been opened in case
> of /proc/<pid>/fd/<fd>.
>
> l) truncate() mess.
>
> m) open() mess.
>
> AFAICS, the *only* cases when we might possibly care are linkat() target,
> truncate() and open().  Note, BTW, that right now we *do* allow an attempt
> to reopen a file via procfs symlink r/w, even when file had been r/o.
> It's subject to permissions on the object being opened, but that's it.
>
> I'm not sure we can change that - again, it's a user-visible API, and
> the change is very likely to break some scripts.  In fact, it's about
> as dangerous as a full-blown switch to dup-style semantics for procfs
> opens, and it's a lot less attractive.
>
> For truncate() we would only need to have FMODE_WRITE reported, more or
> less the same way as FMODE_FLINK.  And without open() changes it doesn't
> buy us anything at all...
>
> I've no problem with unrolling the user_path_at() in do_sys_truncate()
> into an explicit loop by trailing symlinks and checking for indication
> left by proc_pid_follow_link(), more or less the same way as with
> LOOKUP_LINK in lookup_last().  It's _far_ less invasive than playing
> with "oh, here we fill a struct path or maybe a struct file" horrors,
> pinning struct file for no reason, etc.
>
> AFAICS, the real question is whether we dare to change open() behaviour on
> /proc/*/fd/*.  I've played with that a bit and I believe that I can do
> the switch to dup-style with very localized changes in fs/namei.c and
> fs/proc/{base,fd}.c.  Will be even binary compatible kernel-side -
> ->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error,
> not that we had many instances to convert.  *IF* that variant is not
> out of consideration for userland API stability reasons, I would certainly
> prefer to go that way; turns out that these days we can pull it off without
> black magic in descriptor handling, etc.  Linus?

I personally find the check-mode-but-get-a-new-struct-file version to
be less weird that the dup approach.  Either approach will break
scripts that try to write to /dev/stdin (which is the whole point).

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-28 19:04                                           ` Andy Lutomirski
@ 2013-08-28 19:59                                             ` Al Viro
  2013-08-28 21:07                                               ` Andy Lutomirski
  0 siblings, 1 reply; 66+ messages in thread
From: Al Viro @ 2013-08-28 19:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Wed, Aug 28, 2013 at 12:04:43PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 27, 2013 at 11:16 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote:
> >> There are also O_PATH fds, and I'm not sure what the semantics of
> >> O_PATH fds are or should be when they refer to something other than a
> >> directory.
> >
> > O_PATH file just points to specific location in the tree, no more and
> > no less.
> 
> I don't know whether ftruncate(some O_PATH fd) should work.  But this
> probably barely matters.

It shouldn't.  No IO on these guys at all.

> > AFAICS, the *only* cases when we might possibly care are linkat() target,
> > truncate() and open().  Note, BTW, that right now we *do* allow an attempt
> > to reopen a file via procfs symlink r/w, even when file had been r/o.
> > It's subject to permissions on the object being opened, but that's it.
> >
> > I'm not sure we can change that - again, it's a user-visible API, and
> > the change is very likely to break some scripts.  In fact, it's about
> > as dangerous as a full-blown switch to dup-style semantics for procfs
> > opens, and it's a lot less attractive.
> >
> > For truncate() we would only need to have FMODE_WRITE reported, more or
> > less the same way as FMODE_FLINK.  And without open() changes it doesn't
> > buy us anything at all...
> >
> > I've no problem with unrolling the user_path_at() in do_sys_truncate()
> > into an explicit loop by trailing symlinks and checking for indication
> > left by proc_pid_follow_link(), more or less the same way as with
> > LOOKUP_LINK in lookup_last().  It's _far_ less invasive than playing
> > with "oh, here we fill a struct path or maybe a struct file" horrors,
> > pinning struct file for no reason, etc.
> >
> > AFAICS, the real question is whether we dare to change open() behaviour on
> > /proc/*/fd/*.  I've played with that a bit and I believe that I can do
> > the switch to dup-style with very localized changes in fs/namei.c and
> > fs/proc/{base,fd}.c.  Will be even binary compatible kernel-side -
> > ->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error,
> > not that we had many instances to convert.  *IF* that variant is not
> > out of consideration for userland API stability reasons, I would certainly
> > prefer to go that way; turns out that these days we can pull it off without
> > black magic in descriptor handling, etc.  Linus?
> 
> I personally find the check-mode-but-get-a-new-struct-file version to
> be less weird that the dup approach.  Either approach will break
> scripts that try to write to /dev/stdin (which is the whole point).

What, breaking existing userland?  IMO that's a thing to avoid, unless we
have really, really strong reasons not to.  And yes, it goes for both
variants...  FWIW, I'm not convinced that the reasons you are giving for
it are strong enough - passing somebody a read-only file descriptor to
a file they could open for write and relying on their inability to truncate
the fscker just because it's not reachable via any path they've got search
permissions to looks like a Bloody Bad Idea(tm), and not only because it won't
do what you hope it'll do on existing kernels.  It's very easy to fuck up
and end up with a searchable path to the damn thing; e.g. /proc/<pid>/cwd/foo
will bypass the grandparent of foo not being searchable for you, etc.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate
  2013-08-28 19:59                                             ` Al Viro
@ 2013-08-28 21:07                                               ` Andy Lutomirski
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Lutomirski @ 2013-08-28 21:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Willy Tarreau, Ingo Molnar, security,
	Linux Kernel Mailing List, Oleg Nesterov, Linux FS Devel,
	Brad Spengler

On Wed, Aug 28, 2013 at 12:59 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Aug 28, 2013 at 12:04:43PM -0700, Andy Lutomirski wrote:
>> On Tue, Aug 27, 2013 at 11:16 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > On Tue, Aug 27, 2013 at 01:28:27PM -0700, Andy Lutomirski wrote:
>> >> There are also O_PATH fds, and I'm not sure what the semantics of
>> >> O_PATH fds are or should be when they refer to something other than a
>> >> directory.
>> >
>> > O_PATH file just points to specific location in the tree, no more and
>> > no less.
>>
>> I don't know whether ftruncate(some O_PATH fd) should work.  But this
>> probably barely matters.
>
> It shouldn't.  No IO on these guys at all.
>
>> > AFAICS, the *only* cases when we might possibly care are linkat() target,
>> > truncate() and open().  Note, BTW, that right now we *do* allow an attempt
>> > to reopen a file via procfs symlink r/w, even when file had been r/o.
>> > It's subject to permissions on the object being opened, but that's it.
>> >
>> > I'm not sure we can change that - again, it's a user-visible API, and
>> > the change is very likely to break some scripts.  In fact, it's about
>> > as dangerous as a full-blown switch to dup-style semantics for procfs
>> > opens, and it's a lot less attractive.
>> >
>> > For truncate() we would only need to have FMODE_WRITE reported, more or
>> > less the same way as FMODE_FLINK.  And without open() changes it doesn't
>> > buy us anything at all...
>> >
>> > I've no problem with unrolling the user_path_at() in do_sys_truncate()
>> > into an explicit loop by trailing symlinks and checking for indication
>> > left by proc_pid_follow_link(), more or less the same way as with
>> > LOOKUP_LINK in lookup_last().  It's _far_ less invasive than playing
>> > with "oh, here we fill a struct path or maybe a struct file" horrors,
>> > pinning struct file for no reason, etc.
>> >
>> > AFAICS, the real question is whether we dare to change open() behaviour on
>> > /proc/*/fd/*.  I've played with that a bit and I believe that I can do
>> > the switch to dup-style with very localized changes in fs/namei.c and
>> > fs/proc/{base,fd}.c.  Will be even binary compatible kernel-side -
>> > ->atomic_open() returns NULL/ERR_PTR where it used to return 0/-error,
>> > not that we had many instances to convert.  *IF* that variant is not
>> > out of consideration for userland API stability reasons, I would certainly
>> > prefer to go that way; turns out that these days we can pull it off without
>> > black magic in descriptor handling, etc.  Linus?
>>
>> I personally find the check-mode-but-get-a-new-struct-file version to
>> be less weird that the dup approach.  Either approach will break
>> scripts that try to write to /dev/stdin (which is the whole point).
>
> What, breaking existing userland?  IMO that's a thing to avoid, unless we
> have really, really strong reasons not to.  And yes, it goes for both
> variants...  FWIW, I'm not convinced that the reasons you are giving for
> it are strong enough - passing somebody a read-only file descriptor to
> a file they could open for write and relying on their inability to truncate
> the fscker just because it's not reachable via any path they've got search
> permissions to looks like a Bloody Bad Idea(tm), and not only because it won't
> do what you hope it'll do on existing kernels.  It's very easy to fuck up
> and end up with a searchable path to the damn thing; e.g. /proc/<pid>/cwd/foo
> will bypass the grandparent of foo not being searchable for you, etc.
>

This affects O_TMPFILE, for example -- create a file with O_TMPFILE |
O_RDWR and mode 0666 (by accident), write something, then
open("/proc/self/fd/N", O_RDONLY) and send the resulting fd to
someone.  They can't directly write it, but they can reopen it O_RDWR.

I agree that flink is the main issue here, FWIW.  I think that the
current semantics are just too screwed up to make it really safe to
pass around fds with reduced access modes and expect those modes to
stick.

--Andy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: /proc/pid/fd && anon_inode_fops
  2013-08-25 18:32                   ` /proc/pid/fd && anon_inode_fops Linus Torvalds
  2013-08-25 19:11                     ` Al Viro
  2013-08-25 19:17                     ` Andy Lutomirski
@ 2013-09-03 15:58                     ` Pavel Machek
  2 siblings, 0 replies; 66+ messages in thread
From: Pavel Machek @ 2013-09-03 15:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Willy Tarreau, Oleg Nesterov, Andy Lutomirski, security,
	Ingo Molnar, Linux Kernel Mailing List, Linux FS Devel,
	Brad Spengler

Hi!

> > We are really stuck with the current semantics here - switching to
> > *BSD one would not only mean serious surgery on descriptor handling
> > (it's one of the wartier areas in *BSD VFS, in large part because
> > of magic-open-really-a-dup kludges they have to do), it would change
> > a long-standing userland API that had been there for nearly 20 years
> > _and_ one that tends to be used in corner cases of hell knows how many
> > scripts.
> 
> Actually, I'm pretty sure we did have the "dup" semantics at one point
> (long ago), and they were really nice (because you could use them to
> see where in the stream the fd was etc). It just fit so horribly badly
> into the VFS semantics that it got changed into the current "new file
> descriptor" one. Afaik, nothing broke.

Hmm, are you going to break my exploit?

http://www.exploit-db.com/exploits/10038/

I'd like that, because I don't think /proc should allow people to
bypass directory permissions.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2013-09-03 15:58 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-21 19:14 [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Andy Lutomirski
     [not found] ` <CA+55aFxi-ps2f2M8xPhfbuQ0pToqupPrDsLi2+GPUK2sqdYfUw@mail.gmail.com>
     [not found]   ` <CALCETrW7+LcexA6v6RQDKhni_yJAduOmiSDneCpq3v8sPDvwUQ@mail.gmail.com>
2013-08-21 20:16     ` Willy Tarreau
2013-08-22 18:48 ` Linus Torvalds
2013-08-22 18:53   ` Willy Tarreau
2013-08-22 19:05     ` Andy Lutomirski
2013-08-22 19:23       ` Linus Torvalds
2013-08-22 20:10         ` Andy Lutomirski
2013-08-22 20:15           ` Willy Tarreau
2013-08-22 20:22             ` Andy Lutomirski
2013-08-22 20:44               ` Linus Torvalds
2013-08-22 20:48                 ` Andy Lutomirski
2013-08-22 20:54                   ` Linus Torvalds
2013-08-22 20:58                     ` Andy Lutomirski
2013-08-23  1:07                     ` Al Viro
2013-08-25  3:37                       ` Al Viro
2013-08-25  7:26                         ` Andy Lutomirski
2013-08-25 14:23                           ` Al Viro
2013-08-25 17:04                             ` Andy Lutomirski
2013-08-25 19:57                         ` Linus Torvalds
2013-08-25 20:06                           ` Al Viro
2013-08-25 20:23                             ` Linus Torvalds
2013-08-26 17:37                               ` Linus Torvalds
2013-08-26 18:07                                 ` Linus Torvalds
2013-08-26 18:11                                   ` Andy Lutomirski
2013-08-27 19:16                                   ` [RFC PATCH] fs: Add user_file_or_path_at and use it for truncate Andy Lutomirski
2013-08-27 19:32                                     ` Linus Torvalds
2013-08-27 20:28                                       ` Andy Lutomirski
2013-08-28  6:16                                         ` Al Viro
2013-08-28 16:24                                           ` Linus Torvalds
2013-08-28 19:04                                           ` Andy Lutomirski
2013-08-28 19:59                                             ` Al Viro
2013-08-28 21:07                                               ` Andy Lutomirski
2013-08-27 23:08                                     ` Al Viro
2013-08-27 23:13                                       ` Andy Lutomirski
2013-08-24 18:29             ` /proc/pid/fd && anon_inode_fops Oleg Nesterov
2013-08-24 21:24               ` Willy Tarreau
2013-08-25  5:23                 ` Al Viro
2013-08-25  6:50                   ` Willy Tarreau
2013-08-25 18:51                     ` Linus Torvalds
2013-08-25 19:48                       ` Oleg Nesterov
2013-08-25 20:05                         ` Linus Torvalds
2013-08-26 15:33                           ` Oleg Nesterov
2013-08-26 16:37                             ` Oleg Nesterov
2013-08-26 17:54                               ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov
2013-08-26 18:09                                 ` Linus Torvalds
2013-08-26 19:35                                   ` Linus Torvalds
2013-08-26 20:20                                     ` Willy Tarreau
2013-08-27 15:05                                       ` Oleg Nesterov
2013-08-27 14:39                                     ` [PATCH 0/1] proc: make /proc/self point to thread Oleg Nesterov
2013-08-27 14:40                                       ` [PATCH 1/1] " Oleg Nesterov
2013-08-27 16:39                                         ` Linus Torvalds
2013-08-27 17:49                                           ` Oleg Nesterov
2013-08-27 18:15                                             ` Linus Torvalds
2013-08-27 18:28                                               ` Oleg Nesterov
     [not found]                                     ` <CALCETrXP-mYBPRon=0NzexW1FK1Qxz2+Bwv7-WeHBQpvW7ywRg@mail.gmail.com>
2013-08-27 15:45                                       ` [PATCH] proc: make proc_fd_permission() thread-friendly Oleg Nesterov
2013-08-26 18:32                                 ` Eric W. Biederman
2013-08-26 18:46                                   ` Oleg Nesterov
2013-08-26 18:56                                     ` Oleg Nesterov
2013-08-26 19:10                                     ` Eric W. Biederman
2013-08-27 14:53                                       ` Oleg Nesterov
2013-08-25 18:32                   ` /proc/pid/fd && anon_inode_fops Linus Torvalds
2013-08-25 19:11                     ` Al Viro
2013-08-25 19:17                     ` Andy Lutomirski
2013-09-03 15:58                     ` Pavel Machek
2013-08-25 15:45                 ` Oleg Nesterov
2013-08-22 19:39       ` [PATCH v2] vfs: Tighten up linkat(..., AT_EMPTY_PATH) Willy Tarreau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).