All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 resend] vfs: new O_NODE open flag
@ 2009-11-05 11:23 Miklos Szeredi
  2009-11-05 13:15 ` Alan Cox
  0 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-05 11:23 UTC (permalink / raw)
  To: akpm
  Cc: viro, dhowells, hch, adilger, mtk.manpages, torvalds, drepper,
	jamie, linux-fsdevel, linux-kernel

Hi Andrew,

What's up with this patch?

Al is offline (as usual), you haven't responded to my last request to
take it into -mm.

It would be so much more fun to contribute to VFS if we had a
maintainer who took less than three months to respond ;-/

Thanks,
Miklos

---
From: Miklos Szeredi <mszeredi@suse.cz>

This patch adds a new open flag, O_NODE.  This flag means: open just
the filesystem node instead of the object referenced by the node.

This allows for a couple of new things:

 - opening a special file (device/socket/fifo) without side effects

 - opening a symlink

 - re-opening normally after checking file type (there's a debate
   whether this would have security issues, but currently we do allow
   re-opening with increased permissions thorugh /proc/*/fd)

 - opening any type of file without any permission is also possible
   (of course the resuling file descriptor may not be read or written)

The above allows fstat(), fchmod(), ioctl(), etc to be used for files
previously not possible.  AFS for example wanted a "pioctl()" syscall
for various things, but the same can be done without having to define
a new syscall:

     fd = open(path, O_NODE);
     ioctl(fd, ...);
     close(fd);

Another important use would be opening a directory without read
permission (see O_SEARCH from the POSIX draft for what this is useful
for).  O_NODE is not quite equivalent to O_SEARCH, but similar and
equally useful.

Opening a file will not call the driver's ->open() method and will not
have any side effect other than referencing the dentry/vfsmount from
the struct file pointer.

Currently O_NODE can only be used together with O_NOACCESS (value 3),
meaning that read(2) and write(2) will return EBADF and no permission
is necessary to open the file.  This is logical for device nodes and
fifos.  For directories we could allow O_RDONLY, and for regular files
any access mode, but this is not done yet.

O_NODE used together with O_NOFOLLOW will open a symlink node itself
instead of returning ELOOP.  O_NOFOLLOW will not prevent following
mountpoints if used together with O_NODE.  In theory we could allow
O_RDONLY for symlinks too and return the contents of the link on
read(2).  Permissions on symlinks cannot be changed, so make fchmod()
fail with ELOOP on such a file descriptor.

Filesystems which want to install special file operations for O_NODE
opens (e.g. ioctl) may set S_OPEN_NODE flag in the inode.  This will
cause dentry_open() to call inode->i_fop->open, and the filesystem is
responsible for handling the O_NODE flag and installing the right
filesystem operations in file->f_op.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: David Howells <dhowells@redhat.com>
---
 fs/file_table.c             |    6 ++--
 fs/locks.c                  |    3 ++
 fs/namei.c                  |   65 ++++++++++++++++++++++++++------------------
 fs/open.c                   |   17 ++++++++++-
 include/asm-generic/fcntl.h |    4 ++
 include/linux/fs.h          |    1 
 6 files changed, 67 insertions(+), 29 deletions(-)

Index: linux-2.6/fs/file_table.c
===================================================================
--- linux-2.6.orig/fs/file_table.c	2009-09-24 20:10:58.000000000 +0200
+++ linux-2.6/fs/file_table.c	2009-10-08 09:47:05.000000000 +0200
@@ -281,8 +281,10 @@ void __fput(struct file *file)
 		file->f_op->release(inode, file);
 	security_file_free(file);
 	ima_file_free(file);
-	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL))
-		cdev_put(inode->i_cdev);
+	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL)) {
+		if (!(file->f_flags & O_NODE))
+			cdev_put(inode->i_cdev);
+	}
 	fops_put(file->f_op);
 	put_pid(file->f_owner.pid);
 	file_kill(file);
Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c	2009-09-24 20:10:26.000000000 +0200
+++ linux-2.6/fs/locks.c	2009-10-08 09:47:05.000000000 +0200
@@ -1183,6 +1183,9 @@ int __break_lease(struct inode *inode, u
 	unsigned long break_time;
 	int i_have_this_lease = 0;
 
+	if (!(mode & (FMODE_READ | FMODE_WRITE)))
+		return 0;
+
 	new_fl = lease_alloc(NULL, mode & FMODE_WRITE ? F_WRLCK : F_RDLCK);
 
 	lock_kernel();
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2009-09-24 20:10:26.000000000 +0200
+++ linux-2.6/fs/namei.c	2009-10-08 09:47:05.000000000 +0200
@@ -1160,7 +1160,9 @@ static int path_lookup_open(int dfd, con
 	nd->intent.open.file = filp;
 	nd->intent.open.flags = open_flags;
 	nd->intent.open.create_mode = 0;
-	err = do_path_lookup(dfd, name, lookup_flags|LOOKUP_OPEN, nd);
+	if (!(open_flags & O_NODE))
+		lookup_flags |= LOOKUP_OPEN;
+	err = do_path_lookup(dfd, name, lookup_flags, nd);
 	if (IS_ERR(nd->intent.open.file)) {
 		if (err == 0) {
 			err = PTR_ERR(nd->intent.open.file);
@@ -1511,22 +1513,27 @@ int may_open(struct path *path, int acc_
 	if (!inode)
 		return -ENOENT;
 
-	switch (inode->i_mode & S_IFMT) {
-	case S_IFLNK:
-		return -ELOOP;
-	case S_IFDIR:
-		if (acc_mode & MAY_WRITE)
-			return -EISDIR;
-		break;
-	case S_IFBLK:
-	case S_IFCHR:
-		if (path->mnt->mnt_flags & MNT_NODEV)
+	if ((flag & O_NODE)) {
+		if (acc_mode & (MAY_READ | MAY_WRITE))
 			return -EACCES;
-		/*FALLTHRU*/
-	case S_IFIFO:
-	case S_IFSOCK:
-		flag &= ~O_TRUNC;
-		break;
+	} else {
+		switch (inode->i_mode & S_IFMT) {
+		case S_IFLNK:
+			return -ELOOP;
+		case S_IFDIR:
+			if (acc_mode & MAY_WRITE)
+				return -EISDIR;
+			break;
+		case S_IFBLK:
+		case S_IFCHR:
+			if (path->mnt->mnt_flags & MNT_NODEV)
+				return -EACCES;
+			/*FALLTHRU*/
+		case S_IFIFO:
+		case S_IFSOCK:
+			flag &= ~O_TRUNC;
+			break;
+		}
 	}
 
 	error = inode_permission(inode, acc_mode);
@@ -1639,14 +1646,16 @@ out_unlock:
  *	11 - read-write
  * for the internal routines (ie open_namei()/follow_link() etc)
  * This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
+ * to be used for opening a symlink, pipe, socket or device with
+ * O_NODE.
  *
 */
 static inline int open_to_namei_flags(int flag)
 {
 	if ((flag+1) & O_ACCMODE)
 		flag++;
+	else if (flag & O_NODE)
+		flag &= ~O_ACCMODE;
 	return flag;
 }
 
@@ -1734,7 +1743,9 @@ struct file *do_filp_open(int dfd, const
 	nd.intent.open.create_mode = mode;
 	dir = nd.path.dentry;
 	nd.flags &= ~LOOKUP_PARENT;
-	nd.flags |= LOOKUP_CREATE | LOOKUP_OPEN;
+	nd.flags |= LOOKUP_CREATE;
+	if (!(open_flag & O_NODE))
+		nd.flags |= LOOKUP_OPEN;
 	if (flag & O_EXCL)
 		nd.flags |= LOOKUP_EXCL;
 	mutex_lock(&dir->d_inode->i_mutex);
@@ -1793,19 +1804,24 @@ do_last:
 
 	if (__follow_mount(&path)) {
 		error = -ELOOP;
-		if (flag & O_NOFOLLOW)
+		if ((flag & O_NOFOLLOW) & !(flag & O_NODE))
 			goto exit_dput;
 	}
 
 	error = -ENOENT;
 	if (!path.dentry->d_inode)
 		goto exit_dput;
-	if (path.dentry->d_inode->i_op->follow_link)
-		goto do_link;
+	if (path.dentry->d_inode->i_op->follow_link) {
+		if (!(flag & O_NOFOLLOW))
+			goto do_link;
+		error = -ELOOP;
+		if (!(flag & O_NODE))
+			goto exit_dput;
+	}
 
 	path_to_nameidata(&path, &nd);
 	error = -EISDIR;
-	if (path.dentry->d_inode && S_ISDIR(path.dentry->d_inode->i_mode))
+	if (S_ISDIR(path.dentry->d_inode->i_mode))
 		goto exit;
 ok:
 	/*
@@ -1859,9 +1875,6 @@ exit_parent:
 	return ERR_PTR(error);
 
 do_link:
-	error = -ELOOP;
-	if (flag & O_NOFOLLOW)
-		goto exit_dput;
 	/*
 	 * This is subtle. Instead of calling do_follow_link() we do the
 	 * thing by hands. The reason is that this way we have zero link_count
Index: linux-2.6/fs/open.c
===================================================================
--- linux-2.6.orig/fs/open.c	2009-09-24 20:10:26.000000000 +0200
+++ linux-2.6/fs/open.c	2009-10-08 09:50:36.000000000 +0200
@@ -611,6 +611,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd
 	dentry = file->f_path.dentry;
 	inode = dentry->d_inode;
 
+	/* fchmod not allowed for symlinks opened with O_NODE */
+	err = -ELOOP;
+	if (S_ISLNK(inode->i_mode))
+		goto out_putf;
+
 	audit_inode(NULL, dentry);
 
 	err = mnt_want_write_file(file);
@@ -804,12 +809,17 @@ static inline int __get_file_write_acces
 	return error;
 }
 
+static const struct file_operations default_node_fops = {
+	.llseek = no_llseek,
+};
+
 static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 					int flags, struct file *f,
 					int (*open)(struct inode *, struct file *),
 					const struct cred *cred)
 {
 	struct inode *inode;
+	const struct file_operations *fops;
 	int error;
 
 	f->f_flags = flags;
@@ -828,7 +838,12 @@ static struct file *__dentry_open(struct
 	f->f_path.dentry = dentry;
 	f->f_path.mnt = mnt;
 	f->f_pos = 0;
-	f->f_op = fops_get(inode->i_fop);
+
+	fops = inode->i_fop;
+	if ((flags & O_NODE) && !(inode->i_flags & S_OPEN_NODE))
+		fops = &default_node_fops;
+	f->f_op = fops_get(fops);
+
 	file_move(f, &inode->i_sb->s_files);
 
 	error = security_dentry_open(f, cred);
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h	2009-09-24 20:10:58.000000000 +0200
+++ linux-2.6/include/asm-generic/fcntl.h	2009-10-08 09:47:05.000000000 +0200
@@ -9,6 +9,7 @@
 #define O_RDONLY	00000000
 #define O_WRONLY	00000001
 #define O_RDWR		00000002
+#define O_NOACCESS	00000003
 #ifndef O_CREAT
 #define O_CREAT		00000100	/* not fcntl */
 #endif
@@ -51,6 +52,9 @@
 #ifndef O_CLOEXEC
 #define O_CLOEXEC	02000000	/* set close_on_exec */
 #endif
+#ifndef O_NODE
+#define O_NODE		04000000	/* open filesystem node, not device */
+#endif
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2009-10-08 09:46:44.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2009-10-08 09:47:05.000000000 +0200
@@ -231,6 +231,7 @@ struct inodes_stat_t {
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE	512	/* Inode is fs-internal */
+#define S_OPEN_NODE	1024	/* Fs is prepared for O_NODE opens */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 11:23 [PATCH v2 resend] vfs: new O_NODE open flag Miklos Szeredi
@ 2009-11-05 13:15 ` Alan Cox
  2009-11-05 14:27   ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Cox @ 2009-11-05 13:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, viro, dhowells, hch, adilger, mtk.manpages, torvalds,
	drepper, jamie, linux-fsdevel, linux-kernel

>  - re-opening normally after checking file type (there's a debate
>    whether this would have security issues, but currently we do allow
>    re-opening with increased permissions thorugh /proc/*/fd)

Which has already been demonstrated to be an (unfixed) security hole.

> Filesystems which want to install special file operations for O_NODE
> opens (e.g. ioctl) may set S_OPEN_NODE flag in the inode.  This will

Wrong way around. The defailt should be that O_NODE fails for any handle
which has not specifically added support.

It'll fill itself in where it matters pretty fast that way but cases will
get reviewed which is critical.

You also need to address the open with no permissions pinning a removable
device question.

Alan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 13:15 ` Alan Cox
@ 2009-11-05 14:27   ` Miklos Szeredi
  2009-11-05 14:50     ` Alan Cox
  2009-11-06 14:17     ` Pavel Machek
  0 siblings, 2 replies; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-05 14:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: miklos, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

On Thu, 5 Nov 2009, Alan Cox wrote:
> >  - re-opening normally after checking file type (there's a debate
> >    whether this would have security issues, but currently we do allow
> >    re-opening with increased permissions thorugh /proc/*/fd)
> 
> Which has already been demonstrated to be an (unfixed) security hole.

No it hasn't :)  Jamie theorized that there *might* be a real world
situation where the application writer didn't anticipate this
behavior.  But as to actual demonstration, we have not seen one yet, I
think.

And as for reopening O_NODE files with increased permission: that's
feature people actually expressed interest in, so it's hardly a
security hole, is it?

> 
> > Filesystems which want to install special file operations for O_NODE
> > opens (e.g. ioctl) may set S_OPEN_NODE flag in the inode.  This will
> 
> Wrong way around. The defailt should be that O_NODE fails for any handle
> which has not specifically added support.

Why?  O_NODE can be nicely implemented without any filesystem support.

The only filesystems that need to do anything special is things like
AFS which for example want to implement special ioctls, which work on
the node itself.

> You also need to address the open with no permissions pinning a removable
> device question.

The whole point of O_NODE is that it doesn't do that, it only goes as
far as the mnt/dentry for the filesystem node and not further.  It
does not get to touch the device at all, so it can't pin it or have
any other side effect.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 14:27   ` Miklos Szeredi
@ 2009-11-05 14:50     ` Alan Cox
  2009-11-05 15:24       ` Miklos Szeredi
  2009-11-06 14:17     ` Pavel Machek
  1 sibling, 1 reply; 21+ messages in thread
From: Alan Cox @ 2009-11-05 14:50 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: miklos, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

On Thu, 05 Nov 2009 15:27:06 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Thu, 5 Nov 2009, Alan Cox wrote:
> > >  - re-opening normally after checking file type (there's a debate
> > >    whether this would have security issues, but currently we do allow
> > >    re-opening with increased permissions thorugh /proc/*/fd)
> > 
> > Which has already been demonstrated to be an (unfixed) security hole.
> 
> No it hasn't :)  Jamie theorized that there *might* be a real world
> situation where the application writer didn't anticipate this
> behavior.  But as to actual demonstration, we have not seen one yet, I
> think.

The examples on the list are not entirely theoretical but based on
accepted and normal behaviour for application programming and Unix
security models -so they are a security bug, minor or otherwise.

Fortunately you can patch it by hand.

> And as for reopening O_NODE files with increased permission: that's
> feature people actually expressed interest in, so it's hardly a
> security hole, is it?

Its a very unexpected semantic particularly for a passed file handle.

> > Wrong way around. The defailt should be that O_NODE fails for any handle
> > which has not specifically added support.
> 
> Why?  O_NODE can be nicely implemented without any filesystem support.

So that you audit the behaviour for unexpected surprises as you go. And
in most filesystem cases that consists of "dum de dum, nothing to see,
add default handler, tick".

But that isn't the case for some things - consider CIFS and other network
file systems.

> The only filesystems that need to do anything special is things like
> AFS which for example want to implement special ioctls, which work on
> the node itself.
> 
> > You also need to address the open with no permissions pinning a removable
> > device question.
> 
> The whole point of O_NODE is that it doesn't do that, it only goes as
> far as the mnt/dentry for the filesystem node and not further.  It
> does not get to touch the device at all, so it can't pin it or have
> any other side effect.

You have a reference to the mnt/dentry pinned so how will you unmount the
volume ?

Alan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 14:50     ` Alan Cox
@ 2009-11-05 15:24       ` Miklos Szeredi
  2009-11-05 15:56         ` Alan Cox
  2009-11-06  1:40         ` Jamie Lokier
  0 siblings, 2 replies; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-05 15:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: miklos, miklos, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

On Thu, 5 Nov 2009, Alan Cox wrote:
> The examples on the list are not entirely theoretical but based on
> accepted and normal behaviour for application programming and Unix
> security models -so they are a security bug, minor or otherwise.
> 
> Fortunately you can patch it by hand.

How do you patch it by hand?

> > And as for reopening O_NODE files with increased permission: that's
> > feature people actually expressed interest in, so it's hardly a
> > security hole, is it?
> 
> Its a very unexpected semantic particularly for a passed file handle.

All of this is about new and unexpected semantics.  I don't think
anything more needs to be done than document it in the manpage:

  "A file descriptor opened with O_NODE | O_NOACCESS may be used to
   re-open the same file later with increased permissions
   (e.g. O_RDWR) if the access mode allows.  This is true even if the
   permissions on the path leading up to the file would prevent it"

> > > Wrong way around. The defailt should be that O_NODE fails for any handle
> > > which has not specifically added support.
> > 
> > Why?  O_NODE can be nicely implemented without any filesystem support.
> 
> So that you audit the behaviour for unexpected surprises as you go. And
> in most filesystem cases that consists of "dum de dum, nothing to see,
> add default handler, tick".
> 
> But that isn't the case for some things - consider CIFS and other network
> file systems.

Why?

Why would the server need to know anything about that?  O_NODE is
similar to a chdir() in this respect, and chdir doesn't have a handler
either.

> > > You also need to address the open with no permissions pinning a removable
> > > device question.
> > 
> > The whole point of O_NODE is that it doesn't do that, it only goes as
> > far as the mnt/dentry for the filesystem node and not further.  It
> > does not get to touch the device at all, so it can't pin it or have
> > any other side effect.
> 
> You have a reference to the mnt/dentry pinned so how will you unmount the
> volume ?

Oh, I see what you are getting at.  So the situation is this: the root
of the volume does not allow any access to the user, so normal
open/chdir won't work.  Yet open(O_NODE) will and so user can pin the
volume.

However, there's not all that much difference between the above and
doing "stat()" on the mountpoint in a tight loop, except the former is
a more reliable way to prevent unmounting.

I don't see this as a huge issue, but ideas about handling it better
are welcome.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 15:24       ` Miklos Szeredi
@ 2009-11-05 15:56         ` Alan Cox
  2009-11-05 16:52           ` Miklos Szeredi
  2009-11-06  1:40         ` Jamie Lokier
  1 sibling, 1 reply; 21+ messages in thread
From: Alan Cox @ 2009-11-05 15:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: miklos, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

> > Fortunately you can patch it by hand.
> 
> How do you patch it by hand?

I find "joe" quite useful, some people prefer vi or emacs

> All of this is about new and unexpected semantics.  I don't think
> anything more needs to be done than document it in the manpage:

It's a security changing behaviour. It's effectively a regression for
security and that is usually bad news.
 
>   "A file descriptor opened with O_NODE | O_NOACCESS may be used to
>    re-open the same file later with increased permissions
>    (e.g. O_RDWR) if the access mode allows.  This is true even if the
>    permissions on the path leading up to the file would prevent it"

Which is contrary to the assumptions made by systems designers for the
past forty years, so its a very dangerous assumption to break.

What are the sematics with regards to vhangup ?

What are the sematics of O_NODE opening a device file when the device is
later unloaded and a new device is created on the same node with totally
unrelated permissions ? [happens all the time btw]

> > But that isn't the case for some things - consider CIFS and other network
> > file systems.
> 
> Why?

open O_NODE
remote file moves
new one appears
reopen

Now what should happen and what does happen ?

> of the volume does not allow any access to the user, so normal
> open/chdir won't work.  Yet open(O_NODE) will and so user can pin the
> volume.

and without permission on the node.

> However, there's not all that much difference between the above and
> doing "stat()" on the mountpoint in a tight loop, except the former is
> a more reliable way to prevent unmounting.

That doesn't seem to be the case testing it, but its fixable trivially if
so and its fixable without API breakage. Its at worst an implementation
corner case. The O_NODE case is a real nasty API level problem,
particularly as Linux still lacks revoke().

In fact I'd say given the need to get rid of O_NODE references to an
object that you need revoke() first and that revoke must kill O_NODE
references to an object as well.

Alan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 15:56         ` Alan Cox
@ 2009-11-05 16:52           ` Miklos Szeredi
  2009-11-05 18:25             ` Alan Cox
  0 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-05 16:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: miklos, miklos, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

On Thu, 5 Nov 2009, Alan Cox wrote:
> > > Fortunately you can patch it by hand.
> > 
> > How do you patch it by hand?
> 
> I find "joe" quite useful, some people prefer vi or emacs

Emacs, please.  But I'm not quite sure anymore what we are talking
about :)

> >   "A file descriptor opened with O_NODE | O_NOACCESS may be used to
> >    re-open the same file later with increased permissions
> >    (e.g. O_RDWR) if the access mode allows.  This is true even if the
> >    permissions on the path leading up to the file would prevent it"
> 
> Which is contrary to the assumptions made by systems designers for the
> past forty years, so its a very dangerous assumption to break.

I don't know.  I would be surprised if anyone actually found a setup
where the current "hole" wrt proc makes any difference.

Take a step back and look at what would be required for this to be
exploitable:

 1) a file which is readable and writable by some user
 2) but is not reachable by said user (because of permissions on path)
 3) a privileged process opening this file with O_RDONLY
 4) sending the fd it to an unprivileged process owned by user
 5) assuming that user won't be able to write it (even though the file
    has write permission)

If I was a system designer, I'd think of that as a very fragile
assumption.

I'm still not sure which is more harmful, leaving this "hole" as it
is, or fixing it, thereby possibly breaking uses of this (mis)feature.
IMO the latter is far more likely to cause problems.

> What are the sematics with regards to vhangup ?

The file descriptor opened with O_NODE cannot refer to a terminal.

> What are the sematics of O_NODE opening a device file when the device is
> later unloaded and a new device is created on the same node with totally
> unrelated permissions ? [happens all the time btw]

Right, that's a valid point.

So re-opening a device node opened with O_NODE is not safe, I agree.
Which means, I'll either have to remove the possibilty of re-opening
O_NODE files through proc, or limit it to non-device nodes.

I'd really prefer just limiting it, since that would leave re-opening
a useful feature, while having minimal risk (especially if documented)
of it causing trouble.

But I can be convinced either way with sufficiently good arguments :)

> > > But that isn't the case for some things - consider CIFS and other network
> > > file systems.
> > 
> > Why?
> 
> open O_NODE
> remote file moves
> new one appears
> reopen

Consider

 open O_RDONLY
 remote file moves
 new one appears
 fstat

What's the difference?

> Now what should happen and what does happen ?

I'd like fstat() to work in that situation as well, but any patches in
that direction consistently get rejected by Christoph, saying
"CIFS/FUSE/etc are broken by design, we shouldn't care".

> > of the volume does not allow any access to the user, so normal
> > open/chdir won't work.  Yet open(O_NODE) will and so user can pin the
> > volume.
> 
> and without permission on the node.
> 
> > However, there's not all that much difference between the above and
> > doing "stat()" on the mountpoint in a tight loop, except the former is
> > a more reliable way to prevent unmounting.
> 
> That doesn't seem to be the case testing it, but its fixable trivially if
> so and its fixable without API breakage.

No it's not.  While a node is looked up, it will pin the mount, albeit
for just a short time.  And permission on the inode itself are not
required for lookup, only permissions on the parent.

I think you are being paranoid here.  If the user has access to the
path leading up the mountpoint, he might as well pin that mount.
Permissions on the mount itself shouldn't really make a difference.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 16:52           ` Miklos Szeredi
@ 2009-11-05 18:25             ` Alan Cox
  2009-11-06 11:10               ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Cox @ 2009-11-05 18:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: miklos, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

> Take a step back and look at what would be required for this to be
> exploitable:

Oh I realise its very minor but care is needed, and its the cases that
haven't been considered that I worry about far more - eg the way you
missed the device stuff.

>  1) a file which is readable and writable by some user
>  2) but is not reachable by said user (because of permissions on path)
>  3) a privileged process opening this file with O_RDONLY
>  4) sending the fd it to an unprivileged process owned by user
>  5) assuming that user won't be able to write it (even though the file
>     has write permission)

The case I found examples of is this

	normal uid process
	exec setuid process
	chdir into locked away directory
	setreuid back

That case is safe because the process is undumpable/unptraceable (which
also keeps /proc/*/fd covered)

The case where it then execs unpriviledged code is interesting. It would
then open up the proc hole but its also true that dumpable would be false
so ptrace could also be used. So its not adding a hole.

> If I was a system designer, I'd think of that as a very fragile
> assumption.

Agreed. Read app code for an afternoon ;)

> Right, that's a valid point.
> 
> So re-opening a device node opened with O_NODE is not safe, I agree.
> Which means, I'll either have to remove the possibilty of re-opening
> O_NODE files through proc, or limit it to non-device nodes.
> 
> I'd really prefer just limiting it, since that would leave re-opening
> a useful feature, while having minimal risk (especially if documented)
> of it causing trouble.

It's certainly a better starting point. See why I said it should default
to unsupported and get added as you review each case ?

> But I can be convinced either way with sufficiently good arguments :)

Basically you need revoke(). That isn't new news either

> > open O_NODE
> > remote file moves
> > new one appears
> > reopen
> 
> Consider
> 
>  open O_RDONLY
>  remote file moves
>  new one appears
>  fstat
> 
> What's the difference?

CIFS in the second case gets a handle to the remote object I believe. NFS
certainly gets it right.

> > Now what should happen and what does happen ?

You stat the wrong file, the O_RDONLY case should stat the right file.
Similar for NFS as NFS will acquire a handle to the file. There isn't any
reason any of these shouldn't work but the moment you start going into
the fs you disappear back into the murky world of unmount

Possibly Christoph is broken as designed ;)

The unix worldview is essentially that a name is an unreliable reference
- it's a lookup directory that changes all the time. A file handle is an
object reference. It persists with the object, its effectively even
refcounted. The handle *is* the thing itself, the name is a transient
momentary thing.

> > That doesn't seem to be the case testing it, but its fixable trivially if
> > so and its fixable without API breakage.
> 
> No it's not.  While a node is looked up, it will pin the mount, albeit
> for just a short time.  And permission on the inode itself are not
> required for lookup, only permissions on the parent.
> 
> I think you are being paranoid here.  If the user has access to the
> path leading up the mountpoint, he might as well pin that mount.
> Permissions on the mount itself shouldn't really make a difference.

Which gets us back to revoke() strangely enough. I am being paranoid. But
security is about being paranoid because you have to out-think all the
bad guys all the time.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 15:24       ` Miklos Szeredi
  2009-11-05 15:56         ` Alan Cox
@ 2009-11-06  1:40         ` Jamie Lokier
  2009-11-06 11:31           ` Miklos Szeredi
  1 sibling, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2009-11-06  1:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alan Cox, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, linux-fsdevel, linux-kernel

Miklos Szeredi wrote:
>   "A file descriptor opened with O_NODE | O_NOACCESS may be used to
>    re-open the same file later with increased permissions
>    (e.g. O_RDWR) if the access mode allows.  This is true even if the
>    permissions on the path leading up to the file would prevent it"

It isn't just "the path".

The same issues apply to a file which has been deleted.  Having been
passed a file handle from some other process, you are granted greater
access to a file which has no path at all and no other handles open to
it, which it's reasonable unix security tradition to assume can't be
done.

It's not quite the same issue as /proc/PID/fd.  Someone must have
explicitly used O_NODE, which means they intend for access to be
upgradable later; they won't be surprised by it happening.

But I still think the re-open access should be limited to whatever was
the original access mode, in the same way as has been discussed for
/proc/PID/fd.

So you'd use O_NODE|O_RDWR if you want someone to be able to re-open
the file itself later with O_RDWR acces.  Use O_NODE|O_RDONLY if you
want them to be only able to re-open the file itself with O_RDONLY
access.  That would limit O_NODE|O_NOACCESS to only being able to
re-open with O_NODE|O_NOACCESS again (because O_NOACCESS by itself
isn't allowed).

Is there any reason why O_NODE|O_RDWR cannot be used for that purpose?

> Why would the server need to know anything about that?  O_NODE is
> similar to a chdir() in this respect, and chdir doesn't have a handler
> either.

chdir() needs execute access.

Note we're a bit broken w.r.t. current POSIX regarding fchdir() and
execute-only directories.  It would be good to fix that.

However, it might be possible to craft a "non-pinning inode reference"
in a similar way to inotify.  Either by not referencing the inode
directly (like inotify), or by creating a weak reference method, which
would be more reliable on filesystems without stable inode numbers.

Actually a non-pinning inode reference would be handy for other things
too.  *Must resist temptation to implement O_NOPIN option for open
files generally ;-)*

> However, there's not all that much difference between the above and
> doing "stat()" on the mountpoint in a tight loop, except the former is
> a more reliable way to prevent unmounting.

Are you sure that stops unmounting?  Doesn't unmounting just sit in a
lock waitqueue somewhere like a regular rwlock writer, until it's time
comes?

-- Jamie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 18:25             ` Alan Cox
@ 2009-11-06 11:10               ` Miklos Szeredi
  0 siblings, 0 replies; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-06 11:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: miklos, miklos, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

On Thu, 5 Nov 2009, Alan Cox wrote:
> >  open O_RDONLY
> >  remote file moves
> >  new one appears
> >  fstat
> > 
> > What's the difference?
> 
> CIFS in the second case gets a handle to the remote object I believe. NFS
> certainly gets it right.

There's no f_op->fgetattr(), just i_op->getattr().

NFS gets it right, but NFSD is in the kernel and it can juggle with
looking up the inode by its number (for certain filesystems),
something CIFS is unlikely to be able to do.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-06  1:40         ` Jamie Lokier
@ 2009-11-06 11:31           ` Miklos Szeredi
  0 siblings, 0 replies; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-06 11:31 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: miklos, alan, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, linux-fsdevel, linux-kernel

On Fri, 6 Nov 2009, Jamie Lokier wrote:
> Miklos Szeredi wrote:
> >   "A file descriptor opened with O_NODE | O_NOACCESS may be used to
> >    re-open the same file later with increased permissions
> >    (e.g. O_RDWR) if the access mode allows.  This is true even if the
> >    permissions on the path leading up to the file would prevent it"
> 
> It isn't just "the path".
> 
> The same issues apply to a file which has been deleted.  Having been
> passed a file handle from some other process, you are granted greater
> access to a file which has no path at all and no other handles open to
> it, which it's reasonable unix security tradition to assume can't be
> done.
> 
> It's not quite the same issue as /proc/PID/fd.  Someone must have
> explicitly used O_NODE, which means they intend for access to be
> upgradable later; they won't be surprised by it happening.
> 
> But I still think the re-open access should be limited to whatever was
> the original access mode, in the same way as has been discussed for
> /proc/PID/fd.
> 
> So you'd use O_NODE|O_RDWR if you want someone to be able to re-open
> the file itself later with O_RDWR acces.  Use O_NODE|O_RDONLY if you
> want them to be only able to re-open the file itself with O_RDONLY
> access.  That would limit O_NODE|O_NOACCESS to only being able to
> re-open with O_NODE|O_NOACCESS again (because O_NOACCESS by itself
> isn't allowed).
> 
> Is there any reason why O_NODE|O_RDWR cannot be used for that purpose?

It could. 

> 
> > Why would the server need to know anything about that?  O_NODE is
> > similar to a chdir() in this respect, and chdir doesn't have a handler
> > either.
> 
> chdir() needs execute access.

Yes, but the point is: neither chdir nor O_NODE *do* anything to the
filesystem.  They just hold a reference to a filesystem node.

> However, it might be possible to craft a "non-pinning inode reference"
> in a similar way to inotify.  Either by not referencing the inode
> directly (like inotify), or by creating a weak reference method, which
> would be more reliable on filesystems without stable inode numbers.
> 
> Actually a non-pinning inode reference would be handy for other things
> too.  *Must resist temptation to implement O_NOPIN option for open
> files generally ;-)*

Non-pinning reference and revoke are nice concepts, but I think they
are not easy to implement without adding overhead to common usage.

> > However, there's not all that much difference between the above and
> > doing "stat()" on the mountpoint in a tight loop, except the former is
> > a more reliable way to prevent unmounting.
> 
> Are you sure that stops unmounting?  Doesn't unmounting just sit in a
> lock waitqueue somewhere like a regular rwlock writer, until it's time
> comes?

No.  Lookup grabs a reference on the vfsmount, and umount will return
EBUSY if it finds that the vfsmount is still in use.  For stat() the
window is small, though, so it's probably difficult to actually make
it block umount.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-05 14:27   ` Miklos Szeredi
  2009-11-05 14:50     ` Alan Cox
@ 2009-11-06 14:17     ` Pavel Machek
  2009-11-06 20:55       ` Eric W. Biederman
  1 sibling, 1 reply; 21+ messages in thread
From: Pavel Machek @ 2009-11-06 14:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alan Cox, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

On Thu 2009-11-05 15:27:06, Miklos Szeredi wrote:
> On Thu, 5 Nov 2009, Alan Cox wrote:
> > >  - re-opening normally after checking file type (there's a debate
> > >    whether this would have security issues, but currently we do allow
> > >    re-opening with increased permissions thorugh /proc/*/fd)
> > 
> > Which has already been demonstrated to be an (unfixed) security hole.
> 
> No it hasn't :)  Jamie theorized that there *might* be a real world
> situation where the application writer didn't anticipate this
> behavior.  But as to actual demonstration, we have not seen one yet, I
> think.

See bugtraq, or lkml thread about symlinks with permissions. There's
demo script there.

> And as for reopening O_NODE files with increased permission: that's
> feature people actually expressed interest in, so it's hardly a
> security hole, is it?

Just because people want it does not mean it is not a security hole.

Consider passing /etc/shadow filedesciptor to (legacy) suid root
program. Maybe it now prints /etc/shadow content, because it assumes
that if you have fd you are allowed to read the file?

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-06 14:17     ` Pavel Machek
@ 2009-11-06 20:55       ` Eric W. Biederman
  2009-11-07  7:49         ` Miklos Szeredi
  2009-11-09  8:58         ` Pavel Machek
  0 siblings, 2 replies; 21+ messages in thread
From: Eric W. Biederman @ 2009-11-06 20:55 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Miklos Szeredi, Alan Cox, akpm, viro, dhowells, hch, adilger,
	mtk.manpages, torvalds, drepper, jamie, linux-fsdevel,
	linux-kernel

Pavel Machek <pavel@ucw.cz> writes:

> On Thu 2009-11-05 15:27:06, Miklos Szeredi wrote:
>> On Thu, 5 Nov 2009, Alan Cox wrote:
>> > >  - re-opening normally after checking file type (there's a debate
>> > >    whether this would have security issues, but currently we do allow
>> > >    re-opening with increased permissions thorugh /proc/*/fd)
>> > 
>> > Which has already been demonstrated to be an (unfixed) security hole.
>> 
>> No it hasn't :)  Jamie theorized that there *might* be a real world
>> situation where the application writer didn't anticipate this
>> behavior.  But as to actual demonstration, we have not seen one yet, I
>> think.
>
> See bugtraq, or lkml thread about symlinks with permissions. There's
> demo script there.

Exactly a theoretical discussion, that demonstrates user space
applications with security holes can be written if they make
assumptions about the world that are not true.

So far no one who believes this to be a security hole has found it
worth their while to look at nd->intent.open in proc_pid_follow_link
and write a patch.    Pavel you started out asking for help on how
to do that and I think I have answered the original question.

I am tired of the whining.  If no one who is persuaded the kernel is
wrong can be bothered to write a possibly buggy 5 line patch this is
clearly not a security hole.

Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-06 20:55       ` Eric W. Biederman
@ 2009-11-07  7:49         ` Miklos Szeredi
  2009-11-07 11:09           ` Eric W. Biederman
  2009-11-09  8:58         ` Pavel Machek
  1 sibling, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-07  7:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: pavel, miklos, alan, akpm, viro, dhowells, hch, adilger,
	mtk.manpages, torvalds, drepper, jamie, linux-fsdevel,
	linux-kernel

On Fri, 06 Nov 2009, ebiederm@xmission.com (Eric W. Biederman wrote:
> So far no one who believes this to be a security hole has found it
> worth their while to look at nd->intent.open in proc_pid_follow_link
> and write a patch.

A rather disgusting patch that would be.  The fact is, checking
permissions on follow_link makes little to no sense.  Consider
truncate(2), for example.  Will we add another intent for that?  I
really hope not.

I'm more and more convinced, that the current behavior is the right
one.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-07  7:49         ` Miklos Szeredi
@ 2009-11-07 11:09           ` Eric W. Biederman
  2009-11-07 11:31             ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2009-11-07 11:09 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, alan, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> writes:

> On Fri, 06 Nov 2009, ebiederm@xmission.com (Eric W. Biederman wrote:
>> So far no one who believes this to be a security hole has found it
>> worth their while to look at nd->intent.open in proc_pid_follow_link
>> and write a patch.
>
> A rather disgusting patch that would be.  The fact is, checking
> permissions on follow_link makes little to no sense.  Consider
> truncate(2), for example.  Will we add another intent for that?  I
> really hope not

No.  I was just thinking we have the open intent that is there for
combining lookup and open. We can look test for LOOKUP_OPEN and do
exactly what we need.

> I'm more and more convinced, that the current behavior is the right
> one.

I think the 15 or so years we have had the current behavior without
problems is persuasive.

I think it is an interesting puzzle on how to get dup instead of
reopen as there are cases where that could be useful behavior as well.

The usefulness of an O_NONE flag increases significantly if you can
open the reference file later with more permissions.  Essentially
making a hardlink into a running program.  Hmm.  Weird cases do seem
to show up when the last dir entry is removed.  

I wonder if we want a rule that you can't open a file with link count
of 0.  Reasoning may get truly strange otherwise.

Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-07 11:09           ` Eric W. Biederman
@ 2009-11-07 11:31             ` Miklos Szeredi
  2009-11-07 11:48               ` Eric W. Biederman
  2009-11-07 17:58               ` Pavel Machek
  0 siblings, 2 replies; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-07 11:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: miklos, pavel, alan, akpm, viro, dhowells, hch, adilger,
	mtk.manpages, torvalds, drepper, jamie, linux-fsdevel,
	linux-kernel

On Sat, 07 Nov 2009, ebiederm@xmission.com (Eric W. Biederman wrote:
> Miklos Szeredi <miklos@szeredi.hu> writes:
> 
> > On Fri, 06 Nov 2009, ebiederm@xmission.com (Eric W. Biederman wrote:
> >> So far no one who believes this to be a security hole has found it
> >> worth their while to look at nd->intent.open in proc_pid_follow_link
> >> and write a patch.
> >
> > A rather disgusting patch that would be.  The fact is, checking
> > permissions on follow_link makes little to no sense.  Consider
> > truncate(2), for example.  Will we add another intent for that?  I
> > really hope not
> 
> No.  I was just thinking we have the open intent that is there for
> combining lookup and open. We can look test for LOOKUP_OPEN and do
> exactly what we need.

No, because you just covered half the cases.  truncate(2) will still
work fine on the /proc/PID/fd/FD belonging to a O_RDONLY file
descriptor.

> > I'm more and more convinced, that the current behavior is the right
> > one.
> 
> I think the 15 or so years we have had the current behavior without
> problems is persuasive.
> 
> I think it is an interesting puzzle on how to get dup instead of
> reopen as there are cases where that could be useful behavior as well.

Probably doable with ptrace().

> The usefulness of an O_NONE flag increases significantly if you can
> open the reference file later with more permissions.  Essentially
> making a hardlink into a running program.  Hmm.  Weird cases do seem
> to show up when the last dir entry is removed.  

Why are they more weird than files opened without O_NODE?

The only really weird case Alan spotted is device nodes, where the
actual device registered to a major/minor pair changes over time,
possibly allowing a re-open to access a device it otherwise was not
meant to.  BTW if the device number reuse happens really quickly, this
could even be a race for a plain open.  Real solution might actually
be in udev: when deregistering a device, change mode bits to all-zero
before removing the device node.

> I wonder if we want a rule that you can't open a file with link count
> of 0.  Reasoning may get truly strange otherwise.

Again, don't see why this would be different for O_NODE as for
non-O_NODE files descriptors.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-07 11:31             ` Miklos Szeredi
@ 2009-11-07 11:48               ` Eric W. Biederman
  2009-11-08 17:01                 ` Pavel Machek
  2009-11-07 17:58               ` Pavel Machek
  1 sibling, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2009-11-07 11:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, alan, akpm, viro, dhowells, hch, adilger, mtk.manpages,
	torvalds, drepper, jamie, linux-fsdevel, linux-kernel

Miklos Szeredi <miklos@szeredi.hu> writes:

> On Sat, 07 Nov 2009, ebiederm@xmission.com (Eric W. Biederman wrote:
>> Miklos Szeredi <miklos@szeredi.hu> writes:
>> 
>> > On Fri, 06 Nov 2009, ebiederm@xmission.com (Eric W. Biederman wrote:
>> >> So far no one who believes this to be a security hole has found it
>> >> worth their while to look at nd->intent.open in proc_pid_follow_link
>> >> and write a patch.
>> >
>> > A rather disgusting patch that would be.  The fact is, checking
>> > permissions on follow_link makes little to no sense.  Consider
>> > truncate(2), for example.  Will we add another intent for that?  I
>> > really hope not
>> 
>> No.  I was just thinking we have the open intent that is there for
>> combining lookup and open. We can look test for LOOKUP_OPEN and do
>> exactly what we need.
>
> No, because you just covered half the cases.  truncate(2) will still
> work fine on the /proc/PID/fd/FD belonging to a O_RDONLY file
> descriptor.

Good point as truncate is as bad as opening read-write.  Shrug.

>> > I'm more and more convinced, that the current behavior is the right
>> > one.
>> 
>> I think the 15 or so years we have had the current behavior without
>> problems is persuasive.
>> 
>> I think it is an interesting puzzle on how to get dup instead of
>> reopen as there are cases where that could be useful behavior as well.
>
> Probably doable with ptrace().
>
>> The usefulness of an O_NONE flag increases significantly if you can
>> open the reference file later with more permissions.  Essentially
>> making a hardlink into a running program.  Hmm.  Weird cases do seem
>> to show up when the last dir entry is removed.  
>
> Why are they more weird than files opened without O_NODE?

Because as I recall O_NODE skips all permission checks.  Which would
mean you can limit who holds a reference to your inode without O_NODE.

> The only really weird case Alan spotted is device nodes, where the
> actual device registered to a major/minor pair changes over time,
> possibly allowing a re-open to access a device it otherwise was not
> meant to.  BTW if the device number reuse happens really quickly, this
> could even be a race for a plain open.  Real solution might actually
> be in udev: when deregistering a device, change mode bits to all-zero
> before removing the device node.

Devices nodes specifically were the case I was thinking of.

Changing the mode bits to all-zero at the final unlink would be a lot
more reliable and certain in the kernel.

>> I wonder if we want a rule that you can't open a file with link count
>> of 0.  Reasoning may get truly strange otherwise.
>
> Again, don't see why this would be different for O_NODE as for
> non-O_NODE files descriptors.


Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-07 11:31             ` Miklos Szeredi
  2009-11-07 11:48               ` Eric W. Biederman
@ 2009-11-07 17:58               ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Pavel Machek @ 2009-11-07 17:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Eric W. Biederman, alan, akpm, viro, dhowells, hch, adilger,
	mtk.manpages, torvalds, drepper, jamie, linux-fsdevel,
	linux-kernel

Hi!

> > > I'm more and more convinced, that the current behavior is the right
> > > one.
> > 
> > I think the 15 or so years we have had the current behavior without
> > problems is persuasive.
> > 
> > I think it is an interesting puzzle on how to get dup instead of
> > reopen as there are cases where that could be useful behavior as well.
> 
> Probably doable with ptrace().

Confused, where does ptrace fit the picture?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-07 11:48               ` Eric W. Biederman
@ 2009-11-08 17:01                 ` Pavel Machek
  2009-11-16 11:50                   ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Machek @ 2009-11-08 17:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, alan, akpm, viro, dhowells, hch, adilger,
	mtk.manpages, torvalds, drepper, jamie, linux-fsdevel,
	linux-kernel

Hi!

> > The only really weird case Alan spotted is device nodes, where the
> > actual device registered to a major/minor pair changes over time,
> > possibly allowing a re-open to access a device it otherwise was not
> > meant to.  BTW if the device number reuse happens really quickly, this
> > could even be a race for a plain open.  Real solution might actually
> > be in udev: when deregistering a device, change mode bits to all-zero
> > before removing the device node.
> 
> Devices nodes specifically were the case I was thinking of.
> 
> Changing the mode bits to all-zero at the final unlink would be a lot
> more reliable and certain in the kernel.

Does it really close the race completely?

					udev sets 660
open does permission checks
					device disappears
					chmod 000
					new device appears
					udev chmods 600
open returns new device

?
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-06 20:55       ` Eric W. Biederman
  2009-11-07  7:49         ` Miklos Szeredi
@ 2009-11-09  8:58         ` Pavel Machek
  1 sibling, 0 replies; 21+ messages in thread
From: Pavel Machek @ 2009-11-09  8:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Alan Cox, akpm, viro, dhowells, hch, adilger,
	mtk.manpages, torvalds, drepper, jamie, linux-fsdevel,
	linux-kernel

On Fri 2009-11-06 12:55:33, Eric W. Biederman wrote:
> Pavel Machek <pavel@ucw.cz> writes:
> 
> > On Thu 2009-11-05 15:27:06, Miklos Szeredi wrote:
> >> On Thu, 5 Nov 2009, Alan Cox wrote:
> >> > >  - re-opening normally after checking file type (there's a debate
> >> > >    whether this would have security issues, but currently we do allow
> >> > >    re-opening with increased permissions thorugh /proc/*/fd)
> >> > 
> >> > Which has already been demonstrated to be an (unfixed) security hole.
> >> 
> >> No it hasn't :)  Jamie theorized that there *might* be a real world
> >> situation where the application writer didn't anticipate this
> >> behavior.  But as to actual demonstration, we have not seen one yet, I
> >> think.
> >
> > See bugtraq, or lkml thread about symlinks with permissions. There's
> > demo script there.
> 
> Exactly a theoretical discussion, that demonstrates user space
> applications with security holes can be written if they make
> assumptions about the world that are not true.
> 
> So far no one who believes this to be a security hole has found it
> worth their while to look at nd->intent.open in proc_pid_follow_link
> and write a patch.    Pavel you started out asking for help on how
> to do that and I think I have answered the original question.


> I am tired of the whining.  If no one who is persuaded the kernel is
> wrong can be bothered to write a possibly buggy 5 line patch this is
> clearly not a security hole.

"I did not get a patch so it can't be security hole". Interesting.

I still hope to write it one day, but as I do not have untrusted users
on my systems, it is not particulary urgent. (And I still hope distro
security people do they job.)
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 resend] vfs: new O_NODE open flag
  2009-11-08 17:01                 ` Pavel Machek
@ 2009-11-16 11:50                   ` Miklos Szeredi
  0 siblings, 0 replies; 21+ messages in thread
From: Miklos Szeredi @ 2009-11-16 11:50 UTC (permalink / raw)
  To: Pavel Machek
  Cc: ebiederm, miklos, alan, akpm, viro, dhowells, hch, adilger,
	mtk.manpages, torvalds, drepper, jamie, linux-fsdevel,
	linux-kernel

On Sun, 8 Nov 2009, Pavel Machek wrote:
> Does it really close the race completely?
> 
> 					udev sets 660
> open does permission checks
> 					device disappears
> 					chmod 000
> 					new device appears
> 					udev chmods 600
> open returns new device

Yes, there's still a small hole there.

We could check nlink != 0 after grabbing the device (untested patch).
That is a hack, however, and would break apps which previously relied
on being able to re-open already deleted devices through /proc/*/fd.
But there might not be a better solution...  Thoughts?

Thanks,
Miklos


Index: linux-2.6/fs/char_dev.c
===================================================================
--- linux-2.6.orig/fs/char_dev.c	2009-09-24 20:10:58.000000000 +0200
+++ linux-2.6/fs/char_dev.c	2009-11-16 12:48:58.000000000 +0100
@@ -396,6 +396,16 @@ static int chrdev_open(struct inode *ino
 	if (ret)
 		return ret;
 
+	/*
+	 * The device might have been removed and then reused while
+	 * the open was in progress.  Make sure we don't let open
+	 * proceed in such a case, since the old device could have had
+	 * different permissions.
+	 */
+	ret = -ENOENT;
+	if (inode->i_nlink == 0)
+		goto out_cdev_put;
+
 	ret = -ENXIO;
 	filp->f_op = fops_get(p->ops);
 	if (!filp->f_op)

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2009-11-16 11:51 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-05 11:23 [PATCH v2 resend] vfs: new O_NODE open flag Miklos Szeredi
2009-11-05 13:15 ` Alan Cox
2009-11-05 14:27   ` Miklos Szeredi
2009-11-05 14:50     ` Alan Cox
2009-11-05 15:24       ` Miklos Szeredi
2009-11-05 15:56         ` Alan Cox
2009-11-05 16:52           ` Miklos Szeredi
2009-11-05 18:25             ` Alan Cox
2009-11-06 11:10               ` Miklos Szeredi
2009-11-06  1:40         ` Jamie Lokier
2009-11-06 11:31           ` Miklos Szeredi
2009-11-06 14:17     ` Pavel Machek
2009-11-06 20:55       ` Eric W. Biederman
2009-11-07  7:49         ` Miklos Szeredi
2009-11-07 11:09           ` Eric W. Biederman
2009-11-07 11:31             ` Miklos Szeredi
2009-11-07 11:48               ` Eric W. Biederman
2009-11-08 17:01                 ` Pavel Machek
2009-11-16 11:50                   ` Miklos Szeredi
2009-11-07 17:58               ` Pavel Machek
2009-11-09  8:58         ` Pavel Machek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.