* [PATCH -V26 00/16] Generic name to handle and open by handle syscalls
@ 2011-01-29 19:08 Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 01/16] exportfs: Return the minimum required handle size Aneesh Kumar K.V
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel

Hi,

The following patch set implements open-by-handle support using exportfs
operations. It allows a user-space application to map a file name to a file
handle and later open the file using that handle. This should be usable
by userspace NFS [1] and 9P [2] servers. XFS already supports this through
the ioctls XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.

[1] http://nfs-ganesha.sourceforge.net/
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/68992
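
For illustration only (this is not part of the patch set): a minimal
user-space sketch of the name-to-handle and open-by-handle round trip. It
assumes C library wrappers named name_to_handle_at() and open_by_handle_at()
with the argument order proposed in this series, as glibc later provided
under _GNU_SOURCE; demo_reopen_by_handle() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/*
 * Map path to a file handle, then reopen the file through that handle.
 * mount_fd must refer to an object on the same mount as path.
 * Returns an open descriptor (>= 0) or a negative errno value.
 */
static int demo_reopen_by_handle(int mount_fd, const char *path, int flags)
{
	struct file_handle *fh;
	int mount_id, fd, err;

	assert(path != NULL);

	/* Reserve room for the largest handle the interface allows. */
	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh)
		return -ENOMEM;
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* Step 1: name -> handle (also reports the mount id). */
	if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) != 0) {
		err = errno;
		free(fh);
		return -err;
	}

	/* Step 2: handle -> open file descriptor. */
	fd = open_by_handle_at(mount_fd, fh, flags);
	err = errno;
	free(fh);
	return fd >= 0 ? fd : -err;
}
```

A caller would typically pass a descriptor opened on the mount point as
mount_fd. The second step is expected to fail with -EPERM for callers
without CAP_DAC_READ_SEARCH, and the first with -EOPNOTSUPP on file systems
that cannot decode handles.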

git repo for the patchset at:
git://git.kernel.org/pub/scm/linux/kernel/git/kvaneesh/linux-open-handle.git open-by-handle

Test case can be found at
http://git.kernel.org/?p=fs/ext2/kvaneesh/handle-test.git
git://git.kernel.org/pub/scm/fs/ext2/kvaneesh/handle-test.git

Changes from V22:
a) Add support for O_PATH open flag
b) Add support for "" names in *at syscalls
c) Add support for O_PATH descriptors in a few selected syscalls such as
   stat, chown, chmod and dup

NOTE: v23-25 can be found in the git repo

-aneesh



* [PATCH -V26 01/16] exportfs: Return the minimum required handle size
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 02/16] vfs: Add name to file handle conversion support Aneesh Kumar K.V
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

The exportfs encode-handle function should return the minimum required
handle size. This lets a caller discover the handle size by passing a
handle size of 0 in a first call and then repeating the call with the
returned handle size.
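
The two-step convention described above can be sketched in plain C. This is
a stand-alone model, not kernel code: demo_encode_fh() is a hypothetical
stand-in for a file system's encode_fh() callback that needs three 32-bit
words, and the DEMO_* constants are made up for the example. Only the
"return 255 and store the required length" behaviour mirrors the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_FH_WORDS    3	/* hypothetical handle size, in 4-byte units */
#define DEMO_FH_OVERFLOW 255	/* "no room" return value used by encode_fh() */
#define DEMO_FH_TYPE     1	/* hypothetical fileid type */

/* Model of an encode_fh() callback that follows the new convention. */
static int demo_encode_fh(unsigned int *fh, int *max_len)
{
	if (*max_len < DEMO_FH_WORDS) {
		/* Too small: report the minimum size, then signal overflow. */
		*max_len = DEMO_FH_WORDS;
		return DEMO_FH_OVERFLOW;
	}
	fh[0] = 0x1234;	/* e.g. inode number */
	fh[1] = 0x1;	/* e.g. generation */
	fh[2] = 0x0;
	*max_len = DEMO_FH_WORDS;
	return DEMO_FH_TYPE;
}

/*
 * Probe with a zero-length buffer, then retry with the reported size.
 * Returns a malloc()ed buffer (caller frees) or NULL on failure.
 */
static unsigned int *demo_encode_two_step(int *words, int *type)
{
	unsigned int *fh;
	int len = 0;

	/* Step 1: zero-size probe; the buffer is not touched on overflow. */
	if (demo_encode_fh(NULL, &len) != DEMO_FH_OVERFLOW)
		return NULL;
	assert(len > 0);

	/* Step 2: allocate exactly what was requested and encode again. */
	fh = calloc((size_t)len, sizeof(*fh));
	if (!fh)
		return NULL;
	*type = demo_encode_fh(fh, &len);
	if (*type == DEMO_FH_OVERFLOW) {
		free(fh);
		return NULL;
	}
	*words = len;
	return fh;
}
```

Without the updated *max_len on the overflow path, the caller in step 2
would have to guess a size or always allocate the maximum.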

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/btrfs/export.c             |    8 ++++++--
 fs/exportfs/expfs.c           |    9 +++++++--
 fs/fat/inode.c                |    4 +++-
 fs/fuse/inode.c               |    4 +++-
 fs/gfs2/export.c              |    8 ++++++--
 fs/isofs/export.c             |    8 ++++++--
 fs/ocfs2/export.c             |    8 ++++++--
 fs/reiserfs/inode.c           |    7 ++++++-
 fs/udf/namei.c                |    7 ++++++-
 fs/xfs/linux-2.6/xfs_export.c |    4 +++-
 include/linux/exportfs.h      |    6 ++++--
 mm/shmem.c                    |    4 +++-
 12 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 9786963..6ee1d94 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -21,9 +21,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	int len = *max_len;
 	int type;
 
-	if ((len < BTRFS_FID_SIZE_NON_CONNECTABLE) ||
-	    (connectable && len < BTRFS_FID_SIZE_CONNECTABLE))
+	if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
+		*max_len = BTRFS_FID_SIZE_CONNECTABLE;
 		return 255;
+	} else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
+		*max_len = BTRFS_FID_SIZE_NON_CONNECTABLE;
+		return 255;
+	}
 
 	len  = BTRFS_FID_SIZE_NON_CONNECTABLE;
 	type = FILEID_BTRFS_WITHOUT_PARENT;
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 4b68257..cfe5573 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -320,9 +320,14 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
 	struct inode * inode = dentry->d_inode;
 	int len = *max_len;
 	int type = FILEID_INO32_GEN;
-	
-	if (len < 2 || (connectable && len < 4))
+
+	if (connectable && (len < 4)) {
+		*max_len = 4;
+		return 255;
+	} else if (len < 2) {
+		*max_len = 2;
 		return 255;
+	}
 
 	len = 2;
 	fid->i32.ino = inode->i_ino;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 86753fe..0e277ec 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -757,8 +757,10 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
 	struct inode *inode =  de->d_inode;
 	u32 ipos_h, ipos_m, ipos_l;
 
-	if (len < 5)
+	if (len < 5) {
+		*lenp = 5;
 		return 255; /* no room */
+	}
 
 	ipos_h = MSDOS_I(inode)->i_pos >> 8;
 	ipos_m = (MSDOS_I(inode)->i_pos & 0xf0) << 24;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9e3f68c..051b1a0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -637,8 +637,10 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	u64 nodeid;
 	u32 generation;
 
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len;
 		return  255;
+	}
 
 	nodeid = get_fuse_inode(inode)->nodeid;
 	generation = inode->i_generation;
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index 9023db8..b5a5e60 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -36,9 +36,13 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
 	struct super_block *sb = inode->i_sb;
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	if (*len < GFS2_SMALL_FH_SIZE ||
-	    (connectable && *len < GFS2_LARGE_FH_SIZE))
+	if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
+		*len = GFS2_LARGE_FH_SIZE;
 		return 255;
+	} else if (*len < GFS2_SMALL_FH_SIZE) {
+		*len = GFS2_SMALL_FH_SIZE;
+		return 255;
+	}
 
 	fh[0] = cpu_to_be32(ip->i_no_formal_ino >> 32);
 	fh[1] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index ed752cb..dd4687f 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -124,9 +124,13 @@ isofs_export_encode_fh(struct dentry *dentry,
 	 * offset of the inode and the upper 16 bits of fh32[1] to
 	 * hold the offset of the parent.
 	 */
-
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*max_len = 5;
+		return 255;
+	} else if (len < 3) {
+		*max_len = 3;
 		return 255;
+	}
 
 	len = 3;
 	fh32[0] = ei->i_iget5_block;
diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index 5dbc306..254652a 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -197,8 +197,12 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
 		   dentry->d_name.len, dentry->d_name.name,
 		   fh, len, connectable);
 
-	if (len < 3 || (connectable && len < 6)) {
-		mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+	if (connectable && (len < 6)) {
+		*max_len = 6;
+		type = 255;
+		goto bail;
+	} else if (len < 3) {
+		*max_len = 3;
 		type = 255;
 		goto bail;
 	}
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 0bae036..1bba24b 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1593,8 +1593,13 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
 	struct inode *inode = dentry->d_inode;
 	int maxlen = *lenp;
 
-	if (maxlen < 3)
+	if (need_parent && (maxlen < 5)) {
+		*lenp = 5;
 		return 255;
+	} else if (maxlen < 3) {
+		*lenp = 3;
+		return 255;
+	}
 
 	data[0] = inode->i_ino;
 	data[1] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 2be0f9e..076aef6 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1287,8 +1287,13 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
 	struct fid *fid = (struct fid *)fh;
 	int type = FILEID_UDF_WITHOUT_PARENT;
 
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*lenp = 5;
+		return 255;
+	} else if (len < 3) {
+		*lenp = 3;
 		return 255;
+	}
 
 	*lenp = 3;
 	fid->udf.block = location.logicalBlockNum;
diff --git a/fs/xfs/linux-2.6/xfs_export.c b/fs/xfs/linux-2.6/xfs_export.c
index fc0114d..f4f878f 100644
--- a/fs/xfs/linux-2.6/xfs_export.c
+++ b/fs/xfs/linux-2.6/xfs_export.c
@@ -89,8 +89,10 @@ xfs_fs_encode_fh(
 	 * seven combinations work.  The real answer is "don't use v2".
 	 */
 	len = xfs_fileid_length(fileid_type);
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len;
 		return 255;
+	}
 	*max_len = len;
 
 	switch (fileid_type) {
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 2802898..65afdfd 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -121,8 +121,10 @@ struct fid {
  *    set, the encode_fh() should store sufficient information so that a good
  *    attempt can be made to find not only the file but also it's place in the
  *    filesystem.   This typically means storing a reference to de->d_parent in
- *    the filehandle fragment.  encode_fh() should return the number of bytes
- *    stored or a negative error code such as %-ENOSPC
+ *    the filehandle fragment.  encode_fh() should return the fileid_type on
+ *    success and on error returns 255 (if the space needed to encode fh is
+ *    greater than @max_len*4 bytes). On error @max_len contains the minimum
+ *    size(in 4 byte unit) needed to encode the file handle.
  *
  * fh_to_dentry:
  *    @fh_to_dentry is given a &struct super_block (@sb) and a file handle
diff --git a/mm/shmem.c b/mm/shmem.c
index 5ee67c9..3437b65 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2144,8 +2144,10 @@ static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (*len < 3)
+	if (*len < 3) {
+		*len = 3;
 		return 255;
+	}
 
 	if (inode_unhashed(inode)) {
 		/* Unfortunately insert_inode_hash is not idempotent,
-- 
1.7.1



* [PATCH -V26 02/16] vfs: Add name to file handle conversion support
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 01/16] exportfs: Return the minimum required handle size Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 03/16] vfs: Add open by file handle support Aneesh Kumar K.V
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

The syscall also returns the mount ID, which can be used
to look up file-system-specific information such as the UUID
in /proc/<pid>/mountinfo.
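
As a rough illustration of that lookup (not part of the patch), the sketch
below scans mountinfo-style text for the entry whose first field matches a
mount ID and returns its mount point. demo_mount_point_for_id() is a
hypothetical helper; it assumes the usual mountinfo layout, where each line
starts with "ID parent-ID major:minor root mount-point", and that every
line has at least those fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the line whose first field equals mnt_id and copy its
 * mount point (fifth field) into out.
 * Returns 0 on success, -1 if no line matches.
 */
static int demo_mount_point_for_id(const char *mountinfo, int mnt_id,
				   char *out, size_t outlen)
{
	const char *line = mountinfo;

	assert(out != NULL && outlen > 0);

	while (line && *line) {
		int id, parent;
		unsigned int maj, min;
		char root[256], mnt[256];

		/* Parse the fixed leading fields of one entry. */
		if (sscanf(line, "%d %d %u:%u %255s %255s",
			   &id, &parent, &maj, &min, root, mnt) == 6 &&
		    id == mnt_id) {
			snprintf(out, outlen, "%s", mnt);
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

In practice the text would be read from /proc/self/mountinfo, and the
matching entry's device numbers or mount source could then be used to find
further identifiers for the file system.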

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/open.c                |  128 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/exportfs.h |    3 +
 include/linux/fs.h       |    7 +++
 include/linux/syscalls.h |    5 ++-
 kernel/sys_ni.c          |    3 +
 5 files changed, 145 insertions(+), 1 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e52389e..d12723a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -30,6 +30,7 @@
 #include <linux/fs_struct.h>
 #include <linux/ima.h>
 #include <linux/dnotify.h>
+#include <linux/exportfs.h>
 
 #include "internal.h"
 
@@ -1047,3 +1048,130 @@ int nonseekable_open(struct inode *inode, struct file *filp)
 }
 
 EXPORT_SYMBOL(nonseekable_open);
+
+#ifdef CONFIG_EXPORTFS
+static long do_sys_name_to_handle(struct path *path,
+				  struct file_handle __user *ufh,
+				  int __user *mnt_id)
+{
+	long retval;
+	struct file_handle f_handle;
+	int handle_dwords, handle_bytes;
+	struct file_handle *handle = NULL;
+
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) {
+		retval = -EFAULT;
+		goto err_out;
+	}
+	if (f_handle.handle_bytes > MAX_HANDLE_SZ) {
+		retval = -EINVAL;
+		goto err_out;
+	}
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
+			 GFP_KERNEL);
+	if (!handle) {
+		retval = -ENOMEM;
+		goto err_out;
+	}
+
+	/* convert handle size to  multiple of sizeof(u32) */
+	handle_dwords = f_handle.handle_bytes >> 2;
+
+	/* we ask for a non connected handle */
+	retval = exportfs_encode_fh(path->dentry,
+				    (struct fid *)handle->f_handle,
+				    &handle_dwords,  0);
+	handle->handle_type = retval;
+	/* convert handle size to bytes */
+	handle_bytes = handle_dwords * sizeof(u32);
+	handle->handle_bytes = handle_bytes;
+	if ((handle->handle_bytes > f_handle.handle_bytes) ||
+	    (retval == 255) || (retval == -ENOSPC)) {
+		/* As per old exportfs_encode_fh documentation
+		 * we could return ENOSPC to indicate overflow
+		 * But file system returned 255 always. So handle
+		 * both the values
+		 */
+		/*
+		 * set the handle size to zero so we copy only
+		 * non variable part of the file_handle
+		 */
+		handle_bytes = 0;
+		retval = -EOVERFLOW;
+	} else
+		retval = 0;
+	/* copy the mount id */
+	if (copy_to_user(mnt_id, &path->mnt->mnt_id, sizeof(*mnt_id))) {
+		retval = -EFAULT;
+		goto err_free_out;
+	}
+	if (copy_to_user(ufh, handle,
+			 sizeof(struct file_handle) + handle_bytes))
+		retval = -EFAULT;
+err_free_out:
+	kfree(handle);
+err_out:
+	return retval;
+}
+
+/**
+ * sys_name_to_handle_at: convert name to handle
+ * @dfd: directory relative to which name is interpreted if not absolute
+ * @name: name that should be converted to handle.
+ * @handle: resulting file handle
+ * @mnt_id: mount id of the file system containing the file
+ * @flag: flag value to indicate whether to follow symlink or not
+ *
+ * @handle->handle_bytes indicates the space available to store the
+ * variable part of the file handle in bytes. If there is not
+ * enough space, the field is updated to return the minimum
+ * value required.
+ */
+SYSCALL_DEFINE5(name_to_handle_at, int, dfd, const char __user *, name,
+		struct file_handle __user *, handle, int __user*, mnt_id,
+		int, flag)
+{
+
+	int follow;
+	int fput_needed;
+	long ret = -EINVAL;
+	struct path path, *pp;
+	struct file *file = NULL;
+
+	if ((flag & ~AT_SYMLINK_FOLLOW) != 0)
+		goto err_out;
+
+	if (name == NULL && dfd != AT_FDCWD) {
+		file = fget_light(dfd, &fput_needed);
+		if (file) {
+			pp = &file->f_path;
+			ret = 0;
+		} else
+			ret = -EBADF;
+	} else {
+		follow = (flag & AT_SYMLINK_FOLLOW) ? LOOKUP_FOLLOW : 0;
+		ret = user_path_at(dfd, name, follow, &path);
+		pp = &path;
+	}
+	if (ret)
+		goto err_out;
+	/*
+	 * We need to make sure whether the file system
+	 * supports decoding of the file handle
+	 */
+	if (!pp->mnt->mnt_sb->s_export_op ||
+	    !pp->mnt->mnt_sb->s_export_op->fh_to_dentry) {
+		ret = -EOPNOTSUPP;
+		goto out_path;
+	}
+	ret = do_sys_name_to_handle(pp, handle, mnt_id);
+
+out_path:
+	if (file)
+		fput_light(file, fput_needed);
+	else
+		path_put(&path);
+err_out:
+	return ret;
+}
+#endif
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 65afdfd..33a42f2 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -8,6 +8,9 @@ struct inode;
 struct super_block;
 struct vfsmount;
 
+/* limit the handle size to NFSv4 handle size now */
+#define MAX_HANDLE_SZ 128
+
 /*
  * The fileid_type identifies how the file within the filesystem is encoded.
  * In theory this is freely set and parsed by the filesystem, but we try to
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 32b38cd..9fbb0e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -977,6 +977,13 @@ struct file {
 #endif
 };
 
+struct file_handle {
+	__u32 handle_bytes;
+	int handle_type;
+	/* file identifier */
+	unsigned char f_handle[0];
+};
+
 #define get_file(x)	atomic_long_inc(&(x)->f_count)
 #define fput_atomic(x)	atomic_long_add_unless(&(x)->f_count, -1, 1)
 #define file_count(x)	atomic_long_read(&(x)->f_count)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 18cd068..e1ef441 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -62,6 +62,7 @@ struct robust_list_head;
 struct getcpu_cache;
 struct old_linux_dirent;
 struct perf_event_attr;
+struct file_handle;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -830,5 +831,7 @@ asmlinkage long sys_mmap_pgoff(unsigned long addr, unsigned long len,
 			unsigned long prot, unsigned long flags,
 			unsigned long fd, unsigned long pgoff);
 asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg);
-
+asmlinkage long sys_name_to_handle_at(int dfd, const char __user *name,
+				      struct file_handle __user *handle,
+				      int __user *mnt_id, int flag);
 #endif
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c782fe9..4e01343 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -186,3 +186,6 @@ cond_syscall(sys_perf_event_open);
 /* fanotify! */
 cond_syscall(sys_fanotify_init);
 cond_syscall(sys_fanotify_mark);
+
+/* open by handle */
+cond_syscall(sys_name_to_handle_at);
-- 
1.7.1



* [PATCH -V26 03/16] vfs: Add open by file handle support
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 01/16] exportfs: Return the minimum required handle size Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 02/16] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 04/16] fs: Don't allow to create hardlink for deleted file Aneesh Kumar K.V
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/compat.c              |   11 ++
 fs/exportfs/expfs.c      |    2 +
 fs/namei.c               |  230 ++++++++++++++++++++++++++++++++++++++++++---
 fs/open.c                |   32 ++++++-
 include/linux/fs.h       |   10 ++-
 include/linux/namei.h    |    1 +
 include/linux/syscalls.h |    3 +
 7 files changed, 268 insertions(+), 21 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index f6fd0a0..e8436f3 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -2308,3 +2308,14 @@ asmlinkage long compat_sys_timerfd_gettime(int ufd,
 }
 
 #endif /* CONFIG_TIMERFD */
+
+/*
+ * Exactly like fs/open.c:sys_open_by_handle_at(), except that it
+ * doesn't set the O_LARGEFILE flag.
+ */
+asmlinkage long
+compat_sys_open_by_handle_at(int mountdirfd,
+			     struct file_handle __user *handle, int flags)
+{
+	return do_handle_open(mountdirfd, handle, flags);
+}
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index cfe5573..b05acb7 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -374,6 +374,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 	/*
 	 * Try to get any dentry for the given file handle from the filesystem.
 	 */
+	if (!nop || !nop->fh_to_dentry)
+		return ERR_PTR(-ESTALE);
 	result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
 	if (!result)
 		result = ERR_PTR(-ESTALE);
diff --git a/fs/namei.c b/fs/namei.c
index 7d77f24..6b0536f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -32,6 +32,7 @@
 #include <linux/fcntl.h>
 #include <linux/device_cgroup.h>
 #include <linux/fs_struct.h>
+#include <linux/exportfs.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -1697,6 +1698,29 @@ out_fail:
 	return retval;
 }
 
+struct vfsmount *get_vfsmount_from_fd(int fd)
+{
+	int fput_needed;
+	struct path path;
+	struct file *filep;
+
+	if (fd == AT_FDCWD) {
+		struct fs_struct *fs = current->fs;
+		spin_lock(&fs->lock);
+		path = fs->pwd;
+		mntget(path.mnt);
+		spin_unlock(&fs->lock);
+	} else {
+		filep = fget_light(fd, &fput_needed);
+		if (!filep)
+			return ERR_PTR(-EBADF);
+		path = filep->f_path;
+		mntget(path.mnt);
+		fput_light(filep, fput_needed);
+	}
+	return path.mnt;
+}
+
 /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
 static int do_path_lookup(int dfd, const char *name,
 				unsigned int flags, struct nameidata *nd)
@@ -2218,26 +2242,30 @@ static int open_will_truncate(int flag, struct inode *inode)
 	return (flag & O_TRUNC);
 }
 
-static struct file *finish_open(struct nameidata *nd,
+static struct file *finish_open(struct file *filp, struct path *path,
 				int open_flag, int acc_mode)
 {
-	struct file *filp;
-	int will_truncate;
 	int error;
+	int will_truncate;
 
-	will_truncate = open_will_truncate(open_flag, nd->path.dentry->d_inode);
+	will_truncate = open_will_truncate(open_flag, path->dentry->d_inode);
 	if (will_truncate) {
-		error = mnt_want_write(nd->path.mnt);
+		error = mnt_want_write(path->mnt);
 		if (error)
 			goto exit;
 	}
-	error = may_open(&nd->path, acc_mode, open_flag);
+	error = may_open(path, acc_mode, open_flag);
 	if (error) {
 		if (will_truncate)
-			mnt_drop_write(nd->path.mnt);
+			mnt_drop_write(path->mnt);
 		goto exit;
 	}
-	filp = nameidata_to_filp(nd);
+	/* Has the filesystem initialised the file for us? */
+	if (filp->f_path.dentry == NULL) {
+		path_get(path);
+		filp = __dentry_open(path->dentry, path->mnt, filp,
+				     NULL, current_cred());
+	}
 	if (!IS_ERR(filp)) {
 		error = ima_file_check(filp, acc_mode);
 		if (error) {
@@ -2260,14 +2288,18 @@ static struct file *finish_open(struct nameidata *nd,
 	 * on its behalf.
 	 */
 	if (will_truncate)
-		mnt_drop_write(nd->path.mnt);
-	path_put(&nd->path);
+		mnt_drop_write(path->mnt);
+	path_put(path);
 	return filp;
 
 exit:
-	if (!IS_ERR(nd->intent.open.file))
-		release_open_intent(nd);
-	path_put(&nd->path);
+	if (!IS_ERR(filp)) {
+		if (filp->f_path.dentry == NULL)
+			put_filp(filp);
+		else
+			fput(filp);
+	}
+	path_put(path);
 	return ERR_PTR(error);
 }
 
@@ -2381,7 +2413,8 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 	if (S_ISDIR(nd->inode->i_mode))
 		goto exit;
 ok:
-	filp = finish_open(nd, open_flag, acc_mode);
+	filp = finish_open(nd->intent.open.file, &nd->path,
+			   open_flag, acc_mode);
 	return filp;
 
 exit_mutex_unlock:
@@ -2476,9 +2509,9 @@ struct file *do_filp_open(int dfd, const char *pathname,
 			goto out_path;
 	}
 	audit_inode(pathname, nd.path.dentry);
-	filp = finish_open(&nd, open_flag, acc_mode);
+	filp = finish_open(nd.intent.open.file, &nd.path,
+			   open_flag, acc_mode);
 	return filp;
-
 creat:
 	/* OK, have to create the file. Find the parent. */
 	error = path_init_rcu(dfd, pathname,
@@ -2583,6 +2616,171 @@ struct file *filp_open(const char *filename, int flags, int mode)
 }
 EXPORT_SYMBOL(filp_open);
 
+#ifdef CONFIG_EXPORTFS
+static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
+{
+	return 1;
+}
+
+static int do_handle_to_path(int mountdirfd, struct file_handle *handle,
+			     struct path *path)
+{
+	int retval = 0;
+	int handle_dwords;
+
+	path->mnt = get_vfsmount_from_fd(mountdirfd);
+	if (IS_ERR(path->mnt)) {
+		retval = PTR_ERR(path->mnt);
+		goto out_err;
+	}
+	/* change the handle size to multiple of sizeof(u32) */
+	handle_dwords = handle->handle_bytes >> 2;
+	path->dentry = exportfs_decode_fh(path->mnt,
+					  (struct fid *)handle->f_handle,
+					  handle_dwords, handle->handle_type,
+					  vfs_dentry_acceptable, NULL);
+	if (IS_ERR(path->dentry)) {
+		retval = PTR_ERR(path->dentry);
+		goto out_mnt;
+	}
+	return 0;
+out_mnt:
+	mntput(path->mnt);
+out_err:
+	return retval;
+}
+
+int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
+		   struct path *path)
+{
+	int retval = 0;
+	struct file_handle f_handle;
+	struct file_handle *handle = NULL;
+
+	/*
+	 * With a handle we don't look at the execute bit on
+	 * the directory. Ideally we would like CAP_DAC_SEARCH,
+	 * but we don't have that.
+	 */
+	if (!capable(CAP_DAC_READ_SEARCH)) {
+		retval = -EPERM;
+		goto out_err;
+	}
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle))) {
+		retval = -EFAULT;
+		goto out_err;
+	}
+	if ((f_handle.handle_bytes > MAX_HANDLE_SZ) ||
+	    (f_handle.handle_bytes == 0)) {
+		retval = -EINVAL;
+		goto out_err;
+	}
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
+			 GFP_KERNEL);
+	if (!handle) {
+		retval = -ENOMEM;
+		goto out_err;
+	}
+	/* copy the full handle */
+	if (copy_from_user(handle, ufh,
+			   sizeof(struct file_handle) +
+			   f_handle.handle_bytes)) {
+		retval = -EFAULT;
+		goto out_handle;
+	}
+
+	retval = do_handle_to_path(mountdirfd, handle, path);
+
+out_handle:
+	kfree(handle);
+out_err:
+	return retval;
+}
+#else
+int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
+		   struct path *path)
+{
+	return -ENOSYS;
+}
+#endif
+
+long do_handle_open(int mountdirfd,
+		    struct file_handle __user *ufh, int open_flag)
+{
+	long retval = 0;
+	int fd, acc_mode;
+	struct path path;
+	struct file *filp;
+
+	/* can't use O_CREAT with open_by_handle */
+	if (open_flag & O_CREAT) {
+		retval = -EINVAL;
+		goto out_err;
+	}
+	retval = handle_to_path(mountdirfd, ufh, &path);
+	if (retval)
+		goto out_err;
+
+	if ((open_flag & O_DIRECTORY) &&
+	    !S_ISDIR(path.dentry->d_inode->i_mode)) {
+		retval = -ENOTDIR;
+		goto out_path;
+	}
+
+	/* Must never be set by userspace */
+	open_flag &= ~FMODE_NONOTIFY;
+
+	/*
+	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
+	 * check for O_DSYNC if the need any syncing at all we enforce it's
+	 * always set instead of having to deal with possibly weird behaviour
+	 * for malicious applications setting only __O_SYNC.
+	 */
+	if (open_flag & __O_SYNC)
+		open_flag |= O_DSYNC;
+
+	acc_mode = MAY_OPEN | ACC_MODE(open_flag);
+
+	/* O_TRUNC implies we need access checks for write permissions */
+	if (open_flag & O_TRUNC)
+		acc_mode |= MAY_WRITE;
+	/*
+	 * Allow the LSM permission hook to distinguish append
+	 * access from general write access.
+	 */
+	if (open_flag & O_APPEND)
+		acc_mode |= MAY_APPEND;
+
+	fd = get_unused_fd_flags(open_flag);
+	if (fd < 0) {
+		retval = fd;
+		goto out_path;
+	}
+	filp = get_empty_filp();
+	if (!filp) {
+		retval = -ENFILE;
+		goto out_free_fd;
+	}
+	filp->f_flags = open_flag;
+	filp = finish_open(filp, &path, open_flag, acc_mode);
+	if (IS_ERR(filp)) {
+		put_unused_fd(fd);
+		retval =  PTR_ERR(filp);
+	} else {
+		retval = fd;
+		fsnotify_open(filp);
+		fd_install(fd, filp);
+	}
+	return retval;
+
+out_free_fd:
+	put_unused_fd(fd);
+out_path:
+	path_put(&path);
+out_err:
+	return retval;
+}
+
 /**
  * lookup_create - lookup a dentry, creating it if it doesn't exist
  * @nd: nameidata info
diff --git a/fs/open.c b/fs/open.c
index d12723a..1ec2623 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -657,10 +657,10 @@ static inline int __get_file_write_access(struct inode *inode,
 	return error;
 }
 
-static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
-					struct file *f,
-					int (*open)(struct inode *, struct file *),
-					const struct cred *cred)
+struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
+			   struct file *f,
+			   int (*open)(struct inode *, struct file *),
+			   const struct cred *cred)
 {
 	struct inode *inode;
 	int error;
@@ -1175,3 +1175,27 @@ err_out:
 	return ret;
 }
 #endif
+
+/**
+ * sys_open_by_handle_at: Open the file handle
+ * @mountdirfd: directory file descriptor
+ * @handle: file handle to be opened
+ * @flags: open flags.
+ *
+ * @mountdirfd is the directory file descriptor of the
+ * mount point. The file handle is decoded relative to the
+ * vfsmount that @mountdirfd refers to. @flags takes the
+ * same values as the open(2) flags.
+ */
+SYSCALL_DEFINE3(open_by_handle_at, int, mountdirfd,
+		struct file_handle __user *, handle,
+		int, flags)
+{
+	long ret;
+
+	if (force_o_largefile())
+		flags |= O_LARGEFILE;
+
+	ret = do_handle_open(mountdirfd, handle, flags);
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9fbb0e9..1aed36f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1996,6 +1996,10 @@ extern int do_fallocate(struct file *file, int mode, loff_t offset,
 extern long do_sys_open(int dfd, const char __user *filename, int flags,
 			int mode);
 extern struct file *filp_open(const char *, int, int);
+struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
+			   struct file *f,
+			   int (*open)(struct inode *, struct file *),
+			   const struct cred *cred);
 extern struct file * dentry_open(struct dentry *, struct vfsmount *, int,
 				 const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
@@ -2213,11 +2217,15 @@ extern void free_write_pipe(struct file *);
 
 extern struct file *do_filp_open(int dfd, const char *pathname,
 		int open_flag, int mode, int acc_mode);
+extern int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
+			  struct path *path);
+extern long do_handle_open(int mountdirfd,
+			   struct file_handle __user *ufh, int open_flag);
 extern int may_open(struct path *, int, int);
 
 extern int kernel_read(struct file *, loff_t, char *, unsigned long);
 extern struct file * open_exec(const char *);
- 
+
 /* fs/dcache.c -- generic fs support functions */
 extern int is_subdir(struct dentry *, struct dentry *);
 extern int path_is_under(struct path *, struct path *);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index f276d4f..3ec030a 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -70,6 +70,7 @@ extern int user_path_at(int, const char __user *, unsigned, struct path *);
 #define user_path_dir(name, path) \
 	user_path_at(AT_FDCWD, name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, path)
 
+extern struct vfsmount *get_vfsmount_from_fd(int);
 extern int kern_path(const char *, unsigned, struct path *);
 
 extern int path_lookup(const char *, unsigned, struct nameidata *);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e1ef441..a4734c5 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -834,4 +834,7 @@ asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg);
 asmlinkage long sys_name_to_handle_at(int dfd, const char __user *name,
 				      struct file_handle __user *handle,
 				      int __user *mnt_id, int flag);
+asmlinkage long sys_open_by_handle_at(int mountdirfd,
+				      struct file_handle __user *handle,
+				      int flags);
 #endif
-- 
1.7.1



* [PATCH -V26 04/16] fs: Don't allow to create hardlink for deleted file
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 03/16] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 05/16] fs: Remove i_nlink check from file system link callback Aneesh Kumar K.V
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Add an inode->i_nlink == 0 check in the VFS. Some file systems already
do this internally; a follow-up patch removes those instances.
The check is needed so that link by handle cannot create a hard link
to an unlinked file. It also prevents a race between unlink and link.
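
For context, the state being guarded against can be observed from user
space: a file that is unlinked while still open keeps its inode alive with
a link count of zero. The sketch below is illustrative only; the helper
names are hypothetical, and it assumes a local file system under /tmp that
reports st_nlink == 0 for such a file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if fd refers to a file with no remaining links,
 * 0 if it still has links, -1 if fstat() fails.
 */
static int demo_is_unlinked(int fd)
{
	struct stat st;

	assert(fd >= 0);	/* caller must pass an open descriptor */
	if (fstat(fd, &st) != 0)
		return -1;
	return st.st_nlink == 0;
}

/* Create a temporary file and unlink it while keeping it open. */
static int demo_open_unlinked_tmpfile(void)
{
	char tmpl[] = "/tmp/nlink-demo-XXXXXX";
	int fd = mkstemp(tmpl);

	if (fd < 0)
		return -1;
	if (unlink(tmpl) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A reference to such an inode, whether through an open descriptor or a
decoded handle, is what the new check in vfs_link() refuses to link back
into the namespace.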

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/namei.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6b0536f..a8346fa 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3318,7 +3318,11 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 		return error;
 
 	mutex_lock(&inode->i_mutex);
-	error = dir->i_op->link(old_dentry, dir, new_dentry);
+	/* Make sure we don't allow creating a hard link to an unlinked file */
+	if (inode->i_nlink == 0)
+		error = -ENOENT;
+	else
+		error = dir->i_op->link(old_dentry, dir, new_dentry);
 	mutex_unlock(&inode->i_mutex);
 	if (!error)
 		fsnotify_link(dir, inode, new_dentry);
-- 
1.7.1



* [PATCH -V26 05/16] fs: Remove i_nlink check from file system link callback
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 04/16] fs: Don't allow to create hardlink for deleted file Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 06/16] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Now that the VFS checks for inode->i_nlink == 0 and returns the proper
error, remove the equivalent checks from the individual file systems.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/btrfs/inode.c    |    3 ---
 fs/ext3/namei.c     |    7 -------
 fs/ext4/namei.c     |    7 -------
 fs/jfs/namei.c      |    3 ---
 fs/reiserfs/namei.c |    4 ----
 fs/ubifs/dir.c      |   18 ------------------
 6 files changed, 0 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 160b55b..5b1e504 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4794,9 +4794,6 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	int err;
 	int drop_inode = 0;
 
-	if (inode->i_nlink == 0)
-		return -ENOENT;
-
 	/* do not allow sys_link's with other subvols of the same device */
 	if (root->objectid != BTRFS_I(inode)->root->objectid)
 		return -EPERM;
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index b27ba71..561f692 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -2253,13 +2253,6 @@ static int ext3_link (struct dentry * old_dentry,
 
 	dquot_initialize(dir);
 
-	/*
-	 * Return -ENOENT if we've raced with unlink and i_nlink is 0.  Doing
-	 * otherwise has the potential to corrupt the orphan inode list.
-	 */
-	if (inode->i_nlink == 0)
-		return -ENOENT;
-
 retry:
 	handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
 					EXT3_INDEX_EXTRA_TRANS_BLOCKS);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5485390..e781b7e 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2304,13 +2304,6 @@ static int ext4_link(struct dentry *old_dentry,
 
 	dquot_initialize(dir);
 
-	/*
-	 * Return -ENOENT if we've raced with unlink and i_nlink is 0.  Doing
-	 * otherwise has the potential to corrupt the orphan inode list.
-	 */
-	if (inode->i_nlink == 0)
-		return -ENOENT;
-
 retry:
 	handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
 					EXT4_INDEX_EXTRA_TRANS_BLOCKS);
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index 81ead85..8799020 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -809,9 +809,6 @@ static int jfs_link(struct dentry *old_dentry,
 	if (ip->i_nlink == JFS_LINK_MAX)
 		return -EMLINK;
 
-	if (ip->i_nlink == 0)
-		return -ENOENT;
-
 	dquot_initialize(dir);
 
 	tid = txBegin(ip->i_sb, 0);
diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
index ba5f51e..ae303ca 100644
--- a/fs/reiserfs/namei.c
+++ b/fs/reiserfs/namei.c
@@ -1122,10 +1122,6 @@ static int reiserfs_link(struct dentry *old_dentry, struct inode *dir,
 		reiserfs_write_unlock(dir->i_sb);
 		return -EMLINK;
 	}
-	if (inode->i_nlink == 0) {
-		reiserfs_write_unlock(dir->i_sb);
-		return -ENOENT;
-	}
 
 	/* inc before scheduling so reiserfs_unlink knows we are here */
 	inc_nlink(inode);
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 14f64b6..7217d67 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -522,24 +522,6 @@ static int ubifs_link(struct dentry *old_dentry, struct inode *dir,
 	ubifs_assert(mutex_is_locked(&dir->i_mutex));
 	ubifs_assert(mutex_is_locked(&inode->i_mutex));
 
-	/*
-	 * Return -ENOENT if we've raced with unlink and i_nlink is 0.  Doing
-	 * otherwise has the potential to corrupt the orphan inode list.
-	 *
-	 * Indeed, consider a scenario when 'vfs_link(dirA/fileA)' and
-	 * 'vfs_unlink(dirA/fileA, dirB/fileB)' race. 'vfs_link()' does not
-	 * lock 'dirA->i_mutex', so this is possible. Both of the functions
-	 * lock 'fileA->i_mutex' though. Suppose 'vfs_unlink()' wins, and takes
-	 * 'fileA->i_mutex' mutex first. Suppose 'fileA->i_nlink' is 1. In this
-	 * case 'ubifs_unlink()' will drop the last reference, and put 'inodeA'
-	 * to the list of orphans. After this, 'vfs_link()' will link
-	 * 'dirB/fileB' to 'inodeA'. This is a problem because, for example,
-	 * the subsequent 'vfs_unlink(dirB/fileB)' will add the same inode
-	 * to the list of orphans.
-	 */
-	 if (inode->i_nlink == 0)
-		 return -ENOENT;
-
 	err = dbg_check_synced_i_size(inode);
 	if (err)
 		return err;
-- 
1.7.1



* [PATCH -V26 06/16] x86: Add new syscalls for x86_32
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 05/16] fs: Remove i_nlink check from file system link callback Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 07/16] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

This patch adds new syscalls to x86_32

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/unistd_32.h   |    4 +++-
 arch/x86/kernel/syscall_table_32.S |    2 ++
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index b766a5e..f4c4973 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -346,10 +346,12 @@
 #define __NR_fanotify_init	338
 #define __NR_fanotify_mark	339
 #define __NR_prlimit64		340
+#define __NR_name_to_handle_at	341
+#define __NR_open_by_handle_at  342
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 341
+#define NR_syscalls 343
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index b35786d..c314b21 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -340,3 +340,5 @@ ENTRY(sys_call_table)
 	.long sys_fanotify_init
 	.long sys_fanotify_mark
 	.long sys_prlimit64		/* 340 */
+	.long sys_name_to_handle_at
+	.long sys_open_by_handle_at
-- 
1.7.1



* [PATCH -V26 07/16] x86: Add new syscalls for x86_64
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 06/16] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 08/16] unistd.h: Add new syscalls numbers to asm-generic Aneesh Kumar K.V
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

This patch adds new syscalls to x86_64

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/ia32/ia32entry.S        |    2 ++
 arch/x86/include/asm/unistd_64.h |    4 ++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 518bb99..98d353e 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -851,4 +851,6 @@ ia32_sys_call_table:
 	.quad sys_fanotify_init
 	.quad sys32_fanotify_mark
 	.quad sys_prlimit64		/* 340 */
+	.quad sys_name_to_handle_at
+	.quad compat_sys_open_by_handle_at
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index 363e9b8..81a3d5b 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -669,6 +669,10 @@ __SYSCALL(__NR_fanotify_init, sys_fanotify_init)
 __SYSCALL(__NR_fanotify_mark, sys_fanotify_mark)
 #define __NR_prlimit64				302
 __SYSCALL(__NR_prlimit64, sys_prlimit64)
+#define __NR_name_to_handle_at			303
+__SYSCALL(__NR_name_to_handle_at, sys_name_to_handle_at)
+#define __NR_open_by_handle_at			304
+__SYSCALL(__NR_open_by_handle_at, sys_open_by_handle_at)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
-- 
1.7.1



* [PATCH -V26 08/16] unistd.h: Add new syscalls numbers to asm-generic
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 07/16] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 09/16] vfs: Export file system uuid via /proc/<pid>/mountinfo Aneesh Kumar K.V
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/asm-generic/unistd.h |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/include/asm-generic/unistd.h b/include/asm-generic/unistd.h
index b969770..57af033 100644
--- a/include/asm-generic/unistd.h
+++ b/include/asm-generic/unistd.h
@@ -646,9 +646,13 @@ __SYSCALL(__NR_prlimit64, sys_prlimit64)
 __SYSCALL(__NR_fanotify_init, sys_fanotify_init)
 #define __NR_fanotify_mark 263
 __SYSCALL(__NR_fanotify_mark, sys_fanotify_mark)
+#define __NR_name_to_handle_at		264
+__SYSCALL(__NR_name_to_handle_at, sys_name_to_handle_at)
+#define __NR_open_by_handle_at		265
+__SYSCALL(__NR_open_by_handle_at, sys_open_by_handle_at)
 
 #undef __NR_syscalls
-#define __NR_syscalls 264
+#define __NR_syscalls 266
 
 /*
  * All syscalls below here should go away really,
-- 
1.7.1



* [PATCH -V26 09/16] vfs: Export file system uuid via /proc/<pid>/mountinfo
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (7 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 08/16] unistd.h: Add new syscalls numbers to asm-generic Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 10/16] ext3: Copy fs UUID to superblock Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Add a per-superblock uuid field. File systems should set the uuid in
their fill_super callback. A non-nil uuid is shown in
/proc/<pid>/mountinfo as an optional " uuid:<value>" field.
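
For illustration, a user-space sketch of how the optional field is built.
uuid_is_nil() mirrors the helper in this patch; format_uuid_field() is a
hypothetical name, and the 8-4-4-4-12 lower-case hex layout is assumed to
match what the kernel's %pU prints by default.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same test as the patch's uuid_is_nil(): true when all 16 bytes are zero. */
static int uuid_is_nil(const uint8_t *uuid)
{
	int i;

	for (i = 0; i < 16; i++) {
		if (uuid[i])
			return 0;
	}
	return 1;
}

/*
 * Hypothetical helper: write the optional mountinfo field into buf.
 * Produces "" for a nil UUID and " uuid:<8-4-4-4-12 hex>" otherwise.
 * buf should hold at least 43 bytes (6 + 36 + NUL).
 */
static void format_uuid_field(const uint8_t *uuid, char *buf, size_t len)
{
	if (uuid_is_nil(uuid)) {
		snprintf(buf, len, "%s", "");
		return;
	}
	snprintf(buf, len,
		 " uuid:%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		 "%02x%02x%02x%02x%02x%02x",
		 uuid[0], uuid[1], uuid[2], uuid[3],
		 uuid[4], uuid[5], uuid[6], uuid[7],
		 uuid[8], uuid[9], uuid[10], uuid[11],
		 uuid[12], uuid[13], uuid[14], uuid[15]);
}
```

A file system that never fills in s_uuid therefore adds nothing to its
mountinfo line, so existing parsers see the same output as before.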

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/namespace.c     |   16 ++++++++++++++++
 include/linux/fs.h |    1 +
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7b0b953..43ef348 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1002,6 +1002,18 @@ const struct seq_operations mounts_op = {
 	.show	= show_vfsmnt
 };
 
+static int uuid_is_nil(u8 *uuid)
+{
+	int i;
+	u8  *cp = (u8 *)uuid;
+
+	for (i = 0; i < 16; i++) {
+		if (*cp++)
+			return 0;
+	}
+	return 1;
+}
+
 static int show_mountinfo(struct seq_file *m, void *v)
 {
 	struct proc_mounts *p = m->private;
@@ -1040,6 +1052,10 @@ static int show_mountinfo(struct seq_file *m, void *v)
 	if (IS_MNT_UNBINDABLE(mnt))
 		seq_puts(m, " unbindable");
 
+	if (!uuid_is_nil(mnt->mnt_sb->s_uuid))
+		/* print the uuid */
+		seq_printf(m, " uuid:%pU", mnt->mnt_sb->s_uuid);
+
 	/* Filesystem specific data */
 	seq_puts(m, " - ");
 	show_type(m, sb);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1aed36f..ac6e899 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1407,6 +1407,7 @@ struct super_block {
 	wait_queue_head_t	s_wait_unfrozen;
 
 	char s_id[32];				/* Informational name */
+	u8 s_uuid[16];				/* UUID */
 
 	void 			*s_fs_info;	/* Filesystem private info */
 	fmode_t			s_mode;
-- 
1.7.1



* [PATCH -V26 10/16] ext3: Copy fs UUID to superblock.
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (8 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 09/16] vfs: Export file system uuid via /proc/<pid>/mountinfo Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 11/16] ext4: " Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Copy the file system UUID into the VFS superblock so that it is made
available to applications via /proc/<pid>/mountinfo
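
A sketch of how an application might pick the field out of a mountinfo
line, assuming the " uuid:" tag from patch 09 of this series.
mountinfo_uuid() is a hypothetical helper, and the sample lines in the
usage note are made up.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy the 36-character UUID from a mountinfo line
 * into out (at least 37 bytes). Only the optional fields, i.e. the text
 * before the " - " separator, are considered. Returns 0 on success and
 * -1 if the line carries no well-formed " uuid:" field.
 */
static int mountinfo_uuid(const char *line, char *out)
{
	const char *sep = strstr(line, " - ");
	const char *tag = strstr(line, " uuid:");
	const char *val;
	size_t n;

	if (!tag || (sep && tag > sep))
		return -1;
	val = tag + strlen(" uuid:");
	n = strcspn(val, " \n");
	if (n != 36)
		return -1;
	memcpy(out, val, n);
	out[n] = '\0';
	return 0;
}
```

For a line such as
"36 35 98:0 / /mnt rw shared:1 uuid:00010203-0405-0607-0809-0a0b0c0d0e0f - ext3 /dev/sda1 rw"
the helper returns 0 and fills out with the UUID string; for a line
without the field it returns -1.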

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext3/super.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 85c8cc8..9cc19a1 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1936,6 +1936,7 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 	sb->s_qcop = &ext3_qctl_operations;
 	sb->dq_op = &ext3_quota_operations;
 #endif
+	memcpy(sb->s_uuid, es->s_uuid, sizeof(es->s_uuid));
 	INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
 	mutex_init(&sbi->s_orphan_lock);
 	mutex_init(&sbi->s_resize_lock);
-- 
1.7.1



* [PATCH -V26 11/16] ext4: Copy fs UUID to superblock
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (9 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 10/16] ext3: Copy fs UUID to superblock Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 12/16] vfs: Add O_PATH open flag Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Copy the file system UUID into the VFS superblock so that it is made
available to applications via /proc/<pid>/mountinfo

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/super.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 48ce561..e250c61 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3413,6 +3413,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_qcop = &ext4_qctl_operations;
 	sb->dq_op = &ext4_quota_operations;
 #endif
+	memcpy(sb->s_uuid, es->s_uuid, sizeof(es->s_uuid));
+
 	INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
 	mutex_init(&sbi->s_orphan_lock);
 	mutex_init(&sbi->s_resize_lock);
-- 
1.7.1



* [PATCH -V26 12/16] vfs: Add O_PATH open flag
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (10 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 11/16] ext4: " Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 13/16] fs: Support "" relative pathnames Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

This flag can be used to get a descriptor that is used only
for fetching file attributes. An O_PATH descriptor can be obtained even
for a symlink. Any attempt to perform a file operation such as
read/write/lseek/ioctl on such a descriptor fails with EBADF
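
A minimal user-space sketch of the intended behaviour, assuming a kernel
and libc that provide O_PATH. read_errno_on_o_path() is a hypothetical
helper; the fallback define uses the asm-generic value from this patch
and is an assumption on other architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH 010000000	/* asm-generic value; an assumption elsewhere */
#endif

/*
 * Open path with O_PATH, try to read one byte, and return the errno
 * that read() failed with (0 if it unexpectedly succeeded, -1 if the
 * open itself failed).
 */
static int read_errno_on_o_path(const char *path)
{
	char c;
	int err = 0;
	int fd = open(path, O_PATH);

	if (fd < 0)
		return -1;
	if (read(fd, &c, 1) < 0)
		err = errno;
	close(fd);
	return err;
}
```

With this series applied, read_errno_on_o_path("/") is expected to
return EBADF, since the descriptor exists but cannot be used for I/O.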

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/fcntl.c                  |    4 +-
 fs/file_table.c             |   60 +++++++++++++++++++++++++++++++++++++++++++
 fs/namei.c                  |   38 +++++++++++++++++++++++----
 fs/open.c                   |   16 +++++++++--
 include/asm-generic/fcntl.h |    4 +++
 include/linux/file.h        |    2 +
 6 files changed, 113 insertions(+), 11 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index ecc8b39..ba4b564 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -808,14 +808,14 @@ static int __init fcntl_init(void)
 	 * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
 	 * is defined as O_NONBLOCK on some platforms and not on others.
 	 */
-	BUILD_BUG_ON(18 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
+	BUILD_BUG_ON(19 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
 		O_RDONLY	| O_WRONLY	| O_RDWR	|
 		O_CREAT		| O_EXCL	| O_NOCTTY	|
 		O_TRUNC		| O_APPEND	| /* O_NONBLOCK	| */
 		__O_SYNC	| O_DSYNC	| FASYNC	|
 		O_DIRECT	| O_LARGEFILE	| O_DIRECTORY	|
 		O_NOFOLLOW	| O_NOATIME	| O_CLOEXEC	|
-		FMODE_EXEC
+		FMODE_EXEC	| O_PATH
 		));
 
 	fasync_cache = kmem_cache_create("fasync_cache",
diff --git a/fs/file_table.c b/fs/file_table.c
index c3e89ad..67b2668 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -284,11 +284,39 @@ struct file *fget(unsigned int fd)
 	}
 	rcu_read_unlock();
 
+	if (file && (file->f_flags & O_PATH)) {
+		/*
+		 * O_PATH descriptors need to use the
+		 * fget_lenient() variant
+		 */
+		fput(file);
+		file = NULL;
+	}
 	return file;
 }
 
 EXPORT_SYMBOL(fget);
 
+struct file *fget_lenient(unsigned int fd)
+{
+	struct file *file;
+	struct files_struct *files = current->files;
+
+	rcu_read_lock();
+	file = fcheck_files(files, fd);
+	if (file) {
+		if (!atomic_long_inc_not_zero(&file->f_count)) {
+			/* File object ref couldn't be taken */
+			rcu_read_unlock();
+			return NULL;
+		}
+	}
+	rcu_read_unlock();
+
+	return file;
+}
+EXPORT_SYMBOL(fget_lenient);
+
 /*
  * Lightweight file lookup - no refcnt increment if fd table isn't shared.
  *
@@ -326,6 +354,38 @@ struct file *fget_light(unsigned int fd, int *fput_needed)
 		rcu_read_unlock();
 	}
 
+	if (file && (file->f_flags & O_PATH)) {
+		/*
+		 * O_PATH descriptors need to use the
+		 * fget_light_lenient() variant
+		 */
+		if (*fput_needed)
+			fput(file);
+		file = NULL;
+	}
+	return file;
+}
+
+struct file *fget_light_lenient(unsigned int fd, int *fput_needed)
+{
+	struct file *file;
+	struct files_struct *files = current->files;
+
+	*fput_needed = 0;
+	if (atomic_read(&files->count) == 1) {
+		file = fcheck_files(files, fd);
+	} else {
+		rcu_read_lock();
+		file = fcheck_files(files, fd);
+		if (file) {
+			if (atomic_long_inc_not_zero(&file->f_count))
+				*fput_needed = 1;
+			else
+				/* Didn't get the reference, someone's freed */
+				file = NULL;
+		}
+		rcu_read_unlock();
+	}
 	return file;
 }
 
diff --git a/fs/namei.c b/fs/namei.c
index a8346fa..b9a500c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2115,6 +2115,9 @@ int may_open(struct path *path, int acc_mode, int flag)
 	if (!inode)
 		return -ENOENT;
 
+	if (!(acc_mode & MAY_OPEN))
+		return 0;
+
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFLNK:
 		return -ELOOP;
@@ -2404,8 +2407,10 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 	if (!path->dentry->d_inode)
 		goto exit_dput;
 
-	if (path->dentry->d_inode->i_op->follow_link)
-		return NULL;
+	/* We allow open on symlinks with O_PATH flag */
+	if ((open_flag & (O_PATH | O_NOFOLLOW)) != (O_PATH | O_NOFOLLOW))
+		if (path->dentry->d_inode->i_op->follow_link)
+			return NULL;
 
 	path_to_nameidata(path, nd);
 	nd->inode = path->dentry->d_inode;
@@ -2444,6 +2449,14 @@ struct file *do_filp_open(int dfd, const char *pathname,
 	int flag = open_to_namei_flags(open_flag);
 	int flags;
 
+	/*
+	 * If O_PATH is set in the open flags, no flag
+	 * outside the set below may be set with it
+	 */
+	if ((open_flag & O_PATH) &&
+	    (open_flag & ~(O_DIRECTORY|O_NOFOLLOW|O_PATH)))
+		return ERR_PTR(-EINVAL);
+
 	if (!(open_flag & O_CREAT))
 		mode = 0;
 
@@ -2459,7 +2472,7 @@ struct file *do_filp_open(int dfd, const char *pathname,
 	if (open_flag & __O_SYNC)
 		open_flag |= O_DSYNC;
 
-	if (!acc_mode)
+	if (!acc_mode && !(open_flag & O_PATH))
 		acc_mode = MAY_OPEN | ACC_MODE(open_flag);
 
 	/* O_TRUNC implies we need access checks for write permissions */
@@ -2499,7 +2512,10 @@ struct file *do_filp_open(int dfd, const char *pathname,
 	if (unlikely(error))
 		goto out_filp;
 	error = -ELOOP;
-	if (!(nd.flags & LOOKUP_FOLLOW)) {
+	/*
+	 * We allow open on symlinks with the O_PATH flag.
+	 */
+	if (!(nd.flags & LOOKUP_FOLLOW) && !(open_flag & O_PATH)) {
 		if (nd.inode->i_op->follow_link)
 			goto out_path;
 	}
@@ -2708,10 +2724,19 @@ long do_handle_open(int mountdirfd,
 		    struct file_handle __user *ufh, int open_flag)
 {
 	long retval = 0;
-	int fd, acc_mode;
+	int fd, acc_mode = 0;
 	struct path path;
 	struct file *filp;
 
+	/*
+	 * If O_PATH is set in the open flags, no flag
+	 * outside the set below may be set with it
+	 */
+	if ((open_flag & O_PATH) &&
+	    (open_flag & ~(O_DIRECTORY|O_PATH))) {
+		retval = -EINVAL;
+		goto out_err;
+	}
 	/* can't use O_CREATE with open_by_handle */
 	if (open_flag & O_CREAT) {
 		retval = -EINVAL;
@@ -2739,7 +2764,8 @@ long do_handle_open(int mountdirfd,
 	if (open_flag & __O_SYNC)
 		open_flag |= O_DSYNC;
 
-	acc_mode = MAY_OPEN | ACC_MODE(open_flag);
+	if (!(open_flag & O_PATH))
+		acc_mode = MAY_OPEN | ACC_MODE(open_flag);
 
 	/* O_TRUNC implies we need access checks for write permissions */
 	if (open_flag & O_TRUNC)
diff --git a/fs/open.c b/fs/open.c
index 1ec2623..7dbde0f 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -657,6 +657,9 @@ static inline int __get_file_write_access(struct inode *inode,
 	return error;
 }
 
+/* empty file_operations to be used for O_PATH descriptor */
+static const struct file_operations o_path_file_operations = {};
+
 struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 			   struct file *f,
 			   int (*open)(struct inode *, struct file *),
@@ -665,8 +668,9 @@ struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 	struct inode *inode;
 	int error;
 
-	f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
-				FMODE_PREAD | FMODE_PWRITE;
+	if (!(f->f_flags & O_PATH))
+		f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
+			    FMODE_PREAD | FMODE_PWRITE;
 	inode = dentry->d_inode;
 	if (f->f_mode & FMODE_WRITE) {
 		error = __get_file_write_access(inode, mnt);
@@ -680,8 +684,14 @@ struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 	f->f_path.dentry = dentry;
 	f->f_path.mnt = mnt;
 	f->f_pos = 0;
-	f->f_op = fops_get(inode->i_fop);
 	file_sb_list_add(f, inode->i_sb);
+	/* For O_PATH open we just return without opening the file */
+	if (f->f_flags & O_PATH) {
+		f->f_op = &o_path_file_operations;
+		return f;
+	}
+	f->f_op = fops_get(inode->i_fop);
+
 
 	error = security_dentry_open(f, cred);
 	if (error)
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index 0fc16e3..84793c7 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -80,6 +80,10 @@
 #define O_SYNC		(__O_SYNC|O_DSYNC)
 #endif
 
+#ifndef O_PATH
+#define O_PATH		010000000
+#endif
+
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
 #endif
diff --git a/include/linux/file.h b/include/linux/file.h
index e85baeb..e21b733 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -29,6 +29,8 @@ static inline void fput_light(struct file *file, int fput_needed)
 
 extern struct file *fget(unsigned int fd);
 extern struct file *fget_light(unsigned int fd, int *fput_needed);
+extern struct file *fget_lenient(unsigned int fd);
+extern struct file *fget_light_lenient(unsigned int fd, int *fput_needed);
 extern void set_close_on_exec(unsigned int fd, int flag);
 extern void put_filp(struct file *);
 extern int alloc_fd(unsigned start, unsigned flags);
-- 
1.7.1



* [PATCH -V26 13/16] fs: Support "" relative pathnames
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (11 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 12/16] vfs: Add O_PATH open flag Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 14/16] fs: limit linkat syscall with null relative name to CAP_DAC_READ_SEARCH Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

Support "" pathnames relative to a file descriptor opened
with the O_PATH flag. This is needed so that the *at variants of
syscalls can operate on the dirfd passed in.

The primary motivation is to enable the readlinkat and linkat syscalls
to work on a symlink file descriptor. This also enables us to
do path lookup in userspace, which can be useful for implementing
file servers in userspace.
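
The dfd checks this patch adds to path_init() and path_init_rcu() can be
summarised by the user-space model below. It is a sketch only:
toy_check_dfd() and NEED_EXEC_CHECK are hypothetical names, and the
AT_FDCWD case (where "" still yields -ENOENT) is not covered.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result code: caller must still run the MAY_EXEC check. */
#define NEED_EXEC_CHECK 1

/*
 * Model of the checks applied to a dfd other than AT_FDCWD.
 * is_dir:    dfd refers to a directory
 * is_o_path: dfd was opened with O_PATH
 * name:      the relative pathname passed to the *at syscall
 */
static int toy_check_dfd(int is_dir, int is_o_path, const char *name)
{
	int empty = (*name == '\0');

	if (!is_dir) {
		/* A non-directory dfd is only valid as an O_PATH fd with "". */
		if (!is_o_path || !empty)
			return -ENOTDIR;
		return 0;
	}
	if (!empty)
		return NEED_EXEC_CHECK;	/* normal lookup below the directory */
	/* "" on a directory: allowed only through an O_PATH descriptor. */
	return is_o_path ? 0 : -ENOENT;
}
```

So an O_PATH descriptor for a symlink combined with "" passes, which is
what lets readlinkat and linkat operate on the descriptor itself.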

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/namei.c         |  104 +++++++++++++++++++++++++++++++++++++++-------------
 include/linux/fs.h |   10 ++++-
 2 files changed, 87 insertions(+), 27 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index b9a500c..990b155 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -115,7 +115,8 @@
  * POSIX.1 2.4: an empty pathname is invalid (ENOENT).
  * PATH_MAX includes the nul terminator --RR.
  */
-static int do_getname(const char __user *filename, char *page)
+static int __do_getname(const char __user *filename,
+			char *page, int allow_null_name)
 {
 	int retval;
 	unsigned long len = PATH_MAX;
@@ -132,19 +133,20 @@ static int do_getname(const char __user *filename, char *page)
 		if (retval < len)
 			return 0;
 		return -ENAMETOOLONG;
-	} else if (!retval)
+	} else if (!retval && !allow_null_name)
 		retval = -ENOENT;
+
 	return retval;
 }
 
-char * getname(const char __user * filename)
+char *do_getname(const char __user *filename, int allow_null_name)
 {
 	char *tmp, *result;
 
 	result = ERR_PTR(-ENOMEM);
 	tmp = __getname();
 	if (tmp)  {
-		int retval = do_getname(filename, tmp);
+		int retval = __do_getname(filename, tmp, allow_null_name);
 
 		result = tmp;
 		if (retval < 0) {
@@ -155,6 +157,7 @@ char * getname(const char __user * filename)
 	audit_getname(result);
 	return result;
 }
+EXPORT_SYMBOL(do_getname);
 
 #ifdef CONFIG_AUDITSYSCALL
 void putname(const char *name)
@@ -1605,6 +1608,14 @@ static int path_init_rcu(int dfd, const char *name, unsigned int flags, struct n
 		struct fs_struct *fs = current->fs;
 		unsigned seq;
 
+		/*
+		 * If relative name is "" the descriptor should
+		 * be O_PATH descriptor.
+		 */
+		if (*name == 0) {
+			retval = -ENOENT;
+			goto out_fail;
+		}
 		br_read_lock(vfsmount_lock);
 		rcu_read_lock();
 
@@ -1617,21 +1628,38 @@ static int path_init_rcu(int dfd, const char *name, unsigned int flags, struct n
 	} else {
 		struct dentry *dentry;
 
-		file = fget_light(dfd, &fput_needed);
+		file = fget_light_lenient(dfd, &fput_needed);
 		retval = -EBADF;
 		if (!file)
 			goto out_fail;
 
 		dentry = file->f_path.dentry;
-
-		retval = -ENOTDIR;
-		if (!S_ISDIR(dentry->d_inode->i_mode))
-			goto fput_fail;
-
-		retval = file_permission(file, MAY_EXEC);
-		if (retval)
-			goto fput_fail;
-
+		/*
+		 * We allow an O_PATH fd to be used with a
+		 * "" relative name. This means: operate on
+		 * dfd itself.
+		 */
+		if (!S_ISDIR(dentry->d_inode->i_mode)) {
+			if (!(file->f_flags & O_PATH) || (*name != 0)) {
+				retval = -ENOTDIR;
+				goto fput_fail;
+			}
+		} else {
+			/*
+			 * if directory and relative name is not ""
+			 * then we need to check for EXEC permission.
+			 * If relative name is "" the descriptor should
+			 * be O_PATH descriptor.
+			 */
+			if (*name != 0) {
+				retval = file_permission(file, MAY_EXEC);
+				if (retval)
+					goto fput_fail;
+			} else if (!(file->f_flags & O_PATH)) {
+				retval = -ENOENT;
+				goto fput_fail;
+			}
+		}
 		nd->path = file->f_path;
 		if (fput_needed)
 			nd->file = file;
@@ -1665,25 +1693,50 @@ static int path_init(int dfd, const char *name, unsigned int flags, struct namei
 		nd->path = nd->root;
 		path_get(&nd->root);
 	} else if (dfd == AT_FDCWD) {
+		/*
+		 * If relative name is "" the descriptor should
+		 * be O_PATH descriptor.
+		 */
+		if (*name == 0) {
+			retval = -ENOENT;
+			goto out_fail;
+		}
 		get_fs_pwd(current->fs, &nd->path);
 	} else {
 		struct dentry *dentry;
 
-		file = fget_light(dfd, &fput_needed);
+		file = fget_light_lenient(dfd, &fput_needed);
 		retval = -EBADF;
 		if (!file)
 			goto out_fail;
 
 		dentry = file->f_path.dentry;
-
-		retval = -ENOTDIR;
-		if (!S_ISDIR(dentry->d_inode->i_mode))
-			goto fput_fail;
-
-		retval = file_permission(file, MAY_EXEC);
-		if (retval)
-			goto fput_fail;
-
+		/*
+		 * We allow an O_PATH fd to be used with a
+		 * "" relative name. This means: operate on
+		 * dfd itself.
+		 */
+		if (!S_ISDIR(dentry->d_inode->i_mode)) {
+			if (!(file->f_flags & O_PATH) || (*name != 0)) {
+				retval = -ENOTDIR;
+				goto fput_fail;
+			}
+		} else {
+			/*
+			 * if directory and relative name is not ""
+			 * then we need to check for EXEC permission.
+			 * If relative name is "" the descriptor should
+			 * be O_PATH descriptor.
+			 */
+			if (*name != 0) {
+				retval = file_permission(file, MAY_EXEC);
+				if (retval)
+					goto fput_fail;
+			} else if (!(file->f_flags & O_PATH)) {
+				retval = -ENOENT;
+				goto fput_fail;
+			}
+		}
 		nd->path = file->f_path;
 		path_get(&file->f_path);
 
@@ -1926,7 +1979,7 @@ int user_path_at(int dfd, const char __user *name, unsigned flags,
 		 struct path *path)
 {
 	struct nameidata nd;
-	char *tmp = getname(name);
+	char *tmp = getname_null(name);
 	int err = PTR_ERR(tmp);
 	if (!IS_ERR(tmp)) {
 
@@ -3806,7 +3859,6 @@ EXPORT_SYMBOL(follow_down_one);
 EXPORT_SYMBOL(follow_down);
 EXPORT_SYMBOL(follow_up);
 EXPORT_SYMBOL(get_write_access); /* binfmt_aout */
-EXPORT_SYMBOL(getname);
 EXPORT_SYMBOL(lock_rename);
 EXPORT_SYMBOL(lookup_one_len);
 EXPORT_SYMBOL(page_follow_link_light);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ac6e899..3f1e7fc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2004,7 +2004,15 @@ struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt,
 extern struct file * dentry_open(struct dentry *, struct vfsmount *, int,
 				 const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
-extern char * getname(const char __user *);
+extern char *do_getname(const char __user *, int allow_null_name);
+static inline char *getname(const char __user *name)
+{
+	return do_getname(name, 0);
+}
+static inline char *getname_null(const char __user *name)
+{
+	return do_getname(name, 1);
+}
 
 /* fs/ioctl.c */
 
-- 
1.7.1
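
The dfd checks added to fs/namei.c above can be read as a small decision
table. The following is a userspace sketch of that table only, not kernel
code: struct dfd_state and check_relative_start() are hypothetical names
made up for illustration, and -EACCES stands in for whatever
file_permission(file, MAY_EXEC) would actually return. Absolute names are
handled by an earlier branch and are not modelled here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state the patch inspects; not a kernel type. */
struct dfd_state {
	bool is_at_fdcwd;	/* dfd == AT_FDCWD */
	bool is_dir;		/* S_ISDIR() holds for the descriptor's inode */
	bool is_o_path;		/* descriptor was opened with O_PATH */
	bool may_exec;		/* assumed result of the MAY_EXEC check */
};

/* Returns 0 if the lookup may start at dfd, or a negative errno. */
static int check_relative_start(const struct dfd_state *d, const char *name)
{
	bool empty = (name[0] == '\0');

	/* "" relative to the current directory is rejected. */
	if (d->is_at_fdcwd)
		return empty ? -ENOENT : 0;

	/* A non-directory is only usable as an O_PATH fd with "". */
	if (!d->is_dir)
		return (d->is_o_path && empty) ? 0 : -ENOTDIR;

	/* Directory plus a real name: search permission is required. */
	if (!empty)
		return d->may_exec ? 0 : -EACCES;

	/* Directory plus "": only O_PATH descriptors qualify. */
	return d->is_o_path ? 0 : -ENOENT;
}
```

For example, an O_PATH descriptor for a regular file passes with "" and
gets -ENOTDIR with any other relative name.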


* [PATCH -V26 14/16] fs: limit linkat syscall with null relative name to CAP_DAC_READ_SEARCH
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (12 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 13/16] fs: Support "" relative pathnames Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 15/16] vfs: enable O_PATH descriptor for few syscalls Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 16/16] vfs: enable "" pathname in openat syscall Aneesh Kumar K.V
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

We don't want to allow an application to create private hardlinks using
an fd passed to it via SCM_RIGHTS. So limit the use of a null relative name
in the linkat syscall to processes with CAP_DAC_READ_SEARCH.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/namei.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 990b155..5c4902c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3408,6 +3408,18 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	return error;
 }
 
+static int null_name(const char __user *name)
+{
+	int retval = 0;
+	char *tmp = getname_null(name);
+	if (!IS_ERR(tmp)) {
+		if (*tmp == 0)
+			retval = 1;
+		putname(tmp);
+	}
+	return retval;
+}
+
 /*
  * Hardlinks are often used in delicate situations.  We avoid
  * security-related surprises by not following symlinks on the
@@ -3428,6 +3440,15 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 
 	if ((flags & ~AT_SYMLINK_FOLLOW) != 0)
 		return -EINVAL;
+	/*
+	 * To use null names we require CAP_DAC_READ_SEARCH.
+	 * This ensures that not everyone will be able to create
+	 * a hardlink using the passed file descriptor.
+	 */
+	if (null_name(oldname)) {
+		if (!capable(CAP_DAC_READ_SEARCH))
+			return -ENOENT;
+	}
 
 	error = user_path_at(olddfd, oldname,
 			     flags & AT_SYMLINK_FOLLOW ? LOOKUP_FOLLOW : 0,
-- 
1.7.1
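
The gate added to linkat above reduces to a single condition. A minimal
userspace sketch follows, assuming the capability check is passed in as a
plain boolean; linkat_null_name_gate() is a hypothetical helper for
illustration and does not exist in the kernel.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the early check in linkat: a "" oldname is
 * refused with -ENOENT unless the caller has CAP_DAC_READ_SEARCH.
 * Any other name falls through to the normal lookup (modelled as 0).
 */
static int linkat_null_name_gate(const char *oldname,
				 bool has_cap_dac_read_search)
{
	if (oldname[0] == '\0' && !has_cap_dac_read_search)
		return -ENOENT;
	return 0;
}
```

Returning -ENOENT rather than -EPERM matches the patch: an unprivileged
caller sees the same error as before "" names were supported.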


* [PATCH -V26 15/16] vfs: enable O_PATH descriptor for few syscalls
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (13 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 14/16] fs: limit linkat syscall with null relative name to CAP_DAC_READ_SEARCH Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  2011-01-29 19:08 ` [PATCH -V26 16/16] vfs: enable "" pathname in openat syscall Aneesh Kumar K.V
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

This patch enables O_PATH descriptors for dup*/fchmod/fchown/fstat.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/fcntl.c |    2 +-
 fs/open.c  |    4 ++--
 fs/stat.c  |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index ba4b564..b052600 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -131,7 +131,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
 SYSCALL_DEFINE1(dup, unsigned int, fildes)
 {
 	int ret = -EBADF;
-	struct file *file = fget(fildes);
+	struct file *file = fget_lenient(fildes);
 
 	if (file) {
 		ret = get_unused_fd();
diff --git a/fs/open.c b/fs/open.c
index 7dbde0f..328a76e 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -447,7 +447,7 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, mode_t, mode)
 	int err = -EBADF;
 	struct iattr newattrs;
 
-	file = fget(fd);
+	file = fget_lenient(fd);
 	if (!file)
 		goto out;
 
@@ -611,7 +611,7 @@ SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
 	int error = -EBADF;
 	struct dentry * dentry;
 
-	file = fget(fd);
+	file = fget_lenient(fd);
 	if (!file)
 		goto out;
 
diff --git a/fs/stat.c b/fs/stat.c
index d5c61cf..d88a96a 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -57,7 +57,7 @@ EXPORT_SYMBOL(vfs_getattr);
 
 int vfs_fstat(unsigned int fd, struct kstat *stat)
 {
-	struct file *f = fget(fd);
+	struct file *f = fget_lenient(fd);
 	int error = -EBADF;
 
 	if (f) {
-- 
1.7.1
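
From userspace, the dup and fstat part of this change could be exercised
roughly as below. This is a sketch under assumptions: stat_via_o_path() is
a made-up helper, the fallback O_PATH value is the one used on x86 and ARM
and is only there in case the libc headers do not expose the flag, and the
kernel must accept fstat() on an O_PATH descriptor. fchmod and fchown are
left out because their behaviour on O_PATH descriptors may differ between
kernels.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH 010000000	/* assumption: value used on x86 and ARM */
#endif

/*
 * Open path with O_PATH, duplicate the descriptor and fstat() the copy.
 * Returns 0 and fills *st on success, -1 with errno set on failure.
 */
static int stat_via_o_path(const char *path, struct stat *st)
{
	int fd, dupfd, ret = -1, saved_errno;

	fd = open(path, O_PATH);
	if (fd < 0)
		return -1;

	dupfd = dup(fd);
	if (dupfd >= 0)
		ret = fstat(dupfd, st);

	/* Keep the errno of the failing call across the close() calls. */
	saved_errno = errno;
	if (dupfd >= 0)
		close(dupfd);
	close(fd);
	errno = saved_errno;
	return ret;
}
```

A caller would pass any existing path, for example "/", and inspect
st.st_mode afterwards.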


* [PATCH -V26 16/16] vfs: enable "" pathname in openat syscall
  2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (14 preceding siblings ...)
  2011-01-29 19:08 ` [PATCH -V26 15/16] vfs: enable O_PATH descriptor for few syscalls Aneesh Kumar K.V
@ 2011-01-29 19:08 ` Aneesh Kumar K.V
  15 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2011-01-29 19:08 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, neilb, npiggin, hooanon05, bfields, miklos
  Cc: linux-fsdevel, sfrench, philippe.deniel, linux-kernel, Aneesh Kumar K.V

This enables the usage below:

fd = openat(dir_fd, "", O_RDONLY);

dir_fd can be an O_PATH descriptor. All access checks are performed
according to the open flags.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/open.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 328a76e..bccb12d 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -893,7 +893,7 @@ EXPORT_SYMBOL(fd_install);
 
 long do_sys_open(int dfd, const char __user *filename, int flags, int mode)
 {
-	char *tmp = getname(filename);
+	char *tmp = getname_null(filename);
 	int fd = PTR_ERR(tmp);
 
 	if (!IS_ERR(tmp)) {
-- 
1.7.1


Thread overview: 17+ messages
2011-01-29 19:08 [PATCH -V26 00/16] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 01/16] exportfs: Return the minimum required handle size Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 02/16] vfs: Add name to file handle conversion support Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 03/16] vfs: Add open by file handle support Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 04/16] fs: Don't allow to create hardlink for deleted file Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 05/16] fs: Remove i_nlink check from file system link callback Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 06/16] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 07/16] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 08/16] unistd.h: Add new syscalls numbers to asm-generic Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 09/16] vfs: Export file system uuid via /proc/<pid>/mountinfo Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 10/16] ext3: Copy fs UUID to superblock Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 11/16] ext4: " Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 12/16] vfs: Add O_PATH open flag Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 13/16] fs: Support "" relative pathnames Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 14/16] fs: limit linkat syscall with null relative name to CAP_DAC_READ_SEARCH Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 15/16] vfs: enable O_PATH descriptor for few syscalls Aneesh Kumar K.V
2011-01-29 19:08 ` [PATCH -V26 16/16] vfs: enable "" pathname in openat syscall Aneesh Kumar K.V