All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -V5 0/8] Generic name to handle and open by handle syscalls
@ 2010-04-26 17:33 Aneesh Kumar K.V
  2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:33 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb; +Cc: linux-fsdevel, sfrench

Hi,

The below set of patches implement open by handle support using exportfs
operations. This allows user space application to map a file name to file 
handle and later open the file using handle. This should be usable
for userspace NFS [1] and 9P server [2]. XFS already support this with the ioctls
XFS_IOC_PATH_TO_HANDLE and XFS_IOC_OPEN_BY_HANDLE.

[1] http://nfs-ganesha.sourceforge.net/
[2] http://lists.gnu.org/archive/html/qemu-devel/2010-03/msg01087.html

TODO:
I guess we would need to optimize how we get the vfsmount for the filesystem
uuid specified. Searching the file system list and task name space may be a big
overhead for each open by handle call.

Chages from V4:
a) Changed the syscal arguments so that we don't need compat syscalls
   as suggested by Christoph
c) Added two new syscall sys_lname_to_handle and sys_freadlink to work with
   symlinks
d) Changed open_by_handle to work with all file types
e) Add ext3 support

Changes from V3:
a) Code cleanup suggested by Andreas
b) x86_64 syscall support
c) add compat syscall

Chages from V2:
a) Support system wide unique handle.

Changes from v1:
a) handle size is now specified in bytes
b) returns -EOVERFLOW if the handle size is small
c) dropped open_handle syscall and added open_by_handle_at syscall
   open_by_handle_at takes mount_fd as the directory fd of the mount point
   containing the file
e) handle will only be unique in a given file system. So for an NFS server
   exporting multiple file system, NFS server will have to internally track the
   mount point to which a file handle belongs to. We should be able to do it much
   easily than expecting kernel to give a system wide unique file handle. System
   wide unique file handle would need much larger changes to the exportfs or VFS
   interface and I was not sure whether we really need to do that in the kernel or
   in the user space
f) open_handle_at now only check for DAC_OVERRIDE capability


Example program: (x86_32). (x86_64 would need a different syscall number)
----------------
#include <stdio.h>
#include <stdlib.h>

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>

struct uuid {
	char uuid[16];
};
struct file_handle {
        int handle_size;
        int handle_type;
	struct uuid fsid;
        char handle[0];
};


static int name_to_handle(const char *name, struct file_handle  *fh)
{
	return syscall(338, name, fh);
}

static int lname_to_handle(const char *name, struct file_handle  *fh)
{
	return syscall(339, name, fh);
}

static int open_by_handle(struct file_handle *fh,  int flags)
{
	return syscall(340, fh, flags);
}

static int freadlink(int fd, char *buf, size_t bufsiz)
{
	return syscall(341, fd, buf, bufsiz);
}

#define BUFSZ 100
int main(int argc, char *argv[])
{
        int ret;
	int handle_sz;
	struct stat bufstat;
        int fd, dirfd;
        char buf[BUFSZ];
        struct file_handle *fh = NULL;;
again:
	if (fh && fh->handle_size) {
		handle_sz = fh->handle_size;
		free(fh);
        	fh = malloc(sizeof(struct file_handle) + handle_sz);
		fh->handle_size = handle_sz;
	} else {
		fh = malloc(sizeof(struct file_handle));
		fh->handle_size = 0;
	}
        errno  = 0;
        ret = lname_to_handle(argv[1], fh);
        if (ret) {
                perror("Error:");
		printf("Found the handle size needed to be %d\n", fh->handle_size);
		printf("Trying again..\n");
		goto again;
                exit(1);
        }
        fd = open_by_handle(fh, O_RDONLY);
        if (fd <= 0 ) {
                perror("Error:");
                exit(1);
        }
	fstat(fd, &bufstat);
	ret = S_ISLNK(bufstat.st_mode);
	if (ret) {
        	memset(buf, 0 , BUFSZ);
		freadlink(fd, buf, BUFSZ);
		printf("%s is a symlink pointing to %s\n", argv[1], buf);
	}
        memset(buf, 0 , BUFSZ);
	while (1) {
		ret = read(fd, buf, BUFSZ -1);
		if (ret <= 0)
			break;
		buf[ret] = '\0';
                printf("%s", buf);
                memset(buf, 0 , BUFSZ);
        }
        return 0;
}

-aneesh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH -V5 1/8] exportfs: Return the minimum required handle size
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
@ 2010-04-26 17:33 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:33 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/btrfs/export.c             |    8 ++++++--
 fs/exportfs/expfs.c           |    9 +++++++--
 fs/fat/inode.c                |    4 +++-
 fs/fuse/inode.c               |    4 +++-
 fs/gfs2/export.c              |    8 ++++++--
 fs/isofs/export.c             |    8 ++++++--
 fs/ocfs2/export.c             |    8 +++++++-
 fs/reiserfs/inode.c           |    7 ++++++-
 fs/udf/namei.c                |    7 ++++++-
 fs/xfs/linux-2.6/xfs_export.c |    4 +++-
 mm/shmem.c                    |    4 +++-
 11 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 951ef09..5f8ee5a 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -21,9 +21,13 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	int len = *max_len;
 	int type;
 
-	if ((len < BTRFS_FID_SIZE_NON_CONNECTABLE) ||
-	    (connectable && len < BTRFS_FID_SIZE_CONNECTABLE))
+	if (connectable && (len < BTRFS_FID_SIZE_CONNECTABLE)) {
+		*max_len = BTRFS_FID_SIZE_CONNECTABLE;
 		return 255;
+	} else if (len < BTRFS_FID_SIZE_NON_CONNECTABLE) {
+		*max_len = BTRFS_FID_SIZE_NON_CONNECTABLE;
+		return 255;
+	}
 
 	len  = BTRFS_FID_SIZE_NON_CONNECTABLE;
 	type = FILEID_BTRFS_WITHOUT_PARENT;
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index e9e1759..cfee0f0 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -319,9 +319,14 @@ static int export_encode_fh(struct dentry *dentry, struct fid *fid,
 	struct inode * inode = dentry->d_inode;
 	int len = *max_len;
 	int type = FILEID_INO32_GEN;
-	
-	if (len < 2 || (connectable && len < 4))
+
+	if (connectable && (len < 4)) {
+		*max_len = 4;
+		return 255;
+	} else if (len < 2) {
+		*max_len = 2;
 		return 255;
+	}
 
 	len = 2;
 	fid->i32.ino = inode->i_ino;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 0ce143b..6f83bc7 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -738,8 +738,10 @@ fat_encode_fh(struct dentry *de, __u32 *fh, int *lenp, int connectable)
 	struct inode *inode =  de->d_inode;
 	u32 ipos_h, ipos_m, ipos_l;
 
-	if (len < 5)
+	if (len < 5) {
+		*lenp = 5;
 		return 255; /* no room */
+	}
 
 	ipos_h = MSDOS_I(inode)->i_pos >> 8;
 	ipos_m = (MSDOS_I(inode)->i_pos & 0xf0) << 24;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ec14d19..beaea69 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -638,8 +638,10 @@ static int fuse_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
 	u64 nodeid;
 	u32 generation;
 
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len;
 		return  255;
+	}
 
 	nodeid = get_fuse_inode(inode)->nodeid;
 	generation = inode->i_generation;
diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c
index c22c211..d022236 100644
--- a/fs/gfs2/export.c
+++ b/fs/gfs2/export.c
@@ -36,9 +36,13 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len,
 	struct super_block *sb = inode->i_sb;
 	struct gfs2_inode *ip = GFS2_I(inode);
 
-	if (*len < GFS2_SMALL_FH_SIZE ||
-	    (connectable && *len < GFS2_LARGE_FH_SIZE))
+	if (connectable && (*len < GFS2_LARGE_FH_SIZE)) {
+		*len = GFS2_LARGE_FH_SIZE;
 		return 255;
+	} else if (*len < GFS2_SMALL_FH_SIZE) {
+		*len = GFS2_SMALL_FH_SIZE;
+		return 255;
+	}
 
 	fh[0] = cpu_to_be32(ip->i_no_formal_ino >> 32);
 	fh[1] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF);
diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index ed752cb..dd4687f 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -124,9 +124,13 @@ isofs_export_encode_fh(struct dentry *dentry,
 	 * offset of the inode and the upper 16 bits of fh32[1] to
 	 * hold the offset of the parent.
 	 */
-
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*max_len = 5;
+		return 255;
+	} else if (len < 3) {
+		*max_len = 3;
 		return 255;
+	}
 
 	len = 3;
 	fh32[0] = ei->i_iget5_block;
diff --git a/fs/ocfs2/export.c b/fs/ocfs2/export.c
index 19ad145..250a347 100644
--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -201,8 +201,14 @@ static int ocfs2_encode_fh(struct dentry *dentry, u32 *fh_in, int *max_len,
 		   dentry->d_name.len, dentry->d_name.name,
 		   fh, len, connectable);
 
-	if (len < 3 || (connectable && len < 6)) {
+	if (connectable && (len < 6)) {
 		mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+		*max_len = 6;
+		type = 255;
+		goto bail;
+	} else if (len < 3) {
+		mlog(ML_ERROR, "fh buffer is too small for encoding\n");
+		*max_len = 3;
 		type = 255;
 		goto bail;
 	}
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index dc2c65e..5fff1e2 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1588,8 +1588,13 @@ int reiserfs_encode_fh(struct dentry *dentry, __u32 * data, int *lenp,
 	struct inode *inode = dentry->d_inode;
 	int maxlen = *lenp;
 
-	if (maxlen < 3)
+	if (need_parent && (maxlen < 5)) {
+		*lenp = 5;
 		return 255;
+	} else if (maxlen < 3) {
+		*lenp = 3;
+		return 255;
+	}
 
 	data[0] = inode->i_ino;
 	data[1] = le32_to_cpu(INODE_PKEY(inode)->k_dir_id);
diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 7581602..37ce713 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1360,8 +1360,13 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
 	struct fid *fid = (struct fid *)fh;
 	int type = FILEID_UDF_WITHOUT_PARENT;
 
-	if (len < 3 || (connectable && len < 5))
+	if (connectable && (len < 5)) {
+		*lenp = 5;
+		return 255;
+	} else if (len < 3) {
+		*lenp = 3;
 		return 255;
+	}
 
 	*lenp = 3;
 	fid->udf.block = location.logicalBlockNum;
diff --git a/fs/xfs/linux-2.6/xfs_export.c b/fs/xfs/linux-2.6/xfs_export.c
index 846b75a..82c0553 100644
--- a/fs/xfs/linux-2.6/xfs_export.c
+++ b/fs/xfs/linux-2.6/xfs_export.c
@@ -81,8 +81,10 @@ xfs_fs_encode_fh(
 	 * seven combinations work.  The real answer is "don't use v2".
 	 */
 	len = xfs_fileid_length(fileid_type);
-	if (*max_len < len)
+	if (*max_len < len) {
+		*max_len = len
 		return 255;
+	}
 	*max_len = len;
 
 	switch (fileid_type) {
diff --git a/mm/shmem.c b/mm/shmem.c
index eef4ebe..bbeda1c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2125,8 +2125,10 @@ static int shmem_encode_fh(struct dentry *dentry, __u32 *fh, int *len,
 {
 	struct inode *inode = dentry->d_inode;
 
-	if (*len < 3)
+	if (*len < 3) {
+		*len = 3;
 		return 255;
+	}
 
 	if (hlist_unhashed(&inode->i_hash)) {
 		/* Unfortunately insert_inode_hash is not idempotent,
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 2/8] vfs: Add name to file handle conversion support
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-27  1:19   ` Neil Brown
  2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/open.c          |  100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |   17 +++++++++
 2 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 74e5cd9..e9aae5c 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -30,6 +30,8 @@
 #include <linux/falloc.h>
 #include <linux/fs_struct.h>
 #include <linux/ima.h>
+#include <linux/exportfs.h>
+#include <linux/mnt_namespace.h>
 
 #include "internal.h"
 
@@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
 }
 
 EXPORT_SYMBOL(nonseekable_open);
+
+/* limit the handle size to some value */
+#define MAX_HANDLE_SZ 4096
+static long do_sys_name_to_handle(struct path *path,
+			struct file_handle __user *ufh)
+{
+	int retval;
+	int handle_size;
+	struct super_block *sb;
+	struct uuid *this_fs_id;
+	struct file_handle f_handle;
+	struct file_handle *handle = NULL;
+
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
+		return -EFAULT;
+
+	if (f_handle.handle_size > MAX_HANDLE_SZ)
+		return -EINVAL;
+
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+			GFP_KERNEL);
+	if (!handle) {
+		retval = -ENOMEM;
+		goto err_out;
+	}
+	handle_size = f_handle.handle_size;
+
+	/* we ask for a non connected handle */
+	retval = exportfs_encode_fh(path->dentry,
+				(struct fid *)handle->f_handle,
+				&handle_size,  0);
+	/* convert handle size to bytes */
+	handle_size *= sizeof(u32);
+	handle->handle_type = retval;
+	handle->handle_size = handle_size;
+	if (handle_size <= f_handle.handle_size) {
+		/* get the uuid */
+		sb = path->mnt->mnt_sb;
+		if (sb->s_op->get_fsid) {
+			this_fs_id = sb->s_op->get_fsid(sb);
+			memcpy(handle->fsid.uuid, this_fs_id->uuid,
+				sizeof(handle->fsid.uuid));
+		} else
+			memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
+		retval = 0;
+	} else {
+		/*
+		 * set the handle_size to zero so we copy only
+		 * non variable part of the file_handle
+		 */
+		handle_size = 0;
+		retval = -EOVERFLOW;
+	}
+	if (copy_to_user(ufh, handle,
+				sizeof(struct file_handle) + handle_size))
+		retval = -EFAULT;
+
+	kfree(handle);
+err_out:
+	return retval;
+}
+
+SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
+		struct file_handle __user *, handle)
+{
+	long ret;
+	struct path path;
+
+	/* Follow links */
+	ret = user_path(name, &path);
+	if (ret)
+		return ret;
+
+	ret = do_sys_name_to_handle(&path, handle);
+	path_put(&path);
+
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(2, ret, name, handle);
+	return ret;
+}
+
+SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
+		struct file_handle __user *, handle)
+{
+	long ret;
+	struct path path;
+
+	ret = user_lpath(name, &path);
+	if (ret)
+		return ret;
+
+	ret = do_sys_name_to_handle(&path, handle);
+	path_put(&path);
+
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(2, ret, name, handle);
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 39d57bc..da28b80 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -948,6 +948,20 @@ struct file {
 	unsigned long f_mnt_write_state;
 #endif
 };
+
+struct uuid {
+	char uuid[16];
+};
+
+struct file_handle {
+	int handle_size;
+	int handle_type;
+	/* File system identifier */
+	struct uuid fsid;
+	/* file identifier */
+	char f_handle[0];
+};
+
 extern spinlock_t files_lock;
 #define file_list_lock() spin_lock(&files_lock);
 #define file_list_unlock() spin_unlock(&files_lock);
@@ -1580,6 +1594,7 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
+	struct uuid *(*get_fsid)(struct super_block *);
 };
 
 /*
@@ -2328,6 +2343,8 @@ extern struct super_block *get_super(struct block_device *);
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern struct super_block *user_get_super(dev_t);
 extern void drop_super(struct super_block *sb);
+extern struct super_block *fs_get_sb(struct uuid *fsid);
+
 
 extern int dcache_dir_open(struct inode *, struct file *);
 extern int dcache_dir_close(struct inode *, struct file *);
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 3/8] vfs: Add open by file handle support
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
  2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-27  1:05   ` Neil Brown
  2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/filesystems.c              |   32 +++++++++-
 fs/namei.c                    |   24 -------
 fs/namespace.c                |   38 +++++++++++
 fs/open.c                     |  148 +++++++++++++++++++++++++++++++++++++++++
 fs/pnode.c                    |    2 +-
 include/linux/mnt_namespace.h |    2 +
 include/linux/namei.h         |   24 +++++++
 7 files changed, 244 insertions(+), 26 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 68ba492..743d36e 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
 	}
 	return fs;
 }
-
 EXPORT_SYMBOL(get_fs_type);
+
+struct super_block *fs_get_sb(struct uuid *fsid)
+{
+	struct uuid *this_fsid;
+	struct file_system_type *fs_type;
+	struct super_block *sb, *found_sb = NULL;
+
+	read_lock(&file_systems_lock);
+	fs_type = file_systems;
+	while (fs_type) {
+		spin_lock(&sb_lock);
+		list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
+			if (!sb->s_op->get_fsid)
+				continue;
+			this_fsid = sb->s_op->get_fsid(sb);
+			if (!memcmp(fsid->uuid, this_fsid->uuid,
+					sizeof(this_fsid->uuid))) {
+				/* found the matching super_block */
+				atomic_inc(&sb->s_active);
+				found_sb = sb;
+				spin_unlock(&sb_lock);
+				goto out;
+			}
+		}
+		spin_unlock(&sb_lock);
+		fs_type = fs_type->next;
+	}
+out:
+	read_unlock(&file_systems_lock);
+	return found_sb;
+}
diff --git a/fs/namei.c b/fs/namei.c
index a7dce91..a18711e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1521,30 +1521,6 @@ out_unlock:
 	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
 }
 
-/*
- * Note that while the flag value (low two bits) for sys_open means:
- *	00 - read-only
- *	01 - write-only
- *	10 - read-write
- *	11 - special
- * it is changed into
- *	00 - no permissions needed
- *	01 - read-permission
- *	10 - write-permission
- *	11 - read-write
- * for the internal routines (ie open_namei()/follow_link() etc)
- * This is more logical, and also allows the 00 "no perm needed"
- * to be used for symlinks (where the permissions are checked
- * later).
- *
-*/
-static inline int open_to_namei_flags(int flag)
-{
-	if ((flag+1) & O_ACCMODE)
-		flag++;
-	return flag;
-}
-
 static int open_will_truncate(int flag, struct inode *inode)
 {
 	/*
diff --git a/fs/namespace.c b/fs/namespace.c
index 8174c8a..6168526 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
 	kfree(ns);
 }
 EXPORT_SYMBOL(put_mnt_ns);
+
+/*
+ * Get any vfsmount mapping the superblock in the
+ * task namespace
+ */
+struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+				struct super_block *sb)
+{
+	struct nsproxy *nsp;
+	struct list_head *mount_list;
+	struct mnt_namespace *ns = NULL;
+	struct vfsmount *mnt, *sb_mnt = NULL;
+
+	rcu_read_lock();
+	nsp = task_nsproxy(task);
+	if (nsp) {
+		ns = nsp->mnt_ns;
+		if (ns)
+			get_mnt_ns(ns);
+	}
+	rcu_read_unlock();
+	if (!ns)
+		return NULL;
+	down_read(&namespace_sem);
+	list_for_each(mount_list, &ns->list) {
+		mnt = list_entry(mount_list, struct vfsmount, mnt_list);
+		if (mnt->mnt_sb == sb) {
+			/* found the matching super block */
+			sb_mnt = mnt;
+			mntget(sb_mnt);
+			break;
+		}
+	}
+	up_read(&namespace_sem);
+
+	put_mnt_ns(ns);
+	return sb_mnt;
+}
diff --git a/fs/open.c b/fs/open.c
index e9aae5c..b87fa32 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
 	asmlinkage_protect(2, ret, name, handle);
 	return ret;
 }
+
+static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
+{
+	return 1;
+}
+
+static struct dentry *handle_to_dentry(struct vfsmount *mnt,
+				struct file_handle *handle)
+{
+	int handle_size;
+	struct dentry *dentry;
+
+	/* change the handle size to multiple of sizeof(u32) */
+	handle_size = handle->handle_size >> 2;
+	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
+					handle_size, handle->handle_type,
+					vfs_dentry_acceptable, NULL);
+	return dentry;
+}
+
+long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
+{
+	int fd;
+	int retval = 0;
+	int d_flags  = flags;
+	struct file *filp;
+	struct vfsmount *mnt;
+	struct inode *inode;
+	struct dentry *dentry;
+	struct super_block *sb;
+	struct file_handle f_handle;
+	struct file_handle *handle = NULL;
+
+	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
+		return  -EFAULT;
+
+	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
+		(f_handle.handle_size <= 0))
+		return -EINVAL;
+
+	if (!capable(CAP_DAC_OVERRIDE))
+		return -EPERM;
+
+	sb = fs_get_sb(&f_handle.fsid);
+	if (!sb)
+		return -ESTALE;
+	/*
+	 * Find the vfsmount for this superblock in the
+	 * current namespace
+	 */
+	mnt = fs_get_vfsmount(current, sb);
+	if (!mnt) {
+		retval = -ESTALE;
+		goto out_sb;
+	}
+
+	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
+			GFP_KERNEL);
+	if (!handle) {
+		retval =  -ENOMEM;
+		goto out_mnt;
+	}
+	/* copy the full handle */
+	if (copy_from_user(handle, ufh,
+				sizeof(struct file_handle) +
+				f_handle.handle_size)) {
+		retval = -EFAULT;
+		goto out_mnt;
+	}
+
+	dentry = handle_to_dentry(mnt, handle);
+	if (IS_ERR(dentry)) {
+		retval = PTR_ERR(dentry);
+		goto out_mnt;
+	}
+
+	inode = dentry->d_inode;
+
+	flags  = open_to_namei_flags(flags);
+	/* O_TRUNC implies we need access checks for write permissions */
+	if (flags & O_TRUNC)
+		flags |= MAY_WRITE;
+
+	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
+		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
+		retval = -EPERM;
+		goto out_err;
+	}
+
+	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+		retval = -EACCES;
+		goto out_err;
+	}
+
+	/* Can't write directories. */
+	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
+		retval = -EISDIR;
+		goto out_err;
+	}
+
+	fd = get_unused_fd();
+	if (fd < 0) {
+		retval = fd;
+		goto out_err;
+	}
+
+	filp = dentry_open(dentry, mntget(mnt),
+			d_flags, current_cred());
+	if (IS_ERR(filp)) {
+		put_unused_fd(fd);
+		retval =  PTR_ERR(filp);
+		goto out_err;
+	}
+
+	if (inode->i_mode & S_IFREG) {
+		filp->f_flags |= O_NOATIME;
+		filp->f_mode |= FMODE_NOCMTIME;
+	}
+	fsnotify_open(filp->f_path.dentry);
+	fd_install(fd, filp);
+	retval = fd;
+	goto out_mnt;
+
+out_err:
+	dput(dentry);
+out_mnt:
+	kfree(handle);
+	mntput(mnt);
+out_sb:
+	deactivate_super(sb);
+
+	return retval;
+}
+
+SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
+		int, flags)
+{
+	long ret;
+
+	if (force_o_largefile())
+		flags |= O_LARGEFILE;
+
+	ret = do_sys_open_by_handle(handle, flags);
+
+	/* avoid REGPARM breakage on x86: */
+	asmlinkage_protect(2, ret, handle, flags);
+	return ret;
+}
diff --git a/fs/pnode.c b/fs/pnode.c
index 5cc564a..9f6d12d 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -6,9 +6,9 @@
  *	Author : Ram Pai (linuxram@us.ibm.com)
  *
  */
+#include <linux/fs.h>
 #include <linux/mnt_namespace.h>
 #include <linux/mount.h>
-#include <linux/fs.h>
 #include "internal.h"
 #include "pnode.h"
 
diff --git a/include/linux/mnt_namespace.h b/include/linux/mnt_namespace.h
index 0b89efc..d363ecc 100644
--- a/include/linux/mnt_namespace.h
+++ b/include/linux/mnt_namespace.h
@@ -36,6 +36,8 @@ extern const struct seq_operations mounts_op;
 extern const struct seq_operations mountinfo_op;
 extern const struct seq_operations mountstats_op;
 extern int mnt_had_events(struct proc_mounts *);
+extern struct vfsmount *fs_get_vfsmount(struct task_struct *task,
+					struct super_block *sb);
 
 #endif
 #endif
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 05b441d..a853aa0 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -4,6 +4,7 @@
 #include <linux/dcache.h>
 #include <linux/linkage.h>
 #include <linux/path.h>
+#include <asm-generic/fcntl.h>
 
 struct vfsmount;
 
@@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 	((char *) name)[min(len, maxlen)] = '\0';
 }
 
+/*
+ * Note that while the flag value (low two bits) for sys_open means:
+ *	00 - read-only
+ *	01 - write-only
+ *	10 - read-write
+ *	11 - special
+ * it is changed into
+ *	00 - no permissions needed
+ *	01 - read-permission
+ *	10 - write-permission
+ *	11 - read-write
+ * for the internal routines (ie open_namei()/follow_link() etc)
+ * This is more logical, and also allows the 00 "no perm needed"
+ * to be used for symlinks (where the permissions are checked
+ * later).
+ *
+*/
+static inline int open_to_namei_flags(int flag)
+{
+	if ((flag+1) & O_ACCMODE)
+		flag++;
+	return flag;
+}
 #endif /* _LINUX_NAMEI_H */
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 4/8] vfs: Add freadlink syscall
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

This enables to use open-by-handle and then get the link target
details of a symlink using the fd returned by handle

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/stat.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index c4ecd52..0fbab00 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -314,6 +314,33 @@ SYSCALL_DEFINE3(readlink, const char __user *, path, char __user *, buf,
 	return sys_readlinkat(AT_FDCWD, path, buf, bufsiz);
 }
 
+SYSCALL_DEFINE3(freadlink, int, fd, char __user *, buf,
+		int, bufsiz)
+{
+	int fput_needed;
+	struct file *file;
+	int error = -EBADF;
+	struct inode *inode;
+
+	file = fget_light(fd, &fput_needed);
+	if (!file)
+		return error;
+
+	inode = file->f_path.dentry->d_inode;
+	error = -EINVAL;
+	if (inode->i_op->readlink) {
+		error = security_inode_readlink(file->f_path.dentry);
+		if (!error) {
+			touch_atime(file->f_path.mnt, file->f_path.dentry);
+			error = inode->i_op->readlink(file->f_path.dentry,
+						buf, bufsiz);
+		}
+	}
+	fput_light(file, fput_needed);
+	return error;
+}
+
+
 
 /* ---------- LFS-64 ----------- */
 #ifdef __ARCH_WANT_STAT64
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 5/8] ext4: Add get_fsid callback
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/super.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e14d22c..b485f0a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1049,6 +1049,14 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 	return try_to_free_buffers(page);
 }
 
+static struct uuid *ext4_get_fsid(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+
+	return (struct uuid *)es->s_uuid;
+}
+
 #ifdef CONFIG_QUOTA
 #define QTYPE2NAME(t) ((t) == USRQUOTA ? "user" : "group")
 #define QTYPE2MOPT(on, t) ((t) == USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -1109,6 +1117,7 @@ static const struct super_operations ext4_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext4_get_fsid,
 };
 
 static const struct super_operations ext4_nojournal_sops = {
@@ -1128,6 +1137,7 @@ static const struct super_operations ext4_nojournal_sops = {
 	.quota_write	= ext4_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext4_get_fsid,
 };
 
 static const struct export_operations ext4_export_ops = {
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 6/8] x86: Add new syscalls for x86_32
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

This patch adds sys_name_to_handle, sys_lname_to_handle, sys_open_by_handle
and sys_freadlink syscalls to x86_32

Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/include/asm/unistd_32.h   |    6 +++++-
 arch/x86/kernel/syscall_table_32.S |    4 ++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index beb9b5f..7ea7101 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -343,10 +343,14 @@
 #define __NR_rt_tgsigqueueinfo	335
 #define __NR_perf_event_open	336
 #define __NR_recvmmsg		337
+#define __NR_name_to_handle	338
+#define __NR_lname_to_handle    339
+#define __NR_open_by_handle     340
+#define __NR_freadlink          341
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 338
+#define NR_syscalls 342
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 8b37293..caa3f0c 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -337,3 +337,7 @@ ENTRY(sys_call_table)
 	.long sys_rt_tgsigqueueinfo	/* 335 */
 	.long sys_perf_event_open
 	.long sys_recvmmsg
+	.long sys_name_to_handle
+	.long sys_lname_to_handle
+	.long sys_open_by_handle
+	.long sys_freadlink		/* 341 */
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 7/8] x86: Add new syscalls for x86_64
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Add sys_name_to_handle, sys_lname_to_handle, sys_open_by_handle syscall
and sys_freadlink for x86_64

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/x86/ia32/ia32entry.S        |    4 ++++
 arch/x86/include/asm/unistd_64.h |    8 ++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 59b4556..927cc8b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -842,4 +842,8 @@ ia32_sys_call_table:
 	.quad compat_sys_rt_tgsigqueueinfo	/* 335 */
 	.quad sys_perf_event_open
 	.quad compat_sys_recvmmsg
+	.quad sys_name_to_handle
+	.quad sys_lname_to_handle
+	.quad sys_open_by_handle
+	.quad sys_freadlink			/* 341 */
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index ff4307b..a495455 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -663,6 +663,14 @@ __SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
 __SYSCALL(__NR_perf_event_open, sys_perf_event_open)
 #define __NR_recvmmsg				299
 __SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_name_to_handle			300
+__SYSCALL(__NR_name_to_handle, sys_name_to_handle)
+#define __NR_lname_to_handle			301
+__SYSCALL(__NR_lname_to_handle, sys_lname_to_handle)
+#define __NR_open_by_handle			302
+__SYSCALL(__NR_open_by_handle, sys_open_by_handle)
+#define __NR_freadlink				303
+__SYSCALL(__NR_freadlink, sys_freadlink)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH -V5 8/8] ext3: Add get_fsid callback
  2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
@ 2010-04-26 17:34 ` Aneesh Kumar K.V
  7 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2010-04-26 17:34 UTC (permalink / raw)
  To: hch, viro, adilger, corbet, serue, neilb
  Cc: linux-fsdevel, sfrench, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext3/super.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 1bee604..dfe50a4 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -734,6 +734,14 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 	return try_to_free_buffers(page);
 }
 
+static struct uuid *ext3_get_fsid(struct super_block *sb)
+{
+	struct ext3_sb_info *sbi = EXT3_SB(sb);
+	struct ext3_super_block *es = sbi->s_es;
+
+	return (struct uuid *)es->s_uuid;
+}
+
 #ifdef CONFIG_QUOTA
 #define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group")
 #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
@@ -791,6 +799,7 @@ static const struct super_operations ext3_sops = {
 	.quota_write	= ext3_quota_write,
 #endif
 	.bdev_try_to_free_page = bdev_try_to_free_page,
+	.get_fsid	= ext3_get_fsid,
 };
 
 static const struct export_operations ext3_export_ops = {
-- 
1.7.0.4.360.g11766c


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 3/8] vfs: Add open by file handle support
  2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
@ 2010-04-27  1:05   ` Neil Brown
  2010-04-27  6:10     ` Aneesh Kumar K. V
  0 siblings, 1 reply; 13+ messages in thread
From: Neil Brown @ 2010-04-27  1:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Mon, 26 Apr 2010 23:04:01 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/filesystems.c              |   32 +++++++++-
>  fs/namei.c                    |   24 -------
>  fs/namespace.c                |   38 +++++++++++
>  fs/open.c                     |  148 +++++++++++++++++++++++++++++++++++++++++
>  fs/pnode.c                    |    2 +-
>  include/linux/mnt_namespace.h |    2 +
>  include/linux/namei.h         |   24 +++++++
>  7 files changed, 244 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/filesystems.c b/fs/filesystems.c
> index 68ba492..743d36e 100644
> --- a/fs/filesystems.c
> +++ b/fs/filesystems.c
> @@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
>  	}
>  	return fs;
>  }
> -
>  EXPORT_SYMBOL(get_fs_type);
> +
> +struct super_block *fs_get_sb(struct uuid *fsid)
> +{
> +	struct uuid *this_fsid;
> +	struct file_system_type *fs_type;
> +	struct super_block *sb, *found_sb = NULL;
> +
> +	read_lock(&file_systems_lock);
> +	fs_type = file_systems;
> +	while (fs_type) {

You've open-coded a for-loop here...
Why not:
        for (fs_type = file_systems ; fs_type ; fs_type = fs_type->next) {
??


> +		spin_lock(&sb_lock);
> +		list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
> +			if (!sb->s_op->get_fsid)
> +				continue;
> +			this_fsid = sb->s_op->get_fsid(sb);
> +			if (!memcmp(fsid->uuid, this_fsid->uuid,
> +					sizeof(this_fsid->uuid))) {
> +				/* found the matching super_block */
> +				atomic_inc(&sb->s_active);
> +				found_sb = sb;
> +				spin_unlock(&sb_lock);
> +				goto out;
> +			}
> +		}
> +		spin_unlock(&sb_lock);
> +		fs_type = fs_type->next;
> +	}
> +out:
> +	read_unlock(&file_systems_lock);
> +	return found_sb;
> +}
> diff --git a/fs/namei.c b/fs/namei.c
> index a7dce91..a18711e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1521,30 +1521,6 @@ out_unlock:
>  	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
>  }
>  
> -/*
> - * Note that while the flag value (low two bits) for sys_open means:
> - *	00 - read-only
> - *	01 - write-only
> - *	10 - read-write
> - *	11 - special
> - * it is changed into
> - *	00 - no permissions needed
> - *	01 - read-permission
> - *	10 - write-permission
> - *	11 - read-write
> - * for the internal routines (ie open_namei()/follow_link() etc)
> - * This is more logical, and also allows the 00 "no perm needed"
> - * to be used for symlinks (where the permissions are checked
> - * later).
> - *
> -*/
> -static inline int open_to_namei_flags(int flag)
> -{
> -	if ((flag+1) & O_ACCMODE)
> -		flag++;
> -	return flag;
> -}
> -
>  static int open_will_truncate(int flag, struct inode *inode)
>  {
>  	/*
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 8174c8a..6168526 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
>  	kfree(ns);
>  }
>  EXPORT_SYMBOL(put_mnt_ns);
> +
> +/*
> + * Get any vfsmount mapping the superblock in the
> + * task namespace
> + */
> +struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> +				struct super_block *sb)
> +{
> +	struct nsproxy *nsp;
> +	struct list_head *mount_list;
> +	struct mnt_namespace *ns = NULL;
> +	struct vfsmount *mnt, *sb_mnt = NULL;
> +
> +	rcu_read_lock();
> +	nsp = task_nsproxy(task);
> +	if (nsp) {
> +		ns = nsp->mnt_ns;
> +		if (ns)
> +			get_mnt_ns(ns);
> +	}
> +	rcu_read_unlock();
> +	if (!ns)
> +		return NULL;
> +	down_read(&namespace_sem);
> +	list_for_each(mount_list, &ns->list) {
> +		mnt = list_entry(mount_list, struct vfsmount, mnt_list);
> +		if (mnt->mnt_sb == sb) {
> +			/* found the matching super block */
> +			sb_mnt = mnt;
> +			mntget(sb_mnt);
> +			break;
> +		}
> +	}
> +	up_read(&namespace_sem);
> +
> +	put_mnt_ns(ns);
> +	return sb_mnt;
> +}

If a task has the same filesystem mounted several times in it's namespace
(via "mount --bind") you just return any arbitrary one (the first).

Suppose that the filesystem does appear twice, and that in one appearance a
particular directory is mounted on, while in another appearance that
directory is not mounted on.  Suppose that the mounted-on appearance is
listed first in ns->list

Now we use a series of "name_to_handle" / "open_by_handle" calls to descend
through the second appearance to something beneath that directory which (in
the first appearance) is mounted-on.
This will not work as you will actually be descending the first appearance of
the filesystem.
I really think you should leave the choice of filesystem/mountpoint to
user-space.


> diff --git a/fs/open.c b/fs/open.c
> index e9aae5c..b87fa32 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
>  	asmlinkage_protect(2, ret, name, handle);
>  	return ret;
>  }
> +
> +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> +{
> +	return 1;
> +}
> +
> +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> +				struct file_handle *handle)
> +{
> +	int handle_size;
> +	struct dentry *dentry;
> +
> +	/* change the handle size to multiple of sizeof(u32) */
> +	handle_size = handle->handle_size >> 2;
> +	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> +					handle_size, handle->handle_type,
> +					vfs_dentry_acceptable, NULL);
> +	return dentry;
> +}
> +
> +long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> +{
> +	int fd;
> +	int retval = 0;
> +	int d_flags  = flags;
> +	struct file *filp;
> +	struct vfsmount *mnt;
> +	struct inode *inode;
> +	struct dentry *dentry;
> +	struct super_block *sb;
> +	struct file_handle f_handle;
> +	struct file_handle *handle = NULL;
> +
> +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> +		return  -EFAULT;
> +
> +	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> +		(f_handle.handle_size <= 0))
> +		return -EINVAL;
> +
> +	if (!capable(CAP_DAC_OVERRIDE))
> +		return -EPERM;
> +
> +	sb = fs_get_sb(&f_handle.fsid);
> +	if (!sb)
> +		return -ESTALE;
> +	/*
> +	 * Find the vfsmount for this superblock in the
> +	 * current namespace
> +	 */
> +	mnt = fs_get_vfsmount(current, sb);
> +	if (!mnt) {
> +		retval = -ESTALE;
> +		goto out_sb;
> +	}
> +
> +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> +			GFP_KERNEL);
> +	if (!handle) {
> +		retval =  -ENOMEM;
> +		goto out_mnt;
> +	}
> +	/* copy the full handle */
> +	if (copy_from_user(handle, ufh,
> +				sizeof(struct file_handle) +
> +				f_handle.handle_size)) {
> +		retval = -EFAULT;
> +		goto out_mnt;
> +	}
> +
> +	dentry = handle_to_dentry(mnt, handle);
> +	if (IS_ERR(dentry)) {
> +		retval = PTR_ERR(dentry);
> +		goto out_mnt;
> +	}
> +
> +	inode = dentry->d_inode;
> +
> +	flags  = open_to_namei_flags(flags);
> +	/* O_TRUNC implies we need access checks for write permissions */
> +	if (flags & O_TRUNC)
> +		flags |= MAY_WRITE;
> +
> +	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> +		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
> +		retval = -EPERM;
> +		goto out_err;
> +	}
> +
> +	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> +		retval = -EACCES;
> +		goto out_err;
> +	}
> +
> +	/* Can't write directories. */
> +	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> +		retval = -EISDIR;
> +		goto out_err;
> +	}
> +
> +	fd = get_unused_fd();

You want to use alloc_fd() (or get_unused_fd_flags) rather than get_unused_fd
so that the flags can be passed though and, in particular, so O_CLOEXEC will
be honoured.


> +	if (fd < 0) {
> +		retval = fd;
> +		goto out_err;
> +	}
> +
> +	filp = dentry_open(dentry, mntget(mnt),
> +			d_flags, current_cred());

dentry_open consumed the reference to dentry, so ...

> +	if (IS_ERR(filp)) {
> +		put_unused_fd(fd);
> +		retval =  PTR_ERR(filp);
> +		goto out_err;

The dput at 'out_err' will put it one time too many.
I think the best fix would be to pass dget(dentry) to dentry_open, then ....

> +	}
> +
> +	if (inode->i_mode & S_IFREG) {
> +		filp->f_flags |= O_NOATIME;
> +		filp->f_mode |= FMODE_NOCMTIME;
> +	}
> +	fsnotify_open(filp->f_path.dentry);
> +	fd_install(fd, filp);
> +	retval = fd;
> +	goto out_mnt;

... this goto (Which is rather ugly to me) would not be needed.

NeilBrown


> +
> +out_err:
> +	dput(dentry);
> +out_mnt:
> +	kfree(handle);
> +	mntput(mnt);
> +out_sb:
> +	deactivate_super(sb);
> +
> +	return retval;
> +}
> +
> +SYSCALL_DEFINE2(open_by_handle, struct file_handle __user *, handle,
> +		int, flags)
> +{
> +	long ret;
> +
> +	if (force_o_largefile())
> +		flags |= O_LARGEFILE;
> +
> +	ret = do_sys_open_by_handle(handle, flags);
> +
> +	/* avoid REGPARM breakage on x86: */
> +	asmlinkage_protect(2, ret, handle, flags);
> +	return ret;
> +}
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 5cc564a..9f6d12d 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -6,9 +6,9 @@
>   *	Author : Ram Pai (linuxram@us.ibm.com)
>   *
>   */
> +#include <linux/fs.h>
>  #include <linux/mnt_namespace.h>
>  #include <linux/mount.h>
> -#include <linux/fs.h>
>  #include "internal.h"
>  #include "pnode.h"
>  
> diff --git a/include/linux/mnt_namespace.h b/include/linux/mnt_namespace.h
> index 0b89efc..d363ecc 100644
> --- a/include/linux/mnt_namespace.h
> +++ b/include/linux/mnt_namespace.h
> @@ -36,6 +36,8 @@ extern const struct seq_operations mounts_op;
>  extern const struct seq_operations mountinfo_op;
>  extern const struct seq_operations mountstats_op;
>  extern int mnt_had_events(struct proc_mounts *);
> +extern struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> +					struct super_block *sb);
>  
>  #endif
>  #endif
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 05b441d..a853aa0 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -4,6 +4,7 @@
>  #include <linux/dcache.h>
>  #include <linux/linkage.h>
>  #include <linux/path.h>
> +#include <asm-generic/fcntl.h>
>  
>  struct vfsmount;
>  
> @@ -96,4 +97,27 @@ static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>  	((char *) name)[min(len, maxlen)] = '\0';
>  }
>  
> +/*
> + * Note that while the flag value (low two bits) for sys_open means:
> + *	00 - read-only
> + *	01 - write-only
> + *	10 - read-write
> + *	11 - special
> + * it is changed into
> + *	00 - no permissions needed
> + *	01 - read-permission
> + *	10 - write-permission
> + *	11 - read-write
> + * for the internal routines (ie open_namei()/follow_link() etc)
> + * This is more logical, and also allows the 00 "no perm needed"
> + * to be used for symlinks (where the permissions are checked
> + * later).
> + *
> +*/
> +static inline int open_to_namei_flags(int flag)
> +{
> +	if ((flag+1) & O_ACCMODE)
> +		flag++;
> +	return flag;
> +}
>  #endif /* _LINUX_NAMEI_H */


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 2/8] vfs: Add name to file handle conversion support
  2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
@ 2010-04-27  1:19   ` Neil Brown
  2010-04-27  6:08     ` Aneesh Kumar K. V
  0 siblings, 1 reply; 13+ messages in thread
From: Neil Brown @ 2010-04-27  1:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Mon, 26 Apr 2010 23:04:00 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  fs/open.c          |  100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |   17 +++++++++
>  2 files changed, 117 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 74e5cd9..e9aae5c 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -30,6 +30,8 @@
>  #include <linux/falloc.h>
>  #include <linux/fs_struct.h>
>  #include <linux/ima.h>
> +#include <linux/exportfs.h>
> +#include <linux/mnt_namespace.h>
>  
>  #include "internal.h"
>  
> @@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
>  }
>  
>  EXPORT_SYMBOL(nonseekable_open);
> +
> +/* limit the handle size to some value */
> +#define MAX_HANDLE_SZ 4096
> +static long do_sys_name_to_handle(struct path *path,
> +			struct file_handle __user *ufh)
> +{
> +	int retval;
> +	int handle_size;
> +	struct super_block *sb;
> +	struct uuid *this_fs_id;
> +	struct file_handle f_handle;
> +	struct file_handle *handle = NULL;
> +
> +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> +		return -EFAULT;
> +
> +	if (f_handle.handle_size > MAX_HANDLE_SZ)
> +		return -EINVAL;
> +
> +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> +			GFP_KERNEL);
> +	if (!handle) {
> +		retval = -ENOMEM;
> +		goto err_out;
> +	}

You are mixing your patterns here.
In some cases you do:
   if (something fails)
           return -ERROR
other times you do

   if (something fails) {
            retval = -ERROR;
            goto err_out;
   }

which has exactly the same fix.  Being consistent is best and I would
recommend using the goto option - it is more maintainable.
I would actually prefer:

    retval = -ERROR;
    if (something fails)
          goto err_out;

but leave the choice to you.  Consistency is the important thing.



> +	handle_size = f_handle.handle_size;
> +
> +	/* we ask for a non connected handle */
> +	retval = exportfs_encode_fh(path->dentry,
> +				(struct fid *)handle->f_handle,
> +				&handle_size,  0);
> +	/* convert handle size to bytes */
> +	handle_size *= sizeof(u32);
> +	handle->handle_type = retval;
> +	handle->handle_size = handle_size;
> +	if (handle_size <= f_handle.handle_size) {
> +		/* get the uuid */
> +		sb = path->mnt->mnt_sb;
> +		if (sb->s_op->get_fsid) {
> +			this_fs_id = sb->s_op->get_fsid(sb);
> +			memcpy(handle->fsid.uuid, this_fs_id->uuid,
> +				sizeof(handle->fsid.uuid));
> +		} else
> +			memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
> +		retval = 0;
> +	} else {
> +		/*
> +		 * set the handle_size to zero so we copy only
> +		 * non variable part of the file_handle
> +		 */
> +		handle_size = 0;
> +		retval = -EOVERFLOW;
> +	}
> +	if (copy_to_user(ufh, handle,
> +				sizeof(struct file_handle) + handle_size))
> +		retval = -EFAULT;
> +
> +	kfree(handle);
> +err_out:
> +	return retval;
> +}
> +
> +SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
> +		struct file_handle __user *, handle)
> +{
> +	long ret;
> +	struct path path;

*Surely* you want name_to_handle_at, or name_at_to_handle here.
i.e. include an int 'dfd' to specify the starting directory.

Then you can make use of AT_SYMLINK_NOFOLLOW and so avoid the need for
2 syscalls.


> +
> +	/* Follow links */
> +	ret = user_path(name, &path);
> +	if (ret)
> +		return ret;
> +
> +	ret = do_sys_name_to_handle(&path, handle);
> +	path_put(&path);
> +
> +	/* avoid REGPARM breakage on x86: */
> +	asmlinkage_protect(2, ret, name, handle);
> +	return ret;
> +}
> +
> +SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> +		struct file_handle __user *, handle)
> +{
> +	long ret;
> +	struct path path;
> +
> +	ret = user_lpath(name, &path);
> +	if (ret)
> +		return ret;
> +
> +	ret = do_sys_name_to_handle(&path, handle);
> +	path_put(&path);
> +
> +	/* avoid REGPARM breakage on x86: */
> +	asmlinkage_protect(2, ret, name, handle);
> +	return ret;
> +}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 39d57bc..da28b80 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -948,6 +948,20 @@ struct file {
>  	unsigned long f_mnt_write_state;
>  #endif
>  };
> +
> +struct uuid {
> +	char uuid[16];

unsigned char??? just a thought.


> +};
> +
> +struct file_handle {
> +	int handle_size;
> +	int handle_type;
> +	/* File system identifier */
> +	struct uuid fsid;
> +	/* file identifier */
> +	char f_handle[0];
> +};
> +
>  extern spinlock_t files_lock;
>  #define file_list_lock() spin_lock(&files_lock);
>  #define file_list_unlock() spin_unlock(&files_lock);
> @@ -1580,6 +1594,7 @@ struct super_operations {
>  	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
>  #endif
>  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> +	struct uuid *(*get_fsid)(struct super_block *);

Assuming this was actually a good idea (which I am not yet sold on),
I think you would do better to pass in the address of a uuid to be filled in:

       int (*get_fsid)(struct super_block *, struct uuid *)

just incase the filesystem stored something in a slightly different format.

And as XFS has a 'nouuid' mount option, it might be best to allow an error
return to say "there is no uuid" - maybe expect -ENOENT ??

NeilBrown

>  };
>  
>  /*
> @@ -2328,6 +2343,8 @@ extern struct super_block *get_super(struct block_device *);
>  extern struct super_block *get_active_super(struct block_device *bdev);
>  extern struct super_block *user_get_super(dev_t);
>  extern void drop_super(struct super_block *sb);
> +extern struct super_block *fs_get_sb(struct uuid *fsid);
> +
>  
>  extern int dcache_dir_open(struct inode *, struct file *);
>  extern int dcache_dir_close(struct inode *, struct file *);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 2/8] vfs: Add name to file handle conversion support
  2010-04-27  1:19   ` Neil Brown
@ 2010-04-27  6:08     ` Aneesh Kumar K. V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K. V @ 2010-04-27  6:08 UTC (permalink / raw)
  To: Neil Brown; +Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Tue, 27 Apr 2010 11:19:49 +1000, Neil Brown <neilb@suse.de> wrote:
> On Mon, 26 Apr 2010 23:04:00 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  fs/open.c          |  100 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h |   17 +++++++++
> >  2 files changed, 117 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/open.c b/fs/open.c
> > index 74e5cd9..e9aae5c 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -30,6 +30,8 @@
> >  #include <linux/falloc.h>
> >  #include <linux/fs_struct.h>
> >  #include <linux/ima.h>
> > +#include <linux/exportfs.h>
> > +#include <linux/mnt_namespace.h>
> >  
> >  #include "internal.h"
> >  
> > @@ -1206,3 +1208,101 @@ int nonseekable_open(struct inode *inode, struct file *filp)
> >  }
> >  
> >  EXPORT_SYMBOL(nonseekable_open);
> > +
> > +/* limit the handle size to some value */
> > +#define MAX_HANDLE_SZ 4096
> > +static long do_sys_name_to_handle(struct path *path,
> > +			struct file_handle __user *ufh)
> > +{
> > +	int retval;
> > +	int handle_size;
> > +	struct super_block *sb;
> > +	struct uuid *this_fs_id;
> > +	struct file_handle f_handle;
> > +	struct file_handle *handle = NULL;
> > +
> > +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> > +		return -EFAULT;
> > +
> > +	if (f_handle.handle_size > MAX_HANDLE_SZ)
> > +		return -EINVAL;
> > +
> > +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> > +			GFP_KERNEL);
> > +	if (!handle) {
> > +		retval = -ENOMEM;
> > +		goto err_out;
> > +	}
> 
> You are mixing your patterns here.
> In some cases you do:
>    if (something fails)
>            return -ERROR
> other times you do
> 
>    if (something fails) {
>             retval = -ERROR;
>             goto err_out;
>    }
> 
> which has exactly the same fix.  Being consistent is best and I would
> recommend using the goto option - it is more maintainable.
> I would actually prefer:
> 
>     retval = -ERROR;
>     if (something fails)
>           goto err_out;
> 
> but leave the choice to you.  Consistency is the important thing.


Fixed as per you suggestion.

> 
> 
> 
> > +	handle_size = f_handle.handle_size;
> > +
> > +	/* we ask for a non connected handle */
> > +	retval = exportfs_encode_fh(path->dentry,
> > +				(struct fid *)handle->f_handle,
> > +				&handle_size,  0);
> > +	/* convert handle size to bytes */
> > +	handle_size *= sizeof(u32);
> > +	handle->handle_type = retval;
> > +	handle->handle_size = handle_size;
> > +	if (handle_size <= f_handle.handle_size) {
> > +		/* get the uuid */
> > +		sb = path->mnt->mnt_sb;
> > +		if (sb->s_op->get_fsid) {
> > +			this_fs_id = sb->s_op->get_fsid(sb);
> > +			memcpy(handle->fsid.uuid, this_fs_id->uuid,
> > +				sizeof(handle->fsid.uuid));
> > +		} else
> > +			memset(handle->fsid.uuid, 0, sizeof(handle->fsid.uuid));
> > +		retval = 0;
> > +	} else {
> > +		/*
> > +		 * set the handle_size to zero so we copy only
> > +		 * non variable part of the file_handle
> > +		 */
> > +		handle_size = 0;
> > +		retval = -EOVERFLOW;
> > +	}
> > +	if (copy_to_user(ufh, handle,
> > +				sizeof(struct file_handle) + handle_size))
> > +		retval = -EFAULT;
> > +
> > +	kfree(handle);
> > +err_out:
> > +	return retval;
> > +}
> > +
> > +SYSCALL_DEFINE2(name_to_handle, const char __user *, name,
> > +		struct file_handle __user *, handle)
> > +{
> > +	long ret;
> > +	struct path path;
> 
> *Surely* you want name_to_handle_at, or name_at_to_handle here.
> i.e. include an int 'dfd' to specify the starting directory.
> 
> Then you can make use of AT_SYMLINK_NOFOLLOW and so avoid the need for
> 2 syscalls.
> 


Done. Droped the lname_to_handle syscall


> 
> > +
> > +	/* Follow links */
> > +	ret = user_path(name, &path);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = do_sys_name_to_handle(&path, handle);
> > +	path_put(&path);
> > +
> > +	/* avoid REGPARM breakage on x86: */
> > +	asmlinkage_protect(2, ret, name, handle);
> > +	return ret;
> > +}
> > +
> > +SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> > +		struct file_handle __user *, handle)
> > +{
> > +	long ret;
> > +	struct path path;
> > +
> > +	ret = user_lpath(name, &path);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = do_sys_name_to_handle(&path, handle);
> > +	path_put(&path);
> > +
> > +	/* avoid REGPARM breakage on x86: */
> > +	asmlinkage_protect(2, ret, name, handle);
> > +	return ret;
> > +}
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 39d57bc..da28b80 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -948,6 +948,20 @@ struct file {
> >  	unsigned long f_mnt_write_state;
> >  #endif
> >  };
> > +
> > +struct uuid {
> > +	char uuid[16];
> 
> unsigned char??? just a thought.
> 

Fixed

> 
> > +};
> > +
> > +struct file_handle {
> > +	int handle_size;
> > +	int handle_type;
> > +	/* File system identifier */
> > +	struct uuid fsid;
> > +	/* file identifier */
> > +	char f_handle[0];


I also changed this to unsigned char f_handle[0];


> > +};
> > +
> >  extern spinlock_t files_lock;
> >  #define file_list_lock() spin_lock(&files_lock);
> >  #define file_list_unlock() spin_unlock(&files_lock);
> > @@ -1580,6 +1594,7 @@ struct super_operations {
> >  	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
> >  #endif
> >  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> > +	struct uuid *(*get_fsid)(struct super_block *);
> 
> Assuming this was actually a good idea (which I am not yet sold on),
> I think you would do better to pass in the address of a uuid to be filled in:
> 
>        int (*get_fsid)(struct super_block *, struct uuid *)
> 
> just incase the filesystem stored something in a slightly different format.
> 
> And as XFS has a 'nouuid' mount option, it might be best to allow an error
> return to say "there is no uuid" - maybe expect -ENOENT ??

Fixed

Thanks for the review
-aneesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH -V5 3/8] vfs: Add open by file handle support
  2010-04-27  1:05   ` Neil Brown
@ 2010-04-27  6:10     ` Aneesh Kumar K. V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K. V @ 2010-04-27  6:10 UTC (permalink / raw)
  To: Neil Brown; +Cc: hch, viro, adilger, corbet, serue, linux-fsdevel, sfrench

On Tue, 27 Apr 2010 11:05:45 +1000, Neil Brown <neilb@suse.de> wrote:
> On Mon, 26 Apr 2010 23:04:01 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> 
> > Acked-by: Serge Hallyn <serue@us.ibm.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  fs/filesystems.c              |   32 +++++++++-
> >  fs/namei.c                    |   24 -------
> >  fs/namespace.c                |   38 +++++++++++
> >  fs/open.c                     |  148 +++++++++++++++++++++++++++++++++++++++++
> >  fs/pnode.c                    |    2 +-
> >  include/linux/mnt_namespace.h |    2 +
> >  include/linux/namei.h         |   24 +++++++
> >  7 files changed, 244 insertions(+), 26 deletions(-)
> > 
> > diff --git a/fs/filesystems.c b/fs/filesystems.c
> > index 68ba492..743d36e 100644
> > --- a/fs/filesystems.c
> > +++ b/fs/filesystems.c
> > @@ -281,5 +281,35 @@ struct file_system_type *get_fs_type(const char *name)
> >  	}
> >  	return fs;
> >  }
> > -
> >  EXPORT_SYMBOL(get_fs_type);
> > +
> > +struct super_block *fs_get_sb(struct uuid *fsid)
> > +{
> > +	struct uuid *this_fsid;
> > +	struct file_system_type *fs_type;
> > +	struct super_block *sb, *found_sb = NULL;
> > +
> > +	read_lock(&file_systems_lock);
> > +	fs_type = file_systems;
> > +	while (fs_type) {
> 
> You've open-coded a for-loop here...
> Why not:
>         for (fs_type = file_systems ; fs_type ; fs_type = fs_type->next) {
> ??

Fixed

> 
> 
> > +		spin_lock(&sb_lock);
> > +		list_for_each_entry(sb, &fs_type->fs_supers, s_instances) {
> > +			if (!sb->s_op->get_fsid)
> > +				continue;
> > +			this_fsid = sb->s_op->get_fsid(sb);
> > +			if (!memcmp(fsid->uuid, this_fsid->uuid,
> > +					sizeof(this_fsid->uuid))) {
> > +				/* found the matching super_block */
> > +				atomic_inc(&sb->s_active);
> > +				found_sb = sb;
> > +				spin_unlock(&sb_lock);
> > +				goto out;
> > +			}
> > +		}
> > +		spin_unlock(&sb_lock);
> > +		fs_type = fs_type->next;
> > +	}
> > +out:
> > +	read_unlock(&file_systems_lock);
> > +	return found_sb;
> > +}
> > diff --git a/fs/namei.c b/fs/namei.c
> > index a7dce91..a18711e 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1521,30 +1521,6 @@ out_unlock:
> >  	return may_open(&nd->path, 0, open_flag & ~O_TRUNC);
> >  }
> >  
> > -/*
> > - * Note that while the flag value (low two bits) for sys_open means:
> > - *	00 - read-only
> > - *	01 - write-only
> > - *	10 - read-write
> > - *	11 - special
> > - * it is changed into
> > - *	00 - no permissions needed
> > - *	01 - read-permission
> > - *	10 - write-permission
> > - *	11 - read-write
> > - * for the internal routines (ie open_namei()/follow_link() etc)
> > - * This is more logical, and also allows the 00 "no perm needed"
> > - * to be used for symlinks (where the permissions are checked
> > - * later).
> > - *
> > -*/
> > -static inline int open_to_namei_flags(int flag)
> > -{
> > -	if ((flag+1) & O_ACCMODE)
> > -		flag++;
> > -	return flag;
> > -}
> > -
> >  static int open_will_truncate(int flag, struct inode *inode)
> >  {
> >  	/*
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 8174c8a..6168526 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -2364,3 +2364,41 @@ void put_mnt_ns(struct mnt_namespace *ns)
> >  	kfree(ns);
> >  }
> >  EXPORT_SYMBOL(put_mnt_ns);
> > +
> > +/*
> > + * Get any vfsmount mapping the superblock in the
> > + * task namespace
> > + */
> > +struct vfsmount *fs_get_vfsmount(struct task_struct *task,
> > +				struct super_block *sb)
> > +{
> > +	struct nsproxy *nsp;
> > +	struct list_head *mount_list;
> > +	struct mnt_namespace *ns = NULL;
> > +	struct vfsmount *mnt, *sb_mnt = NULL;
> > +
> > +	rcu_read_lock();
> > +	nsp = task_nsproxy(task);
> > +	if (nsp) {
> > +		ns = nsp->mnt_ns;
> > +		if (ns)
> > +			get_mnt_ns(ns);
> > +	}
> > +	rcu_read_unlock();
> > +	if (!ns)
> > +		return NULL;
> > +	down_read(&namespace_sem);
> > +	list_for_each(mount_list, &ns->list) {
> > +		mnt = list_entry(mount_list, struct vfsmount, mnt_list);
> > +		if (mnt->mnt_sb == sb) {
> > +			/* found the matching super block */
> > +			sb_mnt = mnt;
> > +			mntget(sb_mnt);
> > +			break;
> > +		}
> > +	}
> > +	up_read(&namespace_sem);
> > +
> > +	put_mnt_ns(ns);
> > +	return sb_mnt;
> > +}
> 
> If a task has the same filesystem mounted several times in it's namespace
> (via "mount --bind") you just return any arbitrary one (the first).
> 
> Suppose that the filesystem does appear twice, and that in one appearance a
> particular directory is mounted on, while in another appearance that
> directory is not mounted on.  Suppose that the mounted-on appearance is
> listed first in ns->list
> 
> Now we use a series of "name_to_handle" / "open_by_handle" calls to descend
> through the second appearance to something beneath that directory which (in
> the first appearance) is mounted-on.
> This will not work as you will actually be descending the first appearance of
> the filesystem.
> I really think you should leave the choice of filesystem/mountpoint to
> user-space.
> 

Will comment on this on a separate thread.

> 
> > diff --git a/fs/open.c b/fs/open.c
> > index e9aae5c..b87fa32 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -1306,3 +1306,151 @@ SYSCALL_DEFINE2(lname_to_handle, const char __user *, name,
> >  	asmlinkage_protect(2, ret, name, handle);
> >  	return ret;
> >  }
> > +
> > +static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
> > +{
> > +	return 1;
> > +}
> > +
> > +static struct dentry *handle_to_dentry(struct vfsmount *mnt,
> > +				struct file_handle *handle)
> > +{
> > +	int handle_size;
> > +	struct dentry *dentry;
> > +
> > +	/* change the handle size to multiple of sizeof(u32) */
> > +	handle_size = handle->handle_size >> 2;
> > +	dentry = exportfs_decode_fh(mnt, (struct fid *)handle->f_handle,
> > +					handle_size, handle->handle_type,
> > +					vfs_dentry_acceptable, NULL);
> > +	return dentry;
> > +}
> > +
> > +long do_sys_open_by_handle(struct file_handle __user *ufh, int flags)
> > +{
> > +	int fd;
> > +	int retval = 0;
> > +	int d_flags  = flags;
> > +	struct file *filp;
> > +	struct vfsmount *mnt;
> > +	struct inode *inode;
> > +	struct dentry *dentry;
> > +	struct super_block *sb;
> > +	struct file_handle f_handle;
> > +	struct file_handle *handle = NULL;
> > +
> > +	if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
> > +		return  -EFAULT;
> > +
> > +	if ((f_handle.handle_size > MAX_HANDLE_SZ) ||
> > +		(f_handle.handle_size <= 0))
> > +		return -EINVAL;
> > +
> > +	if (!capable(CAP_DAC_OVERRIDE))
> > +		return -EPERM;
> > +
> > +	sb = fs_get_sb(&f_handle.fsid);
> > +	if (!sb)
> > +		return -ESTALE;
> > +	/*
> > +	 * Find the vfsmount for this superblock in the
> > +	 * current namespace
> > +	 */
> > +	mnt = fs_get_vfsmount(current, sb);
> > +	if (!mnt) {
> > +		retval = -ESTALE;
> > +		goto out_sb;
> > +	}
> > +
> > +	handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_size,
> > +			GFP_KERNEL);
> > +	if (!handle) {
> > +		retval =  -ENOMEM;
> > +		goto out_mnt;
> > +	}
> > +	/* copy the full handle */
> > +	if (copy_from_user(handle, ufh,
> > +				sizeof(struct file_handle) +
> > +				f_handle.handle_size)) {
> > +		retval = -EFAULT;
> > +		goto out_mnt;
> > +	}
> > +
> > +	dentry = handle_to_dentry(mnt, handle);
> > +	if (IS_ERR(dentry)) {
> > +		retval = PTR_ERR(dentry);
> > +		goto out_mnt;
> > +	}
> > +
> > +	inode = dentry->d_inode;
> > +
> > +	flags  = open_to_namei_flags(flags);
> > +	/* O_TRUNC implies we need access checks for write permissions */
> > +	if (flags & O_TRUNC)
> > +		flags |= MAY_WRITE;
> > +
> > +	if ((!(flags & O_APPEND) || (flags & O_TRUNC)) &&
> > +		(flags & FMODE_WRITE) && IS_APPEND(inode)) {
> > +		retval = -EPERM;
> > +		goto out_err;
> > +	}
> > +
> > +	if ((flags & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
> > +		retval = -EACCES;
> > +		goto out_err;
> > +	}
> > +
> > +	/* Can't write directories. */
> > +	if (S_ISDIR(inode->i_mode) && (flags & FMODE_WRITE)) {
> > +		retval = -EISDIR;
> > +		goto out_err;
> > +	}
> > +
> > +	fd = get_unused_fd();
> 
> You want to use alloc_fd() (or get_unused_fd_flags) rather than get_unused_fd
> so that the flags can be passed though and, in particular, so O_CLOEXEC will
> be honoured.


Fixed by saying fd = get_unused_fd_flags(d_flags);

> 
> 
> > +	if (fd < 0) {
> > +		retval = fd;
> > +		goto out_err;
> > +	}
> > +
> > +	filp = dentry_open(dentry, mntget(mnt),
> > +			d_flags, current_cred());
> 
> dentry_open consumed the reference to dentry, so ...
> 
> > +	if (IS_ERR(filp)) {
> > +		put_unused_fd(fd);
> > +		retval =  PTR_ERR(filp);
> > +		goto out_err;
> 
> The dput at 'out_err' will put it one time too many.
> I think the best fix would be to pass dget(dentry) to dentry_open, then ....
> 
> > +	}
> > +
> > +	if (inode->i_mode & S_IFREG) {
> > +		filp->f_flags |= O_NOATIME;
> > +		filp->f_mode |= FMODE_NOCMTIME;
> > +	}
> > +	fsnotify_open(filp->f_path.dentry);
> > +	fd_install(fd, filp);
> > +	retval = fd;
> > +	goto out_mnt;
> 
> ... this goto (Which is rather ugly to me) would not be needed.
> 


Fixed

Thanks for the review
-aneesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-04-27  6:10 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-26 17:33 [PATCH -V5 0/8] Generic name to handle and open by handle syscalls Aneesh Kumar K.V
2010-04-26 17:33 ` [PATCH -V5 1/8] exportfs: Return the minimum required handle size Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 2/8] vfs: Add name to file handle conversion support Aneesh Kumar K.V
2010-04-27  1:19   ` Neil Brown
2010-04-27  6:08     ` Aneesh Kumar K. V
2010-04-26 17:34 ` [PATCH -V5 3/8] vfs: Add open by file handle support Aneesh Kumar K.V
2010-04-27  1:05   ` Neil Brown
2010-04-27  6:10     ` Aneesh Kumar K. V
2010-04-26 17:34 ` [PATCH -V5 4/8] vfs: Add freadlink syscall Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 5/8] ext4: Add get_fsid callback Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 6/8] x86: Add new syscalls for x86_32 Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 7/8] x86: Add new syscalls for x86_64 Aneesh Kumar K.V
2010-04-26 17:34 ` [PATCH -V5 8/8] ext3: Add get_fsid callback Aneesh Kumar K.V

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.