* [PATCH 00/13] Support follow_link in RCU-walk. - V2
@ 2015-03-16  4:43 NeilBrown
  2015-03-16  4:43 ` [PATCH 01/13] VFS: replace {, total_}link_count in task_struct with pointer to nameidata NeilBrown
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Hi Al,
 I believe this series addresses all your concerns about
 my first attempt.
 The first patch results in nameidata being almost completely
 localized to namei.c :-)  It also highlights out-of-date
 documentation in automount-support.txt :-(

 It also exposes (and removes) some ... interesting code in lustre.
 I'm not sure how safe it is to remove that.... I didn't think
 recursive symlinks used extra stack.

 I haven't tested extensively yet but will do that before a "final"
 submission.

 If you could confirm that I'm moving in the right direction, I would
 appreciate it.

Thanks,
NeilBrown


---

NeilBrown (13):
      VFS: replace {,total_}link_count in task_struct with pointer to nameidata
      VFS: make all ->follow_link handlers aware for LOOKUP_RCU
      VFS: remove nameidata args from ->follow_link and ->put_link
      security/selinux: check for LOOKUP_RCU in _follow_link.
      VFS/namei: use terminate_walk when symlink lookup fails.
      VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
      VFS/namei: abort RCU-walk on symlink if atime needs updating.
      VFS/namei: enhance follow_link to support RCU-walk.
      VFS/namei: enable RCU-walk when following symlinks.
      VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
      xfs: use RCU to free 'struct xfs_mount'.
      XFS: allow follow_link to often succeed in RCU-walk.
      NFS: support LOOKUP_RCU in nfs_follow_link.


 Documentation/filesystems/Locking               |    4 
 Documentation/filesystems/automount-support.txt |    3 
 Documentation/filesystems/porting               |    5 +
 Documentation/filesystems/vfs.txt               |    4 
 drivers/staging/lustre/lustre/llite/symlink.c   |   25 +--
 fs/9p/v9fs.h                                    |    3 
 fs/9p/vfs_inode.c                               |   17 +-
 fs/9p/vfs_inode_dotl.c                          |   11 +
 fs/autofs4/symlink.c                            |    4 
 fs/befs/linuxvfs.c                              |   14 +-
 fs/ceph/inode.c                                 |    4 
 fs/cifs/cifsfs.h                                |    2 
 fs/cifs/link.c                                  |    6 -
 fs/configfs/symlink.c                           |   16 +-
 fs/debugfs/file.c                               |    4 
 fs/ecryptfs/inode.c                             |   13 +
 fs/exofs/symlink.c                              |    4 
 fs/ext2/symlink.c                               |    4 
 fs/ext3/symlink.c                               |    4 
 fs/ext4/symlink.c                               |    4 
 fs/freevxfs/vxfs_immed.c                        |    8 -
 fs/fuse/dir.c                                   |   10 +
 fs/gfs2/inode.c                                 |   10 +
 fs/hostfs/hostfs_kern.c                         |   15 +-
 fs/hppfs/hppfs.c                                |    9 -
 fs/inode.c                                      |   26 ++-
 fs/jffs2/symlink.c                              |    6 -
 fs/jfs/symlink.c                                |    4 
 fs/kernfs/symlink.c                             |   16 +-
 fs/libfs.c                                      |    5 -
 fs/namei.c                                      |  214 +++++++++++++++--------
 fs/nfs/inode.c                                  |   22 ++
 fs/nfs/symlink.c                                |   24 ++-
 fs/ntfs/namei.c                                 |    1 
 fs/overlayfs/inode.c                            |   13 +
 fs/proc/base.c                                  |    6 -
 fs/proc/inode.c                                 |    6 -
 fs/proc/namespaces.c                            |    7 +
 fs/proc/self.c                                  |   11 +
 fs/proc/thread_self.c                           |   14 +-
 fs/sysv/symlink.c                               |    4 
 fs/ubifs/file.c                                 |    4 
 fs/ufs/symlink.c                                |    4 
 fs/xfs/xfs_ioctl.c                              |    2 
 fs/xfs/xfs_iops.c                               |   19 +-
 fs/xfs/xfs_mount.h                              |    2 
 fs/xfs/xfs_super.c                              |    2 
 fs/xfs/xfs_symlink.c                            |   11 +
 fs/xfs/xfs_symlink.h                            |    2 
 include/linux/fs.h                              |   12 +
 include/linux/namei.h                           |    8 -
 include/linux/nfs_fs.h                          |    1 
 include/linux/sched.h                           |    3 
 include/linux/security.h                        |    9 -
 mm/shmem.c                                      |   18 +-
 security/capability.c                           |    2 
 security/security.c                             |    4 
 security/selinux/hooks.c                        |    4 
 58 files changed, 442 insertions(+), 247 deletions(-)

--
Signature



* [PATCH 01/13] VFS: replace {, total_}link_count in task_struct with pointer to nameidata
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 19:46   ` Al Viro
  2015-03-16  4:43 ` [PATCH 02/13] VFS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

task_struct currently contains two ad-hoc members for use by
the VFS: link_count and total_link_count.
These are only interesting to fs/namei.c, so exposing them
explicitly is poor layering, and has resulted in some questionable
code in staging/lustre.

This patch replaces those with a single pointer to 'struct
nameidata'.
This structure represents the current filename lookup of which
there can only be one per process, and is a natural place to
store link_count and total_link_count.

This will allow the current "nameidata" argument to all
follow_link operations to be removed as current->nameidata
can be used instead.

As there are occasional circumstances where pathname lookup can
recurse, such as through kern_path_locked, we always save any old
current->nameidata (if there is one) when setting a new value, and
make sure any active link_counts are preserved.
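
To illustrate the save/restore scheme described above, here is a minimal
user-space sketch. It is not the kernel code: struct nameidata_model,
current_nd, set_nd(), nested_lookup() and demo_nested_lookup() are
hypothetical stand-ins for struct nameidata, current->nameidata and
set_nameidata(). It assumes, as in the set_nameidata() hunk of this patch,
that installing a nameidata copies the counters from the one it displaces.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct nameidata; only the two counters matter here. */
struct nameidata_model {
	int link_count;
	int total_link_count;
};

/* Stand-in for current->nameidata (one pointer per task in the kernel). */
static struct nameidata_model *current_nd;

/*
 * Modelled on set_nameidata() from this patch: install p, seed its
 * counters from the lookup it displaces (or zero them if there was
 * none), and hand back the previous pointer so it can be restored.
 */
static struct nameidata_model *set_nd(struct nameidata_model *p)
{
	struct nameidata_model *old = current_nd;

	current_nd = p;
	if (p) {
		p->link_count = old ? old->link_count : 0;
		p->total_link_count = old ? old->total_link_count : 0;
	}
	return old;
}

/*
 * A nested lookup (the kind that could happen via kern_path_locked):
 * save the outer pointer, do some work, restore.  Returns the
 * total_link_count value the nested lookup started from.
 */
static int nested_lookup(void)
{
	struct nameidata_model inner;
	struct nameidata_model *saved = set_nd(&inner);
	int inherited = inner.total_link_count;

	inner.total_link_count++;	/* pretend one symlink was followed */
	set_nd(saved);			/* put the outer lookup back */
	return inherited;
}

/* Walks through an outer lookup that triggers one nested lookup. */
int demo_nested_lookup(void)
{
	struct nameidata_model outer;
	struct nameidata_model *saved = set_nd(&outer);
	int inherited;

	assert(saved == NULL);			/* nothing was active before */
	assert(outer.total_link_count == 0);	/* fresh lookup starts at zero */

	outer.total_link_count = 3;		/* pretend three links so far */
	inherited = nested_lookup();

	assert(inherited == 3);			/* nested lookup saw the count */
	assert(current_nd == &outer);		/* outer lookup is current again */
	assert(outer.total_link_count == 4);	/* restore copied the count back */

	set_nd(saved);				/* lookup finished */
	return 0;
}
```

Since the restore step goes through the same function, the counters also
flow back into the outer structure when the nested lookup finishes; in this
model the outer total_link_count ends at 4.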

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/staging/lustre/lustre/llite/symlink.c |   16 ++-------
 fs/namei.c                                    |   47 +++++++++++++++++++------
 include/linux/sched.h                         |    2 +
 3 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 686b6a574cc5..d7a1c6c48846 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -126,18 +126,10 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 	char *symname = NULL;
 
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
-	/* Limit the recursive symlink depth to 5 instead of default
-	 * 8 links when kernel has 4k stack to prevent stack overflow.
-	 * For 8k stacks we need to limit it to 7 for local servers. */
-	if (THREAD_SIZE < 8192 && current->link_count >= 6) {
-		rc = -ELOOP;
-	} else if (THREAD_SIZE == 8192 && current->link_count >= 8) {
-		rc = -ELOOP;
-	} else {
-		ll_inode_size_lock(inode);
-		rc = ll_readlink_internal(inode, &request, &symname);
-		ll_inode_size_unlock(inode);
-	}
+	ll_inode_size_lock(inode);
+	rc = ll_readlink_internal(inode, &request, &symname);
+	ll_inode_size_unlock(inode);
+
 	if (rc) {
 		ptlrpc_req_finished(request);
 		request = NULL;
diff --git a/fs/namei.c b/fs/namei.c
index c83145af4bfc..184aaafffaa9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -502,10 +502,27 @@ struct nameidata {
 	unsigned	seq, m_seq;
 	int		last_type;
 	unsigned	depth;
+	int link_count, total_link_count;
 	struct file	*base;
 	char *saved_names[MAX_NESTED_LINKS + 1];
 };
 
+static struct nameidata *set_nameidata(struct nameidata *p)
+{
+	struct nameidata *old = current->nameidata;
+	current->nameidata = p;
+	if (p) {
+		if (!old) {
+			p->link_count = 0;
+			p->total_link_count = 0;
+		} else {
+			p->link_count = old->link_count;
+			p->total_link_count = old->total_link_count;
+		}
+	}
+	return old;
+}
+
 /*
  * Path walking has 2 modes, rcu-walk and ref-walk (see
  * Documentation/filesystems/path-lookup.txt).  In situations when we can't
@@ -863,11 +880,11 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 		mntget(link->mnt);
 
 	error = -ELOOP;
-	if (unlikely(current->total_link_count >= 40))
+	if (unlikely(current->nameidata->total_link_count >= 40))
 		goto out_put_nd_path;
 
 	cond_resched();
-	current->total_link_count++;
+	current->nameidata->total_link_count++;
 
 	touch_atime(link);
 	nd_set_link(nd, NULL);
@@ -991,8 +1008,8 @@ static int follow_automount(struct path *path, unsigned flags,
 	    path->dentry->d_inode)
 		return -EISDIR;
 
-	current->total_link_count++;
-	if (current->total_link_count >= 40)
+	current->nameidata->total_link_count++;
+	if (current->nameidata->total_link_count >= 40)
 		return -ELOOP;
 
 	mnt = path->dentry->d_op->d_automount(path);
@@ -1621,7 +1638,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 {
 	int res;
 
-	if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
+	if (unlikely(current->nameidata->link_count >= MAX_NESTED_LINKS)) {
 		path_put_conditional(path, nd);
 		path_put(&nd->path);
 		return -ELOOP;
@@ -1629,7 +1646,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
 
 	nd->depth++;
-	current->link_count++;
+	current->nameidata->link_count++;
 
 	do {
 		struct path link = *path;
@@ -1642,7 +1659,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 		put_link(nd, &link, cookie);
 	} while (res > 0);
 
-	current->link_count--;
+	current->nameidata->link_count--;
 	nd->depth--;
 	return res;
 }
@@ -1948,7 +1965,7 @@ static int path_init(int dfd, const char *name, unsigned int flags,
 	rcu_read_unlock();
 	return -ECHILD;
 done:
-	current->total_link_count = 0;
+	current->nameidata->total_link_count = 0;
 	return link_path_walk(name, nd);
 }
 
@@ -2027,7 +2044,9 @@ static int path_lookupat(int dfd, const char *name,
 static int filename_lookup(int dfd, struct filename *name,
 				unsigned int flags, struct nameidata *nd)
 {
-	int retval = path_lookupat(dfd, name->name, flags | LOOKUP_RCU, nd);
+	int retval;
+	struct nameidata *saved_nd = set_nameidata(nd);
+	retval = path_lookupat(dfd, name->name, flags | LOOKUP_RCU, nd);
 	if (unlikely(retval == -ECHILD))
 		retval = path_lookupat(dfd, name->name, flags, nd);
 	if (unlikely(retval == -ESTALE))
@@ -2036,6 +2055,7 @@ static int filename_lookup(int dfd, struct filename *name,
 
 	if (likely(!retval))
 		audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
+	set_nameidata(saved_nd);
 	return retval;
 }
 
@@ -2343,7 +2363,7 @@ out:
 static int
 path_mountpoint(int dfd, const char *name, struct path *path, unsigned int flags)
 {
-	struct nameidata nd;
+	struct nameidata nd, *saved = set_nameidata(&nd);
 	int err;
 
 	err = path_init(dfd, name, flags, &nd);
@@ -2366,6 +2386,7 @@ path_mountpoint(int dfd, const char *name, struct path *path, unsigned int flags
 	}
 out:
 	path_cleanup(&nd);
+	set_nameidata(saved);
 	return err;
 }
 
@@ -3217,12 +3238,14 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 	struct path path;
 	int opened = 0;
 	int error;
+	struct nameidata *saved_nd;
 
 	file = get_empty_filp();
 	if (IS_ERR(file))
 		return file;
 
 	file->f_flags = op->open_flag;
+	saved_nd = set_nameidata(nd);
 
 	if (unlikely(file->f_flags & __O_TMPFILE)) {
 		error = do_tmpfile(dfd, pathname, nd, flags, op, file, &opened);
@@ -3269,6 +3292,7 @@ out:
 		}
 		file = ERR_PTR(error);
 	}
+	set_nameidata(saved_nd);
 	return file;
 }
 
@@ -4429,7 +4453,7 @@ EXPORT_SYMBOL(readlink_copy);
  */
 int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
-	struct nameidata nd;
+	struct nameidata nd, *saved = set_nameidata(&nd);
 	void *cookie;
 	int res;
 
@@ -4441,6 +4465,7 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	res = readlink_copy(buffer, buflen, nd_get_link(&nd));
 	if (dentry->d_inode->i_op->put_link)
 		dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+	set_nameidata(saved);
 	return res;
 }
 EXPORT_SYMBOL(generic_readlink);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432e14ff..b88b9eea169a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1447,7 +1447,7 @@ struct task_struct {
 				       it with task_lock())
 				     - initialized normally by setup_new_exec */
 /* file system info */
-	int link_count, total_link_count;
+	struct nameidata *nameidata;
 #ifdef CONFIG_SYSVIPC
 /* ipc stuff */
 	struct sysv_sem sysvsem;




* [PATCH 02/13] VFS: make all ->follow_link handlers aware for LOOKUP_RCU
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
  2015-03-16  4:43 ` [PATCH 01/13] VFS: replace {, total_}link_count in task_struct with pointer to nameidata NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16  4:43 ` [PATCH 04/13] security/selinux: check for LOOKUP_RCU in _follow_link NeilBrown
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

In preparation for supporting ->follow_link in RCU-walk,
make sure all ->follow_link handlers which are not atomic
will fail if LOOKUP_RCU is set.

Later patches will make some of these handle LOOKUP_RCU
more gracefully.

This is currently achieved by introducing a new function
"nd_is_rcu" to check if a nameidata has LOOKUP_RCU set.
There must be a better way.
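
The effect of the guard can be sketched in user space as follows. This is
an illustrative model only: struct model_nd, model_nd_is_rcu(),
model_blocking_follow_link(), model_lookup() and MODEL_LOOKUP_RCU are
hypothetical names, and the retry in model_lookup() is assumed to follow
the "try with LOOKUP_RCU, repeat without it on -ECHILD" pattern that
filename_lookup() uses in patch 01.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_LOOKUP_RCU 0x0040u	/* stand-in for the LOOKUP_RCU flag */

/* Stand-in for struct nameidata; only the flags word is modelled. */
struct model_nd {
	unsigned int flags;
};

/* Counterpart of the nd_is_rcu() helper added by this patch. */
static int model_nd_is_rcu(const struct model_nd *nd)
{
	return (nd->flags & MODEL_LOOKUP_RCU) != 0;
}

/*
 * A handler that may need to sleep (allocate memory, talk to a server).
 * Under RCU-walk it must not block, so it refuses with -ECHILD before
 * doing anything else.
 */
static int model_blocking_follow_link(const struct model_nd *nd, int *slept)
{
	if (model_nd_is_rcu(nd))
		return -ECHILD;

	*slept = 1;	/* stands in for __getname(), network I/O, ... */
	return 0;
}

/* Caller: attempt RCU-walk first, fall back to ref-walk on -ECHILD. */
static int model_lookup(int *attempts, int *slept)
{
	struct model_nd nd = { .flags = MODEL_LOOKUP_RCU };
	int ret;

	*attempts = 1;
	ret = model_blocking_follow_link(&nd, slept);
	if (ret == -ECHILD) {
		nd.flags &= ~MODEL_LOOKUP_RCU;	/* second pass may block */
		*attempts = 2;
		ret = model_blocking_follow_link(&nd, slept);
	}
	return ret;
}

/* Checks that the handler only did blocking work on the second pass. */
int model_lookup_selfcheck(void)
{
	int attempts = 0, slept = 0;

	assert(model_lookup(&attempts, &slept) == 0);
	assert(attempts == 2);
	assert(slept == 1);
	return 0;
}
```

The check sits at the very top of the handler so that nothing which might
sleep runs before it; this is why several hunks below move an initializer
such as __getname() out of the declaration and behind the guard.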

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/staging/lustre/lustre/llite/symlink.c |    3 +++
 fs/9p/vfs_inode.c                             |    6 +++++-
 fs/9p/vfs_inode_dotl.c                        |    5 ++++-
 fs/befs/linuxvfs.c                            |    2 ++
 fs/cifs/link.c                                |    2 ++
 fs/configfs/symlink.c                         |    7 ++++++-
 fs/ecryptfs/inode.c                           |    7 ++++++-
 fs/fuse/dir.c                                 |    2 ++
 fs/gfs2/inode.c                               |    2 ++
 fs/hostfs/hostfs_kern.c                       |    7 ++++++-
 fs/kernfs/symlink.c                           |    7 ++++++-
 fs/namei.c                                    |    8 ++++++++
 fs/nfs/symlink.c                              |    2 ++
 fs/overlayfs/inode.c                          |    3 +++
 fs/proc/base.c                                |    2 ++
 fs/proc/namespaces.c                          |    3 +++
 fs/proc/self.c                                |    7 ++++++-
 fs/proc/thread_self.c                         |   10 ++++++++--
 fs/xfs/xfs_iops.c                             |    2 ++
 include/linux/fs.h                            |    1 +
 mm/shmem.c                                    |    6 +++++-
 21 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index d7a1c6c48846..e8a8d25fcabf 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -125,6 +125,9 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 	int rc;
 	char *symname = NULL;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
 	ll_inode_size_lock(inode);
 	rc = ll_readlink_internal(inode, &request, &symname);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 3662f1d1d9cf..8aff5d684154 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1281,7 +1281,11 @@ done:
 static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	int len = 0;
-	char *link = __getname();
+	char *link;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+	link = __getname();
 
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
 
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 6054c16b8fae..51776a3cc842 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -914,9 +914,12 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
 {
 	int retval;
 	struct p9_fid *fid;
-	char *link = __getname();
+	char *link;
 	char *target;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+	link = __getname();
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
 
 	if (!link) {
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index e089f1985fca..bbe8f90924b2 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -477,6 +477,8 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	befs_off_t len = data->size;
 	char *link;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	if (len == 0) {
 		befs_error(sb, "Long symlink with illegal length");
 		link = ERR_PTR(-EIO);
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 2ec6037f61c7..0dbe1a326632 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -639,6 +639,8 @@ cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
 	struct cifs_tcon *tcon;
 	struct TCP_Server_Info *server;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	xid = get_xid();
 
 	tlink = cifs_sb_tlink(cifs_sb);
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index cc9f2546ea4a..1397342aad5b 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -282,7 +282,12 @@ static int configfs_getlink(struct dentry *dentry, char * path)
 static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	int error = -ENOMEM;
-	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	unsigned long page;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	page = get_zeroed_page(GFP_KERNEL);
 
 	if (page) {
 		error = configfs_getlink(dentry, (char *)page);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index b08b5187f662..49d3dd96344c 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -678,7 +678,12 @@ out:
 static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	size_t len;
-	char *buf = ecryptfs_readlink_lower(dentry, &len);
+	char *buf;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	buf = ecryptfs_readlink_lower(dentry, &len);
 	if (IS_ERR(buf))
 		goto out;
 	fsstack_copy_attr_atime(dentry->d_inode,
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1545b711ddcf..15d326ec5943 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1402,6 +1402,8 @@ static void free_link(char *link)
 
 static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	nd_set_link(nd, read_link(dentry));
 	return NULL;
 }
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 73c72253faac..21086c7870f1 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1557,6 +1557,8 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	char *buf;
 	int error;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &i_gh);
 	error = gfs2_glock_nq(&i_gh);
 	if (error) {
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index fd62cae0fdcb..374d04909538 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -884,7 +884,12 @@ static const struct inode_operations hostfs_dir_iops = {
 
 static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
-	char *link = __getname();
+	char *link;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	link = __getname();
 	if (link) {
 		char *path = dentry_name(dentry);
 		int err = -ENOMEM;
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 8a198898e39a..8e5421f386c0 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -115,7 +115,12 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
 static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	int error = -ENOMEM;
-	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	unsigned long page;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	page = get_zeroed_page(GFP_KERNEL);
 	if (page) {
 		error = kernfs_getlink(dentry, (char *) page);
 		if (error < 0)
diff --git a/fs/namei.c b/fs/namei.c
index 184aaafffaa9..eefa4a00501a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4500,6 +4500,8 @@ EXPORT_SYMBOL(page_readlink);
 void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 {
 	struct page *page = NULL;
+	if (nd->flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	nd_set_link(nd, page_getlink(dentry, &page));
 	return page;
 }
@@ -4567,3 +4569,9 @@ const struct inode_operations page_symlink_inode_operations = {
 	.put_link	= page_put_link,
 };
 EXPORT_SYMBOL(page_symlink_inode_operations);
+
+int nd_is_rcu(struct nameidata *nd)
+{
+	return nd->flags & LOOKUP_RCU;
+}
+EXPORT_SYMBOL(nd_is_rcu);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 05c9e02f4153..c9a2d3cc4619 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -49,6 +49,8 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct page *page;
 	void *err;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
 		goto read_failed;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 04f124884687..db370d5d84c4 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/namei.h>
 #include <linux/slab.h>
 #include <linux/xattr.h>
 #include "overlayfs.h"
@@ -146,6 +147,8 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct dentry *realdentry;
 	struct inode *realinode;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	realdentry = ovl_dentry_real(dentry);
 	realinode = realdentry->d_inode;
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3f3d7aeb0712..6f5dbfe68516 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1377,6 +1377,8 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct path path;
 	int error = -EACCES;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	/* Are we allowed to snoop on the tasks file descriptors? */
 	if (!proc_fd_access_allowed(inode))
 		goto out;
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index c9eac4563fa8..c89a51401bb5 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -38,6 +38,9 @@ static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct path ns_path;
 	void *error = ERR_PTR(-EACCES);
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
 	task = get_proc_task(inode);
 	if (!task)
 		return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 4348bb8907c2..c094ea04e1bb 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -22,8 +22,13 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
 static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
-	pid_t tgid = task_tgid_nr_ns(current, ns);
+	pid_t tgid;
 	char *name = ERR_PTR(-ENOENT);
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	tgid = task_tgid_nr_ns(current, ns);
 	if (tgid) {
 		/* 11 for max length of signed int in decimal + NULL term */
 		name = kmalloc(12, GFP_KERNEL);
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 59075b509df3..5d3144d51018 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -23,9 +23,15 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
 static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
-	pid_t tgid = task_tgid_nr_ns(current, ns);
-	pid_t pid = task_pid_nr_ns(current, ns);
+	pid_t tgid;
+	pid_t pid;
 	char *name = ERR_PTR(-ENOENT);
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	tgid = task_tgid_nr_ns(current, ns);
+	pid = task_pid_nr_ns(current, ns);
 	if (pid) {
 		name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
 		if (!name)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index e53a90331422..23cea798b777 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -417,6 +417,8 @@ xfs_vn_follow_link(
 	char			*link;
 	int			error = -ENOMEM;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
 	if (!link)
 		goto out_err;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b821fa32ba3f..eaef987ae3cf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2167,6 +2167,7 @@ extern struct filename *getname_flags(const char __user *, int, int *);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);
 extern void putname(struct filename *name);
+extern int nd_is_rcu(struct nameidata *nd);
 
 enum {
 	FILE_CREATED = 1,
diff --git a/mm/shmem.c b/mm/shmem.c
index cf2d0ca010bc..fdf6ba18fce3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2483,7 +2483,11 @@ static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata
 static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	struct page *page = NULL;
-	int error = shmem_getpage(dentry->d_inode, 0, &page, SGP_READ, NULL);
+	int error;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+	error = shmem_getpage(dentry->d_inode, 0, &page, SGP_READ, NULL);
 	nd_set_link(nd, error ? ERR_PTR(error) : kmap(page));
 	if (page)
 		unlock_page(page);




* [PATCH 03/13] VFS: remove nameidata args from ->follow_link and ->put_link
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (2 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 04/13] security/selinux: check for LOOKUP_RCU in _follow_link NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 20:47   ` Al Viro
  2015-03-16  4:43 ` [PATCH 05/13] VFS/namei: use terminate_walk when symlink lookup fails NeilBrown
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Now that current->nameidata is available, nd_set_link() and
nd_get_link() can use that directly, so 'nd' doesn't need to
be passed through ->follow_link and  ->put_link.

->follow_link gains a 'flags' argument instead which will
be useful for adding RCU-walk support.
For now, any filesystem which cannot trivially handle RCU-walk
support simply returns -ECHILD if LOOKUP_RCU is set in 'flags'.
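
As a rough user-space sketch of what a handler looks like with the new
(dentry, flags) signature: the names below (model_follow_link(),
model_link_body, model_err_ptr(), model_is_err(), model_ptr_err(),
MODEL_LOOKUP_RCU) are hypothetical, the dentry is reduced to a plain
string, and model_link_body merely stands in for whatever nd_set_link()
records via current->nameidata. The error-pointer helpers imitate the
kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention and assume a platform
where small negative integers round-trip through pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_LOOKUP_RCU 0x0040u	/* stand-in for the LOOKUP_RCU flag */
#define MODEL_MAX_ERRNO  4095

/* User-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Stands in for the link body that nd_set_link() would record. */
static const char *model_link_body;

/*
 * New-style handler: it receives lookup flags instead of a nameidata
 * pointer.  If it cannot trivially run without blocking, it returns
 * -ECHILD (as an error pointer) whenever LOOKUP_RCU is set.
 */
static void *model_follow_link(const char *target, unsigned int flags)
{
	if (flags & MODEL_LOOKUP_RCU)
		return model_err_ptr(-ECHILD);

	model_link_body = target;	/* "nd_set_link()" without an nd */
	return NULL;			/* no cookie needed for put_link */
}

/* Exercises both the RCU-walk refusal and the normal ref-walk path. */
int model_follow_link_selfcheck(void)
{
	void *cookie;

	model_link_body = NULL;

	cookie = model_follow_link("/some/target", MODEL_LOOKUP_RCU);
	assert(model_is_err(cookie));
	assert(model_ptr_err(cookie) == -ECHILD);
	assert(model_link_body == NULL);	/* nothing recorded under RCU */

	cookie = model_follow_link("/some/target", 0);
	assert(!model_is_err(cookie));		/* NULL is not an error pointer */
	assert(strcmp(model_link_body, "/some/target") == 0);
	return 0;
}
```

The handler needs nothing from the caller beyond the flags word, which is
what lets the nameidata argument disappear from the method signature.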

security_inode_follow_link() also gets 'flags' in place of 'nd',
as does the inode_follow_link() security_op.

As a result of this change, 'nameidata' is almost entirely
local to namei.c.  It is only exposed externally as an opaque struct
pointed to by current->nameidata.

Note: Documentation/filesystems/automount-support.txt mentions
 nameidata in ways that have been wrong for a while and are still
 wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 Documentation/filesystems/Locking               |    4 +--
 Documentation/filesystems/automount-support.txt |    3 ++
 Documentation/filesystems/porting               |    5 +++
 Documentation/filesystems/vfs.txt               |    4 +--
 drivers/staging/lustre/lustre/llite/symlink.c   |    8 +++--
 fs/9p/v9fs.h                                    |    3 +-
 fs/9p/vfs_inode.c                               |   13 ++++----
 fs/9p/vfs_inode_dotl.c                          |    8 +++--
 fs/autofs4/symlink.c                            |    4 +--
 fs/befs/linuxvfs.c                              |   14 ++++-----
 fs/ceph/inode.c                                 |    4 +--
 fs/cifs/cifsfs.h                                |    2 +
 fs/cifs/link.c                                  |    6 ++--
 fs/configfs/symlink.c                           |   11 +++----
 fs/debugfs/file.c                               |    4 +--
 fs/ecryptfs/inode.c                             |    8 ++---
 fs/exofs/symlink.c                              |    4 +--
 fs/ext2/symlink.c                               |    4 +--
 fs/ext3/symlink.c                               |    4 +--
 fs/ext4/symlink.c                               |    4 +--
 fs/freevxfs/vxfs_immed.c                        |    8 +++--
 fs/fuse/dir.c                                   |   10 +++---
 fs/gfs2/inode.c                                 |   10 +++---
 fs/hostfs/hostfs_kern.c                         |   10 +++---
 fs/hppfs/hppfs.c                                |    9 +++---
 fs/jffs2/symlink.c                              |    6 ++--
 fs/jfs/symlink.c                                |    4 +--
 fs/kernfs/symlink.c                             |   11 +++----
 fs/libfs.c                                      |    5 +--
 fs/namei.c                                      |   36 +++++++++++++----------
 fs/nfs/symlink.c                                |    8 +++--
 fs/ntfs/namei.c                                 |    1 -
 fs/overlayfs/inode.c                            |   12 ++++----
 fs/proc/base.c                                  |    6 ++--
 fs/proc/inode.c                                 |    6 ++--
 fs/proc/namespaces.c                            |    6 ++--
 fs/proc/self.c                                  |    6 ++--
 fs/proc/thread_self.c                           |    6 ++--
 fs/sysv/symlink.c                               |    4 +--
 fs/ubifs/file.c                                 |    4 +--
 fs/ufs/symlink.c                                |    4 +--
 fs/xfs/xfs_iops.c                               |    8 +++--
 include/linux/fs.h                              |   12 +++-----
 include/linux/namei.h                           |    7 ++--
 include/linux/sched.h                           |    1 +
 include/linux/security.h                        |    9 +++---
 mm/shmem.c                                      |   14 ++++-----
 security/capability.c                           |    2 +
 security/security.c                             |    4 +--
 security/selinux/hooks.c                        |    2 +
 50 files changed, 175 insertions(+), 173 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index f91926f2f482..8a772d4bb51f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,8 +50,8 @@ prototypes:
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
-	void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void * (*follow_link) (struct dentry *, int flags);
+	void (*put_link) (struct dentry *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int, unsigned int);
 	int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
index 7cac200e2a85..b68370fcc8f8 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.txt
@@ -8,6 +8,9 @@ requested. The latter can also be requested by userspace.
 IN-KERNEL AUTOMOUNTING
 ======================
 
+THE FOLLOWING IS WRONG AND NEEDS TO BE UPDATED
+
+
 A filesystem can now mount another filesystem on one of its directories by the
 following procedure:
 
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index fa2db081505e..d6d228d54993 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -471,3 +471,8 @@ in your dentry operations instead.
 [mandatory]
 	f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
 	it entirely.
+--
+[mandatory]
+	->follow_link and ->put_link no longer receive 'struct nameidata *'.
+	->follow_link receives flags which may contain LOOKUP_RCU.
+	When that is set, the code must not block, but may return -ECHILD.
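
The porting note above can be illustrated with a small userspace sketch of a handler that follows the new (dentry, flags) prototype. This is not kernel code: the struct dentry layout, the last_link variable, example_follow_link() and example_check() are hypothetical stand-ins, and the LOOKUP_RCU value and ERR_PTR helpers are simplified local copies for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel definitions; illustrative only. */
#define LOOKUP_RCU 0x0040
#define MAX_ERRNO  4095

static inline void *ERR_PTR(intptr_t error) { return (void *)error; }
static inline intptr_t PTR_ERR(const void *ptr) { return (intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily simplified dentry: it only carries the link body. */
struct dentry { char *target; };

/* Stand-in for the slot that nd_set_link() fills in the kernel. */
char *last_link;
void nd_set_link(char *path) { last_link = path; }

/* Hypothetical handler: refuses to run in RCU-walk, otherwise sets the link. */
void *example_follow_link(struct dentry *dentry, int flags)
{
	if (flags & LOOKUP_RCU)
		return ERR_PTR(-ECHILD);	/* caller retries in ref-walk */
	nd_set_link(dentry->target);
	return NULL;				/* no cookie needed for put_link */
}

/* Usage: returns 0 if both the RCU-walk and ref-walk paths behave as described. */
int example_check(void)
{
	char buf[] = "target/path";
	struct dentry d = { buf };
	void *cookie = example_follow_link(&d, LOOKUP_RCU);

	if (!IS_ERR(cookie) || PTR_ERR(cookie) != -ECHILD)
		return 1;
	cookie = example_follow_link(&d, 0);
	if (cookie != NULL || last_link != d.target)
		return 2;
	return 0;
}
```

A handler that can produce the link body without blocking (for example, one that only reads a string already stored in the inode) would not need the early return; the -ECHILD path is the conservative fallback for everything else.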
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 966b22829f3b..a813af5ee097 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,8 +350,8 @@ struct inode_operations {
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-        void * (*follow_link) (struct dentry *, struct nameidata *);
-        void (*put_link) (struct dentry *, struct nameidata *, void *);
+        void * (*follow_link) (struct dentry *, int flags);
+        void (*put_link) (struct dentry *, void *);
 	int (*permission) (struct inode *, int);
 	int (*get_acl)(struct inode *, int);
 	int (*setattr) (struct dentry *, struct iattr *);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index e8a8d25fcabf..e1f4ef3356ae 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,14 +118,14 @@ failed:
 	return rc;
 }
 
-static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ll_follow_link(struct dentry *dentry, int flags)
 {
 	struct inode *inode = dentry->d_inode;
 	struct ptlrpc_request *request = NULL;
 	int rc;
 	char *symname = NULL;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
@@ -139,14 +139,14 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 		symname = ERR_PTR(rc);
 	}
 
-	nd_set_link(nd, symname);
+	nd_set_link(symname);
 	/* symname may contain a pointer to the request message buffer,
 	 * we delay request releasing until ll_put_link then.
 	 */
 	return request;
 }
 
-static void ll_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void ll_put_link(struct dentry *dentry, void *cookie)
 {
 	ptlrpc_req_finished(cookie);
 }
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 099c7712631c..b50310bf5bac 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -150,8 +150,7 @@ extern int v9fs_vfs_unlink(struct inode *i, struct dentry *d);
 extern int v9fs_vfs_rmdir(struct inode *i, struct dentry *d);
 extern int v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 			struct inode *new_dir, struct dentry *new_dentry);
-extern void v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd,
-			void *p);
+extern void v9fs_vfs_put_link(struct dentry *dentry, void *p);
 extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
 					 struct p9_fid *fid,
 					 struct super_block *sb, int new);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 8aff5d684154..f3bc0640dd4c 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1274,16 +1274,16 @@ done:
 /**
  * v9fs_vfs_follow_link - follow a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
+ * @flags: lookup flags
  *
  */
 
-static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *v9fs_vfs_follow_link(struct dentry *dentry, int flags)
 {
 	int len = 0;
 	char *link;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	link = __getname();
 
@@ -1300,7 +1300,7 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		} else
 			link[min(len, PATH_MAX-1)] = 0;
 	}
-	nd_set_link(nd, link);
+	nd_set_link(link);
 
 	return NULL;
 }
@@ -1308,15 +1308,14 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 /**
  * v9fs_vfs_put_link - release a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
  * @p: unused
  *
  */
 
 void
-v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+v9fs_vfs_put_link(struct dentry *dentry, void *p)
 {
-	char *s = nd_get_link(nd);
+	char *s = nd_get_link();
 
 	p9_debug(P9_DEBUG_VFS, " %pd %s\n",
 		 dentry, IS_ERR(s) ? "<error>" : s);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 51776a3cc842..5862720911aa 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -905,19 +905,19 @@ error:
 /**
  * v9fs_vfs_follow_link_dotl - follow a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
+ * @flags: lookup flags
  *
  */
 
 static void *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
+v9fs_vfs_follow_link_dotl(struct dentry *dentry, int flags)
 {
 	int retval;
 	struct p9_fid *fid;
 	char *link;
 	char *target;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	link = __getname();
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
@@ -941,7 +941,7 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
 	__putname(link);
 	link = ERR_PTR(retval);
 ndset:
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index 1e8ea192be2b..311f176708ae 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,13 +12,13 @@
 
 #include "autofs_i.h"
 
-static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *autofs4_follow_link(struct dentry *dentry, int flags)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	if (ino && !autofs4_oz_mode(sbi))
 		ino->last_used = jiffies;
-	nd_set_link(nd, dentry->d_inode->i_private);
+	nd_set_link(dentry->d_inode->i_private);
 	return NULL;
 }
 
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index bbe8f90924b2..b129966e7277 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,8 +42,8 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
 static struct inode *befs_alloc_inode(struct super_block *sb);
 static void befs_destroy_inode(struct inode *inode);
 static void befs_destroy_inodecache(void);
-static void *befs_follow_link(struct dentry *, struct nameidata *);
-static void *befs_fast_follow_link(struct dentry *, struct nameidata *);
+static void *befs_follow_link(struct dentry *, int);
+static void *befs_fast_follow_link(struct dentry *, int);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
 			char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -469,7 +469,7 @@ befs_destroy_inodecache(void)
  * flag is set.
  */
 static void *
-befs_follow_link(struct dentry *dentry, struct nameidata *nd)
+befs_follow_link(struct dentry *dentry, int flags)
 {
 	struct super_block *sb = dentry->d_sb;
 	befs_inode_info *befs_ino = BEFS_I(dentry->d_inode);
@@ -477,7 +477,7 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	befs_off_t len = data->size;
 	char *link;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	if (len == 0) {
 		befs_error(sb, "Long symlink with illegal length");
@@ -496,16 +496,16 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
 			link[len - 1] = '\0';
 		}
 	}
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
 
 static void *
-befs_fast_follow_link(struct dentry *dentry, struct nameidata *nd)
+befs_fast_follow_link(struct dentry *dentry, int flags)
 {
 	befs_inode_info *befs_ino = BEFS_I(dentry->d_inode);
-	nd_set_link(nd, befs_ino->i_data.symlink);
+	nd_set_link(befs_ino->i_data.symlink);
 	return NULL;
 }
 
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 119c43c80638..ceaa82d3a157 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1691,10 +1691,10 @@ retry:
 /*
  * symlinks
  */
-static void *ceph_sym_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ceph_sym_follow_link(struct dentry *dentry, int flags)
 {
 	struct ceph_inode_info *ci = ceph_inode(dentry->d_inode);
-	nd_set_link(nd, ci->i_symlink);
+	nd_set_link(ci->i_symlink);
 	return NULL;
 }
 
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 252f5c15806b..f40f664a8c51 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
 #endif
 
 /* Functions related to symlinks */
-extern void *cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
+extern void *cifs_follow_link(struct dentry *direntry, int flags);
 extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
 			 int buflen);
 extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 0dbe1a326632..148a9b54669f 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -627,7 +627,7 @@ cifs_hl_exit:
 }
 
 void *
-cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
+cifs_follow_link(struct dentry *direntry, int flags)
 {
 	struct inode *inode = direntry->d_inode;
 	int rc = -ENOMEM;
@@ -639,7 +639,7 @@ cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
 	struct cifs_tcon *tcon;
 	struct TCP_Server_Info *server;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	xid = get_xid();
 
@@ -681,7 +681,7 @@ out:
 	free_xid(xid);
 	if (tlink)
 		cifs_put_tlink(tlink);
-	nd_set_link(nd, target_path);
+	nd_set_link(target_path);
 	return NULL;
 }
 
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index 1397342aad5b..a83685894e1c 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,12 +279,12 @@ static int configfs_getlink(struct dentry *dentry, char * path)
 
 }
 
-static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *configfs_follow_link(struct dentry *dentry, int flags)
 {
 	int error = -ENOMEM;
 	unsigned long page;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	page = get_zeroed_page(GFP_KERNEL);
@@ -292,17 +292,16 @@ static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (page) {
 		error = configfs_getlink(dentry, (char *)page);
 		if (!error) {
-			nd_set_link(nd, (char *)page);
+			nd_set_link((char *)page);
 			return (void *)page;
 		}
 	}
 
-	nd_set_link(nd, ERR_PTR(error));
+	nd_set_link(ERR_PTR(error));
 	return NULL;
 }
 
-static void configfs_put_link(struct dentry *dentry, struct nameidata *nd,
-			      void *cookie)
+static void configfs_put_link(struct dentry *dentry, void *cookie)
 {
 	if (cookie) {
 		unsigned long page = (unsigned long)cookie;
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 517e64938438..3dd676a2a4c0 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -43,9 +43,9 @@ const struct file_operations debugfs_file_operations = {
 	.llseek =	noop_llseek,
 };
 
-static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *debugfs_follow_link(struct dentry *dentry, int flags)
 {
-	nd_set_link(nd, dentry->d_inode->i_private);
+	nd_set_link(dentry->d_inode->i_private);
 	return NULL;
 }
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 49d3dd96344c..47f8d3a3ff48 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -170,7 +170,6 @@ out_unlock:
  * @directory_inode: inode of the new file's dentry's parent in ecryptfs
  * @ecryptfs_dentry: New file's dentry in ecryptfs
  * @mode: The mode of the new file
- * @nd: nameidata of ecryptfs' parent's dentry & vfsmount
  *
  * Creates the underlying file and the eCryptfs inode which will link to
  * it. It will also update the eCryptfs directory inode to mimic the
@@ -384,7 +383,6 @@ static int ecryptfs_lookup_interpose(struct dentry *dentry,
  * ecryptfs_lookup
  * @ecryptfs_dir_inode: The eCryptfs directory inode
  * @ecryptfs_dentry: The eCryptfs dentry that we are looking up
- * @ecryptfs_nd: nameidata; may be NULL
  *
  * Find a file on disk. If the file does not exist, then we'll add it to the
  * dentry cache and continue on to read it from the disk.
@@ -675,12 +673,12 @@ out:
 	return rc ? ERR_PTR(rc) : buf;
 }
 
-static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ecryptfs_follow_link(struct dentry *dentry, int flags)
 {
 	size_t len;
 	char *buf;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	buf = ecryptfs_readlink_lower(dentry, &len);
@@ -690,7 +688,7 @@ static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 				ecryptfs_dentry_to_lower(dentry)->d_inode);
 	buf[len] = '\0';
 out:
-	nd_set_link(nd, buf);
+	nd_set_link(buf);
 	return NULL;
 }
 
diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
index 832e2624b80b..5565f457358c 100644
--- a/fs/exofs/symlink.c
+++ b/fs/exofs/symlink.c
@@ -35,11 +35,11 @@
 
 #include "exofs.h"
 
-static void *exofs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *exofs_follow_link(struct dentry *dentry, int flags)
 {
 	struct exofs_i_info *oi = exofs_i(dentry->d_inode);
 
-	nd_set_link(nd, (char *)oi->i_data);
+	nd_set_link((char *)oi->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 565cf817bbf1..dbad23054842 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -21,10 +21,10 @@
 #include "xattr.h"
 #include <linux/namei.h>
 
-static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext2_follow_link(struct dentry *dentry, int flags)
 {
 	struct ext2_inode_info *ei = EXT2_I(dentry->d_inode);
-	nd_set_link(nd, (char *)ei->i_data);
+	nd_set_link((char *)ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
index 6b01c3eab1f3..28bee0541bc1 100644
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -21,10 +21,10 @@
 #include "ext3.h"
 #include "xattr.h"
 
-static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void * ext3_follow_link(struct dentry *dentry, int flags)
 {
 	struct ext3_inode_info *ei = EXT3_I(dentry->d_inode);
-	nd_set_link(nd, (char*)ei->i_data);
+	nd_set_link((char*)ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index ff3711932018..eb987cc01c85 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,10 +23,10 @@
 #include "ext4.h"
 #include "xattr.h"
 
-static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext4_follow_link(struct dentry *dentry, int flags)
 {
 	struct ext4_inode_info *ei = EXT4_I(dentry->d_inode);
-	nd_set_link(nd, (char *) ei->i_data);
+	nd_set_link((char *) ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index c36aeaf92e41..ea20270f46f8 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -39,7 +39,7 @@
 #include "vxfs_inode.h"
 
 
-static void *	vxfs_immed_follow_link(struct dentry *, struct nameidata *);
+static void *	vxfs_immed_follow_link(struct dentry *, int);
 
 static int	vxfs_immed_readpage(struct file *, struct page *);
 
@@ -64,7 +64,7 @@ const struct address_space_operations vxfs_immed_aops = {
 /**
  * vxfs_immed_follow_link - follow immed symlink
  * @dp:		dentry for the link
- * @np:		pathname lookup data for the current path walk
+ * @flags:	lookup flags for the current path walk
  *
  * Description:
  *   vxfs_immed_follow_link restarts the pathname lookup with
@@ -74,10 +74,10 @@ const struct address_space_operations vxfs_immed_aops = {
  *   Zero on success, else a negative error code.
  */
 static void *
-vxfs_immed_follow_link(struct dentry *dp, struct nameidata *np)
+vxfs_immed_follow_link(struct dentry *dp, int flags)
 {
 	struct vxfs_inode_info		*vip = VXFS_INO(dp->d_inode);
-	nd_set_link(np, vip->vii_immed.vi_immed);
+	nd_set_link(vip->vii_immed.vi_immed);
 	return NULL;
 }
 
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 15d326ec5943..9a4ca5dc62f1 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1400,17 +1400,17 @@ static void free_link(char *link)
 		free_page((unsigned long) link);
 }
 
-static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *fuse_follow_link(struct dentry *dentry, int flags)
 {
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
-	nd_set_link(nd, read_link(dentry));
+	nd_set_link(read_link(dentry));
 	return NULL;
 }
 
-static void fuse_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void fuse_put_link(struct dentry *dentry, void *c)
 {
-	free_link(nd_get_link(nd));
+	free_link(nd_get_link());
 }
 
 static int fuse_dir_open(struct inode *inode, struct file *file)
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 21086c7870f1..f0691c863956 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1541,14 +1541,14 @@ out:
 /**
  * gfs2_follow_link - Follow a symbolic link
  * @dentry: The dentry of the link
- * @nd: Data that we pass to vfs_follow_link()
+ * @flags: Lookup flags
  *
  * This can handle symlinks of any size.
  *
  * Returns: 0 on success or error code
  */
 
-static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *gfs2_follow_link(struct dentry *dentry, int flags)
 {
 	struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
 	struct gfs2_holder i_gh;
@@ -1557,13 +1557,13 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	char *buf;
 	int error;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &i_gh);
 	error = gfs2_glock_nq(&i_gh);
 	if (error) {
 		gfs2_holder_uninit(&i_gh);
-		nd_set_link(nd, ERR_PTR(error));
+		nd_set_link(ERR_PTR(error));
 		return NULL;
 	}
 
@@ -1588,7 +1588,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	brelse(dibh);
 out:
 	gfs2_glock_dq_uninit(&i_gh);
-	nd_set_link(nd, buf);
+	nd_set_link(buf);
 	return NULL;
 }
 
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 374d04909538..da224778b1be 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -882,11 +882,11 @@ static const struct inode_operations hostfs_dir_iops = {
 	.setattr	= hostfs_setattr,
 };
 
-static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *hostfs_follow_link(struct dentry *dentry, int flags)
 {
 	char *link;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	link = __getname();
@@ -907,13 +907,13 @@ static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		link = ERR_PTR(-ENOMEM);
 	}
 
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
-static void hostfs_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void hostfs_put_link(struct dentry *dentry, void *cookie)
 {
-	char *s = nd_get_link(nd);
+	char *s = nd_get_link();
 	if (!IS_ERR(s))
 		__putname(s);
 }
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index 043ac9d77262..37d9a777f8e0 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,20 +642,19 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
 						    buflen);
 }
 
-static void *hppfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *hppfs_follow_link(struct dentry *dentry, int flags)
 {
 	struct dentry *proc_dentry = HPPFS_I(dentry->d_inode)->proc_dentry;
 
-	return proc_dentry->d_inode->i_op->follow_link(proc_dentry, nd);
+	return proc_dentry->d_inode->i_op->follow_link(proc_dentry, flags);
 }
 
-static void hppfs_put_link(struct dentry *dentry, struct nameidata *nd,
-			   void *cookie)
+static void hppfs_put_link(struct dentry *dentry, void *cookie)
 {
 	struct dentry *proc_dentry = HPPFS_I(dentry->d_inode)->proc_dentry;
 
 	if (proc_dentry->d_inode->i_op->put_link)
-		proc_dentry->d_inode->i_op->put_link(proc_dentry, nd, cookie);
+		proc_dentry->d_inode->i_op->put_link(proc_dentry, cookie);
 }
 
 static const struct inode_operations hppfs_dir_iops = {
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index c7c77b0dfccd..c2bebe5c7c42 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -16,7 +16,7 @@
 #include <linux/namei.h>
 #include "nodelist.h"
 
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd);
+static void *jffs2_follow_link(struct dentry *dentry, int flags);
 
 const struct inode_operations jffs2_symlink_inode_operations =
 {
@@ -29,7 +29,7 @@ const struct inode_operations jffs2_symlink_inode_operations =
 	.removexattr =	jffs2_removexattr
 };
 
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *jffs2_follow_link(struct dentry *dentry, int flags)
 {
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(dentry->d_inode);
 	char *p = (char *)f->target;
@@ -54,7 +54,7 @@ static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	jffs2_dbg(1, "%s(): target path is '%s'\n",
 		  __func__, (char *)f->target);
 
-	nd_set_link(nd, p);
+	nd_set_link(p);
 
 	/*
 	 * We will unlock the f->sem mutex but VFS will use the f->target string. This is safe
diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
index 205b946d8e0d..1cfae27aa6a8 100644
--- a/fs/jfs/symlink.c
+++ b/fs/jfs/symlink.c
@@ -22,10 +22,10 @@
 #include "jfs_inode.h"
 #include "jfs_xattr.h"
 
-static void *jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *jfs_follow_link(struct dentry *dentry, int flags)
 {
 	char *s = JFS_IP(dentry->d_inode)->i_inline;
-	nd_set_link(nd, s);
+	nd_set_link(s);
 	return NULL;
 }
 
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 8e5421f386c0..88694e0df282 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,12 +112,12 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
 	return error;
 }
 
-static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *kernfs_iop_follow_link(struct dentry *dentry, int flags)
 {
 	int error = -ENOMEM;
 	unsigned long page;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	page = get_zeroed_page(GFP_KERNEL);
@@ -126,14 +126,13 @@ static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
 		if (error < 0)
 			free_page((unsigned long)page);
 	}
-	nd_set_link(nd, error ? ERR_PTR(error) : (char *)page);
+	nd_set_link(error ? ERR_PTR(error) : (char *)page);
 	return NULL;
 }
 
-static void kernfs_iop_put_link(struct dentry *dentry, struct nameidata *nd,
-				void *cookie)
+static void kernfs_iop_put_link(struct dentry *dentry, void *cookie)
 {
-	char *page = nd_get_link(nd);
+	char *page = nd_get_link();
 	if (!IS_ERR(page))
 		free_page((unsigned long)page);
 }
diff --git a/fs/libfs.c b/fs/libfs.c
index 0ab65122ee45..f437c8489998 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1024,10 +1024,9 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 }
 EXPORT_SYMBOL(noop_fsync);
 
-void kfree_put_link(struct dentry *dentry, struct nameidata *nd,
-				void *cookie)
+void kfree_put_link(struct dentry *dentry, void *cookie)
 {
-	char *s = nd_get_link(nd);
+	char *s = nd_get_link();
 	if (!IS_ERR(s))
 		kfree(s);
 }
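
The put_link helpers in this patch call nd_get_link() with no argument. That works because the accessors in fs/namei.c now reach the walk state through current->nameidata. Below is a minimal userspace sketch of that pattern, under stated assumptions: the structures are cut down to the fields used here, `current` is modelled as a single global rather than a per-task pointer, set_nameidata() is guessed from the set_nameidata(saved) call in generic_readlink(), and example_walk() is a hypothetical driver.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NESTED_LINKS 8

/* Simplified stand-ins; the real structures have many more fields. */
struct nameidata {
	unsigned depth;
	char *saved_names[MAX_NESTED_LINKS + 1];
};
struct task_struct { struct nameidata *nameidata; };

/* Assumption: one global "task" instead of the kernel's per-task current. */
struct task_struct current_task;
#define current (&current_task)

void nd_set_link(char *path)
{
	struct nameidata *nd = current->nameidata;

	nd->saved_names[nd->depth] = path;
}

char *nd_get_link(void)
{
	struct nameidata *nd = current->nameidata;

	return nd->saved_names[nd->depth];
}

/* Assumed semantics: install nd as the active walk state, return the old one. */
struct nameidata *set_nameidata(struct nameidata *nd)
{
	struct nameidata *old = current->nameidata;

	current->nameidata = nd;
	return old;
}

/* Hypothetical driver: store a link as follow_link would, free it as put_link would. */
int example_walk(const char *target)
{
	struct nameidata nd = { 0 };
	struct nameidata *saved = set_nameidata(&nd);
	char *copy = malloc(strlen(target) + 1);
	int ok;

	if (!copy) {
		set_nameidata(saved);
		return -1;
	}
	strcpy(copy, target);
	nd_set_link(copy);			/* ->follow_link side */
	ok = strcmp(nd_get_link(), target) == 0;
	free(nd_get_link());			/* ->put_link side, as in kfree_put_link */
	set_nameidata(saved);			/* restore the outer walk state */
	return ok ? 0 : 1;
}
```

The point of the sketch is that ->follow_link and ->put_link implementations never need a nameidata pointer of their own, provided the caller installs the walk state before invoking them and restores the previous one afterwards.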
diff --git a/fs/namei.c b/fs/namei.c
index eefa4a00501a..9a5d429f2a8a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -722,8 +722,10 @@ static inline void path_to_nameidata(const struct path *path,
  * Helper to directly jump to a known parsed path from ->follow_link,
  * caller must have taken a reference to path beforehand.
  */
-void nd_jump_link(struct nameidata *nd, struct path *path)
+void nd_jump_link(struct path *path)
 {
+	struct nameidata *nd = current->nameidata;
+
 	path_put(&nd->path);
 
 	nd->path = *path;
@@ -731,14 +733,18 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
 	nd->flags |= LOOKUP_JUMPED;
 }
 
-void nd_set_link(struct nameidata *nd, char *path)
+void nd_set_link(char *path)
 {
+	struct nameidata *nd = current->nameidata;
+
 	nd->saved_names[nd->depth] = path;
 }
 EXPORT_SYMBOL(nd_set_link);
 
-char *nd_get_link(struct nameidata *nd)
+char *nd_get_link(void)
 {
+	struct nameidata *nd = current->nameidata;
+
 	return nd->saved_names[nd->depth];
 }
 EXPORT_SYMBOL(nd_get_link);
@@ -747,7 +753,7 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 {
 	struct inode *inode = link->dentry->d_inode;
 	if (inode->i_op->put_link)
-		inode->i_op->put_link(link->dentry, nd, cookie);
+		inode->i_op->put_link(link->dentry, cookie);
 	path_put(link);
 }
 
@@ -887,20 +893,20 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	current->nameidata->total_link_count++;
 
 	touch_atime(link);
-	nd_set_link(nd, NULL);
+	nd_set_link(NULL);
 
-	error = security_inode_follow_link(link->dentry, nd);
+	error = security_inode_follow_link(link->dentry, nd->flags);
 	if (error)
 		goto out_put_nd_path;
 
 	nd->last_type = LAST_BIND;
-	*p = dentry->d_inode->i_op->follow_link(dentry, nd);
+	*p = dentry->d_inode->i_op->follow_link(dentry, nd->flags);
 	error = PTR_ERR(*p);
 	if (IS_ERR(*p))
 		goto out_put_nd_path;
 
 	error = 0;
-	s = nd_get_link(nd);
+	s = nd_get_link();
 	if (s) {
 		if (unlikely(IS_ERR(s))) {
 			path_put(&nd->path);
@@ -4458,13 +4464,13 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	int res;
 
 	nd.depth = 0;
-	cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
+	cookie = dentry->d_inode->i_op->follow_link(dentry, nd.flags);
 	if (IS_ERR(cookie))
 		return PTR_ERR(cookie);
 
-	res = readlink_copy(buffer, buflen, nd_get_link(&nd));
+	res = readlink_copy(buffer, buflen, nd_get_link());
 	if (dentry->d_inode->i_op->put_link)
-		dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+		dentry->d_inode->i_op->put_link(dentry, cookie);
 	set_nameidata(saved);
 	return res;
 }
@@ -4497,17 +4503,17 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 }
 EXPORT_SYMBOL(page_readlink);
 
-void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
+void *page_follow_link_light(struct dentry *dentry, int flags)
 {
 	struct page *page = NULL;
-	if (nd->flags & LOOKUP_RCU)
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
-	nd_set_link(nd, page_getlink(dentry, &page));
+	nd_set_link(page_getlink(dentry, &page));
 	return page;
 }
 EXPORT_SYMBOL(page_follow_link_light);
 
-void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+void page_put_link(struct dentry *dentry, void *cookie)
 {
 	struct page *page = cookie;
 
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index c9a2d3cc4619..43e43d2c8c5b 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -43,13 +43,13 @@ error:
 	return -EIO;
 }
 
-static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *nfs_follow_link(struct dentry *dentry, int flags)
 {
 	struct inode *inode = dentry->d_inode;
 	struct page *page;
 	void *err;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
@@ -60,11 +60,11 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		err = page;
 		goto read_failed;
 	}
-	nd_set_link(nd, kmap(page));
+	nd_set_link(kmap(page));
 	return page;
 
 read_failed:
-	nd_set_link(nd, err);
+	nd_set_link(err);
 	return NULL;
 }
 
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index b3973c2fd190..a6a240ecf878 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -35,7 +35,6 @@
  * ntfs_lookup - find the inode represented by a dentry in a directory inode
  * @dir_ino:	directory inode in which to look for the inode
  * @dent:	dentry representing the inode to look for
- * @nd:		lookup nameidata
  *
  * In short, ntfs_lookup() looks for the inode represented by the dentry @dent
  * in the directory inode @dir_ino and if found attaches the inode to the
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index db370d5d84c4..e8ef9f44af12 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -141,13 +141,13 @@ struct ovl_link_data {
 	void *cookie;
 };
 
-static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ovl_follow_link(struct dentry *dentry, int flags)
 {
 	void *ret;
 	struct dentry *realdentry;
 	struct inode *realinode;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	realdentry = ovl_dentry_real(dentry);
 	realinode = realdentry->d_inode;
@@ -155,7 +155,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (WARN_ON(!realinode->i_op->follow_link))
 		return ERR_PTR(-EPERM);
 
-	ret = realinode->i_op->follow_link(realdentry, nd);
+	ret = realinode->i_op->follow_link(realdentry, flags);
 	if (IS_ERR(ret))
 		return ret;
 
@@ -164,7 +164,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 
 		data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
 		if (!data) {
-			realinode->i_op->put_link(realdentry, nd, ret);
+			realinode->i_op->put_link(realdentry, ret);
 			return ERR_PTR(-ENOMEM);
 		}
 		data->realdentry = realdentry;
@@ -176,7 +176,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	}
 }
 
-static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void ovl_put_link(struct dentry *dentry, void *c)
 {
 	struct inode *realinode;
 	struct ovl_link_data *data = c;
@@ -185,7 +185,7 @@ static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
 		return;
 
 	realinode = data->realdentry->d_inode;
-	realinode->i_op->put_link(data->realdentry, nd, data->cookie);
+	realinode->i_op->put_link(data->realdentry, data->cookie);
 	kfree(data);
 }
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6f5dbfe68516..7e6f95c0d58d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1371,13 +1371,13 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 		return -ENOENT;
 }
 
-static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_pid_follow_link(struct dentry *dentry, int flags)
 {
 	struct inode *inode = dentry->d_inode;
 	struct path path;
 	int error = -EACCES;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	/* Are we allowed to snoop on the tasks file descriptors? */
 	if (!proc_fd_access_allowed(inode))
@@ -1387,7 +1387,7 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (error)
 		goto out;
 
-	nd_jump_link(nd, &path);
+	nd_jump_link(&path);
 	return NULL;
 out:
 	return ERR_PTR(error);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 7697b6621cfd..f9980443427c 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -394,16 +394,16 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
 };
 #endif
 
-static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_follow_link(struct dentry *dentry, int flags)
 {
 	struct proc_dir_entry *pde = PDE(dentry->d_inode);
 	if (unlikely(!use_pde(pde)))
 		return ERR_PTR(-EINVAL);
-	nd_set_link(nd, pde->data);
+	nd_set_link(pde->data);
 	return pde;
 }
 
-static void proc_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+static void proc_put_link(struct dentry *dentry, void *p)
 {
 	unuse_pde(p);
 }
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index c89a51401bb5..46c7ab225e17 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
 	&mntns_operations,
 };
 
-static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_ns_follow_link(struct dentry *dentry, int flags)
 {
 	struct inode *inode = dentry->d_inode;
 	const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
@@ -38,7 +38,7 @@ static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct path ns_path;
 	void *error = ERR_PTR(-EACCES);
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	task = get_proc_task(inode);
@@ -48,7 +48,7 @@ static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (ptrace_may_access(task, PTRACE_MODE_READ)) {
 		error = ns_get_path(&ns_path, task, ns_ops);
 		if (!error)
-			nd_jump_link(nd, &ns_path);
+			nd_jump_link(&ns_path);
 	}
 	put_task_struct(task);
 	return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index c094ea04e1bb..c56e282b84b8 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -19,13 +19,13 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_self_follow_link(struct dentry *dentry, int flags)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid;
 	char *name = ERR_PTR(-ENOENT);
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	tgid = task_tgid_nr_ns(current, ns);
@@ -37,7 +37,7 @@ static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
 		else
 			sprintf(name, "%d", tgid);
 	}
-	nd_set_link(nd, name);
+	nd_set_link(name);
 	return NULL;
 }
 
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 5d3144d51018..78b35e18a042 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -20,14 +20,14 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_thread_self_follow_link(struct dentry *dentry, int flags)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid;
 	pid_t pid;
 	char *name = ERR_PTR(-ENOENT);
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 
 	tgid = task_tgid_nr_ns(current, ns);
@@ -39,7 +39,7 @@ static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidat
 		else
 			sprintf(name, "%d/task/%d", tgid, pid);
 	}
-	nd_set_link(nd, name);
+	nd_set_link(name);
 	return NULL;
 }
 
diff --git a/fs/sysv/symlink.c b/fs/sysv/symlink.c
index 00d2f8a43e4e..ad285577a928 100644
--- a/fs/sysv/symlink.c
+++ b/fs/sysv/symlink.c
@@ -8,9 +8,9 @@
 #include "sysv.h"
 #include <linux/namei.h>
 
-static void *sysv_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *sysv_follow_link(struct dentry *dentry, int flags)
 {
-	nd_set_link(nd, (char *)SYSV_I(dentry->d_inode)->i_data);
+	nd_set_link((char *)SYSV_I(dentry->d_inode)->i_data);
 	return NULL;
 }
 
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index e627c0acf626..ccc83837f078 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1300,11 +1300,11 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
 	ClearPageChecked(page);
 }
 
-static void *ubifs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ubifs_follow_link(struct dentry *dentry, int flags)
 {
 	struct ubifs_inode *ui = ubifs_inode(dentry->d_inode);
 
-	nd_set_link(nd, ui->data);
+	nd_set_link(ui->data);
 	return NULL;
 }
 
diff --git a/fs/ufs/symlink.c b/fs/ufs/symlink.c
index d283628b4778..29622d6beaa4 100644
--- a/fs/ufs/symlink.c
+++ b/fs/ufs/symlink.c
@@ -32,10 +32,10 @@
 #include "ufs.h"
 
 
-static void *ufs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ufs_follow_link(struct dentry *dentry, int flags)
 {
 	struct ufs_inode_info *p = UFS_I(dentry->d_inode);
-	nd_set_link(nd, (char*)p->i_u1.i_symlink);
+	nd_set_link((char*)p->i_u1.i_symlink);
 	return NULL;
 }
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 23cea798b777..8fd416ae935a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -412,12 +412,12 @@ xfs_vn_rename(
 STATIC void *
 xfs_vn_follow_link(
 	struct dentry		*dentry,
-	struct nameidata	*nd)
+	int			flags)
 {
 	char			*link;
 	int			error = -ENOMEM;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
 	if (!link)
@@ -427,13 +427,13 @@ xfs_vn_follow_link(
 	if (unlikely(error))
 		goto out_kfree;
 
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 
  out_kfree:
 	kfree(link);
  out_err:
-	nd_set_link(nd, ERR_PTR(error));
+	nd_set_link(ERR_PTR(error));
 	return NULL;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eaef987ae3cf..b7d578d552bf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -38,7 +38,6 @@ struct backing_dev_info;
 struct export_operations;
 struct hd_geometry;
 struct iovec;
-struct nameidata;
 struct kiocb;
 struct kobject;
 struct pipe_inode_info;
@@ -1574,12 +1573,12 @@ struct file_operations {
 
 struct inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
+	void * (*follow_link) (struct dentry *, int);
 	int (*permission) (struct inode *, int);
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
 	int (*readlink) (struct dentry *, char __user *,int);
-	void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void (*put_link) (struct dentry *, void *);
 
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
 	int (*link) (struct dentry *,struct inode *,struct dentry *);
@@ -2167,7 +2166,6 @@ extern struct filename *getname_flags(const char __user *, int, int *);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);
 extern void putname(struct filename *name);
-extern int nd_is_rcu(struct nameidata *nd);
 
 enum {
 	FILE_CREATED = 1,
@@ -2650,13 +2648,13 @@ extern const struct file_operations generic_ro_fops;
 
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
-extern void *page_follow_link_light(struct dentry *, struct nameidata *);
-extern void page_put_link(struct dentry *, struct nameidata *, void *);
+extern void *page_follow_link_light(struct dentry *, int);
+extern void page_put_link(struct dentry *, void *);
 extern int __page_symlink(struct inode *inode, const char *symname, int len,
 		int nofs);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
-extern void kfree_put_link(struct dentry *, struct nameidata *, void *);
+extern void kfree_put_link(struct dentry *, void *);
 extern int generic_readlink(struct dentry *, char __user *, int);
 extern void generic_fillattr(struct inode *, struct kstat *);
 int vfs_getattr_nosec(struct path *path, struct kstat *stat);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index c8990779f0c3..368eb3d721b8 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -7,7 +7,6 @@
 #include <linux/path.h>
 
 struct vfsmount;
-struct nameidata;
 
 enum { MAX_NESTED_LINKS = 8 };
 
@@ -70,9 +69,9 @@ extern int follow_up(struct path *);
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
 
-extern void nd_jump_link(struct nameidata *nd, struct path *path);
-extern void nd_set_link(struct nameidata *nd, char *path);
-extern char *nd_get_link(struct nameidata *nd);
+extern void nd_jump_link(struct path *path);
+extern void nd_set_link(char *path);
+extern char *nd_get_link(void);
 
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b88b9eea169a..5d85ef2b64c3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1267,6 +1267,7 @@ union rcu_special {
 	short s;
 };
 struct rcu_node;
+struct nameidata;
 
 enum perf_event_task_context {
 	perf_invalid_context = -1,
diff --git a/include/linux/security.h b/include/linux/security.h
index a1b7dbd127ff..587f7b0849b6 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -43,7 +43,6 @@ struct file;
 struct vfsmount;
 struct path;
 struct qstr;
-struct nameidata;
 struct iattr;
 struct fown_struct;
 struct file_operations;
@@ -477,7 +476,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  * @inode_follow_link:
  *	Check permission to follow a symbolic link when looking up a pathname.
  *	@dentry contains the dentry structure for the link.
- *	@nd contains the nameidata structure for the parent directory.
+ *	@flags contains lookup flags
  *	Return 0 if permission is granted.
  * @inode_permission:
  *	Check permission before accessing an inode.  This hook is called by the
@@ -1553,7 +1552,7 @@ struct security_operations {
 	int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
 			     struct inode *new_dir, struct dentry *new_dentry);
 	int (*inode_readlink) (struct dentry *dentry);
-	int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
+	int (*inode_follow_link) (struct dentry *dentry, int flags);
 	int (*inode_permission) (struct inode *inode, int mask);
 	int (*inode_setattr)	(struct dentry *dentry, struct iattr *attr);
 	int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry);
@@ -1840,7 +1839,7 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
 			  struct inode *new_dir, struct dentry *new_dentry,
 			  unsigned int flags);
 int security_inode_readlink(struct dentry *dentry);
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
+int security_inode_follow_link(struct dentry *dentry, int flags);
 int security_inode_permission(struct inode *inode, int mask);
 int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
 int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry);
@@ -2243,7 +2242,7 @@ static inline int security_inode_readlink(struct dentry *dentry)
 }
 
 static inline int security_inode_follow_link(struct dentry *dentry,
-					      struct nameidata *nd)
+					      int flags)
 {
 	return 0;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index fdf6ba18fce3..8f23e0f5e050 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2474,29 +2474,29 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 	return 0;
 }
 
-static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata *nd)
+static void *shmem_follow_short_symlink(struct dentry *dentry, int flags)
 {
-	nd_set_link(nd, SHMEM_I(dentry->d_inode)->symlink);
+	nd_set_link(SHMEM_I(dentry->d_inode)->symlink);
 	return NULL;
 }
 
-static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *shmem_follow_link(struct dentry *dentry, int flags)
 {
 	struct page *page = NULL;
 	int error;
 
-	if (nd_is_rcu(nd))
+	if (flags & LOOKUP_RCU)
 		return ERR_PTR(-ECHILD);
 	error = shmem_getpage(dentry->d_inode, 0, &page, SGP_READ, NULL);
-	nd_set_link(nd, error ? ERR_PTR(error) : kmap(page));
+	nd_set_link(error ? ERR_PTR(error) : kmap(page));
 	if (page)
 		unlock_page(page);
 	return page;
 }
 
-static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void shmem_put_link(struct dentry *dentry, void *cookie)
 {
-	if (!IS_ERR(nd_get_link(nd))) {
+	if (!IS_ERR(nd_get_link())) {
 		struct page *page = cookie;
 		kunmap(page);
 		mark_page_accessed(page);
diff --git a/security/capability.c b/security/capability.c
index 070dd46f62f4..569e4253343c 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -210,7 +210,7 @@ static int cap_inode_readlink(struct dentry *dentry)
 }
 
 static int cap_inode_follow_link(struct dentry *dentry,
-				 struct nameidata *nameidata)
+				 int flags)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index e81d5bbe7363..5798987b2a18 100644
--- a/security/security.c
+++ b/security/security.c
@@ -581,11 +581,11 @@ int security_inode_readlink(struct dentry *dentry)
 	return security_ops->inode_readlink(dentry);
 }
 
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd)
+int security_inode_follow_link(struct dentry *dentry, int flags)
 {
 	if (unlikely(IS_PRIVATE(dentry->d_inode)))
 		return 0;
-	return security_ops->inode_follow_link(dentry, nd);
+	return security_ops->inode_follow_link(dentry, flags);
 }
 
 int security_inode_permission(struct inode *inode, int mask)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4d1a54190388..e3074e01f058 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2862,7 +2862,7 @@ static int selinux_inode_readlink(struct dentry *dentry)
 	return dentry_has_perm(cred, dentry, FILE__READ);
 }
 
-static int selinux_inode_follow_link(struct dentry *dentry, struct nameidata *nameidata)
+static int selinux_inode_follow_link(struct dentry *dentry, int flags)
 {
 	const struct cred *cred = current_cred();
 




* [PATCH 04/13] security/selinux: check for LOOKUP_RCU in _follow_link.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
  2015-03-16  4:43 ` [PATCH 01/13] VFS: replace {, total_}link_count in task_struct with pointer to nameidata NeilBrown
  2015-03-16  4:43 ` [PATCH 02/13] VFS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 21:00   ` Al Viro
  2015-03-16  4:43 ` [PATCH 03/13] VFS: remove nameidata args from ->follow_link and ->put_link NeilBrown
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Some of dentry_has_perm() is not rcu-safe, so if LOOKUP_RCU
is set in selinux_inode_follow_link(), give up with
-ECHILD.

It is possible that dentry_has_perm() could sometimes complete
in RCU mode, in which case the flag could be propagated further
down the stack...
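
As a rough user-space illustration of the bail-out described above (not
the kernel code itself), the sketch below models a hook that refuses to run
under RCU-walk and a caller that retries without the flag.  All demo_*
names are hypothetical; DEMO_LOOKUP_RCU only borrows the 0x0040 value that
include/linux/namei.h gives LOOKUP_RCU.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for LOOKUP_RCU (0x0040 in include/linux/namei.h). */
#define DEMO_LOOKUP_RCU 0x0040

/* Hypothetical permission check that may sleep, so it is not RCU-safe. */
static int demo_slow_perm_check(void)
{
	return 0;	/* permission granted */
}

/* Same shape as the patched hook: give up immediately under RCU-walk. */
static int demo_inode_follow_link(int flags)
{
	if (flags & DEMO_LOOKUP_RCU)
		return -ECHILD;
	return demo_slow_perm_check();
}

/* Caller-side model: try RCU-walk first, retry in ref-walk on -ECHILD. */
static int demo_lookup(void)
{
	int err = demo_inode_follow_link(DEMO_LOOKUP_RCU);

	if (err == -ECHILD)
		err = demo_inode_follow_link(0);
	return err;
}
```

Here -ECHILD is not a permission failure; it only asks the caller to repeat
the lookup in ref-walk, where sleeping is allowed.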

Signed-off-by: NeilBrown <neilb@suse.de>
---
 security/selinux/hooks.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e3074e01f058..5d4de8cbfaa6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2866,6 +2866,8 @@ static int selinux_inode_follow_link(struct dentry *dentry, int flags)
 {
 	const struct cred *cred = current_cred();
 
+	if (flags & LOOKUP_RCU)
+		return -ECHILD;
 	return dentry_has_perm(cred, dentry, FILE__READ);
 }
 




* [PATCH 05/13] VFS/namei: use terminate_walk when symlink lookup fails.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (3 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 03/13] VFS: remove nameidata args from ->follow_link and ->put_link NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16  4:43 ` [PATCH 12/13] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Currently following a symlink never uses rcu-walk, so
terminate_walk isn't needed.
That will change in a future patch.  In preparation, change
some
  path_put_conditional()
  path_put()
sequences to
  path_to_nameidata()
  terminate_walk()

These sequences are identical when in ref-walk, and correct when in
rcu-walk.

Also change two path_put() calls to equivalent terminate_walk().
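
For readers who want to see why terminate_walk() is safe in both modes, here
is a small user-space model of its two branches.  It is only a sketch:
struct demo_nd, its counters and the flag values are hypothetical stand-ins
for the kernel's nameidata, path references and rcu_read_lock() nesting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_LOOKUP_RCU  0x0040
#define DEMO_LOOKUP_ROOT 0x2000

struct demo_nd {
	unsigned int flags;
	int path_refs;		/* references held on nd->path */
	int rcu_depth;		/* models rcu_read_lock() nesting */
	const void *root_mnt;	/* models nd->root.mnt */
};

/* Mirrors the two branches of terminate_walk() in the diff below. */
static void demo_terminate_walk(struct demo_nd *nd)
{
	if (!(nd->flags & DEMO_LOOKUP_RCU)) {
		nd->path_refs--;		/* path_put(&nd->path) */
	} else {
		nd->flags &= ~DEMO_LOOKUP_RCU;
		if (!(nd->flags & DEMO_LOOKUP_ROOT))
			nd->root_mnt = NULL;
		nd->rcu_depth--;		/* rcu_read_unlock() */
	}
}
```

In ref-walk exactly one path reference is dropped, which is what the old
path_put() did; in rcu-walk no reference counts are touched at all.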

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9a5d429f2a8a..8cb89a0d30ba 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -749,6 +749,18 @@ char *nd_get_link(void)
 }
 EXPORT_SYMBOL(nd_get_link);
 
+static void terminate_walk(struct nameidata *nd)
+{
+	if (!(nd->flags & LOOKUP_RCU)) {
+		path_put(&nd->path);
+	} else {
+		nd->flags &= ~LOOKUP_RCU;
+		if (!(nd->flags & LOOKUP_ROOT))
+			nd->root.mnt = NULL;
+		rcu_read_unlock();
+	}
+}
+
 static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
 {
 	struct inode *inode = link->dentry->d_inode;
@@ -799,8 +811,8 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
 		return 0;
 
 	audit_log_link_denied("follow_link", link);
-	path_put_conditional(link, nd);
-	path_put(&nd->path);
+	path_to_nameidata(link, nd);
+	terminate_walk(nd);
 	return -EACCES;
 }
 
@@ -909,7 +921,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	s = nd_get_link();
 	if (s) {
 		if (unlikely(IS_ERR(s))) {
-			path_put(&nd->path);
+			terminate_walk(nd);
 			put_link(nd, link, *p);
 			return PTR_ERR(s);
 		}
@@ -931,7 +943,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 
 out_put_nd_path:
 	*p = NULL;
-	path_put(&nd->path);
+	terminate_walk(nd);
 	path_put(link);
 	return error;
 }
@@ -1562,18 +1574,6 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
-static void terminate_walk(struct nameidata *nd)
-{
-	if (!(nd->flags & LOOKUP_RCU)) {
-		path_put(&nd->path);
-	} else {
-		nd->flags &= ~LOOKUP_RCU;
-		if (!(nd->flags & LOOKUP_ROOT))
-			nd->root.mnt = NULL;
-		rcu_read_unlock();
-	}
-}
-
 /*
  * Do we need to follow links? We _really_ want to be able
  * to do this check without having to look at inode->i_op,
@@ -1645,8 +1645,8 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 	int res;
 
 	if (unlikely(current->nameidata->link_count >= MAX_NESTED_LINKS)) {
-		path_put_conditional(path, nd);
-		path_put(&nd->path);
+		path_to_nameidata(path, nd);
+		terminate_walk(nd);
 		return -ELOOP;
 	}
 	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
@@ -3267,8 +3267,8 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 		struct path link = path;
 		void *cookie;
 		if (!(nd->flags & LOOKUP_FOLLOW)) {
-			path_put_conditional(&path, nd);
-			path_put(&nd->path);
+			path_to_nameidata(&path, nd);
+			terminate_walk(nd);
 			error = -ELOOP;
 			break;
 		}




* [PATCH 06/13] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (8 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 13/13] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 22:33   ` Al Viro
  2015-03-16  4:43 ` [PATCH 07/13] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

When we support ->follow_link in RCU-walk we will not want to
take a reference to the 'struct path *link' passed to follow_link,
and correspondingly will not want to drop that reference.

As link_path_walk will complete_walk() in the case of an error,
and as complete_walk() will clear LOOKUP_RCU, we cannot test
LOOKUP_RCU to determine if the path should be 'put'.

So introduce a new flag: LOOKUP_LINK_RCU.  This is set on
entry to follow_link() if appropriate and put_link() will
only call path_put() if it is clear.

Also, unlazy_walk() will fail if LOOKUP_LINK_RCU is set.
This is because there is no way for unlazy_walk to get references
on all the "struct path *link"s that are protected by that flag.
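
The following user-space sketch tries to show why a separate flag is needed.
It is a simplified model, not the kernel code: the demo_* names are
hypothetical, and taking a reference on the link in ref-walk is collapsed
into a single counter increment.

```c
#include <assert.h>

/* Hypothetical stand-ins; values follow include/linux/namei.h in this patch. */
#define DEMO_LOOKUP_RCU      0x0040
#define DEMO_LOOKUP_LINK_RCU 0x0080

struct demo_nd {
	unsigned int flags;
};

struct demo_link {
	int refs;	/* references held on the 'struct path *link' */
};

/* Entry to follow_link(): record whether the link was reached under RCU. */
static void demo_follow_link_enter(struct demo_nd *nd, struct demo_link *link)
{
	nd->flags &= ~DEMO_LOOKUP_LINK_RCU;
	if (nd->flags & DEMO_LOOKUP_RCU)
		nd->flags |= DEMO_LOOKUP_LINK_RCU;
	else
		link->refs++;	/* simplification: ref-walk holds a reference */
}

/* An error path leaves RCU-walk; only LOOKUP_RCU is cleared. */
static void demo_leave_rcu(struct demo_nd *nd)
{
	nd->flags &= ~DEMO_LOOKUP_RCU;
}

/* put_link(): drop the reference only if one was taken. */
static void demo_put_link(struct demo_nd *nd, struct demo_link *link)
{
	if (!(nd->flags & DEMO_LOOKUP_LINK_RCU))
		link->refs--;
}
```

If demo_put_link() tested DEMO_LOOKUP_RCU instead, a call made after
demo_leave_rcu() would drop a reference that was never taken.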

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c            |   18 +++++++++++++-----
 include/linux/namei.h |    1 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8cb89a0d30ba..e0f889192f59 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -550,6 +550,9 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
 	struct dentry *parent = nd->path.dentry;
 
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
+	if (nd->flags & LOOKUP_LINK_RCU)
+		/* Cannot unlazy in the middle of following a symlink */
+		return -ECHILD;
 
 	/*
 	 * After legitimizing the bastards, terminate_walk()
@@ -766,7 +769,8 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 	struct inode *inode = link->dentry->d_inode;
 	if (inode->i_op->put_link)
 		inode->i_op->put_link(link->dentry, cookie);
-	path_put(link);
+	if (!(nd->flags & LOOKUP_LINK_RCU))
+		path_put(link);
 }
 
 int sysctl_protected_symlinks __read_mostly = 0;
@@ -892,9 +896,10 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	int error;
 	char *s;
 
-	BUG_ON(nd->flags & LOOKUP_RCU);
-
-	if (link->mnt == nd->path.mnt)
+	nd->flags &= ~LOOKUP_LINK_RCU;
+	if (nd->flags & LOOKUP_RCU)
+		nd->flags |= LOOKUP_LINK_RCU;
+	else if (link->mnt == nd->path.mnt)
 		mntget(link->mnt);
 
 	error = -ELOOP;
@@ -944,7 +949,8 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 out_put_nd_path:
 	*p = NULL;
 	terminate_walk(nd);
-	path_put(link);
+	if (!(nd->flags & LOOKUP_LINK_RCU))
+		path_put(link);
 	return error;
 }
 
@@ -1667,6 +1673,8 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 
 	current->nameidata->link_count--;
 	nd->depth--;
+	if (!nd->depth)
+		nd->flags &= ~LOOKUP_LINK_RCU;
 	return res;
 }
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 368eb3d721b8..05b6b9c18801 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -31,6 +31,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_PARENT		0x0010
 #define LOOKUP_REVAL		0x0020
 #define LOOKUP_RCU		0x0040
+#define LOOKUP_LINK_RCU		0x0080
 
 /*
  * Intent data




* [PATCH 07/13] VFS/namei: abort RCU-walk on symlink if atime needs updating.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (9 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 06/13] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16  4:43 ` [PATCH 09/13] VFS/namei: enable RCU-walk when following symlinks NeilBrown
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

touch_atime is not RCU-safe, and so cannot be called on an
RCU walk.
However in situations where RCU-walk makes a difference,
the symlink is likely to be accessed much more often than
it is useful to update the atime.

So split out the test of "Does the atime actually need to be updated"
into atime_needs_update(), and only allow RCU-walk on a symlink if
no update is needed.
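
The split can be pictured with the user-space sketch below: a side-effect-free
check that is safe to call under RCU, and an updating function reserved for
ref-walk.  Everything here is a hypothetical model; in particular the
relatime and read-only handling of the real touch_atime() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_LOOKUP_RCU 0x0040	/* hypothetical stand-in for LOOKUP_RCU */

struct demo_inode {
	bool noatime;	/* models the various "noatime" markers */
	long atime;	/* last recorded access time, in seconds */
};

/* Pure check: no locks and no writes, so it may run under RCU-walk. */
static bool demo_atime_needs_update(const struct demo_inode *inode, long now)
{
	if (inode->noatime)
		return false;
	return inode->atime != now;
}

/* Side-effecting update: only used in ref-walk. */
static void demo_touch_atime(struct demo_inode *inode, long now)
{
	if (demo_atime_needs_update(inode, now))
		inode->atime = now;
}

/* Shape of the follow_link() change: leave RCU-walk if a write is due. */
static int demo_follow_link_atime(struct demo_inode *inode, int flags, long now)
{
	if (flags & DEMO_LOOKUP_RCU) {
		if (demo_atime_needs_update(inode, now))
			return -ECHILD;
	} else {
		demo_touch_atime(inode, now);
	}
	return 0;
}
```

Once a ref-walk pass has refreshed the atime, later RCU-walk passes over the
same symlink succeed until the next update becomes due.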

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/inode.c         |   26 +++++++++++++++++++-------
 fs/namei.c         |    7 ++++++-
 include/linux/fs.h |    1 +
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index f00b16f45507..a0da920e4650 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1584,30 +1584,41 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
  *	This function automatically handles read only file systems and media,
  *	as well as the "noatime" flag and inode specific "noatime" markers.
  */
-void touch_atime(const struct path *path)
+int atime_needs_update(const struct path *path)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct inode *inode = path->dentry->d_inode;
 	struct timespec now;
 
 	if (inode->i_flags & S_NOATIME)
-		return;
+		return 0;
 	if (IS_NOATIME(inode))
-		return;
+		return 0;
 	if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		return 0;
 
 	if (mnt->mnt_flags & MNT_NOATIME)
-		return;
+		return 0;
 	if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		return 0;
 
 	now = current_fs_time(inode->i_sb);
 
 	if (!relatime_need_update(mnt, inode, now))
-		return;
+		return 0;
 
 	if (timespec_equal(&inode->i_atime, &now))
+		return 0;
+	return 1;
+}
+
+void touch_atime(const struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+	struct inode *inode = path->dentry->d_inode;
+	struct timespec now;
+
+	if (!atime_needs_update(path))
 		return;
 
 	if (!sb_start_write_trylock(inode->i_sb))
@@ -1624,6 +1635,7 @@ void touch_atime(const struct path *path)
 	 * We may also fail on filesystems that have the ability to make parts
 	 * of the fs read only, e.g. subvolumes in Btrfs.
 	 */
+	now = current_fs_time(inode->i_sb);
 	update_time(inode, &now, S_ATIME);
 	__mnt_drop_write(mnt);
 skip_update:
diff --git a/fs/namei.c b/fs/namei.c
index e0f889192f59..1663d21a3eb4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -909,7 +909,12 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	cond_resched();
 	current->nameidata->total_link_count++;
 
-	touch_atime(link);
+	if (nd->flags & LOOKUP_RCU) {
+		error = -ECHILD;
+		if (atime_needs_update(link))
+			goto out_put_nd_path;
+	} else
+		touch_atime(link);
 	nd_set_link(NULL);
 
 	error = security_inode_follow_link(link->dentry, nd->flags);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b7d578d552bf..41e6d99031dd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1844,6 +1844,7 @@ enum file_time_flags {
 	S_VERSION = 8,
 };
 
+extern int atime_needs_update(const struct path *);
 extern void touch_atime(const struct path *);
 static inline void file_accessed(struct file *file)
 {




* [PATCH 08/13] VFS/namei: enhance follow_link to support RCU-walk.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (11 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 09/13] VFS/namei: enable RCU-walk when following symlinks NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 19:14 ` [PATCH 00/13] Support follow_link in RCU-walk. - V2 Al Viro
  13 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If LOOKUP_RCU is set, follow_link will not take/drop reference counts.

Replace cond_resched() with _cond_resched() as the latter
is a no-op if rcu_read_lock() is held while the former will
give a warning in that case.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1663d21a3eb4..536e0254f5f1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -906,7 +906,8 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	if (unlikely(current->nameidata->total_link_count >= 40))
 		goto out_put_nd_path;
 
-	cond_resched();
+	/* If rcu_read_locked(), this will not resched, and will not warn */
+	_cond_resched();
 	current->nameidata->total_link_count++;
 
 	if (nd->flags & LOOKUP_RCU) {
@@ -936,11 +937,17 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 			return PTR_ERR(s);
 		}
 		if (*s == '/') {
-			if (!nd->root.mnt)
-				set_root(nd);
-			path_put(&nd->path);
-			nd->path = nd->root;
-			path_get(&nd->root);
+			if (nd->flags & LOOKUP_RCU) {
+				if (!nd->root.mnt)
+					set_root_rcu(nd);
+				nd->path = nd->root;
+			} else {
+				if (!nd->root.mnt)
+					set_root(nd);
+				path_put(&nd->path);
+				nd->path = nd->root;
+				path_get(&nd->root);
+			}
 			nd->flags |= LOOKUP_JUMPED;
 		}
 		nd->inode = nd->path.dentry->d_inode;




* [PATCH 09/13] VFS/namei: enable RCU-walk when following symlinks.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (10 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 07/13] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 22:44   ` Al Viro
  2015-03-16  4:43 ` [PATCH 08/13] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
  2015-03-16 19:14 ` [PATCH 00/13] Support follow_link in RCU-walk. - V2 Al Viro
  13 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Now that follow_link handles LOOKUP_RCU, we do not need to
'unlazy_walk' when a symlink is found.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   12 ------------
 1 file changed, 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 536e0254f5f1..c9c58cd1af2a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1631,12 +1631,6 @@ static inline int walk_component(struct nameidata *nd, struct path *path,
 		goto out_path_put;
 
 	if (should_follow_link(path->dentry, follow)) {
-		if (nd->flags & LOOKUP_RCU) {
-			if (unlikely(unlazy_walk(nd, path->dentry))) {
-				err = -ECHILD;
-				goto out_err;
-			}
-		}
 		BUG_ON(inode != path->dentry->d_inode);
 		return 1;
 	}
@@ -3093,12 +3087,6 @@ finish_lookup:
 	}
 
 	if (should_follow_link(path->dentry, !symlink_ok)) {
-		if (nd->flags & LOOKUP_RCU) {
-			if (unlikely(unlazy_walk(nd, path->dentry))) {
-				error = -ECHILD;
-				goto out;
-			}
-		}
 		BUG_ON(inode != path->dentry->d_inode);
 		return 1;
 	}




* [PATCH 10/13] VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (5 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 12/13] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 22:50   ` Al Viro
  2015-03-16  4:43 ` [PATCH 11/13] xfs: use RCU to free 'struct xfs_mount' NeilBrown
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If the symlink has already been read in, then
page_follow_link_light can succeed in RCU-walk mode.
page_getlink_rcu() is added to support this.

With this, many filesystems can follow links in RCU-walk
mode when everything is cached.  This includes ext?fs and
others.

If the page is a HighMem page we do *not* try to kmap_atomic,
but simply give up - only page_address() is used.
This is because we need to be able to sleep while holding
the address of the page, particularly over calls to do_last()
which can be quite slow and in particular takes a mutex.

If this were a problem, then copying into a GFP_ATOMIC allocation
might be a workable solution.

This selective calling of kmap requires us to know, in page_put_link,
whether or not kunmap() needs to be called.  Pass this information in
the lsb of the cookie.
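
The lsb trick relies on page pointers being at least 2-byte aligned, so bit 0
is free to carry one bit of state.  A minimal user-space sketch of the
encoding follows; the demo_* helpers are hypothetical and are not part of
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set bit 0 to record "no kmap() was done, so skip kunmap()". */
static void *demo_tag_cookie(void *page)
{
	return (void *)((uintptr_t)page | (uintptr_t)1);
}

/* Recover the real pointer regardless of whether the cookie is tagged. */
static void *demo_cookie_to_page(void *cookie)
{
	return (void *)((uintptr_t)cookie & ~(uintptr_t)1);
}

/* An untagged, non-NULL cookie means kmap() was called earlier. */
static int demo_cookie_needs_kunmap(void *cookie)
{
	void *page = demo_cookie_to_page(cookie);

	return page != NULL && page == cookie;
}
```

This corresponds to the "page == cookie" test in page_put_link() in the diff
below: the comparison only holds when the tag bit was never set.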

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c9c58cd1af2a..2602d31ecc99 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4499,6 +4499,28 @@ static char *page_getlink(struct dentry * dentry, struct page **ppage)
 	return kaddr;
 }
 
+/* get the link contents from pagecache under RCU */
+static char *page_getlink_rcu(struct dentry * dentry, struct page **ppage)
+{
+	char *kaddr;
+	struct page *page;
+	struct address_space *mapping = dentry->d_inode->i_mapping;
+	page = find_get_page(mapping, 0);
+	if (page &&
+	    (!PageUptodate(page) || PageHighMem(page))) {
+		put_page(page);
+		page = NULL;
+	}
+	if (!page) {
+		*ppage = ERR_PTR(-ECHILD);
+		return NULL;
+	}
+	*ppage = page;
+	kaddr = page_address(page);
+	nd_terminate_link(kaddr, dentry->d_inode->i_size, PAGE_SIZE - 1);
+	return kaddr;
+}
+
 int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
 	struct page *page = NULL;
@@ -4514,19 +4536,22 @@ EXPORT_SYMBOL(page_readlink);
 void *page_follow_link_light(struct dentry *dentry, int flags)
 {
 	struct page *page = NULL;
-	if (flags & LOOKUP_RCU)
-		return ERR_PTR(-ECHILD);
-	nd_set_link(page_getlink(dentry, &page));
+	if (flags & LOOKUP_RCU) {
+		nd_set_link(page_getlink_rcu(dentry, &page));
+		page = (void*)((unsigned long)page | 1);
+	} else
+		nd_set_link(page_getlink(dentry, &page));
 	return page;
 }
 EXPORT_SYMBOL(page_follow_link_light);
 
 void page_put_link(struct dentry *dentry, void *cookie)
 {
-	struct page *page = cookie;
+	struct page *page = (void*)((unsigned long)cookie & ~1UL) ;
 
 	if (page) {
-		kunmap(page);
+		if (page == cookie)
+			kunmap(page);
 		page_cache_release(page);
 	}
 }
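
The "lsb of the cookie" idea from the commit message can be sketched in
userspace as below.  This is only an illustration under stated
assumptions: struct fake_page, COOKIE_NO_KUNMAP and the cookie_*()
helpers are hypothetical names, not part of the patch, and the sketch
relies on the pointer being at least 2-byte aligned so that bit 0 is
free.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page.  Such objects are word-aligned,
 * so bit 0 of a pointer to one is always clear and can carry a tag. */
struct fake_page {
	long data;
};

/* Illustrative tag meaning "kmap() was not used, so skip kunmap()". */
#define COOKIE_NO_KUNMAP ((uintptr_t)1)

/* Pack a page pointer and the "skip kunmap" decision into one cookie. */
static void *cookie_pack(struct fake_page *page, int skip_kunmap)
{
	uintptr_t v = (uintptr_t)page;

	if (skip_kunmap)
		v |= COOKIE_NO_KUNMAP;
	return (void *)v;
}

/* Recover the real page pointer by masking the tag bit off. */
static struct fake_page *cookie_page(void *cookie)
{
	return (struct fake_page *)((uintptr_t)cookie & ~COOKIE_NO_KUNMAP);
}

/* An untagged cookie means the page was mapped and must be unmapped. */
static int cookie_needs_kunmap(void *cookie)
{
	return ((uintptr_t)cookie & COOKIE_NO_KUNMAP) == 0;
}
```

One caveat for any scheme like this: the tag should only be applied to
a real page pointer.  OR-ing bit 0 into an ERR_PTR()-style value changes
the errno it encodes (-10 | 1 is -9), so error cookies are best returned
untagged.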




* [PATCH 11/13] xfs: use RCU to free 'struct xfs_mount'.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (6 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 10/13] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16  4:43 ` [PATCH 13/13] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

In order for ->follow_link to be safe in RCU-walk, any
data structures accessed need to be freed after
an RCU grace period.

The freeing of 'struct xfs_mount' is not currently guaranteed to be
delayed sufficiently, so use kfree_rcu() to free it.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/xfs/xfs_mount.h |    2 ++
 fs/xfs/xfs_super.c |    2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 0d8abd6364d9..6a1094e493e9 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -185,6 +185,8 @@ typedef struct xfs_mount {
 	 * to various other kinds of pain inflicted on the pNFS server.
 	 */
 	__uint32_t		m_generation;
+
+	struct rcu_head		m_rcu;
 } xfs_mount_t;
 
 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8fcc4ccc5c79..3827be14383c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1047,7 +1047,7 @@ xfs_fs_put_super(
 	xfs_destroy_mount_workqueues(mp);
 	xfs_close_devices(mp);
 	xfs_free_fsname(mp);
-	kfree(mp);
+	kfree_rcu(mp, m_rcu);
 }
 
 STATIC int




* [PATCH 13/13] NFS: support LOOKUP_RCU in nfs_follow_link.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (7 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 11/13] xfs: use RCU to free 'struct xfs_mount' NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16  4:43 ` [PATCH 06/13] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If the inode is valid and the page has been read in,
then we can follow a link in RCU-walk.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/inode.c         |   22 ++++++++++++++++++++++
 fs/nfs/symlink.c       |   20 ++++++++++++++++++--
 include/linux/nfs_fs.h |    1 +
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 83107be3dd01..80f192405102 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1123,6 +1123,28 @@ out:
 	return ret;
 }
 
+int nfs_revalidate_mapping_rcu(struct inode *inode)
+{
+	struct nfs_inode *nfsi = NFS_I(inode);
+	unsigned long *bitlock = &nfsi->flags;
+	int ret = 0;
+
+	if (IS_SWAPFILE(inode))
+		goto out;
+	if (nfs_mapping_need_revalidate_inode(inode)) {
+		ret = -ECHILD;
+		goto out;
+	}
+	spin_lock(&inode->i_lock);
+	if (test_bit(NFS_INO_INVALIDATING, bitlock) ||
+	    (nfsi->cache_validity & NFS_INO_INVALID_DATA))
+		ret = -ECHILD;
+	spin_unlock(&inode->i_lock);
+out:
+	return ret;
+}
+
+
 static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 43e43d2c8c5b..849bef4b0ae1 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -49,8 +49,24 @@ static void *nfs_follow_link(struct dentry *dentry, int flags)
 	struct page *page;
 	void *err;
 
-	if (flags & LOOKUP_RCU)
-		return ERR_PTR(-ECHILD);
+	if (flags & LOOKUP_RCU) {
+		err = ERR_PTR(nfs_revalidate_mapping_rcu(inode));
+		if (err)
+			goto read_failed;
+		page = find_get_page(inode->i_mapping, 0);
+		if (page &&
+		    (!PageUptodate(page) || PageHighMem(page))) {
+			put_page(page);
+			page = NULL;
+		}
+		if (!page) {
+			err = ERR_PTR(-ECHILD);
+			goto read_failed;
+		}
+		nd_set_link(page_address(page));
+		page = (void*)((unsigned long)page | 1);
+		return page;
+	}
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
 		goto read_failed;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 2f77e0c651c8..78c2f812eaeb 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -355,6 +355,7 @@ extern int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode);
 extern int nfs_revalidate_inode_rcu(struct nfs_server *server, struct inode *inode);
 extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
 extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
+extern int nfs_revalidate_mapping_rcu(struct inode *inode);
 extern int nfs_setattr(struct dentry *, struct iattr *);
 extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr);
 extern void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,




* [PATCH 12/13] XFS: allow follow_link to often succeed in RCU-walk.
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (4 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 05/13] VFS/namei: use terminate_walk when symlink lookup fails NeilBrown
@ 2015-03-16  4:43 ` NeilBrown
  2015-03-16 22:37   ` Al Viro
  2015-03-16  4:43 ` [PATCH 10/13] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-16  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If LOOKUP_RCU is set, use GFP_ATOMIC rather than GFP_KERNEL,
and try to get the ilock without blocking.

When these succeed, follow_link() can succeed without dropping
out of RCU-walk.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/xfs/xfs_ioctl.c   |    2 +-
 fs/xfs/xfs_iops.c    |   15 ++++++++++-----
 fs/xfs/xfs_symlink.c |   11 +++++++++--
 fs/xfs/xfs_symlink.h |    2 +-
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ac4feae45eb3..29d95a1b76c0 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -303,7 +303,7 @@ xfs_readlink_by_handle(
 		goto out_dput;
 	}
 
-	error = xfs_readlink(XFS_I(dentry->d_inode), link);
+	error = xfs_readlink(XFS_I(dentry->d_inode), link, 0);
 	if (error)
 		goto out_kfree;
 	error = readlink_copy(hreq->ohandle, olen, link);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8fd416ae935a..72bc60f09415 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -415,15 +415,20 @@ xfs_vn_follow_link(
 	int			flags)
 {
 	char			*link;
-	int			error = -ENOMEM;
+	int			error;
 
-	if (flags & LOOKUP_RCU)
-		return ERR_PTR(-ECHILD);
-	link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+	if (flags & LOOKUP_RCU) {
+		error = -ECHILD;
+		link = kmalloc(MAXPATHLEN+1, GFP_ATOMIC);
+	} else {
+		error = -ENOMEM;
+		link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+	}
 	if (!link)
 		goto out_err;
 
-	error = xfs_readlink(XFS_I(dentry->d_inode), link);
+	error = xfs_readlink(XFS_I(dentry->d_inode), link,
+			     flags & LOOKUP_RCU);
 	if (unlikely(error))
 		goto out_kfree;
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 25791df6f638..87b5b2ba3d38 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -123,7 +123,8 @@ xfs_readlink_bmap(
 int
 xfs_readlink(
 	struct xfs_inode *ip,
-	char		*link)
+	char		*link,
+	int		rcu)
 {
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_fsize_t	pathlen;
@@ -134,7 +135,11 @@ xfs_readlink(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
+	if (rcu) {
+		if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED) == 0)
+			return -ECHILD;
+	} else
+		xfs_ilock(ip, XFS_ILOCK_SHARED);
 
 	pathlen = ip->i_d.di_size;
 	if (!pathlen)
@@ -153,6 +158,8 @@ xfs_readlink(
 	if (ip->i_df.if_flags & XFS_IFINLINE) {
 		memcpy(link, ip->i_df.if_u1.if_data, pathlen);
 		link[pathlen] = '\0';
+	} else if (rcu) {
+		error = -ECHILD;
 	} else {
 		error = xfs_readlink_bmap(ip, link);
 	}
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h
index e75245d09116..a71d26643e20 100644
--- a/fs/xfs/xfs_symlink.h
+++ b/fs/xfs/xfs_symlink.h
@@ -21,7 +21,7 @@
 
 int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name,
 		const char *target_path, umode_t mode, struct xfs_inode **ipp);
-int xfs_readlink(struct xfs_inode *ip, char *link);
+int xfs_readlink(struct xfs_inode *ip, char *link, int rcu);
 int xfs_inactive_symlink(struct xfs_inode *ip);
 
 #endif /* __XFS_SYMLINK_H */
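
The "take the lock without blocking, else give up with -ECHILD" pattern
that this patch applies to xfs_readlink() can be sketched in userspace
as follows.  This is a rough illustration, not the XFS code: struct
fake_inode, fake_trylock() and fake_readlink() are hypothetical, and a
C11 atomic_flag (an exclusive lock) stands in for the shared ilock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical inode: a lock plus the cached symlink target. */
struct fake_inode {
	atomic_flag lock;	/* stand-in for the XFS ilock */
	const char *target;
};

/* Returns true if the lock was acquired; never waits. */
static bool fake_trylock(struct fake_inode *ip)
{
	return !atomic_flag_test_and_set(&ip->lock);
}

/* Blocking acquire, modelled here as a busy-wait. */
static void fake_lock(struct fake_inode *ip)
{
	while (!fake_trylock(ip))
		;
}

static void fake_unlock(struct fake_inode *ip)
{
	atomic_flag_clear(&ip->lock);
}

/*
 * Copy the link target into buf (len must be > 0).  In "rcu" mode the
 * caller may not sleep, so a contended lock is reported as -ECHILD and
 * the caller is expected to retry in the blocking mode.
 */
static int fake_readlink(struct fake_inode *ip, char *buf, size_t len, bool rcu)
{
	if (rcu) {
		if (!fake_trylock(ip))
			return -ECHILD;
	} else {
		fake_lock(ip);
	}

	strncpy(buf, ip->target, len - 1);
	buf[len - 1] = '\0';

	fake_unlock(ip);
	return 0;
}
```

The sketch only covers the locking decision.  It says nothing about
whether the inode itself is still safe to touch at that point.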




* Re: [PATCH 00/13] Support follow_link in RCU-walk. - V2
  2015-03-16  4:43 [PATCH 00/13] Support follow_link in RCU-walk. - V2 NeilBrown
                   ` (12 preceding siblings ...)
  2015-03-16  4:43 ` [PATCH 08/13] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
@ 2015-03-16 19:14 ` Al Viro
  13 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-16 19:14 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:19PM +1100, NeilBrown wrote:
> Hi Al,
>  I believe this series addresses all your concerns about
>  my first attempt.
>  The first patch results in nameidata being almost completely
>  localized to namei.c :-)  It also highlights out-of-date
>  documentation in automount-support.txt :-(
> 
>  It also exposes (and removes) some ... interesting code in lustre.
>  I'm not sure how safe it is to remove that.... I didn't think
>  recursive symlinks used extra stack.

Recursive nested symlinks *do* use extra stack; it's not in fs code, though.
make fs/namei.s and check link_path_walk; AFAICS, on amd64 it's 192 bytes per
level, on sparc64 - 256, sparc32 and ppc32 - 144, ppc64 - obscenely fat 336...

It's more that lustre is an extreme stack hog; call its methods on slightly
deeper stack and you are screwed.  I don't _know_ if that's pure paranoia -
might very well be.  OTOH, it might be not paranoid enough.  OTTH, if it
manages to survive 5 levels on 4K stack, it ought to survive 8 levels on 8K
one; if 3 times the footprint of link_path_walk pushes the total by more than 4K,
there's no way in hell to fit 5 times that footprint into 4K stack, nevermind
the rest of call chain...
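
For a feel of the numbers, the per-level figures quoted above can be
multiplied out.  A minimal sketch: nested_stack_bytes() is a
hypothetical helper, and it counts only the link_path_walk frame, not
whatever the filesystem's own methods add on top.

```c
#include <stddef.h>

/* Per-level stack cost of link_path_walk, in bytes, as quoted above. */
enum {
	FRAME_AMD64   = 192,
	FRAME_SPARC64 = 256,
	FRAME_PPC64   = 336
};

/* Stack consumed by 'depth' nested levels at 'per_level' bytes each. */
static size_t nested_stack_bytes(size_t per_level, unsigned int depth)
{
	return per_level * depth;
}
```

On amd64 that is 960 bytes for 5 levels and 1536 bytes for 8, so the
three extra levels cost 576 bytes of link_path_walk frames; on ppc64 the
same three levels cost 1008 bytes.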


* Re: [PATCH 01/13] VFS: replace {, total_}link_count in task_struct with pointer to nameidata
  2015-03-16  4:43 ` [PATCH 01/13] VFS: replace {, total_}link_count in task_struct with pointer to nameidata NeilBrown
@ 2015-03-16 19:46   ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-16 19:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:19PM +1100, NeilBrown wrote:

> -	if (unlikely(current->total_link_count >= 40))
> +	if (unlikely(current->nameidata->total_link_count >= 40))

Huh?  nd->total_link_count, please.
>  
> -	current->total_link_count++;
> +	current->nameidata->total_link_count++;

Similar.

> @@ -991,8 +1008,8 @@ static int follow_automount(struct path *path, unsigned flags,
>  	    path->dentry->d_inode)
>  		return -EISDIR;
>  
> -	current->total_link_count++;
> -	if (current->total_link_count >= 40)
> +	current->nameidata->total_link_count++;
> +	if (current->nameidata->total_link_count >= 40)
>  		return -ELOOP;

We probably ought to pass nd through follow_mount / follow_automount, instead
of nd->flags, and use nd->total_link_count here.

> -	if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
> +	if (unlikely(current->nameidata->link_count >= MAX_NESTED_LINKS)) {

Again, nd->link_count.

> @@ -1948,7 +1965,7 @@ static int path_init(int dfd, const char *name, unsigned int flags,
>  	rcu_read_unlock();
>  	return -ECHILD;
>  done:
> -	current->total_link_count = 0;
> +	current->nameidata->total_link_count = 0;

... and again.

>  	return link_path_walk(name, nd);
>  }
>  
> @@ -2027,7 +2044,9 @@ static int path_lookupat(int dfd, const char *name,
>  static int filename_lookup(int dfd, struct filename *name,
>  				unsigned int flags, struct nameidata *nd)
>  {
> -	int retval = path_lookupat(dfd, name->name, flags | LOOKUP_RCU, nd);
> +	int retval;
> +	struct nameidata *saved_nd = set_nameidata(nd);

I'm not sure it's the right place ;-/  I'll play with that a bit and see
if I can get it cleaner...

> -	struct nameidata nd;
> +	struct nameidata nd, *saved = set_nameidata(&nd);
>  	void *cookie;
>  	int res;
>  
> @@ -4441,6 +4465,7 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
>  	res = readlink_copy(buffer, buflen, nd_get_link(&nd));
>  	if (dentry->d_inode->i_op->put_link)
>  		dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
> +	set_nameidata(saved);
>  	return res;

Now, _that_ is broken - get ERR_PTR(...) from ->follow_link() and you've
leaked nameidata.
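
The leak described here is the usual "early return skips the restore"
problem.  A userspace sketch of one way to avoid it, assuming the error
path currently returns before set_nameidata(saved) is reached: the
sk_* names are hypothetical, and sk_current_nd stands in for
current->nameidata.

```c
#include <stddef.h>

struct sk_nameidata {
	int depth;
};

/* Stand-in for current->nameidata. */
static struct sk_nameidata *sk_current_nd;

/* Install nd as the current nameidata and return the previous one. */
static struct sk_nameidata *sk_set_nameidata(struct sk_nameidata *nd)
{
	struct sk_nameidata *old = sk_current_nd;

	sk_current_nd = nd;
	return old;
}

/*
 * follow_err models ->follow_link() returning an error (non-zero) or
 * succeeding (zero).  Every exit goes through 'out', so the saved
 * pointer is restored on the error path as well.
 */
static int sk_readlink(int follow_err)
{
	struct sk_nameidata nd = { 0 };
	struct sk_nameidata *saved = sk_set_nameidata(&nd);
	int res;

	if (follow_err) {
		res = follow_err;
		goto out;
	}
	res = 0;	/* the link body would be copied out here */
out:
	sk_set_nameidata(saved);
	return res;
}
```

Without the single exit, a failed lookup would leave the task's pointer
aimed at a nameidata on a stack frame that has already been popped.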


* Re: [PATCH 03/13] VFS: remove nameidata args from ->follow_link and ->put_link
  2015-03-16  4:43 ` [PATCH 03/13] VFS: remove nameidata args from ->follow_link and ->put_link NeilBrown
@ 2015-03-16 20:47   ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-16 20:47 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:19PM +1100, NeilBrown wrote:
> Now that current->nameidata is available, nd_set_link() and
> nd_get_link() can use that directly, so 'nd' doesn't need to
> be passed through ->follow_link and  ->put_link.

FWIW, I would rather pass nd_get_link(nd) to ->put_link() instead of nd.
Note that that's the only thing instances were ever using nd for; what's more,
that's the only thing nd_get_link() is ever used for outside of fs/namei.c,
so with that change it could become static in fs/namei.c.  After such change
we would have it used in
	* follow_link().  We have nd right there.
	* put_link().  Ditto.
	* generic_readlink().  Again, nd is right there (and we obviously
only need to call nd_get_link() once).
Hell, it can even become static inline...

>  	nd->last_type = LAST_BIND;
> -	*p = dentry->d_inode->i_op->follow_link(dentry, nd);
> +	*p = dentry->d_inode->i_op->follow_link(dentry, nd->flags);

I'm not sure if it's a good idea to expose all flags here - it's really
asking for somebody trying to be "smart" and acting differently depending
on what we are doing pathname resolution for/where in lookup we are/etc.
nd->flags & LOOKUP_RCU might be less tempting.

> @@ -4458,13 +4464,13 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
>  	int res;
>  
>  	nd.depth = 0;
> -	cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
> +	cookie = dentry->d_inode->i_op->follow_link(dentry, nd.flags);

...(dentry, 0);  nd.flags is uninitialized, for pity sake...


* Re: [PATCH 04/13] security/selinux: check for LOOKUP_RCU in _follow_link.
  2015-03-16  4:43 ` [PATCH 04/13] security/selinux: check for LOOKUP_RCU in _follow_link NeilBrown
@ 2015-03-16 21:00   ` Al Viro
  2015-03-20  4:39     ` NeilBrown
  0 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2015-03-16 21:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:19PM +1100, NeilBrown wrote:
> Some of dentry_has_perm() is not rcu-safe, so if LOOKUP_RCU
> is set in selinux_inode_follow_link(), give up with
> -ECHILD.
> 
> It is possible that dentry_has_perm could sometimes complete
> in RCU mode, in which case the flag could be propagated further
> down the stack...

It bloody well can.  Expand it a bit and you'll see - the nastiness
comes from avc_audit() doing
        return slow_avc_audit(ssid, tsid, tclass,
                              requested, audited, denied, result,
                              a, 0);
and passing that 0 to slow_avc_audit().  Pass it MAY_NOT_BLOCK instead
and it'll bugger off with -ECHILD in blocking case.

Call chain is dentry_has_perm -> inode_has_perm -> avc_has_perm -> avc_audit.
Expand those (including avc_audit()) and make slow_avc_audit() get
flags & LOOKUP_RCU ? MAY_NOT_BLOCK : 0.
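
The flag translation suggested here can be sketched like this.  It is a
toy model, not the selinux code: the sketch_* functions are
hypothetical, the constant values are only illustrative, and
"would_block" simply models the case where the audit slow path would
need to sleep.

```c
#include <errno.h>

/* Illustrative values; the sketch only needs them to be distinct bits. */
#define SKETCH_LOOKUP_RCU	0x0040u
#define SKETCH_MAY_NOT_BLOCK	0x0080u

/*
 * Hypothetical slow path: if it would have to sleep and the caller
 * said it must not block, bail out with -ECHILD instead.
 */
static int sketch_slow_audit(int would_block, unsigned int flags)
{
	if (would_block && (flags & SKETCH_MAY_NOT_BLOCK))
		return -ECHILD;
	return 0;
}

/*
 * Hypothetical permission check: translate the lookup flag into the
 * "may not block" flag and pass it down the call chain.
 */
static int sketch_has_perm(int would_block, unsigned int lookup_flags)
{
	unsigned int flags =
		(lookup_flags & SKETCH_LOOKUP_RCU) ? SKETCH_MAY_NOT_BLOCK : 0;

	return sketch_slow_audit(would_block, flags);
}
```

Only the combination "RCU-walk and would block" fails, so the common
non-blocking case can stay in RCU-walk.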


* Re: [PATCH 06/13] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
  2015-03-16  4:43 ` [PATCH 06/13] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
@ 2015-03-16 22:33   ` Al Viro
  2015-03-17  0:59     ` Al Viro
  0 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2015-03-16 22:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:20PM +1100, NeilBrown wrote:
> When we support ->follow_link in RCU-walk we will not want to
> take a reference to the 'struct path *link' passed to follow_link,
> and correspondingly will not want to drop that reference.
> 
> As link_path_walk will complete_walk() in the case of an error,
> and as complete_walk() will clear LOOKUP_RCU, we cannot test
> LOOKUP_RCU to determine if the path should be 'put'.
> 
> So introduce a new flag: LOOKUP_LINK_RCU.  This is set on
> entry to follow_link() if appropriate and put_link() will
> only call path_put() if it is clear.

Umm...  How is it different from nd->depth > 0 && nd->flags & LOOKUP_RCU?
IOW, could we bump nd->depth before that (conditional) mntget()?


* Re: [PATCH 12/13] XFS: allow follow_link to often succeed in RCU-walk.
  2015-03-16  4:43 ` [PATCH 12/13] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
@ 2015-03-16 22:37   ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-16 22:37 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:20PM +1100, NeilBrown wrote:

> -	xfs_ilock(ip, XFS_ILOCK_SHARED);
> +	if (rcu) {
> +		if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED) == 0)
> +			return -ECHILD;

Umm...  Is that guaranteed to be safe for inode that is currently going
through xfs ->evict_inode()?  struct inode getting freed is RCU-delayed;
->evict_inode() is *not*.  It can happen right under you in RCU pathwalk.


* Re: [PATCH 09/13] VFS/namei: enable RCU-walk when following symlinks.
  2015-03-16  4:43 ` [PATCH 09/13] VFS/namei: enable RCU-walk when following symlinks NeilBrown
@ 2015-03-16 22:44   ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-16 22:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:20PM +1100, NeilBrown wrote:
>  	if (should_follow_link(path->dentry, follow)) {
> -		if (nd->flags & LOOKUP_RCU) {
> -			if (unlikely(unlazy_walk(nd, path->dentry))) {
> -				err = -ECHILD;
> -				goto out_err;
> -			}
> -		}
>  		BUG_ON(inode != path->dentry->d_inode);

... and now this BUG_ON() can bloody well be triggered.
>  	if (should_follow_link(path->dentry, !symlink_ok)) {
> -		if (nd->flags & LOOKUP_RCU) {
> -			if (unlikely(unlazy_walk(nd, path->dentry))) {
> -				error = -ECHILD;
> -				goto out;
> -			}
> -		}
>  		BUG_ON(inode != path->dentry->d_inode);

So can this.


* Re: [PATCH 10/13] VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
  2015-03-16  4:43 ` [PATCH 10/13] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
@ 2015-03-16 22:50   ` Al Viro
  2015-03-19 22:38     ` NeilBrown
  0 siblings, 1 reply; 27+ messages in thread
From: Al Viro @ 2015-03-16 22:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 03:43:20PM +1100, NeilBrown wrote:
> +	char *kaddr;
> +	struct page *page;
> +	struct address_space *mapping = dentry->d_inode->i_mapping;

Who said that dentry->d_inode hasn't gone NULL by that point?

> +	nd_terminate_link(kaddr, dentry->d_inode->i_size, PAGE_SIZE - 1);

... or changed here.  Again, dentry->d_inode is stable only if you are
holding a reference to dentry.  That's why we have those dances around
nd->inode, for example.  Doing unlazy_walk() is enough to stabilize the
damn thing, so currently ->follow_link() doesn't have to worry about it.
With your changes, though...


* Re: [PATCH 06/13] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
  2015-03-16 22:33   ` Al Viro
@ 2015-03-17  0:59     ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-17  0:59 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 16, 2015 at 10:33:45PM +0000, Al Viro wrote:
> On Mon, Mar 16, 2015 at 03:43:20PM +1100, NeilBrown wrote:
> > When we support ->follow_link in RCU-walk we will not want to
> > take a reference to the 'struct path *link' passed to follow_link,
> > and correspondingly will not want to drop that reference.
> > 
> > As link_path_walk will complete_walk() in the case of an error,
> > and as complete_walk() will clear LOOKUP_RCU, we cannot test
> > LOOKUP_RCU to determine if the path should be 'put'.
> > 
> > So introduce a new flag: LOOKUP_LINK_RCU.  This is set on
> > entry to follow_link() if appropriate and put_link() will
> > only call path_put() if it is clear.
> 
> Umm...  How is it different from nd->depth > 0 && nd->flags & LOOKUP_RCU?
> IOW, could we bump nd->depth before that (conditional) mntget()?

OK, I see...  So you are holding that flag for as long as we are traversing
any part of a symlink body, including that of a trailing symlink...


* Re: [PATCH 10/13] VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
  2015-03-16 22:50   ` Al Viro
@ 2015-03-19 22:38     ` NeilBrown
  2015-03-19 23:46       ` Al Viro
  0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-19 22:38 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

On Mon, 16 Mar 2015 22:50:40 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Mar 16, 2015 at 03:43:20PM +1100, NeilBrown wrote:
> > +	char *kaddr;
> > +	struct page *page;
> > +	struct address_space *mapping = dentry->d_inode->i_mapping;
> 
> Who said that dentry->d_inode hasn't gone NULL by that point?
> 
> > +	nd_terminate_link(kaddr, dentry->d_inode->i_size, PAGE_SIZE - 1);
> 
> ... or changed here.  Again, dentry->d_inode is stable only if you are
> holding a reference to dentry.  That's why we have those dances around
> nd->inode, for example.  Doing unlazy_walk() is enough to stabilize the
> damn thing, so currently ->follow_link() doesn't have to worry about it.
> With your changes, though...

Ahhh - that's what nd->inode is for.  I wondered.

Am I correct in thinking that dentry->d_inode can only become NULL - it cannot
then become some other inode?

In that case the various follow_link methods that are sufficiently atomic for
rcu-walk just need something like:

 struct inode *inode = dentry->d_inode;

 if (!inode)
     return -ECHILD;

If ->d_inode can become another inode, then I suspect we need to pass the
inode as well as the dentry to ->follow_link.


Thanks,
NeilBrown


* Re: [PATCH 10/13] VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
  2015-03-19 22:38     ` NeilBrown
@ 2015-03-19 23:46       ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-19 23:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Fri, Mar 20, 2015 at 09:38:33AM +1100, NeilBrown wrote:

> Ahhh - that's what nd->inode is for.  I wondered.
> 
> Am I correct in thinking that dentry->d_inode can only become NULL - it cannot
> then become some other inode?

It can - consider somebody doing mkdir on that name right under you.
_All_ we are guaranteed is that at some moment nd->inode matched the
pathname this far and so was (at the same moment) path->dentry.  We
are not promised that these inode and dentry will remain associated
with each other, etc.

We ought to check ->d_seq after checking ->d_flags, BTW.  _That_ will confirm
that inode remained corresponding to that dentry until the time we'd
observed d_is_symlink(dentry), i.e. make sure that inode *is* a symlink one.

And yes, we probably would have to pass dentry and inode separately, more's
the pity.
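
The sequence-count check described above follows the usual seqcount
read pattern: sample the counter, read the data, then confirm the
counter has not moved.  A userspace sketch with C11 atomics follows.
The sk_* names are hypothetical, the memory ordering is simplified, and
this models only the validation idea, not the kernel's d_seq handling.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct sk_inode {
	int is_symlink;
};

struct sk_dentry {
	atomic_uint seq;		/* even: stable, odd: being changed */
	struct sk_inode *_Atomic inode;
};

/* Sample the sequence count before reading the dentry. */
static unsigned int sk_read_begin(struct sk_dentry *d)
{
	return atomic_load_explicit(&d->seq, memory_order_acquire);
}

/* Non-zero if the dentry changed (or was changing) since 'start'. */
static int sk_read_retry(struct sk_dentry *d, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return (start & 1) ||
	       atomic_load_explicit(&d->seq, memory_order_relaxed) != start;
}

/* Writer side: bump seq around the change, e.g. a rename or mkdir. */
static void sk_rebind(struct sk_dentry *d, struct sk_inode *new_inode)
{
	atomic_fetch_add(&d->seq, 1);
	atomic_store(&d->inode, new_inode);
	atomic_fetch_add(&d->seq, 1);
}

/*
 * Read the inode, then validate it against the earlier sample.  Only
 * a successful validation lets the caller trust that this inode was
 * the dentry's inode for the whole interval.
 */
static int sk_follow(struct sk_dentry *d, unsigned int seq,
		     struct sk_inode **out)
{
	struct sk_inode *inode = atomic_load(&d->inode);

	if (sk_read_retry(d, seq))
		return -ECHILD;
	if (!inode || !inode->is_symlink)
		return -ECHILD;
	*out = inode;
	return 0;
}
```

Returning the validated inode through 'out' mirrors the point about
passing dentry and inode separately: the caller keeps using the inode
it checked rather than re-reading it from the dentry.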


* Re: [PATCH 04/13] security/selinux: check for LOOKUP_RCU in _follow_link.
  2015-03-16 21:00   ` Al Viro
@ 2015-03-20  4:39     ` NeilBrown
  2015-03-20  5:12       ` Al Viro
  0 siblings, 1 reply; 27+ messages in thread
From: NeilBrown @ 2015-03-20  4:39 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

On Mon, 16 Mar 2015 21:00:35 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Mar 16, 2015 at 03:43:19PM +1100, NeilBrown wrote:
> > Some of dentry_has_perm() is not rcu-safe, so if LOOKUP_RCU
> > is set in selinux_inode_follow_link(), give up with
> > -ECHILD.
> > 
> > It is possible that dentry_has_perm could sometimes complete
> > in RCU mode, in which case the flag could be propagated further
> > down the stack...
> 
> It bloody well can.  Expand it a bit and you'll see - the nastiness
> comes from avc_audit() doing
>         return slow_avc_audit(ssid, tsid, tclass,
>                               requested, audited, denied, result,
>                               a, 0);
> and passing that 0 to slow_avc_audit().  Pass it MAY_NOT_BLOCK instead
> and it'll bugger off with -ECHILD in blocking case.
> 
> Call chain is dentry_has_perm -> inode_has_perm -> avc_has_perm -> avc_audit.
> Expand those (including avc_audit()) and make slow_avc_audit() get
> flags & LOOKUP_RCU ? MAY_NOT_BLOCK : 0.

There is more to it than that.

avc_has_perm calls avc_has_perm_noaudit which does:

	rcu_read_lock();
	...
	if (unlikely(!node)) {
		node = avc_compute_av(ssid, tsid, tclass, avd);
	} else ...

	...
	rcu_read_unlock();

and avc_compute_av() does

	rcu_read_unlock();
	security_compute_av(ssid, tsid, tclass, avd);
	rcu_read_lock();

(yes: unlock, and then lock).
so avc_has_perm_noaudit needs to bail out of RCU-walk if node turns out to be
NULL.
So I either add another 'flags' arg to that, or replace the current one which
is unused .... or leave it as someone else's problem :-)

NeilBrown


* Re: [PATCH 04/13] security/selinux: check for LOOKUP_RCU in _follow_link.
  2015-03-20  4:39     ` NeilBrown
@ 2015-03-20  5:12       ` Al Viro
  0 siblings, 0 replies; 27+ messages in thread
From: Al Viro @ 2015-03-20  5:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Fri, Mar 20, 2015 at 03:39:30PM +1100, NeilBrown wrote:
> 	rcu_read_unlock();
> 	security_compute_av(ssid, tsid, tclass, avd);
> 	rcu_read_lock();
> 
> (yes: unlock, and then lock).
>
> so avc_has_perm_noaudit needs to bail out of RCU-walk if node turns out to be
> NULL.

NFI, but since
	a) the guts of security_compute_av() are under rwlock (shared),
I rather doubt that it could e.g. block
	b) avc_has_perm_noaudit() is called from selinux_inode_permission(),
which is called inside RCU-walk - it's hit on selinux setups in every
successful inode_permission()
I'd say that it's no worse than it already was.  AFAICS, it's a slowpath and
we don't want to hold rcu_read_lock() over it to avoid stalls, but if the
caller of avc_has_perm_noaudit() used to want rcu_read_lock(), well, we'll
just risk stalls.

