* [PATCH 00/20] Support follow_link in RCU-walk - V3
@ 2015-03-23  2:37 NeilBrown
  2015-03-23  2:37 ` [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space NeilBrown
                   ` (20 more replies)
  0 siblings, 21 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Hi Al,
 thanks for all your review help, particularly for pointing out that
 dentry->d_inode is not stable in RCU-walk.  That has led to
 a number of changes.

 I think this set addresses all of your review comments, improves
 some documentation, and has a go at providing a solution for lustre.

 I hope to organize some proper testing soon, so I can confirm that it
 makes certain loads a lot faster.

Thanks,
NeilBrown


---

NeilBrown (20):
      Documentation: remove outdated information from automount-support.txt
      STAGING/lustre: limit follow_link recursion using stack space.
      VFS: replace {,total_}link_count in task_struct with pointer to nameidata
      ovl: rearrange ovl_follow_link so it doesn't need to call ->put_link
      VFS: replace nameidata arg to ->put_link with a char*.
      SECURITY: remove nameidata arg from inode_follow_link.
      VFS: remove nameidata args from ->follow_link
      VFS: make all ->follow_link handlers aware of LOOKUP_RCU
      security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags()
      security: make inode_follow_link RCU-walk aware
      VFS/namei: use terminate_walk when symlink lookup fails.
      VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
      VFS/namei: abort RCU-walk on symlink if atime needs updating.
      VFS/namei: add 'inode' arg to put_link().
      VFS/namei: enhance follow_link to support RCU-walk.
      VFS/namei: enable RCU-walk when following symlinks.
      VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
      xfs: use RCU to free 'struct xfs_mount'.
      XFS: allow follow_link to often succeed in RCU-walk.
      NFS: support LOOKUP_RCU in nfs_follow_link.


 Documentation/filesystems/Locking               |    4 
 Documentation/filesystems/automount-support.txt |   51 +---
 Documentation/filesystems/porting               |   23 ++
 Documentation/filesystems/vfs.txt               |    4 
 drivers/staging/lustre/lustre/llite/symlink.c   |   32 ++-
 fs/9p/v9fs.h                                    |    2 
 fs/9p/vfs_inode.c                               |   19 +-
 fs/9p/vfs_inode_dotl.c                          |   13 +
 fs/autofs4/symlink.c                            |    6 
 fs/befs/linuxvfs.c                              |   19 +-
 fs/ceph/inode.c                                 |    7 -
 fs/cifs/cifsfs.h                                |    3 
 fs/cifs/link.c                                  |    7 -
 fs/configfs/symlink.c                           |   16 +
 fs/debugfs/file.c                               |    5 
 fs/ecryptfs/inode.c                             |   16 +
 fs/exofs/symlink.c                              |    8 -
 fs/ext2/symlink.c                               |    7 -
 fs/ext3/symlink.c                               |    7 -
 fs/ext4/symlink.c                               |    7 -
 fs/freevxfs/vxfs_immed.c                        |   12 +
 fs/fuse/dir.c                                   |   11 +
 fs/gfs2/inode.c                                 |   14 +
 fs/hostfs/hostfs_kern.c                         |   15 +
 fs/hppfs/hppfs.c                                |   13 +
 fs/inode.c                                      |   26 ++
 fs/jffs2/symlink.c                              |   10 -
 fs/jfs/symlink.c                                |    7 -
 fs/kernfs/symlink.c                             |   15 +
 fs/libfs.c                                      |    4 
 fs/namei.c                                      |  281 +++++++++++++++--------
 fs/nfs/inode.c                                  |   21 ++
 fs/nfs/symlink.c                                |   26 ++
 fs/ntfs/namei.c                                 |    1 
 fs/overlayfs/inode.c                            |   35 ++-
 fs/proc/base.c                                  |    8 -
 fs/proc/inode.c                                 |    9 -
 fs/proc/namespaces.c                            |    9 -
 fs/proc/self.c                                  |   12 +
 fs/proc/thread_self.c                           |   15 +
 fs/sysv/symlink.c                               |    5 
 fs/ubifs/file.c                                 |    7 -
 fs/ufs/symlink.c                                |    8 -
 fs/xfs/xfs_ioctl.c                              |    2 
 fs/xfs/xfs_iops.c                               |   20 +-
 fs/xfs/xfs_mount.h                              |    2 
 fs/xfs/xfs_super.c                              |    4 
 fs/xfs/xfs_symlink.c                            |   15 +
 fs/xfs/xfs_symlink.h                            |    2 
 include/linux/fs.h                              |   12 -
 include/linux/namei.h                           |    7 -
 include/linux/nfs_fs.h                          |    1 
 include/linux/sched.h                           |    3 
 include/linux/security.h                        |   13 +
 mm/shmem.c                                      |   20 +-
 security/capability.c                           |    4 
 security/security.c                             |    7 -
 security/selinux/avc.c                          |   18 +
 security/selinux/hooks.c                        |   21 +-
 security/selinux/include/avc.h                  |    9 +
 60 files changed, 632 insertions(+), 348 deletions(-)



* [PATCH 01/20] Documentation: remove outdated information from automount-support.txt
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
  2015-03-23  2:37 ` [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space NeilBrown
  2015-03-23  2:37 ` [PATCH 03/20] VFS: replace {, total_}link_count in task_struct with pointer to nameidata NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 10/20] security: make inode_follow_link RCU-walk aware NeilBrown
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

The guidelines for adding automount support to a filesystem
in filesystems/automount-support.txt are out of date.
filesystems/autofs4.txt contains more current text, so replace
the out-of-date content with a reference to that.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 Documentation/filesystems/automount-support.txt |   51 ++++++-----------------
 1 file changed, 13 insertions(+), 38 deletions(-)

diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
index 7cac200e2a85..7eb762eb3136 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.txt
@@ -1,41 +1,15 @@
-Support is available for filesystems that wish to do automounting support (such
-as kAFS which can be found in fs/afs/). This facility includes allowing
-in-kernel mounts to be performed and mountpoint degradation to be
-requested. The latter can also be requested by userspace.
+Support is available for filesystems that wish to do automounting
+support (such as kAFS which can be found in fs/afs/ and NFS in
+fs/nfs/). This facility includes allowing in-kernel mounts to be
+performed and mountpoint degradation to be requested. The latter can
+also be requested by userspace.
 
 
 ======================
 IN-KERNEL AUTOMOUNTING
 ======================
 
-A filesystem can now mount another filesystem on one of its directories by the
-following procedure:
-
- (1) Give the directory a follow_link() operation.
-
-     When the directory is accessed, the follow_link op will be called, and
-     it will be provided with the location of the mountpoint in the nameidata
-     structure (vfsmount and dentry).
-
- (2) Have the follow_link() op do the following steps:
-
-     (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
-         superblock and gain a vfsmount structure representing it.
-
-     (b) Copy the nameidata provided as an argument and substitute the dentry
-	 argument into it the copy.
-
-     (c) Call do_add_mount() to install the new vfsmount into the namespace's
-	 mountpoint tree, thus making it accessible to userspace. Use the
-	 nameidata set up in (b) as the destination.
-
-	 If the mountpoint will be automatically expired, then do_add_mount()
-	 should also be given the location of an expiration list (see further
-	 down).
-
-     (d) Release the path in the nameidata argument and substitute in the new
-	 vfsmount and its root dentry. The ref counts on these will need
-	 incrementing.
+See section "Mount Traps" of  Documentation/filesystems/autofs4.txt
 
 Then from userspace, you can just do something like:
 
@@ -61,17 +35,18 @@ AUTOMATIC MOUNTPOINT EXPIRY
 ===========================
 
 Automatic expiration of mountpoints is easy, provided you've mounted the
-mountpoint to be expired in the automounting procedure outlined above.
+mountpoint to be expired in the automounting procedure outlined separately.
 
 To do expiration, you need to follow these steps:
 
- (3) Create at least one list off which the vfsmounts to be expired can be
-     hung. Access to this list will be governed by the vfsmount_lock.
+ (1) Create at least one list off which the vfsmounts to be expired can be
+     hung.
 
- (4) In step (2c) above, the call to do_add_mount() should be provided with a
-     pointer to this list. It will hang the vfsmount off of it if it succeeds.
+ (2) When a new mountpoint is created in the ->d_automount method, add
+     the mnt to the list using mnt_set_expiry()
+             mnt_set_expiry(newmnt, &afs_vfsmounts);
 
- (5) When you want mountpoints to be expired, call mark_mounts_for_expiry()
+ (3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
      with a pointer to this list. This will process the list, marking every
      vfsmount thereon for potential expiry on the next call.
 



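To make the two-pass behaviour of mark_mounts_for_expiry() in step (3) concrete, here is a small userspace model. It is a sketch only: struct model_mount, its 'busy' flag and the model_* functions are hypothetical stand-ins, not kernel API, and the logic only roughly follows the kernel, where (in kernels of this era) dropping a reference with mntput() is what clears a mount's expiry mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one mount hung off an expiry list.
 * 'busy' stands in for "still in use, cannot be unmounted". */
struct model_mount {
	bool expiry_mark;
	bool busy;
	bool expired;
};

/* Model of the mount being used: any use clears the mark, so the
 * mount survives the next expiry pass. */
static void model_mount_used(struct model_mount *m)
{
	m->expiry_mark = false;
}

/* Model of one call to mark_mounts_for_expiry() over n mounts.
 * Every live mount gets marked; a mount that was already marked on
 * the previous pass (unused since) and is not busy is expired.
 * Returns the number of mounts expired on this pass. */
static int model_mark_for_expiry(struct model_mount *list, size_t n)
{
	int expired = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_mount *m = &list[i];
		bool was_marked;

		if (m->expired)
			continue;
		was_marked = m->expiry_mark;
		m->expiry_mark = true;
		if (!was_marked || m->busy)
			continue;	/* first pass, or still in use */
		m->expired = true;
		expired++;
	}
	return expired;
}
```

A mount therefore has to sit idle across two consecutive passes before it goes away, which is why the list is normally processed periodically rather than once.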

* [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-04-18  3:01   ` Al Viro
  2015-03-23  2:37 ` [PATCH 03/20] VFS: replace {, total_}link_count in task_struct with pointer to nameidata NeilBrown
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

lustre's ->follow_link() uses a lot of stack space and so
needs to limit symlink recursion based on stack size.

It currently tests current->link_count, but that will soon
become private to fs/namei.c.
So instead, base the limit on the actual available stack space.
This patch aborts recursive symlinks if less than 2K of stack
is available.  This seems consistent with current code, but
hasn't been tested.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/staging/lustre/lustre/llite/symlink.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 686b6a574cc5..ba37eb6b29dc 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -120,20 +120,27 @@ failed:
 
 static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
+	unsigned long avail_space;
 	struct inode *inode = dentry->d_inode;
 	struct ptlrpc_request *request = NULL;
 	int rc;
 	char *symname = NULL;
 
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
-	/* Limit the recursive symlink depth to 5 instead of default
-	 * 8 links when kernel has 4k stack to prevent stack overflow.
-	 * For 8k stacks we need to limit it to 7 for local servers. */
-	if (THREAD_SIZE < 8192 && current->link_count >= 6) {
-		rc = -ELOOP;
-	} else if (THREAD_SIZE == 8192 && current->link_count >= 8) {
+	/* Limit the recursive symlink depth.
+	 * Previously limited to 5 instead of default 8 links when
+	 * kernel has 4k stack to prevent stack overflow.
+	 * For 8k stacks, was limited to 7 for local servers.
+	 * Now limited to ensure 2K of stack is available for lustre.
+	 */
+#ifdef CONFIG_STACK_GROWSUP
+	avail_space = end_of_stack(current) - &avail_space;
+#else
+	avail_space = &avail_space - end_of_stack(current);
+#endif
+	if (avail_space < 2048)
 		rc = -ELOOP;
-	} else {
+	else {
 		ll_inode_size_lock(inode);
 		rc = ll_readlink_internal(inode, &request, &symname);
 		ll_inode_size_unlock(inode);

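One detail worth checking in the stack test above: in C, subtracting two pointers yields a count of elements of the pointed-to type, not bytes. If end_of_stack() returns 'unsigned long *' (as it did in kernels of this era), then 'end_of_stack(current) - &avail_space' counts unsigned longs, so the 2048 threshold would mean 16K on a 64-bit build. The userspace sketch below converts both positions to 'char *' to get a byte count; stack_bytes_available(), ll_check_stack() and LL_MIN_STACK_BYTES are hypothetical names, and the 2K figure is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Threshold from the commit message: keep 2K of stack free. */
#define LL_MIN_STACK_BYTES 2048

/* Bytes between the current stack position 'sp' and the stack limit
 * 'end'.  Converting to char * makes the subtraction count bytes
 * rather than unsigned long elements.  Both pointers must lie within
 * the same object (here: the same stack). */
static size_t stack_bytes_available(const void *sp, const void *end,
				    int grows_up)
{
	const char *s = sp;
	const char *e = end;

	return grows_up ? (size_t)(e - s) : (size_t)(s - e);
}

/* Returns 0 if there is room to follow another link, -ELOOP if not. */
static int ll_check_stack(size_t avail)
{
	return avail < LL_MIN_STACK_BYTES ? -ELOOP : 0;
}
```

With the byte-based count, the threshold means the same thing on 32-bit and 64-bit builds.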

* [PATCH 03/20] VFS: replace {, total_}link_count in task_struct with pointer to nameidata
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
  2015-03-23  2:37 ` [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 01/20] Documentation: remove outdated information from automount-support.txt NeilBrown
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

task_struct currently contains two ad-hoc members for use by
the VFS: link_count and total_link_count.
These are only interesting to fs/namei.c, so exposing them
explicitly is poor layering.

This patch replaces those with a single pointer to 'struct
nameidata'.
This structure represents the current filename lookup of which
there can only be one per process, and is a natural place to
store link_count and total_link_count.

This will allow the current "nameidata" argument to all
follow_link operations to be removed as current->nameidata
can be used instead.

As there are occasional circumstances where pathname lookup can
recurse, such as through kern_path_locked, we always save the old
current->nameidata (if there is one) when setting a new value, and
make sure any active link_counts are preserved.
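The save/inherit/restore rule for nested lookups can be modelled in userspace as below. This is a sketch: struct model_nameidata holds only the two counters this patch moves, and the global current_nd is a hypothetical stand-in for current->nameidata in a single task. model_set_nameidata() follows the logic of set_nameidata() in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the counters this patch moves. */
struct model_nameidata {
	int link_count;
	int total_link_count;
};

/* Stand-in for current->nameidata (one task only). */
static struct model_nameidata *current_nd;

/* Install 'p' as the active lookup, inherit the counters from any
 * outer lookup (or start from zero), and return the previous pointer
 * so the caller can restore it when its lookup finishes. */
static struct model_nameidata *model_set_nameidata(struct model_nameidata *p)
{
	struct model_nameidata *old = current_nd;

	current_nd = p;
	if (p) {
		p->link_count = old ? old->link_count : 0;
		p->total_link_count = old ? old->total_link_count : 0;
	}
	return old;
}
```

Each lookup entry point calls the setter on entry and calls it again with the saved pointer on every exit path, which is why generic_readlink() is restructured to a single return.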

follow_mount and follow_automount now get a 'struct nameidata *'
rather than 'int flags' so that they can directly access
link_count and total_link_count, rather than going through 'current'.

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c            |   79 +++++++++++++++++++++++++++++++++----------------
 include/linux/sched.h |    2 +
 2 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index c83145af4bfc..53bead4f5bdf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -502,10 +502,29 @@ struct nameidata {
 	unsigned	seq, m_seq;
 	int		last_type;
 	unsigned	depth;
+	int		link_count,
+			total_link_count;
 	struct file	*base;
 	char *saved_names[MAX_NESTED_LINKS + 1];
 };
 
+static struct nameidata *set_nameidata(struct nameidata *p)
+{
+	struct nameidata *old = current->nameidata;
+
+	current->nameidata = p;
+	if (p) {
+		if (!old) {
+			p->link_count = 0;
+			p->total_link_count = 0;
+		} else {
+			p->link_count = old->link_count;
+			p->total_link_count = old->total_link_count;
+		}
+	}
+	return old;
+}
+
 /*
  * Path walking has 2 modes, rcu-walk and ref-walk (see
  * Documentation/filesystems/path-lookup.txt).  In situations when we can't
@@ -863,11 +882,11 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 		mntget(link->mnt);
 
 	error = -ELOOP;
-	if (unlikely(current->total_link_count >= 40))
+	if (unlikely(nd->total_link_count >= 40))
 		goto out_put_nd_path;
 
 	cond_resched();
-	current->total_link_count++;
+	nd->total_link_count++;
 
 	touch_atime(link);
 	nd_set_link(nd, NULL);
@@ -966,7 +985,7 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, unsigned flags,
+static int follow_automount(struct path *path, struct nameidata *nd,
 			    bool *need_mntput)
 {
 	struct vfsmount *mnt;
@@ -986,13 +1005,13 @@ static int follow_automount(struct path *path, unsigned flags,
 	 * as being automount points.  These will need the attentions
 	 * of the daemon to instantiate them before they can be used.
 	 */
-	if (!(flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-		     LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
 	    path->dentry->d_inode)
 		return -EISDIR;
 
-	current->total_link_count++;
-	if (current->total_link_count >= 40)
+	nd->total_link_count++;
+	if (nd->total_link_count >= 40)
 		return -ELOOP;
 
 	mnt = path->dentry->d_op->d_automount(path);
@@ -1006,7 +1025,7 @@ static int follow_automount(struct path *path, unsigned flags,
 		 * the path being looked up; if it wasn't then the remainder of
 		 * the path is inaccessible and we should say so.
 		 */
-		if (PTR_ERR(mnt) == -EISDIR && (flags & LOOKUP_PARENT))
+		if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
 			return -EREMOTE;
 		return PTR_ERR(mnt);
 	}
@@ -1046,7 +1065,7 @@ static int follow_automount(struct path *path, unsigned flags,
  *
  * Serialization is taken care of in namespace.c
  */
-static int follow_managed(struct path *path, unsigned flags)
+static int follow_managed(struct path *path, struct nameidata *nd)
 {
 	struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
 	unsigned managed;
@@ -1090,7 +1109,7 @@ static int follow_managed(struct path *path, unsigned flags)
 
 		/* Handle an automount point */
 		if (managed & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, flags, &need_mntput);
+			ret = follow_automount(path, nd, &need_mntput);
 			if (ret < 0)
 				break;
 			continue;
@@ -1475,7 +1494,7 @@ unlazy:
 
 	path->mnt = mnt;
 	path->dentry = dentry;
-	err = follow_managed(path, nd->flags);
+	err = follow_managed(path, nd);
 	if (unlikely(err < 0)) {
 		path_put_conditional(path, nd);
 		return err;
@@ -1505,7 +1524,7 @@ static int lookup_slow(struct nameidata *nd, struct path *path)
 		return PTR_ERR(dentry);
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
-	err = follow_managed(path, nd->flags);
+	err = follow_managed(path, nd);
 	if (unlikely(err < 0)) {
 		path_put_conditional(path, nd);
 		return err;
@@ -1621,7 +1640,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 {
 	int res;
 
-	if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
+	if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
 		path_put_conditional(path, nd);
 		path_put(&nd->path);
 		return -ELOOP;
@@ -1629,7 +1648,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
 
 	nd->depth++;
-	current->link_count++;
+	nd->link_count++;
 
 	do {
 		struct path link = *path;
@@ -1642,7 +1661,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 		put_link(nd, &link, cookie);
 	} while (res > 0);
 
-	current->link_count--;
+	nd->link_count--;
 	nd->depth--;
 	return res;
 }
@@ -1948,7 +1967,7 @@ static int path_init(int dfd, const char *name, unsigned int flags,
 	rcu_read_unlock();
 	return -ECHILD;
 done:
-	current->total_link_count = 0;
+	nd->total_link_count = 0;
 	return link_path_walk(name, nd);
 }
 
@@ -2027,7 +2046,10 @@ static int path_lookupat(int dfd, const char *name,
 static int filename_lookup(int dfd, struct filename *name,
 				unsigned int flags, struct nameidata *nd)
 {
-	int retval = path_lookupat(dfd, name->name, flags | LOOKUP_RCU, nd);
+	int retval;
+	struct nameidata *saved_nd = set_nameidata(nd);
+
+	retval = path_lookupat(dfd, name->name, flags | LOOKUP_RCU, nd);
 	if (unlikely(retval == -ECHILD))
 		retval = path_lookupat(dfd, name->name, flags, nd);
 	if (unlikely(retval == -ESTALE))
@@ -2036,6 +2058,7 @@ static int filename_lookup(int dfd, struct filename *name,
 
 	if (likely(!retval))
 		audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
+	set_nameidata(saved_nd);
 	return retval;
 }
 
@@ -2343,7 +2366,7 @@ out:
 static int
 path_mountpoint(int dfd, const char *name, struct path *path, unsigned int flags)
 {
-	struct nameidata nd;
+	struct nameidata nd, *saved = set_nameidata(&nd);
 	int err;
 
 	err = path_init(dfd, name, flags, &nd);
@@ -2366,6 +2389,7 @@ path_mountpoint(int dfd, const char *name, struct path *path, unsigned int flags
 	}
 out:
 	path_cleanup(&nd);
+	set_nameidata(saved);
 	return err;
 }
 
@@ -3028,7 +3052,7 @@ retry_lookup:
 	if ((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))
 		goto exit_dput;
 
-	error = follow_managed(path, nd->flags);
+	error = follow_managed(path, nd);
 	if (error < 0)
 		goto exit_dput;
 
@@ -3217,12 +3241,14 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 	struct path path;
 	int opened = 0;
 	int error;
+	struct nameidata *saved_nd;
 
 	file = get_empty_filp();
 	if (IS_ERR(file))
 		return file;
 
 	file->f_flags = op->open_flag;
+	saved_nd = set_nameidata(nd);
 
 	if (unlikely(file->f_flags & __O_TMPFILE)) {
 		error = do_tmpfile(dfd, pathname, nd, flags, op, file, &opened);
@@ -3269,6 +3295,7 @@ out:
 		}
 		file = ERR_PTR(error);
 	}
+	set_nameidata(saved_nd);
 	return file;
 }
 
@@ -4429,18 +4456,20 @@ EXPORT_SYMBOL(readlink_copy);
  */
 int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
-	struct nameidata nd;
+	struct nameidata nd, *saved = set_nameidata(&nd);
 	void *cookie;
 	int res;
 
 	nd.depth = 0;
 	cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
 	if (IS_ERR(cookie))
-		return PTR_ERR(cookie);
-
-	res = readlink_copy(buffer, buflen, nd_get_link(&nd));
-	if (dentry->d_inode->i_op->put_link)
-		dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+		res = PTR_ERR(cookie);
+	else {
+		res = readlink_copy(buffer, buflen, nd_get_link(&nd));
+		if (dentry->d_inode->i_op->put_link)
+			dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+	}
+	set_nameidata(saved);
 	return res;
 }
 EXPORT_SYMBOL(generic_readlink);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432e14ff..b88b9eea169a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1447,7 +1447,7 @@ struct task_struct {
 				       it with task_lock())
 				     - initialized normally by setup_new_exec */
 /* file system info */
-	int link_count, total_link_count;
+	struct nameidata *nameidata;
 #ifdef CONFIG_SYSVIPC
 /* ipc stuff */
 	struct sysv_sem sysvsem;

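After this patch the two ELOOP limits are enforced against the nameidata rather than task_struct: at most 40 links in total per lookup (follow_link() and follow_automount()), and at most MAX_NESTED_LINKS levels of nesting (nested_symlink()). The sketch below models just that accounting in userspace; the model_* names are hypothetical, and MAX_NESTED_LINKS is assumed to be 8, its value in include/linux/namei.h of this era.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_TOTAL_LINKS	40	/* literal limit used in the diff */
#define MODEL_MAX_NESTED_LINKS	8	/* assumed MAX_NESTED_LINKS */

struct model_nd {
	int link_count;		/* current symlink nesting depth */
	int total_link_count;	/* links followed so far in this lookup */
};

/* Check-then-increment, as in follow_link(). */
static int model_follow_one(struct model_nd *nd)
{
	if (nd->total_link_count >= MODEL_MAX_TOTAL_LINKS)
		return -ELOOP;
	nd->total_link_count++;
	return 0;
}

/* Entry check of nested_symlink(): refuse to nest any deeper. */
static int model_enter_nested(struct model_nd *nd)
{
	if (nd->link_count >= MODEL_MAX_NESTED_LINKS)
		return -ELOOP;
	nd->link_count++;
	return 0;
}

/* Matching decrement when nested_symlink() returns. */
static void model_leave_nested(struct model_nd *nd)
{
	nd->link_count--;
}
```

total_link_count only ever grows during a lookup (path_init() resets it), while link_count rises and falls with nesting depth.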

* [PATCH 05/20] VFS: replace nameidata arg to ->put_link with a char*.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (6 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 06/20] SECURITY: remove nameidata arg from inode_follow_link NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 09/20] security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags() NeilBrown
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

The only thing any ->put_link() function did with the
nameidata was to call nd_get_link() to get the link name.

So now just pass the link name directly.

This allows us to make nd_get_link() completely local to
namei.c.
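With the new prototype a ->put_link() implementation receives the link body directly and no longer needs nd_get_link(). Below is a userspace sketch of the kfree_put_link() pattern from fs/libfs.c under the new signature. The error-pointer helpers are local copies of the kernel's IS_ERR()/ERR_PTR() convention (top 4095 addresses encode -errno); the 'void *' dentry type, the free counter and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copies of the error-pointer convention. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Shape of the new hook: dentry, link body, cookie. */
typedef void (*model_put_link_t)(void *dentry, char *link, void *cookie);

static int model_frees;	/* lets the example check what was freed */

/* New-style put_link: 'link' is whatever follow_link stored, possibly
 * an error pointer, so only real buffers are freed. */
static void model_kfree_put_link(void *dentry, char *link, void *cookie)
{
	(void)dentry;
	(void)cookie;
	if (!model_is_err(link)) {
		free(link);
		model_frees++;
	}
}
```

The IS_ERR() check stays necessary because follow_link may have stored an error pointer as the link body.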

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 Documentation/filesystems/Locking             |    2 +-
 Documentation/filesystems/porting             |    5 +++++
 Documentation/filesystems/vfs.txt             |    4 ++--
 drivers/staging/lustre/lustre/llite/symlink.c |    2 +-
 fs/9p/v9fs.h                                  |    2 +-
 fs/9p/vfs_inode.c                             |    4 +---
 fs/configfs/symlink.c                         |    2 +-
 fs/fuse/dir.c                                 |    4 ++--
 fs/hostfs/hostfs_kern.c                       |    3 +--
 fs/hppfs/hppfs.c                              |    4 ++--
 fs/kernfs/symlink.c                           |    3 +--
 fs/libfs.c                                    |    4 +---
 fs/namei.c                                    |   13 +++++++------
 fs/overlayfs/inode.c                          |    4 ++--
 fs/proc/inode.c                               |    2 +-
 include/linux/fs.h                            |    6 +++---
 include/linux/namei.h                         |    1 -
 mm/shmem.c                                    |    4 ++--
 18 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index f91926f2f482..2e19f5f543b3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -51,7 +51,7 @@ prototypes:
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
 	void * (*follow_link) (struct dentry *, struct nameidata *);
-	void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void (*put_link) (struct dentry *, char *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int, unsigned int);
 	int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index fa2db081505e..088dd1fbba90 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -471,3 +471,8 @@ in your dentry operations instead.
 [mandatory]
 	f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
 	it entirely.
+--
+[mandatory]
+	->put_link now takes a 'char *' rather than a 'struct nameidata*'.
+	Instead of calling nd_get_link() on the latter, just use the former
+	directly.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 966b22829f3b..cb8f31bc2fec 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,8 +350,8 @@ struct inode_operations {
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-        void * (*follow_link) (struct dentry *, struct nameidata *);
-        void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void * (*follow_link) (struct dentry *, struct nameidata *);
+	void (*put_link) (struct dentry *, char *, void *);
 	int (*permission) (struct inode *, int);
 	int (*get_acl)(struct inode *, int);
 	int (*setattr) (struct dentry *, struct iattr *);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index ba37eb6b29dc..d2b4cd1399c3 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -158,7 +158,7 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return request;
 }
 
-static void ll_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void ll_put_link(struct dentry *dentry, char *link, void *cookie)
 {
 	ptlrpc_req_finished(cookie);
 }
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 099c7712631c..239307689a64 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -150,7 +150,7 @@ extern int v9fs_vfs_unlink(struct inode *i, struct dentry *d);
 extern int v9fs_vfs_rmdir(struct inode *i, struct dentry *d);
 extern int v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 			struct inode *new_dir, struct dentry *new_dentry);
-extern void v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd,
+extern void v9fs_vfs_put_link(struct dentry *dentry, char *link,
 			void *p);
 extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
 					 struct p9_fid *fid,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 3662f1d1d9cf..f39075956cdc 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1310,10 +1310,8 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
  */
 
 void
-v9fs_vfs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+v9fs_vfs_put_link(struct dentry *dentry, char *s, void *p)
 {
-	char *s = nd_get_link(nd);
-
 	p9_debug(P9_DEBUG_VFS, " %pd %s\n",
 		 dentry, IS_ERR(s) ? "<error>" : s);
 	if (!IS_ERR(s))
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index cc9f2546ea4a..e860ddb2bd61 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -296,7 +296,7 @@ static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void configfs_put_link(struct dentry *dentry, struct nameidata *nd,
+static void configfs_put_link(struct dentry *dentry, char *link,
 			      void *cookie)
 {
 	if (cookie) {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1545b711ddcf..3fd76f1afd4e 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1406,9 +1406,9 @@ static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void fuse_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void fuse_put_link(struct dentry *dentry, char *link, void *c)
 {
-	free_link(nd_get_link(nd));
+	free_link(link);
 }
 
 static int fuse_dir_open(struct inode *inode, struct file *file)
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index fd62cae0fdcb..b59cdb8a25d4 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -906,9 +906,8 @@ static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void hostfs_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void hostfs_put_link(struct dentry *dentry, char *s, void *cookie)
 {
-	char *s = nd_get_link(nd);
 	if (!IS_ERR(s))
 		__putname(s);
 }
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index 043ac9d77262..bcf70e5331f6 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -649,13 +649,13 @@ static void *hppfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return proc_dentry->d_inode->i_op->follow_link(proc_dentry, nd);
 }
 
-static void hppfs_put_link(struct dentry *dentry, struct nameidata *nd,
+static void hppfs_put_link(struct dentry *dentry, char *link,
 			   void *cookie)
 {
 	struct dentry *proc_dentry = HPPFS_I(dentry->d_inode)->proc_dentry;
 
 	if (proc_dentry->d_inode->i_op->put_link)
-		proc_dentry->d_inode->i_op->put_link(proc_dentry, nd, cookie);
+		proc_dentry->d_inode->i_op->put_link(proc_dentry, link, cookie);
 }
 
 static const struct inode_operations hppfs_dir_iops = {
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 8a198898e39a..2aa55cbce0c2 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -125,10 +125,9 @@ static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void kernfs_iop_put_link(struct dentry *dentry, struct nameidata *nd,
+static void kernfs_iop_put_link(struct dentry *dentry, char *page,
 				void *cookie)
 {
-	char *page = nd_get_link(nd);
 	if (!IS_ERR(page))
 		free_page((unsigned long)page);
 }
diff --git a/fs/libfs.c b/fs/libfs.c
index 0ab65122ee45..88cae4d39bd1 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1024,10 +1024,8 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 }
 EXPORT_SYMBOL(noop_fsync);
 
-void kfree_put_link(struct dentry *dentry, struct nameidata *nd,
-				void *cookie)
+void kfree_put_link(struct dentry *dentry, char *s, void *cookie)
 {
-	char *s = nd_get_link(nd);
 	if (!IS_ERR(s))
 		kfree(s);
 }
diff --git a/fs/namei.c b/fs/namei.c
index 53bead4f5bdf..7ad88ea8c609 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -739,17 +739,16 @@ void nd_set_link(struct nameidata *nd, char *path)
 }
 EXPORT_SYMBOL(nd_set_link);
 
-char *nd_get_link(struct nameidata *nd)
+static inline char *nd_get_link(struct nameidata *nd)
 {
 	return nd->saved_names[nd->depth];
 }
-EXPORT_SYMBOL(nd_get_link);
 
 static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
 {
 	struct inode *inode = link->dentry->d_inode;
 	if (inode->i_op->put_link)
-		inode->i_op->put_link(link->dentry, nd, cookie);
+		inode->i_op->put_link(link->dentry, nd_get_link(nd), cookie);
 	path_put(link);
 }
 
@@ -4465,9 +4464,11 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	if (IS_ERR(cookie))
 		res = PTR_ERR(cookie);
 	else {
-		res = readlink_copy(buffer, buflen, nd_get_link(&nd));
+		char *link = nd_get_link(&nd);
+
+		res = readlink_copy(buffer, buflen, link);
 		if (dentry->d_inode->i_op->put_link)
-			dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+			dentry->d_inode->i_op->put_link(dentry, link, cookie);
 	}
 	set_nameidata(saved);
 	return res;
@@ -4509,7 +4510,7 @@ void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 }
 EXPORT_SYMBOL(page_follow_link_light);
 
-void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+void page_put_link(struct dentry *dentry, char *link, void *cookie)
 {
 	struct page *page = cookie;
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1b4b9c5e51b7..f1abb51bf9ec 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -172,7 +172,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return data;
 }
 
-static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void ovl_put_link(struct dentry *dentry, char *link, void *c)
 {
 	struct inode *realinode;
 	struct ovl_link_data *data = c;
@@ -181,7 +181,7 @@ static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
 		return;
 
 	realinode = data->realdentry->d_inode;
-	realinode->i_op->put_link(data->realdentry, nd, data->cookie);
+	realinode->i_op->put_link(data->realdentry, link, data->cookie);
 	kfree(data);
 }
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 7697b6621cfd..a4a716e6d6b9 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -403,7 +403,7 @@ static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return pde;
 }
 
-static void proc_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+static void proc_put_link(struct dentry *dentry, char *link, void *p)
 {
 	unuse_pde(p);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b821fa32ba3f..510b749c4040 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1579,7 +1579,7 @@ struct inode_operations {
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
 	int (*readlink) (struct dentry *, char __user *,int);
-	void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void (*put_link) (struct dentry *, char *, void *);
 
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
 	int (*link) (struct dentry *,struct inode *,struct dentry *);
@@ -2650,12 +2650,12 @@ extern const struct file_operations generic_ro_fops;
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
 extern void *page_follow_link_light(struct dentry *, struct nameidata *);
-extern void page_put_link(struct dentry *, struct nameidata *, void *);
+extern void page_put_link(struct dentry *, char *, void *);
 extern int __page_symlink(struct inode *inode, const char *symname, int len,
 		int nofs);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
-extern void kfree_put_link(struct dentry *, struct nameidata *, void *);
+extern void kfree_put_link(struct dentry *, char *, void *);
 extern int generic_readlink(struct dentry *, char __user *, int);
 extern void generic_fillattr(struct inode *, struct kstat *);
 int vfs_getattr_nosec(struct path *path, struct kstat *stat);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index c8990779f0c3..2ec27f2457d6 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -72,7 +72,6 @@ extern void unlock_rename(struct dentry *, struct dentry *);
 
 extern void nd_jump_link(struct nameidata *nd, struct path *path);
 extern void nd_set_link(struct nameidata *nd, char *path);
-extern char *nd_get_link(struct nameidata *nd);
 
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index cf2d0ca010bc..53bf4d160e8b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2490,9 +2490,9 @@ static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return page;
 }
 
-static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void shmem_put_link(struct dentry *dentry, char *link, void *cookie)
 {
-	if (!IS_ERR(nd_get_link(nd))) {
+	if (!IS_ERR(link)) {
 		struct page *page = cookie;
 		kunmap(page);
 		mark_page_accessed(page);


* [PATCH 04/20] ovl: rearrange ovl_follow_link so it doesn't need to call ->put_link
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

ovl_follow_link currently calls ->put_link on an error path.
However, ->put_link is about to change in a way that will make it
impossible to call from ovl_follow_link.

So rearrange the code to avoid the need for that error path.
Specifically: move the kmalloc() call before the ->follow_link()
call to the subordinate filesystem.
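
A minimal userspace sketch of the "allocate before the fallible
call" ordering, for illustration only. Every name in it
(struct link_data, lower_follow(), lower_put(), wrapper_follow(),
wrapper_put()) is hypothetical and merely stands in for
ovl_link_data and the lower filesystem's ->follow_link/->put_link.
It also simplifies the real code, which only allocates when the
lower inode has a ->put_link.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper state, modelled loosely on struct ovl_link_data. */
struct link_data {
	void *cookie;	/* cookie returned by the lower "follow" step */
};

static int lower_should_fail;	/* set to simulate a lower-layer error */
static int lower_puts;		/* how many times the lower "put" ran */
static char lower_buf[] = "target";

/* Hypothetical lower-layer follow: returns a cookie or sets *err. */
static void *lower_follow(int *err)
{
	if (lower_should_fail) {
		*err = -EIO;
		return NULL;
	}
	*err = 0;
	return lower_buf;
}

/* Hypothetical lower-layer put: releases what lower_follow() returned. */
static void lower_put(void *cookie)
{
	(void)cookie;
	lower_puts++;
}

/*
 * Allocate the wrapper state *before* calling the lower follow step.
 * If the lower step fails, only our own allocation has to be undone,
 * so lower_put() is never needed on the error path.
 */
static struct link_data *wrapper_follow(int *err)
{
	struct link_data *data = malloc(sizeof(*data));
	void *ret;

	if (!data) {
		*err = -ENOMEM;
		return NULL;
	}
	ret = lower_follow(err);
	if (*err) {
		free(data);	/* undo our own step only */
		return NULL;
	}
	data->cookie = ret;
	return data;
}

/* Normal release path: the lower put runs exactly once, then we free. */
static void wrapper_put(struct link_data *data)
{
	lower_put(data->cookie);
	free(data);
}
```

With this ordering the failure path touches nothing but the wrapper's
own allocation, which is why the lower ->put_link call can drop out of
the error handling entirely.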

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/overlayfs/inode.c |   25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 04f124884687..1b4b9c5e51b7 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -145,6 +145,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	void *ret;
 	struct dentry *realdentry;
 	struct inode *realinode;
+	struct ovl_link_data *data = NULL;
 
 	realdentry = ovl_dentry_real(dentry);
 	realinode = realdentry->d_inode;
@@ -152,25 +153,23 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (WARN_ON(!realinode->i_op->follow_link))
 		return ERR_PTR(-EPERM);
 
-	ret = realinode->i_op->follow_link(realdentry, nd);
-	if (IS_ERR(ret))
-		return ret;
-
 	if (realinode->i_op->put_link) {
-		struct ovl_link_data *data;
-
 		data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
-		if (!data) {
-			realinode->i_op->put_link(realdentry, nd, ret);
+		if (!data)
 			return ERR_PTR(-ENOMEM);
-		}
 		data->realdentry = realdentry;
-		data->cookie = ret;
+	}
 
-		return data;
-	} else {
-		return NULL;
+	ret = realinode->i_op->follow_link(realdentry, nd);
+	if (IS_ERR(ret)) {
+		kfree(data);
+		return ret;
 	}
+
+	if (data)
+		data->cookie = ret;
+
+	return data;
 }
 
 static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)


* [PATCH 06/20] SECURITY: remove nameidata arg from inode_follow_link.
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

No ->inode_follow_link() method uses the nameidata arg, and
nameidata is about to become private to namei.c, so remove the
arg from all inode_follow_link() functions.
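
A rough userspace sketch of the resulting hook shape, for
illustration only. The structures are toy stand-ins (struct inode
and struct dentry here are not the kernel's), and toy_security_ops,
toy_inode_follow_link() and toy_security_inode_follow_link() are
hypothetical names. It mirrors the dispatch in
security_inode_follow_link(): private inodes bypass the hook, and
the hook itself needs nothing beyond the dentry.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct inode {
	bool is_private;	/* plays the role of IS_PRIVATE(inode) */
	bool may_follow;	/* toy policy bit consulted by the hook */
};

struct dentry {
	struct inode *d_inode;
};

/* After this change the hook receives only the dentry. */
struct toy_security_ops {
	int (*inode_follow_link)(struct dentry *dentry);
};

/* A toy LSM hook: everything it needs is reachable from the dentry. */
static int toy_inode_follow_link(struct dentry *dentry)
{
	return dentry->d_inode->may_follow ? 0 : -EACCES;
}

static const struct toy_security_ops toy_ops = {
	.inode_follow_link = toy_inode_follow_link,
};

/* Dispatch wrapper shaped like security_inode_follow_link(). */
static int toy_security_inode_follow_link(struct dentry *dentry)
{
	if (dentry->d_inode->is_private)
		return 0;
	return toy_ops.inode_follow_link(dentry);
}
```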

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c               |    2 +-
 include/linux/security.h |    9 +++------
 security/capability.c    |    3 +--
 security/security.c      |    4 ++--
 security/selinux/hooks.c |    2 +-
 5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7ad88ea8c609..32f418f96d9b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -890,7 +890,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	touch_atime(link);
 	nd_set_link(nd, NULL);
 
-	error = security_inode_follow_link(link->dentry, nd);
+	error = security_inode_follow_link(link->dentry);
 	if (error)
 		goto out_put_nd_path;
 
diff --git a/include/linux/security.h b/include/linux/security.h
index a1b7dbd127ff..237d22bfc642 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -43,7 +43,6 @@ struct file;
 struct vfsmount;
 struct path;
 struct qstr;
-struct nameidata;
 struct iattr;
 struct fown_struct;
 struct file_operations;
@@ -477,7 +476,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  * @inode_follow_link:
  *	Check permission to follow a symbolic link when looking up a pathname.
  *	@dentry contains the dentry structure for the link.
- *	@nd contains the nameidata structure for the parent directory.
  *	Return 0 if permission is granted.
  * @inode_permission:
  *	Check permission before accessing an inode.  This hook is called by the
@@ -1553,7 +1551,7 @@ struct security_operations {
 	int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
 			     struct inode *new_dir, struct dentry *new_dentry);
 	int (*inode_readlink) (struct dentry *dentry);
-	int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
+	int (*inode_follow_link) (struct dentry *dentry);
 	int (*inode_permission) (struct inode *inode, int mask);
 	int (*inode_setattr)	(struct dentry *dentry, struct iattr *attr);
 	int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry);
@@ -1840,7 +1838,7 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
 			  struct inode *new_dir, struct dentry *new_dentry,
 			  unsigned int flags);
 int security_inode_readlink(struct dentry *dentry);
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
+int security_inode_follow_link(struct dentry *dentry);
 int security_inode_permission(struct inode *inode, int mask);
 int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
 int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry);
@@ -2242,8 +2240,7 @@ static inline int security_inode_readlink(struct dentry *dentry)
 	return 0;
 }
 
-static inline int security_inode_follow_link(struct dentry *dentry,
-					      struct nameidata *nd)
+static inline int security_inode_follow_link(struct dentry *dentry)
 {
 	return 0;
 }
diff --git a/security/capability.c b/security/capability.c
index 070dd46f62f4..ad8557782e73 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -209,8 +209,7 @@ static int cap_inode_readlink(struct dentry *dentry)
 	return 0;
 }
 
-static int cap_inode_follow_link(struct dentry *dentry,
-				 struct nameidata *nameidata)
+static int cap_inode_follow_link(struct dentry *dentry)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index e81d5bbe7363..7b4fd199e881 100644
--- a/security/security.c
+++ b/security/security.c
@@ -581,11 +581,11 @@ int security_inode_readlink(struct dentry *dentry)
 	return security_ops->inode_readlink(dentry);
 }
 
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd)
+int security_inode_follow_link(struct dentry *dentry)
 {
 	if (unlikely(IS_PRIVATE(dentry->d_inode)))
 		return 0;
-	return security_ops->inode_follow_link(dentry, nd);
+	return security_ops->inode_follow_link(dentry);
 }
 
 int security_inode_permission(struct inode *inode, int mask)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4d1a54190388..a2c29efcacc9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2862,7 +2862,7 @@ static int selinux_inode_readlink(struct dentry *dentry)
 	return dentry_has_perm(cred, dentry, FILE__READ);
 }
 
-static int selinux_inode_follow_link(struct dentry *dentry, struct nameidata *nameidata)
+static int selinux_inode_follow_link(struct dentry *dentry)
 {
 	const struct cred *cred = current_cred();
 


* [PATCH 07/20] VFS: remove nameidata args from ->follow_link
From: NeilBrown @ 2015-03-23  2:37 UTC
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Now that current->nameidata is available, nd_set_link() can
use that directly, so 'nd' doesn't need to be passed through
->follow_link.

As a result of this change, 'nameidata' is almost entirely
local to namei.c.  It is only exposed externally as an opaque struct
pointed to by current->nameidata.
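
A minimal userspace analogy of the "state hangs off current" idea,
for illustration only. A C11 _Thread_local pointer stands in for
current->nameidata, and struct walk_state, walk_set_link(),
example_follow_link() and walk_follow() are hypothetical names,
not kernel APIs. The walker saves and restores the previous pointer
around its call-out, much as generic_readlink() does with
set_nameidata(saved), so the callback can record its result without
being handed the walk state.

```c
#include <stddef.h>

/* Hypothetical per-walk state, standing in for struct nameidata. */
struct walk_state {
	const char *saved_link;
};

/* Hypothetical stand-in for current->nameidata: one pointer per thread. */
static _Thread_local struct walk_state *current_walk;

/* Callback-side helper with no state argument, like nd_set_link(link). */
static void walk_set_link(const char *link)
{
	current_walk->saved_link = link;
}

/* A "filesystem" callback with the narrowed signature (a string stands
 * in for the dentry here). */
static void *example_follow_link(const char *target)
{
	walk_set_link(target);
	return NULL;
}

/* The walker installs its own state before calling out and restores the
 * previous pointer afterwards, so an outer walk is left undisturbed. */
static const char *walk_follow(const char *target)
{
	struct walk_state st = { NULL };
	struct walk_state *saved = current_walk;

	current_walk = &st;
	example_follow_link(target);
	current_walk = saved;
	return st.saved_link;
}
```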

Signed-off-by: NeilBrown <neilb@suse.de>
---
 Documentation/filesystems/Locking             |    2 +-
 Documentation/filesystems/porting             |    5 +++++
 Documentation/filesystems/vfs.txt             |    2 +-
 drivers/staging/lustre/lustre/llite/symlink.c |    4 ++--
 fs/9p/vfs_inode.c                             |    6 ++----
 fs/9p/vfs_inode_dotl.c                        |    5 ++---
 fs/autofs4/symlink.c                          |    4 ++--
 fs/befs/linuxvfs.c                            |   12 ++++++------
 fs/ceph/inode.c                               |    4 ++--
 fs/cifs/cifsfs.h                              |    2 +-
 fs/cifs/link.c                                |    4 ++--
 fs/configfs/symlink.c                         |    6 +++---
 fs/debugfs/file.c                             |    4 ++--
 fs/ecryptfs/inode.c                           |    6 ++----
 fs/exofs/symlink.c                            |    4 ++--
 fs/ext2/symlink.c                             |    4 ++--
 fs/ext3/symlink.c                             |    4 ++--
 fs/ext4/symlink.c                             |    4 ++--
 fs/freevxfs/vxfs_immed.c                      |    7 +++----
 fs/fuse/dir.c                                 |    4 ++--
 fs/gfs2/inode.c                               |    7 +++----
 fs/hostfs/hostfs_kern.c                       |    4 ++--
 fs/hppfs/hppfs.c                              |    4 ++--
 fs/jffs2/symlink.c                            |    6 +++---
 fs/jfs/symlink.c                              |    4 ++--
 fs/kernfs/symlink.c                           |    4 ++--
 fs/namei.c                                    |   18 +++++++++++-------
 fs/nfs/symlink.c                              |    6 +++---
 fs/ntfs/namei.c                               |    1 -
 fs/overlayfs/inode.c                          |    4 ++--
 fs/proc/base.c                                |    4 ++--
 fs/proc/inode.c                               |    4 ++--
 fs/proc/namespaces.c                          |    4 ++--
 fs/proc/self.c                                |    4 ++--
 fs/proc/thread_self.c                         |    4 ++--
 fs/sysv/symlink.c                             |    4 ++--
 fs/ubifs/file.c                               |    4 ++--
 fs/ufs/symlink.c                              |    4 ++--
 fs/xfs/xfs_iops.c                             |    7 +++----
 include/linux/fs.h                            |    5 ++---
 include/linux/namei.h                         |    5 ++---
 include/linux/sched.h                         |    1 +
 mm/shmem.c                                    |    8 ++++----
 43 files changed, 104 insertions(+), 105 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 2e19f5f543b3..bbce4914d209 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,7 +50,7 @@ prototypes:
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
+	void * (*follow_link) (struct dentry *);
 	void (*put_link) (struct dentry *, char *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int, unsigned int);
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 088dd1fbba90..9996b4631a87 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -476,3 +476,8 @@ in your dentry operations instead.
 	->put_link now takes a 'char *' rather than a 'struct nameidata*'.
 	Instead of calling nd_get_link() on the later, just use the former
 	directly.
+--
+[mandatory]
+	->follow_link() no longer receives 'struct nameidata *'.
+	The nd is now attached to 'current' and nd_set_link()
+	accesses it directly.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index cb8f31bc2fec..11aac530931b 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,7 +350,7 @@ struct inode_operations {
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
+        void * (*follow_link) (struct dentry *);
 	void (*put_link) (struct dentry *, char *, void *);
 	int (*permission) (struct inode *, int);
 	int (*get_acl)(struct inode *, int);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index d2b4cd1399c3..63dd1a925c92 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,7 +118,7 @@ failed:
 	return rc;
 }
 
-static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ll_follow_link(struct dentry *dentry)
 {
 	unsigned long avail_space;
 	struct inode *inode = dentry->d_inode;
@@ -151,7 +151,7 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 		symname = ERR_PTR(rc);
 	}
 
-	nd_set_link(nd, symname);
+	nd_set_link(symname);
 	/* symname may contain a pointer to the request message buffer,
 	 * we delay request releasing until ll_put_link then.
 	 */
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index f39075956cdc..ebf50c3e132c 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1274,11 +1274,10 @@ done:
 /**
  * v9fs_vfs_follow_link - follow a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
  *
  */
 
-static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *v9fs_vfs_follow_link(struct dentry *dentry)
 {
 	int len = 0;
 	char *link = __getname();
@@ -1296,7 +1295,7 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		} else
 			link[min(len, PATH_MAX-1)] = 0;
 	}
-	nd_set_link(nd, link);
+	nd_set_link(link);
 
 	return NULL;
 }
@@ -1304,7 +1303,6 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 /**
  * v9fs_vfs_put_link - release a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
  * @p: unused
  *
  */
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 6054c16b8fae..dc35156aea6a 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -905,12 +905,11 @@ error:
 /**
  * v9fs_vfs_follow_link_dotl - follow a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
  *
  */
 
 static void *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
+v9fs_vfs_follow_link_dotl(struct dentry *dentry)
 {
 	int retval;
 	struct p9_fid *fid;
@@ -938,7 +937,7 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
 	__putname(link);
 	link = ERR_PTR(retval);
 ndset:
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index 1e8ea192be2b..37b4b561faa3 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,13 +12,13 @@
 
 #include "autofs_i.h"
 
-static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *autofs4_follow_link(struct dentry *dentry)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	if (ino && !autofs4_oz_mode(sbi))
 		ino->last_used = jiffies;
-	nd_set_link(nd, dentry->d_inode->i_private);
+	nd_set_link(dentry->d_inode->i_private);
 	return NULL;
 }
 
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index e089f1985fca..339ac02c0e17 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,8 +42,8 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
 static struct inode *befs_alloc_inode(struct super_block *sb);
 static void befs_destroy_inode(struct inode *inode);
 static void befs_destroy_inodecache(void);
-static void *befs_follow_link(struct dentry *, struct nameidata *);
-static void *befs_fast_follow_link(struct dentry *, struct nameidata *);
+static void *befs_follow_link(struct dentry *);
+static void *befs_fast_follow_link(struct dentry *);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
 			char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -469,7 +469,7 @@ befs_destroy_inodecache(void)
  * flag is set.
  */
 static void *
-befs_follow_link(struct dentry *dentry, struct nameidata *nd)
+befs_follow_link(struct dentry *dentry)
 {
 	struct super_block *sb = dentry->d_sb;
 	befs_inode_info *befs_ino = BEFS_I(dentry->d_inode);
@@ -494,16 +494,16 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
 			link[len - 1] = '\0';
 		}
 	}
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
 
 static void *
-befs_fast_follow_link(struct dentry *dentry, struct nameidata *nd)
+befs_fast_follow_link(struct dentry *dentry)
 {
 	befs_inode_info *befs_ino = BEFS_I(dentry->d_inode);
-	nd_set_link(nd, befs_ino->i_data.symlink);
+	nd_set_link(befs_ino->i_data.symlink);
 	return NULL;
 }
 
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 119c43c80638..f8212d6945de 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1691,10 +1691,10 @@ retry:
 /*
  * symlinks
  */
-static void *ceph_sym_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ceph_sym_follow_link(struct dentry *dentry)
 {
 	struct ceph_inode_info *ci = ceph_inode(dentry->d_inode);
-	nd_set_link(nd, ci->i_symlink);
+	nd_set_link(ci->i_symlink);
 	return NULL;
 }
 
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 252f5c15806b..e3a6ef52a3e4 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
 #endif
 
 /* Functions related to symlinks */
-extern void *cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
+extern void *cifs_follow_link(struct dentry *direntry);
 extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
 			 int buflen);
 extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 2ec6037f61c7..ba3562198c33 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -627,7 +627,7 @@ cifs_hl_exit:
 }
 
 void *
-cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
+cifs_follow_link(struct dentry *direntry)
 {
 	struct inode *inode = direntry->d_inode;
 	int rc = -ENOMEM;
@@ -679,7 +679,7 @@ out:
 	free_xid(xid);
 	if (tlink)
 		cifs_put_tlink(tlink);
-	nd_set_link(nd, target_path);
+	nd_set_link(target_path);
 	return NULL;
 }
 
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index e860ddb2bd61..ff41712ffddd 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,7 +279,7 @@ static int configfs_getlink(struct dentry *dentry, char * path)
 
 }
 
-static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *configfs_follow_link(struct dentry *dentry)
 {
 	int error = -ENOMEM;
 	unsigned long page = get_zeroed_page(GFP_KERNEL);
@@ -287,12 +287,12 @@ static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (page) {
 		error = configfs_getlink(dentry, (char *)page);
 		if (!error) {
-			nd_set_link(nd, (char *)page);
+			nd_set_link((char *)page);
 			return (void *)page;
 		}
 	}
 
-	nd_set_link(nd, ERR_PTR(error));
+	nd_set_link(ERR_PTR(error));
 	return NULL;
 }
 
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 517e64938438..eeed1f1fed4f 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -43,9 +43,9 @@ const struct file_operations debugfs_file_operations = {
 	.llseek =	noop_llseek,
 };
 
-static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *debugfs_follow_link(struct dentry *dentry)
 {
-	nd_set_link(nd, dentry->d_inode->i_private);
+	nd_set_link(dentry->d_inode->i_private);
 	return NULL;
 }
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index b08b5187f662..680cf30e9135 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -170,7 +170,6 @@ out_unlock:
  * @directory_inode: inode of the new file's dentry's parent in ecryptfs
  * @ecryptfs_dentry: New file's dentry in ecryptfs
  * @mode: The mode of the new file
- * @nd: nameidata of ecryptfs' parent's dentry & vfsmount
  *
  * Creates the underlying file and the eCryptfs inode which will link to
  * it. It will also update the eCryptfs directory inode to mimic the
@@ -384,7 +383,6 @@ static int ecryptfs_lookup_interpose(struct dentry *dentry,
  * ecryptfs_lookup
  * @ecryptfs_dir_inode: The eCryptfs directory inode
  * @ecryptfs_dentry: The eCryptfs dentry that we are looking up
- * @ecryptfs_nd: nameidata; may be NULL
  *
  * Find a file on disk. If the file does not exist, then we'll add it to the
  * dentry cache and continue on to read it from the disk.
@@ -675,7 +673,7 @@ out:
 	return rc ? ERR_PTR(rc) : buf;
 }
 
-static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ecryptfs_follow_link(struct dentry *dentry)
 {
 	size_t len;
 	char *buf = ecryptfs_readlink_lower(dentry, &len);
@@ -685,7 +683,7 @@ static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 				ecryptfs_dentry_to_lower(dentry)->d_inode);
 	buf[len] = '\0';
 out:
-	nd_set_link(nd, buf);
+	nd_set_link(buf);
 	return NULL;
 }
 
diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
index 832e2624b80b..e6d0467a0b5a 100644
--- a/fs/exofs/symlink.c
+++ b/fs/exofs/symlink.c
@@ -35,11 +35,11 @@
 
 #include "exofs.h"
 
-static void *exofs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *exofs_follow_link(struct dentry *dentry)
 {
 	struct exofs_i_info *oi = exofs_i(dentry->d_inode);
 
-	nd_set_link(nd, (char *)oi->i_data);
+	nd_set_link((char *)oi->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 565cf817bbf1..063852432bd4 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -21,10 +21,10 @@
 #include "xattr.h"
 #include <linux/namei.h>
 
-static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext2_follow_link(struct dentry *dentry)
 {
 	struct ext2_inode_info *ei = EXT2_I(dentry->d_inode);
-	nd_set_link(nd, (char *)ei->i_data);
+	nd_set_link((char *)ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
index 6b01c3eab1f3..bf8acd9efaae 100644
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -21,10 +21,10 @@
 #include "ext3.h"
 #include "xattr.h"
 
-static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void * ext3_follow_link(struct dentry *dentry)
 {
 	struct ext3_inode_info *ei = EXT3_I(dentry->d_inode);
-	nd_set_link(nd, (char*)ei->i_data);
+	nd_set_link((char*)ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index ff3711932018..0015e7f53d0f 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,10 +23,10 @@
 #include "ext4.h"
 #include "xattr.h"
 
-static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext4_follow_link(struct dentry *dentry)
 {
 	struct ext4_inode_info *ei = EXT4_I(dentry->d_inode);
-	nd_set_link(nd, (char *) ei->i_data);
+	nd_set_link((char *) ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index c36aeaf92e41..058acefeb11c 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -39,7 +39,7 @@
 #include "vxfs_inode.h"
 
 
-static void *	vxfs_immed_follow_link(struct dentry *, struct nameidata *);
+static void *	vxfs_immed_follow_link(struct dentry *);
 
 static int	vxfs_immed_readpage(struct file *, struct page *);
 
@@ -64,7 +64,6 @@ const struct address_space_operations vxfs_immed_aops = {
 /**
  * vxfs_immed_follow_link - follow immed symlink
  * @dp:		dentry for the link
- * @np:		pathname lookup data for the current path walk
  *
  * Description:
  *   vxfs_immed_follow_link restarts the pathname lookup with
@@ -74,10 +73,10 @@ const struct address_space_operations vxfs_immed_aops = {
  *   Zero on success, else a negative error code.
  */
 static void *
-vxfs_immed_follow_link(struct dentry *dp, struct nameidata *np)
+vxfs_immed_follow_link(struct dentry *dp)
 {
 	struct vxfs_inode_info		*vip = VXFS_INO(dp->d_inode);
-	nd_set_link(np, vip->vii_immed.vi_immed);
+	nd_set_link(vip->vii_immed.vi_immed);
 	return NULL;
 }
 
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 3fd76f1afd4e..58e632be862b 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1400,9 +1400,9 @@ static void free_link(char *link)
 		free_page((unsigned long) link);
 }
 
-static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *fuse_follow_link(struct dentry *dentry)
 {
-	nd_set_link(nd, read_link(dentry));
+	nd_set_link(read_link(dentry));
 	return NULL;
 }
 
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 73c72253faac..5b28009ea860 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1541,14 +1541,13 @@ out:
 /**
  * gfs2_follow_link - Follow a symbolic link
  * @dentry: The dentry of the link
- * @nd: Data that we pass to vfs_follow_link()
  *
  * This can handle symlinks of any size.
  *
  * Returns: 0 on success or error code
  */
 
-static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *gfs2_follow_link(struct dentry *dentry)
 {
 	struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
 	struct gfs2_holder i_gh;
@@ -1561,7 +1560,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	error = gfs2_glock_nq(&i_gh);
 	if (error) {
 		gfs2_holder_uninit(&i_gh);
-		nd_set_link(nd, ERR_PTR(error));
+		nd_set_link(ERR_PTR(error));
 		return NULL;
 	}
 
@@ -1586,7 +1585,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	brelse(dibh);
 out:
 	gfs2_glock_dq_uninit(&i_gh);
-	nd_set_link(nd, buf);
+	nd_set_link(buf);
 	return NULL;
 }
 
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index b59cdb8a25d4..a862634bf98e 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -882,7 +882,7 @@ static const struct inode_operations hostfs_dir_iops = {
 	.setattr	= hostfs_setattr,
 };
 
-static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *hostfs_follow_link(struct dentry *dentry)
 {
 	char *link = __getname();
 	if (link) {
@@ -902,7 +902,7 @@ static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		link = ERR_PTR(-ENOMEM);
 	}
 
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index bcf70e5331f6..bf13cf24eec7 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,11 +642,11 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
 						    buflen);
 }
 
-static void *hppfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *hppfs_follow_link(struct dentry *dentry)
 {
 	struct dentry *proc_dentry = HPPFS_I(dentry->d_inode)->proc_dentry;
 
-	return proc_dentry->d_inode->i_op->follow_link(proc_dentry, nd);
+	return proc_dentry->d_inode->i_op->follow_link(proc_dentry);
 }
 
 static void hppfs_put_link(struct dentry *dentry, char *link,
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index c7c77b0dfccd..6b58d7659fbd 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -16,7 +16,7 @@
 #include <linux/namei.h>
 #include "nodelist.h"
 
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd);
+static void *jffs2_follow_link(struct dentry *dentry);
 
 const struct inode_operations jffs2_symlink_inode_operations =
 {
@@ -29,7 +29,7 @@ const struct inode_operations jffs2_symlink_inode_operations =
 	.removexattr =	jffs2_removexattr
 };
 
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *jffs2_follow_link(struct dentry *dentry)
 {
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(dentry->d_inode);
 	char *p = (char *)f->target;
@@ -54,7 +54,7 @@ static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	jffs2_dbg(1, "%s(): target path is '%s'\n",
 		  __func__, (char *)f->target);
 
-	nd_set_link(nd, p);
+	nd_set_link(p);
 
 	/*
 	 * We will unlock the f->sem mutex but VFS will use the f->target string. This is safe
diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
index 205b946d8e0d..cefda45b8d3a 100644
--- a/fs/jfs/symlink.c
+++ b/fs/jfs/symlink.c
@@ -22,10 +22,10 @@
 #include "jfs_inode.h"
 #include "jfs_xattr.h"
 
-static void *jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *jfs_follow_link(struct dentry *dentry)
 {
 	char *s = JFS_IP(dentry->d_inode)->i_inline;
-	nd_set_link(nd, s);
+	nd_set_link(s);
 	return NULL;
 }
 
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 2aa55cbce0c2..63d08ecbb72c 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,7 +112,7 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
 	return error;
 }
 
-static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *kernfs_iop_follow_link(struct dentry *dentry)
 {
 	int error = -ENOMEM;
 	unsigned long page = get_zeroed_page(GFP_KERNEL);
@@ -121,7 +121,7 @@ static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
 		if (error < 0)
 			free_page((unsigned long)page);
 	}
-	nd_set_link(nd, error ? ERR_PTR(error) : (char *)page);
+	nd_set_link(error ? ERR_PTR(error) : (char *)page);
 	return NULL;
 }
 
diff --git a/fs/namei.c b/fs/namei.c
index 32f418f96d9b..e7fab6886e29 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -724,8 +724,10 @@ static inline void path_to_nameidata(const struct path *path,
  * Helper to directly jump to a known parsed path from ->follow_link,
  * caller must have taken a reference to path beforehand.
  */
-void nd_jump_link(struct nameidata *nd, struct path *path)
+void nd_jump_link(struct path *path)
 {
+	struct nameidata *nd = current->nameidata;
+
 	path_put(&nd->path);
 
 	nd->path = *path;
@@ -733,8 +735,10 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
 	nd->flags |= LOOKUP_JUMPED;
 }
 
-void nd_set_link(struct nameidata *nd, char *path)
+void nd_set_link(char *path)
 {
+	struct nameidata *nd = current->nameidata;
+
 	nd->saved_names[nd->depth] = path;
 }
 EXPORT_SYMBOL(nd_set_link);
@@ -888,14 +892,14 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	nd->total_link_count++;
 
 	touch_atime(link);
-	nd_set_link(nd, NULL);
+	nd_set_link(NULL);
 
 	error = security_inode_follow_link(link->dentry);
 	if (error)
 		goto out_put_nd_path;
 
 	nd->last_type = LAST_BIND;
-	*p = dentry->d_inode->i_op->follow_link(dentry, nd);
+	*p = dentry->d_inode->i_op->follow_link(dentry);
 	error = PTR_ERR(*p);
 	if (IS_ERR(*p))
 		goto out_put_nd_path;
@@ -4460,7 +4464,7 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	int res;
 
 	nd.depth = 0;
-	cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
+	cookie = dentry->d_inode->i_op->follow_link(dentry);
 	if (IS_ERR(cookie))
 		res = PTR_ERR(cookie);
 	else {
@@ -4502,10 +4506,10 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 }
 EXPORT_SYMBOL(page_readlink);
 
-void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
+void *page_follow_link_light(struct dentry *dentry)
 {
 	struct page *page = NULL;
-	nd_set_link(nd, page_getlink(dentry, &page));
+	nd_set_link(page_getlink(dentry, &page));
 	return page;
 }
 EXPORT_SYMBOL(page_follow_link_light);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 05c9e02f4153..f3c44616b615 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -43,7 +43,7 @@ error:
 	return -EIO;
 }
 
-static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *nfs_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 	struct page *page;
@@ -58,11 +58,11 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		err = page;
 		goto read_failed;
 	}
-	nd_set_link(nd, kmap(page));
+	nd_set_link(kmap(page));
 	return page;
 
 read_failed:
-	nd_set_link(nd, err);
+	nd_set_link(err);
 	return NULL;
 }
 
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index b3973c2fd190..a6a240ecf878 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -35,7 +35,6 @@
  * ntfs_lookup - find the inode represented by a dentry in a directory inode
  * @dir_ino:	directory inode in which to look for the inode
  * @dent:	dentry representing the inode to look for
- * @nd:		lookup nameidata
  *
  * In short, ntfs_lookup() looks for the inode represented by the dentry @dent
  * in the directory inode @dir_ino and if found attaches the inode to the
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index f1abb51bf9ec..0de7b87bd025 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,7 +140,7 @@ struct ovl_link_data {
 	void *cookie;
 };
 
-static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ovl_follow_link(struct dentry *dentry)
 {
 	void *ret;
 	struct dentry *realdentry;
@@ -160,7 +160,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 		data->realdentry = realdentry;
 	}
 
-	ret = realinode->i_op->follow_link(realdentry, nd);
+	ret = realinode->i_op->follow_link(realdentry);
 	if (IS_ERR(ret)) {
 		kfree(data);
 		return ret;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3f3d7aeb0712..a0c0b85aead3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1371,7 +1371,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 		return -ENOENT;
 }
 
-static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_pid_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 	struct path path;
@@ -1385,7 +1385,7 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (error)
 		goto out;
 
-	nd_jump_link(nd, &path);
+	nd_jump_link(&path);
 	return NULL;
 out:
 	return ERR_PTR(error);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index a4a716e6d6b9..7bdaf1040f98 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -394,12 +394,12 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
 };
 #endif
 
-static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_follow_link(struct dentry *dentry)
 {
 	struct proc_dir_entry *pde = PDE(dentry->d_inode);
 	if (unlikely(!use_pde(pde)))
 		return ERR_PTR(-EINVAL);
-	nd_set_link(nd, pde->data);
+	nd_set_link(pde->data);
 	return pde;
 }
 
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index c9eac4563fa8..5e3394509c2c 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
 	&mntns_operations,
 };
 
-static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_ns_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 	const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
@@ -45,7 +45,7 @@ static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (ptrace_may_access(task, PTRACE_MODE_READ)) {
 		error = ns_get_path(&ns_path, task, ns_ops);
 		if (!error)
-			nd_jump_link(nd, &ns_path);
+			nd_jump_link(&ns_path);
 	}
 	put_task_struct(task);
 	return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 4348bb8907c2..639bd0afdc05 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -19,7 +19,7 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_self_follow_link(struct dentry *dentry)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
@@ -32,7 +32,7 @@ static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
 		else
 			sprintf(name, "%d", tgid);
 	}
-	nd_set_link(nd, name);
+	nd_set_link(name);
 	return NULL;
 }
 
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 59075b509df3..2036b051f53f 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -20,7 +20,7 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_thread_self_follow_link(struct dentry *dentry)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
@@ -33,7 +33,7 @@ static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidat
 		else
 			sprintf(name, "%d/task/%d", tgid, pid);
 	}
-	nd_set_link(nd, name);
+	nd_set_link(name);
 	return NULL;
 }
 
diff --git a/fs/sysv/symlink.c b/fs/sysv/symlink.c
index 00d2f8a43e4e..3f8154e6b27e 100644
--- a/fs/sysv/symlink.c
+++ b/fs/sysv/symlink.c
@@ -8,9 +8,9 @@
 #include "sysv.h"
 #include <linux/namei.h>
 
-static void *sysv_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *sysv_follow_link(struct dentry *dentry)
 {
-	nd_set_link(nd, (char *)SYSV_I(dentry->d_inode)->i_data);
+	nd_set_link((char *)SYSV_I(dentry->d_inode)->i_data);
 	return NULL;
 }
 
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index e627c0acf626..082958d62096 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1300,11 +1300,11 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
 	ClearPageChecked(page);
 }
 
-static void *ubifs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ubifs_follow_link(struct dentry *dentry)
 {
 	struct ubifs_inode *ui = ubifs_inode(dentry->d_inode);
 
-	nd_set_link(nd, ui->data);
+	nd_set_link(ui->data);
 	return NULL;
 }
 
diff --git a/fs/ufs/symlink.c b/fs/ufs/symlink.c
index d283628b4778..a8266ff60b0f 100644
--- a/fs/ufs/symlink.c
+++ b/fs/ufs/symlink.c
@@ -32,10 +32,10 @@
 #include "ufs.h"
 
 
-static void *ufs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ufs_follow_link(struct dentry *dentry)
 {
 	struct ufs_inode_info *p = UFS_I(dentry->d_inode);
-	nd_set_link(nd, (char*)p->i_u1.i_symlink);
+	nd_set_link((char*)p->i_u1.i_symlink);
 	return NULL;
 }
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index e53a90331422..ac915d09de29 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -411,8 +411,7 @@ xfs_vn_rename(
  */
 STATIC void *
 xfs_vn_follow_link(
-	struct dentry		*dentry,
-	struct nameidata	*nd)
+	struct dentry		*dentry)
 {
 	char			*link;
 	int			error = -ENOMEM;
@@ -425,13 +424,13 @@ xfs_vn_follow_link(
 	if (unlikely(error))
 		goto out_kfree;
 
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 
  out_kfree:
 	kfree(link);
  out_err:
-	nd_set_link(nd, ERR_PTR(error));
+	nd_set_link(ERR_PTR(error));
 	return NULL;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 510b749c4040..d78dd3ae1be9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -38,7 +38,6 @@ struct backing_dev_info;
 struct export_operations;
 struct hd_geometry;
 struct iovec;
-struct nameidata;
 struct kiocb;
 struct kobject;
 struct pipe_inode_info;
@@ -1574,7 +1573,7 @@ struct file_operations {
 
 struct inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
+	void * (*follow_link) (struct dentry *);
 	int (*permission) (struct inode *, int);
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
@@ -2649,7 +2648,7 @@ extern const struct file_operations generic_ro_fops;
 
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
-extern void *page_follow_link_light(struct dentry *, struct nameidata *);
+extern void *page_follow_link_light(struct dentry *);
 extern void page_put_link(struct dentry *, char *, void *);
 extern int __page_symlink(struct inode *inode, const char *symname, int len,
 		int nofs);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 2ec27f2457d6..cc8b51a47160 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -7,7 +7,6 @@
 #include <linux/path.h>
 
 struct vfsmount;
-struct nameidata;
 
 enum { MAX_NESTED_LINKS = 8 };
 
@@ -70,8 +69,8 @@ extern int follow_up(struct path *);
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
 
-extern void nd_jump_link(struct nameidata *nd, struct path *path);
-extern void nd_set_link(struct nameidata *nd, char *path);
+extern void nd_jump_link(struct path *path);
+extern void nd_set_link(char *path);
 
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b88b9eea169a..5d85ef2b64c3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1267,6 +1267,7 @@ union rcu_special {
 	short s;
 };
 struct rcu_node;
+struct nameidata;
 
 enum perf_event_task_context {
 	perf_invalid_context = -1,
diff --git a/mm/shmem.c b/mm/shmem.c
index 53bf4d160e8b..910b37f44a2b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2474,17 +2474,17 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 	return 0;
 }
 
-static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata *nd)
+static void *shmem_follow_short_symlink(struct dentry *dentry)
 {
-	nd_set_link(nd, SHMEM_I(dentry->d_inode)->symlink);
+	nd_set_link(SHMEM_I(dentry->d_inode)->symlink);
 	return NULL;
 }
 
-static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *shmem_follow_link(struct dentry *dentry)
 {
 	struct page *page = NULL;
 	int error = shmem_getpage(dentry->d_inode, 0, &page, SGP_READ, NULL);
-	nd_set_link(nd, error ? ERR_PTR(error) : kmap(page));
+	nd_set_link(error ? ERR_PTR(error) : kmap(page));
 	if (page)
 		unlock_page(page);
 	return page;
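The nameidata change above can be modelled in userspace. nd_set_link() and nd_jump_link() now fetch the walk state from current->nameidata instead of taking it as an argument. The sketch below is a minimal illustration under stated assumptions. A `_Thread_local` pointer stands in for the per-task `current->nameidata`. set_nameidata() and restore_nameidata() are hypothetical helpers standing in for whatever installs the pointer at the start of a walk. The struct keeps only the fields nd_set_link() touches.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NESTED_LINKS 8	/* same value as the enum in linux/namei.h */

/* Cut-down nameidata: only what nd_set_link() needs. */
struct nameidata {
	int depth;
	char *saved_names[MAX_NESTED_LINKS + 1];
	struct nameidata *saved;	/* outer walk, if any (hypothetical) */
};

/* Userspace stand-in for current->nameidata: one pointer per thread. */
static _Thread_local struct nameidata *current_nameidata;

/* Hypothetical: install nd as this thread's active walk. */
static void set_nameidata(struct nameidata *nd)
{
	nd->saved = current_nameidata;
	current_nameidata = nd;
}

/* Hypothetical: drop nd and reinstate any outer walk. */
static void restore_nameidata(struct nameidata *nd)
{
	current_nameidata = nd->saved;
}

/* Mirrors the patched nd_set_link(): no nameidata argument. */
static void nd_set_link(char *path)
{
	struct nameidata *nd = current_nameidata;

	nd->saved_names[nd->depth] = path;
}

/* A filesystem handler can now report its target without an nd. */
static void *fs_follow_link(char *target)
{
	nd_set_link(target);
	return NULL;
}
```

With this design, the many ->follow_link implementations changed above no longer need a struct nameidata parameter. The cost is an implicit dependency on per-task state that must be set up before any handler runs.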


* [PATCH 08/20] VFS: make all ->follow_link handlers aware for LOOKUP_RCU
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Pass the inode explicitly as an argument, as dentry->d_inode is not
stable during RCU-walk.  Also pass a new 'flags' argument, which
may (after further patches) contain LOOKUP_RCU.

->follow_link methods which cannot complete atomically
must return ERR_PTR(-ECHILD) when LOOKUP_RCU is set.
Those which can complete atomically must use 'inode'
rather than 'dentry->d_inode', and must only reference data
structures that are freed via RCU (e.g. with call_rcu() or kfree_rcu()).
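The contract above can be modelled in userspace. A handler that can finish without sleeping returns its result even under LOOKUP_RCU. One that would block returns ERR_PTR(-ECHILD), and the caller falls back to a mode where blocking is allowed. The sketch makes several assumptions. The ERR_PTR helpers are simplified copies of the <linux/err.h> semantics. The LOOKUP_RCU value is illustrative. demo_follow_link() and walk_symlink() are hypothetical. The handler returns the target string directly instead of calling nd_set_link() and returning a ->put_link cookie.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified userspace copies of the <linux/err.h> helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define LOOKUP_RCU 0x0040	/* illustrative value */

struct inode {
	const char *inline_target;	/* short link: target lives in the inode */
	int needs_io;			/* long link: reading it may sleep */
};

struct dentry {
	struct inode *d_inode;		/* may change under RCU-walk */
};

static char page_buf[64];

/* Hypothetical handler written to the new contract. */
static void *demo_follow_link(struct dentry *dentry, struct inode *inode,
			      int flags)
{
	(void)dentry;			/* use 'inode', never dentry->d_inode */
	if (!inode->needs_io)
		return (void *)inode->inline_target;	/* no sleeping: RCU-safe */
	if (flags & LOOKUP_RCU)
		return ERR_PTR(-ECHILD);	/* would block: ask for ref-walk */
	strcpy(page_buf, "/long/target");	/* pretend this was a disk read */
	return page_buf;
}

/* Hypothetical caller: try RCU mode first, fall back on -ECHILD. */
static const char *walk_symlink(struct dentry *dentry, int *stayed_rcu)
{
	struct inode *inode = dentry->d_inode;	/* sampled exactly once */
	void *res;

	assert(inode != NULL);			/* a symlink dentry is positive */
	res = demo_follow_link(dentry, inode, LOOKUP_RCU);
	*stayed_rcu = 1;
	if (IS_ERR(res) && PTR_ERR(res) == -ECHILD) {
		*stayed_rcu = 0;
		res = demo_follow_link(dentry, inode, 0);
	}
	return IS_ERR(res) ? NULL : res;
}
```

In fs/namei.c the fallback is more involved than a second call. An -ECHILD makes the walk leave RCU mode, or retries the whole lookup without LOOKUP_RCU, so the retry here only models that outcome.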

Later patches will make some of these handle LOOKUP_RCU
more gracefully.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 Documentation/filesystems/Locking             |    2 +-
 Documentation/filesystems/porting             |    9 +++++++++
 Documentation/filesystems/vfs.txt             |    2 +-
 drivers/staging/lustre/lustre/llite/symlink.c |    7 +++++--
 fs/9p/vfs_inode.c                             |   11 +++++++++--
 fs/9p/vfs_inode_dotl.c                        |   10 ++++++++--
 fs/autofs4/symlink.c                          |    6 ++++--
 fs/befs/linuxvfs.c                            |   15 +++++++++------
 fs/ceph/inode.c                               |    5 +++--
 fs/cifs/cifsfs.h                              |    3 ++-
 fs/cifs/link.c                                |    5 +++--
 fs/configfs/symlink.c                         |   10 ++++++++--
 fs/debugfs/file.c                             |    5 +++--
 fs/ecryptfs/inode.c                           |   12 +++++++++---
 fs/exofs/symlink.c                            |    6 +++---
 fs/ext2/symlink.c                             |    5 +++--
 fs/ext3/symlink.c                             |    5 +++--
 fs/ext4/symlink.c                             |    5 +++--
 fs/freevxfs/vxfs_immed.c                      |    9 ++++++---
 fs/fuse/dir.c                                 |    5 ++++-
 fs/gfs2/inode.c                               |    9 +++++++--
 fs/hostfs/hostfs_kern.c                       |   10 ++++++++--
 fs/hppfs/hppfs.c                              |    9 ++++++---
 fs/jffs2/symlink.c                            |    8 +++++---
 fs/jfs/symlink.c                              |    5 +++--
 fs/kernfs/symlink.c                           |   10 ++++++++--
 fs/namei.c                                    |   14 ++++++++++----
 fs/nfs/symlink.c                              |    6 ++++--
 fs/overlayfs/inode.c                          |    8 ++++++--
 fs/proc/base.c                                |    6 ++++--
 fs/proc/inode.c                               |    5 +++--
 fs/proc/namespaces.c                          |    7 +++++--
 fs/proc/self.c                                |   10 ++++++++--
 fs/proc/thread_self.c                         |   13 ++++++++++---
 fs/sysv/symlink.c                             |    5 +++--
 fs/ubifs/file.c                               |    5 +++--
 fs/ufs/symlink.c                              |    6 ++++--
 fs/xfs/xfs_iops.c                             |    8 ++++++--
 include/linux/fs.h                            |    4 ++--
 mm/shmem.c                                    |   14 ++++++++++----
 40 files changed, 211 insertions(+), 88 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index bbce4914d209..c0289bae848f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,7 +50,7 @@ prototypes:
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-	void * (*follow_link) (struct dentry *);
+	void * (*follow_link) (struct dentry *, struct inode *, int);
 	void (*put_link) (struct dentry *, char *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int, unsigned int);
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 9996b4631a87..eba8dd0a13e3 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -481,3 +481,12 @@ in your dentry operations instead.
 	->follow_link() no longer receives 'struct nameidata *'.
 	The nd is now attached to 'current' and nd_set_link()
 	accesses it directly.
+--
+[mandatory]
+	->follow_link now receives 'struct inode *' and 'int flags' which
+	may contain LOOKUP_RCU.  In this case -ECHILD must be
+	returned if the operation cannot be completed under
+	rcu_read_lock() conditions.
+	The passed inode must be used rather than dentry->d_inode,
+	particularly if LOOKUP_RCU is set.
+	If s_fs_info is used, it must be freed using RCU.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 11aac530931b..5557e9283d04 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,7 +350,7 @@ struct inode_operations {
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-        void * (*follow_link) (struct dentry *);
+        void * (*follow_link) (struct dentry *, struct inode *, int);
 	void (*put_link) (struct dentry *, char *, void *);
 	int (*permission) (struct inode *, int);
 	int (*get_acl)(struct inode *, int);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 63dd1a925c92..44d095c68ce7 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,14 +118,17 @@ failed:
 	return rc;
 }
 
-static void *ll_follow_link(struct dentry *dentry)
+static void *ll_follow_link(struct dentry *dentry, struct inode *inode,
+			    int flags)
 {
 	unsigned long avail_space;
-	struct inode *inode = dentry->d_inode;
 	struct ptlrpc_request *request = NULL;
 	int rc;
 	char *symname = NULL;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
 	/* Limit the recursive symlink depth.
 	 * Previously limited to 5 instead of default 8 links when
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index ebf50c3e132c..112091a186a1 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1274,13 +1274,20 @@ done:
 /**
  * v9fs_vfs_follow_link - follow a symlink path
  * @dentry: dentry for symlink
+ * @inode:  inode for the symlink
+ * @flags: lookup flags
  *
  */
 
-static void *v9fs_vfs_follow_link(struct dentry *dentry)
+static void *v9fs_vfs_follow_link(struct dentry *dentry, struct inode *inode,
+				  int flags)
 {
 	int len = 0;
-	char *link = __getname();
+	char *link;
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+	link = __getname();
 
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
 
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index dc35156aea6a..3971e265f788 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -905,17 +905,23 @@ error:
 /**
  * v9fs_vfs_follow_link_dotl - follow a symlink path
  * @dentry: dentry for symlink
+ * @inode: inode for symlink
+ * @flags: lookup flags
  *
  */
 
 static void *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry)
+v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct inode *inode,
+			  int flags)
 {
 	int retval;
 	struct p9_fid *fid;
-	char *link = __getname();
+	char *link;
 	char *target;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+	link = __getname();
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
 
 	if (!link) {
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index 37b4b561faa3..e87885a6ef4e 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,13 +12,15 @@
 
 #include "autofs_i.h"
 
-static void *autofs4_follow_link(struct dentry *dentry)
+static void *autofs4_follow_link(struct dentry *dentry, struct inode *inode,
+				 int flags)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+
 	if (ino && !autofs4_oz_mode(sbi))
 		ino->last_used = jiffies;
-	nd_set_link(dentry->d_inode->i_private);
+	nd_set_link(inode->i_private);
 	return NULL;
 }
 
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 339ac02c0e17..1151a6fbc74e 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,8 +42,8 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
 static struct inode *befs_alloc_inode(struct super_block *sb);
 static void befs_destroy_inode(struct inode *inode);
 static void befs_destroy_inodecache(void);
-static void *befs_follow_link(struct dentry *);
-static void *befs_fast_follow_link(struct dentry *);
+static void *befs_follow_link(struct dentry *, struct inode *, int);
+static void *befs_fast_follow_link(struct dentry *, struct inode *, int);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
 			char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -469,14 +469,17 @@ befs_destroy_inodecache(void)
  * flag is set.
  */
 static void *
-befs_follow_link(struct dentry *dentry)
+befs_follow_link(struct dentry *dentry, struct inode *inode, int flags)
 {
 	struct super_block *sb = dentry->d_sb;
-	befs_inode_info *befs_ino = BEFS_I(dentry->d_inode);
+	befs_inode_info *befs_ino = BEFS_I(inode);
 	befs_data_stream *data = &befs_ino->i_data.ds;
 	befs_off_t len = data->size;
 	char *link;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
 	if (len == 0) {
 		befs_error(sb, "Long symlink with illegal length");
 		link = ERR_PTR(-EIO);
@@ -500,9 +503,9 @@ befs_follow_link(struct dentry *dentry)
 
 
 static void *
-befs_fast_follow_link(struct dentry *dentry)
+befs_fast_follow_link(struct dentry *dentry, struct inode *inode, int flags)
 {
-	befs_inode_info *befs_ino = BEFS_I(dentry->d_inode);
+	befs_inode_info *befs_ino = BEFS_I(inode);
 	nd_set_link(befs_ino->i_data.symlink);
 	return NULL;
 }
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index f8212d6945de..761b55e73491 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1691,9 +1691,10 @@ retry:
 /*
  * symlinks
  */
-static void *ceph_sym_follow_link(struct dentry *dentry)
+static void *ceph_sym_follow_link(struct dentry *dentry, struct inode *inode,
+				  int flags)
 {
-	struct ceph_inode_info *ci = ceph_inode(dentry->d_inode);
+	struct ceph_inode_info *ci = ceph_inode(inode);
 	nd_set_link(ci->i_symlink);
 	return NULL;
 }
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index e3a6ef52a3e4..e58e685565d8 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,8 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
 #endif
 
 /* Functions related to symlinks */
-extern void *cifs_follow_link(struct dentry *direntry);
+extern void *cifs_follow_link(struct dentry *direntry, struct inode *inode,
+			      int flags);
 extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
 			 int buflen);
 extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index ba3562198c33..0d3c14acbdfc 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -627,9 +627,8 @@ cifs_hl_exit:
 }
 
 void *
-cifs_follow_link(struct dentry *direntry)
+cifs_follow_link(struct dentry *direntry, struct inode *inode, int flags)
 {
-	struct inode *inode = direntry->d_inode;
 	int rc = -ENOMEM;
 	unsigned int xid;
 	char *full_path = NULL;
@@ -639,6 +638,8 @@ cifs_follow_link(struct dentry *direntry)
 	struct cifs_tcon *tcon;
 	struct TCP_Server_Info *server;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	xid = get_xid();
 
 	tlink = cifs_sb_tlink(cifs_sb);
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index ff41712ffddd..443b11251b84 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,10 +279,16 @@ static int configfs_getlink(struct dentry *dentry, char * path)
 
 }
 
-static void *configfs_follow_link(struct dentry *dentry)
+static void *configfs_follow_link(struct dentry *dentry, struct inode *inode,
+				  int flags)
 {
 	int error = -ENOMEM;
-	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	unsigned long page;
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
+	page = get_zeroed_page(GFP_KERNEL);
 
 	if (page) {
 		error = configfs_getlink(dentry, (char *)page);
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index eeed1f1fed4f..720dfb983b93 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -43,9 +43,10 @@ const struct file_operations debugfs_file_operations = {
 	.llseek =	noop_llseek,
 };
 
-static void *debugfs_follow_link(struct dentry *dentry)
+static void *debugfs_follow_link(struct dentry *dentry, struct inode *inode,
+				 int flags)
 {
-	nd_set_link(dentry->d_inode->i_private);
+	nd_set_link(inode->i_private);
 	return NULL;
 }
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 680cf30e9135..17c4321e6d40 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -673,13 +673,19 @@ out:
 	return rc ? ERR_PTR(rc) : buf;
 }
 
-static void *ecryptfs_follow_link(struct dentry *dentry)
+static void *ecryptfs_follow_link(struct dentry *dentry, struct inode *inode,
+				  int flags)
 {
 	size_t len;
-	char *buf = ecryptfs_readlink_lower(dentry, &len);
+	char *buf;
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
+	buf = ecryptfs_readlink_lower(dentry, &len);
 	if (IS_ERR(buf))
 		goto out;
-	fsstack_copy_attr_atime(dentry->d_inode,
+	fsstack_copy_attr_atime(inode,
 				ecryptfs_dentry_to_lower(dentry)->d_inode);
 	buf[len] = '\0';
 out:
diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
index e6d0467a0b5a..c8525b051811 100644
--- a/fs/exofs/symlink.c
+++ b/fs/exofs/symlink.c
@@ -35,10 +35,10 @@
 
 #include "exofs.h"
 
-static void *exofs_follow_link(struct dentry *dentry)
+static void *exofs_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags)
 {
-	struct exofs_i_info *oi = exofs_i(dentry->d_inode);
-
+	struct exofs_i_info *oi = exofs_i(inode);
 	nd_set_link((char *)oi->i_data);
 	return NULL;
 }
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 063852432bd4..eb1820a875c9 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -21,9 +21,10 @@
 #include "xattr.h"
 #include <linux/namei.h>
 
-static void *ext2_follow_link(struct dentry *dentry)
+static void *ext2_follow_link(struct dentry *dentry, struct inode *inode,
+			      int flags)
 {
-	struct ext2_inode_info *ei = EXT2_I(dentry->d_inode);
+	struct ext2_inode_info *ei = EXT2_I(inode);
 	nd_set_link((char *)ei->i_data);
 	return NULL;
 }
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
index bf8acd9efaae..c048fedc13c4 100644
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -21,9 +21,10 @@
 #include "ext3.h"
 #include "xattr.h"
 
-static void * ext3_follow_link(struct dentry *dentry)
+static void * ext3_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags)
 {
-	struct ext3_inode_info *ei = EXT3_I(dentry->d_inode);
+	struct ext3_inode_info *ei = EXT3_I(inode);
 	nd_set_link((char*)ei->i_data);
 	return NULL;
 }
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 0015e7f53d0f..da0790514769 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,9 +23,10 @@
 #include "ext4.h"
 #include "xattr.h"
 
-static void *ext4_follow_link(struct dentry *dentry)
+static void *ext4_follow_link(struct dentry *dentry, struct inode *inode,
+			      int flags)
 {
-	struct ext4_inode_info *ei = EXT4_I(dentry->d_inode);
+	struct ext4_inode_info *ei = EXT4_I(inode);
 	nd_set_link((char *) ei->i_data);
 	return NULL;
 }
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index 058acefeb11c..bc8f83f1ff7d 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -39,7 +39,7 @@
 #include "vxfs_inode.h"
 
 
-static void *	vxfs_immed_follow_link(struct dentry *);
+static void *	vxfs_immed_follow_link(struct dentry *, struct inode *, int);
 
 static int	vxfs_immed_readpage(struct file *, struct page *);
 
@@ -64,6 +64,8 @@ const struct address_space_operations vxfs_immed_aops = {
 /**
  * vxfs_immed_follow_link - follow immed symlink
  * @dp:		dentry for the link
+ * @inode:	inode for the link
+ * @flags:	lookup flags
  *
  * Description:
  *   vxfs_immed_follow_link restarts the pathname lookup with
@@ -73,9 +75,10 @@ const struct address_space_operations vxfs_immed_aops = {
  *   Zero on success, else a negative error code.
  */
 static void *
-vxfs_immed_follow_link(struct dentry *dp)
+vxfs_immed_follow_link(struct dentry *dp, struct inode *inode,
+		       int flags)
 {
-	struct vxfs_inode_info		*vip = VXFS_INO(dp->d_inode);
+	struct vxfs_inode_info		*vip = VXFS_INO(inode);
 	nd_set_link(vip->vii_immed.vi_immed);
 	return NULL;
 }
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 58e632be862b..e950b3c774c5 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1400,8 +1400,11 @@ static void free_link(char *link)
 		free_page((unsigned long) link);
 }
 
-static void *fuse_follow_link(struct dentry *dentry)
+static void *fuse_follow_link(struct dentry *dentry, struct inode *inode,
+			      int flags)
 {
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	nd_set_link(read_link(dentry));
 	return NULL;
 }
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 5b28009ea860..2e4caa76bf97 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1541,21 +1541,26 @@ out:
 /**
  * gfs2_follow_link - Follow a symbolic link
  * @dentry: The dentry of the link
+ * @inode: The inode of the link
+ * @flags: Lookup flags
  *
  * This can handle symlinks of any size.
  *
  * Returns: 0 on success or error code
  */
 
-static void *gfs2_follow_link(struct dentry *dentry)
+static void *gfs2_follow_link(struct dentry *dentry, struct inode *inode,
+			      int flags)
 {
-	struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
+	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_holder i_gh;
 	struct buffer_head *dibh;
 	unsigned int size;
 	char *buf;
 	int error;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &i_gh);
 	error = gfs2_glock_nq(&i_gh);
 	if (error) {
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index a862634bf98e..f40966f9fdaf 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -882,9 +882,15 @@ static const struct inode_operations hostfs_dir_iops = {
 	.setattr	= hostfs_setattr,
 };
 
-static void *hostfs_follow_link(struct dentry *dentry)
+static void *hostfs_follow_link(struct dentry *dentry, struct inode *inode,
+				int flags)
 {
-	char *link = __getname();
+	char *link;
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
+	link = __getname();
 	if (link) {
 		char *path = dentry_name(dentry);
 		int err = -ENOMEM;
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index bf13cf24eec7..b6f68e22d0fb 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,11 +642,14 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
 						    buflen);
 }
 
-static void *hppfs_follow_link(struct dentry *dentry)
+static void *hppfs_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags)
 {
-	struct dentry *proc_dentry = HPPFS_I(dentry->d_inode)->proc_dentry;
+	struct dentry *proc_dentry = HPPFS_I(inode)->proc_dentry;
 
-	return proc_dentry->d_inode->i_op->follow_link(proc_dentry);
+	return proc_dentry->d_inode->i_op->follow_link(proc_dentry,
+						       proc_dentry->d_inode,
+						       flags);
 }
 
 static void hppfs_put_link(struct dentry *dentry, char *link,
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index 6b58d7659fbd..18c00fbd7060 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -16,7 +16,8 @@
 #include <linux/namei.h>
 #include "nodelist.h"
 
-static void *jffs2_follow_link(struct dentry *dentry);
+static void *jffs2_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags);
 
 const struct inode_operations jffs2_symlink_inode_operations =
 {
@@ -29,9 +30,10 @@ const struct inode_operations jffs2_symlink_inode_operations =
 	.removexattr =	jffs2_removexattr
 };
 
-static void *jffs2_follow_link(struct dentry *dentry)
+static void *jffs2_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags)
 {
-	struct jffs2_inode_info *f = JFFS2_INODE_INFO(dentry->d_inode);
+	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
 	char *p = (char *)f->target;
 
 	/*
diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
index cefda45b8d3a..610f44fad05e 100644
--- a/fs/jfs/symlink.c
+++ b/fs/jfs/symlink.c
@@ -22,9 +22,10 @@
 #include "jfs_inode.h"
 #include "jfs_xattr.h"
 
-static void *jfs_follow_link(struct dentry *dentry)
+static void *jfs_follow_link(struct dentry *dentry, struct inode *inode,
+			     int flags)
 {
-	char *s = JFS_IP(dentry->d_inode)->i_inline;
+	char *s = JFS_IP(inode)->i_inline;
 	nd_set_link(s);
 	return NULL;
 }
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 63d08ecbb72c..1a40d5e6ac71 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,10 +112,16 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
 	return error;
 }
 
-static void *kernfs_iop_follow_link(struct dentry *dentry)
+static void *kernfs_iop_follow_link(struct dentry *dentry, struct inode *inode,
+				    int flags)
 {
 	int error = -ENOMEM;
-	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	unsigned long page;
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
+	page = get_zeroed_page(GFP_KERNEL);
 	if (page) {
 		error = kernfs_getlink(dentry, (char *) page);
 		if (error < 0)
diff --git a/fs/namei.c b/fs/namei.c
index e7fab6886e29..784fca0e6c70 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -876,6 +876,7 @@ static __always_inline int
 follow_link(struct path *link, struct nameidata *nd, void **p)
 {
 	struct dentry *dentry = link->dentry;
+	struct inode *inode = dentry->d_inode;
 	int error;
 	char *s;
 
@@ -894,12 +895,13 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	touch_atime(link);
 	nd_set_link(NULL);
 
-	error = security_inode_follow_link(link->dentry);
+	error = security_inode_follow_link(dentry);
 	if (error)
 		goto out_put_nd_path;
 
 	nd->last_type = LAST_BIND;
-	*p = dentry->d_inode->i_op->follow_link(dentry);
+	*p = inode->i_op->follow_link(dentry, inode,
+				      nd->flags & LOOKUP_RCU);
 	error = PTR_ERR(*p);
 	if (IS_ERR(*p))
 		goto out_put_nd_path;
@@ -4464,7 +4466,8 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	int res;
 
 	nd.depth = 0;
-	cookie = dentry->d_inode->i_op->follow_link(dentry);
+	cookie = dentry->d_inode->i_op->follow_link(dentry,
+						    dentry->d_inode, 0);
 	if (IS_ERR(cookie))
 		res = PTR_ERR(cookie);
 	else {
@@ -4506,9 +4509,12 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 }
 EXPORT_SYMBOL(page_readlink);
 
-void *page_follow_link_light(struct dentry *dentry)
+void *page_follow_link_light(struct dentry *dentry, struct inode *inode,
+			     int flags)
 {
 	struct page *page = NULL;
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	nd_set_link(page_getlink(dentry, &page));
 	return page;
 }
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index f3c44616b615..32bbac1bb4bc 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -43,12 +43,14 @@ error:
 	return -EIO;
 }
 
-static void *nfs_follow_link(struct dentry *dentry)
+static void *nfs_follow_link(struct dentry *dentry, struct inode *inode,
+			     int flags)
 {
-	struct inode *inode = dentry->d_inode;
 	struct page *page;
 	void *err;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
 		goto read_failed;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 0de7b87bd025..675efccd5c84 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/namei.h>
 #include <linux/slab.h>
 #include <linux/xattr.h>
 #include "overlayfs.h"
@@ -140,13 +141,16 @@ struct ovl_link_data {
 	void *cookie;
 };
 
-static void *ovl_follow_link(struct dentry *dentry)
+static void *ovl_follow_link(struct dentry *dentry, struct inode *inode,
+			     int flags)
 {
 	void *ret;
 	struct dentry *realdentry;
 	struct inode *realinode;
 	struct ovl_link_data *data = NULL;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	realdentry = ovl_dentry_real(dentry);
 	realinode = realdentry->d_inode;
 
@@ -160,7 +164,7 @@ static void *ovl_follow_link(struct dentry *dentry)
 		data->realdentry = realdentry;
 	}
 
-	ret = realinode->i_op->follow_link(realdentry);
+	ret = realinode->i_op->follow_link(realdentry, realinode, flags);
 	if (IS_ERR(ret)) {
 		kfree(data);
 		return ret;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a0c0b85aead3..203f8d8ceab1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1371,12 +1371,14 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 		return -ENOENT;
 }
 
-static void *proc_pid_follow_link(struct dentry *dentry)
+static void *proc_pid_follow_link(struct dentry *dentry, struct inode *inode,
+				  int flags)
 {
-	struct inode *inode = dentry->d_inode;
 	struct path path;
 	int error = -EACCES;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	/* Are we allowed to snoop on the tasks file descriptors? */
 	if (!proc_fd_access_allowed(inode))
 		goto out;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 7bdaf1040f98..faf2c5400437 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -394,9 +394,10 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
 };
 #endif
 
-static void *proc_follow_link(struct dentry *dentry)
+static void *proc_follow_link(struct dentry *dentry, struct inode *inode,
+			      int flags)
 {
-	struct proc_dir_entry *pde = PDE(dentry->d_inode);
+	struct proc_dir_entry *pde = PDE(inode);
 	if (unlikely(!use_pde(pde)))
 		return ERR_PTR(-EINVAL);
 	nd_set_link(pde->data);
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 5e3394509c2c..a2578c44edeb 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,14 +30,17 @@ static const struct proc_ns_operations *ns_entries[] = {
 	&mntns_operations,
 };
 
-static void *proc_ns_follow_link(struct dentry *dentry)
+static void *proc_ns_follow_link(struct dentry *dentry, struct inode *inode,
+				 int flags)
 {
-	struct inode *inode = dentry->d_inode;
 	const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
 	struct task_struct *task;
 	struct path ns_path;
 	void *error = ERR_PTR(-EACCES);
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
 	task = get_proc_task(inode);
 	if (!task)
 		return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 639bd0afdc05..7fcb906c250a 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -19,11 +19,17 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_self_follow_link(struct dentry *dentry)
+static void *proc_self_follow_link(struct dentry *dentry, struct inode *inode,
+				   int flags)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
-	pid_t tgid = task_tgid_nr_ns(current, ns);
+	pid_t tgid;
 	char *name = ERR_PTR(-ENOENT);
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
+	tgid = task_tgid_nr_ns(current, ns);
 	if (tgid) {
 		/* 11 for max length of signed int in decimal + NULL term */
 		name = kmalloc(12, GFP_KERNEL);
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 2036b051f53f..7a9af8a6baab 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -20,12 +20,19 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_thread_self_follow_link(struct dentry *dentry)
+static void *proc_thread_self_follow_link(struct dentry *dentry,
+					  struct inode *inode, int flags)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
-	pid_t tgid = task_tgid_nr_ns(current, ns);
-	pid_t pid = task_pid_nr_ns(current, ns);
+	pid_t tgid;
+	pid_t pid;
 	char *name = ERR_PTR(-ENOENT);
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+
+	tgid = task_tgid_nr_ns(current, ns);
+	pid = task_pid_nr_ns(current, ns);
 	if (pid) {
 		name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
 		if (!name)
diff --git a/fs/sysv/symlink.c b/fs/sysv/symlink.c
index 3f8154e6b27e..bdddd74831ac 100644
--- a/fs/sysv/symlink.c
+++ b/fs/sysv/symlink.c
@@ -8,9 +8,10 @@
 #include "sysv.h"
 #include <linux/namei.h>
 
-static void *sysv_follow_link(struct dentry *dentry)
+static void *sysv_follow_link(struct dentry *dentry, struct inode *inode,
+			      int flags)
 {
-	nd_set_link((char *)SYSV_I(dentry->d_inode)->i_data);
+	nd_set_link((char *)SYSV_I(inode)->i_data);
 	return NULL;
 }
 
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 082958d62096..4872f14a88c1 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1300,9 +1300,10 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
 	ClearPageChecked(page);
 }
 
-static void *ubifs_follow_link(struct dentry *dentry)
+static void *ubifs_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags)
 {
-	struct ubifs_inode *ui = ubifs_inode(dentry->d_inode);
+	struct ubifs_inode *ui = ubifs_inode(inode);
 
 	nd_set_link(ui->data);
 	return NULL;
diff --git a/fs/ufs/symlink.c b/fs/ufs/symlink.c
index a8266ff60b0f..3b690a020627 100644
--- a/fs/ufs/symlink.c
+++ b/fs/ufs/symlink.c
@@ -32,9 +32,11 @@
 #include "ufs.h"
 
 
-static void *ufs_follow_link(struct dentry *dentry)
+static void *ufs_follow_link(struct dentry *dentry, struct inode *inode,
+			     int flags)
 {
-	struct ufs_inode_info *p = UFS_I(dentry->d_inode);
+	struct ufs_inode_info *p = UFS_I(inode);
+
 	nd_set_link((char*)p->i_u1.i_symlink);
 	return NULL;
 }
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ac915d09de29..c2c136ef3a50 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -411,16 +411,20 @@ xfs_vn_rename(
  */
 STATIC void *
 xfs_vn_follow_link(
-	struct dentry		*dentry)
+	struct dentry		*dentry,
+	struct inode		*inode,
+	int			flags)
 {
 	char			*link;
 	int			error = -ENOMEM;
 
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
 	if (!link)
 		goto out_err;
 
-	error = xfs_readlink(XFS_I(dentry->d_inode), link);
+	error = xfs_readlink(XFS_I(inode), link);
 	if (unlikely(error))
 		goto out_kfree;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d78dd3ae1be9..dda92ac8ef41 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1573,7 +1573,7 @@ struct file_operations {
 
 struct inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
-	void * (*follow_link) (struct dentry *);
+	void * (*follow_link) (struct dentry *, struct inode *, int);
 	int (*permission) (struct inode *, int);
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
@@ -2648,7 +2648,7 @@ extern const struct file_operations generic_ro_fops;
 
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
-extern void *page_follow_link_light(struct dentry *);
+extern void *page_follow_link_light(struct dentry *, struct inode *, int);
 extern void page_put_link(struct dentry *, char *, void *);
 extern int __page_symlink(struct inode *inode, const char *symname, int len,
 		int nofs);
diff --git a/mm/shmem.c b/mm/shmem.c
index 910b37f44a2b..1083e8c75536 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2474,16 +2474,22 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 	return 0;
 }
 
-static void *shmem_follow_short_symlink(struct dentry *dentry)
+static void *shmem_follow_short_symlink(struct dentry *dentry,
+					struct inode *inode, int flags)
 {
-	nd_set_link(SHMEM_I(dentry->d_inode)->symlink);
+	nd_set_link(SHMEM_I(inode)->symlink);
 	return NULL;
 }
 
-static void *shmem_follow_link(struct dentry *dentry)
+static void *shmem_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags)
 {
 	struct page *page = NULL;
-	int error = shmem_getpage(dentry->d_inode, 0, &page, SGP_READ, NULL);
+	int error;
+
+	if (flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
+	error = shmem_getpage(inode, 0, &page, SGP_READ, NULL);
 	nd_set_link(error ? ERR_PTR(error) : kmap(page));
 	if (page)
 		unlock_page(page);

* [PATCH 10/20] security: make inode_follow_link RCU-walk aware
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (2 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 01/20] Documentation: remove outdated information from automount-support.txt NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 04/20] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link NeilBrown
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Like ->follow_link, inode_follow_link now takes an inode and
flags as well as the dentry.

The inode is used in preference to dentry->d_inode; this matters
particularly in RCU-walk mode, where dentry->d_inode is not stable.

selinux_inode_follow_link() gets dentry_has_perm() and
inode_has_perm() open-coded into it so that it can call
avc_has_perm_flags() in a way that is safe when LOOKUP_RCU is set.

Calling avc_has_perm_flags() with rcu_read_lock() held means
that when avc_has_perm_noaudit() calls avc_compute_av(), the attempt
to rcu_read_unlock() before calling security_compute_av() will not
actually drop the RCU read-lock, because RCU read-side critical
sections nest.

However, as security_compute_av() runs entirely inside a read_lock()ed
region and therefore does not sleep, it should be safe to call with
the RCU read-lock held.
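
The nesting argument above can be illustrated with a toy counter. This is a
sketch under loud assumptions: it is not how the kernel implements RCU, and the
model_* functions are hypothetical. It only mirrors the shape of the call chain
(avc_has_perm_noaudit() takes the RCU read-lock, avc_compute_av() unlocks,
computes, and re-locks) to show that the inner unlock ends the read-side
section in ref-walk, but not when the caller already holds the lock.

```c
#include <assert.h>

/* Toy per-task RCU read-side nesting depth. */
static int rcu_nesting;

static void model_rcu_read_lock(void) { rcu_nesting++; }

static void model_rcu_read_unlock(void)
{
	assert(rcu_nesting > 0);
	rcu_nesting--;
}

/* Shape of avc_compute_av(): unlock, compute, re-lock. Returns whether the
 * computation ran while still inside an RCU read-side section. */
static int model_compute_av(void)
{
	int still_in_reader;

	model_rcu_read_unlock();
	still_in_reader = rcu_nesting > 0;
	/* ... the non-sleeping policy computation would run here ... */
	model_rcu_read_lock();
	return still_in_reader;
}

/* Shape of avc_has_perm_noaudit() on a cache miss. */
static int model_has_perm_noaudit(void)
{
	int ran_in_reader;

	model_rcu_read_lock();
	ran_in_reader = model_compute_av();
	model_rcu_read_unlock();
	return ran_in_reader;
}
```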

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c               |    3 ++-
 include/linux/security.h |   12 +++++++++---
 security/capability.c    |    3 ++-
 security/security.c      |    7 ++++---
 security/selinux/hooks.c |   19 +++++++++++++++++--
 5 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 784fca0e6c70..6ac163212429 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -895,7 +895,8 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	touch_atime(link);
 	nd_set_link(NULL);
 
-	error = security_inode_follow_link(dentry);
+	error = security_inode_follow_link(dentry, inode,
+					   nd->flags & LOOKUP_RCU);
 	if (error)
 		goto out_put_nd_path;
 
diff --git a/include/linux/security.h b/include/linux/security.h
index 237d22bfc642..5a207d110053 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -476,6 +476,8 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  * @inode_follow_link:
  *	Check permission to follow a symbolic link when looking up a pathname.
  *	@dentry contains the dentry structure for the link.
+ *	@inode contains dentry->d_inode, which itself is not stable in RCU-walk
+ *	@flags contains LOOKUP_RCU if in RCU-walk mode.
  *	Return 0 if permission is granted.
  * @inode_permission:
  *	Check permission before accessing an inode.  This hook is called by the
@@ -1551,7 +1553,8 @@ struct security_operations {
 	int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
 			     struct inode *new_dir, struct dentry *new_dentry);
 	int (*inode_readlink) (struct dentry *dentry);
-	int (*inode_follow_link) (struct dentry *dentry);
+	int (*inode_follow_link) (struct dentry *dentry, struct inode *inode,
+				  int flags);
 	int (*inode_permission) (struct inode *inode, int mask);
 	int (*inode_setattr)	(struct dentry *dentry, struct iattr *attr);
 	int (*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry);
@@ -1838,7 +1841,8 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
 			  struct inode *new_dir, struct dentry *new_dentry,
 			  unsigned int flags);
 int security_inode_readlink(struct dentry *dentry);
-int security_inode_follow_link(struct dentry *dentry);
+int security_inode_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags);
 int security_inode_permission(struct inode *inode, int mask);
 int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
 int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry);
@@ -2240,7 +2244,9 @@ static inline int security_inode_readlink(struct dentry *dentry)
 	return 0;
 }
 
-static inline int security_inode_follow_link(struct dentry *dentry)
+static inline int security_inode_follow_link(struct dentry *dentry,
+					     struct inode *inode,
+					     int flags)
 {
 	return 0;
 }
diff --git a/security/capability.c b/security/capability.c
index ad8557782e73..f65bf2c26944 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -209,7 +209,8 @@ static int cap_inode_readlink(struct dentry *dentry)
 	return 0;
 }
 
-static int cap_inode_follow_link(struct dentry *dentry)
+static int cap_inode_follow_link(struct dentry *dentry, struct inode *inode,
+				 int flags)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index 7b4fd199e881..0ff6d38cf1e4 100644
--- a/security/security.c
+++ b/security/security.c
@@ -581,11 +581,12 @@ int security_inode_readlink(struct dentry *dentry)
 	return security_ops->inode_readlink(dentry);
 }
 
-int security_inode_follow_link(struct dentry *dentry)
+int security_inode_follow_link(struct dentry *dentry, struct inode *inode,
+			       int flags)
 {
-	if (unlikely(IS_PRIVATE(dentry->d_inode)))
+	if (unlikely(IS_PRIVATE(inode)))
 		return 0;
-	return security_ops->inode_follow_link(dentry);
+	return security_ops->inode_follow_link(dentry, inode, flags);
 }
 
 int security_inode_permission(struct inode *inode, int mask)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 9a08b8c04eff..b46382749b33 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2862,11 +2862,26 @@ static int selinux_inode_readlink(struct dentry *dentry)
 	return dentry_has_perm(cred, dentry, FILE__READ);
 }
 
-static int selinux_inode_follow_link(struct dentry *dentry)
+static int selinux_inode_follow_link(struct dentry *dentry, struct inode *inode,
+				     int flags)
 {
 	const struct cred *cred = current_cred();
+	struct common_audit_data ad;
+	struct inode_security_struct *isec;
+	u32 sid;
 
-	return dentry_has_perm(cred, dentry, FILE__READ);
+	if (unlikely(IS_PRIVATE(inode)))
+		return 0;
+
+	validate_creds(cred);
+
+	ad.type = LSM_AUDIT_DATA_DENTRY;
+	ad.u.dentry = dentry;
+	sid = cred_sid(cred);
+	isec = inode->i_security;
+
+	return avc_has_perm_flags(sid, isec->sid, isec->sclass, FILE__READ, &ad,
+				  flags & LOOKUP_RCU ? MAY_NOT_BLOCK : 0);
 }
 
 static noinline int audit_inode_permission(struct inode *inode,

* [PATCH 11/20] VFS/namei: use terminate_walk when symlink lookup fails.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (8 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 09/20] security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags() NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 08/20] VFS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Currently, following a symlink never uses rcu-walk, so
terminate_walk() isn't needed.
That will change in a future patch.  In preparation, change some
  path_put_conditional()
  path_put()
sequences to
  path_to_nameidata()
  terminate_walk()

These sequences are identical when in ref-walk, and correct when in
rcu-walk.
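
The ref-walk equivalence claimed above can be checked by reducing the helpers
to reference counts. This is a hypothetical userspace model: struct obj stands
in for both vfsmount and dentry, and the helper bodies are assumed to match the
ref-walk branches of path_put_conditional() and path_to_nameidata() in
fs/namei.c of this era, where the link only holds its own mount reference when
it is on a different mount from nd->path. In ref-walk, terminate_walk() is just
path_put(&nd->path).

```c
#include <assert.h>

/* Toy refcounted object standing in for a vfsmount or a dentry. */
struct obj { int refs; };
struct path { struct obj *mnt; struct obj *dentry; };
struct nameidata { struct path path; };

static void put(struct obj *o)
{
	assert(o->refs > 0);	/* catches a double put */
	o->refs--;
}

static void path_put(struct path *p) { put(p->dentry); put(p->mnt); }

/* Ref-walk behaviour only. */
static void path_put_conditional(struct path *path, struct nameidata *nd)
{
	put(path->dentry);
	if (path->mnt != nd->path.mnt)
		put(path->mnt);
}

static void path_to_nameidata(const struct path *path, struct nameidata *nd)
{
	put(nd->path.dentry);
	if (nd->path.mnt != path->mnt)
		put(nd->path.mnt);
	nd->path = *path;
}

/* Old error path. */
static void old_sequence(struct path *link, struct nameidata *nd)
{
	path_put_conditional(link, nd);
	path_put(&nd->path);
}

/* New error path: terminate_walk() reduces to path_put() in ref-walk. */
static void new_sequence(struct path *link, struct nameidata *nd)
{
	path_to_nameidata(link, nd);
	path_put(&nd->path);
}
```

Both sequences drop every reference exactly once whether or not the link is on
the same mount as nd->path.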

Also change two path_put() calls to equivalent terminate_walk().
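
Why terminate_walk() can replace path_put(&nd->path) can be seen from a
reduced model of its two branches. This is a hypothetical sketch: struct
model_nd and its counters stand in for the real nameidata, the path reference,
nd->root.mnt and rcu_read_lock() depth, and the LOOKUP_RCU/LOOKUP_ROOT values
are assumed. In ref-walk it only drops the path reference; in RCU-walk it holds
no reference to drop, and instead leaves RCU mode and releases the RCU
read-lock.

```c
#include <assert.h>

#define LOOKUP_RCU  0x0040	/* values assumed for this model */
#define LOOKUP_ROOT 0x2000

struct model_nd {
	unsigned int flags;
	int path_refs;		/* references on nd->path (ref-walk only) */
	int root_mnt_set;	/* stands in for nd->root.mnt != NULL */
	int rcu_nesting;	/* stands in for rcu_read_lock() depth */
};

/* Mirrors the structure of terminate_walk() in the patch. */
static void model_terminate_walk(struct model_nd *nd)
{
	if (!(nd->flags & LOOKUP_RCU)) {
		assert(nd->path_refs > 0);
		nd->path_refs--;		/* path_put(&nd->path) */
	} else {
		nd->flags &= ~LOOKUP_RCU;
		if (!(nd->flags & LOOKUP_ROOT))
			nd->root_mnt_set = 0;	/* nd->root.mnt = NULL */
		assert(nd->rcu_nesting > 0);
		nd->rcu_nesting--;		/* rcu_read_unlock() */
	}
}
```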

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6ac163212429..1a8cc0e47df6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -748,6 +748,18 @@ static inline char *nd_get_link(struct nameidata *nd)
 	return nd->saved_names[nd->depth];
 }
 
+static void terminate_walk(struct nameidata *nd)
+{
+	if (!(nd->flags & LOOKUP_RCU)) {
+		path_put(&nd->path);
+	} else {
+		nd->flags &= ~LOOKUP_RCU;
+		if (!(nd->flags & LOOKUP_ROOT))
+			nd->root.mnt = NULL;
+		rcu_read_unlock();
+	}
+}
+
 static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
 {
 	struct inode *inode = link->dentry->d_inode;
@@ -798,8 +810,8 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
 		return 0;
 
 	audit_log_link_denied("follow_link", link);
-	path_put_conditional(link, nd);
-	path_put(&nd->path);
+	path_to_nameidata(link, nd);
+	terminate_walk(nd);
 	return -EACCES;
 }
 
@@ -911,7 +923,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	s = nd_get_link(nd);
 	if (s) {
 		if (unlikely(IS_ERR(s))) {
-			path_put(&nd->path);
+			terminate_walk(nd);
 			put_link(nd, link, *p);
 			return PTR_ERR(s);
 		}
@@ -933,7 +945,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 
 out_put_nd_path:
 	*p = NULL;
-	path_put(&nd->path);
+	terminate_walk(nd);
 	path_put(link);
 	return error;
 }
@@ -1564,18 +1576,6 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
-static void terminate_walk(struct nameidata *nd)
-{
-	if (!(nd->flags & LOOKUP_RCU)) {
-		path_put(&nd->path);
-	} else {
-		nd->flags &= ~LOOKUP_RCU;
-		if (!(nd->flags & LOOKUP_ROOT))
-			nd->root.mnt = NULL;
-		rcu_read_unlock();
-	}
-}
-
 /*
  * Do we need to follow links? We _really_ want to be able
  * to do this check without having to look at inode->i_op,
@@ -1647,8 +1647,8 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 	int res;
 
 	if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
-		path_put_conditional(path, nd);
-		path_put(&nd->path);
+		path_to_nameidata(path, nd);
+		terminate_walk(nd);
 		return -ELOOP;
 	}
 	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
@@ -3270,8 +3270,8 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 		struct path link = path;
 		void *cookie;
 		if (!(nd->flags & LOOKUP_FOLLOW)) {
-			path_put_conditional(&path, nd);
-			path_put(&nd->path);
+			path_to_nameidata(&path, nd);
+			terminate_walk(nd);
 			error = -ELOOP;
 			break;
 		}

* [PATCH 09/20] security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags()
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (7 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 05/20] VFS: replace nameidata arg to ->put_link with a char* NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 11/20] VFS/namei: use terminate_walk when symlink lookup fails NeilBrown
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

This allows MAY_NOT_BLOCK to be passed, in RCU-walk mode, through
the new avc_has_perm_flags() to avc_audit() and thence to slow_avc_audit().
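
The flag plumbing described above can be sketched as a chain of small
functions. This is a hypothetical model, not the SELinux code: the model_*
functions are stand-ins, the MAY_NOT_BLOCK and LOOKUP_RCU values are assumed,
and model_slow_avc_audit() simply assumes the slow audit path must not run when
it may not block and returns -ECHILD so the lookup is retried in ref-walk. The
precedence rule (an audit error overrides the permission result) follows the
"if (rc2) return rc2;" logic in avc_has_perm_flags().

```c
#include <assert.h>
#include <errno.h>

#define LOOKUP_RCU    0x0040		/* values assumed for this model */
#define MAY_NOT_BLOCK 0x00000080

/* Mirrors "flags & LOOKUP_RCU ? MAY_NOT_BLOCK : 0" in
 * selinux_inode_follow_link(). */
static int lookup_to_avc_flags(int lookup_flags)
{
	return (lookup_flags & LOOKUP_RCU) ? MAY_NOT_BLOCK : 0;
}

/* Hypothetical slow path: refuses to audit when it may not block. */
static int model_slow_avc_audit(int flags)
{
	if (flags & MAY_NOT_BLOCK)
		return -ECHILD;
	return 0;	/* audit record emitted */
}

/* Shape of avc_audit(): cheap check first, slow path only if needed. */
static int model_avc_audit(int audit_required, int flags)
{
	assert(audit_required == 0 || audit_required == 1);
	if (!audit_required)
		return 0;
	return model_slow_avc_audit(flags);
}

/* Shape of avc_has_perm_flags(): audit failure takes precedence. */
static int model_avc_has_perm_flags(int perm_rc, int audit_required,
				    int flags)
{
	int rc2 = model_avc_audit(audit_required, flags);

	if (rc2)
		return rc2;
	return perm_rc;
}
```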

Signed-off-by: NeilBrown <neilb@suse.de>
---
 security/selinux/avc.c         |   18 +++++++++++++++++-
 security/selinux/hooks.c       |    2 +-
 security/selinux/include/avc.h |    9 +++++++--
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index afcc0aed9393..385ece23f005 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -763,7 +763,23 @@ int avc_has_perm(u32 ssid, u32 tsid, u16 tclass,
 
 	rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, 0, &avd);
 
-	rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
+	rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata, 0);
+	if (rc2)
+		return rc2;
+	return rc;
+}
+
+int avc_has_perm_flags(u32 ssid, u32 tsid, u16 tclass,
+		       u32 requested, struct common_audit_data *auditdata,
+		       int flags)
+{
+	struct av_decision avd;
+	int rc, rc2;
+
+	rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, 0, &avd);
+
+	rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc,
+			auditdata, flags);
 	if (rc2)
 		return rc2;
 	return rc;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a2c29efcacc9..9a08b8c04eff 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1565,7 +1565,7 @@ static int cred_has_capability(const struct cred *cred,
 
 	rc = avc_has_perm_noaudit(sid, sid, sclass, av, 0, &avd);
 	if (audit == SECURITY_CAP_AUDIT) {
-		int rc2 = avc_audit(sid, sid, sclass, av, &avd, rc, &ad);
+		int rc2 = avc_audit(sid, sid, sclass, av, &avd, rc, &ad, 0);
 		if (rc2)
 			return rc2;
 	}
diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index ddf8eec03f21..5973c327c54e 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -130,7 +130,8 @@ static inline int avc_audit(u32 ssid, u32 tsid,
 			    u16 tclass, u32 requested,
 			    struct av_decision *avd,
 			    int result,
-			    struct common_audit_data *a)
+			    struct common_audit_data *a,
+			    int flags)
 {
 	u32 audited, denied;
 	audited = avc_audit_required(requested, avd, result, 0, &denied);
@@ -138,7 +139,7 @@ static inline int avc_audit(u32 ssid, u32 tsid,
 		return 0;
 	return slow_avc_audit(ssid, tsid, tclass,
 			      requested, audited, denied, result,
-			      a, 0);
+			      a, flags);
 }
 
 #define AVC_STRICT 1 /* Ignore permissive mode. */
@@ -150,6 +151,10 @@ int avc_has_perm_noaudit(u32 ssid, u32 tsid,
 int avc_has_perm(u32 ssid, u32 tsid,
 		 u16 tclass, u32 requested,
 		 struct common_audit_data *auditdata);
+int avc_has_perm_flags(u32 ssid, u32 tsid,
+		       u16 tclass, u32 requested,
+		       struct common_audit_data *auditdata,
+		       int flags);
 
 u32 avc_policy_seqno(void);
 

* [PATCH 13/20] VFS/namei: abort RCU-walk on symlink if atime needs updating.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (10 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 08/20] VFS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 14/20] VFS/namei: add 'inode' arg to put_link() NeilBrown
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

touch_atime is not RCU-safe, and so cannot be called during
RCU-walk.
However, in situations where RCU-walk makes a difference,
the symlink is likely to be accessed much more often than
it is useful to update the atime.

So split the test "does the atime actually need to be updated?"
out into atime_needs_update(), and only allow RCU-walk on a symlink
if that test returns false.
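A minimal user-space sketch of the predicate/action split described above. All toy_* names are hypothetical, and the single noatime flag stands in for the several S_NOATIME/MNT_NOATIME/relatime checks in atime_needs_update(); only the control flow is meant to mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified inode: only the fields this sketch needs. */
struct toy_inode {
	bool noatime;	/* stands in for S_NOATIME, MNT_NOATIME, etc. */
	long atime;	/* last recorded access time */
	int writes;	/* how many times the (blocking) update ran */
};

/* Pure predicate: no locks and no writes, so safe to call in RCU-walk. */
static int toy_atime_needs_update(const struct toy_inode *inode, long now)
{
	if (inode->noatime)
		return 0;
	if (inode->atime == now)
		return 0;
	return 1;
}

/* Slow path: re-checks the predicate, then performs the update. */
static void toy_touch_atime(struct toy_inode *inode, long now)
{
	if (!toy_atime_needs_update(inode, now))
		return;
	inode->atime = now;
	inode->writes++;
}

/* Caller pattern: in RCU mode only test, and bail out if an update is due. */
static int toy_follow_link_atime(struct toy_inode *inode, long now, bool rcu)
{
	if (rcu)
		return toy_atime_needs_update(inode, now) ? -ECHILD : 0;
	toy_touch_atime(inode, now);
	return 0;
}
```

The RCU branch never writes to the inode; it only reports -ECHILD so the caller can restart in ref-walk, where the update is allowed to sleep.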

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/inode.c         |   26 +++++++++++++++++++-------
 fs/namei.c         |    7 ++++++-
 include/linux/fs.h |    1 +
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index f00b16f45507..a0da920e4650 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1584,30 +1584,41 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
  *	This function automatically handles read only file systems and media,
  *	as well as the "noatime" flag and inode specific "noatime" markers.
  */
-void touch_atime(const struct path *path)
+int atime_needs_update(const struct path *path)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct inode *inode = path->dentry->d_inode;
 	struct timespec now;
 
 	if (inode->i_flags & S_NOATIME)
-		return;
+		return 0;
 	if (IS_NOATIME(inode))
-		return;
+		return 0;
 	if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		return 0;
 
 	if (mnt->mnt_flags & MNT_NOATIME)
-		return;
+		return 0;
 	if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		return 0;
 
 	now = current_fs_time(inode->i_sb);
 
 	if (!relatime_need_update(mnt, inode, now))
-		return;
+		return 0;
 
 	if (timespec_equal(&inode->i_atime, &now))
+		return 0;
+	return 1;
+}
+
+void touch_atime(const struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+	struct inode *inode = path->dentry->d_inode;
+	struct timespec now;
+
+	if (!atime_needs_update(path))
 		return;
 
 	if (!sb_start_write_trylock(inode->i_sb))
@@ -1624,6 +1635,7 @@ void touch_atime(const struct path *path)
 	 * We may also fail on filesystems that have the ability to make parts
 	 * of the fs read only, e.g. subvolumes in Btrfs.
 	 */
+	now = current_fs_time(inode->i_sb);
 	update_time(inode, &now, S_ATIME);
 	__mnt_drop_write(mnt);
 skip_update:
diff --git a/fs/namei.c b/fs/namei.c
index 3262c8c2e73d..224b1495edae 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -909,7 +909,12 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	cond_resched();
 	nd->total_link_count++;
 
-	touch_atime(link);
+	if (nd->flags & LOOKUP_RCU) {
+		error = -ECHILD;
+		if (atime_needs_update(link))
+			goto out_put_nd_path;
+	} else
+		touch_atime(link);
 	nd_set_link(NULL);
 
 	error = security_inode_follow_link(dentry, inode,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dda92ac8ef41..0cd650b4e7c3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1844,6 +1844,7 @@ enum file_time_flags {
 	S_VERSION = 8,
 };
 
+extern int atime_needs_update(const struct path *);
 extern void touch_atime(const struct path *);
 static inline void file_accessed(struct file *file)
 {

* [PATCH 14/20] VFS/namei: add 'inode' arg to put_link().
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (11 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 13/20] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-04-17 16:25   ` Al Viro
  2015-03-23  2:37 ` [PATCH 16/20] VFS/namei: enable RCU-walk when following symlinks NeilBrown
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

When symlinks are followed in RCU-walk, dentry->d_inode
may have changed between the call to ->follow_link and
the call to ->put_link.
So we need to preserve the inode that was used for ->follow_link,
and use it to find the correct ->put_link.

Note that this means that when RCU-walk is permitted in
->follow_link, dentry->d_inode cannot be used in ->put_link.
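A rough user-space sketch of the "snapshot the inode, release via the snapshot" pattern. The toy_* types and functions are hypothetical stand-ins for struct inode, struct dentry and put_link(); the concurrent change to d_inode is simulated by a plain assignment.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode;

/* Hypothetical per-inode ops; put_link may be NULL, as in the kernel. */
struct toy_inode_ops {
	void (*put_link)(struct toy_inode *inode, void *cookie);
};

struct toy_inode {
	const struct toy_inode_ops *ops;
	int put_calls;
};

struct toy_dentry {
	struct toy_inode *d_inode;	/* may change under us in RCU-walk */
};

/* Uses the inode captured before following, never link->d_inode. */
static void toy_put_link(struct toy_dentry *link, struct toy_inode *inode,
			 void *cookie)
{
	(void)link;
	if (inode->ops->put_link)
		inode->ops->put_link(inode, cookie);
}

static void count_put(struct toy_inode *inode, void *cookie)
{
	(void)cookie;
	inode->put_calls++;
}

static const struct toy_inode_ops symlink_ops = { .put_link = count_put };
static const struct toy_inode_ops plain_ops = { .put_link = NULL };

/* Caller pattern: snapshot the inode, "follow", then release. */
static void toy_walk_one(struct toy_dentry *link, struct toy_inode *replacement)
{
	struct toy_inode *inode = link->d_inode;	/* snapshot first */
	void *cookie = NULL;

	/* ->follow_link would run here; simulate d_inode changing. */
	link->d_inode = replacement;
	toy_put_link(link, inode, cookie);
}
```

Had toy_put_link() re-read link->d_inode, it would have consulted plain_ops and skipped the release belonging to the symlink's inode.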

Signed-off-by: NeilBrown <neilb@suse.de>
---
 Documentation/filesystems/porting |    4 ++++
 fs/namei.c                        |   20 ++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index eba8dd0a13e3..09454610515c 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -490,3 +490,7 @@ in your dentry operations instead.
 	The passed inode must be used rather than dentry->d_inode,
 	particularly if LOOKUP_RCU is set.
 	If s_fs_info is used, it must be freed using RCU.
+--
+[mandatory]
+	If ->follow_link permits RCU-walk, then ->put_link must
+	not access dentry->d_inode as that may have changed.
diff --git a/fs/namei.c b/fs/namei.c
index 224b1495edae..72f5a4f91855 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -763,9 +763,9 @@ static void terminate_walk(struct nameidata *nd)
 	}
 }
 
-static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
+static inline void put_link(struct nameidata *nd, struct path *link,
+			    struct inode *inode, void *cookie)
 {
-	struct inode *inode = link->dentry->d_inode;
 	if (inode->i_op->put_link)
 		inode->i_op->put_link(link->dentry, nd_get_link(nd), cookie);
 	if (!(nd->flags & LOOKUP_LINK_RCU))
@@ -934,7 +934,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	if (s) {
 		if (unlikely(IS_ERR(s))) {
 			terminate_walk(nd);
-			put_link(nd, link, *p);
+			put_link(nd, link, inode, *p);
 			return PTR_ERR(s);
 		}
 		if (*s == '/') {
@@ -948,7 +948,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 		nd->inode = nd->path.dentry->d_inode;
 		error = link_path_walk(s, nd);
 		if (unlikely(error))
-			put_link(nd, link, *p);
+			put_link(nd, link, inode, *p);
 	}
 
 	return error;
@@ -1669,13 +1669,14 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 
 	do {
 		struct path link = *path;
+		struct inode *inode = link.dentry->d_inode;
 		void *cookie;
 
 		res = follow_link(&link, nd, &cookie);
 		if (res)
 			break;
 		res = walk_component(nd, path, LOOKUP_FOLLOW);
-		put_link(nd, &link, cookie);
+		put_link(nd, &link, inode, cookie);
 	} while (res > 0);
 
 	nd->link_count--;
@@ -2036,6 +2037,7 @@ static int path_lookupat(int dfd, const char *name,
 		while (err > 0) {
 			void *cookie;
 			struct path link = path;
+			struct inode *inode = link.dentry->d_inode;
 			err = may_follow_link(&link, nd);
 			if (unlikely(err))
 				break;
@@ -2044,7 +2046,7 @@ static int path_lookupat(int dfd, const char *name,
 			if (err)
 				break;
 			err = lookup_last(nd, &path);
-			put_link(nd, &link, cookie);
+			put_link(nd, &link, inode, cookie);
 		}
 	}
 
@@ -2396,6 +2398,7 @@ path_mountpoint(int dfd, const char *name, struct path *path, unsigned int flags
 	while (err > 0) {
 		void *cookie;
 		struct path link = *path;
+		struct inode *inode = link.dentry->d_inode;
 		err = may_follow_link(&link, &nd);
 		if (unlikely(err))
 			break;
@@ -2404,7 +2407,7 @@ path_mountpoint(int dfd, const char *name, struct path *path, unsigned int flags
 		if (err)
 			break;
 		err = mountpoint_last(&nd, path);
-		put_link(&nd, &link, cookie);
+		put_link(&nd, &link, inode, cookie);
 	}
 out:
 	path_cleanup(&nd);
@@ -3281,6 +3284,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 	error = do_last(nd, &path, file, op, &opened, pathname);
 	while (unlikely(error > 0)) { /* trailing symlink */
 		struct path link = path;
+		struct inode *inode = link.dentry->d_inode;
 		void *cookie;
 		if (!(nd->flags & LOOKUP_FOLLOW)) {
 			path_to_nameidata(&path, nd);
@@ -3297,7 +3301,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 		if (unlikely(error))
 			break;
 		error = do_last(nd, &path, file, op, &opened, pathname);
-		put_link(nd, &link, cookie);
+		put_link(nd, &link, inode, cookie);
 	}
 out:
 	path_cleanup(nd);



* [PATCH 12/20] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (17 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 17/20] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 15/20] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
  2015-03-25 23:23 ` [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

When we support ->follow_link in RCU-walk we will not want to
take a reference to the 'struct path *link' passed to follow_link,
and correspondingly will not want to drop that reference.

As link_path_walk will call complete_walk() in the case of an error,
and as complete_walk() will clear LOOKUP_RCU, we cannot test
LOOKUP_RCU to determine if the path should be 'put'.

So introduce a new flag: LOOKUP_LINK_RCU.  This is set on
entry to follow_link() if appropriate and put_link() will
only call path_put() if it is clear.

Also, unlazy_walk() will fail if LOOKUP_LINK_RCU is set.
This is because there is no way for unlazy_walk to get references
on all the "struct path *link"s that are protected by that flag.
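A simplified user-space model of the flag logic above. The flag values are the ones this series defines in include/linux/namei.h; struct toy_nd and the toy_* functions are hypothetical, and the mntget() condition and nested_symlink() depth handling are not modelled. It shows why put_link() must test LOOKUP_LINK_RCU: complete_walk() may already have cleared LOOKUP_RCU.

```c
#include <assert.h>
#include <errno.h>

/* Values as added/used by this series in include/linux/namei.h. */
#define LOOKUP_RCU	0x0040
#define LOOKUP_LINK_RCU	0x0080

/* Hypothetical, pared-down nameidata: flags plus a reference counter. */
struct toy_nd {
	unsigned int flags;
	int refs;	/* references held on the link's path */
};

/* Entry to follow_link(): record whether this link was followed in RCU. */
static void toy_follow_link_enter(struct toy_nd *nd)
{
	nd->flags &= ~LOOKUP_LINK_RCU;
	if (nd->flags & LOOKUP_RCU)
		nd->flags |= LOOKUP_LINK_RCU;	/* no reference taken */
	else
		nd->refs++;			/* ref-walk: take a reference */
}

/* Error path of link_path_walk(): complete_walk() clears LOOKUP_RCU. */
static void toy_complete_walk(struct toy_nd *nd)
{
	nd->flags &= ~LOOKUP_RCU;
}

/* put_link(): drop a reference only if one was actually taken. */
static void toy_put_link(struct toy_nd *nd)
{
	if (!(nd->flags & LOOKUP_LINK_RCU))
		nd->refs--;
}

/* unlazy_walk() cannot legitimize link paths that hold no references. */
static int toy_unlazy_walk(const struct toy_nd *nd)
{
	if (nd->flags & LOOKUP_LINK_RCU)
		return -ECHILD;
	return 0;
}
```

Testing LOOKUP_RCU in toy_put_link() instead would drive refs to -1 in the RCU case after toy_complete_walk().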

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c            |   18 +++++++++++++-----
 include/linux/namei.h |    1 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1a8cc0e47df6..3262c8c2e73d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -552,6 +552,9 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
 	struct dentry *parent = nd->path.dentry;
 
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
+	if (nd->flags & LOOKUP_LINK_RCU)
+		/* Cannot unlazy in the middle of following a symlink */
+		return -ECHILD;
 
 	/*
 	 * After legitimizing the bastards, terminate_walk()
@@ -765,7 +768,8 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 	struct inode *inode = link->dentry->d_inode;
 	if (inode->i_op->put_link)
 		inode->i_op->put_link(link->dentry, nd_get_link(nd), cookie);
-	path_put(link);
+	if (!(nd->flags & LOOKUP_LINK_RCU))
+		path_put(link);
 }
 
 int sysctl_protected_symlinks __read_mostly = 0;
@@ -892,9 +896,10 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	int error;
 	char *s;
 
-	BUG_ON(nd->flags & LOOKUP_RCU);
-
-	if (link->mnt == nd->path.mnt)
+	nd->flags &= ~LOOKUP_LINK_RCU;
+	if (nd->flags & LOOKUP_RCU)
+		nd->flags |= LOOKUP_LINK_RCU;
+	else if (link->mnt == nd->path.mnt)
 		mntget(link->mnt);
 
 	error = -ELOOP;
@@ -946,7 +951,8 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 out_put_nd_path:
 	*p = NULL;
 	terminate_walk(nd);
-	path_put(link);
+	if (!(nd->flags & LOOKUP_LINK_RCU))
+		path_put(link);
 	return error;
 }
 
@@ -1669,6 +1675,8 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 
 	nd->link_count--;
 	nd->depth--;
+	if (!nd->depth)
+		nd->flags &= ~LOOKUP_LINK_RCU;
 	return res;
 }
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index cc8b51a47160..633101964520 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -31,6 +31,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_PARENT		0x0010
 #define LOOKUP_REVAL		0x0020
 #define LOOKUP_RCU		0x0040
+#define LOOKUP_LINK_RCU		0x0080
 
 /*
  * Intent data

* [PATCH 15/20] VFS/namei: enhance follow_link to support RCU-walk.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (18 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 12/20] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-25 23:23 ` [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If LOOKUP_RCU is set, follow_link will not take/drop reference counts.

Replace cond_resched() with _cond_resched() as the latter
is a no-op if rcu_read_lock() is held while the former will
give a warning in that case.

After taking a copy of dentry->d_inode, check d_seq to ensure this
is still the symlink we were looking for.
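A single-threaded user-space model of the "copy d_inode, then re-check d_seq" step. The toy_* names are hypothetical; the kernel's seqcount_t and __read_seqcount_retry() additionally provide the memory barriers needed when a writer really runs concurrently, which this sketch does not attempt.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical dentry: a sequence count plus the pointer it protects. */
struct toy_dentry {
	unsigned d_seq;		/* odd while a writer is active */
	const char *d_inode;	/* stands in for struct inode * */
};

/* Masking bit 0 makes a later check fail if a writer was active. */
static unsigned toy_read_seqcount_begin(const struct toy_dentry *d)
{
	return d->d_seq & ~1u;
}

static void toy_write_begin(struct toy_dentry *d) { d->d_seq++; }
static void toy_write_end(struct toy_dentry *d) { d->d_seq++; }

/* Snapshot d_inode, then confirm the dentry did not change meanwhile. */
static int toy_snapshot_inode(const struct toy_dentry *d, unsigned seq,
			      const char **inode)
{
	*inode = d->d_inode;
	if (d->d_seq != seq)
		return -ECHILD;	/* changed: caller falls back to ref-walk */
	return 0;
}
```

In follow_link() the sequence value comes from nd->seq, recorded when the dentry was looked up, rather than from a fresh read_begin call.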

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 72f5a4f91855..40ff4cb04244 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -897,16 +897,21 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	char *s;
 
 	nd->flags &= ~LOOKUP_LINK_RCU;
-	if (nd->flags & LOOKUP_RCU)
+	if (nd->flags & LOOKUP_RCU) {
 		nd->flags |= LOOKUP_LINK_RCU;
-	else if (link->mnt == nd->path.mnt)
+		if (__read_seqcount_retry(&dentry->d_seq, nd->seq)) {
+			error = -ECHILD;
+			goto out_put_nd_path;
+		}
+	} else if (link->mnt == nd->path.mnt)
 		mntget(link->mnt);
 
 	error = -ELOOP;
 	if (unlikely(nd->total_link_count >= 40))
 		goto out_put_nd_path;
 
-	cond_resched();
+	/* If rcu_read_locked(), this will not resched, and will not warn */
+	_cond_resched();
 	nd->total_link_count++;
 
 	if (nd->flags & LOOKUP_RCU) {
@@ -938,11 +943,17 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 			return PTR_ERR(s);
 		}
 		if (*s == '/') {
-			if (!nd->root.mnt)
-				set_root(nd);
-			path_put(&nd->path);
-			nd->path = nd->root;
-			path_get(&nd->root);
+			if (nd->flags & LOOKUP_RCU) {
+				if (!nd->root.mnt)
+					set_root_rcu(nd);
+				nd->path = nd->root;
+			} else {
+				if (!nd->root.mnt)
+					set_root(nd);
+				path_put(&nd->path);
+				nd->path = nd->root;
+				path_get(&nd->root);
+			}
 			nd->flags |= LOOKUP_JUMPED;
 		}
 		nd->inode = nd->path.dentry->d_inode;

* [PATCH 16/20] VFS/namei: enable RCU-walk when following symlinks.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (12 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 14/20] VFS/namei: add 'inode' arg to put_link() NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 19/20] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

Now that follow_link handles LOOKUP_RCU, we do not need to
'unlazy_walk' when a symlink is found.

As we remain in RCU-walk mode, dentry->d_inode can change
so the BUG_ON() assertions are no longer appropriate.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 40ff4cb04244..0f5b627bd78e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1636,16 +1636,9 @@ static inline int walk_component(struct nameidata *nd, struct path *path,
 	if (!inode || d_is_negative(path->dentry))
 		goto out_path_put;
 
-	if (should_follow_link(path->dentry, follow)) {
-		if (nd->flags & LOOKUP_RCU) {
-			if (unlikely(unlazy_walk(nd, path->dentry))) {
-				err = -ECHILD;
-				goto out_err;
-			}
-		}
-		BUG_ON(inode != path->dentry->d_inode);
+	if (should_follow_link(path->dentry, follow))
 		return 1;
-	}
+
 	path_to_nameidata(path, nd);
 	nd->inode = inode;
 	return 0;
@@ -3102,16 +3095,8 @@ finish_lookup:
 		goto out;
 	}
 
-	if (should_follow_link(path->dentry, !symlink_ok)) {
-		if (nd->flags & LOOKUP_RCU) {
-			if (unlikely(unlazy_walk(nd, path->dentry))) {
-				error = -ECHILD;
-				goto out;
-			}
-		}
-		BUG_ON(inode != path->dentry->d_inode);
+	if (should_follow_link(path->dentry, !symlink_ok))
 		return 1;
-	}
 
 	if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path->mnt) {
 		path_to_nameidata(path, nd);

* [PATCH 17/20] VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (16 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 18/20] xfs: use RCU to free 'struct xfs_mount' NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 12/20] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If the symlink has already been read in, then
page_follow_link_light can succeed in RCU-walk mode.
page_getlink_rcu() is added to support this.

With this, many filesystems can follow links in RCU-walk
mode when everything is cached.  This includes ext?fs and
others.

If the page is a HighMem page we do *not* try to kmap_atomic,
but simply give up - only page_address() is used.
This is because we need to be able to sleep while holding
the address of the page, particularly over calls to do_last()
which can be quite slow and in particular takes a mutex.

If this turns out to be a problem, then copying the link into a
GFP_ATOMIC allocation might be a workable solution.

This selective calling of kmap requires us to know, in page_put_link,
whether or not kunmap() needs to be called.  Pass this information in
the lsb of the cookie.
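A minimal user-space sketch of the low-bit tagging used for the cookie. The cookie_* helpers are hypothetical; the same bit tricks appear inline in page_follow_link_light() and page_put_link(). The trick relies on the pointee being at least 2-byte aligned, which holds for any struct containing a long, such as struct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set bit 0 when the page was reached via page_address() (no kmap). */
static void *cookie_make(void *page, bool via_page_address)
{
	assert(((uintptr_t)page & 1) == 0);	/* alignment frees bit 0 */
	if (via_page_address)
		return (void *)((uintptr_t)page | 1);
	return page;
}

/* Recover the real page pointer by clearing the tag bit. */
static void *cookie_page(void *cookie)
{
	return (void *)((uintptr_t)cookie & ~(uintptr_t)1);
}

/* Same test as page_put_link(): untagged cookies were kmap()ed. */
static bool cookie_needs_kunmap(void *cookie)
{
	return cookie_page(cookie) == cookie;
}
```

Because the tag lives in a bit that alignment guarantees is zero, no extra storage is needed to carry the "kunmap or not" decision from ->follow_link to ->put_link.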

The new page_getlink_rcu() needs to be passed the inode rather than
the dentry (as dentry->d_inode is not stable), so change
page_getlink() to behave the same way: it only needed the dentry
to get the inode.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0f5b627bd78e..d13b4315447f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4497,24 +4497,48 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 EXPORT_SYMBOL(generic_readlink);
 
 /* get the link contents into pagecache */
-static char *page_getlink(struct dentry * dentry, struct page **ppage)
+static char *page_getlink(struct inode *inode, struct page **ppage)
 {
 	char *kaddr;
 	struct page *page;
-	struct address_space *mapping = dentry->d_inode->i_mapping;
+	struct address_space *mapping = inode->i_mapping;
 	page = read_mapping_page(mapping, 0, NULL);
 	if (IS_ERR(page))
 		return (char*)page;
 	*ppage = page;
 	kaddr = kmap(page);
-	nd_terminate_link(kaddr, dentry->d_inode->i_size, PAGE_SIZE - 1);
+	nd_terminate_link(kaddr, inode->i_size, PAGE_SIZE - 1);
+	return kaddr;
+}
+
+/* get the link contents from pagecache under RCU */
+static char *page_getlink_rcu(struct inode *inode, struct page **ppage)
+{
+	char *kaddr;
+	struct page *page;
+	struct address_space *mapping = inode->i_mapping;
+
+	page = find_get_page(mapping, 0);
+	if (page &&
+	    (!PageUptodate(page) || PageHighMem(page))) {
+		put_page(page);
+		page = NULL;
+	}
+	if (!page) {
+		*ppage = ERR_PTR(-ECHILD);
+		return NULL;
+	}
+	*ppage = page;
+	kaddr = page_address(page);
+	nd_terminate_link(kaddr, inode->i_size, PAGE_SIZE - 1);
 	return kaddr;
 }
 
 int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
 	struct page *page = NULL;
-	int res = readlink_copy(buffer, buflen, page_getlink(dentry, &page));
+	int res = readlink_copy(buffer, buflen,
+				page_getlink(dentry->d_inode, &page));
 	if (page) {
 		kunmap(page);
 		page_cache_release(page);
@@ -4527,19 +4551,22 @@ void *page_follow_link_light(struct dentry *dentry, struct inode *inode,
 			     int flags)
 {
 	struct page *page = NULL;
-	if (flags & LOOKUP_RCU)
-		return ERR_PTR(-ECHILD);
-	nd_set_link(page_getlink(dentry, &page));
+	if (flags & LOOKUP_RCU) {
+		nd_set_link(page_getlink_rcu(inode, &page));
+		page = (void *)((unsigned long)page | 1);
+	} else
+		nd_set_link(page_getlink(inode, &page));
 	return page;
 }
 EXPORT_SYMBOL(page_follow_link_light);
 
 void page_put_link(struct dentry *dentry, char *link, void *cookie)
 {
-	struct page *page = cookie;
+	struct page *page = (void *)((unsigned long)cookie & ~1UL);
 
 	if (page) {
-		kunmap(page);
+		if (page == cookie)
+			kunmap(page);
 		page_cache_release(page);
 	}
 }

* [PATCH 19/20] XFS: allow follow_link to often succeed in RCU-walk.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (13 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 16/20] VFS/namei: enable RCU-walk when following symlinks NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 20/20] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If LOOKUP_RCU is set, use GFP_ATOMIC rather than GFP_KERNEL,
and try to get the ilock without blocking.

When these succeed, follow_link() can succeed without dropping
out of RCU-walk.

As xfs_readlink can now race with xfs_fs_evict_inode:
- xfs_readlink must check if the inode is being evicted after
  getting the lock
- xfs_fs_evict_inode cannot assert that the lock is not held.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/xfs/xfs_ioctl.c   |    2 +-
 fs/xfs/xfs_iops.c    |   15 ++++++++++-----
 fs/xfs/xfs_super.c   |    2 --
 fs/xfs/xfs_symlink.c |   15 +++++++++++++--
 fs/xfs/xfs_symlink.h |    2 +-
 5 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ac4feae45eb3..29d95a1b76c0 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -303,7 +303,7 @@ xfs_readlink_by_handle(
 		goto out_dput;
 	}
 
-	error = xfs_readlink(XFS_I(dentry->d_inode), link);
+	error = xfs_readlink(XFS_I(dentry->d_inode), link, 0);
 	if (error)
 		goto out_kfree;
 	error = readlink_copy(hreq->ohandle, olen, link);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index c2c136ef3a50..631b1ec2e650 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -416,15 +416,20 @@ xfs_vn_follow_link(
 	int			flags)
 {
 	char			*link;
-	int			error = -ENOMEM;
+	int			error;
 
-	if (flags & LOOKUP_RCU)
-		return ERR_PTR(-ECHILD);
-	link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+	if (flags & LOOKUP_RCU) {
+		error = -ECHILD;
+		link = kmalloc(MAXPATHLEN+1, GFP_ATOMIC);
+	} else {
+		error = -ENOMEM;
+		link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+	}
 	if (!link)
 		goto out_err;
 
-	error = xfs_readlink(XFS_I(inode), link);
+	error = xfs_readlink(XFS_I(inode), link,
+			     flags & LOOKUP_RCU);
 	if (unlikely(error))
 		goto out_kfree;
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 3827be14383c..e041fa55912b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -996,8 +996,6 @@ xfs_fs_evict_inode(
 {
 	xfs_inode_t		*ip = XFS_I(inode);
 
-	ASSERT(!rwsem_is_locked(&ip->i_iolock.mr_lock));
-
 	trace_xfs_evict_inode(ip);
 
 	truncate_inode_pages_final(&inode->i_data);
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 25791df6f638..228987b8e758 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -123,7 +123,8 @@ xfs_readlink_bmap(
 int
 xfs_readlink(
 	struct xfs_inode *ip,
-	char		*link)
+	char		*link,
+	int		rcu)
 {
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_fsize_t	pathlen;
@@ -134,7 +135,15 @@ xfs_readlink(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
+	if (rcu) {
+		if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED) == 0)
+			return -ECHILD;
+		if (ip->i_vnode.i_state & (I_FREEING | I_CLEAR)) {
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			return -ECHILD;
+		}
+	} else
+		xfs_ilock(ip, XFS_ILOCK_SHARED);
 
 	pathlen = ip->i_d.di_size;
 	if (!pathlen)
@@ -153,6 +162,8 @@ xfs_readlink(
 	if (ip->i_df.if_flags & XFS_IFINLINE) {
 		memcpy(link, ip->i_df.if_u1.if_data, pathlen);
 		link[pathlen] = '\0';
+	} else if (rcu) {
+		error = -ECHILD;
 	} else {
 		error = xfs_readlink_bmap(ip, link);
 	}
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h
index e75245d09116..a71d26643e20 100644
--- a/fs/xfs/xfs_symlink.h
+++ b/fs/xfs/xfs_symlink.h
@@ -21,7 +21,7 @@
 
 int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name,
 		const char *target_path, umode_t mode, struct xfs_inode **ipp);
-int xfs_readlink(struct xfs_inode *ip, char *link);
+int xfs_readlink(struct xfs_inode *ip, char *link, int rcu);
 int xfs_inactive_symlink(struct xfs_inode *ip);
 
 #endif /* __XFS_SYMLINK_H */

* [PATCH 18/20] xfs: use RCU to free 'struct xfs_mount'.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (15 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 20/20] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 17/20] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

In order for ->follow_link to be safe in RCU-walk, any
data structures accessed need to be freed after
an RCU grace period.

Freeing of 'struct xfs_mount' is not currently guaranteed to be
delayed until after an RCU grace period, so use kfree_rcu() to free it.
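A toy user-space model of why kfree_rcu() needs an embedded rcu_head: the head is queued, and the offset of the head within the object lets the whole object be freed once the grace period ends. Everything here (toy_rcu_head, toy_queue_free(), toy_grace_period_elapsed()) is hypothetical; the real grace-period machinery is replaced by an explicit call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rcu_head: a list link plus the head's offset in its object. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	size_t offset;
};

struct toy_mount {
	int m_generation;
	struct toy_rcu_head m_rcu;	/* embedded, like the new m_rcu field */
};

static struct toy_rcu_head *pending;	/* objects awaiting a grace period */

static void toy_queue_free(struct toy_rcu_head *head, size_t offset)
{
	head->offset = offset;
	head->next = pending;
	pending = head;
}

/* Like kfree_rcu(ptr, field), but the type is passed explicitly. */
#define toy_kfree_rcu(ptr, type, field) \
	toy_queue_free(&(ptr)->field, offsetof(type, field))

/* Called once pre-existing readers are done: free each queued object. */
static int toy_grace_period_elapsed(void)
{
	int freed = 0;

	while (pending) {
		struct toy_rcu_head *head = pending;

		pending = head->next;
		free((char *)head - head->offset);
		freed++;
	}
	return freed;
}
```

Until toy_grace_period_elapsed() runs, a reader that picked up the pointer earlier can still dereference it safely.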

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/xfs/xfs_mount.h |    2 ++
 fs/xfs/xfs_super.c |    2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 0d8abd6364d9..6a1094e493e9 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -185,6 +185,8 @@ typedef struct xfs_mount {
 	 * to various other kinds of pain inflicted on the pNFS server.
 	 */
 	__uint32_t		m_generation;
+
+	struct rcu_head		m_rcu;
 } xfs_mount_t;
 
 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8fcc4ccc5c79..3827be14383c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1047,7 +1047,7 @@ xfs_fs_put_super(
 	xfs_destroy_mount_workqueues(mp);
 	xfs_close_devices(mp);
 	xfs_free_fsname(mp);
-	kfree(mp);
+	kfree_rcu(mp, m_rcu);
 }
 
 STATIC int

* [PATCH 20/20] NFS: support LOOKUP_RCU in nfs_follow_link.
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (14 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 19/20] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
@ 2015-03-23  2:37 ` NeilBrown
  2015-03-23  2:37 ` [PATCH 18/20] xfs: use RCU to free 'struct xfs_mount' NeilBrown
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-23  2:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

If the inode is valid and the page has been read in,
then we can follow a link in RCU-walk.
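A sketch of the decision made on the LOOKUP_RCU path of nfs_follow_link(), reduced to booleans. struct toy_nfs_state and toy_nfs_follow_link_rcu() are hypothetical; each field corresponds to one check in the patch (nfs_revalidate_mapping_rcu(), find_get_page(), PageUptodate(), PageHighMem()).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the RCU path inspects. */
struct toy_nfs_state {
	bool needs_revalidate;	/* attributes stale: would need the server */
	bool invalidating;	/* mapping is being invalidated */
	bool have_page;		/* page 0 is in the page cache */
	bool page_uptodate;
	bool page_highmem;	/* would need kmap(), which may sleep */
};

/* 0: the link can be followed without leaving RCU-walk. */
static int toy_nfs_follow_link_rcu(const struct toy_nfs_state *s)
{
	if (s->needs_revalidate || s->invalidating)
		return -ECHILD;	/* revalidation may block on the network */
	if (!s->have_page || !s->page_uptodate || s->page_highmem)
		return -ECHILD;	/* reading or mapping the page may sleep */
	return 0;
}
```

Every failure maps to -ECHILD rather than a real error, so the VFS simply retries the lookup in ref-walk.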

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/inode.c         |   21 +++++++++++++++++++++
 fs/nfs/symlink.c       |   20 ++++++++++++++++++--
 include/linux/nfs_fs.h |    1 +
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index d42dff6d5e98..430899a789c6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1162,6 +1162,27 @@ int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space *
 	return __nfs_revalidate_mapping(inode, mapping, true);
 }
 
+int nfs_revalidate_mapping_rcu(struct inode *inode)
+{
+	struct nfs_inode *nfsi = NFS_I(inode);
+	unsigned long *bitlock = &nfsi->flags;
+	int ret = 0;
+
+	if (IS_SWAPFILE(inode))
+		goto out;
+	if (nfs_mapping_need_revalidate_inode(inode)) {
+		ret = -ECHILD;
+		goto out;
+	}
+	spin_lock(&inode->i_lock);
+	if (test_bit(NFS_INO_INVALIDATING, bitlock) ||
+	    (nfsi->cache_validity & NFS_INO_INVALID_DATA))
+		ret = -ECHILD;
+	spin_unlock(&inode->i_lock);
+out:
+	return ret;
+}
+
 static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 32bbac1bb4bc..1c0a94b3d12e 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -49,8 +49,24 @@ static void *nfs_follow_link(struct dentry *dentry, struct inode *inode,
 	struct page *page;
 	void *err;
 
-	if (flags & LOOKUP_RCU)
-		return ERR_PTR(-ECHILD);
+	if (flags & LOOKUP_RCU) {
+		err = ERR_PTR(nfs_revalidate_mapping_rcu(inode));
+		if (err)
+			goto read_failed;
+		page = find_get_page(inode->i_mapping, 0);
+		if (page &&
+		    (!PageUptodate(page) || PageHighMem(page))) {
+			put_page(page);
+			page = NULL;
+		}
+		if (!page) {
+			err = ERR_PTR(-ECHILD);
+			goto read_failed;
+		}
+		nd_set_link(page_address(page));
+		page = (void *)((unsigned long)page | 1);
+		return page;
+	}
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
 		goto read_failed;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index b01ccf371fdc..2eea59456ebb 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -357,6 +357,7 @@ extern int nfs_revalidate_inode_rcu(struct nfs_server *server, struct inode *ino
 extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
 extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
 extern int nfs_revalidate_mapping_protected(struct inode *inode, struct address_space *mapping);
+extern int nfs_revalidate_mapping_rcu(struct inode *inode);
 extern int nfs_setattr(struct dentry *, struct iattr *);
 extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr, struct nfs_fattr *);
 extern void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/20] Support follow_link in RCU-walk - V3
  2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
                   ` (19 preceding siblings ...)
  2015-03-23  2:37 ` [PATCH 15/20] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
@ 2015-03-25 23:23 ` NeilBrown
  20 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2015-03-25 23:23 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

On Mon, 23 Mar 2015 13:37:38 +1100 NeilBrown <neilb@suse.de> wrote:

> Hi Al,
>  thanks for all your review help - particularly the fact that
>  dentry->d_inode is not stable in RCU-walk.  That has lead to
>  a number of changes.
> 
>  I think this set addresses all of your review comments, improves
>  some documentation, and has a go at providing a solution for lustre.
> 
>  I hope to organize some proper testing soon, so I can confirm that it
>  makes certain loads a lot faster.

Just FYI - I've now tested this series on a 64-CPU machine running a load
with lots of threads doing lots of stats on lots of non-existent files.
Without the patches, the test takes twice as long when there is a symlink
early in the path being stat()ed as when there is not.
With the patches, there is no significant difference in the time the test
takes between the symlink and no-symlink cases.

So it appears to be achieving the goal.

Thanks,
NeilBrown

* Re: [PATCH 14/20] VFS/namei: add 'inode' arg to put_link().
  2015-03-23  2:37 ` [PATCH 14/20] VFS/namei: add 'inode' arg to put_link() NeilBrown
@ 2015-04-17 16:25   ` Al Viro
  2015-04-17 19:09     ` Al Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Al Viro @ 2015-04-17 16:25 UTC
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 23, 2015 at 01:37:40PM +1100, NeilBrown wrote:
> @@ -1669,13 +1669,14 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
>  
>  	do {
>  		struct path link = *path;
> +		struct inode *inode = link.dentry->d_inode;
>  		void *cookie;
>  
>  		res = follow_link(&link, nd, &cookie);
>  		if (res)
>  			break;
>  		res = walk_component(nd, path, LOOKUP_FOLLOW);
> -		put_link(nd, &link, cookie);
> +		put_link(nd, &link, inode, cookie);
>  	} while (res > 0);

That's really unpleasant - it means increased stack footprint in the
recursion.

Damn, maybe it's time to bite the bullet and kill the recursion completely...

What do we really need to save across the recursive call?
	* how far did we get in the previous pathname
	* data needed for put_link:
		cookie
		link body
		dentry of link
		vfsmount (to pin containing fs; non-RCU) or inode (RCU)
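As a rough illustration, the per-level state listed above could be grouped into a structure like the one below. `struct saved_link`, its fields and `saved_link_bytes()` are hypothetical names, not the kernel's. The union reflects the "vfsmount (non-RCU) or inode (RCU)" alternative, which is why each entry comes to five pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers to them are stored. */
struct dentry;
struct vfsmount;
struct inode;

/* Hypothetical per-nesting-level state kept instead of a recursive call. */
struct saved_link {
    const char *resume;        /* how far we got in the previous pathname */
    void *cookie;              /* opaque value from ->follow_link() */
    const char *body;          /* link body being walked */
    struct dentry *dentry;     /* dentry of the symlink itself */
    union {
        struct vfsmount *mnt;  /* non-RCU: pins the containing fs */
        struct inode *inode;   /* RCU-walk: inode for ->put_link() */
    } pin;
};

/* Five pointer-sized slots per level, as counted in the message. */
_Static_assert(sizeof(struct saved_link) == 5 * sizeof(void *),
               "saved_link should be five pointers wide");

/* Bytes needed to save state for `levels` nesting levels. */
static size_t saved_link_bytes(size_t levels)
{
    assert(levels <= 40);      /* illustrative cap: 40 symlinks per lookup */
    return levels * sizeof(struct saved_link);
}
```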

We are already saving link body in nameidata, so we could fatten that array.
It would allow flattening link_path_walk() completely - instead of a
recursive call we would just save what needs saving and jump to the beginning,
and on exits we'd check the depth and either return or restore the saved state
and jump back to just past the place where the recursive call used to be.
It would even save quite a bit of space in the worst case.  However, it would
blow the stack footprint in normal cases *and* blow it even worse for the
things that need two struct nameidata instances at once (rename(), basically).
5 pointers instead of 1 pointer per level - for 8 levels that is an extra 32
words on stack, i.e. an extra 256 bytes on 64bit.  An extra 0.5Kb of stack footprint on rename() is
probably too much, especially since this "saved" stuff from its two nameidata
instances will never be used at the same time...
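The flattening described above (save state, jump to the beginning, and at the end of a link body either return or restore the saved state) can be sketched as a toy user-space resolver. This is not fs/namei.c: the symlink table, `toy_readlink()` and `walk()` are hypothetical, only the resume position is saved per level, absolute link bodies and ".." are ignored, and the nesting (8) and total-link (40) limits are assumptions chosen to mirror the numbers in this thread.

```c
#include <assert.h>
#include <string.h>

#define MAX_NESTED 8   /* assumed nesting limit */
#define MAX_LINKS 40   /* assumed total symlinks per resolution */

/* Toy symlink table: name -> link body.  Everything else is a plain name. */
struct toy_link { const char *name; const char *body; };
static const struct toy_link toy_links[] = {
    { "a", "x/b" },
    { "b", "y" },
    { "loop", "loop" },
};

static const char *toy_readlink(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof(toy_links) / sizeof(toy_links[0]); i++)
        if (strlen(toy_links[i].name) == len &&
            strncmp(toy_links[i].name, name, len) == 0)
            return toy_links[i].body;
    return NULL;
}

/* Resolve `path` without recursion, writing the plain components to `out`
 * joined by '/'.  Returns 0 on success, -1 on too many links or short buffer. */
static int walk(const char *path, char *out, size_t outsz)
{
    const char *saved[MAX_NESTED];  /* resume point for each nesting level */
    int depth = 0, total = 0;
    const char *p = path;
    size_t used = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (;;) {
        while (*p == '/')
            p++;
        if (*p == '\0') {
            if (depth == 0)
                return 0;             /* end of the outermost pathname */
            p = saved[--depth];       /* restore, continue after the link */
            continue;
        }
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);
        const char *body = toy_readlink(p, len);
        if (body) {
            if (depth == MAX_NESTED || ++total > MAX_LINKS)
                return -1;            /* the kernel would fail with ELOOP */
            saved[depth++] = p + len; /* save position, start on the body */
            p = body;
            continue;
        }
        if (used + len + 2 > outsz)   /* room for '/', the name and NUL */
            return -1;
        if (used)
            out[used++] = '/';
        memcpy(out + used, p, len);
        used += len;
        out[used] = '\0';
        p += len;
    }
}
```

With this table, `walk("a/z", buf, sizeof buf)` produces `x/y/z`: `a` expands to `x/b`, whose `b` expands to `y`, and the walk then resumes at `/z` after popping both saved positions. `walk("loop/z", ...)` fails once the saved-state array is full.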

Alternatively, we could just allocate about a page worth of an array when
the depth of nesting goes beyond 2 and put this saved stuff there - at
5 pointers per level it would completely dispose of the depth of nesting
limit, giving us uniform "can't traverse more than 40 symlinks per pathname
resolution".  40 * 5 * sizeof(pointer) is what, at most 1600 bytes?  So
even half a page would suffice for that quite comfortably...
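The stack figures in this message follow from simple pointer arithmetic. The sketch below reproduces them, assuming 8 saved levels per struct nameidata (inferred from "32 words" at 4 extra pointers per level), two live nameidata instances for rename(), and the 40-symlink limit; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NESTED_LEVELS 8   /* assumed per-nameidata array length */
#define MAX_SYMLINKS 40   /* symlinks per pathname resolution */
#define PTRS_NOW 1        /* link body only */
#define PTRS_FAT 5        /* position, cookie, body, dentry, mnt/inode */

/* Extra stack bytes per struct nameidata if each level grows to 5 pointers. */
static size_t fat_extra_per_nameidata(size_t ptr_size)
{
    assert(ptr_size > 0);
    return NESTED_LEVELS * (PTRS_FAT - PTRS_NOW) * ptr_size;
}

/* rename() has two nameidata instances live at once. */
static size_t fat_extra_for_rename(size_t ptr_size)
{
    return 2 * fat_extra_per_nameidata(ptr_size);
}

/* Size of a separately allocated array covering every allowed symlink. */
static size_t separate_array_bytes(size_t ptr_size)
{
    return MAX_SYMLINKS * PTRS_FAT * ptr_size;
}
```

With 8-byte pointers this gives 256 bytes per nameidata, 512 bytes for rename(), and 1600 bytes for the separate array, which indeed fits in half of a 4Kb page.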

The question is whether we'll be able to avoid blowing the I-cache footprint
of link_path_walk() to hell while doing that; it feels like we should be,
but we'll have to see how well that works in reality...

I'll try to implement that (with your #3..#7 as the first steps) and see
how well it works; it's obviously the next cycle fodder, but hopefully
in testable shape by -rc2...


* Re: [PATCH 14/20] VFS/namei: add 'inode' arg to put_link().
  2015-04-17 16:25   ` Al Viro
@ 2015-04-17 19:09     ` Al Viro
  2015-04-18  8:09       ` Al Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Al Viro @ 2015-04-17 19:09 UTC
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Fri, Apr 17, 2015 at 05:25:36PM +0100, Al Viro wrote:
> On Mon, Mar 23, 2015 at 01:37:40PM +1100, NeilBrown wrote:
> [...]
> That's really unpleasant - it means increased stack footprint in the
> recursion.
> 
> Damn, maybe it's time to bite the bullet and kill the recursion completely...
> [...]

	Hmm...  Actually, right now we have 192 bytes of stack footprint per
nesting level (amd64 allmodconfig).  Which means that simply making the
array fatter would give a clean benefit at the 3rd level of recursion (symlink
encountered while traversing a symlink) for everything other than rename()...
allnoconfig+64bit gives 160 bytes per level, with the same breakeven point.
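The breakeven claim can be checked the same way. Assuming the fattened array costs a fixed 256 extra bytes (the 8 levels x 4 extra pointers x 8 bytes figure from the earlier message) while recursion costs a fixed amount per nested link body, the hypothetical helper below finds the first nesting depth at which recursion becomes the more expensive option.

```c
#include <assert.h>

#define FAT_ARRAY_EXTRA 256  /* bytes: 8 levels * 4 extra pointers * 8 bytes */

/* Smallest number of nested link bodies for which recursion, at
 * `per_level` bytes each, costs more stack than the fattened array. */
static int breakeven_depth(int per_level)
{
    assert(per_level > 0);
    int depth = 1;
    while (depth * per_level <= FAT_ARRAY_EXTRA)
        depth++;
    return depth;
}
```

Both 192 (allmodconfig) and 160 (allnoconfig) give 2 nested link bodies, i.e. a symlink met while already traversing a symlink, which is the "3rd level" when the original pathname counts as the first.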

	Interesting...  It might even make sense to separate that array from
struct nameidata and solve the rename() problem that way (current->nameidata
would be replaced with a pointer to that sucker in such a variant, of course, and
->depth would move there).  In that variant we do not get rid of nesting limit
completely, but it would probably be simpler than the one above...


* Re: [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space.
  2015-03-23  2:37 ` [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space NeilBrown
@ 2015-04-18  3:01   ` Al Viro
  2015-04-19 20:57     ` Andreas Dilger
  0 siblings, 1 reply; 29+ messages in thread
From: Al Viro @ 2015-04-18  3:01 UTC
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel, Drokin, Oleg, Andreas Dilger

On Mon, Mar 23, 2015 at 01:37:38PM +1100, NeilBrown wrote:
> lustre's ->follow_link() uses a lot of stack space and so
> needs to limit symlink recursion based on stack size.
> 
> It currently tests current->link_count, but that will soon
> become private to fs/namei.c.
> So instead base it on the actual available stack space.
> This patch aborts recursive symlinks if less than 2K of space
> is available.  This seems consistent with current code, but
> hasn't been tested.

BTW, in the best case that logic is fishy.  We have "up to 5 levels
with 4Kb stack and up to 7 with 8Kb one".  Could somebody manage to dig out
the reasons for such limits?  Preferably along with the kernel version where
the overflows had been observed, both for 4K and 8K cases.

I'm very tempted to rip that thing out in the "kill link_path_walk()
recursion completely" series...
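For illustration only, a stack-space test of the kind the quoted patch description refers to might look like the sketch below. It is not the lustre patch: `stack_room()` and `may_follow()` are hypothetical, and the sketch assumes a downward-growing stack whose region is aligned to its size (`thread_size`, a power of two), so the low bits of the stack pointer give the bytes still free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes between `sp` and the bottom of a downward-growing stack whose
 * base is aligned to `thread_size` (must be a power of two). */
static size_t stack_room(uintptr_t sp, size_t thread_size)
{
    assert(thread_size && (thread_size & (thread_size - 1)) == 0);
    return sp & (thread_size - 1);
}

/* Refuse to follow another symlink when fewer than `min_free` bytes remain
 * (the patch description uses 2K as the threshold). */
static bool may_follow(uintptr_t sp, size_t thread_size, size_t min_free)
{
    return stack_room(sp, thread_size) >= min_free;
}
```

Unlike a fixed "5 levels with 4Kb, 7 with 8Kb" rule, a check like this adapts to however much stack the callers have already consumed.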


* Re: [PATCH 14/20] VFS/namei: add 'inode' arg to put_link().
  2015-04-17 19:09     ` Al Viro
@ 2015-04-18  8:09       ` Al Viro
  0 siblings, 0 replies; 29+ messages in thread
From: Al Viro @ 2015-04-18  8:09 UTC
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Fri, Apr 17, 2015 at 08:09:10PM +0100, Al Viro wrote:
> On Fri, Apr 17, 2015 at 05:25:36PM +0100, Al Viro wrote:
> > [...]
> 
> 	Hmm...  Actually, right now we have 192 bytes of stack footprint per
> nesting level (amd64 allmodconfig).  Which means that simply making the
> array fatter would give a clean benefit at the 3rd level of recursion (symlink
> encountered while traversing a symlink) for everything other than rename()...
> allnoconfig+64bit gives 160 bytes per level, with the same breakeven point.
> 
> 	Interesting...  It might even make sense to separate that array from
> struct nameidata and solve the rename() problem that way (current->nameidata
> would be replaced with pointer to that sucker in such variant, of course, and
> ->depth would move there).  In that variant we do not get rid of nesting limit
> completely, but it would probably be simpler than the one above...

	OK, right now in my tree recursion is gone, it seems to survive the
testing *and* stack footprint of link_path_walk() is 432 bytes.  Less than the
mainline with two nested symlinks, and I definitely see how to trim that down
by another 64 bytes, which would put us within spitting distance of what
the mainline gets with a single symlink encountered in the middle of a
pathname.  It still needs more massage (link_path_walk() is ugly as hell
right now), but I see how to clean it up, and porting the rest of Neil's
RCU follow_link series on top of that shouldn't be hard.  Obviously next cycle
fodder, but if everything works out, we'll get serious stack footprint
reduction *and* not falling out of lazy pathwalk whenever we run into a symlink.

	I've dumped the current branch in vfs.git#link_path_walk; more after
I get some sleep...


* Re: [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space.
  2015-04-18  3:01   ` Al Viro
@ 2015-04-19 20:57     ` Andreas Dilger
  2015-04-19 21:33       ` Al Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Andreas Dilger @ 2015-04-19 20:57 UTC
  To: Al Viro
  Cc: NeilBrown, linux-fsdevel, linux-kernel, Oleg Drokin, Andreas Dilger

On Apr 17, 2015, at 9:01 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Mon, Mar 23, 2015 at 01:37:38PM +1100, NeilBrown wrote:
>> lustre's ->follow_link() uses a lot of stack space and so
>> needs to limit symlink recursion based on stack size.
>> 
>> It currently tests current->link_count, but that will soon
>> become private to fs/namei.c.
>> So instead base it on the actual available stack space.
>> This patch aborts recursive symlinks if less than 2K of space
>> is available.  This seems consistent with current code, but
>> hasn't been tested.
> 
> BTW, in the best case that logic is fishy.  We have "up to 5 levels with
> 4Kb stack and up to 7 with 8Kb one".  Could somebody manage to dig out
> the reasons for such limits?  Preferably along with the kernel version
> where the overflows had been observed, both for 4K and 8K cases.

Hi Al,
I checked in our bug history, and the 8KB stack limit was hit with
older clients running racer or our recursive-symlink regression test:

2.6.18: https://bugzilla.lustre.org/show_bug.cgi?id=18533#c0
2.6.16: https://bugzilla.lustre.org/show_bug.cgi?id=19380#c11

The 4KB stack limit for clients has existed a lot longer than that,
but CONFIG_4KSTACKS was not the default on all kernels for a while.
The following bug showed a stack overflow with 2.6.22 kernels:

https://bugzilla.lustre.org/show_bug.cgi?id=17379#c0

Prior to 2.6.16, when we needed client-side kernel patches and a custom
kernel build, we always forced CONFIG_4KSTACKS off in the config.

In general, Lustre is a heavy stack user because it is a network
filesystem, and doubly so if the Lustre client is re-exporting the
filesystem to NFS clients.

I'd be happy if symlink recursion was removed completely, but so far the
added symlink recursion limit hasn't been a problem for Lustre users.

Cheers, Andreas


* Re: [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space.
  2015-04-19 20:57     ` Andreas Dilger
@ 2015-04-19 21:33       ` Al Viro
  2015-04-20  2:29         ` Al Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Al Viro @ 2015-04-19 21:33 UTC
  To: Andreas Dilger
  Cc: NeilBrown, linux-fsdevel, linux-kernel, Oleg Drokin, Andreas Dilger

On Sun, Apr 19, 2015 at 02:57:07PM -0600, Andreas Dilger wrote:

> I'd be happy if symlink recursion was removed completely, but so far the
> added symlink recursion limit hasn't been a problem for Lustre users.

Well, it's gone in my tree; I've just pushed the current queue to
vfs.git#link_path_walk.  Right now I'm looking at the unholy mess
gcc does to stack footprint with inlining - the last commit in there
is a result of exactly that.  Inlines in there really need tuning ;-/


* Re: [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space.
  2015-04-19 21:33       ` Al Viro
@ 2015-04-20  2:29         ` Al Viro
  0 siblings, 0 replies; 29+ messages in thread
From: Al Viro @ 2015-04-20  2:29 UTC
  To: Andreas Dilger
  Cc: NeilBrown, linux-fsdevel, linux-kernel, Oleg Drokin, Andreas Dilger

On Sun, Apr 19, 2015 at 10:33:48PM +0100, Al Viro wrote:
> On Sun, Apr 19, 2015 at 02:57:07PM -0600, Andreas Dilger wrote:
> 
> > I'd be happy if symlink recursion was removed completely, but so far the
> > added symlink recursion limit hasn't been a problem for Lustre users.
> 
> Well, it's gone in my tree; I've just pushed the current queue to
> vfs.git#link_path_walk.  Right now I'm looking at the unholy mess
> gcc does to stack footprint with inlining - the last commit in there
> is a result of exactly that.  Inlines in there really need tuning ;-/

FWIW, right now in my tree the maximal stack footprint of call chains through
fs/namei.c (amd64, my test config, including aushit) is 1408 bytes.
Goes via rename() -> renameat2() -> user_path_parent() -> filename_lookup() ->
path_lookupat() -> path_init() or follow_link() -> link_path_walk() ->
walk_component() -> lookup_fast() -> follow_managed().  And that does *not*
depend upon the depth of symlink nesting.  The maximal depth when calling
any methods present in lustre is 1328; similar path, except that its tail
goes like walk_component() -> __lookup_hash() -> lookup_dcache() ->
->d_revalidate().  Again, independent from the symlink nesting depth.
->lookup() calls are at 1296 maximum, similar call chain, for ->permission()
it's 1152, for ->follow_link() - 1088.

For mainline it's _much_ worse.  Maximal depth on the same config is
2986 bytes (with 8 levels of nesting) and each level costs 208 bytes.
->d_revalidate() is at 2880; for lustre it would be reduced a bit (again,
208 bytes per level), but if you have any symlinks at all, you will end up
deeper than in the non-recursive variant.
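A rough linear model of these numbers supports that last claim. Assuming each nesting level costs exactly 208 bytes, the mainline ->d_revalidate() depth of 2880 bytes at 8 levels implies about 1216 bytes with no symlinks; the hypothetical helpers below compare that against the 1328 bytes quoted for the non-recursive tree.

```c
#include <assert.h>

#define MAINLINE_MAX_NEST 8   /* nesting depth at which 2880 was measured */

/* Linear model: mainline stack depth at `nest` levels of symlink nesting. */
static int mainline_depth(int at_max, int per_level, int nest)
{
    assert(nest >= 0 && nest <= MAINLINE_MAX_NEST);
    return at_max - (MAINLINE_MAX_NEST - nest) * per_level;
}

/* First nesting level at which mainline exceeds the non-recursive depth,
 * or -1 if it never does within the nesting limit. */
static int first_deeper_level(int at_max, int per_level, int nonrecursive)
{
    for (int nest = 0; nest <= MAINLINE_MAX_NEST; nest++)
        if (mainline_depth(at_max, per_level, nest) > nonrecursive)
            return nest;
    return -1;
}
```

Under this model mainline sits slightly below the non-recursive tree with no symlinks (1216 vs 1328 bytes), and a single symlink already puts it above (1424 bytes).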

And frankly, the most scary thing in there isn't lustre-related - it's NFS4
(and AFS, etc.), where ->d_automount() might get called on _that_ depth.  With
quite a bit of stack footprint of its own - we are doing NFS referral handling.
With almost 3Kb of stack already eaten up.


end of thread, other threads:[~2015-04-20  2:29 UTC | newest]

Thread overview: 29+ messages
2015-03-23  2:37 [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
2015-03-23  2:37 ` [PATCH 02/20] STAGING/lustre: limit follow_link recursion using stack space NeilBrown
2015-04-18  3:01   ` Al Viro
2015-04-19 20:57     ` Andreas Dilger
2015-04-19 21:33       ` Al Viro
2015-04-20  2:29         ` Al Viro
2015-03-23  2:37 ` [PATCH 03/20] VFS: replace {,total_}link_count in task_struct with pointer to nameidata NeilBrown
2015-03-23  2:37 ` [PATCH 01/20] Documentation: remove outdated information from automount-support.txt NeilBrown
2015-03-23  2:37 ` [PATCH 10/20] security: make inode_follow_link RCU-walk aware NeilBrown
2015-03-23  2:37 ` [PATCH 04/20] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link NeilBrown
2015-03-23  2:37 ` [PATCH 07/20] VFS: remove nameidata args from ->follow_link NeilBrown
2015-03-23  2:37 ` [PATCH 06/20] SECURITY: remove nameidata arg from inode_follow_link NeilBrown
2015-03-23  2:37 ` [PATCH 05/20] VFS: replace nameidata arg to ->put_link with a char* NeilBrown
2015-03-23  2:37 ` [PATCH 09/20] security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags() NeilBrown
2015-03-23  2:37 ` [PATCH 11/20] VFS/namei: use terminate_walk when symlink lookup fails NeilBrown
2015-03-23  2:37 ` [PATCH 08/20] VFS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
2015-03-23  2:37 ` [PATCH 13/20] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
2015-03-23  2:37 ` [PATCH 14/20] VFS/namei: add 'inode' arg to put_link() NeilBrown
2015-04-17 16:25   ` Al Viro
2015-04-17 19:09     ` Al Viro
2015-04-18  8:09       ` Al Viro
2015-03-23  2:37 ` [PATCH 16/20] VFS/namei: enable RCU-walk when following symlinks NeilBrown
2015-03-23  2:37 ` [PATCH 19/20] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
2015-03-23  2:37 ` [PATCH 20/20] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
2015-03-23  2:37 ` [PATCH 18/20] xfs: use RCU to free 'struct xfs_mount' NeilBrown
2015-03-23  2:37 ` [PATCH 17/20] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
2015-03-23  2:37 ` [PATCH 12/20] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
2015-03-23  2:37 ` [PATCH 15/20] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
2015-03-25 23:23 ` [PATCH 00/20] Support follow_link in RCU-walk - V3 NeilBrown
