All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
@ 2015-04-20 18:12 Al Viro
  2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
                   ` (25 more replies)
  0 siblings, 26 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: NeilBrown, linux-kernel, linux-fsdevel

	The patchset below gets rid of recursion in pathname resolution.
As the result, stack footprint of fs/namei.c stuff doesn't depend upon
the symlink nesting anymore and is much less than the worst cases in
mainline - as the matter of fact, it's around the footprint mainline gets
with just one or two levels of nesting.

	We could reduce it further (see below), but I'm not sure it's worth
doing - it's not much extra complexity, but we only squeeze out ~250 bytes
that way, with the worst-case footprints (those are triggered by rename())
are around 1.4Kb (instead of about twice as much in mainline).  OTOH,
that extra complexity would've allowed us to get rid of the nesting limit
entirely (i.e. we get ELOOP when we encounter 40 symlinks, no matter in
which manner they are nested).  That might be worth considering...

	Performance of the result seems to be about on par with the mainline
and so far it has survived everything I'd hit it with.  More testing is
obviously needed.

	Patches 2/24..6/24 are from Neil's RCU follow_link patchset; the
rest of his patchset is, of course, derailed by the massage done here,
but AFAICS we should be able to port it on top of this one with reasonably
little PITA.

	There are several fs-visible interface changes:
1) nameidata is _not_ passed to ->follow_link() and ->put_link().  We use
the same kind of approach as with pt_regs.
2) instead of "store the link body (with nd_set_link()), return an opaque
pointer to be given to ->put_link() to undo whatever we'd done to pin the
link body" ->follow_link() now does the opposite - it *returns* the symlink
body and stores whatever opaque pointer ->put_link() will need.  Stores
it with nd_pin_link(cookie).  Rules:
	* ERR_PTR() returned => fail lookup with that error.
	* NULL returned => we'd done the move ourselves, no body to traverse.
	* pointer to string => symlink body for traversal.  ->put_link()
will be called if non-NULL pointer had been stored by nd_pin_link().
3) ->put_link() does *not* have access to symlink body anymore.  The cases
that used to rely on seeing it (kfree_put_link(), mostly) are trivially
switched to new rules - just do nd_pin_link(body); return body; and we
are fine.

	I've converted all in-tree instances of ->follow_link()/->put_link()
(see 23/24) and it's actually less headache than with the mainline calling
conventions.

	The main reason the series is that long is that I'm trying to keep
all steps small - it *is* a serious rewrite and I want to do it in easily
verified steps.

	Overall it's fairly straightforward - instead of getting a new
stack frame for each level of nesting, we add an explicit array just for
the stuff that needs to be preserved across the recursive call - there's
not much of it (how far did we get in pathname + stuff needed for ->put_link(),
i.e. cookie and struct path of symlink itself).  That array is *not*
embedded into nameidata - only a pointer to it.  That allows to reduce the
size of struct nameidata, which is especially sensitive in rename(), where
we have two of those.  Old nd->saved_names[] is gone - not needed anymore.
	Array is held in caller (or caller of caller) stack frame.  We could
go for "just keep a couple of elements there, if we need more we'll allocate
from slab", but that doesn't buy us a lot unless we remove the limit on
nesting at the same time.  I hadn't done that in this series; we can always
do that later, it would be a fairly isolated modification.

	For RCU follow_link porting we'll probably need to replace struct
path of symlink with union of struct path and (dentry,inode) pair.  Should
be doable without blowing the stack footprint.

	The scariest remaining thing about fs/namei.c stack footprint is that
we get ->d_automount() called at pretty much maximal depth; it's about half of
what it used to be in mainline (1.3Kb instead of 2.7Kb), but there's a lot
of work done by that thing (and by finish_automount()).  Something like
NFS referral point encountered by rename(2) while looking for parent
directories can get really nasty...

	Folks, please review; the patches (on top of vfs.git#for-next) go in followups, the entire branch is in vfs.git#link_path_walk.

Shortlog:
Al Viro (19):
      lustre: rip the private symlink nesting limit out
      namei: expand nested_symlink() in its only caller
      namei.c: separate the parts of follow_link() that find the link body
      namei: fold follow_link() into link_path_walk()
      link_path_walk: handle get_link() returning ERR_PTR() immediately
      link_path_walk: don't bother with walk_component() after jumping link
      link_path_walk: turn inner loop into explicit goto
      link_path_walk: massage a bit more
      link_path_walk: get rid of duplication
      link_path_walk: final preparations to killing recursion
      link_path_walk: kill the recursion
      link_path_walk: split "return from recursive call" path
      link_path_walk: cleanup - turn goto start; into continue;
      namei: fold may_follow_link() into follow_link()
      namei: introduce nameidata->stack
      namei: regularize use of put_link() and follow_link(), trim arguments
      namei: trim the arguments of get_link()
      new ->follow_link() and ->put_link() calling conventions
      uninline walk_component()

NeilBrown (5):
      VFS: replace {, total_}link_count in task_struct with pointer to nameidata
      ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link
      VFS: replace nameidata arg to ->put_link with a char*.
      SECURITY: remove nameidata arg from inode_follow_link.
      VFS: remove nameidata args from ->follow_link

Diffstat:
 Documentation/filesystems/Locking             |   4 +-
 Documentation/filesystems/porting             |  10 +
 Documentation/filesystems/vfs.txt             |   4 +-
 drivers/staging/lustre/lustre/llite/symlink.c |  26 +-
 fs/9p/vfs_inode.c                             |  15 +-
 fs/9p/vfs_inode_dotl.c                        |   9 +-
 fs/autofs4/symlink.c                          |   5 +-
 fs/befs/linuxvfs.c                            |  46 ++--
 fs/ceph/inode.c                               |   6 +-
 fs/cifs/cifsfs.h                              |   2 +-
 fs/cifs/link.c                                |  29 +-
 fs/configfs/symlink.c                         |  28 +-
 fs/debugfs/file.c                             |   5 +-
 fs/ecryptfs/inode.c                           |  11 +-
 fs/exofs/symlink.c                            |   7 +-
 fs/ext2/symlink.c                             |   6 +-
 fs/ext3/symlink.c                             |   6 +-
 fs/ext4/symlink.c                             |   6 +-
 fs/freevxfs/vxfs_immed.c                      |  11 +-
 fs/fuse/dir.c                                 |  19 +-
 fs/gfs2/inode.c                               |  11 +-
 fs/hostfs/hostfs_kern.c                       |  16 +-
 fs/hppfs/hppfs.c                              |   9 +-
 fs/jffs2/symlink.c                            |   9 +-
 fs/jfs/symlink.c                              |   6 +-
 fs/kernfs/symlink.c                           |  23 +-
 fs/libfs.c                                    |   7 +-
 fs/namei.c                                    | 378 +++++++++++++++-----------
 fs/nfs/symlink.c                              |  18 +-
 fs/ntfs/namei.c                               |   1 -
 fs/overlayfs/inode.c                          |  36 +--
 fs/proc/base.c                                |   4 +-
 fs/proc/inode.c                               |   8 +-
 fs/proc/namespaces.c                          |   4 +-
 fs/proc/self.c                                |  24 +-
 fs/proc/thread_self.c                         |  22 +-
 fs/sysv/symlink.c                             |   5 +-
 fs/ubifs/file.c                               |   7 +-
 fs/ufs/symlink.c                              |   6 +-
 fs/xfs/xfs_iops.c                             |  12 +-
 include/linux/fs.h                            |  11 +-
 include/linux/namei.h                         |   7 +-
 include/linux/sched.h                         |   3 +-
 include/linux/security.h                      |   9 +-
 mm/shmem.c                                    |  28 +-
 security/capability.c                         |   3 +-
 security/security.c                           |   4 +-
 security/selinux/hooks.c                      |   2 +-
 48 files changed, 457 insertions(+), 471 deletions(-)

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 01/24] lustre: rip the private symlink nesting limit out
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 19:08   ` Andreas Dilger
  2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/staging/lustre/lustre/llite/symlink.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 3711e67..0615f86 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -126,18 +126,9 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 	char *symname = NULL;
 
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
-	/* Limit the recursive symlink depth to 5 instead of default
-	 * 8 links when kernel has 4k stack to prevent stack overflow.
-	 * For 8k stacks we need to limit it to 7 for local servers. */
-	if (THREAD_SIZE < 8192 && current->link_count >= 6) {
-		rc = -ELOOP;
-	} else if (THREAD_SIZE == 8192 && current->link_count >= 8) {
-		rc = -ELOOP;
-	} else {
-		ll_inode_size_lock(inode);
-		rc = ll_readlink_internal(inode, &request, &symname);
-		ll_inode_size_unlock(inode);
-	}
+	ll_inode_size_lock(inode);
+	rc = ll_readlink_internal(inode, &request, &symname);
+	ll_inode_size_unlock(inode);
 	if (rc) {
 		ptlrpc_req_finished(request);
 		request = NULL;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
  2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: NeilBrown <neilb@suse.de>

task_struct currently contains two ad-hoc members for use by
the VFS: link_count and total_link_count.
These are only interesting to fs/namei.c, so exposing them
explicitly is poor layering.

This patches replaces those with a single pointer to 'struct
nameidata'.
This structure represents the current filename lookup of which
there can only be one per process, and is a natural place to
store link_count and total_link_count.

This will allow the current "nameidata" argument to all
follow_link operations to be removed as current->nameidata
can be used instead.

As there are occasional circumstances where pathname lookup can
recurse, such as through kern_path_locked, we always save and old
current->nameidata (if there is one) when setting a new value, and
make sure any active link_counts are preserved.

follow_mount and follow_automount now get a 'struct nameidata *'
rather than 'int flags' so that they can directly access
link_count and total_link_count, rather than going through 'current'.

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c            | 79 +++++++++++++++++++++++++++++++++++----------------
 include/linux/sched.h |  2 +-
 2 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ffab2e0..bdd5067 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -501,10 +501,29 @@ struct nameidata {
 	unsigned	seq, m_seq;
 	int		last_type;
 	unsigned	depth;
+	int		link_count,
+			total_link_count;
 	struct file	*base;
 	char *saved_names[MAX_NESTED_LINKS + 1];
 };
 
+static struct nameidata *set_nameidata(struct nameidata *p)
+{
+	struct nameidata *old = current->nameidata;
+
+	current->nameidata = p;
+	if (p) {
+		if (!old) {
+			p->link_count = 0;
+			p->total_link_count = 0;
+		} else {
+			p->link_count = old->link_count;
+			p->total_link_count = old->total_link_count;
+		}
+	}
+	return old;
+}
+
 /*
  * Path walking has 2 modes, rcu-walk and ref-walk (see
  * Documentation/filesystems/path-lookup.txt).  In situations when we can't
@@ -862,11 +881,11 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 		mntget(link->mnt);
 
 	error = -ELOOP;
-	if (unlikely(current->total_link_count >= 40))
+	if (unlikely(nd->total_link_count >= 40))
 		goto out_put_nd_path;
 
 	cond_resched();
-	current->total_link_count++;
+	nd->total_link_count++;
 
 	touch_atime(link);
 	nd_set_link(nd, NULL);
@@ -965,7 +984,7 @@ EXPORT_SYMBOL(follow_up);
  * - return -EISDIR to tell follow_managed() to stop and return the path we
  *   were called with.
  */
-static int follow_automount(struct path *path, unsigned flags,
+static int follow_automount(struct path *path, struct nameidata *nd,
 			    bool *need_mntput)
 {
 	struct vfsmount *mnt;
@@ -985,13 +1004,13 @@ static int follow_automount(struct path *path, unsigned flags,
 	 * as being automount points.  These will need the attentions
 	 * of the daemon to instantiate them before they can be used.
 	 */
-	if (!(flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-		     LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+	if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
+			   LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
 	    path->dentry->d_inode)
 		return -EISDIR;
 
-	current->total_link_count++;
-	if (current->total_link_count >= 40)
+	nd->total_link_count++;
+	if (nd->total_link_count >= 40)
 		return -ELOOP;
 
 	mnt = path->dentry->d_op->d_automount(path);
@@ -1005,7 +1024,7 @@ static int follow_automount(struct path *path, unsigned flags,
 		 * the path being looked up; if it wasn't then the remainder of
 		 * the path is inaccessible and we should say so.
 		 */
-		if (PTR_ERR(mnt) == -EISDIR && (flags & LOOKUP_PARENT))
+		if (PTR_ERR(mnt) == -EISDIR && (nd->flags & LOOKUP_PARENT))
 			return -EREMOTE;
 		return PTR_ERR(mnt);
 	}
@@ -1045,7 +1064,7 @@ static int follow_automount(struct path *path, unsigned flags,
  *
  * Serialization is taken care of in namespace.c
  */
-static int follow_managed(struct path *path, unsigned flags)
+static int follow_managed(struct path *path, struct nameidata *nd)
 {
 	struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */
 	unsigned managed;
@@ -1089,7 +1108,7 @@ static int follow_managed(struct path *path, unsigned flags)
 
 		/* Handle an automount point */
 		if (managed & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, flags, &need_mntput);
+			ret = follow_automount(path, nd, &need_mntput);
 			if (ret < 0)
 				break;
 			continue;
@@ -1474,7 +1493,7 @@ unlazy:
 
 	path->mnt = mnt;
 	path->dentry = dentry;
-	err = follow_managed(path, nd->flags);
+	err = follow_managed(path, nd);
 	if (unlikely(err < 0)) {
 		path_put_conditional(path, nd);
 		return err;
@@ -1504,7 +1523,7 @@ static int lookup_slow(struct nameidata *nd, struct path *path)
 		return PTR_ERR(dentry);
 	path->mnt = nd->path.mnt;
 	path->dentry = dentry;
-	err = follow_managed(path, nd->flags);
+	err = follow_managed(path, nd);
 	if (unlikely(err < 0)) {
 		path_put_conditional(path, nd);
 		return err;
@@ -1620,7 +1639,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 {
 	int res;
 
-	if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
+	if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
 		path_put_conditional(path, nd);
 		path_put(&nd->path);
 		return -ELOOP;
@@ -1628,7 +1647,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
 
 	nd->depth++;
-	current->link_count++;
+	nd->link_count++;
 
 	do {
 		struct path link = *path;
@@ -1641,7 +1660,7 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 		put_link(nd, &link, cookie);
 	} while (res > 0);
 
-	current->link_count--;
+	nd->link_count--;
 	nd->depth--;
 	return res;
 }
@@ -1948,7 +1967,7 @@ static int path_init(int dfd, const struct filename *name, unsigned int flags,
 	rcu_read_unlock();
 	return -ECHILD;
 done:
-	current->total_link_count = 0;
+	nd->total_link_count = 0;
 	return link_path_walk(s, nd);
 }
 
@@ -2027,7 +2046,10 @@ static int path_lookupat(int dfd, const struct filename *name,
 static int filename_lookup(int dfd, struct filename *name,
 				unsigned int flags, struct nameidata *nd)
 {
-	int retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
+	int retval;
+	struct nameidata *saved_nd = set_nameidata(nd);
+
+	retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
 	if (unlikely(retval == -ECHILD))
 		retval = path_lookupat(dfd, name, flags, nd);
 	if (unlikely(retval == -ESTALE))
@@ -2035,6 +2057,7 @@ static int filename_lookup(int dfd, struct filename *name,
 
 	if (likely(!retval))
 		audit_inode(name, nd->path.dentry, flags & LOOKUP_PARENT);
+	set_nameidata(saved_nd);
 	return retval;
 }
 
@@ -2341,7 +2364,7 @@ static int
 path_mountpoint(int dfd, const struct filename *name, struct path *path,
 		unsigned int flags)
 {
-	struct nameidata nd;
+	struct nameidata nd, *saved = set_nameidata(&nd);
 	int err;
 
 	err = path_init(dfd, name, flags, &nd);
@@ -2364,6 +2387,7 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
 	}
 out:
 	path_cleanup(&nd);
+	set_nameidata(saved);
 	return err;
 }
 
@@ -3026,7 +3050,7 @@ retry_lookup:
 	if ((open_flag & (O_EXCL | O_CREAT)) == (O_EXCL | O_CREAT))
 		goto exit_dput;
 
-	error = follow_managed(path, nd->flags);
+	error = follow_managed(path, nd);
 	if (error < 0)
 		goto exit_dput;
 
@@ -3215,12 +3239,14 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 	struct path path;
 	int opened = 0;
 	int error;
+	struct nameidata *saved_nd;
 
 	file = get_empty_filp();
 	if (IS_ERR(file))
 		return file;
 
 	file->f_flags = op->open_flag;
+	saved_nd = set_nameidata(nd);
 
 	if (unlikely(file->f_flags & __O_TMPFILE)) {
 		error = do_tmpfile(dfd, pathname, nd, flags, op, file, &opened);
@@ -3267,6 +3293,7 @@ out:
 		}
 		file = ERR_PTR(error);
 	}
+	set_nameidata(saved_nd);
 	return file;
 }
 
@@ -4427,18 +4454,20 @@ EXPORT_SYMBOL(readlink_copy);
  */
 int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
-	struct nameidata nd;
+	struct nameidata nd, *saved = set_nameidata(&nd);
 	void *cookie;
 	int res;
 
 	nd.depth = 0;
 	cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
 	if (IS_ERR(cookie))
-		return PTR_ERR(cookie);
-
-	res = readlink_copy(buffer, buflen, nd_get_link(&nd));
-	if (dentry->d_inode->i_op->put_link)
-		dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+		res = PTR_ERR(cookie);
+	else {
+		res = readlink_copy(buffer, buflen, nd_get_link(&nd));
+		if (dentry->d_inode->i_op->put_link)
+			dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+	}
+	set_nameidata(saved);
 	return res;
 }
 EXPORT_SYMBOL(generic_readlink);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a419b65..4402c91 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1447,7 +1447,7 @@ struct task_struct {
 				       it with task_lock())
 				     - initialized normally by setup_new_exec */
 /* file system info */
-	int link_count, total_link_count;
+	struct nameidata *nameidata;
 #ifdef CONFIG_SYSVIPC
 /* ipc stuff */
 	struct sysv_sem sysvsem;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
  2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
  2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: NeilBrown <neilb@suse.de>

ovl_follow_link current calls ->put_link on an error path.
However ->put_link is about to change in a way that it will be
impossible to call it from ovl_follow_link.

So rearrange the code to avoid the need for that error path.
Specifically: move the kmalloc() call before the ->follow_link()
call to the subordinate filesystem.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/overlayfs/inode.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 04f1248..1b4b9c5e 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -145,6 +145,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	void *ret;
 	struct dentry *realdentry;
 	struct inode *realinode;
+	struct ovl_link_data *data = NULL;
 
 	realdentry = ovl_dentry_real(dentry);
 	realinode = realdentry->d_inode;
@@ -152,25 +153,23 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (WARN_ON(!realinode->i_op->follow_link))
 		return ERR_PTR(-EPERM);
 
-	ret = realinode->i_op->follow_link(realdentry, nd);
-	if (IS_ERR(ret))
-		return ret;
-
 	if (realinode->i_op->put_link) {
-		struct ovl_link_data *data;
-
 		data = kmalloc(sizeof(struct ovl_link_data), GFP_KERNEL);
-		if (!data) {
-			realinode->i_op->put_link(realdentry, nd, ret);
+		if (!data)
 			return ERR_PTR(-ENOMEM);
-		}
 		data->realdentry = realdentry;
-		data->cookie = ret;
+	}
 
-		return data;
-	} else {
-		return NULL;
+	ret = realinode->i_op->follow_link(realdentry, nd);
+	if (IS_ERR(ret)) {
+		kfree(data);
+		return ret;
 	}
+
+	if (data)
+		data->cookie = ret;
+
+	return data;
 }
 
 static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char*.
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (2 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: NeilBrown <neilb@suse.de>

The only thing any ->put_link() function did with the
nameidata was to call nd_get_link() to get the link name.

So now just pass the link name directly.

This allows us to make nd_get_link() completely local to
namei.c.

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/Locking             |  2 +-
 Documentation/filesystems/porting             |  5 +++++
 Documentation/filesystems/vfs.txt             |  4 ++--
 drivers/staging/lustre/lustre/llite/symlink.c |  2 +-
 fs/configfs/symlink.c                         |  2 +-
 fs/fuse/dir.c                                 |  4 ++--
 fs/hostfs/hostfs_kern.c                       |  3 +--
 fs/hppfs/hppfs.c                              |  5 ++---
 fs/kernfs/symlink.c                           |  3 +--
 fs/libfs.c                                    |  4 +---
 fs/namei.c                                    | 13 +++++++------
 fs/overlayfs/inode.c                          |  4 ++--
 fs/proc/inode.c                               |  2 +-
 include/linux/fs.h                            |  6 +++---
 include/linux/namei.h                         |  1 -
 mm/shmem.c                                    |  4 ++--
 16 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 7cdbca4..3fc99e0 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -51,7 +51,7 @@ prototypes:
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
 	void * (*follow_link) (struct dentry *, struct nameidata *);
-	void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void (*put_link) (struct dentry *, char *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int, unsigned int);
 	int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index e69274d..c1ee367 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -483,3 +483,8 @@ in your dentry operations instead.
 --
 [mandatory]
 	->aio_read/->aio_write are gone.  Use ->read_iter/->write_iter.
+--
+[mandatory]
+	->put_link now takes a 'char *' rather than a 'struct nameidata*'.
+	Instead of calling nd_get_link() on the later, just use the former
+	directly.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5d833b3..7bba952 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,8 +350,8 @@ struct inode_operations {
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-        void * (*follow_link) (struct dentry *, struct nameidata *);
-        void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void * (*follow_link) (struct dentry *, struct nameidata *);
+	void (*put_link) (struct dentry *, char *, void *);
 	int (*permission) (struct inode *, int);
 	int (*get_acl)(struct inode *, int);
 	int (*setattr) (struct dentry *, struct iattr *);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 0615f86..a0bebc3 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -142,7 +142,7 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return request;
 }
 
-static void ll_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void ll_put_link(struct dentry *dentry, char *link, void *cookie)
 {
 	ptlrpc_req_finished(cookie);
 }
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index cc9f254..e860ddb 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -296,7 +296,7 @@ static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void configfs_put_link(struct dentry *dentry, struct nameidata *nd,
+static void configfs_put_link(struct dentry *dentry, char *link,
 			      void *cookie)
 {
 	if (cookie) {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0572bca..1453c6f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1406,9 +1406,9 @@ static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void fuse_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void fuse_put_link(struct dentry *dentry, char *link, void *c)
 {
-	free_link(nd_get_link(nd));
+	free_link(link);
 }
 
 static int fuse_dir_open(struct inode *inode, struct file *file)
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index b827aa4..5f40dfa 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -904,9 +904,8 @@ static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void hostfs_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void hostfs_put_link(struct dentry *dentry, char *s, void *cookie)
 {
-	char *s = nd_get_link(nd);
 	if (!IS_ERR(s))
 		__putname(s);
 }
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index fa2bd53..237907e 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -649,13 +649,12 @@ static void *hppfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, nd);
 }
 
-static void hppfs_put_link(struct dentry *dentry, struct nameidata *nd,
-			   void *cookie)
+static void hppfs_put_link(struct dentry *dentry, char *link, void *cookie)
 {
 	struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;
 
 	if (d_inode(proc_dentry)->i_op->put_link)
-		d_inode(proc_dentry)->i_op->put_link(proc_dentry, nd, cookie);
+		d_inode(proc_dentry)->i_op->put_link(proc_dentry, link, cookie);
 }
 
 static const struct inode_operations hppfs_dir_iops = {
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 8a19889..2aa55cb 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -125,10 +125,9 @@ static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return NULL;
 }
 
-static void kernfs_iop_put_link(struct dentry *dentry, struct nameidata *nd,
+static void kernfs_iop_put_link(struct dentry *dentry, char *page,
 				void *cookie)
 {
-	char *page = nd_get_link(nd);
 	if (!IS_ERR(page))
 		free_page((unsigned long)page);
 }
diff --git a/fs/libfs.c b/fs/libfs.c
index cb1fb4b..fe6041a 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1024,10 +1024,8 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 }
 EXPORT_SYMBOL(noop_fsync);
 
-void kfree_put_link(struct dentry *dentry, struct nameidata *nd,
-				void *cookie)
+void kfree_put_link(struct dentry *dentry, char *s, void *cookie)
 {
-	char *s = nd_get_link(nd);
 	if (!IS_ERR(s))
 		kfree(s);
 }
diff --git a/fs/namei.c b/fs/namei.c
index bdd5067..3b52954 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -738,17 +738,16 @@ void nd_set_link(struct nameidata *nd, char *path)
 }
 EXPORT_SYMBOL(nd_set_link);
 
-char *nd_get_link(struct nameidata *nd)
+static inline char *nd_get_link(struct nameidata *nd)
 {
 	return nd->saved_names[nd->depth];
 }
-EXPORT_SYMBOL(nd_get_link);
 
 static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
 {
 	struct inode *inode = link->dentry->d_inode;
 	if (inode->i_op->put_link)
-		inode->i_op->put_link(link->dentry, nd, cookie);
+		inode->i_op->put_link(link->dentry, nd_get_link(nd), cookie);
 	path_put(link);
 }
 
@@ -4463,9 +4462,11 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	if (IS_ERR(cookie))
 		res = PTR_ERR(cookie);
 	else {
-		res = readlink_copy(buffer, buflen, nd_get_link(&nd));
+		char *link = nd_get_link(&nd);
+
+		res = readlink_copy(buffer, buflen, link);
 		if (dentry->d_inode->i_op->put_link)
-			dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+			dentry->d_inode->i_op->put_link(dentry, link, cookie);
 	}
 	set_nameidata(saved);
 	return res;
@@ -4507,7 +4508,7 @@ void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 }
 EXPORT_SYMBOL(page_follow_link_light);
 
-void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+void page_put_link(struct dentry *dentry, char *link, void *cookie)
 {
 	struct page *page = cookie;
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1b4b9c5e..f1abb51 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -172,7 +172,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return data;
 }
 
-static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
+static void ovl_put_link(struct dentry *dentry, char *link, void *c)
 {
 	struct inode *realinode;
 	struct ovl_link_data *data = c;
@@ -181,7 +181,7 @@ static void ovl_put_link(struct dentry *dentry, struct nameidata *nd, void *c)
 		return;
 
 	realinode = data->realdentry->d_inode;
-	realinode->i_op->put_link(data->realdentry, nd, data->cookie);
+	realinode->i_op->put_link(data->realdentry, link, data->cookie);
 	kfree(data);
 }
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 8272aab..27fd86d 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -403,7 +403,7 @@ static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return pde;
 }
 
-static void proc_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
+static void proc_put_link(struct dentry *dentry, char *link, void *p)
 {
 	unuse_pde(p);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4356560..d407d59 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1604,7 +1604,7 @@ struct inode_operations {
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
 	int (*readlink) (struct dentry *, char __user *,int);
-	void (*put_link) (struct dentry *, struct nameidata *, void *);
+	void (*put_link) (struct dentry *, char *, void *);
 
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
 	int (*link) (struct dentry *,struct inode *,struct dentry *);
@@ -2696,12 +2696,12 @@ extern const struct file_operations generic_ro_fops;
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
 extern void *page_follow_link_light(struct dentry *, struct nameidata *);
-extern void page_put_link(struct dentry *, struct nameidata *, void *);
+extern void page_put_link(struct dentry *, char *, void *);
 extern int __page_symlink(struct inode *inode, const char *symname, int len,
 		int nofs);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
-extern void kfree_put_link(struct dentry *, struct nameidata *, void *);
+extern void kfree_put_link(struct dentry *, char *, void *);
 extern int generic_readlink(struct dentry *, char __user *, int);
 extern void generic_fillattr(struct inode *, struct kstat *);
 int vfs_getattr_nosec(struct path *path, struct kstat *stat);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index c899077..2ec27f2 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -72,7 +72,6 @@ extern void unlock_rename(struct dentry *, struct dentry *);
 
 extern void nd_jump_link(struct nameidata *nd, struct path *path);
 extern void nd_set_link(struct nameidata *nd, char *path);
-extern char *nd_get_link(struct nameidata *nd);
 
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index de98137..00707f8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2490,9 +2490,9 @@ static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
 	return page;
 }
 
-static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
+static void shmem_put_link(struct dentry *dentry, char *link, void *cookie)
 {
-	if (!IS_ERR(nd_get_link(nd))) {
+	if (!IS_ERR(link)) {
 		struct page *page = cookie;
 		kunmap(page);
 		mark_page_accessed(page);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link.
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (3 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: NeilBrown <neilb@suse.de>

No ->inode_follow_link() methods use the nameidata arg, and
it is about to become private to namei.c.
So remove from all inode_follow_link() functions.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c               | 2 +-
 include/linux/security.h | 9 +++------
 security/capability.c    | 3 +--
 security/security.c      | 4 ++--
 security/selinux/hooks.c | 2 +-
 5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3b52954..981080c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -889,7 +889,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	touch_atime(link);
 	nd_set_link(nd, NULL);
 
-	error = security_inode_follow_link(link->dentry, nd);
+	error = security_inode_follow_link(link->dentry);
 	if (error)
 		goto out_put_nd_path;
 
diff --git a/include/linux/security.h b/include/linux/security.h
index 18264ea..62a6620 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -43,7 +43,6 @@ struct file;
 struct vfsmount;
 struct path;
 struct qstr;
-struct nameidata;
 struct iattr;
 struct fown_struct;
 struct file_operations;
@@ -477,7 +476,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  * @inode_follow_link:
  *	Check permission to follow a symbolic link when looking up a pathname.
  *	@dentry contains the dentry structure for the link.
- *	@nd contains the nameidata structure for the parent directory.
  *	Return 0 if permission is granted.
  * @inode_permission:
  *	Check permission before accessing an inode.  This hook is called by the
@@ -1553,7 +1551,7 @@ struct security_operations {
 	int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry,
 			     struct inode *new_dir, struct dentry *new_dentry);
 	int (*inode_readlink) (struct dentry *dentry);
-	int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd);
+	int (*inode_follow_link) (struct dentry *dentry);
 	int (*inode_permission) (struct inode *inode, int mask);
 	int (*inode_setattr)	(struct dentry *dentry, struct iattr *attr);
 	int (*inode_getattr) (const struct path *path);
@@ -1839,7 +1837,7 @@ int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry,
 			  struct inode *new_dir, struct dentry *new_dentry,
 			  unsigned int flags);
 int security_inode_readlink(struct dentry *dentry);
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd);
+int security_inode_follow_link(struct dentry *dentry);
 int security_inode_permission(struct inode *inode, int mask);
 int security_inode_setattr(struct dentry *dentry, struct iattr *attr);
 int security_inode_getattr(const struct path *path);
@@ -2241,8 +2239,7 @@ static inline int security_inode_readlink(struct dentry *dentry)
 	return 0;
 }
 
-static inline int security_inode_follow_link(struct dentry *dentry,
-					      struct nameidata *nd)
+static inline int security_inode_follow_link(struct dentry *dentry)
 {
 	return 0;
 }
diff --git a/security/capability.c b/security/capability.c
index 0d03fcc..fae99db 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -209,8 +209,7 @@ static int cap_inode_readlink(struct dentry *dentry)
 	return 0;
 }
 
-static int cap_inode_follow_link(struct dentry *dentry,
-				 struct nameidata *nameidata)
+static int cap_inode_follow_link(struct dentry *dentry)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index 8e9b1f4..d7c30b0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -581,11 +581,11 @@ int security_inode_readlink(struct dentry *dentry)
 	return security_ops->inode_readlink(dentry);
 }
 
-int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd)
+int security_inode_follow_link(struct dentry *dentry)
 {
 	if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
 		return 0;
-	return security_ops->inode_follow_link(dentry, nd);
+	return security_ops->inode_follow_link(dentry);
 }
 
 int security_inode_permission(struct inode *inode, int mask)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 7dade28..8016229 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2861,7 +2861,7 @@ static int selinux_inode_readlink(struct dentry *dentry)
 	return dentry_has_perm(cred, dentry, FILE__READ);
 }
 
-static int selinux_inode_follow_link(struct dentry *dentry, struct nameidata *nameidata)
+static int selinux_inode_follow_link(struct dentry *dentry)
 {
 	const struct cred *cred = current_cred();
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 06/24] VFS: remove nameidata args from ->follow_link
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (4 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: NeilBrown <neilb@suse.de>

Now that current->nameidata is available, nd_set_link() can
use that directly, so 'nd' doesn't need to be passed through
->follow_link.

As a result of this change, 'nameidata' is almost entirely
local to namei.c.  It is only exposed externally as an opaque struct
pointed to by current->nameidata.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/Locking             |  2 +-
 Documentation/filesystems/porting             |  5 +++++
 Documentation/filesystems/vfs.txt             |  2 +-
 drivers/staging/lustre/lustre/llite/symlink.c |  4 ++--
 fs/9p/vfs_inode.c                             |  5 ++---
 fs/9p/vfs_inode_dotl.c                        |  5 ++---
 fs/autofs4/symlink.c                          |  4 ++--
 fs/befs/linuxvfs.c                            | 12 ++++++------
 fs/ceph/inode.c                               |  4 ++--
 fs/cifs/cifsfs.h                              |  2 +-
 fs/cifs/link.c                                |  4 ++--
 fs/configfs/symlink.c                         |  6 +++---
 fs/debugfs/file.c                             |  4 ++--
 fs/ecryptfs/inode.c                           |  6 ++----
 fs/exofs/symlink.c                            |  4 ++--
 fs/ext2/symlink.c                             |  4 ++--
 fs/ext3/symlink.c                             |  4 ++--
 fs/ext4/symlink.c                             |  4 ++--
 fs/freevxfs/vxfs_immed.c                      |  7 +++----
 fs/fuse/dir.c                                 |  4 ++--
 fs/gfs2/inode.c                               |  7 +++----
 fs/hostfs/hostfs_kern.c                       |  4 ++--
 fs/hppfs/hppfs.c                              |  4 ++--
 fs/jffs2/symlink.c                            |  6 +++---
 fs/jfs/symlink.c                              |  4 ++--
 fs/kernfs/symlink.c                           |  4 ++--
 fs/namei.c                                    | 18 +++++++++++-------
 fs/nfs/symlink.c                              |  6 +++---
 fs/ntfs/namei.c                               |  1 -
 fs/overlayfs/inode.c                          |  4 ++--
 fs/proc/base.c                                |  4 ++--
 fs/proc/inode.c                               |  4 ++--
 fs/proc/namespaces.c                          |  4 ++--
 fs/proc/self.c                                |  4 ++--
 fs/proc/thread_self.c                         |  4 ++--
 fs/sysv/symlink.c                             |  4 ++--
 fs/ubifs/file.c                               |  4 ++--
 fs/ufs/symlink.c                              |  4 ++--
 fs/xfs/xfs_iops.c                             |  7 +++----
 include/linux/fs.h                            |  5 ++---
 include/linux/namei.h                         |  5 ++---
 include/linux/sched.h                         |  1 +
 mm/shmem.c                                    |  8 ++++----
 43 files changed, 104 insertions(+), 104 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 3fc99e0..e4a0c36 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,7 +50,7 @@ prototypes:
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
+	void * (*follow_link) (struct dentry *);
 	void (*put_link) (struct dentry *, char *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int, unsigned int);
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index c1ee367..08f8a27 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -488,3 +488,8 @@ in your dentry operations instead.
 	->put_link now takes a 'char *' rather than a 'struct nameidata*'.
 	Instead of calling nd_get_link() on the later, just use the former
 	directly.
+--
+[mandatory]
+	->follow_link() no longer receives 'struct nameidata *'.
+	The nd is now attached to 'current' and nd_set_link()
+	accesses it directly.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 7bba952..807bd4b 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,7 +350,7 @@ struct inode_operations {
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
+        void * (*follow_link) (struct dentry *);
 	void (*put_link) (struct dentry *, char *, void *);
 	int (*permission) (struct inode *, int);
 	int (*get_acl)(struct inode *, int);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index a0bebc3..d71c2a9 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,7 +118,7 @@ failed:
 	return rc;
 }
 
-static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ll_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	struct ptlrpc_request *request = NULL;
@@ -135,7 +135,7 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 		symname = ERR_PTR(rc);
 	}
 
-	nd_set_link(nd, symname);
+	nd_set_link(symname);
 	/* symname may contain a pointer to the request message buffer,
 	 * we delay request releasing until ll_put_link then.
 	 */
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 0ba1171..2895295 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1226,11 +1226,10 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
 /**
  * v9fs_vfs_follow_link - follow a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
  *
  */
 
-static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *v9fs_vfs_follow_link(struct dentry *dentry)
 {
 	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
 	struct p9_fid *fid = v9fs_fid_lookup(dentry);
@@ -1256,7 +1255,7 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (strlen(st->extension) >= PATH_MAX)
 		st->extension[PATH_MAX - 1] = '\0';
 
-	nd_set_link(nd, st->extension);
+	nd_set_link(st->extension);
 	st->extension = NULL;
 	p9stat_free(st);
 	kfree(st);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index bc2a91f..577af9b 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -905,12 +905,11 @@ error:
 /**
  * v9fs_vfs_follow_link_dotl - follow a symlink path
  * @dentry: dentry for symlink
- * @nd: nameidata
  *
  */
 
 static void *
-v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
+v9fs_vfs_follow_link_dotl(struct dentry *dentry)
 {
 	struct p9_fid *fid = v9fs_fid_lookup(dentry);
 	char *target;
@@ -923,7 +922,7 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
 	retval = p9_client_readlink(fid, &target);
 	if (retval)
 		return ERR_PTR(retval);
-	nd_set_link(nd, target);
+	nd_set_link(target);
 	return NULL;
 }
 
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index de58cc7..d6dd4c5 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,13 +12,13 @@
 
 #include "autofs_i.h"
 
-static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *autofs4_follow_link(struct dentry *dentry)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	if (ino && !autofs4_oz_mode(sbi))
 		ino->last_used = jiffies;
-	nd_set_link(nd, d_inode(dentry)->i_private);
+	nd_set_link(d_inode(dentry)->i_private);
 	return NULL;
 }
 
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index d3cb877..0714644 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,8 +42,8 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
 static struct inode *befs_alloc_inode(struct super_block *sb);
 static void befs_destroy_inode(struct inode *inode);
 static void befs_destroy_inodecache(void);
-static void *befs_follow_link(struct dentry *, struct nameidata *);
-static void *befs_fast_follow_link(struct dentry *, struct nameidata *);
+static void *befs_follow_link(struct dentry *);
+static void *befs_fast_follow_link(struct dentry *);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
 			char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -469,7 +469,7 @@ befs_destroy_inodecache(void)
  * flag is set.
  */
 static void *
-befs_follow_link(struct dentry *dentry, struct nameidata *nd)
+befs_follow_link(struct dentry *dentry)
 {
 	struct super_block *sb = dentry->d_sb;
 	befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
@@ -494,16 +494,16 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
 			link[len - 1] = '\0';
 		}
 	}
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
 
 static void *
-befs_fast_follow_link(struct dentry *dentry, struct nameidata *nd)
+befs_fast_follow_link(struct dentry *dentry)
 {
 	befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
-	nd_set_link(nd, befs_ino->i_data.symlink);
+	nd_set_link(befs_ino->i_data.symlink);
 	return NULL;
 }
 
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index e876e19..7dd8f1c 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1691,10 +1691,10 @@ retry:
 /*
  * symlinks
  */
-static void *ceph_sym_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ceph_sym_follow_link(struct dentry *dentry)
 {
 	struct ceph_inode_info *ci = ceph_inode(d_inode(dentry));
-	nd_set_link(nd, ci->i_symlink);
+	nd_set_link(ci->i_symlink);
 	return NULL;
 }
 
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 252f5c1..e3a6ef5 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
 #endif
 
 /* Functions related to symlinks */
-extern void *cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
+extern void *cifs_follow_link(struct dentry *direntry);
 extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
 			 int buflen);
 extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 252e672..470666c 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -627,7 +627,7 @@ cifs_hl_exit:
 }
 
 void *
-cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
+cifs_follow_link(struct dentry *direntry)
 {
 	struct inode *inode = d_inode(direntry);
 	int rc = -ENOMEM;
@@ -679,7 +679,7 @@ out:
 	free_xid(xid);
 	if (tlink)
 		cifs_put_tlink(tlink);
-	nd_set_link(nd, target_path);
+	nd_set_link(target_path);
 	return NULL;
 }
 
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index e860ddb..ff41712 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,7 +279,7 @@ static int configfs_getlink(struct dentry *dentry, char * path)
 
 }
 
-static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *configfs_follow_link(struct dentry *dentry)
 {
 	int error = -ENOMEM;
 	unsigned long page = get_zeroed_page(GFP_KERNEL);
@@ -287,12 +287,12 @@ static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (page) {
 		error = configfs_getlink(dentry, (char *)page);
 		if (!error) {
-			nd_set_link(nd, (char *)page);
+			nd_set_link((char *)page);
 			return (void *)page;
 		}
 	}
 
-	nd_set_link(nd, ERR_PTR(error));
+	nd_set_link(ERR_PTR(error));
 	return NULL;
 }
 
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 830a7e7..4a612f1 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -43,9 +43,9 @@ const struct file_operations debugfs_file_operations = {
 	.llseek =	noop_llseek,
 };
 
-static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *debugfs_follow_link(struct dentry *dentry)
 {
-	nd_set_link(nd, d_inode(dentry)->i_private);
+	nd_set_link(d_inode(dentry)->i_private);
 	return NULL;
 }
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index fc850b5..8f46945 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -170,7 +170,6 @@ out_unlock:
  * @directory_inode: inode of the new file's dentry's parent in ecryptfs
  * @ecryptfs_dentry: New file's dentry in ecryptfs
  * @mode: The mode of the new file
- * @nd: nameidata of ecryptfs' parent's dentry & vfsmount
  *
  * Creates the underlying file and the eCryptfs inode which will link to
  * it. It will also update the eCryptfs directory inode to mimic the
@@ -384,7 +383,6 @@ static int ecryptfs_lookup_interpose(struct dentry *dentry,
  * ecryptfs_lookup
  * @ecryptfs_dir_inode: The eCryptfs directory inode
  * @ecryptfs_dentry: The eCryptfs dentry that we are looking up
- * @ecryptfs_nd: nameidata; may be NULL
  *
  * Find a file on disk. If the file does not exist, then we'll add it to the
  * dentry cache and continue on to read it from the disk.
@@ -675,7 +673,7 @@ out:
 	return rc ? ERR_PTR(rc) : buf;
 }
 
-static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ecryptfs_follow_link(struct dentry *dentry)
 {
 	size_t len;
 	char *buf = ecryptfs_readlink_lower(dentry, &len);
@@ -685,7 +683,7 @@ static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 				d_inode(ecryptfs_dentry_to_lower(dentry)));
 	buf[len] = '\0';
 out:
-	nd_set_link(nd, buf);
+	nd_set_link(buf);
 	return NULL;
 }
 
diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
index 6f6f3a4..c4e3db4 100644
--- a/fs/exofs/symlink.c
+++ b/fs/exofs/symlink.c
@@ -35,11 +35,11 @@
 
 #include "exofs.h"
 
-static void *exofs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *exofs_follow_link(struct dentry *dentry)
 {
 	struct exofs_i_info *oi = exofs_i(d_inode(dentry));
 
-	nd_set_link(nd, (char *)oi->i_data);
+	nd_set_link((char *)oi->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index 20608f1..a6b1642 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -21,10 +21,10 @@
 #include "xattr.h"
 #include <linux/namei.h>
 
-static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext2_follow_link(struct dentry *dentry)
 {
 	struct ext2_inode_info *ei = EXT2_I(d_inode(dentry));
-	nd_set_link(nd, (char *)ei->i_data);
+	nd_set_link((char *)ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
index ea96df3..dd9763f 100644
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -21,10 +21,10 @@
 #include "ext3.h"
 #include "xattr.h"
 
-static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void * ext3_follow_link(struct dentry *dentry)
 {
 	struct ext3_inode_info *ei = EXT3_I(d_inode(dentry));
-	nd_set_link(nd, (char*)ei->i_data);
+	nd_set_link((char*)ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 57f5009..af7dc39 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,10 +23,10 @@
 #include "ext4.h"
 #include "xattr.h"
 
-static void *ext4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext4_follow_link(struct dentry *dentry)
 {
 	struct ext4_inode_info *ei = EXT4_I(d_inode(dentry));
-	nd_set_link(nd, (char *) ei->i_data);
+	nd_set_link((char *) ei->i_data);
 	return NULL;
 }
 
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index 8b9229e..f906f31 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -39,7 +39,7 @@
 #include "vxfs_inode.h"
 
 
-static void *	vxfs_immed_follow_link(struct dentry *, struct nameidata *);
+static void *	vxfs_immed_follow_link(struct dentry *);
 
 static int	vxfs_immed_readpage(struct file *, struct page *);
 
@@ -64,7 +64,6 @@ const struct address_space_operations vxfs_immed_aops = {
 /**
  * vxfs_immed_follow_link - follow immed symlink
  * @dp:		dentry for the link
- * @np:		pathname lookup data for the current path walk
  *
  * Description:
  *   vxfs_immed_follow_link restarts the pathname lookup with
@@ -74,10 +73,10 @@ const struct address_space_operations vxfs_immed_aops = {
  *   Zero on success, else a negative error code.
  */
 static void *
-vxfs_immed_follow_link(struct dentry *dp, struct nameidata *np)
+vxfs_immed_follow_link(struct dentry *dp)
 {
 	struct vxfs_inode_info		*vip = VXFS_INO(d_inode(dp));
-	nd_set_link(np, vip->vii_immed.vi_immed);
+	nd_set_link(vip->vii_immed.vi_immed);
 	return NULL;
 }
 
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1453c6f..99230de 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1400,9 +1400,9 @@ static void free_link(char *link)
 		free_page((unsigned long) link);
 }
 
-static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *fuse_follow_link(struct dentry *dentry)
 {
-	nd_set_link(nd, read_link(dentry));
+	nd_set_link(read_link(dentry));
 	return NULL;
 }
 
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index e301850..b89fa98 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1541,14 +1541,13 @@ out:
 /**
  * gfs2_follow_link - Follow a symbolic link
  * @dentry: The dentry of the link
- * @nd: Data that we pass to vfs_follow_link()
  *
  * This can handle symlinks of any size.
  *
  * Returns: 0 on success or error code
  */
 
-static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *gfs2_follow_link(struct dentry *dentry)
 {
 	struct gfs2_inode *ip = GFS2_I(d_inode(dentry));
 	struct gfs2_holder i_gh;
@@ -1561,7 +1560,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	error = gfs2_glock_nq(&i_gh);
 	if (error) {
 		gfs2_holder_uninit(&i_gh);
-		nd_set_link(nd, ERR_PTR(error));
+		nd_set_link(ERR_PTR(error));
 		return NULL;
 	}
 
@@ -1586,7 +1585,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	brelse(dibh);
 out:
 	gfs2_glock_dq_uninit(&i_gh);
-	nd_set_link(nd, buf);
+	nd_set_link(buf);
 	return NULL;
 }
 
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 5f40dfa..f80292b 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -880,7 +880,7 @@ static const struct inode_operations hostfs_dir_iops = {
 	.setattr	= hostfs_setattr,
 };
 
-static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *hostfs_follow_link(struct dentry *dentry)
 {
 	char *link = __getname();
 	if (link) {
@@ -900,7 +900,7 @@ static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		link = ERR_PTR(-ENOMEM);
 	}
 
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 }
 
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index 237907e..2620030 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,11 +642,11 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
 						    buflen);
 }
 
-static void *hppfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *hppfs_follow_link(struct dentry *dentry)
 {
 	struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;
 
-	return d_inode(proc_dentry)->i_op->follow_link(proc_dentry, nd);
+	return d_inode(proc_dentry)->i_op->follow_link(proc_dentry);
 }
 
 static void hppfs_put_link(struct dentry *dentry, char *link, void *cookie)
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index 1fefa25..ec28871 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -16,7 +16,7 @@
 #include <linux/namei.h>
 #include "nodelist.h"
 
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd);
+static void *jffs2_follow_link(struct dentry *dentry);
 
 const struct inode_operations jffs2_symlink_inode_operations =
 {
@@ -29,7 +29,7 @@ const struct inode_operations jffs2_symlink_inode_operations =
 	.removexattr =	jffs2_removexattr
 };
 
-static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *jffs2_follow_link(struct dentry *dentry)
 {
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(d_inode(dentry));
 	char *p = (char *)f->target;
@@ -54,7 +54,7 @@ static void *jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	jffs2_dbg(1, "%s(): target path is '%s'\n",
 		  __func__, (char *)f->target);
 
-	nd_set_link(nd, p);
+	nd_set_link(p);
 
 	/*
 	 * We will unlock the f->sem mutex but VFS will use the f->target string. This is safe
diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
index 80f42bc..4299a3c 100644
--- a/fs/jfs/symlink.c
+++ b/fs/jfs/symlink.c
@@ -22,10 +22,10 @@
 #include "jfs_inode.h"
 #include "jfs_xattr.h"
 
-static void *jfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *jfs_follow_link(struct dentry *dentry)
 {
 	char *s = JFS_IP(d_inode(dentry))->i_inline;
-	nd_set_link(nd, s);
+	nd_set_link(s);
 	return NULL;
 }
 
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 2aa55cb..63d08ec 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,7 +112,7 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
 	return error;
 }
 
-static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *kernfs_iop_follow_link(struct dentry *dentry)
 {
 	int error = -ENOMEM;
 	unsigned long page = get_zeroed_page(GFP_KERNEL);
@@ -121,7 +121,7 @@ static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
 		if (error < 0)
 			free_page((unsigned long)page);
 	}
-	nd_set_link(nd, error ? ERR_PTR(error) : (char *)page);
+	nd_set_link(error ? ERR_PTR(error) : (char *)page);
 	return NULL;
 }
 
diff --git a/fs/namei.c b/fs/namei.c
index 981080c..4c6ce01f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -723,8 +723,10 @@ static inline void path_to_nameidata(const struct path *path,
  * Helper to directly jump to a known parsed path from ->follow_link,
  * caller must have taken a reference to path beforehand.
  */
-void nd_jump_link(struct nameidata *nd, struct path *path)
+void nd_jump_link(struct path *path)
 {
+	struct nameidata *nd = current->nameidata;
+
 	path_put(&nd->path);
 
 	nd->path = *path;
@@ -732,8 +734,10 @@ void nd_jump_link(struct nameidata *nd, struct path *path)
 	nd->flags |= LOOKUP_JUMPED;
 }
 
-void nd_set_link(struct nameidata *nd, char *path)
+void nd_set_link(char *path)
 {
+	struct nameidata *nd = current->nameidata;
+
 	nd->saved_names[nd->depth] = path;
 }
 EXPORT_SYMBOL(nd_set_link);
@@ -887,14 +891,14 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	nd->total_link_count++;
 
 	touch_atime(link);
-	nd_set_link(nd, NULL);
+	nd_set_link(NULL);
 
 	error = security_inode_follow_link(link->dentry);
 	if (error)
 		goto out_put_nd_path;
 
 	nd->last_type = LAST_BIND;
-	*p = dentry->d_inode->i_op->follow_link(dentry, nd);
+	*p = dentry->d_inode->i_op->follow_link(dentry);
 	error = PTR_ERR(*p);
 	if (IS_ERR(*p))
 		goto out_put_nd_path;
@@ -4458,7 +4462,7 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 	int res;
 
 	nd.depth = 0;
-	cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
+	cookie = dentry->d_inode->i_op->follow_link(dentry);
 	if (IS_ERR(cookie))
 		res = PTR_ERR(cookie);
 	else {
@@ -4500,10 +4504,10 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 }
 EXPORT_SYMBOL(page_readlink);
 
-void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
+void *page_follow_link_light(struct dentry *dentry)
 {
 	struct page *page = NULL;
-	nd_set_link(nd, page_getlink(dentry, &page));
+	nd_set_link(page_getlink(dentry, &page));
 	return page;
 }
 EXPORT_SYMBOL(page_follow_link_light);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 2d56200..1394f87 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -43,7 +43,7 @@ error:
 	return -EIO;
 }
 
-static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *nfs_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	struct page *page;
@@ -58,11 +58,11 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 		err = page;
 		goto read_failed;
 	}
-	nd_set_link(nd, kmap(page));
+	nd_set_link(kmap(page));
 	return page;
 
 read_failed:
-	nd_set_link(nd, err);
+	nd_set_link(err);
 	return NULL;
 }
 
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
index 0f35b80..dcb5490 100644
--- a/fs/ntfs/namei.c
+++ b/fs/ntfs/namei.c
@@ -35,7 +35,6 @@
  * ntfs_lookup - find the inode represented by a dentry in a directory inode
  * @dir_ino:	directory inode in which to look for the inode
  * @dent:	dentry representing the inode to look for
- * @nd:		lookup nameidata
  *
  * In short, ntfs_lookup() looks for the inode represented by the dentry @dent
  * in the directory inode @dir_ino and if found attaches the inode to the
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index f1abb51..0de7b87 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -140,7 +140,7 @@ struct ovl_link_data {
 	void *cookie;
 };
 
-static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ovl_follow_link(struct dentry *dentry)
 {
 	void *ret;
 	struct dentry *realdentry;
@@ -160,7 +160,7 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 		data->realdentry = realdentry;
 	}
 
-	ret = realinode->i_op->follow_link(realdentry, nd);
+	ret = realinode->i_op->follow_link(realdentry);
 	if (IS_ERR(ret)) {
 		kfree(data);
 		return ret;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a3d6b55..a511738 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1371,7 +1371,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 		return -ENOENT;
 }
 
-static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_pid_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	struct path path;
@@ -1385,7 +1385,7 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (error)
 		goto out;
 
-	nd_jump_link(nd, &path);
+	nd_jump_link(&path);
 	return NULL;
 out:
 	return ERR_PTR(error);
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 27fd86d..68934dd 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -394,12 +394,12 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
 };
 #endif
 
-static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_follow_link(struct dentry *dentry)
 {
 	struct proc_dir_entry *pde = PDE(d_inode(dentry));
 	if (unlikely(!use_pde(pde)))
 		return ERR_PTR(-EINVAL);
-	nd_set_link(nd, pde->data);
+	nd_set_link(pde->data);
 	return pde;
 }
 
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index e512642..15852e1 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
 	&mntns_operations,
 };
 
-static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_ns_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
@@ -45,7 +45,7 @@ static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
 	if (ptrace_may_access(task, PTRACE_MODE_READ)) {
 		error = ns_get_path(&ns_path, task, ns_ops);
 		if (!error)
-			nd_jump_link(nd, &ns_path);
+			nd_jump_link(&ns_path);
 	}
 	put_task_struct(task);
 	return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 6195b4a..51ed086 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -19,7 +19,7 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_self_follow_link(struct dentry *dentry)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
@@ -32,7 +32,7 @@ static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
 		else
 			sprintf(name, "%d", tgid);
 	}
-	nd_set_link(nd, name);
+	nd_set_link(name);
 	return NULL;
 }
 
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index a837199..60457f6 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -20,7 +20,7 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *proc_thread_self_follow_link(struct dentry *dentry)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
@@ -33,7 +33,7 @@ static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidat
 		else
 			sprintf(name, "%d/task/%d", tgid, pid);
 	}
-	nd_set_link(nd, name);
+	nd_set_link(name);
 	return NULL;
 }
 
diff --git a/fs/sysv/symlink.c b/fs/sysv/symlink.c
index d3fa0d7..6af76f1 100644
--- a/fs/sysv/symlink.c
+++ b/fs/sysv/symlink.c
@@ -8,9 +8,9 @@
 #include "sysv.h"
 #include <linux/namei.h>
 
-static void *sysv_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *sysv_follow_link(struct dentry *dentry)
 {
-	nd_set_link(nd, (char *)SYSV_I(d_inode(dentry))->i_data);
+	nd_set_link((char *)SYSV_I(d_inode(dentry))->i_data);
 	return NULL;
 }
 
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index b5c6941..c8ea213 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1299,11 +1299,11 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
 	ClearPageChecked(page);
 }
 
-static void *ubifs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ubifs_follow_link(struct dentry *dentry)
 {
 	struct ubifs_inode *ui = ubifs_inode(d_inode(dentry));
 
-	nd_set_link(nd, ui->data);
+	nd_set_link(ui->data);
 	return NULL;
 }
 
diff --git a/fs/ufs/symlink.c b/fs/ufs/symlink.c
index 5b537e2..dd34c35 100644
--- a/fs/ufs/symlink.c
+++ b/fs/ufs/symlink.c
@@ -32,10 +32,10 @@
 #include "ufs.h"
 
 
-static void *ufs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ufs_follow_link(struct dentry *dentry)
 {
 	struct ufs_inode_info *p = UFS_I(d_inode(dentry));
-	nd_set_link(nd, (char*)p->i_u1.i_symlink);
+	nd_set_link((char*)p->i_u1.i_symlink);
 	return NULL;
 }
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 54b9523..1c3df6b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -411,8 +411,7 @@ xfs_vn_rename(
  */
 STATIC void *
 xfs_vn_follow_link(
-	struct dentry		*dentry,
-	struct nameidata	*nd)
+	struct dentry		*dentry)
 {
 	char			*link;
 	int			error = -ENOMEM;
@@ -425,13 +424,13 @@ xfs_vn_follow_link(
 	if (unlikely(error))
 		goto out_kfree;
 
-	nd_set_link(nd, link);
+	nd_set_link(link);
 	return NULL;
 
  out_kfree:
 	kfree(link);
  out_err:
-	nd_set_link(nd, ERR_PTR(error));
+	nd_set_link(ERR_PTR(error));
 	return NULL;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d407d59..a892fb7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -38,7 +38,6 @@ struct backing_dev_info;
 struct export_operations;
 struct hd_geometry;
 struct iovec;
-struct nameidata;
 struct kiocb;
 struct kobject;
 struct pipe_inode_info;
@@ -1599,7 +1598,7 @@ struct file_operations {
 
 struct inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
-	void * (*follow_link) (struct dentry *, struct nameidata *);
+	void * (*follow_link) (struct dentry *);
 	int (*permission) (struct inode *, int);
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
@@ -2695,7 +2694,7 @@ extern const struct file_operations generic_ro_fops;
 
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
-extern void *page_follow_link_light(struct dentry *, struct nameidata *);
+extern void *page_follow_link_light(struct dentry *);
 extern void page_put_link(struct dentry *, char *, void *);
 extern int __page_symlink(struct inode *inode, const char *symname, int len,
 		int nofs);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 2ec27f2..cc8b51a 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -7,7 +7,6 @@
 #include <linux/path.h>
 
 struct vfsmount;
-struct nameidata;
 
 enum { MAX_NESTED_LINKS = 8 };
 
@@ -70,8 +69,8 @@ extern int follow_up(struct path *);
 extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
 
-extern void nd_jump_link(struct nameidata *nd, struct path *path);
-extern void nd_set_link(struct nameidata *nd, char *path);
+extern void nd_jump_link(struct path *path);
+extern void nd_set_link(char *path);
 
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4402c91..d3886e5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1267,6 +1267,7 @@ union rcu_special {
 	short s;
 };
 struct rcu_node;
+struct nameidata;
 
 enum perf_event_task_context {
 	perf_invalid_context = -1,
diff --git a/mm/shmem.c b/mm/shmem.c
index 00707f8..daad5af 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2474,17 +2474,17 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 	return 0;
 }
 
-static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata *nd)
+static void *shmem_follow_short_symlink(struct dentry *dentry)
 {
-	nd_set_link(nd, SHMEM_I(d_inode(dentry))->symlink);
+	nd_set_link(SHMEM_I(d_inode(dentry))->symlink);
 	return NULL;
 }
 
-static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *shmem_follow_link(struct dentry *dentry)
 {
 	struct page *page = NULL;
 	int error = shmem_getpage(d_inode(dentry), 0, &page, SGP_READ, NULL);
-	nd_set_link(nd, error ? ERR_PTR(error) : kmap(page));
+	nd_set_link(error ? ERR_PTR(error) : kmap(page));
 	if (page)
 		unlock_page(page);
 	return page;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 07/24] namei: expand nested_symlink() in its only caller
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (5 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 61 +++++++++++++++++++++++--------------------------------------
 1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4c6ce01f..a1f6271 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1632,43 +1632,6 @@ out_err:
 }
 
 /*
- * This limits recursive symlink follows to 8, while
- * limiting consecutive symlinks to 40.
- *
- * Without that kind of total limit, nasty chains of consecutive
- * symlinks can cause almost arbitrarily long lookups.
- */
-static inline int nested_symlink(struct path *path, struct nameidata *nd)
-{
-	int res;
-
-	if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
-		path_put_conditional(path, nd);
-		path_put(&nd->path);
-		return -ELOOP;
-	}
-	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
-
-	nd->depth++;
-	nd->link_count++;
-
-	do {
-		struct path link = *path;
-		void *cookie;
-
-		res = follow_link(&link, nd, &cookie);
-		if (res)
-			break;
-		res = walk_component(nd, path, LOOKUP_FOLLOW);
-		put_link(nd, &link, cookie);
-	} while (res > 0);
-
-	nd->link_count--;
-	nd->depth--;
-	return res;
-}
-
-/*
  * We can do the critical dentry name comparison and hashing
  * operations one word at a time, but we are limited to:
  *
@@ -1859,7 +1822,29 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			return err;
 
 		if (err) {
-			err = nested_symlink(&next, nd);
+			if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
+				path_put_conditional(&next, nd);
+				path_put(&nd->path);
+				return -ELOOP;
+			}
+			BUG_ON(nd->depth >= MAX_NESTED_LINKS);
+
+			nd->depth++;
+			nd->link_count++;
+
+			do {
+				struct path link = next;
+				void *cookie;
+
+				err = follow_link(&link, nd, &cookie);
+				if (err)
+					break;
+				err = walk_component(nd, &next, LOOKUP_FOLLOW);
+				put_link(nd, &link, cookie);
+			} while (err > 0);
+
+			nd->link_count--;
+			nd->depth--;
 			if (err)
 				return err;
 		}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (6 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Calling conventions for ->follow_link() are rather unfortunate.  It
would be better to have it return ERR_PTR(error) on error, NULL on
jumps and link body for normal symlinks.  What we currently return
(opaque pointer used by ->put_link() once we are done) should've
been given to analogue (and replacement) of nd_set_link().

For now let's just split a piece of fs/namei.c:follow_link() that
does obtaining the link body into a separate function.  When we
get around to changing ->follow_link() API, the changes will be
contained in that sucker.  follow_link() itself converted to
calling get_link() and then doing the body traversal (if any).

The next step will expand follow_link() call in link_path_walk()
and this helps to keep the size down...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 79 +++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 45 insertions(+), 34 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a1f6271..e07bf5c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -871,21 +871,23 @@ static int may_linkat(struct path *link)
 	return -EPERM;
 }
 
-static __always_inline int
-follow_link(struct path *link, struct nameidata *nd, void **p)
+static __always_inline char *
+get_link(struct path *link, struct nameidata *nd, void **p)
 {
 	struct dentry *dentry = link->dentry;
+	struct inode *inode = dentry->d_inode;
+	void *cookie;
 	int error;
-	char *s;
+	char *res;
 
 	BUG_ON(nd->flags & LOOKUP_RCU);
 
 	if (link->mnt == nd->path.mnt)
 		mntget(link->mnt);
 
-	error = -ELOOP;
+	res = ERR_PTR(-ELOOP);
 	if (unlikely(nd->total_link_count >= 40))
-		goto out_put_nd_path;
+		goto out;
 
 	cond_resched();
 	nd->total_link_count++;
@@ -893,44 +895,53 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	touch_atime(link);
 	nd_set_link(NULL);
 
-	error = security_inode_follow_link(link->dentry);
+	error = security_inode_follow_link(dentry);
+	res = ERR_PTR(error);
 	if (error)
-		goto out_put_nd_path;
+		goto out;
 
 	nd->last_type = LAST_BIND;
-	*p = dentry->d_inode->i_op->follow_link(dentry);
-	error = PTR_ERR(*p);
-	if (IS_ERR(*p))
-		goto out_put_nd_path;
+	res = cookie = inode->i_op->follow_link(dentry);
+	if (IS_ERR(cookie))
+		goto out;
 
-	error = 0;
-	s = nd_get_link(nd);
-	if (s) {
-		if (unlikely(IS_ERR(s))) {
-			path_put(&nd->path);
-			put_link(nd, link, *p);
-			return PTR_ERR(s);
-		}
-		if (*s == '/') {
-			if (!nd->root.mnt)
-				set_root(nd);
-			path_put(&nd->path);
-			nd->path = nd->root;
-			path_get(&nd->root);
-			nd->flags |= LOOKUP_JUMPED;
-		}
-		nd->inode = nd->path.dentry->d_inode;
-		error = link_path_walk(s, nd);
-		if (unlikely(error))
-			put_link(nd, link, *p);
+	res = nd_get_link(nd);
+	if (!IS_ERR(res)) {
+		*p = cookie;
+		return res;
 	}
 
-	return error;
-
-out_put_nd_path:
+	if (inode->i_op->put_link)
+		inode->i_op->put_link(dentry, res, cookie);
+out:
 	*p = NULL;
 	path_put(&nd->path);
 	path_put(link);
+	return res;
+}
+
+static __always_inline int
+follow_link(struct path *link, struct nameidata *nd, void **p)
+{
+	char *s = get_link(link, nd, p);
+	int error;
+
+	if (unlikely(IS_ERR(s)))
+		return PTR_ERR(s);
+	if (unlikely(!s))
+		return 0;
+	if (*s == '/') {
+		if (!nd->root.mnt)
+			set_root(nd);
+		path_put(&nd->path);
+		nd->path = nd->root;
+		path_get(&nd->root);
+		nd->flags |= LOOKUP_JUMPED;
+	}
+	nd->inode = nd->path.dentry->d_inode;
+	error = link_path_walk(s, nd);
+	if (unlikely(error))
+		put_link(nd, link, *p);
 	return error;
 }
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 09/24] namei: fold follow_link() into link_path_walk()
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (7 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

... and strip __always_inline from follow_link() - remaining callers
don't need that.

Now link_path_walk() recursion is a direct one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e07bf5c..48b2f17 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -920,8 +920,7 @@ out:
 	return res;
 }
 
-static __always_inline int
-follow_link(struct path *link, struct nameidata *nd, void **p)
+static int follow_link(struct path *link, struct nameidata *nd, void **p)
 {
 	char *s = get_link(link, nd, p);
 	int error;
@@ -1846,10 +1845,29 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			do {
 				struct path link = next;
 				void *cookie;
+				char *s = get_link(&link, nd, &cookie);
 
-				err = follow_link(&link, nd, &cookie);
-				if (err)
+				if (unlikely(IS_ERR(s))) {
+					err = PTR_ERR(s);
 					break;
+				}
+				err = 0;
+				if (likely(s)) {
+					if (*s == '/') {
+						if (!nd->root.mnt)
+							set_root(nd);
+						path_put(&nd->path);
+						nd->path = nd->root;
+						path_get(&nd->root);
+						nd->flags |= LOOKUP_JUMPED;
+					}
+					nd->inode = nd->path.dentry->d_inode;
+					err = link_path_walk(s, nd);
+					if (unlikely(err)) {
+						put_link(nd, &link, cookie);
+						break;
+					}
+				}
 				err = walk_component(nd, &next, LOOKUP_FOLLOW);
 				put_link(nd, &link, cookie);
 			} while (err > 0);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (8 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

If we get ERR_PTR() from get_link(), we are guaranteed to get err != 0
when we break out of do-while, so we are going to hit if (err) return err;
shortly after it.  Pull that into the if (IS_ERR(s)) body.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 48b2f17..6e6630c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1849,7 +1849,9 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 				if (unlikely(IS_ERR(s))) {
 					err = PTR_ERR(s);
-					break;
+					nd->link_count--;
+					nd->depth--;
+					return err;
 				}
 				err = 0;
 				if (likely(s)) {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (9 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

... it does nothing if nd->last_type is LAST_BIND.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6e6630c..5366b6d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1854,7 +1854,11 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 					return err;
 				}
 				err = 0;
-				if (likely(s)) {
+				if (unlikely(!s)) {
+					/* jumped */
+					put_link(nd, &link, cookie);
+					break;
+				} else {
 					if (*s == '/') {
 						if (!nd->root.mnt)
 							set_root(nd);
@@ -1869,9 +1873,9 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 						put_link(nd, &link, cookie);
 						break;
 					}
+					err = walk_component(nd, &next, LOOKUP_FOLLOW);
+					put_link(nd, &link, cookie);
 				}
-				err = walk_component(nd, &next, LOOKUP_FOLLOW);
-				put_link(nd, &link, cookie);
 			} while (err > 0);
 
 			nd->link_count--;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 12/24] link_path_walk: turn inner loop into explicit goto
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (10 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 61 ++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5366b6d..86d54e5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1832,6 +1832,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			return err;
 
 		if (err) {
+			struct path link;
+			void *cookie;
+			char *s;
+
 			if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
 				path_put_conditional(&next, nd);
 				path_put(&nd->path);
@@ -1842,41 +1846,40 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			nd->depth++;
 			nd->link_count++;
 
-			do {
-				struct path link = next;
-				void *cookie;
-				char *s = get_link(&link, nd, &cookie);
-
-				if (unlikely(IS_ERR(s))) {
-					err = PTR_ERR(s);
-					nd->link_count--;
-					nd->depth--;
-					return err;
+l:
+			link = next;
+			s = get_link(&link, nd, &cookie);
+
+			if (unlikely(IS_ERR(s))) {
+				err = PTR_ERR(s);
+				nd->link_count--;
+				nd->depth--;
+				return err;
+			}
+			err = 0;
+			if (unlikely(!s)) {
+				/* jumped */
+				put_link(nd, &link, cookie);
+			} else {
+				if (*s == '/') {
+					if (!nd->root.mnt)
+						set_root(nd);
+					path_put(&nd->path);
+					nd->path = nd->root;
+					path_get(&nd->root);
+					nd->flags |= LOOKUP_JUMPED;
 				}
-				err = 0;
-				if (unlikely(!s)) {
-					/* jumped */
+				nd->inode = nd->path.dentry->d_inode;
+				err = link_path_walk(s, nd);
+				if (unlikely(err)) {
 					put_link(nd, &link, cookie);
-					break;
 				} else {
-					if (*s == '/') {
-						if (!nd->root.mnt)
-							set_root(nd);
-						path_put(&nd->path);
-						nd->path = nd->root;
-						path_get(&nd->root);
-						nd->flags |= LOOKUP_JUMPED;
-					}
-					nd->inode = nd->path.dentry->d_inode;
-					err = link_path_walk(s, nd);
-					if (unlikely(err)) {
-						put_link(nd, &link, cookie);
-						break;
-					}
 					err = walk_component(nd, &next, LOOKUP_FOLLOW);
 					put_link(nd, &link, cookie);
+					if (err > 0)
+						goto l;
 				}
-			} while (err > 0);
+			}
 
 			nd->link_count--;
 			nd->depth--;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 13/24] link_path_walk: massage a bit more
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (11 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Pull the block after the if-else in the end of what used to be do-while
body into all branches there.  We are almost done with the massage...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 86d54e5..06f1aa3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1860,6 +1860,8 @@ l:
 			if (unlikely(!s)) {
 				/* jumped */
 				put_link(nd, &link, cookie);
+				nd->link_count--;
+				nd->depth--;
 			} else {
 				if (*s == '/') {
 					if (!nd->root.mnt)
@@ -1873,18 +1875,23 @@ l:
 				err = link_path_walk(s, nd);
 				if (unlikely(err)) {
 					put_link(nd, &link, cookie);
+					nd->link_count--;
+					nd->depth--;
+					return err;
 				} else {
 					err = walk_component(nd, &next, LOOKUP_FOLLOW);
 					put_link(nd, &link, cookie);
-					if (err > 0)
+					nd->link_count--;
+					nd->depth--;
+					if (err < 0)
+						return err;
+					if (err > 0) {
+						nd->link_count++;
+						nd->depth++;
 						goto l;
+					}
 				}
 			}
-
-			nd->link_count--;
-			nd->depth--;
-			if (err)
-				return err;
 		}
 		if (!d_can_lookup(nd->path.dentry)) {
 			err = -ENOTDIR; 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 14/24] link_path_walk: get rid of duplication
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (12 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

What we do after the second walk_component() + put_link() + depth
decrement in there is exactly equivalent to what's done right
after the first walk_component().  Easy to verify and not at all
surprising, seeing that there we have just walked the last
component of nested symlink.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 06f1aa3..9cef3c0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1828,6 +1828,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			return 0;
 
 		err = walk_component(nd, &next, LOOKUP_FOLLOW);
+Walked:
 		if (err < 0)
 			return err;
 
@@ -1846,7 +1847,6 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			nd->depth++;
 			nd->link_count++;
 
-l:
 			link = next;
 			s = get_link(&link, nd, &cookie);
 
@@ -1883,13 +1883,7 @@ l:
 					put_link(nd, &link, cookie);
 					nd->link_count--;
 					nd->depth--;
-					if (err < 0)
-						return err;
-					if (err > 0) {
-						nd->link_count++;
-						nd->depth++;
-						goto l;
-					}
+					goto Walked;
 				}
 			}
 		}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 15/24] link_path_walk: final preparations to killing recursion
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (13 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
@ 2015-04-20 18:12 ` Al Viro
  2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

reduce the number of returns in there - turn all places
where it returns zero into goto OK and places where it
returns non-zero into goto Err.  The only non-trivial
detail is that all breaks in the loop are guaranteed
to be with non-zero err.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9cef3c0..6d79e00 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1773,7 +1773,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 	while (*name=='/')
 		name++;
 	if (!*name)
-		return 0;
+		goto OK;
 
 	/* At this point we know we have a real path component. */
 	for(;;) {
@@ -1816,7 +1816,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 
 		name += hashlen_len(hash_len);
 		if (!*name)
-			return 0;
+			goto OK;
 		/*
 		 * If it wasn't NUL, we know it was '/'. Skip that
 		 * slash, and continue until no more slashes.
@@ -1825,12 +1825,12 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 			name++;
 		} while (unlikely(*name == '/'));
 		if (!*name)
-			return 0;
+			goto OK;
 
 		err = walk_component(nd, &next, LOOKUP_FOLLOW);
 Walked:
 		if (err < 0)
-			return err;
+			goto Err;
 
 		if (err) {
 			struct path link;
@@ -1840,7 +1840,8 @@ Walked:
 			if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
 				path_put_conditional(&next, nd);
 				path_put(&nd->path);
-				return -ELOOP;
+				err = -ELOOP;
+				goto Err;
 			}
 			BUG_ON(nd->depth >= MAX_NESTED_LINKS);
 
@@ -1854,7 +1855,7 @@ Walked:
 				err = PTR_ERR(s);
 				nd->link_count--;
 				nd->depth--;
-				return err;
+				goto Err;
 			}
 			err = 0;
 			if (unlikely(!s)) {
@@ -1877,7 +1878,7 @@ Walked:
 					put_link(nd, &link, cookie);
 					nd->link_count--;
 					nd->depth--;
-					return err;
+					goto Err;
 				} else {
 					err = walk_component(nd, &next, LOOKUP_FOLLOW);
 					put_link(nd, &link, cookie);
@@ -1893,7 +1894,10 @@ Walked:
 		}
 	}
 	terminate_walk(nd);
+Err:
 	return err;
+OK:
+	return 0;
 }
 
 static int path_init(int dfd, const struct filename *name, unsigned int flags,
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (14 preceding siblings ...)
  2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 21:04   ` Linus Torvalds
  2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

absolutely straightforward now - the only variables we need to preserve
across the recursive call are name, link and cookie, and recursion depth
is limited (and can is equal to nd->depth).  So arrange an array of
triples to hold instances of those and be done with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 6d79e00..4b98f4b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1767,9 +1767,15 @@ static inline u64 hash_name(const char *name)
  */
 static int link_path_walk(const char *name, struct nameidata *nd)
 {
+	struct saved {
+		struct path link;
+		void *cookie;
+		const char *name;
+	} stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
 	struct path next;
 	int err;
-	
+
+start:
 	while (*name=='/')
 		name++;
 	if (!*name)
@@ -1833,8 +1839,6 @@ Walked:
 			goto Err;
 
 		if (err) {
-			struct path link;
-			void *cookie;
 			char *s;
 
 			if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
@@ -1847,22 +1851,25 @@ Walked:
 
 			nd->depth++;
 			nd->link_count++;
+			last++;
 
-			link = next;
-			s = get_link(&link, nd, &cookie);
+			last->link = next;
+			s = get_link(&last->link, nd, &last->cookie);
 
 			if (unlikely(IS_ERR(s))) {
 				err = PTR_ERR(s);
 				nd->link_count--;
 				nd->depth--;
+				last--;
 				goto Err;
 			}
 			err = 0;
 			if (unlikely(!s)) {
 				/* jumped */
-				put_link(nd, &link, cookie);
+				put_link(nd, &last->link, last->cookie);
 				nd->link_count--;
 				nd->depth--;
+				last--;
 			} else {
 				if (*s == '/') {
 					if (!nd->root.mnt)
@@ -1873,17 +1880,24 @@ Walked:
 					nd->flags |= LOOKUP_JUMPED;
 				}
 				nd->inode = nd->path.dentry->d_inode;
-				err = link_path_walk(s, nd);
+				last->name = name;
+				name = s;
+				goto start;
+
+back:
+				name = last->name;
 				if (unlikely(err)) {
-					put_link(nd, &link, cookie);
+					put_link(nd, &last->link, last->cookie);
 					nd->link_count--;
 					nd->depth--;
+					last--;
 					goto Err;
 				} else {
 					err = walk_component(nd, &next, LOOKUP_FOLLOW);
-					put_link(nd, &link, cookie);
+					put_link(nd, &last->link, last->cookie);
 					nd->link_count--;
 					nd->depth--;
+					last--;
 					goto Walked;
 				}
 			}
@@ -1895,9 +1909,13 @@ Walked:
 	}
 	terminate_walk(nd);
 Err:
-	return err;
+	if (likely(!nd->depth))
+		return err;
+	goto back;
 OK:
-	return 0;
+	if (likely(!nd->depth))
+		return 0;
+	goto back;
 }
 
 static int path_init(int dfd, const struct filename *name, unsigned int flags,
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 17/24] link_path_walk: split "return from recursive call" path
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (15 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4b98f4b..975918f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1883,23 +1883,6 @@ Walked:
 				last->name = name;
 				name = s;
 				goto start;
-
-back:
-				name = last->name;
-				if (unlikely(err)) {
-					put_link(nd, &last->link, last->cookie);
-					nd->link_count--;
-					nd->depth--;
-					last--;
-					goto Err;
-				} else {
-					err = walk_component(nd, &next, LOOKUP_FOLLOW);
-					put_link(nd, &last->link, last->cookie);
-					nd->link_count--;
-					nd->depth--;
-					last--;
-					goto Walked;
-				}
 			}
 		}
 		if (!d_can_lookup(nd->path.dentry)) {
@@ -1909,13 +1892,24 @@ back:
 	}
 	terminate_walk(nd);
 Err:
-	if (likely(!nd->depth))
-		return err;
-	goto back;
+	while (unlikely(nd->depth)) {
+		put_link(nd, &last->link, last->cookie);
+		nd->link_count--;
+		nd->depth--;
+		last--;
+	}
+	return err;
 OK:
-	if (likely(!nd->depth))
-		return 0;
-	goto back;
+	if (unlikely(nd->depth)) {
+		name = last->name;
+		err = walk_component(nd, &next, LOOKUP_FOLLOW);
+		put_link(nd, &last->link, last->cookie);
+		nd->link_count--;
+		nd->depth--;
+		last--;
+		goto Walked;
+	}
+	return 0;
 }
 
 static int path_init(int dfd, const struct filename *name, unsigned int flags,
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue;
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (16 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

Deal with skipping leading slashes before what used to be the
recursive call.  That way we can get rid of that goto completely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 975918f..71c9546 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1775,11 +1775,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 	struct path next;
 	int err;
 
-start:
 	while (*name=='/')
 		name++;
 	if (!*name)
-		goto OK;
+		return 0;
 
 	/* At this point we know we have a real path component. */
 	for(;;) {
@@ -1878,11 +1877,15 @@ Walked:
 					nd->path = nd->root;
 					path_get(&nd->root);
 					nd->flags |= LOOKUP_JUMPED;
+					while (unlikely(*++s == '/'))
+						;
 				}
 				nd->inode = nd->path.dentry->d_inode;
 				last->name = name;
+				if (!*s)
+					goto OK;
 				name = s;
-				goto start;
+				continue;
 			}
 		}
 		if (!d_can_lookup(nd->path.dentry)) {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 19/24] namei: fold may_follow_link() into follow_link()
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (17 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

All remaining callers of the former are preceded by the latter

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 71c9546..e03c18f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -922,9 +922,12 @@ out:
 
 static int follow_link(struct path *link, struct nameidata *nd, void **p)
 {
-	char *s = get_link(link, nd, p);
-	int error;
-
+	char *s;
+	int error = may_follow_link(link, nd);
+	if (unlikely(error))
+		return error;
+	nd->flags |= LOOKUP_PARENT;
+	s = get_link(link, nd, p);
 	if (unlikely(IS_ERR(s)))
 		return PTR_ERR(s);
 	if (unlikely(!s))
@@ -2063,10 +2066,6 @@ static int path_lookupat(int dfd, const struct filename *name,
 		while (err > 0) {
 			void *cookie;
 			struct path link = path;
-			err = may_follow_link(&link, nd);
-			if (unlikely(err))
-				break;
-			nd->flags |= LOOKUP_PARENT;
 			err = follow_link(&link, nd, &cookie);
 			if (err)
 				break;
@@ -2421,10 +2420,6 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
 	while (err > 0) {
 		void *cookie;
 		struct path link = *path;
-		err = may_follow_link(&link, &nd);
-		if (unlikely(err))
-			break;
-		nd.flags |= LOOKUP_PARENT;
 		err = follow_link(&link, &nd, &cookie);
 		if (err)
 			break;
@@ -3313,10 +3308,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 			error = -ELOOP;
 			break;
 		}
-		error = may_follow_link(&link, nd);
-		if (unlikely(error))
-			break;
-		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 		error = follow_link(&link, nd, &cookie);
 		if (unlikely(error))
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 20/24] namei: introduce nameidata->stack
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (18 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

move the stack of saved states out of link_path_walk() stack frame to
that of callers, put a reference to it into struct nameidata as
nd->stack, move nd->saved_names[] elements in there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e03c18f..1c967d1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -504,7 +504,12 @@ struct nameidata {
 	int		link_count,
 			total_link_count;
 	struct file	*base;
-	char *saved_names[MAX_NESTED_LINKS + 1];
+	struct saved {
+		struct path link;
+		const char *name;
+		void *cookie;
+		char *body;
+	} *stack;
 };
 
 static struct nameidata *set_nameidata(struct nameidata *p)
@@ -738,13 +743,13 @@ void nd_set_link(char *path)
 {
 	struct nameidata *nd = current->nameidata;
 
-	nd->saved_names[nd->depth] = path;
+	nd->stack[nd->depth].body = path;
 }
 EXPORT_SYMBOL(nd_set_link);
 
 static inline char *nd_get_link(struct nameidata *nd)
 {
-	return nd->saved_names[nd->depth];
+	return nd->stack[nd->depth].body;
 }
 
 static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
@@ -1770,11 +1775,7 @@ static inline u64 hash_name(const char *name)
  */
 static int link_path_walk(const char *name, struct nameidata *nd)
 {
-	struct saved {
-		struct path link;
-		void *cookie;
-		const char *name;
-	} stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
+	struct saved *last = nd->stack;
 	struct path next;
 	int err;
 
@@ -2092,7 +2093,11 @@ static int filename_lookup(int dfd, struct filename *name,
 				unsigned int flags, struct nameidata *nd)
 {
 	int retval;
-	struct nameidata *saved_nd = set_nameidata(nd);
+	struct saved stack[MAX_NESTED_LINKS + 1];
+	struct nameidata *saved_nd;
+
+	nd->stack = stack;
+	saved_nd = set_nameidata(nd);
 
 	retval = path_lookupat(dfd, name, flags | LOOKUP_RCU, nd);
 	if (unlikely(retval == -ECHILD))
@@ -2409,9 +2414,12 @@ static int
 path_mountpoint(int dfd, const struct filename *name, struct path *path,
 		unsigned int flags)
 {
-	struct nameidata nd, *saved = set_nameidata(&nd);
+	struct saved stack[MAX_NESTED_LINKS + 1];
+	struct nameidata nd, *saved;
 	int err;
 
+	nd.stack = stack;
+	saved = set_nameidata(&nd);
 	err = path_init(dfd, name, flags, &nd);
 	if (unlikely(err))
 		goto out;
@@ -3280,6 +3288,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 	struct path path;
 	int opened = 0;
 	int error;
+	struct saved stack[MAX_NESTED_LINKS + 1];
 	struct nameidata *saved_nd;
 
 	file = get_empty_filp();
@@ -3287,6 +3296,7 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 		return file;
 
 	file->f_flags = op->open_flag;
+	nd->stack = stack;
 	saved_nd = set_nameidata(nd);
 
 	if (unlikely(file->f_flags & __O_TMPFILE)) {
@@ -4491,11 +4501,14 @@ EXPORT_SYMBOL(readlink_copy);
  */
 int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
-	struct nameidata nd, *saved = set_nameidata(&nd);
+	struct saved stack;
+	struct nameidata nd, *saved;
 	void *cookie;
 	int res;
 
 	nd.depth = 0;
+	nd.stack = &stack;
+	saved = set_nameidata(&nd);
 	cookie = dentry->d_inode->i_op->follow_link(dentry);
 	if (IS_ERR(cookie))
 		res = PTR_ERR(cookie);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (19 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

don't bother with separate variables for link and cookie in trailing
symlink loops; use nd->stack[0] for that.  That makes *all* callers
of put_link() (and follow_link()) operate on nd->stack[nd->depth].link
and nd->stack[nd->depth].cookie.  Trim the argument lists.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 42 ++++++++++++++++++++----------------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1c967d1..a8d6751 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -752,12 +752,14 @@ static inline char *nd_get_link(struct nameidata *nd)
 	return nd->stack[nd->depth].body;
 }
 
-static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
+static inline void put_link(struct nameidata *nd)
 {
-	struct inode *inode = link->dentry->d_inode;
+	struct saved *last = nd->stack + nd->depth;
+	struct inode *inode = last->link.dentry->d_inode;
 	if (inode->i_op->put_link)
-		inode->i_op->put_link(link->dentry, nd_get_link(nd), cookie);
-	path_put(link);
+		inode->i_op->put_link(last->link.dentry, last->body,
+				      last->cookie);
+	path_put(&last->link);
 }
 
 int sysctl_protected_symlinks __read_mostly = 0;
@@ -925,14 +927,16 @@ out:
 	return res;
 }
 
-static int follow_link(struct path *link, struct nameidata *nd, void **p)
+static int follow_link(struct nameidata *nd, struct path *link)
 {
 	char *s;
 	int error = may_follow_link(link, nd);
+	struct saved *last = nd->stack + nd->depth;
 	if (unlikely(error))
 		return error;
 	nd->flags |= LOOKUP_PARENT;
-	s = get_link(link, nd, p);
+	last->link = *link;
+	s = get_link(&last->link, nd, &last->cookie);
 	if (unlikely(IS_ERR(s)))
 		return PTR_ERR(s);
 	if (unlikely(!s))
@@ -948,7 +952,7 @@ static int follow_link(struct path *link, struct nameidata *nd, void **p)
 	nd->inode = nd->path.dentry->d_inode;
 	error = link_path_walk(s, nd);
 	if (unlikely(error))
-		put_link(nd, link, *p);
+		put_link(nd);
 	return error;
 }
 
@@ -1869,7 +1873,7 @@ Walked:
 			err = 0;
 			if (unlikely(!s)) {
 				/* jumped */
-				put_link(nd, &last->link, last->cookie);
+				put_link(nd);
 				nd->link_count--;
 				nd->depth--;
 				last--;
@@ -1900,7 +1904,7 @@ Walked:
 	terminate_walk(nd);
 Err:
 	while (unlikely(nd->depth)) {
-		put_link(nd, &last->link, last->cookie);
+		put_link(nd);
 		nd->link_count--;
 		nd->depth--;
 		last--;
@@ -1910,7 +1914,7 @@ OK:
 	if (unlikely(nd->depth)) {
 		name = last->name;
 		err = walk_component(nd, &next, LOOKUP_FOLLOW);
-		put_link(nd, &last->link, last->cookie);
+		put_link(nd);
 		nd->link_count--;
 		nd->depth--;
 		last--;
@@ -2065,13 +2069,11 @@ static int path_lookupat(int dfd, const struct filename *name,
 	if (!err && !(flags & LOOKUP_PARENT)) {
 		err = lookup_last(nd, &path);
 		while (err > 0) {
-			void *cookie;
-			struct path link = path;
-			err = follow_link(&link, nd, &cookie);
+			err = follow_link(nd, &path);
 			if (err)
 				break;
 			err = lookup_last(nd, &path);
-			put_link(nd, &link, cookie);
+			put_link(nd);
 		}
 	}
 
@@ -2426,13 +2428,11 @@ path_mountpoint(int dfd, const struct filename *name, struct path *path,
 
 	err = mountpoint_last(&nd, path);
 	while (err > 0) {
-		void *cookie;
-		struct path link = *path;
-		err = follow_link(&link, &nd, &cookie);
+		err = follow_link(&nd, path);
 		if (err)
 			break;
 		err = mountpoint_last(&nd, path);
-		put_link(&nd, &link, cookie);
+		put_link(&nd);
 	}
 out:
 	path_cleanup(&nd);
@@ -3310,8 +3310,6 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 
 	error = do_last(nd, &path, file, op, &opened, pathname);
 	while (unlikely(error > 0)) { /* trailing symlink */
-		struct path link = path;
-		void *cookie;
 		if (!(nd->flags & LOOKUP_FOLLOW)) {
 			path_put_conditional(&path, nd);
 			path_put(&nd->path);
@@ -3319,11 +3317,11 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 			break;
 		}
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
-		error = follow_link(&link, nd, &cookie);
+		error = follow_link(nd, &path);
 		if (unlikely(error))
 			break;
 		error = do_last(nd, &path, file, op, &opened, pathname);
-		put_link(nd, &link, cookie);
+		put_link(nd);
 	}
 out:
 	path_cleanup(nd);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 22/24] namei: trim the arguments of get_link()
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (20 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

same story as the previous commit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index a8d6751..425986a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -878,10 +878,10 @@ static int may_linkat(struct path *link)
 	return -EPERM;
 }
 
-static __always_inline char *
-get_link(struct path *link, struct nameidata *nd, void **p)
+static __always_inline char *get_link(struct nameidata *nd, struct path *next)
 {
-	struct dentry *dentry = link->dentry;
+	struct saved *last = nd->stack + nd->depth;
+	struct dentry *dentry = next->dentry;
 	struct inode *inode = dentry->d_inode;
 	void *cookie;
 	int error;
@@ -889,8 +889,10 @@ get_link(struct path *link, struct nameidata *nd, void **p)
 
 	BUG_ON(nd->flags & LOOKUP_RCU);
 
-	if (link->mnt == nd->path.mnt)
-		mntget(link->mnt);
+	if (next->mnt == nd->path.mnt)
+		mntget(next->mnt);
+
+	last->link = *next;
 
 	res = ERR_PTR(-ELOOP);
 	if (unlikely(nd->total_link_count >= 40))
@@ -899,7 +901,7 @@ get_link(struct path *link, struct nameidata *nd, void **p)
 	cond_resched();
 	nd->total_link_count++;
 
-	touch_atime(link);
+	touch_atime(&last->link);
 	nd_set_link(NULL);
 
 	error = security_inode_follow_link(dentry);
@@ -914,16 +916,16 @@ get_link(struct path *link, struct nameidata *nd, void **p)
 
 	res = nd_get_link(nd);
 	if (!IS_ERR(res)) {
-		*p = cookie;
+		last->cookie = cookie;
 		return res;
 	}
 
 	if (inode->i_op->put_link)
 		inode->i_op->put_link(dentry, res, cookie);
 out:
-	*p = NULL;
+	last->cookie = NULL;
 	path_put(&nd->path);
-	path_put(link);
+	path_put(&last->link);
 	return res;
 }
 
@@ -931,12 +933,10 @@ static int follow_link(struct nameidata *nd, struct path *link)
 {
 	char *s;
 	int error = may_follow_link(link, nd);
-	struct saved *last = nd->stack + nd->depth;
 	if (unlikely(error))
 		return error;
 	nd->flags |= LOOKUP_PARENT;
-	last->link = *link;
-	s = get_link(&last->link, nd, &last->cookie);
+	s = get_link(nd, link);
 	if (unlikely(IS_ERR(s)))
 		return PTR_ERR(s);
 	if (unlikely(!s))
@@ -1779,7 +1779,6 @@ static inline u64 hash_name(const char *name)
  */
 static int link_path_walk(const char *name, struct nameidata *nd)
 {
-	struct saved *last = nd->stack;
 	struct path next;
 	int err;
 
@@ -1858,16 +1857,13 @@ Walked:
 
 			nd->depth++;
 			nd->link_count++;
-			last++;
 
-			last->link = next;
-			s = get_link(&last->link, nd, &last->cookie);
+			s = get_link(nd, &next);
 
 			if (unlikely(IS_ERR(s))) {
 				err = PTR_ERR(s);
 				nd->link_count--;
 				nd->depth--;
-				last--;
 				goto Err;
 			}
 			err = 0;
@@ -1876,7 +1872,6 @@ Walked:
 				put_link(nd);
 				nd->link_count--;
 				nd->depth--;
-				last--;
 			} else {
 				if (*s == '/') {
 					if (!nd->root.mnt)
@@ -1889,7 +1884,7 @@ Walked:
 						;
 				}
 				nd->inode = nd->path.dentry->d_inode;
-				last->name = name;
+				nd->stack[nd->depth].name = name;
 				if (!*s)
 					goto OK;
 				name = s;
@@ -1907,17 +1902,15 @@ Err:
 		put_link(nd);
 		nd->link_count--;
 		nd->depth--;
-		last--;
 	}
 	return err;
 OK:
 	if (unlikely(nd->depth)) {
-		name = last->name;
+		name = nd->stack[nd->depth].name;
 		err = walk_component(nd, &next, LOOKUP_FOLLOW);
 		put_link(nd);
 		nd->link_count--;
 		nd->depth--;
-		last--;
 		goto Walked;
 	}
 	return 0;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (21 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (via nd_pin_link()) and returns the symlink body.
Returning ERR_PTR() on error, NULL on jump (procfs magic symlinks) and
pointer to symlink body for normal symlinks.

Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().

b) the body isn't passed to ->put_link() anymore, only the opaque pointer
is.  In the cases when we used the symlink body to free stuff, ->follow_link()
now stores it as opaque pointer as well as returning it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/Locking             |  4 +-
 Documentation/filesystems/vfs.txt             |  4 +-
 drivers/staging/lustre/lustre/llite/symlink.c | 11 ++--
 fs/9p/vfs_inode.c                             | 14 +++--
 fs/9p/vfs_inode_dotl.c                        |  6 +-
 fs/autofs4/symlink.c                          |  5 +-
 fs/befs/linuxvfs.c                            | 42 +++++++-------
 fs/ceph/inode.c                               |  6 +-
 fs/cifs/cifsfs.h                              |  2 +-
 fs/cifs/link.c                                | 27 +++++----
 fs/configfs/symlink.c                         | 28 ++++-----
 fs/debugfs/file.c                             |  5 +-
 fs/ecryptfs/inode.c                           |  9 ++-
 fs/exofs/symlink.c                            |  7 +--
 fs/ext2/symlink.c                             |  6 +-
 fs/ext3/symlink.c                             |  6 +-
 fs/ext4/symlink.c                             |  6 +-
 fs/freevxfs/vxfs_immed.c                      |  8 +--
 fs/fuse/dir.c                                 | 19 ++-----
 fs/gfs2/inode.c                               | 10 ++--
 fs/hostfs/hostfs_kern.c                       | 15 +++--
 fs/hppfs/hppfs.c                              |  6 +-
 fs/jffs2/symlink.c                            |  9 +--
 fs/jfs/symlink.c                              |  6 +-
 fs/kernfs/symlink.c                           | 22 ++++----
 fs/libfs.c                                    |  5 +-
 fs/namei.c                                    | 81 +++++++++++----------------
 fs/nfs/symlink.c                              | 18 ++----
 fs/overlayfs/inode.c                          | 17 +++---
 fs/proc/base.c                                |  2 +-
 fs/proc/inode.c                               |  8 +--
 fs/proc/namespaces.c                          |  2 +-
 fs/proc/self.c                                | 24 ++++----
 fs/proc/thread_self.c                         | 22 ++++----
 fs/sysv/symlink.c                             |  5 +-
 fs/ubifs/file.c                               |  7 +--
 fs/ufs/symlink.c                              |  6 +-
 fs/xfs/xfs_iops.c                             |  9 ++-
 include/linux/fs.h                            | 10 ++--
 include/linux/namei.h                         |  3 +-
 mm/shmem.c                                    | 28 +++++----
 41 files changed, 232 insertions(+), 298 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index e4a0c36..f13e9e7 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -50,8 +50,8 @@ prototypes:
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-	void * (*follow_link) (struct dentry *);
-	void (*put_link) (struct dentry *, char *, void *);
+	const char *(*follow_link) (struct dentry *);
+	void (*put_link) (struct dentry *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int, unsigned int);
 	int (*get_acl)(struct inode *, int);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 807bd4b..5d61fc8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -350,8 +350,8 @@ struct inode_operations {
 	int (*rename2) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *, unsigned int);
 	int (*readlink) (struct dentry *, char __user *,int);
-        void * (*follow_link) (struct dentry *);
-	void (*put_link) (struct dentry *, char *, void *);
+        const char *(*follow_link) (struct dentry *);
+	void (*put_link) (struct dentry *, void *);
 	int (*permission) (struct inode *, int);
 	int (*get_acl)(struct inode *, int);
 	int (*setattr) (struct dentry *, struct iattr *);
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index d71c2a9..4f45b47 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -118,7 +118,7 @@ failed:
 	return rc;
 }
 
-static void *ll_follow_link(struct dentry *dentry)
+static const char *ll_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	struct ptlrpc_request *request = NULL;
@@ -131,18 +131,17 @@ static void *ll_follow_link(struct dentry *dentry)
 	ll_inode_size_unlock(inode);
 	if (rc) {
 		ptlrpc_req_finished(request);
-		request = NULL;
-		symname = ERR_PTR(rc);
+		return ERR_PTR(rc);
 	}
 
-	nd_set_link(symname);
 	/* symname may contain a pointer to the request message buffer,
 	 * we delay request releasing until ll_put_link then.
 	 */
-	return request;
+	nd_pin_link(request);
+	return symname;
 }
 
-static void ll_put_link(struct dentry *dentry, char *link, void *cookie)
+static void ll_put_link(struct dentry *dentry, void *cookie)
 {
 	ptlrpc_req_finished(cookie);
 }
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 2895295..9cbeb8b 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1229,11 +1229,12 @@ ino_t v9fs_qid2ino(struct p9_qid *qid)
  *
  */
 
-static void *v9fs_vfs_follow_link(struct dentry *dentry)
+static const char *v9fs_vfs_follow_link(struct dentry *dentry)
 {
 	struct v9fs_session_info *v9ses = v9fs_dentry2v9ses(dentry);
 	struct p9_fid *fid = v9fs_fid_lookup(dentry);
 	struct p9_wstat *st;
+	char *res;
 
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
 
@@ -1252,14 +1253,15 @@ static void *v9fs_vfs_follow_link(struct dentry *dentry)
 		kfree(st);
 		return ERR_PTR(-EINVAL);
 	}
-	if (strlen(st->extension) >= PATH_MAX)
-		st->extension[PATH_MAX - 1] = '\0';
-
-	nd_set_link(st->extension);
+	res = st->extension;
 	st->extension = NULL;
+	if (strlen(res) >= PATH_MAX)
+		res[PATH_MAX - 1] = '\0';
+
 	p9stat_free(st);
 	kfree(st);
-	return NULL;
+	nd_pin_link(res);
+	return res;
 }
 
 /**
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 577af9b..b708806 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -908,7 +908,7 @@ error:
  *
  */
 
-static void *
+static const char *
 v9fs_vfs_follow_link_dotl(struct dentry *dentry)
 {
 	struct p9_fid *fid = v9fs_fid_lookup(dentry);
@@ -922,8 +922,8 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry)
 	retval = p9_client_readlink(fid, &target);
 	if (retval)
 		return ERR_PTR(retval);
-	nd_set_link(target);
-	return NULL;
+	nd_pin_link(target);
+	return target;
 }
 
 int v9fs_refresh_inode_dotl(struct p9_fid *fid, struct inode *inode)
diff --git a/fs/autofs4/symlink.c b/fs/autofs4/symlink.c
index d6dd4c5..a6668fb 100644
--- a/fs/autofs4/symlink.c
+++ b/fs/autofs4/symlink.c
@@ -12,14 +12,13 @@
 
 #include "autofs_i.h"
 
-static void *autofs4_follow_link(struct dentry *dentry)
+static const char *autofs4_follow_link(struct dentry *dentry)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	if (ino && !autofs4_oz_mode(sbi))
 		ino->last_used = jiffies;
-	nd_set_link(d_inode(dentry)->i_private);
-	return NULL;
+	return d_inode(dentry)->i_private;
 }
 
 const struct inode_operations autofs4_symlink_inode_operations = {
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 0714644..2521316 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -42,8 +42,8 @@ static struct inode *befs_iget(struct super_block *, unsigned long);
 static struct inode *befs_alloc_inode(struct super_block *sb);
 static void befs_destroy_inode(struct inode *inode);
 static void befs_destroy_inodecache(void);
-static void *befs_follow_link(struct dentry *);
-static void *befs_fast_follow_link(struct dentry *);
+static const char *befs_follow_link(struct dentry *);
+static const char *befs_fast_follow_link(struct dentry *);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
 			char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -468,7 +468,7 @@ befs_destroy_inodecache(void)
  * The data stream become link name. Unless the LONG_SYMLINK
  * flag is set.
  */
-static void *
+static const char *
 befs_follow_link(struct dentry *dentry)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -479,32 +479,28 @@ befs_follow_link(struct dentry *dentry)
 
 	if (len == 0) {
 		befs_error(sb, "Long symlink with illegal length");
-		link = ERR_PTR(-EIO);
-	} else {
-		befs_debug(sb, "Follow long symlink");
-
-		link = kmalloc(len, GFP_NOFS);
-		if (!link) {
-			link = ERR_PTR(-ENOMEM);
-		} else if (befs_read_lsymlink(sb, data, link, len) != len) {
-			kfree(link);
-			befs_error(sb, "Failed to read entire long symlink");
-			link = ERR_PTR(-EIO);
-		} else {
-			link[len - 1] = '\0';
-		}
+		return ERR_PTR(-EIO);
 	}
-	nd_set_link(link);
-	return NULL;
+	befs_debug(sb, "Follow long symlink");
+
+	link = kmalloc(len, GFP_NOFS);
+	if (!link)
+		return ERR_PTR(-ENOMEM);
+	if (befs_read_lsymlink(sb, data, link, len) != len) {
+		kfree(link);
+		befs_error(sb, "Failed to read entire long symlink");
+		return ERR_PTR(-EIO);
+	}
+	link[len - 1] = '\0';
+	nd_pin_link(link);
+	return link;
 }
 
 
-static void *
+static const char *
 befs_fast_follow_link(struct dentry *dentry)
 {
-	befs_inode_info *befs_ino = BEFS_I(d_inode(dentry));
-	nd_set_link(befs_ino->i_data.symlink);
-	return NULL;
+	return BEFS_I(d_inode(dentry))->i_data.symlink;
 }
 
 /*
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7dd8f1c..5646a9b 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1691,11 +1691,9 @@ retry:
 /*
  * symlinks
  */
-static void *ceph_sym_follow_link(struct dentry *dentry)
+static const char *ceph_sym_follow_link(struct dentry *dentry)
 {
-	struct ceph_inode_info *ci = ceph_inode(d_inode(dentry));
-	nd_set_link(ci->i_symlink);
-	return NULL;
+	return ceph_inode(d_inode(dentry))->i_symlink;
 }
 
 static const struct inode_operations ceph_symlink_iops = {
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index e3a6ef5..cb26cbe 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -120,7 +120,7 @@ extern struct vfsmount *cifs_dfs_d_automount(struct path *path);
 #endif
 
 /* Functions related to symlinks */
-extern void *cifs_follow_link(struct dentry *direntry);
+extern const char *cifs_follow_link(struct dentry *direntry);
 extern int cifs_readlink(struct dentry *direntry, char __user *buffer,
 			 int buflen);
 extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 470666c..3760999 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -626,7 +626,7 @@ cifs_hl_exit:
 	return rc;
 }
 
-void *
+const char *
 cifs_follow_link(struct dentry *direntry)
 {
 	struct inode *inode = d_inode(direntry);
@@ -643,16 +643,18 @@ cifs_follow_link(struct dentry *direntry)
 
 	tlink = cifs_sb_tlink(cifs_sb);
 	if (IS_ERR(tlink)) {
-		rc = PTR_ERR(tlink);
-		tlink = NULL;
-		goto out;
+		free_xid(xid);
+		return ERR_CAST(tlink);
 	}
 	tcon = tlink_tcon(tlink);
 	server = tcon->ses->server;
 
 	full_path = build_path_from_dentry(direntry);
-	if (!full_path)
-		goto out;
+	if (!full_path) {
+		free_xid(xid);
+		cifs_put_tlink(tlink);
+		return ERR_PTR(-ENOMEM);
+	}
 
 	cifs_dbg(FYI, "Full path: %s inode = 0x%p\n", full_path, inode);
 
@@ -670,17 +672,14 @@ cifs_follow_link(struct dentry *direntry)
 						&target_path, cifs_sb);
 
 	kfree(full_path);
-out:
+	free_xid(xid);
+	cifs_put_tlink(tlink);
 	if (rc != 0) {
 		kfree(target_path);
-		target_path = ERR_PTR(rc);
+		return ERR_PTR(rc);
 	}
-
-	free_xid(xid);
-	if (tlink)
-		cifs_put_tlink(tlink);
-	nd_set_link(target_path);
-	return NULL;
+	nd_pin_link(target_path);
+	return target_path;
 }
 
 int
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index ff41712..d2ec994 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -279,30 +279,26 @@ static int configfs_getlink(struct dentry *dentry, char * path)
 
 }
 
-static void *configfs_follow_link(struct dentry *dentry)
+static const char *configfs_follow_link(struct dentry *dentry)
 {
-	int error = -ENOMEM;
 	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	int error;
 
-	if (page) {
-		error = configfs_getlink(dentry, (char *)page);
-		if (!error) {
-			nd_set_link((char *)page);
-			return (void *)page;
-		}
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	error = configfs_getlink(dentry, (char *)page);
+	if (!error) {
+		nd_pin_link((void *)page);
+		return (char *)page;
 	}
 
-	nd_set_link(ERR_PTR(error));
-	return NULL;
+	return ERR_PTR(error);
 }
 
-static void configfs_put_link(struct dentry *dentry, char *link,
-			      void *cookie)
+static void configfs_put_link(struct dentry *dentry, void *cookie)
 {
-	if (cookie) {
-		unsigned long page = (unsigned long)cookie;
-		free_page(page);
-	}
+	free_page((unsigned long)cookie);
 }
 
 const struct inode_operations configfs_symlink_inode_operations = {
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 4a612f1..914d0cc 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -43,10 +43,9 @@ const struct file_operations debugfs_file_operations = {
 	.llseek =	noop_llseek,
 };
 
-static void *debugfs_follow_link(struct dentry *dentry)
+static const char *debugfs_follow_link(struct dentry *dentry)
 {
-	nd_set_link(d_inode(dentry)->i_private);
-	return NULL;
+	return d_inode(dentry)->i_private;
 }
 
 const struct inode_operations debugfs_link_operations = {
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 8f46945..dc3f17c 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -673,18 +673,17 @@ out:
 	return rc ? ERR_PTR(rc) : buf;
 }
 
-static void *ecryptfs_follow_link(struct dentry *dentry)
+static const char *ecryptfs_follow_link(struct dentry *dentry)
 {
 	size_t len;
 	char *buf = ecryptfs_readlink_lower(dentry, &len);
 	if (IS_ERR(buf))
-		goto out;
+		return buf;
 	fsstack_copy_attr_atime(d_inode(dentry),
 				d_inode(ecryptfs_dentry_to_lower(dentry)));
 	buf[len] = '\0';
-out:
-	nd_set_link(buf);
-	return NULL;
+	nd_pin_link(buf);
+	return buf;
 }
 
 /**
diff --git a/fs/exofs/symlink.c b/fs/exofs/symlink.c
index c4e3db4..279b4d2 100644
--- a/fs/exofs/symlink.c
+++ b/fs/exofs/symlink.c
@@ -35,12 +35,9 @@
 
 #include "exofs.h"
 
-static void *exofs_follow_link(struct dentry *dentry)
+static const char *exofs_follow_link(struct dentry *dentry)
 {
-	struct exofs_i_info *oi = exofs_i(d_inode(dentry));
-
-	nd_set_link((char *)oi->i_data);
-	return NULL;
+	return (char *)exofs_i(d_inode(dentry))->i_data;
 }
 
 const struct inode_operations exofs_symlink_inode_operations = {
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
index a6b1642..18eeb68 100644
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -21,11 +21,9 @@
 #include "xattr.h"
 #include <linux/namei.h>
 
-static void *ext2_follow_link(struct dentry *dentry)
+static const char *ext2_follow_link(struct dentry *dentry)
 {
-	struct ext2_inode_info *ei = EXT2_I(d_inode(dentry));
-	nd_set_link((char *)ei->i_data);
-	return NULL;
+	return (char *)EXT2_I(d_inode(dentry))->i_data;
 }
 
 const struct inode_operations ext2_symlink_inode_operations = {
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
index dd9763f..0110f84 100644
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -21,11 +21,9 @@
 #include "ext3.h"
 #include "xattr.h"
 
-static void * ext3_follow_link(struct dentry *dentry)
+static const char *ext3_follow_link(struct dentry *dentry)
 {
-	struct ext3_inode_info *ei = EXT3_I(d_inode(dentry));
-	nd_set_link((char*)ei->i_data);
-	return NULL;
+	return (char *)EXT3_I(d_inode(dentry))->i_data;
 }
 
 const struct inode_operations ext3_symlink_inode_operations = {
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index af7dc39..993e42f 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -23,11 +23,9 @@
 #include "ext4.h"
 #include "xattr.h"
 
-static void *ext4_follow_link(struct dentry *dentry)
+static const char *ext4_follow_link(struct dentry *dentry)
 {
-	struct ext4_inode_info *ei = EXT4_I(d_inode(dentry));
-	nd_set_link((char *) ei->i_data);
-	return NULL;
+	return (char *)EXT4_I(d_inode(dentry))->i_data;
 }
 
 const struct inode_operations ext4_symlink_inode_operations = {
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c
index f906f31..f4b0132 100644
--- a/fs/freevxfs/vxfs_immed.c
+++ b/fs/freevxfs/vxfs_immed.c
@@ -39,7 +39,7 @@
 #include "vxfs_inode.h"
 
 
-static void *	vxfs_immed_follow_link(struct dentry *);
+static const char *vxfs_immed_follow_link(struct dentry *);
 
 static int	vxfs_immed_readpage(struct file *, struct page *);
 
@@ -72,12 +72,10 @@ const struct address_space_operations vxfs_immed_aops = {
  * Returns:
  *   Zero on success, else a negative error code.
  */
-static void *
+static const char *
 vxfs_immed_follow_link(struct dentry *dp)
 {
-	struct vxfs_inode_info		*vip = VXFS_INO(d_inode(dp));
-	nd_set_link(vip->vii_immed.vi_immed);
-	return NULL;
+	return VXFS_INO(d_inode(dp))->vii_immed.vi_immed;
 }
 
 /**
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 99230de..c098df4 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1365,7 +1365,7 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx)
 	return err;
 }
 
-static char *read_link(struct dentry *dentry)
+static const char *fuse_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1389,26 +1389,15 @@ static char *read_link(struct dentry *dentry)
 		link = ERR_PTR(ret);
 	} else {
 		link[ret] = '\0';
+		nd_pin_link(link);
 	}
 	fuse_invalidate_atime(inode);
 	return link;
 }
 
-static void free_link(char *link)
+static void fuse_put_link(struct dentry *dentry, void *cookie)
 {
-	if (!IS_ERR(link))
-		free_page((unsigned long) link);
-}
-
-static void *fuse_follow_link(struct dentry *dentry)
-{
-	nd_set_link(read_link(dentry));
-	return NULL;
-}
-
-static void fuse_put_link(struct dentry *dentry, char *link, void *c)
-{
-	free_link(link);
+	free_page((unsigned long) cookie);
 }
 
 static int fuse_dir_open(struct inode *inode, struct file *file)
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index b89fa98..62685cc 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1547,7 +1547,7 @@ out:
  * Returns: 0 on success or error code
  */
 
-static void *gfs2_follow_link(struct dentry *dentry)
+static const char *gfs2_follow_link(struct dentry *dentry)
 {
 	struct gfs2_inode *ip = GFS2_I(d_inode(dentry));
 	struct gfs2_holder i_gh;
@@ -1560,8 +1560,7 @@ static void *gfs2_follow_link(struct dentry *dentry)
 	error = gfs2_glock_nq(&i_gh);
 	if (error) {
 		gfs2_holder_uninit(&i_gh);
-		nd_set_link(ERR_PTR(error));
-		return NULL;
+		return ERR_PTR(error);
 	}
 
 	size = (unsigned int)i_size_read(&ip->i_inode);
@@ -1585,8 +1584,9 @@ static void *gfs2_follow_link(struct dentry *dentry)
 	brelse(dibh);
 out:
 	gfs2_glock_dq_uninit(&i_gh);
-	nd_set_link(buf);
-	return NULL;
+	if (!IS_ERR(buf))
+		nd_pin_link(buf);
+	return buf;
 }
 
 /**
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index f80292b..9dcc97a 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -880,7 +880,7 @@ static const struct inode_operations hostfs_dir_iops = {
 	.setattr	= hostfs_setattr,
 };
 
-static void *hostfs_follow_link(struct dentry *dentry)
+static const char *hostfs_follow_link(struct dentry *dentry)
 {
 	char *link = __getname();
 	if (link) {
@@ -894,20 +894,19 @@ static void *hostfs_follow_link(struct dentry *dentry)
 		}
 		if (err < 0) {
 			__putname(link);
-			link = ERR_PTR(err);
+			return ERR_PTR(err);
 		}
 	} else {
-		link = ERR_PTR(-ENOMEM);
+		return ERR_PTR(-ENOMEM);
 	}
 
-	nd_set_link(link);
-	return NULL;
+	nd_pin_link(link);
+	return link;
 }
 
-static void hostfs_put_link(struct dentry *dentry, char *s, void *cookie)
+static void hostfs_put_link(struct dentry *dentry, void *cookie)
 {
-	if (!IS_ERR(s))
-		__putname(s);
+	__putname(cookie);
 }
 
 static const struct inode_operations hostfs_link_iops = {
diff --git a/fs/hppfs/hppfs.c b/fs/hppfs/hppfs.c
index 2620030..969f7e0 100644
--- a/fs/hppfs/hppfs.c
+++ b/fs/hppfs/hppfs.c
@@ -642,19 +642,19 @@ static int hppfs_readlink(struct dentry *dentry, char __user *buffer,
 						    buflen);
 }
 
-static void *hppfs_follow_link(struct dentry *dentry)
+static const char *hppfs_follow_link(struct dentry *dentry)
 {
 	struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;
 
 	return d_inode(proc_dentry)->i_op->follow_link(proc_dentry);
 }
 
-static void hppfs_put_link(struct dentry *dentry, char *link, void *cookie)
+static void hppfs_put_link(struct dentry *dentry, void *cookie)
 {
 	struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;
 
 	if (d_inode(proc_dentry)->i_op->put_link)
-		d_inode(proc_dentry)->i_op->put_link(proc_dentry, link, cookie);
+		d_inode(proc_dentry)->i_op->put_link(proc_dentry, cookie);
 }
 
 static const struct inode_operations hppfs_dir_iops = {
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index ec28871..6f0b82e 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -16,7 +16,7 @@
 #include <linux/namei.h>
 #include "nodelist.h"
 
-static void *jffs2_follow_link(struct dentry *dentry);
+static const char *jffs2_follow_link(struct dentry *dentry);
 
 const struct inode_operations jffs2_symlink_inode_operations =
 {
@@ -29,7 +29,7 @@ const struct inode_operations jffs2_symlink_inode_operations =
 	.removexattr =	jffs2_removexattr
 };
 
-static void *jffs2_follow_link(struct dentry *dentry)
+static const char *jffs2_follow_link(struct dentry *dentry)
 {
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(d_inode(dentry));
 	char *p = (char *)f->target;
@@ -54,13 +54,10 @@ static void *jffs2_follow_link(struct dentry *dentry)
 	jffs2_dbg(1, "%s(): target path is '%s'\n",
 		  __func__, (char *)f->target);
 
-	nd_set_link(p);
-
 	/*
 	 * We will unlock the f->sem mutex but VFS will use the f->target string. This is safe
 	 * since the only way that may cause f->target to be changed is iput() operation.
 	 * But VFS will not use f->target after iput() has been called.
 	 */
-	return NULL;
+	return p;
 }
-
diff --git a/fs/jfs/symlink.c b/fs/jfs/symlink.c
index 4299a3c..71f8c31 100644
--- a/fs/jfs/symlink.c
+++ b/fs/jfs/symlink.c
@@ -22,11 +22,9 @@
 #include "jfs_inode.h"
 #include "jfs_xattr.h"
 
-static void *jfs_follow_link(struct dentry *dentry)
+static const char *jfs_follow_link(struct dentry *dentry)
 {
-	char *s = JFS_IP(d_inode(dentry))->i_inline;
-	nd_set_link(s);
-	return NULL;
+	return JFS_IP(d_inode(dentry))->i_inline;
 }
 
 const struct inode_operations jfs_fast_symlink_inode_operations = {
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 63d08ec..f831629 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -112,24 +112,24 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
 	return error;
 }
 
-static void *kernfs_iop_follow_link(struct dentry *dentry)
+static const char *kernfs_iop_follow_link(struct dentry *dentry)
 {
 	int error = -ENOMEM;
 	unsigned long page = get_zeroed_page(GFP_KERNEL);
-	if (page) {
-		error = kernfs_getlink(dentry, (char *) page);
-		if (error < 0)
-			free_page((unsigned long)page);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+	error = kernfs_getlink(dentry, (char *)page);
+	if (unlikely(error < 0)) {
+		free_page((unsigned long)page);
+		return ERR_PTR(error);
 	}
-	nd_set_link(error ? ERR_PTR(error) : (char *)page);
-	return NULL;
+	nd_pin_link((void *)page);
+	return (char *)page;
 }
 
-static void kernfs_iop_put_link(struct dentry *dentry, char *page,
-				void *cookie)
+static void kernfs_iop_put_link(struct dentry *dentry, void *cookie)
 {
-	if (!IS_ERR(page))
-		free_page((unsigned long)page);
+	free_page((unsigned long)cookie);
 }
 
 const struct inode_operations kernfs_symlink_iops = {
diff --git a/fs/libfs.c b/fs/libfs.c
index fe6041a..c6955d3 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1024,10 +1024,9 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 }
 EXPORT_SYMBOL(noop_fsync);
 
-void kfree_put_link(struct dentry *dentry, char *s, void *cookie)
+void kfree_put_link(struct dentry *dentry, void *cookie)
 {
-	if (!IS_ERR(s))
-		kfree(s);
+	kfree(cookie);
 }
 EXPORT_SYMBOL(kfree_put_link);
 
diff --git a/fs/namei.c b/fs/namei.c
index 425986a..f1ec430 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -508,7 +508,6 @@ struct nameidata {
 		struct path link;
 		const char *name;
 		void *cookie;
-		char *body;
 	} *stack;
 };
 
@@ -739,26 +738,27 @@ void nd_jump_link(struct path *path)
 	nd->flags |= LOOKUP_JUMPED;
 }
 
-void nd_set_link(char *path)
+void nd_pin_link(void *cookie)
 {
 	struct nameidata *nd = current->nameidata;
 
-	nd->stack[nd->depth].body = path;
+	nd->stack[nd->depth].cookie = cookie;
 }
-EXPORT_SYMBOL(nd_set_link);
+EXPORT_SYMBOL(nd_pin_link);
 
-static inline char *nd_get_link(struct nameidata *nd)
+void *nd_pinned_link(void)
 {
-	return nd->stack[nd->depth].body;
+	struct nameidata *nd = current->nameidata;
+	return nd->stack[nd->depth].cookie;
 }
+EXPORT_SYMBOL(nd_pinned_link);
 
 static inline void put_link(struct nameidata *nd)
 {
 	struct saved *last = nd->stack + nd->depth;
 	struct inode *inode = last->link.dentry->d_inode;
-	if (inode->i_op->put_link)
-		inode->i_op->put_link(last->link.dentry, last->body,
-				      last->cookie);
+	if (last->cookie && inode->i_op->put_link)
+		inode->i_op->put_link(last->link.dentry, last->cookie);
 	path_put(&last->link);
 }
 
@@ -878,14 +878,14 @@ static int may_linkat(struct path *link)
 	return -EPERM;
 }
 
-static __always_inline char *get_link(struct nameidata *nd, struct path *next)
+static __always_inline
+const char *get_link(struct nameidata *nd, struct path *next)
 {
 	struct saved *last = nd->stack + nd->depth;
 	struct dentry *dentry = next->dentry;
 	struct inode *inode = dentry->d_inode;
-	void *cookie;
 	int error;
-	char *res;
+	const char *res;
 
 	BUG_ON(nd->flags & LOOKUP_RCU);
 
@@ -893,6 +893,7 @@ static __always_inline char *get_link(struct nameidata *nd, struct path *next)
 		mntget(next->mnt);
 
 	last->link = *next;
+	last->cookie = NULL;
 
 	res = ERR_PTR(-ELOOP);
 	if (unlikely(nd->total_link_count >= 40))
@@ -902,7 +903,6 @@ static __always_inline char *get_link(struct nameidata *nd, struct path *next)
 	nd->total_link_count++;
 
 	touch_atime(&last->link);
-	nd_set_link(NULL);
 
 	error = security_inode_follow_link(dentry);
 	res = ERR_PTR(error);
@@ -910,28 +910,18 @@ static __always_inline char *get_link(struct nameidata *nd, struct path *next)
 		goto out;
 
 	nd->last_type = LAST_BIND;
-	res = cookie = inode->i_op->follow_link(dentry);
-	if (IS_ERR(cookie))
-		goto out;
-
-	res = nd_get_link(nd);
-	if (!IS_ERR(res)) {
-		last->cookie = cookie;
-		return res;
-	}
-
-	if (inode->i_op->put_link)
-		inode->i_op->put_link(dentry, res, cookie);
+	res = inode->i_op->follow_link(dentry);
+	if (IS_ERR(res)) {
 out:
-	last->cookie = NULL;
-	path_put(&nd->path);
-	path_put(&last->link);
+		path_put(&nd->path);
+		path_put(&last->link);
+	}
 	return res;
 }
 
 static int follow_link(struct nameidata *nd, struct path *link)
 {
-	char *s;
+	const char *s;
 	int error = may_follow_link(link, nd);
 	if (unlikely(error))
 		return error;
@@ -1845,7 +1835,7 @@ Walked:
 			goto Err;
 
 		if (err) {
-			char *s;
+			const char *s;
 
 			if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) {
 				path_put_conditional(&next, nd);
@@ -4494,21 +4484,19 @@ int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
 	struct saved stack;
 	struct nameidata nd, *saved;
-	void *cookie;
+	const char *link;
 	int res;
 
 	nd.depth = 0;
 	nd.stack = &stack;
 	saved = set_nameidata(&nd);
-	cookie = dentry->d_inode->i_op->follow_link(dentry);
-	if (IS_ERR(cookie))
-		res = PTR_ERR(cookie);
-	else {
-		char *link = nd_get_link(&nd);
-
+	link = dentry->d_inode->i_op->follow_link(dentry);
+	if (IS_ERR(link)) {
+		res = PTR_ERR(link);
+	} else {
 		res = readlink_copy(buffer, buflen, link);
 		if (dentry->d_inode->i_op->put_link)
-			dentry->d_inode->i_op->put_link(dentry, link, cookie);
+			dentry->d_inode->i_op->put_link(dentry, stack.cookie);
 	}
 	set_nameidata(saved);
 	return res;
@@ -4542,22 +4530,21 @@ int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 }
 EXPORT_SYMBOL(page_readlink);
 
-void *page_follow_link_light(struct dentry *dentry)
+const char *page_follow_link_light(struct dentry *dentry)
 {
 	struct page *page = NULL;
-	nd_set_link(page_getlink(dentry, &page));
-	return page;
+	char *res = page_getlink(dentry, &page);
+	if (!IS_ERR(res))
+		nd_pin_link(page);
+	return res;
 }
 EXPORT_SYMBOL(page_follow_link_light);
 
-void page_put_link(struct dentry *dentry, char *link, void *cookie)
+void page_put_link(struct dentry *dentry, void *cookie)
 {
 	struct page *page = cookie;
-
-	if (page) {
-		kunmap(page);
-		page_cache_release(page);
-	}
+	kunmap(page);
+	page_cache_release(page);
 }
 EXPORT_SYMBOL(page_put_link);
 
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 1394f87..62879f2 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -43,7 +43,7 @@ error:
 	return -EIO;
 }
 
-static void *nfs_follow_link(struct dentry *dentry)
+static const char *nfs_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	struct page *page;
@@ -51,19 +51,13 @@ static void *nfs_follow_link(struct dentry *dentry)
 
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
-		goto read_failed;
+		return err;
 	page = read_cache_page(&inode->i_data, 0,
 				(filler_t *)nfs_symlink_filler, inode);
-	if (IS_ERR(page)) {
-		err = page;
-		goto read_failed;
-	}
-	nd_set_link(kmap(page));
-	return page;
-
-read_failed:
-	nd_set_link(err);
-	return NULL;
+	if (IS_ERR(page))
+		return ERR_CAST(page);
+	nd_pin_link(page);
+	return kmap(page);
 }
 
 /*
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 0de7b87..6efb23e 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -10,6 +10,7 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/xattr.h>
+#include <linux/namei.h>
 #include "overlayfs.h"
 
 static int ovl_copy_up_last(struct dentry *dentry, struct iattr *attr,
@@ -140,12 +141,12 @@ struct ovl_link_data {
 	void *cookie;
 };
 
-static void *ovl_follow_link(struct dentry *dentry)
+static const char *ovl_follow_link(struct dentry *dentry)
 {
-	void *ret;
 	struct dentry *realdentry;
 	struct inode *realinode;
 	struct ovl_link_data *data = NULL;
+	const char *ret;
 
 	realdentry = ovl_dentry_real(dentry);
 	realinode = realdentry->d_inode;
@@ -161,18 +162,20 @@ static void *ovl_follow_link(struct dentry *dentry)
 	}
 
 	ret = realinode->i_op->follow_link(realdentry);
-	if (IS_ERR(ret)) {
+	if (IS_ERR_OR_NULL(ret)) {
 		kfree(data);
 		return ret;
 	}
 
 	if (data)
-		data->cookie = ret;
+		data->cookie = nd_pinned_link();
 
-	return data;
+	nd_pin_link(data);
+
+	return ret;
 }
 
-static void ovl_put_link(struct dentry *dentry, char *link, void *c)
+static void ovl_put_link(struct dentry *dentry, void *c)
 {
 	struct inode *realinode;
 	struct ovl_link_data *data = c;
@@ -181,7 +184,7 @@ static void ovl_put_link(struct dentry *dentry, char *link, void *c)
 		return;
 
 	realinode = data->realdentry->d_inode;
-	realinode->i_op->put_link(data->realdentry, link, data->cookie);
+	realinode->i_op->put_link(data->realdentry, data->cookie);
 	kfree(data);
 }
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a511738..7f55e66 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1371,7 +1371,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 		return -ENOENT;
 }
 
-static void *proc_pid_follow_link(struct dentry *dentry)
+static const char *proc_pid_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	struct path path;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 68934dd..a40e792 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -394,16 +394,16 @@ static const struct file_operations proc_reg_file_ops_no_compat = {
 };
 #endif
 
-static void *proc_follow_link(struct dentry *dentry)
+static const char *proc_follow_link(struct dentry *dentry)
 {
 	struct proc_dir_entry *pde = PDE(d_inode(dentry));
 	if (unlikely(!use_pde(pde)))
 		return ERR_PTR(-EINVAL);
-	nd_set_link(pde->data);
-	return pde;
+	nd_pin_link(pde);
+	return pde->data;
 }
 
-static void proc_put_link(struct dentry *dentry, char *link, void *p)
+static void proc_put_link(struct dentry *dentry, void *p)
 {
 	unuse_pde(p);
 }
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 15852e1..bc9c3dd 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -30,7 +30,7 @@ static const struct proc_ns_operations *ns_entries[] = {
 	&mntns_operations,
 };
 
-static void *proc_ns_follow_link(struct dentry *dentry)
+static const char *proc_ns_follow_link(struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 	const struct proc_ns_operations *ns_ops = PROC_I(inode)->ns_ops;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 51ed086..eb966da 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -19,21 +19,21 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_self_follow_link(struct dentry *dentry)
+static const char *proc_self_follow_link(struct dentry *dentry)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
-	char *name = ERR_PTR(-ENOENT);
-	if (tgid) {
-		/* 11 for max length of signed int in decimal + NULL term */
-		name = kmalloc(12, GFP_KERNEL);
-		if (!name)
-			name = ERR_PTR(-ENOMEM);
-		else
-			sprintf(name, "%d", tgid);
-	}
-	nd_set_link(name);
-	return NULL;
+	char *name;
+
+	if (!tgid)
+		return ERR_PTR(-ENOENT);
+	/* 11 for max length of signed int in decimal + NULL term */
+	name = kmalloc(12, GFP_KERNEL);
+	if (!name)
+		return ERR_PTR(-ENOMEM);
+	sprintf(name, "%d", tgid);
+	nd_pin_link(name);
+	return name;
 }
 
 static const struct inode_operations proc_self_inode_operations = {
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 60457f6..95e75c2 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -20,21 +20,21 @@ static int proc_thread_self_readlink(struct dentry *dentry, char __user *buffer,
 	return readlink_copy(buffer, buflen, tmp);
 }
 
-static void *proc_thread_self_follow_link(struct dentry *dentry)
+static const char *proc_thread_self_follow_link(struct dentry *dentry)
 {
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	pid_t pid = task_pid_nr_ns(current, ns);
-	char *name = ERR_PTR(-ENOENT);
-	if (pid) {
-		name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
-		if (!name)
-			name = ERR_PTR(-ENOMEM);
-		else
-			sprintf(name, "%d/task/%d", tgid, pid);
-	}
-	nd_set_link(name);
-	return NULL;
+	char *name;
+
+	if (!pid)
+		return ERR_PTR(-ENOENT);
+	name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
+	if (!name)
+		return ERR_PTR(-ENOMEM);
+	sprintf(name, "%d/task/%d", tgid, pid);
+	nd_pin_link(name);
+	return name;
 }
 
 static const struct inode_operations proc_thread_self_inode_operations = {
diff --git a/fs/sysv/symlink.c b/fs/sysv/symlink.c
index 6af76f1..6d30ad0 100644
--- a/fs/sysv/symlink.c
+++ b/fs/sysv/symlink.c
@@ -8,10 +8,9 @@
 #include "sysv.h"
 #include <linux/namei.h>
 
-static void *sysv_follow_link(struct dentry *dentry)
+static const char *sysv_follow_link(struct dentry *dentry)
 {
-	nd_set_link((char *)SYSV_I(d_inode(dentry))->i_data);
-	return NULL;
+	return (char *)SYSV_I(d_inode(dentry))->i_data;
 }
 
 const struct inode_operations sysv_fast_symlink_inode_operations = {
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index c8ea213..4767a53a 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1299,12 +1299,9 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
 	ClearPageChecked(page);
 }
 
-static void *ubifs_follow_link(struct dentry *dentry)
+static const char *ubifs_follow_link(struct dentry *dentry)
 {
-	struct ubifs_inode *ui = ubifs_inode(d_inode(dentry));
-
-	nd_set_link(ui->data);
-	return NULL;
+	return ubifs_inode(d_inode(dentry))->data;
 }
 
 int ubifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
diff --git a/fs/ufs/symlink.c b/fs/ufs/symlink.c
index dd34c35..58f8541 100644
--- a/fs/ufs/symlink.c
+++ b/fs/ufs/symlink.c
@@ -32,11 +32,9 @@
 #include "ufs.h"
 
 
-static void *ufs_follow_link(struct dentry *dentry)
+static const char *ufs_follow_link(struct dentry *dentry)
 {
-	struct ufs_inode_info *p = UFS_I(d_inode(dentry));
-	nd_set_link((char*)p->i_u1.i_symlink);
-	return NULL;
+	return (char *)UFS_I(d_inode(dentry))->i_u1.i_symlink;
 }
 
 const struct inode_operations ufs_fast_symlink_inode_operations = {
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1c3df6b..b8cfa9d 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -409,7 +409,7 @@ xfs_vn_rename(
  * we need to be very careful about how much stack we use.
  * uio is kmalloced for this reason...
  */
-STATIC void *
+STATIC const char *
 xfs_vn_follow_link(
 	struct dentry		*dentry)
 {
@@ -424,14 +424,13 @@ xfs_vn_follow_link(
 	if (unlikely(error))
 		goto out_kfree;
 
-	nd_set_link(link);
-	return NULL;
+	nd_pin_link(link);
+	return link;
 
  out_kfree:
 	kfree(link);
  out_err:
-	nd_set_link(ERR_PTR(error));
-	return NULL;
+	return ERR_PTR(error);
 }
 
 STATIC int
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a892fb7..691efe4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1598,12 +1598,12 @@ struct file_operations {
 
 struct inode_operations {
 	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
-	void * (*follow_link) (struct dentry *);
+	const char * (*follow_link) (struct dentry *);
 	int (*permission) (struct inode *, int);
 	struct posix_acl * (*get_acl)(struct inode *, int);
 
 	int (*readlink) (struct dentry *, char __user *,int);
-	void (*put_link) (struct dentry *, char *, void *);
+	void (*put_link) (struct dentry *, void *);
 
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
 	int (*link) (struct dentry *,struct inode *,struct dentry *);
@@ -2694,13 +2694,13 @@ extern const struct file_operations generic_ro_fops;
 
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
-extern void *page_follow_link_light(struct dentry *);
-extern void page_put_link(struct dentry *, char *, void *);
+extern const char *page_follow_link_light(struct dentry *);
+extern void page_put_link(struct dentry *, void *);
 extern int __page_symlink(struct inode *inode, const char *symname, int len,
 		int nofs);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
-extern void kfree_put_link(struct dentry *, char *, void *);
+extern void kfree_put_link(struct dentry *, void *);
 extern int generic_readlink(struct dentry *, char __user *, int);
 extern void generic_fillattr(struct inode *, struct kstat *);
 int vfs_getattr_nosec(struct path *path, struct kstat *stat);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index cc8b51a..3a06d96 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -70,7 +70,8 @@ extern struct dentry *lock_rename(struct dentry *, struct dentry *);
 extern void unlock_rename(struct dentry *, struct dentry *);
 
 extern void nd_jump_link(struct path *path);
-extern void nd_set_link(char *path);
+extern void nd_pin_link(void *cookie);
+extern void *nd_pinned_link(void);
 
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index daad5af..ed19f64 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2474,30 +2474,28 @@ static int shmem_symlink(struct inode *dir, struct dentry *dentry, const char *s
 	return 0;
 }
 
-static void *shmem_follow_short_symlink(struct dentry *dentry)
+static const char *shmem_follow_short_symlink(struct dentry *dentry)
 {
-	nd_set_link(SHMEM_I(d_inode(dentry))->symlink);
-	return NULL;
+	return SHMEM_I(d_inode(dentry))->symlink;
 }
 
-static void *shmem_follow_link(struct dentry *dentry)
+static const char *shmem_follow_link(struct dentry *dentry)
 {
 	struct page *page = NULL;
 	int error = shmem_getpage(d_inode(dentry), 0, &page, SGP_READ, NULL);
-	nd_set_link(error ? ERR_PTR(error) : kmap(page));
-	if (page)
-		unlock_page(page);
-	return page;
+	if (error)
+		return ERR_PTR(error);
+	unlock_page(page);
+	nd_pin_link(page);
+	return kmap(page);
 }
 
-static void shmem_put_link(struct dentry *dentry, char *link, void *cookie)
+static void shmem_put_link(struct dentry *dentry, void *cookie)
 {
-	if (!IS_ERR(link)) {
-		struct page *page = cookie;
-		kunmap(page);
-		mark_page_accessed(page);
-		page_cache_release(page);
-	}
+	struct page *page = cookie;
+	kunmap(page);
+	mark_page_accessed(page);
+	page_cache_release(page);
 }
 
 #ifdef CONFIG_TMPFS_XATTR
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 24/24] uninline walk_component()
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (22 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
@ 2015-04-20 18:13 ` Al Viro
  2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
  2015-04-21 14:51 ` [PATCH] logfs: fix a pagecache leak for symlinks Al Viro
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, linux-kernel, linux-fsdevel

From: Al Viro <viro@zeniv.linux.org.uk>

seriously improves the stack *and* I-cache footprint...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f1ec430..80edea9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1595,8 +1595,7 @@ static inline int should_follow_link(struct dentry *dentry, int follow)
 	return unlikely(d_is_symlink(dentry)) ? follow : 0;
 }
 
-static inline int walk_component(struct nameidata *nd, struct path *path,
-		int follow)
+static int walk_component(struct nameidata *nd, struct path *path, int follow)
 {
 	struct inode *inode;
 	int err;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/24] lustre: rip the private symlink nesting limit out
  2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
@ 2015-04-20 19:08   ` Andreas Dilger
  2015-04-20 19:22     ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: Andreas Dilger @ 2015-04-20 19:08 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Oleg Drokin, Neil Brown, LKML,
	Linux Filesystem Mailing List

On Apr 20, 2015, at 12:12 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Al, the patch itself looks good, thanks.

However, if this is applied at the start of the series it could
allow tests to easily cause a stack overflow during a bisection (I
don't think users would see a kernel in the middle of the series).

Could this be converted over to checking nd->link_count along with
the [02/24] patch until closer to the end of the series when the
recursion has been removed?

It isn't fatal if that doesn't happen, since this whole series should
land at one time and the chance of testing Lustre symlinks right
in the middle of the series is low, just something I thought when
reviewing the patch.

Cheers, Andreas

> ---
> drivers/staging/lustre/lustre/llite/symlink.c | 15 +++------------
> 1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
> index 3711e67..0615f86 100644
> --- a/drivers/staging/lustre/lustre/llite/symlink.c
> +++ b/drivers/staging/lustre/lustre/llite/symlink.c
> @@ -126,18 +126,9 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
> 	char *symname = NULL;
> 
> 	CDEBUG(D_VFSTRACE, "VFS Op\n");
> -	/* Limit the recursive symlink depth to 5 instead of default
> -	 * 8 links when kernel has 4k stack to prevent stack overflow.
> -	 * For 8k stacks we need to limit it to 7 for local servers. */
> -	if (THREAD_SIZE < 8192 && current->link_count >= 6) {
> -		rc = -ELOOP;
> -	} else if (THREAD_SIZE == 8192 && current->link_count >= 8) {
> -		rc = -ELOOP;
> -	} else {
> -		ll_inode_size_lock(inode);
> -		rc = ll_readlink_internal(inode, &request, &symname);
> -		ll_inode_size_unlock(inode);
> -	}
> +	ll_inode_size_lock(inode);
> +	rc = ll_readlink_internal(inode, &request, &symname);
> +	ll_inode_size_unlock(inode);
> 	if (rc) {
> 		ptlrpc_req_finished(request);
> 		request = NULL;
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/24] lustre: rip the private symlink nesting limit out
  2015-04-20 19:08   ` Andreas Dilger
@ 2015-04-20 19:22     ` Al Viro
  2015-04-20 20:35       ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-20 19:22 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Linus Torvalds, Oleg Drokin, Neil Brown, LKML,
	Linux Filesystem Mailing List

On Mon, Apr 20, 2015 at 01:08:16PM -0600, Andreas Dilger wrote:
> On Apr 20, 2015, at 12:12 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > 
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> Al, the patch itself looks good, thanks.
> 
> However, if this is applied at the start of the series it could
> allow tests to easily cause a stack overflow during a bisection (I
> don't think users would see a kernel in the middle of the series).
> 
> Could this be converted over to checking nd->link_count along with
> the [02/24] patch until closer to the end of the series when the
> recursion has been removed?

Er...  You do realize that struct nameidata is opaque for anything outside
of fs/namei.c and has been that way for a while now?  Sure, we can export
a helper that would return that and rip it out in the end of the series,
but...

> It isn't fatal if that doesn't happen, since this whole series should
> land at one time and the chance of testing Lustre symlinks right
> in the middle of the series is low, just something I thought when
> reviewing the patch.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/24] lustre: rip the private symlink nesting limit out
  2015-04-20 19:22     ` Al Viro
@ 2015-04-20 20:35       ` Al Viro
  0 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 20:35 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Linus Torvalds, Oleg Drokin, Neil Brown, LKML,
	Linux Filesystem Mailing List

On Mon, Apr 20, 2015 at 08:22:53PM +0100, Al Viro wrote:
> On Mon, Apr 20, 2015 at 01:08:16PM -0600, Andreas Dilger wrote:
> > On Apr 20, 2015, at 12:12 PM, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > > 
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > 
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > Al, the patch itself looks good, thanks.
> > 
> > However, if this is applied at the start of the series it could
> > allow tests to easily cause a stack overflow during a bisection (I
> > don't think users would see a kernel in the middle of the series).
> > 
> > Could this be converted over to checking nd->link_count along with
> > the [02/24] patch until closer to the end of the series when the
> > recursion has been removed?
> 
> Er...  You do realize that struct nameidata is opaque for anything outside
> of fs/namei.c and has been that way for a while now?  Sure, we can export
> a helper that would return that and rip it out in the end of the series,
> but...

Actually, a cleaner solution would be to reorder that bunch (1--6) past
the link_path_walk() reorganization.  Done and force-pushed to the same
branch...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
@ 2015-04-20 21:04   ` Linus Torvalds
  2015-04-20 21:32     ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2015-04-20 21:04 UTC (permalink / raw)
  To: Al Viro; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 11:13 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>   */
>  static int link_path_walk(const char *name, struct nameidata *nd)
>  {
> +       struct saved {
> +               struct path link;
> +               void *cookie;
> +               const char *name;
> +       } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1;
>         struct path next;
>         int err;

I was going to complain about this, and suggest that you move it to
the nameidata, but I see that you did that later.

That said, you then introduce a stack-allocated "struct saved stack[]"
in path_mountpoint[] instead, *and* nameidata is saved on stack, so
this all ends up being very stack-intensive anyway.

I might have missed some patch here, but would it be possible to just
allocate a single per-thread nameidata, and just leave it at that?
Because allocating that thing on the stack when it contains what is
now one kilobyte of array data is *not* acceptable.

Other than that, my quick scan through this looked fine. And maybe
that quick scan missed where you already did that too.

                           Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 21:04   ` Linus Torvalds
@ 2015-04-20 21:32     ` Al Viro
  2015-04-20 21:39       ` Linus Torvalds
  2015-04-20 21:41       ` Linus Torvalds
  0 siblings, 2 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 21:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 02:04:53PM -0700, Linus Torvalds wrote:

> That said, you then introduce a stack-allocated "struct saved stack[]"
> in path_mountpoint[] instead, *and* nameidata is saved on stack, so
> this all ends up being very stack-intensive anyway.
> 
> I might have missed some patch here, but would it be possible to just
> allocate a single per-thread nameidata, and just leave it at that?
> Because allocating that thing on the stack when it contains what is
> now one kilobyte of array data is *not* acceptable.

What kilobyte?  It's 9*4 pointers, IOW, 288 bytes total (assuming 64bit box).
And nd->saved_names[] goes away, so scratch 9 pointers we used to have.  Sure,
we can allocate that dynamically (or hold a couple of elements on stack and
allocate when/if we overgrow that), but it's not particularly large win.

Breakeven point is circa the second level of nesting - symlink met when
traversing a symlink...  That - on amd64; on something with fatter stack
frames I would expect the comparison to be even worse for mainline...

We need to preserve 4 pointers on stack per level of nesting.  Seeing that
single link_path_walk() stack frame in mainline is about 5-6 times bigger
than that, "just put enough for all levels into auto array" is an obvious
approach - a couple of link_path_walk() stack frames will be heavier than
that.  For renameat() (the worst-case user of link_path_walk() - there are
two struct nameidata on stack) we end up with breakeven at *one* level of
nesting, what with getting rid of 2*9 pointers in ->saved_names[] of those
nameidata.  And yes, I've measured the actual stack use before and after...

A kilobyte would suffice for 32 levels.  _IF_ we go for "lift the restrictions
on nesting completely", sure, we want to switch to (on-demand) dynamic
allocation.  It's not particularly hard to add and it might be worth doing,
but it's a separate story.  This series leaves the set of accepted pathnames
as-is...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 21:32     ` Al Viro
@ 2015-04-20 21:39       ` Linus Torvalds
  2015-04-20 21:51         ` Al Viro
  2015-04-20 21:41       ` Linus Torvalds
  1 sibling, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2015-04-20 21:39 UTC (permalink / raw)
  To: Al Viro; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> What kilobyte?  It's 9*4 pointers, IOW, 288 bytes total (assuming 64bit box).

You also said that you were going to up the recursion limit to 40.. So
40*3*8 bytes..

                   Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 21:32     ` Al Viro
  2015-04-20 21:39       ` Linus Torvalds
@ 2015-04-20 21:41       ` Linus Torvalds
  2015-04-20 21:42         ` Linus Torvalds
  2015-04-20 21:52         ` Al Viro
  1 sibling, 2 replies; 53+ messages in thread
From: Linus Torvalds @ 2015-04-20 21:41 UTC (permalink / raw)
  To: Al Viro; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> A kilobyte would suffice for 32 levels.  _IF_ we go for "lift the restrictions
> on nesting completely", sure, we want to switch to (on-demand) dynamic
> allocation.

And no, we will *never* lift the recursion limit. Not for 1kB, not for
1MB. Never.

Because it's a latency and DoS issue too. We need to react well to
true loops, but also to "very deep" non-loops. It's not about memory
use, it's about users triggering unreasonable CPU resources.

                      Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 21:41       ` Linus Torvalds
@ 2015-04-20 21:42         ` Linus Torvalds
  2015-04-20 21:59           ` Al Viro
  2015-04-20 21:52         ` Al Viro
  1 sibling, 1 reply; 53+ messages in thread
From: Linus Torvalds @ 2015-04-20 21:42 UTC (permalink / raw)
  To: Al Viro; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 2:41 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And no, we will *never* lift the recursion limit. Not for 1kB, not for
> 1MB. Never.

Just to clarify: that's for the "remove restrictions completely".
Upping it to 32 or 40 would be fine. But not with allocations on the
stack.

                 Linus

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 21:39       ` Linus Torvalds
@ 2015-04-20 21:51         ` Al Viro
  0 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 21:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 02:39:43PM -0700, Linus Torvalds wrote:
> On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > What kilobyte?  It's 9*4 pointers, IOW, 288 bytes total (assuming 64bit box).
> 
> You also said that you were going to up the recursion limit to 40.. So
> 40*3*8 bytes..

Er...  That's exactly what

||        We could reduce it further (see below), but I'm not sure it's worth
|| doing - it's not much extra complexity, but we only squeeze out ~250 bytes
|| that way, with the worst-case footprints (those are triggered by rename())
|| are around 1.4Kb (instead of about twice as much in mainline).  OTOH,
|| that extra complexity would've allowed us to get rid of the nesting limit
|| entirely (i.e. we get ELOOP when we encounter 40 symlinks, no matter in
|| which manner they are nested).  That might be worth considering...

had been about.  And yes, it is easy to implement - new nameidata flag
for "need to kfree() nd->stack",
                        if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
                                path_put_conditional(&next, nd);
                                path_put(&nd->path);
                                return -ELOOP;
                        }
                        BUG_ON(nd->depth >= MAX_NESTED_LINKS);

                        nd->depth++;
replaced with
			if (nd->depth == 2 && !(nd->flags & LOOKUP_KFREE)) {
				struct saved *p = kmalloc(41 * sizeof(*p));
				if (!p) {
					path_put_conditional(&next, nd);
					path_put(&nd->path);
					return -ENOMEM;
				}
				memcpy(p, nd->stack, 2 * sizeof(*p));
				nd->stack = p;
				nd->flags |= LOOKUP_KFREE;
			}
			nd->depth++;
with obvious logics for freeing that crap afterwards.

I really don't like the idea of putting it into nameidata, BTW - consider
e.g. rename().  We don't need the contents of that thing after the
link_path_walk() returns; no point duplicating it...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 21:41       ` Linus Torvalds
  2015-04-20 21:42         ` Linus Torvalds
@ 2015-04-20 21:52         ` Al Viro
  1 sibling, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 21:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 02:41:31PM -0700, Linus Torvalds wrote:
> On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > A kilobyte would suffice for 32 levels.  _IF_ we go for "lift the restrictions
> > on nesting completely", sure, we want to switch to (on-demand) dynamic
> > allocation.
> 
> And no, we will *never* lift the recursion limit. Not for 1kB, not for
> 1MB. Never.

We still have the "no more than 40 links total" limit, so it would obviously
be limited by that...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 16/24] link_path_walk: kill the recursion
  2015-04-20 21:42         ` Linus Torvalds
@ 2015-04-20 21:59           ` Al Viro
  0 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-20 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Neil Brown, Linux Kernel Mailing List, linux-fsdevel

On Mon, Apr 20, 2015 at 02:42:56PM -0700, Linus Torvalds wrote:
> On Mon, Apr 20, 2015 at 2:41 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And no, we will *never* lift the recursion limit. Not for 1kB, not for
> > 1MB. Never.
> 
> Just to clarify: that's for the "remove restrictions completely".
> Upping it to 32 or 40 would be fine. But not with allocations on the
> stack.

Yep.  There are two restrictions - one on the total amount of symlinks
encountered (40), and another on the depth of nesting (8).  The latter
is due to recursion issues, and with dynamic nd->stack it goes away.
The former, of course, stays, for the reasons you've mentioned - it has
nothing to do with the depth of recursion.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (23 preceding siblings ...)
  2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
@ 2015-04-21 14:49 ` Al Viro
  2015-04-21 15:04   ` Christoph Hellwig
  2015-04-21 14:51 ` [PATCH] logfs: fix a pagecache leak for symlinks Al Viro
  25 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-21 14:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: NeilBrown, linux-kernel, linux-fsdevel

On Mon, Apr 20, 2015 at 07:12:22PM +0100, Al Viro wrote:

> 	Patches 2/24..6/24 are from Neil's RCU follow_link patchset; the
> rest of his patchset is, of course, derailed by the massage done here,
> but AFAICS we should be able to port it on top of this one with reasonably
> little PITA.

BTW, looking at the ->put_link() instances in the tree, after this series
all but one of them ignore *everything* other than cookie.  The only exception
is hppfs; it wants dentry (and its inode as well):

static void hppfs_put_link(struct dentry *dentry, void *cookie)
{
        struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;

        if (d_inode(proc_dentry)->i_op->put_link)
                d_inode(proc_dentry)->i_op->put_link(proc_dentry, cookie);
}

Does anybody have objections against passing them inodes instead of dentries?
It would be a lot more convenient for RCU purposes...

Other observations about the Neil's series:
	* the games played in page_follow_link_light/page_put_link are
completely pointless; this
 void page_put_link(struct dentry *dentry, char *link, void *cookie)
 {
-       struct page *page = cookie;
+       struct page *page = (void *)((unsigned long)cookie & ~1UL);

        if (page) {
-               kunmap(page);
+               if (page == cookie)
+                       kunmap(page);
                page_cache_release(page);
        }
 }
is trivially avoided by making sure that inodes with such ->put_link() in
their inode_operations have
	mapping_set_gfp_mask(mapping,
			     mapping_get_gfp_mask(mapping) & ~__GFP_HIGHMEM);
done to them.  And to hell with that kmap/kunmap pair - it's a bad idea to
have kmap held across practically unlimited period, anyway (run into a hung
NFS server while traversing the symlink body, get stuck for minutes until it
times out).  I'll take care of that, it's not much work.
	* XFS: is there any reason why you guys don't separate inline and
non-inline symlinks?  Or don't just use page_follow_link_light() for the
latter...  Note, BTW, that NUL-termination for the latter will be taken
care of by page_getlink().  It's inline ones that look like crap, actually -
unless I'm misreading that code, the longest possible inline symlink will
have ip->i_df.if_u1.if_data completely filled, with no place for terminating
NUL.  Is that correct?  If so, it might make sense to have three variants -
short (== inline shorter than XFS_IFORK_DSIZE) just having ->follow_link()
return ip->i_df.if_u1.if_data and have xfs_setup_inode() do nd_terminate_link()
to make sure they are NUL-terminated, long (non-inline) just using
page_follow_link_light() and bogus (inline with length exactly equal to
XFS_IFORK_DSIZE); the last would be the only ones with XFS-specific non-trivial
->follow_link() - those really need to allocate a buffer and copy into it...

	FWIW, there's a potentially interesting alternative - the thing is,
normal callers of link_path_walk() already know the length of name; both
getname() and getname_kernel() calculate the lengths.  And ->follow_link()
tend to have the lengths available as well (and usually equal to ->i_size of
the inode in question).  So we might consider passing (pointer,length) pairs
to link_path_walk().  In principle, it might simplify things...  Comments?

	* logfs has ->follow_link equal to page_follow_link_light and
NULL ->put_link.  Obvious leak, and that one's -stable fodder - it had
been there all way back to original merge.  I'll send a fix in a minute.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH] logfs: fix a pagecache leak for symlinks
  2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
                   ` (24 preceding siblings ...)
  2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
@ 2015-04-21 14:51 ` Al Viro
  25 siblings, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-21 14:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Joern Engel, linux-fsdevel


Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/logfs/dir.c b/fs/logfs/dir.c
index 4cf38f1..f9b45d4 100644
--- a/fs/logfs/dir.c
+++ b/fs/logfs/dir.c
@@ -779,6 +779,7 @@ fail:
 const struct inode_operations logfs_symlink_iops = {
 	.readlink	= generic_readlink,
 	.follow_link	= page_follow_link_light,
+	.put_link	= page_put_link,
 };
 
 const struct inode_operations logfs_dir_iops = {

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
@ 2015-04-21 15:04   ` Christoph Hellwig
  2015-04-21 15:12     ` Richard Weinberger
  0 siblings, 1 reply; 53+ messages in thread
From: Christoph Hellwig @ 2015-04-21 15:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, NeilBrown, linux-kernel, linux-fsdevel,
	Richard Weinberger

On Tue, Apr 21, 2015 at 03:49:59PM +0100, Al Viro wrote:
> On Mon, Apr 20, 2015 at 07:12:22PM +0100, Al Viro wrote:
> 
> > 	Patches 2/24..6/24 are from Neil's RCU follow_link patchset; the
> > rest of his patchset is, of course, derailed by the massage done here,
> > but AFAICS we should be able to port it on top of this one with reasonably
> > little PITA.
> 
> BTW, looking at the ->put_link() instances in the tree, after this series
> all but one of them ignore *everything* other than cookie.  The only exception
> is hppfs; it wants dentry (and its inode as well):
> 
> static void hppfs_put_link(struct dentry *dentry, void *cookie)
> {
>         struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;
> 
>         if (d_inode(proc_dentry)->i_op->put_link)
>                 d_inode(proc_dentry)->i_op->put_link(proc_dentry, cookie);
> }

The hppfs code looks totally bogus in general.  Richard, do you know if
anyone still uses that part of UML?

> Does anybody have objections against passing them inodes instead of dentries?
> It would be a lot more convenient for RCU purposes...

Sounds fine to me.

> 	* XFS: is there any reason why you guys don't separate inline and
> non-inline symlinks?  Or don't just use page_follow_link_light() for the
> latter...

Historical reasons, feel free to split it up.

> Note, BTW, that NUL-termination for the latter will be taken
> care of by page_getlink().  It's inline ones that look like crap, actually -
> unless I'm misreading that code, the longest possible inline symlink will
> have ip->i_df.if_u1.if_data completely filled, with no place for terminating
> NUL.  Is that correct?  If so, it might make sense to have three variants -
> short (== inline shorter than XFS_IFORK_DSIZE) just having ->follow_link()
> return ip->i_df.if_u1.if_data and have xfs_setup_inode() do nd_terminate_link()
> to make sure they are NUL-terminated, long (non-inline) just using
> page_follow_link_light() and bogus (inline with length exactly equal to
> XFS_IFORK_DSIZE); the last would be the only ones with XFS-specific non-trivial
> ->follow_link() - those really need to allocate a buffer and copy into it...

In theory we could allocate if_data to include th terminator in memory.
I'd have to see how ugly that would be.

> 	* logfs has ->follow_link equal to page_follow_link_light and
> NULL ->put_link.  Obvious leak, and that one's -stable fodder - it had
> been there all way back to original merge.  I'll send a fix in a minute.

logfs has been unmaintained since 2011, we might as well drop it..

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-21 15:04   ` Christoph Hellwig
@ 2015-04-21 15:12     ` Richard Weinberger
  2015-04-21 15:45       ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: Richard Weinberger @ 2015-04-21 15:12 UTC (permalink / raw)
  To: Christoph Hellwig, Al Viro
  Cc: Linus Torvalds, NeilBrown, linux-kernel, linux-fsdevel

Am 21.04.2015 um 17:04 schrieb Christoph Hellwig:
> On Tue, Apr 21, 2015 at 03:49:59PM +0100, Al Viro wrote:
>> On Mon, Apr 20, 2015 at 07:12:22PM +0100, Al Viro wrote:
>>
>>> 	Patches 2/24..6/24 are from Neil's RCU follow_link patchset; the
>>> rest of his patchset is, of course, derailed by the massage done here,
>>> but AFAICS we should be able to port it on top of this one with reasonably
>>> little PITA.
>>
>> BTW, looking at the ->put_link() instances in the tree, after this series
>> all but one of them ignore *everything* other than cookie.  The only exception
>> is hppfs; it wants dentry (and its inode as well):
>>
>> static void hppfs_put_link(struct dentry *dentry, void *cookie)
>> {
>>         struct dentry *proc_dentry = HPPFS_I(d_inode(dentry))->proc_dentry;
>>
>>         if (d_inode(proc_dentry)->i_op->put_link)
>>                 d_inode(proc_dentry)->i_op->put_link(proc_dentry, cookie);
>> }
> 
> The hppfs code looks totally bogus in general.  Richard, do you know if
> anyone still uses that part of UML?

I'm pretty sure we can kill it. I had the plan to rip it out during this merge window
along with other broken UML stuff but I was too late to ask on the UML mailinglist
if someone is using it (which I really doubt).
So, let's kill it with v4.2.

Or we move it into drivers/staging and hope that someone else is fixing it for us?
...just kidding. ;-)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-21 15:12     ` Richard Weinberger
@ 2015-04-21 15:45       ` Al Viro
  2015-04-21 16:46         ` Boaz Harrosh
  2015-04-21 21:20         ` Al Viro
  0 siblings, 2 replies; 53+ messages in thread
From: Al Viro @ 2015-04-21 15:45 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Christoph Hellwig, Linus Torvalds, NeilBrown, linux-kernel,
	linux-fsdevel

On Tue, Apr 21, 2015 at 05:12:01PM +0200, Richard Weinberger wrote:

> I'm pretty sure we can kill it. I had the plan to rip it out during this merge window
> along with other broken UML stuff but I was too late to ask on the UML mailinglist
> if someone is using it (which I really doubt).
> So, let's kill it with v4.2.

Let's do it.  Then ->put_link() is left in an interesting situation - *all*
instances only use the 'cookie' argument...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-21 15:45       ` Al Viro
@ 2015-04-21 16:46         ` Boaz Harrosh
  2015-04-21 21:20         ` Al Viro
  1 sibling, 0 replies; 53+ messages in thread
From: Boaz Harrosh @ 2015-04-21 16:46 UTC (permalink / raw)
  To: Al Viro, Richard Weinberger
  Cc: Christoph Hellwig, Linus Torvalds, NeilBrown, linux-kernel,
	linux-fsdevel

On 04/21/2015 06:45 PM, Al Viro wrote:
> On Tue, Apr 21, 2015 at 05:12:01PM +0200, Richard Weinberger wrote:
> 
>> I'm pretty sure we can kill it. I had the plan to rip it out during this merge window
>> along with other broken UML stuff but I was too late to ask on the UML mailinglist
>> if someone is using it (which I really doubt).
>> So, let's kill it with v4.2.
> 
> Let's do it.  Then ->put_link() is left in an interesting situation - *all*
> instances only use the 'cookie' argument...

A future user of inode or anything else can embed that in the cookie
anyway.

Just my $0.017
Boaz


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-21 15:45       ` Al Viro
  2015-04-21 16:46         ` Boaz Harrosh
@ 2015-04-21 21:20         ` Al Viro
  2015-04-22 18:07           ` Al Viro
  2015-04-23  5:01           ` Al Viro
  1 sibling, 2 replies; 53+ messages in thread
From: Al Viro @ 2015-04-21 21:20 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Christoph Hellwig, Linus Torvalds, NeilBrown, linux-kernel,
	linux-fsdevel

On Tue, Apr 21, 2015 at 04:45:04PM +0100, Al Viro wrote:
> On Tue, Apr 21, 2015 at 05:12:01PM +0200, Richard Weinberger wrote:
> 
> > I'm pretty sure we can kill it. I had the plan to rip it out during this merge window
> > along with other broken UML stuff but I was too late to ask on the UML mailinglist
> > if someone is using it (which I really doubt).
> > So, let's kill it with v4.2.
> 
> Let's do it.  Then ->put_link() is left in an interesting situation - *all*
> instances only use the 'cookie' argument...

OK, so here's what we have:
	* a lot of filesystems are using page_follow_link_light(); for RCU
mode they should simply look for page and if it's there and uptodate, that's
it - just grab a reference and be done.  If it's not uptodate - oh, well,
fallback to non-RCU mode.  Corresponding ->put_link() doesn't give a damn
which inode or dentry it is - it's just page_cache_release() (we need to
get rid of that kmap() crap anyway).
	* a lot of fast symlinks are using only inode; no ->put_link(),
no blocking operations, etc.  No problem at all.
	* shmem would probably want something similar to what
page_follow_link_light() would be doing for RCU case.
	* befs: should switch to page_follow_link_light(); just a matter of
giving it proper ->readpage().
	* NFS: probably as in Neil's series, except that we really ought to
add a helper for what page_follow_link_light() would do in RCU case, rather
than open-coding it here (and, again, kmap/kunmap crap should go)
	* /proc/self and its per-thread ilk: just do GFP_ATOMIC allocation for
RCU case (and handle failure as -ECHILD rather than usual -ENOMEM).
	* proc_symlink() stuff: uses only inode, nothing blocking, no problem.
	* 9p, cifs and fuse: those always query server on ->follow_link();
-ECHILD and be done with that.  _IF_ they want some kind of caching, they can
do as NFS does.  hostfs is that way too.
	* gfs2: _probably_ want to bugger off with -ECHILD; OTOH, ocfs2
uses page_follow_link_light(), maybe correctly, maybe not, and it ought
to have similar issues...
	* kernfs, configfs: -ECHILD.  And git rm is _very_ tempting after
reading that code...
	* lustre: hell knows, maybe always -ECHILD, maybe something like NFS.
	* XFS: see above.
	* hppfs: agreed to kill it off
	* autofs: not sure; it would be almost the usual fast symlink, if not
for the fact that it marks an object reached from dentry as "used now".
With RCU pathwalk it's _probably_ harmless, but I'd like a confirmation from
autofs folks.
	* /proc/*/ns/*: in theory, we might make it handle RCU mode, but
it's probably easier to say "just bugger off"
	* /proc/*/fd/*, /proc/*/exe, /proc/*/cwd, /proc/*/root: in principle
doable, but not without serious massage.
	* /proc/*/map_files/*: -ECHILD.
	* overlayfs: usual "use GFP_ATOMIC in RCU mode, treat failures as
-ECHILD".
	* ecryptfs: -ECHILD (and its use of ->readlink() is fishy, IMO).

I agree that unlazy_walk() attempted when walking a symlink ought to fail
with -ECHILD; we can't legitimize the symlink itself, so once we are out
of RCU mode, there's nothing to hold the inode of symlink (and its body)
from getting freed.  Solution is wrong though; for example, when
nested symlink occurs in the middle of a trailing one, we should *not*
remove the flag upon leaving the nested symlink.

Another unpleasant thing is that ->follow_link() saying "can't do that in
RCU mode" ends up with restart from scratch - that actually risks to be
worse than the mainline; there we would attempt unlazy_walk() and normally
it would've succeed.

AFAICS, the real rule is "can't unlazy if nd->last.name points into a symlink
body and we might still need to access it"...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-21 21:20         ` Al Viro
@ 2015-04-22 18:07           ` Al Viro
  2015-04-22 20:12             ` Al Viro
  2015-04-23  5:01           ` Al Viro
  1 sibling, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-22 18:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

On Tue, Apr 21, 2015 at 10:20:07PM +0100, Al Viro wrote:

> I agree that unlazy_walk() attempted when walking a symlink ought to fail
> with -ECHILD; we can't legitimize the symlink itself, so once we are out
> of RCU mode, there's nothing to hold the inode of symlink (and its body)
> from getting freed.  Solution is wrong though; for example, when
> nested symlink occurs in the middle of a trailing one, we should *not*
> remove the flag upon leaving the nested symlink.
> 
> Another unpleasant thing is that ->follow_link() saying "can't do that in
> RCU mode" ends up with restart from scratch - that actually risks to be
> worse than the mainline; there we would attempt unlazy_walk() and normally
> it would've succeed.
> 
> AFAICS, the real rule is "can't unlazy if nd->last.name points into a symlink
> body and we might still need to access it"...

And one more: may_follow_link() is now potentially oopsable.  Look: suppose
we've reached the link in RCU mode, just as it got unlinked.  link->dentry
has become negative and may_follow_link() steps into
        /* Allowed if owner and follower match. */
        inode = link->dentry->d_inode;
        if (uid_eq(current_cred()->fsuid, inode->i_uid))
                return 0;
Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
needed?

FWIW, I would rather fetch ->d_inode *and* checked ->seq proir to calling
get_link(), and passed inode to it as an explicit argument.  And passed it
to may_follow_link() as well...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-22 18:07           ` Al Viro
@ 2015-04-22 20:12             ` Al Viro
  2015-04-22 21:05               ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-22 20:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> And one more: may_follow_link() is now potentially oopsable.  Look: suppose
> we've reached the link in RCU mode, just as it got unlinked.  link->dentry
> has become negative and may_follow_link() steps into
>         /* Allowed if owner and follower match. */
>         inode = link->dentry->d_inode;
>         if (uid_eq(current_cred()->fsuid, inode->i_uid))
>                 return 0;
> Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
> follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> needed?
> 
> FWIW, I would rather fetch ->d_inode *and* checked ->seq proir to calling
> get_link(), and passed inode to it as an explicit argument.  And passed it
> to may_follow_link() as well...

Hrm...  You know, something really weird is going on here.  Where are
you setting nd->seq?  I don't see anything in follow_link() doing that.
And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
at that point it matches the dentry of symlink, not that of nd->path
(== parent directory).  Neil, could you tell me which kernel you'd been
testing (ideally - commit ID is a public git tree), what config and what
tests had those been?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-22 20:12             ` Al Viro
@ 2015-04-22 21:05               ` Al Viro
  2015-04-23  7:45                 ` NeilBrown
  0 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-22 21:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

On Wed, Apr 22, 2015 at 09:12:38PM +0100, Al Viro wrote:
> On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> > And one more: may_follow_link() is now potentially oopsable.  Look: suppose
> > we've reached the link in RCU mode, just as it got unlinked.  link->dentry
> > has become negative and may_follow_link() steps into
> >         /* Allowed if owner and follower match. */
> >         inode = link->dentry->d_inode;
> >         if (uid_eq(current_cred()->fsuid, inode->i_uid))
> >                 return 0;
> > Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
> > follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> > needed?
> > 
> > FWIW, I would rather fetch ->d_inode *and* checked ->seq proir to calling
> > get_link(), and passed inode to it as an explicit argument.  And passed it
> > to may_follow_link() as well...
> 
> Hrm...  You know, something really weird is going on here.  Where are
> you setting nd->seq?  I don't see anything in follow_link() doing that.
> And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
> at that point it matches the dentry of symlink, not that of nd->path
> (== parent directory).  Neil, could you tell me which kernel you'd been
> testing (ideally - commit ID is a public git tree), what config and what
> tests had those been?

FWIW, there's a wart that had been annoying me for quite a while, and it
might be related to dealing with that properly.  Namely, walk_component()
calling conventions.  We have walk_component(nd, &path, follow), which can
	* return -E..., and leave us with pathwalk terminated; path contains
junk, and so does nd->path.
	* return 0, update nd->path, nd->inode and nd->seq.  The contents
of path is in undefined state - it might be unchanged, it might be equal to
nd->path (and not pinned down, RCU mode or not).  In any case, callers do
not touch it afterwards.  That's the normal case.
	* return 1, update nd->seq, leave nd->path and nd->inode unchanged and
set path pointing to our symlink.  nd->seq matches path, not nd->path.

In all cases the original contents of path is ignored - it's purely 'out'
parameter, but compiler can't establish that on its own; it _might_ be
left untouched.  In all cases when its contents survives we don't look at
it afterwards, but proving that requires a non-trivial analysis.

And in case when we return 1 (== symlink to be followed), we bugger nd->seq.
It's left as we need it for unlazy_walk() (and after unlazy_walk() we don't
care about it at all), so currently everything works, but if we want to
stay in RCU mode for symlink traversal, we _will_ need ->d_seq of parent
directory.

I wonder if the right way to solve that would be to drop the path argument
entirely and store the bugger in nameidata.  As in
	union {
		struct qstr last;
		struct path link;
	};
	...
	union {
		int last_type;
		unsigned link_seq;
	};
in struct nameidata.  We never need both at the same time; after
walk_component() (or its analogue in do_last()) we don't need the component
name anymore.  That way walk_component() would not trash nd->seq when
finding a symlink...

It would also shrink the stack footprint a bit - local struct path next
in link_path_walk() would be gone, along with the same thing in path_lookupat()
and friends.  Not a lot of win (4 pointers total), but it might be enough
to excuse putting ->d_seq of root in there, along with ->link.dentry->d_inode,
to avoid rechecking its ->d_seq.  As the matter of fact, we do have an
odd number of 32bit fields in there, so ->d_seq of root would fit nicely...

Comments?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-21 21:20         ` Al Viro
  2015-04-22 18:07           ` Al Viro
@ 2015-04-23  5:01           ` Al Viro
  1 sibling, 0 replies; 53+ messages in thread
From: Al Viro @ 2015-04-23  5:01 UTC (permalink / raw)
  To: NeilBrown; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

On Tue, Apr 21, 2015 at 10:20:07PM +0100, Al Viro wrote:

> I agree that unlazy_walk() attempted when walking a symlink ought to fail
> with -ECHILD; we can't legitimize the symlink itself, so once we are out
> of RCU mode, there's nothing to hold the inode of symlink (and its body)
> from getting freed.  Solution is wrong though; for example, when
> nested symlink occurs in the middle of a trailing one, we should *not*
> remove the flag upon leaving the nested symlink.
> 
> Another unpleasant thing is that ->follow_link() saying "can't do that in
> RCU mode" ends up with restart from scratch - that actually risks to be
> worse than the mainline; there we would attempt unlazy_walk() and normally
> it would've succeed.
> 
> AFAICS, the real rule is "can't unlazy if nd->last.name points into a symlink
> body and we might still need to access it"...

Actually, I'm not sure anymore.  What if we have unlazy_walk() legitimize
all the symlinks we are traversing?  They are visible in nd->stack, after
all...  It would mean more complex unlazy_walk(), but not terribly so -
succeeding legitimize_mnt() won't block and we already deal with the
possibility of having vfsmount legitimized, only to be dropped afterwards.
The real unpleasantness here is different - it's the need to keep ->d_seq
of those dentries to tell if they can be grabbed.  That's 4 more bytes per
level plus the fun with alignment.  OTOH, it both avoids the fun with getting
the logics of when to bail out right *and* avoids the guaranteed restarts
when running into a symlink we can't deal with in RCU mode - we could simply
unlazy and continue in such a situation.

Hell knows... it probably means going all the way wrt dynamic (on demand)
allocation, though.  Say it, keeping a couple of levels on stack and allocating
when we need more; the interesting part is in not freeing that sucker too
early.  At the very least, we don't want the progression through RCU/normal/
revalidate-everything modes to trigger allocation/freeing on each step; the
nesting depth is going to be the same every time.  That's not hard to do...

I'm about to fall asleep right now, so all of the above might very well be
complete hogwash; I'll look into it when I wake up.  If anyone has any
comments (including "Al, you are nuts", but something more specific would
be more interesting), please reply.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-22 21:05               ` Al Viro
@ 2015-04-23  7:45                 ` NeilBrown
  2015-04-23 18:07                   ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: NeilBrown @ 2015-04-23  7:45 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 5113 bytes --]

On Wed, 22 Apr 2015 22:05:53 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Wed, Apr 22, 2015 at 09:12:38PM +0100, Al Viro wrote:
> > On Wed, Apr 22, 2015 at 07:07:59PM +0100, Al Viro wrote:
> > > And one more: may_follow_link() is now potentially oopsable.  Look: suppose
> > > we've reached the link in RCU mode, just as it got unlinked.  link->dentry
> > > has become negative and may_follow_link() steps into
> > >         /* Allowed if owner and follower match. */
> > >         inode = link->dentry->d_inode;
> > >         if (uid_eq(current_cred()->fsuid, inode->i_uid))
> > >                 return 0;
> > > Oops...  Incidentally, I suspect that your __read_seqcount_retry() in
> > > follow_link() might be lacking a barrier; why isn't full read_seqcount_retry()
> > > needed?

Blind copy-paste error I suspect.

> > > 
> > > FWIW, I would rather fetch ->d_inode *and* checked ->seq proir to calling
> > > get_link(), and passed inode to it as an explicit argument.  And passed it
> > > to may_follow_link() as well...
> > 
> > Hrm...  You know, something really weird is going on here.  Where are
> > you setting nd->seq?  I don't see anything in follow_link() doing that.

follow_link calls  link_path_walk -> walk_component -> lookup_fast which sets
nd->seq.  Is that not enough?  I guess not when nd_jump_link is called.  Is
that what I missed?


> > And nd->seq _definitely_ needs setting if you want to stay in RCU mode -
> > at that point it matches the dentry of symlink, not that of nd->path
> > (== parent directory).  Neil, could you tell me which kernel you'd been
> > testing (ideally - commit ID is a public git tree), what config and what
> > tests had those been?

The 'devel' branch of git://neil.brown.name/md, which is based on 4.0-rc7.

Tests have been fairly causal so far, making sure I can follow symlinks on a
small variety of filesystems, and running a particular test which repeatedly
stats a large number of non-existent files with paths that go through a
symlink.


> 
> FWIW, there's a wart that had been annoying me for quite a while, and it
> might be related to dealing with that properly.  Namely, walk_component()
> calling conventions.  We have walk_component(nd, &path, follow), which can
> 	* return -E..., and leave us with pathwalk terminated; path contains
> junk, and so does nd->path.
> 	* return 0, update nd->path, nd->inode and nd->seq.  The contents
> of path is in undefined state - it might be unchanged, it might be equal to
> nd->path (and not pinned down, RCU mode or not).  In any case, callers do
> not touch it afterwards.  That's the normal case.
> 	* return 1, update nd->seq, leave nd->path and nd->inode unchanged and
> set path pointing to our symlink.  nd->seq matches path, not nd->path.
> 
> In all cases the original contents of path is ignored - it's purely 'out'
> parameter, but compiler can't establish that on its own; it _might_ be
> left untouched.  In all cases when its contents survives we don't look at
> it afterwards, but proving that requires a non-trivial analysis.
> 
> And in case when we return 1 (== symlink to be followed), we bugger nd->seq.
> It's left as we need it for unlazy_walk() (and after unlazy_walk() we don't
> care about it at all), so currently everything works, but if we want to
> stay in RCU mode for symlink traversal, we _will_ need ->d_seq of parent
> directory.
> 
> I wonder if the right way to solve that would be to drop the path argument
> entirely and store the bugger in nameidata.  As in
> 	union {
> 		struct qstr last;
> 		struct path link;
> 	};
> 	...
> 	union {
> 		int last_type;
> 		unsigned link_seq;
> 	};
> in struct nameidata.  We never need both at the same time; after
> walk_component() (or its analogue in do_last()) we don't need the component
> name anymore.  That way walk_component() would not trash nd->seq when
> finding a symlink...
> 
> It would also shrink the stack footprint a bit - local struct path next
> in link_path_walk() would be gone, along with the same thing in path_lookupat()
> and friends.  Not a lot of win (4 pointers total), but it might be enough
> to excuse putting ->d_seq of root in there, along with ->link.dentry->d_inode,
> to avoid rechecking its ->d_seq.  As the matter of fact, we do have an
> odd number of 32bit fields in there, so ->d_seq of root would fit nicely...
> 
> Comments?

One thing that is clear to me is that I don't really understand all the
handling of 'seq' numbers, making me unable to comment usefully.
I'll try to go through the current code and you current patch with that issue
in mind and make sure I do understand it.  Then I'll try to comment.

Meanwhile, I like your suggestion in separate email to legitimize the whole
symlink stack when unlazy_walk is needed.  One reason that I didn't even try
that originally is that it was not practical to walk the stack to find
everything that needed legitimizing.  Now that you have that in an explicit
stack it is at least work looking into.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-23  7:45                 ` NeilBrown
@ 2015-04-23 18:07                   ` Al Viro
  2015-04-24  6:35                     ` NeilBrown
  0 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-23 18:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, Apr 23, 2015 at 05:45:44PM +1000, NeilBrown wrote:

> follow_link calls  link_path_walk -> walk_component -> lookup_fast which sets
> nd->seq.  Is that not enough?  I guess not when nd_jump_link is called.  Is
> that what I missed?

No.  Potential nd_jump_link() callers are just refusing to go there in lazy
mode, end of story.  That's not the problem; see below.

> One thing that is clear to me is that I don't really understand all the
> handling of 'seq' numbers, making me unable to comment usefully.
> I'll try to go through the current code and you current patch with that issue
> in mind and make sure I do understand it.  Then I'll try to comment.

OK, here's the basic theory behind the lazy pathwalk:

* during the entire exercise we never drop rcu_read_lock, therefore any
objects that have all references to them removed before RCU-delayed
freeing (inodes, dentries, superblocks and vfsmounts are among such)
that we might find in process won't be freed until after we are done.

* invariant we are keeping:
	at some point between the beginning of walk and now the pathname
traversed so far would lead to nd->path, with nd->seq and nd->inode equal to
->d_seq and ->d_inode of the dentry in question.

* path_init() arranges for that to be true in the beginning of the walk.

* symlinks aside, walk_component() preserves that.
	+ normal name components go through lookup_fast(), where we have
	  __d_lookup_rcu() find a child of current nd->path with matching
	  name and (atomically) picks ->d_seq of that child, which had the
	  name matching our component.  Atomicity is provided by ->d_lock
	  on child.  Then we proceed to pick ->d_inode (and verify that
	  ->d_seq has not changed, thus making sure that ->d_inode value
	  at the moment when the name matched had been the same and child
	  is still hashed.  Then we verify that parent's ->d_seq has not
	  changed, guaranteeing that parent hadn't been moved or unhashed
	  from the moment we'd found it until after we'd found its child.
	  Assuming nothing's mounted on top of that thing, and there's no
	  problem with ->d_revalidate()), that's it - we have new nd->seq,
	  nd->path and nd->inode satisfying our invariant for longer
	  piece of pathname.
	+ "." needs nothing - we just stay where we are
	+ ".." is handled by follow_dotdot_rcu(), which in the normal case
	  (no mountpoint crossing) picks ->d_parent of where we are,
	  fetches its ->d_inode and ->d_seq and verifies that our directory
	  still hadn't changed _its_ ->d_seq.  The last part guarantees that
	  it hadn't been moved since the time we'd found it, and thus its
	  ->d_parent had remained unchanged at least until that verification.
	  Therefore, it remained pinned all along, and it ->d_inode had
	  remained stable, including the moment when we fetched ->d_seq.
	  Which means that the value we had picked *and* its ->d_inode and
	  ->d_seq would satisfy the invariant for the longer piece of
	  pathname.
	+ mountpoint crossing towards leaves is handled in __follow_mount_rcu();
	  it is simple (->mnt_root never changes and is always pinned,
	  stabilizing its ->d_inode), but we might need to worry about
	  automount points *and* need to make sure that we stop right
	  there if mount_lock has been bumped.  See commit b37199e6 for
	  the details on the last part - basically, making sure that false
	  negatives from __lookup_mnt() won't end up with hard error when
	  we walk into whatever had been under the mountpoint we'd missed.
	+ mountpoint crossing towards root (in follow_dotdot_rcu()) is
	  similar, but there we obviously don't care about automounts.
	  Looking at it now, it might make sense to recheck mount_lock there
	  as well, though - potential danger is to hit the moment when
	  mnt_parent and mnt_mountpoint are out of sync, leaving us with
	  mismatched vfsmount,dentry pair.  Generally, that will be caught
	  when we try to leave lazy mode (legitimize_mnt() will fail) or
	  to cross towards leaves, but the next crossing towards root
	  might be bogus as well, and we could end up with unwarranted hard
	  error.  Should be very hard to hit, but it's easy enough to check
	  *and* backport, so it looks like a good idea for -stable.  Linus,
	  do you have any objections against adding
                if (read_seqretry(&mount_lock, nd->m_seq))
                        goto failed;
	  right after
                if (!follow_up_rcu(&nd->path))
                        break;
	  in follow_dotdot_rcu()?  It's a very narrow race with mount --move
	  and most of the time it ends up being completely harmless, but
	  it's possible to construct a case when we'll get a bogus hard error
	  instead of falling back to non-lazy walk...  OTOH, anyone doing
	  that will get something inherently timing-dependent as the result,
	  lazy mode or not.  I'm in favour of adding that check, but IMO it's
	  not a critical problem.

* if we find that we can't continue in lazy mode because some verification
fails, we just fail with -ECHILD.  However, in cases when our current position
might be fine, but the next step can't be done in lazy mode, we attempt to
fall back to non-lazy without full restart that would be caused by -ECHILD.
That's what unlazy_walk() is.  Of course, if we reach the end of walk we
need to leave the lazy mode as well (complete_walk()).  Either can fail,
and such a failure means restart from scratch in non-lazy mode.  We need
to grab references on vfsmount and dentry (or two dentries, when we have
parent and child to deal with).  The interesting part is vfsmount - we need
to make sure we won't interfere with umount(2), having walked in that sucker
*after* umount(2) has checked that it's not busy.  See legitimize_mnt() for
details; basically, we have umount(2) mark the "I've verified it's not busy
and it's going to be killed no matter what" with MNT_SYNC_UMOUNT and rely
on RCU delays on that path - if we find one of those, we undo the
increment of its refcount we'd just done, without having dropped
rcu_read_lock().  Full-blown mntput() is done only on mismatches that
had _not_ been marked that way.

BTW, something similar to the above probably needs to be turned into coherent
text, either in parallel to Documentation/filesystems/path-lookup.txt, or
as update to it.

The reason why walk_component() in your call chain won't suffice is simple -
it will fail this
                /*
                 * This sequence count validates that the parent had no
                 * changes while we did the lookup of the dentry above.
                 *
                 * The memory barrier in read_seqcount_begin of child is
                 *  enough, we can use __read_seqcount_retry here.
                 */
                if (__read_seqcount_retry(&parent->d_seq, nd->seq))
                        return -ECHILD;
in lookup_fast(), just before updating nd->seq to new value.  Basically,
it has no way to tell if parent has been buggered to hell and back.

It's not _that_ hard to prevent - we just need to stop discarding the parent's
seq number in "need to follow a symlink" case of walk_component().  Will take
some massage, but not too much...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-23 18:07                   ` Al Viro
@ 2015-04-24  6:35                     ` NeilBrown
  2015-04-24 13:42                       ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: NeilBrown @ 2015-04-24  6:35 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 9971 bytes --]

On Thu, 23 Apr 2015 19:07:55 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Thu, Apr 23, 2015 at 05:45:44PM +1000, NeilBrown wrote:
> 
> > follow_link calls  link_path_walk -> walk_component -> lookup_fast which sets
> > nd->seq.  Is that not enough?  I guess not when nd_jump_link is called.  Is
> > that what I missed?
> 
> No.  Potential nd_jump_link() callers are just refusing to go there in lazy
> mode, end of story.  That's not the problem; see below.
> 
> > One thing that is clear to me is that I don't really understand all the
> > handling of 'seq' numbers, making me unable to comment usefully.
> > I'll try to go through the current code and you current patch with that issue
> > in mind and make sure I do understand it.  Then I'll try to comment.
> 
> OK, here's the basic theory behind the lazy pathwalk:
> 
> * during the entire exercise we never drop rcu_read_lock, therefore any
> objects that have all references to them removed before RCU-delayed
> freeing (inodes, dentries, superblocks and vfsmounts are among such)
> that we might find in process won't be freed until after we are done.
> 
> * invariant we are keeping:
> 	at some point between the beginning of walk and now the pathname
> traversed so far would lead to nd->path, with nd->seq and nd->inode equal to
> ->d_seq and ->d_inode of the dentry in question.

Thanks for all of this!
I think one part that was confusing me is that there is a place where nd->seq
does related to nd->path.dentry.
Usually  the two are updated at the same time.
However updated nd->seq, but *doesn't* update nd->path.dentry.
Instead, the dentry is stored in a separate 'path' variable.
This discrepancy is eventually resolved in a call to path_to_nameidata().

Which is a bit confusing until you know what is happening :-)


> 
> * path_init() arranges for that to be true in the beginning of the walk.
> 
> * symlinks aside, walk_component() preserves that.
> 	+ normal name components go through lookup_fast(), where we have
> 	  __d_lookup_rcu() find a child of current nd->path with matching
> 	  name and (atomically) picks ->d_seq of that child, which had the
> 	  name matching our component.  Atomicity is provided by ->d_lock
> 	  on child.  Then we proceed to pick ->d_inode (and verify that

I don't see d_lock being taken  in __d_lookup_rcu.
I think this atomicity is provided by ->d_seq on the child.


> 	  ->d_seq has not changed, thus making sure that ->d_inode value
> 	  at the moment when the name matched had been the same and child
> 	  is still hashed.  Then we verify that parent's ->d_seq has not
> 	  changed, guaranteeing that parent hadn't been moved or unhashed
> 	  from the moment we'd found it until after we'd found its child.
> 	  Assuming nothing's mounted on top of that thing, and there's no
> 	  problem with ->d_revalidate()), that's it - we have new nd->seq,
> 	  nd->path and nd->inode satisfying our invariant for longer
> 	  piece of pathname.
> 	+ "." needs nothing - we just stay where we are
> 	+ ".." is handled by follow_dotdot_rcu(), which in the normal case
> 	  (no mountpoint crossing) picks ->d_parent of where we are,
> 	  fetches its ->d_inode and ->d_seq and verifies that our directory
> 	  still hadn't changed _its_ ->d_seq.  The last part guarantees that
> 	  it hadn't been moved since the time we'd found it, and thus its
> 	  ->d_parent had remained unchanged at least until that verification.
> 	  Therefore, it remained pinned all along, and it ->d_inode had
> 	  remained stable, including the moment when we fetched ->d_seq.
> 	  Which means that the value we had picked *and* its ->d_inode and
> 	  ->d_seq would satisfy the invariant for the longer piece of
> 	  pathname.
> 	+ mountpoint crossing towards leaves is handled in __follow_mount_rcu();
> 	  it is simple (->mnt_root never changes and is always pinned,
> 	  stabilizing its ->d_inode), but we might need to worry about
> 	  automount points *and* need to make sure that we stop right
> 	  there if mount_lock has been bumped.  See commit b37199e6 for
> 	  the details on the last part - basically, making sure that false
> 	  negatives from __lookup_mnt() won't end up with hard error when
> 	  we walk into whatever had been under the mountpoint we'd missed.
> 	+ mountpoint crossing towards root (in follow_dotdot_rcu()) is
> 	  similar, but there we obviously don't care about automounts.
> 	  Looking at it now, it might make sense to recheck mount_lock there
> 	  as well, though - potential danger is to hit the moment when
> 	  mnt_parent and mnt_mountpoint are out of sync, leaving us with
> 	  mismatched vfsmount,dentry pair.  Generally, that will be caught
> 	  when we try to leave lazy mode (legitimize_mnt() will fail) or
> 	  to cross towards leaves, but the next crossing towards root
> 	  might be bogus as well, and we could end up with unwarranted hard
> 	  error.  Should be very hard to hit, but it's easy enough to check
> 	  *and* backport, so it looks like a good idea for -stable.  Linus,
> 	  do you have any objections against adding
>                 if (read_seqretry(&mount_lock, nd->m_seq))
>                         goto failed;
> 	  right after
>                 if (!follow_up_rcu(&nd->path))
>                         break;
> 	  in follow_dotdot_rcu()?  It's a very narrow race with mount --move
> 	  and most of the time it ends up being completely harmless, but
> 	  it's possible to construct a case when we'll get a bogus hard error
> 	  instead of falling back to non-lazy walk...  OTOH, anyone doing
> 	  that will get something inherently timing-dependent as the result,
> 	  lazy mode or not.  I'm in favour of adding that check, but IMO it's
> 	  not a critical problem.
> 
> * if we find that we can't continue in lazy mode because some verification
> fails, we just fail with -ECHILD.  However, in cases when our current position
> might be fine, but the next step can't be done in lazy mode, we attempt to
> fall back to non-lazy without full restart that would be caused by -ECHILD.
> That's what unlazy_walk() is.  Of course, if we reach the end of walk we
> need to leave the lazy mode as well (complete_walk()).  Either can fail,
> and such a failure means restart from scratch in non-lazy mode.  We need
> to grab references on vfsmount and dentry (or two dentries, when we have
> parent and child to deal with).  The interesting part is vfsmount - we need
> to make sure we won't interfere with umount(2), having walked in that sucker
> *after* umount(2) has checked that it's not busy.  See legitimize_mnt() for
> details; basically, we have umount(2) mark the "I've verified it's not busy
> and it's going to be killed no matter what" with MNT_SYNC_UMOUNT and rely
> on RCU delays on that path - if we find one of those, we undo the
> increment of its refcount we'd just done, without having dropped
> rcu_read_lock().  Full-blown mntput() is done only on mismatches that
> had _not_ been marked that way.
> 
> BTW, something similar to the above probably needs to be turned into coherent
> text, either in parallel to Documentation/filesystems/path-lookup.txt, or
> as update to it.

That might be fun.... if I find time.


> 
> The reason why walk_component() in your call chain won't suffice is simple -
> it will fail this
>                 /*
>                  * This sequence count validates that the parent had no
>                  * changes while we did the lookup of the dentry above.
>                  *
>                  * The memory barrier in read_seqcount_begin of child is
>                  *  enough, we can use __read_seqcount_retry here.
>                  */
>                 if (__read_seqcount_retry(&parent->d_seq, nd->seq))
>                         return -ECHILD;
> in lookup_fast(), just before updating nd->seq to new value.  Basically,
> it has no way to tell if parent has been buggered to hell and back.
> 
> It's not _that_ hard to prevent - we just need to stop discarding the parent's
> seq number in "need to follow a symlink" case of walk_component().  Will take
> some massage, but not too much...

So.....
Where I have:

+                       if (nd->flags & LOOKUP_RCU) {
+                               if (!nd->root.mnt)
+                                       set_root_rcu(nd);
+                               nd->path = nd->root;


in the case where the symlink starts '/', I need to set nd->seq

		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);


but in the case where symlink doesn't start '/', I don't change nd->path,
so nd->seq should still be correct?

Also, just after that (still in follow_link()), we have

		nd->inode = nd->path.dentry->d_inode;

*after* the test for (*s == '/').
Wouldn't is be OK to have that *inside* the if?  In the relative-symlink case
it should be unchanged, and in the jump-symlink case nd_jump_link() has already
done that (and probably NULL was returns for 's' so this code doesn't execute).

So following on top of my previous patches?

Thanks heaps,
NeilBrown


diff --git a/fs/namei.c b/fs/namei.c
index d13b4315447f..ce6387d5317c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -947,6 +947,8 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 				if (!nd->root.mnt)
 					set_root_rcu(nd);
 				nd->path = nd->root;
+				nd->seq = read_seqcount_begin(
+					&nd->path->dentry->d_seq);
 			} else {
 				if (!nd->root.mnt)
 					set_root(nd);
@@ -954,9 +956,9 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 				nd->path = nd->root;
 				path_get(&nd->root);
 			}
+			nd->inode = nd->path.dentry->d_inode;
 			nd->flags |= LOOKUP_JUMPED;
 		}
-		nd->inode = nd->path.dentry->d_inode;
 		error = link_path_walk(s, nd);
 		if (unlikely(error))
 			put_link(nd, link, inode, *p);

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-24  6:35                     ` NeilBrown
@ 2015-04-24 13:42                       ` Al Viro
  2015-05-04  5:11                         ` Al Viro
  0 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-04-24 13:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

On Fri, Apr 24, 2015 at 04:35:34PM +1000, NeilBrown wrote:
> > * path_init() arranges for that to be true in the beginning of the walk.
> > 
> > * symlinks aside, walk_component() preserves that.
> > 	+ normal name components go through lookup_fast(), where we have
> > 	  __d_lookup_rcu() find a child of current nd->path with matching
> > 	  name and (atomically) picks ->d_seq of that child, which had the
> > 	  name matching our component.  Atomicity is provided by ->d_lock
> > 	  on child.  Then we proceed to pick ->d_inode (and verify that
> 
> I don't see d_lock being taken  in __d_lookup_rcu.

Do'h...  No, it isn't - sorry about the braino.

> I think this atomicity is provided by ->d_seq on the child.

Yes.  It's fetch ->d_seq, compare names, then in caller fetch ->d_inode and
compare ->d_seq.

> So.....
> Where I have:
> 
> +                       if (nd->flags & LOOKUP_RCU) {
> +                               if (!nd->root.mnt)
> +                                       set_root_rcu(nd);
> +                               nd->path = nd->root;
> 
> 
> in the case where the symlink starts '/', I need to set nd->seq
> 
> 		nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
> 
> 
> but in the case where symlink doesn't start '/', I don't change nd->path,
> so nd->seq should still be correct?

It would be correct, if walk_component() hadn't _already_ flipped it to
match the symlink.  By that point it's too late - we'd already lost the
old value.  This is the area where it hits the fan:
                if (__read_seqcount_retry(&parent->d_seq, nd->seq))
                        return -ECHILD;
OK, parent is still valid
                nd->seq = seq;
... and we lose its seq number, replacing it with that of a child.

                if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
                        status = d_revalidate(dentry, nd->flags);
                        if (unlikely(status <= 0)) {
                                if (status != -ECHILD)
                                        need_reval = 0;
                                goto unlazy;
                        }
                }
                path->mnt = mnt;
                path->dentry = dentry;
set path to (nd->path.mnt, dentry of child)
                if (likely(__follow_mount_rcu(nd, path, inode)))
see if we need to (and can) cross the mountpoint here.  That can change
(vfsmount,dentry) pair *and* nd->seq - making it match whatever path->dentry
ends us being.
                        return 0;
if everything's fine, we return to caller, still in RCU mode.
unlazy:
                if (unlazy_walk(nd, dentry))
... and here we attempt to fall back to non-lazy mode, expecting nd->seq to
match dentry.  Win or lose, that's the last point where nd->seq will be looked
at.
                        return -ECHILD;

See the problem?  We have discarded the nd->seq value for parent and we can't
re-pick it without opening a race.

AFAICS, the simplest solution is to have nd->next_seq and set _that_ in
lookup_fast() and __follow_mount_rcu(), using it in unlazy_walk().  And
in callers have it copied into nd->seq after we'd established that it's not
a symlink to be followed.

Handling of absolute symlinks also needs some care - we have no idea whether
nd->root still makes any sense.  It might have become negative by that point.
unlazy_walk() done first would've checked it's still our process' root, but
if we stay in lazy mode, we obviously can't rely on that check.

One possible solution is to do the same kind of check we do in unlazy_walk() - 
        if (!(nd->flags & LOOKUP_ROOT)) {
                spin_lock(&fs->lock);
                if (nd->root.mnt != fs->root.mnt || nd->root.dentry != fs->root.dentry)
			bugger off
		fetch ->d_inode and ->d_seq
                spin_unlock(&fs->lock);
	} else {
		fetch ->d_inode and ->d_seq
	}
Another - to store the ->d_seq of root in nd->seq_root, have set_root_rcu()
set that instead of (or, better, in addition to) returning it to caller,
have path_init() initialize it explicitly in LOOKUP_ROOT case and have this
"reset to root" in following an absolute symlink just do
	set_root_rcu(nd);
	nd->path = nd->root;
	nd->seq = nd->seq_root;
	nd->inode = nd->path.dentry->d_inode;
	check if nd->path.dentry is stale, bugger off if it is

That avoids this spin_lock() on each absolute symlink at the price of extra
32 bits in struct nameidata.  It looks like doing on-demand reallocation
of nd->stack is the right way to go anyway, so the pressure on nameidata size
is going to be weaker and that might be the right way to go...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-04-24 13:42                       ` Al Viro
@ 2015-05-04  5:11                         ` Al Viro
  2015-05-04  7:30                           ` NeilBrown
  0 siblings, 1 reply; 53+ messages in thread
From: Al Viro @ 2015-05-04  5:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

On Fri, Apr 24, 2015 at 02:42:03PM +0100, Al Viro wrote:

> That avoids this spin_lock() on each absolute symlink at the price of extra
> 32 bits in struct nameidata.  It looks like doing on-demand reallocation
> of nd->stack is the right way to go anyway, so the pressure on nameidata size
> is going to be weaker and that might be the right way to go...

OK, on-demand reallocation is done.  What I have right now is
	* flat MAXSYMLINKS 40, no matter what kind of nesting there might
be.
	* purely iterative link_path_walk().
	* no damn nameidata on stack for generic_readlink()
	* stack footprint of the entire thing independent from the nesting
depth, and about on par with "no symlinks at all" case in mainline.
	* some massage towards RCU follow_link done (in the end of queue),
but quite a bit more work remains.

What I've got so far is in vfs.git#link_path_walk; I'm not too happy about
posting a 70-chunk mailbomb, but it really will need review and testing.
It survives xfstests and LTP with no regressions, but it will need
serious profiling, etc., along with RTFS.  I tried to keep it in reasonably
small pieces, but there's a lot of them ;-/

FWIW, I've a bit more reorganization plotted out, but it's not far from
where we need to be for RCU follow_link.  Some notes:
	* I don't believe we want to pass flags to ->follow_link() - it's
much simpler to give the damn thing NULL for dentry in RCU case.  In *all*
cases where we might have a change to get the symlink body without blocking
we can do that by inode alone.  We obviously want to pass dentry and inode
separately (and in case of fast symlinks we don't hit the filesystem at
all), but that's it - flags isn't needed.
	* terminate_walk() should do bulk put_link().  So should the
failure cases of complete_walk().  _Success_ of complete_walk() should
be careful about legitimizing links - it *can* be called with one link
on stack, and be followed by access to link body.  Yes, really - do_last()
in O_CREAT case.
	* do_last(), lookup_last() and mountpoint_last() ought to
have put_link() done when called on non-empty stack (thus turning the loops
into something like
                while ((err = lookup_last(nd)) > 0) {
                        err = trailing_symlink(nd);
                        if (err)
                                break;
                }
_After_ the point where they don't need to look at the last component of
name, obviously.
	* I think we should leave terminate_walk() to callers in failure
cases of walk_component() and handle_dots(), as well as get_link().  Makes
life simpler in callers, actually.  I'll play with that a bit more.
	* it might make sense to add the second flag to walk_component(),
in addition to LOOKUP_FOLLOW, meaning "do put_link() once you are done looking
at the name".  In principle, it promises simpler logics with unlazy_walk(),
but that's one area I'm still not entirely sure about.  Will need to
experiment a bit...
	* nd->seq clobbering will need to be dealt with, as discussed upthread.
	* I _really_ hate your "let's use the LSB of struct page * to tell
if we need to kunmap()" approach.  It's too damn ugly to live.  _And_ it's
trivial to avoid - all we need is to have (non-lazy) page_follow_link_light()
and page_symlink() to remove __GFP_HIGHMEM from inode->i_mapping before
ever asking to allocate pages there.  That'll suffice, and it makes sense
regardless of RCU work - that kmap/kunmap with potential for minutes in
between (while waiting for stuck NFS server in the middle of symlink traversal)
is simply wrong.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint
  2015-05-04  5:11                         ` Al Viro
@ 2015-05-04  7:30                           ` NeilBrown
  0 siblings, 0 replies; 53+ messages in thread
From: NeilBrown @ 2015-05-04  7:30 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Linus Torvalds, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3935 bytes --]

On Mon, 4 May 2015 06:11:29 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Fri, Apr 24, 2015 at 02:42:03PM +0100, Al Viro wrote:
> 
> > That avoids this spin_lock() on each absolute symlink at the price of extra
> > 32 bits in struct nameidata.  It looks like doing on-demand reallocation
> > of nd->stack is the right way to go anyway, so the pressure on nameidata size
> > is going to be weaker and that might be the right way to go...
> 
> OK, on-demand reallocation is done.  What I have right now is
> 	* flat MAXSYMLINKS 40, no matter what kind of nesting there might
> be.
> 	* purely iterative link_path_walk().
> 	* no damn nameidata on stack for generic_readlink()
> 	* stack footprint of the entire thing independent from the nesting
> depth, and about on par with "no symlinks at all" case in mainline.
> 	* some massage towards RCU follow_link done (in the end of queue),
> but quite a bit more work remains.
> 
> What I've got so far is in vfs.git#link_path_walk; I'm not too happy about
> posting a 70-chunk mailbomb, but it really will need review and testing.
> It survives xfstests and LTP with no regressions, but it will need
> serious profiling, etc., along with RTFS.  I tried to keep it in reasonably
> small pieces, but there's a lot of them ;-/
> 
> FWIW, I've a bit more reorganization plotted out, but it's not far from
> where we need to be for RCU follow_link.  Some notes:
> 	* I don't believe we want to pass flags to ->follow_link() - it's
> much simpler to give the damn thing NULL for dentry in RCU case.  In *all*
> cases where we might have a change to get the symlink body without blocking
> we can do that by inode alone.  We obviously want to pass dentry and inode
> separately (and in case of fast symlinks we don't hit the filesystem at
> all), but that's it - flags isn't needed.
> 	* terminate_walk() should do bulk put_link().  So should the
> failure cases of complete_walk().  _Success_ of complete_walk() should
> be careful about legitimizing links - it *can* be called with one link
> on stack, and be followed by access to link body.  Yes, really - do_last()
> in O_CREAT case.
> 	* do_last(), lookup_last() and mountpoint_last() ought to
> have put_link() done when called on non-empty stack (thus turning the loops
> into something like
>                 while ((err = lookup_last(nd)) > 0) {
>                         err = trailing_symlink(nd);
>                         if (err)
>                                 break;
>                 }
> _After_ the point where they don't need to look at the last component of
> name, obviously.
> 	* I think we should leave terminate_walk() to callers in failure
> cases of walk_component() and handle_dots(), as well as get_link().  Makes
> life simpler in callers, actually.  I'll play with that a bit more.
> 	* it might make sense to add the second flag to walk_component(),
> in addition to LOOKUP_FOLLOW, meaning "do put_link() once you are done looking
> at the name".  In principle, it promises simpler logics with unlazy_walk(),
> but that's one area I'm still not entirely sure about.  Will need to
> experiment a bit...
> 	* nd->seq clobbering will need to be dealt with, as discussed upthread.
> 	* I _really_ hate your "let's use the LSB of struct page * to tell
> if we need to kunmap()" approach.  It's too damn ugly to live.  _And_ it's
> trivial to avoid - all we need is to have (non-lazy) page_follow_link_light()
> and page_symlink() to remove __GFP_HIGHMEM from inode->i_mapping before
> ever asking to allocate pages there.  That'll suffice, and it makes sense
> regardless of RCU work - that kmap/kunmap with potential for minutes in
> between (while waiting for stuck NFS server in the middle of symlink traversal)
> is simply wrong.


Thanks!
I'll have another look and see about adding what is needed for RCU symlink
support.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2015-05-04  7:30 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-20 18:12 [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-20 18:12 ` [PATCH 01/24] lustre: rip the private symlink nesting limit out Al Viro
2015-04-20 19:08   ` Andreas Dilger
2015-04-20 19:22     ` Al Viro
2015-04-20 20:35       ` Al Viro
2015-04-20 18:12 ` [PATCH 02/24] VFS: replace {, total_}link_count in task_struct with pointer to nameidata Al Viro
2015-04-20 18:12 ` [PATCH 03/24] ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link Al Viro
2015-04-20 18:12 ` [PATCH 04/24] VFS: replace nameidata arg to ->put_link with a char* Al Viro
2015-04-20 18:12 ` [PATCH 05/24] SECURITY: remove nameidata arg from inode_follow_link Al Viro
2015-04-20 18:12 ` [PATCH 06/24] VFS: remove nameidata args from ->follow_link Al Viro
2015-04-20 18:12 ` [PATCH 07/24] namei: expand nested_symlink() in its only caller Al Viro
2015-04-20 18:12 ` [PATCH 08/24] namei.c: separate the parts of follow_link() that find the link body Al Viro
2015-04-20 18:12 ` [PATCH 09/24] namei: fold follow_link() into link_path_walk() Al Viro
2015-04-20 18:12 ` [PATCH 10/24] link_path_walk: handle get_link() returning ERR_PTR() immediately Al Viro
2015-04-20 18:12 ` [PATCH 11/24] link_path_walk: don't bother with walk_component() after jumping link Al Viro
2015-04-20 18:12 ` [PATCH 12/24] link_path_walk: turn inner loop into explicit goto Al Viro
2015-04-20 18:12 ` [PATCH 13/24] link_path_walk: massage a bit more Al Viro
2015-04-20 18:12 ` [PATCH 14/24] link_path_walk: get rid of duplication Al Viro
2015-04-20 18:12 ` [PATCH 15/24] link_path_walk: final preparations to killing recursion Al Viro
2015-04-20 18:13 ` [PATCH 16/24] link_path_walk: kill the recursion Al Viro
2015-04-20 21:04   ` Linus Torvalds
2015-04-20 21:32     ` Al Viro
2015-04-20 21:39       ` Linus Torvalds
2015-04-20 21:51         ` Al Viro
2015-04-20 21:41       ` Linus Torvalds
2015-04-20 21:42         ` Linus Torvalds
2015-04-20 21:59           ` Al Viro
2015-04-20 21:52         ` Al Viro
2015-04-20 18:13 ` [PATCH 17/24] link_path_walk: split "return from recursive call" path Al Viro
2015-04-20 18:13 ` [PATCH 18/24] link_path_walk: cleanup - turn goto start; into continue; Al Viro
2015-04-20 18:13 ` [PATCH 19/24] namei: fold may_follow_link() into follow_link() Al Viro
2015-04-20 18:13 ` [PATCH 20/24] namei: introduce nameidata->stack Al Viro
2015-04-20 18:13 ` [PATCH 21/24] namei: regularize use of put_link() and follow_link(), trim arguments Al Viro
2015-04-20 18:13 ` [PATCH 22/24] namei: trim the arguments of get_link() Al Viro
2015-04-20 18:13 ` [PATCH 23/24] new ->follow_link() and ->put_link() calling conventions Al Viro
2015-04-20 18:13 ` [PATCH 24/24] uninline walk_component() Al Viro
2015-04-21 14:49 ` [RFC][PATCHSET] non-recursive link_path_walk() and reducing stack footprint Al Viro
2015-04-21 15:04   ` Christoph Hellwig
2015-04-21 15:12     ` Richard Weinberger
2015-04-21 15:45       ` Al Viro
2015-04-21 16:46         ` Boaz Harrosh
2015-04-21 21:20         ` Al Viro
2015-04-22 18:07           ` Al Viro
2015-04-22 20:12             ` Al Viro
2015-04-22 21:05               ` Al Viro
2015-04-23  7:45                 ` NeilBrown
2015-04-23 18:07                   ` Al Viro
2015-04-24  6:35                     ` NeilBrown
2015-04-24 13:42                       ` Al Viro
2015-05-04  5:11                         ` Al Viro
2015-05-04  7:30                           ` NeilBrown
2015-04-23  5:01           ` Al Viro
2015-04-21 14:51 ` [PATCH] logfs: fix a pagecache leak for symlinks Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.