linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] Support follow_link in RCU-walk.
@ 2015-03-05  5:21 NeilBrown
  2015-03-05  5:21 ` [PATCH 4/9] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

Hi Al (and others),

 I wonder if you could look over this patchset.
 It allows RCU-walk to follow symlinks in many common cases,
 thus removing a surprising performance hit caused by using symlinks.

 The last could of patches make changes to XFS and NFS to support
 this but I haven't forwarded to the relevant lists yet.
 If/when the early code meets with approval I'll do that.

 The first patch almost certainly needs to be changed.  I originally
 wrote this code when filesystems could see inside nameidata.
 It is now opaque so the simplest solution was to provide an
 accessor function.
 Maybe I should as a 'flags' arg to ->follow_link?? Or have
 ->follow_link and ->follow_link_rcu ??
 What do you suggest?


Thanks,
NeilBrown


---

NeilBrown (9):
      FS: make all ->follow_link handlers aware for LOOKUP_RCU
      VFS/namei: use terminate_walk when symlink lookup fails.
      VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
      VFS/namei: abort RCU-walk on symlink if atime needs updating.
      VFS/namei: enhance follow_link to support RCU-walk.
      VFS/namei: enable RCU-walk when following symlinks.
      VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
      XFS: allow follow_link to often succeed in RCU-walk.
      NFS: support LOOKUP_RCU in nfs_follow_link.


 drivers/staging/lustre/lustre/llite/symlink.c |    3 +
 fs/9p/vfs_inode.c                             |    6 +
 fs/9p/vfs_inode_dotl.c                        |    5 +
 fs/befs/linuxvfs.c                            |    2 
 fs/cifs/link.c                                |    2 
 fs/configfs/symlink.c                         |    7 +
 fs/ecryptfs/inode.c                           |    7 +
 fs/fuse/dir.c                                 |    2 
 fs/gfs2/inode.c                               |    2 
 fs/hostfs/hostfs_kern.c                       |    7 +
 fs/inode.c                                    |   26 ++++-
 fs/kernfs/symlink.c                           |    7 +
 fs/namei.c                                    |  132 ++++++++++++++++---------
 fs/nfs/inode.c                                |   22 ++++
 fs/nfs/symlink.c                              |   17 +++
 fs/overlayfs/inode.c                          |    3 +
 fs/proc/base.c                                |    2 
 fs/proc/namespaces.c                          |    3 +
 fs/proc/self.c                                |    4 +
 fs/proc/thread_self.c                         |    4 +
 fs/xfs/xfs_ioctl.c                            |    2 
 fs/xfs/xfs_iops.c                             |   13 ++
 fs/xfs/xfs_symlink.c                          |   11 ++
 fs/xfs/xfs_symlink.h                          |    2 
 include/linux/fs.h                            |    2 
 include/linux/namei.h                         |    1 
 include/linux/nfs_fs.h                        |    1 
 mm/shmem.c                                    |    6 +
 28 files changed, 234 insertions(+), 67 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/9] FS: make all ->follow_link handlers aware for LOOKUP_RCU
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
  2015-03-05  5:21 ` [PATCH 4/9] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 8/9] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

In preparation for supporting ->follow_link in RCU-walk,
make sure all ->follow_link handers which are not atomic
will fail if LOOKUP_RCU is set.

Later patches will make some of these handle LOOKUP_RCU
more gracefully.

This is current achieved by introducing a new function
"nd_is_rcu" to check if a nameidata has LOOKUP_RCU set.
There must be a better way.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/staging/lustre/lustre/llite/symlink.c |    3 +++
 fs/9p/vfs_inode.c                             |    6 +++++-
 fs/9p/vfs_inode_dotl.c                        |    5 ++++-
 fs/befs/linuxvfs.c                            |    2 ++
 fs/cifs/link.c                                |    2 ++
 fs/configfs/symlink.c                         |    7 ++++++-
 fs/ecryptfs/inode.c                           |    7 ++++++-
 fs/fuse/dir.c                                 |    2 ++
 fs/gfs2/inode.c                               |    2 ++
 fs/hostfs/hostfs_kern.c                       |    7 ++++++-
 fs/kernfs/symlink.c                           |    7 ++++++-
 fs/namei.c                                    |    8 ++++++++
 fs/nfs/symlink.c                              |    2 ++
 fs/overlayfs/inode.c                          |    3 +++
 fs/proc/base.c                                |    2 ++
 fs/proc/namespaces.c                          |    3 +++
 fs/proc/self.c                                |    4 ++++
 fs/proc/thread_self.c                         |    4 ++++
 fs/xfs/xfs_iops.c                             |    2 ++
 include/linux/fs.h                            |    1 +
 mm/shmem.c                                    |    6 +++++-
 21 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 686b6a574cc5..d4ad3925f11f 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -125,6 +125,9 @@ static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 	int rc;
 	char *symname = NULL;
 
+	if (nd_is_rcu(nd))
+		return PTR_ERR(-ECHILD);
+
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
 	/* Limit the recursive symlink depth to 5 instead of default
 	 * 8 links when kernel has 4k stack to prevent stack overflow.
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 3662f1d1d9cf..8aff5d684154 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1281,7 +1281,11 @@ done:
 static void *v9fs_vfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	int len = 0;
-	char *link = __getname();
+	char *link;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+	link = __getname();
 
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
 
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 6054c16b8fae..51776a3cc842 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -914,9 +914,12 @@ v9fs_vfs_follow_link_dotl(struct dentry *dentry, struct nameidata *nd)
 {
 	int retval;
 	struct p9_fid *fid;
-	char *link = __getname();
+	char *link;
 	char *target;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+	link = __getname();
 	p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);
 
 	if (!link) {
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index e089f1985fca..bbe8f90924b2 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -477,6 +477,8 @@ befs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	befs_off_t len = data->size;
 	char *link;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	if (len == 0) {
 		befs_error(sb, "Long symlink with illegal length");
 		link = ERR_PTR(-EIO);
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
index 2ec6037f61c7..0dbe1a326632 100644
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -639,6 +639,8 @@ cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
 	struct cifs_tcon *tcon;
 	struct TCP_Server_Info *server;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	xid = get_xid();
 
 	tlink = cifs_sb_tlink(cifs_sb);
diff --git a/fs/configfs/symlink.c b/fs/configfs/symlink.c
index cc9f2546ea4a..1397342aad5b 100644
--- a/fs/configfs/symlink.c
+++ b/fs/configfs/symlink.c
@@ -282,7 +282,12 @@ static int configfs_getlink(struct dentry *dentry, char * path)
 static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	int error = -ENOMEM;
-	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	unsigned long page;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	page = get_zeroed_page(GFP_KERNEL);
 
 	if (page) {
 		error = configfs_getlink(dentry, (char *)page);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index b08b5187f662..49d3dd96344c 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -678,7 +678,12 @@ out:
 static void *ecryptfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	size_t len;
-	char *buf = ecryptfs_readlink_lower(dentry, &len);
+	char *buf;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	buf = ecryptfs_readlink_lower(dentry, &len);
 	if (IS_ERR(buf))
 		goto out;
 	fsstack_copy_attr_atime(dentry->d_inode,
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1545b711ddcf..15d326ec5943 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1402,6 +1402,8 @@ static void free_link(char *link)
 
 static void *fuse_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	nd_set_link(nd, read_link(dentry));
 	return NULL;
 }
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 73c72253faac..21086c7870f1 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1557,6 +1557,8 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 	char *buf;
 	int error;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &i_gh);
 	error = gfs2_glock_nq(&i_gh);
 	if (error) {
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index fd62cae0fdcb..374d04909538 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -884,7 +884,12 @@ static const struct inode_operations hostfs_dir_iops = {
 
 static void *hostfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
-	char *link = __getname();
+	char *link;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	link = __getname();
 	if (link) {
 		char *path = dentry_name(dentry);
 		int err = -ENOMEM;
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 8a198898e39a..8e5421f386c0 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -115,7 +115,12 @@ static int kernfs_getlink(struct dentry *dentry, char *path)
 static void *kernfs_iop_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	int error = -ENOMEM;
-	unsigned long page = get_zeroed_page(GFP_KERNEL);
+	unsigned long page;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
+	page = get_zeroed_page(GFP_KERNEL);
 	if (page) {
 		error = kernfs_getlink(dentry, (char *) page);
 		if (error < 0)
diff --git a/fs/namei.c b/fs/namei.c
index c83145af4bfc..de508a3cec42 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4475,6 +4475,8 @@ EXPORT_SYMBOL(page_readlink);
 void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 {
 	struct page *page = NULL;
+	if (nd->flags & LOOKUP_RCU)
+		return ERR_PTR(-ECHILD);
 	nd_set_link(nd, page_getlink(dentry, &page));
 	return page;
 }
@@ -4542,3 +4544,9 @@ const struct inode_operations page_symlink_inode_operations = {
 	.put_link	= page_put_link,
 };
 EXPORT_SYMBOL(page_symlink_inode_operations);
+
+int nd_is_rcu(struct nameidata *nd)
+{
+	return nd->flags & LOOKUP_RCU;
+}
+EXPORT_SYMBOL(nd_is_rcu);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 05c9e02f4153..c9a2d3cc4619 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -49,6 +49,8 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct page *page;
 	void *err;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
 		goto read_failed;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 04f124884687..db370d5d84c4 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/namei.h>
 #include <linux/slab.h>
 #include <linux/xattr.h>
 #include "overlayfs.h"
@@ -146,6 +147,8 @@ static void *ovl_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct dentry *realdentry;
 	struct inode *realinode;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	realdentry = ovl_dentry_real(dentry);
 	realinode = realdentry->d_inode;
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3f3d7aeb0712..6f5dbfe68516 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1377,6 +1377,8 @@ static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct path path;
 	int error = -EACCES;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	/* Are we allowed to snoop on the tasks file descriptors? */
 	if (!proc_fd_access_allowed(inode))
 		goto out;
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index c9eac4563fa8..c89a51401bb5 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -38,6 +38,9 @@ static void *proc_ns_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct path ns_path;
 	void *error = ERR_PTR(-EACCES);
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
 	task = get_proc_task(inode);
 	if (!task)
 		return error;
diff --git a/fs/proc/self.c b/fs/proc/self.c
index 4348bb8907c2..faf0cf8046f1 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -24,6 +24,10 @@ static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct pid_namespace *ns = dentry->d_sb->s_fs_info;
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	char *name = ERR_PTR(-ENOENT);
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
 	if (tgid) {
 		/* 11 for max length of signed int in decimal + NULL term */
 		name = kmalloc(12, GFP_KERNEL);
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 59075b509df3..954cf6699393 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -26,6 +26,10 @@ static void *proc_thread_self_follow_link(struct dentry *dentry, struct nameidat
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	pid_t pid = task_pid_nr_ns(current, ns);
 	char *name = ERR_PTR(-ENOENT);
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+
 	if (pid) {
 		name = kmalloc(PROC_NUMBUF + 6 + PROC_NUMBUF, GFP_KERNEL);
 		if (!name)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index e53a90331422..23cea798b777 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -417,6 +417,8 @@ xfs_vn_follow_link(
 	char			*link;
 	int			error = -ENOMEM;
 
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
 	link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
 	if (!link)
 		goto out_err;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b821fa32ba3f..eaef987ae3cf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2167,6 +2167,7 @@ extern struct filename *getname_flags(const char __user *, int, int *);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);
 extern void putname(struct filename *name);
+extern int nd_is_rcu(struct nameidata *nd);
 
 enum {
 	FILE_CREATED = 1,
diff --git a/mm/shmem.c b/mm/shmem.c
index cf2d0ca010bc..fdf6ba18fce3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2483,7 +2483,11 @@ static void *shmem_follow_short_symlink(struct dentry *dentry, struct nameidata
 static void *shmem_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
 	struct page *page = NULL;
-	int error = shmem_getpage(dentry->d_inode, 0, &page, SGP_READ, NULL);
+	int error;
+
+	if (nd_is_rcu(nd))
+		return ERR_PTR(-ECHILD);
+	error = shmem_getpage(dentry->d_inode, 0, &page, SGP_READ, NULL);
 	nd_set_link(nd, error ? ERR_PTR(error) : kmap(page));
 	if (page)
 		unlock_page(page);



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/9] VFS/namei: use terminate_walk when symlink lookup fails.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
                   ` (2 preceding siblings ...)
  2015-03-05  5:21 ` [PATCH 8/9] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 9/9] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

Currently following a symlink never uses rcu-walk, so
terminate_walk isn't needed.
That will change in a future patch.  In preparation, change
some
  path_put_condtional()
  path_put()
sequences to
  path_to_nameidata()
  terminate_walk()

These sequence are identical when in ref-walk, and correct when in
rcu-walk.

Also change two path_put() calls to equivalent terminate_walk().

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index de508a3cec42..515adc4594be 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -726,6 +726,18 @@ char *nd_get_link(struct nameidata *nd)
 }
 EXPORT_SYMBOL(nd_get_link);
 
+static void terminate_walk(struct nameidata *nd)
+{
+	if (!(nd->flags & LOOKUP_RCU)) {
+		path_put(&nd->path);
+	} else {
+		nd->flags &= ~LOOKUP_RCU;
+		if (!(nd->flags & LOOKUP_ROOT))
+			nd->root.mnt = NULL;
+		rcu_read_unlock();
+	}
+}
+
 static inline void put_link(struct nameidata *nd, struct path *link, void *cookie)
 {
 	struct inode *inode = link->dentry->d_inode;
@@ -776,8 +788,8 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
 		return 0;
 
 	audit_log_link_denied("follow_link", link);
-	path_put_conditional(link, nd);
-	path_put(&nd->path);
+	path_to_nameidata(link, nd);
+	terminate_walk(nd);
 	return -EACCES;
 }
 
@@ -886,7 +898,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	s = nd_get_link(nd);
 	if (s) {
 		if (unlikely(IS_ERR(s))) {
-			path_put(&nd->path);
+			terminate_walk(nd);
 			put_link(nd, link, *p);
 			return PTR_ERR(s);
 		}
@@ -908,7 +920,7 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 
 out_put_nd_path:
 	*p = NULL;
-	path_put(&nd->path);
+	terminate_walk(nd);
 	path_put(link);
 	return error;
 }
@@ -1539,18 +1551,6 @@ static inline int handle_dots(struct nameidata *nd, int type)
 	return 0;
 }
 
-static void terminate_walk(struct nameidata *nd)
-{
-	if (!(nd->flags & LOOKUP_RCU)) {
-		path_put(&nd->path);
-	} else {
-		nd->flags &= ~LOOKUP_RCU;
-		if (!(nd->flags & LOOKUP_ROOT))
-			nd->root.mnt = NULL;
-		rcu_read_unlock();
-	}
-}
-
 /*
  * Do we need to follow links? We _really_ want to be able
  * to do this check without having to look at inode->i_op,
@@ -1622,8 +1622,8 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 	int res;
 
 	if (unlikely(current->link_count >= MAX_NESTED_LINKS)) {
-		path_put_conditional(path, nd);
-		path_put(&nd->path);
+		path_to_nameidata(path, nd);
+		terminate_walk(nd);
 		return -ELOOP;
 	}
 	BUG_ON(nd->depth >= MAX_NESTED_LINKS);
@@ -3238,8 +3238,8 @@ static struct file *path_openat(int dfd, struct filename *pathname,
 		struct path link = path;
 		void *cookie;
 		if (!(nd->flags & LOOKUP_FOLLOW)) {
-			path_put_conditional(&path, nd);
-			path_put(&nd->path);
+			path_to_nameidata(&path, nd);
+			terminate_walk(nd);
 			error = -ELOOP;
 			break;
 		}



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/9] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
                   ` (6 preceding siblings ...)
  2015-03-05  5:21 ` [PATCH 6/9] VFS/namei: enable RCU-walk when following symlinks NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 7/9] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
  2015-03-05  6:05 ` [PATCH 0/9] Support follow_link in RCU-walk Al Viro
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

When we support ->follow_link in RCU-walk we will not want to
take a reference to the 'struct path *link' passed to follow_link,
and correspondingly will not want to drop that reference.

As link_path_walk will complete_walk() in the case of an error,
and as complete_walk() will clear LOOKUP_RCU, we cannot test
LOOKUP_RCU to determine if the path should be 'put'.

So introduce a new flag: LOOKUP_LINK_RCU.  This is set on
entry to follow_link() if appropriate and put_link() will
only call path_put() if it is clear.

Also, unlazy_walk() will fail if LOOKUP_LINK_RCU is set.
This is because there is no way for unlazy_walk to get references
on all the "struct path *link"s that are protected by that flag.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c            |   18 +++++++++++++-----
 include/linux/namei.h |    1 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 515adc4594be..a4879bb3b4ee 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -533,6 +533,9 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
 	struct dentry *parent = nd->path.dentry;
 
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
+	if (nd->flags & LOOKUP_LINK_RCU)
+		/* Cannot unlazy in the middle of following a symlink */
+		return -ECHILD;
 
 	/*
 	 * After legitimizing the bastards, terminate_walk()
@@ -743,7 +746,8 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 	struct inode *inode = link->dentry->d_inode;
 	if (inode->i_op->put_link)
 		inode->i_op->put_link(link->dentry, nd, cookie);
-	path_put(link);
+	if (!(nd->flags & LOOKUP_LINK_RCU))
+		path_put(link);
 }
 
 int sysctl_protected_symlinks __read_mostly = 0;
@@ -869,9 +873,10 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	int error;
 	char *s;
 
-	BUG_ON(nd->flags & LOOKUP_RCU);
-
-	if (link->mnt == nd->path.mnt)
+	nd->flags &= ~LOOKUP_LINK_RCU;
+	if (nd->flags & LOOKUP_RCU)
+		nd->flags |= LOOKUP_LINK_RCU;
+	else if (link->mnt == nd->path.mnt)
 		mntget(link->mnt);
 
 	error = -ELOOP;
@@ -921,7 +926,8 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 out_put_nd_path:
 	*p = NULL;
 	terminate_walk(nd);
-	path_put(link);
+	if (!(nd->flags & LOOKUP_LINK_RCU))
+		path_put(link);
 	return error;
 }
 
@@ -1644,6 +1650,8 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd)
 
 	current->link_count--;
 	nd->depth--;
+	if (!nd->depth)
+		nd->flags &= ~LOOKUP_LINK_RCU;
 	return res;
 }
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index c8990779f0c3..34bc87845b0e 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -32,6 +32,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND};
 #define LOOKUP_PARENT		0x0010
 #define LOOKUP_REVAL		0x0020
 #define LOOKUP_RCU		0x0040
+#define LOOKUP_LINK_RCU		0x0080
 
 /*
  * Intent data



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/9] VFS/namei: abort RCU-walk on symlink if atime needs updating.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 1/9] FS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

touch_atime is not RCU-safe, and so cannot be called on an
RCU walk.
However in situations where RCU-walk makes a difference,
the symlink will likely to accessed much more often than
it is useful to update the atime.

So split out the test of "Does the atime actually need to be updated"
into  atime_needs_update(), and only allow RCU-walk on a symlink if
that fails.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/inode.c         |   26 +++++++++++++++++++-------
 fs/namei.c         |    7 ++++++-
 include/linux/fs.h |    1 +
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index f00b16f45507..a0da920e4650 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1584,30 +1584,41 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
  *	This function automatically handles read only file systems and media,
  *	as well as the "noatime" flag and inode specific "noatime" markers.
  */
-void touch_atime(const struct path *path)
+int atime_needs_update(const struct path *path)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct inode *inode = path->dentry->d_inode;
 	struct timespec now;
 
 	if (inode->i_flags & S_NOATIME)
-		return;
+		return 0;
 	if (IS_NOATIME(inode))
-		return;
+		return 0;
 	if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		return 0;
 
 	if (mnt->mnt_flags & MNT_NOATIME)
-		return;
+		return 0;
 	if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		return 0;
 
 	now = current_fs_time(inode->i_sb);
 
 	if (!relatime_need_update(mnt, inode, now))
-		return;
+		return 0;
 
 	if (timespec_equal(&inode->i_atime, &now))
+		return 0;
+	return 1;
+}
+
+void touch_atime(const struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+	struct inode *inode = path->dentry->d_inode;
+	struct timespec now;
+
+	if (!atime_needs_update(path))
 		return;
 
 	if (!sb_start_write_trylock(inode->i_sb))
@@ -1624,6 +1635,7 @@ void touch_atime(const struct path *path)
 	 * We may also fail on filesystems that have the ability to make parts
 	 * of the fs read only, e.g. subvolumes in Btrfs.
 	 */
+	now = current_fs_time(inode->i_sb);
 	update_time(inode, &now, S_ATIME);
 	__mnt_drop_write(mnt);
 skip_update:
diff --git a/fs/namei.c b/fs/namei.c
index a4879bb3b4ee..5814d1b2daab 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -886,7 +886,12 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	cond_resched();
 	current->total_link_count++;
 
-	touch_atime(link);
+	if (nd->flags & LOOKUP_RCU) {
+		error = -ECHILD;
+		if (atime_needs_update(link))
+			goto out_put_nd_path;
+	} else
+		touch_atime(link);
 	nd_set_link(nd, NULL);
 
 	error = security_inode_follow_link(link->dentry, nd);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eaef987ae3cf..6665aa16abcb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1845,6 +1845,7 @@ enum file_time_flags {
 	S_VERSION = 8,
 };
 
+extern int atime_needs_update(const struct path *);
 extern void touch_atime(const struct path *);
 static inline void file_accessed(struct file *file)
 {



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/9] VFS/namei: enhance follow_link to support RCU-walk.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
                   ` (4 preceding siblings ...)
  2015-03-05  5:21 ` [PATCH 9/9] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 6/9] VFS/namei: enable RCU-walk when following symlinks NeilBrown
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

If LOOKUP_RCU is set, follow_link will not take/drop reference counts.

Replace cond_resched() with _cond_resched() as the latter
is a no-op if rcu_read_lock() is held while the former will
give a warning in that case.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 5814d1b2daab..4ddbc1ef5726 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -883,7 +883,8 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 	if (unlikely(current->total_link_count >= 40))
 		goto out_put_nd_path;
 
-	cond_resched();
+	/* If rcu_read_locked(), this will not resched, and will not warn */
+	_cond_resched();
 	current->total_link_count++;
 
 	if (nd->flags & LOOKUP_RCU) {
@@ -913,11 +914,17 @@ follow_link(struct path *link, struct nameidata *nd, void **p)
 			return PTR_ERR(s);
 		}
 		if (*s == '/') {
-			if (!nd->root.mnt)
-				set_root(nd);
-			path_put(&nd->path);
-			nd->path = nd->root;
-			path_get(&nd->root);
+			if (nd->flags & LOOKUP_RCU) {
+				if (!nd->root.mnt)
+					set_root_rcu(nd);
+				nd->path = nd->root;
+			} else {
+				if (!nd->root.mnt)
+					set_root(nd);
+				path_put(&nd->path);
+				nd->path = nd->root;
+				path_get(&nd->root);
+			}
 			nd->flags |= LOOKUP_JUMPED;
 		}
 		nd->inode = nd->path.dentry->d_inode;



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/9] VFS/namei: enable RCU-walk when following symlinks.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
                   ` (5 preceding siblings ...)
  2015-03-05  5:21 ` [PATCH 5/9] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 3/9] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

Now that follow_link handles LOOKUP_RCU, we do not need to
'unlazy_walk' when a symlink is found.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   12 ------------
 1 file changed, 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 4ddbc1ef5726..11e6b2068c96 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1608,12 +1608,6 @@ static inline int walk_component(struct nameidata *nd, struct path *path,
 		goto out_path_put;
 
 	if (should_follow_link(path->dentry, follow)) {
-		if (nd->flags & LOOKUP_RCU) {
-			if (unlikely(unlazy_walk(nd, path->dentry))) {
-				err = -ECHILD;
-				goto out_err;
-			}
-		}
 		BUG_ON(inode != path->dentry->d_inode);
 		return 1;
 	}
@@ -3066,12 +3060,6 @@ finish_lookup:
 	}
 
 	if (should_follow_link(path->dentry, !symlink_ok)) {
-		if (nd->flags & LOOKUP_RCU) {
-			if (unlikely(unlazy_walk(nd, path->dentry))) {
-				error = -ECHILD;
-				goto out;
-			}
-		}
 		BUG_ON(inode != path->dentry->d_inode);
 		return 1;
 	}



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/9] VFS/namei: handle LOOKUP_RCU in page_follow_link_light.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
                   ` (7 preceding siblings ...)
  2015-03-05  5:21 ` [PATCH 3/9] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  6:05 ` [PATCH 0/9] Support follow_link in RCU-walk Al Viro
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

If the symlink has already be been read-in, then
page_follow_link_light can succeed in RCU-walk mode.
page_getlink_rcu() is added to support this.

With this many filesystems can follow links in RCU-walk
mode when everything is cached.  This  includes ext?fs and
others.

If the page is a HighMem page we do *not* try to kmap_atomic,
but simply give up - only page_address() is used.
This is because we need to be able to sleep while holding
the address of the page, particularly over calls to do_last()
which can be quite slow and in particular takes a mutex.

If this were a problem, then copying into a GFP_ATOMIC allocation
might be a workable solution.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 11e6b2068c96..48571b2eaa18 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4468,6 +4468,28 @@ static char *page_getlink(struct dentry * dentry, struct page **ppage)
 	return kaddr;
 }
 
+/* get the link contents from pagecache under RCU */
+static char *page_getlink_rcu(struct dentry * dentry, struct page **ppage)
+{
+	char *kaddr;
+	struct page *page;
+	struct address_space *mapping = dentry->d_inode->i_mapping;
+	page = find_get_page(mapping, 0);
+	if (page &&
+	    (!PageUptodate(page) || PageHighMem(page))) {
+		put_page(page);
+		page = NULL;
+	}
+	if (!page) {
+		*ppage = ERR_PTR(-ECHILD);
+		return NULL;
+	}
+	*ppage = page;
+	kaddr = page_address(page);
+	nd_terminate_link(kaddr, dentry->d_inode->i_size, PAGE_SIZE - 1);
+	return kaddr;
+}
+
 int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
 	struct page *page = NULL;
@@ -4484,8 +4506,9 @@ void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 {
 	struct page *page = NULL;
 	if (nd->flags & LOOKUP_RCU)
-		return ERR_PTR(-ECHILD);
-	nd_set_link(nd, page_getlink(dentry, &page));
+		nd_set_link(nd, page_getlink_rcu(dentry, &page));
+	else
+		nd_set_link(nd, page_getlink(dentry, &page));
 	return page;
 }
 EXPORT_SYMBOL(page_follow_link_light);
@@ -4495,7 +4518,8 @@ void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
 	struct page *page = cookie;
 
 	if (page) {
-		kunmap(page);
+		if (!(nd->flags & LOOKUP_LINK_RCU))
+			kunmap(page);
 		page_cache_release(page);
 	}
 }



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 8/9] XFS: allow follow_link to often succeed in RCU-walk.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
  2015-03-05  5:21 ` [PATCH 4/9] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
  2015-03-05  5:21 ` [PATCH 1/9] FS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 2/9] VFS/namei: use terminate_walk when symlink lookup fails NeilBrown
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

If LOOKUP_RCU is set, use GFP_ATOMIC rather than GFP_KERNEL,
and try to get the ilock without blocking.

When these succeed, follow_link() can succeed without dropping
out of RCU-walk.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/xfs/xfs_ioctl.c   |    2 +-
 fs/xfs/xfs_iops.c    |   15 ++++++++++-----
 fs/xfs/xfs_symlink.c |   11 +++++++++--
 fs/xfs/xfs_symlink.h |    2 +-
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ac4feae45eb3..29d95a1b76c0 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -303,7 +303,7 @@ xfs_readlink_by_handle(
 		goto out_dput;
 	}
 
-	error = xfs_readlink(XFS_I(dentry->d_inode), link);
+	error = xfs_readlink(XFS_I(dentry->d_inode), link, 0);
 	if (error)
 		goto out_kfree;
 	error = readlink_copy(hreq->ohandle, olen, link);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 23cea798b777..2abab666df46 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -415,15 +415,20 @@ xfs_vn_follow_link(
 	struct nameidata	*nd)
 {
 	char			*link;
-	int			error = -ENOMEM;
+	int			error;
 
-	if (nd_is_rcu(nd))
-		return ERR_PTR(-ECHILD);
-	link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+	if (nd_is_rcu(nd)) {
+		error = -ECHILD;
+		link = kmalloc(MAXPATHLEN+1, GFP_ATOMIC);
+	} else {
+		error = -ENOMEM;
+		link = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+	}
 	if (!link)
 		goto out_err;
 
-	error = xfs_readlink(XFS_I(dentry->d_inode), link);
+	error = xfs_readlink(XFS_I(dentry->d_inode), link,
+			     nd_is_rcu(nd));
 	if (unlikely(error))
 		goto out_kfree;
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 25791df6f638..87b5b2ba3d38 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -123,7 +123,8 @@ xfs_readlink_bmap(
 int
 xfs_readlink(
 	struct xfs_inode *ip,
-	char		*link)
+	char		*link,
+	int		rcu)
 {
 	struct xfs_mount *mp = ip->i_mount;
 	xfs_fsize_t	pathlen;
@@ -134,7 +135,11 @@ xfs_readlink(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
+	if (rcu) {
+		if (xfs_ilock_nowait(ip, XFS_ILOCK_SHARED) == 0)
+			return -ECHILD;
+	} else
+		xfs_ilock(ip, XFS_ILOCK_SHARED);
 
 	pathlen = ip->i_d.di_size;
 	if (!pathlen)
@@ -153,6 +158,8 @@ xfs_readlink(
 	if (ip->i_df.if_flags & XFS_IFINLINE) {
 		memcpy(link, ip->i_df.if_u1.if_data, pathlen);
 		link[pathlen] = '\0';
+	} else if (rcu) {
+		error = -ECHILD;
 	} else {
 		error = xfs_readlink_bmap(ip, link);
 	}
diff --git a/fs/xfs/xfs_symlink.h b/fs/xfs/xfs_symlink.h
index e75245d09116..a71d26643e20 100644
--- a/fs/xfs/xfs_symlink.h
+++ b/fs/xfs/xfs_symlink.h
@@ -21,7 +21,7 @@
 
 int xfs_symlink(struct xfs_inode *dp, struct xfs_name *link_name,
 		const char *target_path, umode_t mode, struct xfs_inode **ipp);
-int xfs_readlink(struct xfs_inode *ip, char *link);
+int xfs_readlink(struct xfs_inode *ip, char *link, int rcu);
 int xfs_inactive_symlink(struct xfs_inode *ip);
 
 #endif /* __XFS_SYMLINK_H */



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 9/9] NFS: support LOOKUP_RCU in nfs_follow_link.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
                   ` (3 preceding siblings ...)
  2015-03-05  5:21 ` [PATCH 2/9] VFS/namei: use terminate_walk when symlink lookup fails NeilBrown
@ 2015-03-05  5:21 ` NeilBrown
  2015-03-05  5:21 ` [PATCH 5/9] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

If the inode is valid and the page has been read in,
then we can follow a link in RCU-walk.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/inode.c         |   22 ++++++++++++++++++++++
 fs/nfs/symlink.c       |   19 +++++++++++++++++--
 include/linux/nfs_fs.h |    1 +
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 83107be3dd01..80f192405102 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1123,6 +1123,28 @@ out:
 	return ret;
 }
 
+int nfs_revalidate_mapping_rcu(struct inode *inode)
+{
+	struct nfs_inode *nfsi = NFS_I(inode);
+	unsigned long *bitlock = &nfsi->flags;
+	int ret = 0;
+
+	if (IS_SWAPFILE(inode))
+		goto out;
+	if (nfs_mapping_need_revalidate_inode(inode)) {
+		ret = -ECHILD;
+		goto out;
+	}
+	spin_lock(&inode->i_lock);
+	if (test_bit(NFS_INO_INVALIDATING, bitlock) ||
+	    (nfsi->cache_validity & NFS_INO_INVALID_DATA))
+		ret = -ECHILD;
+	spin_unlock(&inode->i_lock);
+out:
+	return ret;
+}
+
+
 static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index c9a2d3cc4619..3cd295b7e094 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -49,8 +49,23 @@ static void *nfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 	struct page *page;
 	void *err;
 
-	if (nd_is_rcu(nd))
-		return ERR_PTR(-ECHILD);
+	if (nd_is_rcu(nd)) {
+		err = ERR_PTR(nfs_revalidate_mapping_rcu(inode));
+		if (err)
+			goto read_failed;
+		page = find_get_page(inode->i_mapping, 0);
+		if (page &&
+		    (!PageUptodate(page) || PageHighMem(page))) {
+			put_page(page);
+			page = NULL;
+		}
+		if (!page) {
+			err = ERR_PTR(-ECHILD);
+			goto read_failed;
+		}
+		nd_set_link(nd, page_address(page));
+		return page;
+	}
 	err = ERR_PTR(nfs_revalidate_mapping(inode, inode->i_mapping));
 	if (err)
 		goto read_failed;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 2f77e0c651c8..78c2f812eaeb 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -355,6 +355,7 @@ extern int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode);
 extern int nfs_revalidate_inode_rcu(struct nfs_server *server, struct inode *inode);
 extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
 extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
+extern int nfs_revalidate_mapping_rcu(struct inode *inode);
 extern int nfs_setattr(struct dentry *, struct iattr *);
 extern void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr);
 extern void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Support follow_link in RCU-walk.
  2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
                   ` (8 preceding siblings ...)
  2015-03-05  5:21 ` [PATCH 7/9] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
@ 2015-03-05  6:05 ` Al Viro
  2015-03-05 13:52   ` John Stoffel
                     ` (2 more replies)
  9 siblings, 3 replies; 16+ messages in thread
From: Al Viro @ 2015-03-05  6:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-fsdevel, linux-kernel

On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
> Hi Al (and others),
> 
>  I wonder if you could look over this patchset.
>  It allows RCU-walk to follow symlinks in many common cases,
>  thus removing a surprising performance hit caused by using symlinks.
> 
>  The last could of patches make changes to XFS and NFS to support
>  this but I haven't forwarded to the relevant lists yet.
>  If/when the early code meets with approval I'll do that.
> 
>  The first patch almost certainly needs to be changed.  I originally
>  wrote this code when filesystems could see inside nameidata.
>  It is now opaque so the simplest solution was to provide an
>  accessor function.
>  Maybe I should as a 'flags' arg to ->follow_link?? Or have
>  ->follow_link and ->follow_link_rcu ??
>  What do you suggest?

Umm...  Some observations:
	* now ->follow_link() can be called in RCU mode, which means
that it can race with fs shutdown; not a problem, except that now it
joins ->lookup() et.al. in "if some data structure is needed in RCU
case of that, make sure it's not destroyed without an RCU delay somewhere
between the entry into ->kill_sb() and destruction.
	* highmem pages in symlinks: that BS shouldn't be allowed at
all.  Just make sure that at least for those filesystems symlink inodes
get mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) and be done with that.
	* are you sure that security_inode_follow_link() is OK to call in
RCU mode?
	* what warranties are you giving for the lifetime of strings
passed to nd_set_link()?  Right now it's "should not be freed until the
matching ->put_link()"; what happens for RCU mode?
	* really nasty one: creat(2) on a dangling symlink.  What's to
preserve the last component if you get into that symlink in RCU mode?

TBH, I'm less than fond of passing nameidata to ->follow_link() at all,
flags or no flags.  We could kill current->link_count and
current->total_link_count, replacing them with one void * current->nameidata
and taking counters into struct nameidata itself.  Have places like e.g.
kern_path_locked() do
	struct nameidata nd, *saved = set_nameidata(&nd);
	...
	set_nameidata(saved);
with set_nameidata(p) doing this:
	old = current->nameidata;
	current->nameidata = p;
	if (p) {
		if (!old) {
			p->link_count = 0;
			p->total_link_count = 0;
		} else {
			p->link_count = old->link_count;
			p->total_link_count = old->total_link_count;
		}
	}
	return old;

Then nd_set_link() et.al. would use current->nameidata instead of an
explicitly passed pointer and ->follow_link() instances wouldn't need
that opaque pointer passed to them at all.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Support follow_link in RCU-walk.
  2015-03-05  6:05 ` [PATCH 0/9] Support follow_link in RCU-walk Al Viro
@ 2015-03-05 13:52   ` John Stoffel
  2015-03-05 16:00     ` Al Viro
  2015-03-05 21:08   ` NeilBrown
  2015-03-09  2:21   ` NeilBrown
  2 siblings, 1 reply; 16+ messages in thread
From: John Stoffel @ 2015-03-05 13:52 UTC (permalink / raw)
  To: Al Viro; +Cc: NeilBrown, linux-fsdevel, linux-kernel

>>>>> "Al" == Al Viro <viro@ZenIV.linux.org.uk> writes:

Al> On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
>> Hi Al (and others),
>> 
>> I wonder if you could look over this patchset.
>> It allows RCU-walk to follow symlinks in many common cases,
>> thus removing a surprising performance hit caused by using symlinks.
>> 
>> The last could of patches make changes to XFS and NFS to support
>> this but I haven't forwarded to the relevant lists yet.
>> If/when the early code meets with approval I'll do that.
>> 
>> The first patch almost certainly needs to be changed.  I originally
>> wrote this code when filesystems could see inside nameidata.
>> It is now opaque so the simplest solution was to provide an
>> accessor function.
>> Maybe I should as a 'flags' arg to ->follow_link?? Or have
-> follow_link and ->follow_link_rcu ??
>> What do you suggest?

Al> Umm...  Some observations:
Al> 	* now ->follow_link() can be called in RCU mode, which means
Al> that it can race with fs shutdown; not a problem, except that now it
Al> joins ->lookup() et.al. in "if some data structure is needed in RCU
Al> case of that, make sure it's not destroyed without an RCU delay somewhere
Al> between the entry into ->kill_sb() and destruction.
Al> 	* highmem pages in symlinks: that BS shouldn't be allowed at
Al> all.  Just make sure that at least for those filesystems symlink inodes
Al> get mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) and be done with that.


So what happens if your filesystem is 10Tb in size, and you have 50
million files and lots of them are symlinks?  I've got developers who
do shit like this and wonder why performance sucks....  and I just
worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
won't really have any problems?

Rest of this is way outside my pay grade to commment on.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Support follow_link in RCU-walk.
  2015-03-05 13:52   ` John Stoffel
@ 2015-03-05 16:00     ` Al Viro
  2015-03-05 17:17       ` John Stoffel
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2015-03-05 16:00 UTC (permalink / raw)
  To: John Stoffel; +Cc: NeilBrown, linux-fsdevel, linux-kernel

On Thu, Mar 05, 2015 at 08:52:20AM -0500, John Stoffel wrote:

> So what happens if your filesystem is 10Tb in size, and you have 50
> million files and lots of them are symlinks?  I've got developers who
> do shit like this and wonder why performance sucks....  and I just
> worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
> won't really have any problems?

What would keep all those symlinks' contents pinned in page cache?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Support follow_link in RCU-walk.
  2015-03-05 16:00     ` Al Viro
@ 2015-03-05 17:17       ` John Stoffel
  0 siblings, 0 replies; 16+ messages in thread
From: John Stoffel @ 2015-03-05 17:17 UTC (permalink / raw)
  To: Al Viro; +Cc: John Stoffel, NeilBrown, linux-fsdevel, linux-kernel

>>>>> "Al" == Al Viro <viro@ZenIV.linux.org.uk> writes:

Al> On Thu, Mar 05, 2015 at 08:52:20AM -0500, John Stoffel wrote:
>> So what happens if your filesystem is 10Tb in size, and you have 50
>> million files and lots of them are symlinks?  I've got developers who
>> do shit like this and wonder why performance sucks....  and I just
>> worry that GPF_KERNEL is a limited resource.  But maybe 64bit systems
>> won't really have any problems?

Al> What would keep all those symlinks' contents pinned in page cache?

Dunno honestly... but your statement nudges my memory that they would
be evicted as memory pressure grows, so it's probably not a problem
afterall.  I just know that my users love tickling strange bugs like
this because they hate to actually cleanup after themselves unless
absolutely necessary.

Thanks for the nudge.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Support follow_link in RCU-walk.
  2015-03-05  6:05 ` [PATCH 0/9] Support follow_link in RCU-walk Al Viro
  2015-03-05 13:52   ` John Stoffel
@ 2015-03-05 21:08   ` NeilBrown
  2015-03-09  2:21   ` NeilBrown
  2 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-05 21:08 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3019 bytes --]

On Thu, 5 Mar 2015 06:05:20 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
> > Hi Al (and others),
> > 
> >  I wonder if you could look over this patchset.
> >  It allows RCU-walk to follow symlinks in many common cases,
> >  thus removing a surprising performance hit caused by using symlinks.
> > 
> >  The last could of patches make changes to XFS and NFS to support
> >  this but I haven't forwarded to the relevant lists yet.
> >  If/when the early code meets with approval I'll do that.
> > 
> >  The first patch almost certainly needs to be changed.  I originally
> >  wrote this code when filesystems could see inside nameidata.
> >  It is now opaque so the simplest solution was to provide an
> >  accessor function.
> >  Maybe I should as a 'flags' arg to ->follow_link?? Or have
> >  ->follow_link and ->follow_link_rcu ??
> >  What do you suggest?
> 
> Umm...  Some observations:
> 	* now ->follow_link() can be called in RCU mode, which means
> that it can race with fs shutdown; not a problem, except that now it
> joins ->lookup() et.al. in "if some data structure is needed in RCU
> case of that, make sure it's not destroyed without an RCU delay somewhere
> between the entry into ->kill_sb() and destruction.
> 	* highmem pages in symlinks: that BS shouldn't be allowed at
> all.  Just make sure that at least for those filesystems symlink inodes
> get mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) and be done with that.
> 	* are you sure that security_inode_follow_link() is OK to call in
> RCU mode?
> 	* what warranties are you giving for the lifetime of strings
> passed to nd_set_link()?  Right now it's "should not be freed until the
> matching ->put_link()"; what happens for RCU mode?
> 	* really nasty one: creat(2) on a dangling symlink.  What's to
> preserve the last component if you get into that symlink in RCU mode?
> 
> TBH, I'm less than fond of passing nameidata to ->follow_link() at all,
> flags or no flags.  We could kill current->link_count and
> current->total_link_count, replacing them with one void * current->nameidata
> and taking counters into struct nameidata itself.  Have places like e.g.
> kern_path_locked() do
> 	struct nameidata nd, *saved = set_nameidata(&nd);
> 	...
> 	set_nameidata(saved);
> with set_nameidata(p) doing this:
> 	old = current->nameidata;
> 	current->nameidata = p;
> 	if (p) {
> 		if (!old) {
> 			p->link_count = 0;
> 			p->total_link_count = 0;
> 		} else {
> 			p->link_count = old->link_count;
> 			p->total_link_count = old->total_link_count;
> 		}
> 	}
> 	return old;
> 
> Then nd_set_link() et.al. would use current->nameidata instead of an
> explicitly passed pointer and ->follow_link() instances wouldn't need
> that opaque pointer passed to them at all.

Wow, thanks for the quick response!!!

I'll look into all those issues and get back to you when I have something
coherent to say.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Support follow_link in RCU-walk.
  2015-03-05  6:05 ` [PATCH 0/9] Support follow_link in RCU-walk Al Viro
  2015-03-05 13:52   ` John Stoffel
  2015-03-05 21:08   ` NeilBrown
@ 2015-03-09  2:21   ` NeilBrown
  2 siblings, 0 replies; 16+ messages in thread
From: NeilBrown @ 2015-03-09  2:21 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4944 bytes --]

On Thu, 5 Mar 2015 06:05:20 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Thu, Mar 05, 2015 at 04:21:21PM +1100, NeilBrown wrote:
> > Hi Al (and others),
> > 
> >  I wonder if you could look over this patchset.
> >  It allows RCU-walk to follow symlinks in many common cases,
> >  thus removing a surprising performance hit caused by using symlinks.
> > 
> >  The last could of patches make changes to XFS and NFS to support
> >  this but I haven't forwarded to the relevant lists yet.
> >  If/when the early code meets with approval I'll do that.
> > 
> >  The first patch almost certainly needs to be changed.  I originally
> >  wrote this code when filesystems could see inside nameidata.
> >  It is now opaque so the simplest solution was to provide an
> >  accessor function.
> >  Maybe I should as a 'flags' arg to ->follow_link?? Or have
> >  ->follow_link and ->follow_link_rcu ??
> >  What do you suggest?
> 
> Umm...  Some observations:
> 	* now ->follow_link() can be called in RCU mode, which means
> that it can race with fs shutdown; not a problem, except that now it
> joins ->lookup() et.al. in "if some data structure is needed in RCU
> case of that, make sure it's not destroyed without an RCU delay somewhere
> between the entry into ->kill_sb() and destruction.

So inodes and dentries and associated private data should already be safe.
And s_fs_info can be used if it is freed by e.g. kfree_rcu (like autofs)
but not if just kfree (like ext3).

xfs_fs_put_super() directly frees the 'xfs_mount', which xfs_readlink
accesses.  I guess that needs to be fixed.


> 	* highmem pages in symlinks: that BS shouldn't be allowed at
> all.  Just make sure that at least for those filesystems symlink inodes
> get mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) and be done with that.

page_getlink() already uses kmap(), implying that highmem pages are
supported.   All I'm doing is making sure that my page_getlink_rcu()
doesn't fail horribly if the page is a highmem page.

If a filesystem needs improved follow_link performance on a highmem machine,
then setting the gfp_mask as you suggest is probably a good idea, but I don't
really want to impose that on filesystems if I don't need to.  And at present
I don't.
So I'd rather leave it to the filesystem maintainer, or someone who discovers
a need.


> 	* are you sure that security_inode_follow_link() is OK to call in
> RCU mode?

No.
avc_has_perm() doesn't look RCU safe, even without auditing enabled.
At the very least we'll need to pass a "lookup_rcu" flag in there.


> 	* what warranties are you giving for the lifetime of strings
> passed to nd_set_link()?  Right now it's "should not be freed until the
> matching ->put_link()"; what happens for RCU mode?

The same....

For XFS, we kmalloc a buffer GFP_ATOMIC and copy into that.  Then
put_link() kfrees it.
For filesystems with the symlink in the page cache, we get a reference to
the page (which is a bit heavy-handed for RCU-walk, but much less so than the
current code) and drop the reference in ->put_link.

For filesystems with a short symlink in the inode, we just provide a pointer
to that... How long can we expect that to be around?
I cannot see any provision for keeping those inodes in memory while we
follow the symlink... What am I missing?

In any case, if there is a reference held on the inode for ref-walk, then
presumably complete_walk() will take a reference on that same inode when
dropping out of rcu-walk.... I hope.


So I think the rules here are unchanged.


> 	* really nasty one: creat(2) on a dangling symlink.  What's to
> preserve the last component if you get into that symlink in RCU mode?

As above - we will have a counted reference to whatever holds the text of the
symlink.



> 
> TBH, I'm less than fond of passing nameidata to ->follow_link() at all,
> flags or no flags.  We could kill current->link_count and
> current->total_link_count, replacing them with one void * current->nameidata
> and taking counters into struct nameidata itself.  Have places like e.g.
> kern_path_locked() do
> 	struct nameidata nd, *saved = set_nameidata(&nd);
> 	...
> 	set_nameidata(saved);
> with set_nameidata(p) doing this:
> 	old = current->nameidata;
> 	current->nameidata = p;
> 	if (p) {
> 		if (!old) {
> 			p->link_count = 0;
> 			p->total_link_count = 0;
> 		} else {
> 			p->link_count = old->link_count;
> 			p->total_link_count = old->total_link_count;
> 		}
> 	}
> 	return old;
> 
> Then nd_set_link() et.al. would use current->nameidata instead of an
> explicitly passed pointer and ->follow_link() instances wouldn't need
> that opaque pointer passed to them at all.

Sounds interesting  - I might try it.

Would ->follow_link() than get a 'flags' argument, or would "nd_is_rcu()"
reference current->nameidata->flags ??

Thanks,
NeilBrown


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-03-09  2:21 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-05  5:21 [PATCH 0/9] Support follow_link in RCU-walk NeilBrown
2015-03-05  5:21 ` [PATCH 4/9] VFS/namei: abort RCU-walk on symlink if atime needs updating NeilBrown
2015-03-05  5:21 ` [PATCH 1/9] FS: make all ->follow_link handlers aware for LOOKUP_RCU NeilBrown
2015-03-05  5:21 ` [PATCH 8/9] XFS: allow follow_link to often succeed in RCU-walk NeilBrown
2015-03-05  5:21 ` [PATCH 2/9] VFS/namei: use terminate_walk when symlink lookup fails NeilBrown
2015-03-05  5:21 ` [PATCH 9/9] NFS: support LOOKUP_RCU in nfs_follow_link NeilBrown
2015-03-05  5:21 ` [PATCH 5/9] VFS/namei: enhance follow_link to support RCU-walk NeilBrown
2015-03-05  5:21 ` [PATCH 6/9] VFS/namei: enable RCU-walk when following symlinks NeilBrown
2015-03-05  5:21 ` [PATCH 3/9] VFS/namei: new flag to support RCU symlinks: LOOKUP_LINK_RCU NeilBrown
2015-03-05  5:21 ` [PATCH 7/9] VFS/namei: handle LOOKUP_RCU in page_follow_link_light NeilBrown
2015-03-05  6:05 ` [PATCH 0/9] Support follow_link in RCU-walk Al Viro
2015-03-05 13:52   ` John Stoffel
2015-03-05 16:00     ` Al Viro
2015-03-05 17:17       ` John Stoffel
2015-03-05 21:08   ` NeilBrown
2015-03-09  2:21   ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).