All of lore.kernel.org
 help / color / mirror / Atom feed
* VFS pathname walking cleanups (i_op and ACL access)
@ 2011-07-22 17:37 Linus Torvalds
  2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-22 17:37 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, linux-fsdevel


I've got a branch named 'i_op-access-removal' which removes most (I think 
all) of the inode->i_op accesses from the normal RCU path resolution code 
for the common case. They did show up very clearly in profiles, even when 
the load is mostly cached - just the pointer chasing is expensive, 
especially since even cached loads aren't small enough to fit in L1.

The first patch is trivial, and just extends the DCACHE_OP_xyz flags in 
d_flags to also cover some of the core inode->i_op->xyz accesses that we 
don't need to *call* (either because they are NULL or because the results 
are cached), but we used to look things up.

The second patch moves the generic ACL caching case into the VFS layer the 
way it should have been done a long time ago (we already moved the cache 
fields into the core inode data structure), and just removes the need to 
call into the filesystem for 'check_acl()' for the common cached case.

I don't expect this to be really at all controversial, but doing this, I 
noticed that some of the ACL cache setup was a bit odd in filesystems. In 
the case of XFS, for example, the test for

	if (!XFS_IFORK_Q(ip))
		return -EAGAIN;

still remains in the filesystem-specific ACL check routine, which means 
that since it doesn't show up in the cache, that case is now going to kick 
XFS out of RCU. The fix seems to be trivial (just do a set_acl_cache() of 
NULL for that case), but I didn't want to go into locking issues, so I 
just added a comment.

Other filesystems should probably also check their ACL cache usage. Afaik, 
only for XFS did semantics *change* (because that XFS_IFORK_Q() used to be 
before the RCU check), but it looks to me like some filesystems aren't 
really doing the whole cache at all. Maybe they don't care, but it does 
mean that you won't get RCU walking on such a filesystem.

The two patches will come as follow-ons to this email.

                        Linus

---
Linus Torvalds (2):
  VFS: Cut down inode->i_op->xyz accesses in path walking
  vfs: move ACL cache lookup into generic code

 fs/9p/acl.c                |    3 -
 fs/afs/security.c          |    2 +-
 fs/btrfs/acl.c             |   19 +++------
 fs/btrfs/inode.c           |    7 +++-
 fs/ceph/inode.c            |    2 +-
 fs/cifs/cifsfs.c           |    2 +-
 fs/dcache.c                |    6 +++
 fs/ext2/acl.c              |    6 ---
 fs/ext3/acl.c              |    6 ---
 fs/ext4/acl.c              |    6 ---
 fs/fuse/dir.c              |    4 +-
 fs/gfs2/acl.c              |    6 ---
 fs/gfs2/inode.c            |    5 ++-
 fs/hostfs/hostfs_kern.c    |    2 +-
 fs/hpfs/namei.c            |    2 +-
 fs/jffs2/acl.c             |    3 -
 fs/jfs/acl.c               |    3 -
 fs/namei.c                 |   92 ++++++++++++++++++++++++++++++++-----------
 fs/nfs/dir.c               |    2 +-
 fs/nilfs2/inode.c          |    2 +-
 fs/ocfs2/acl.c             |    3 -
 fs/ocfs2/file.c            |    4 +-
 fs/proc/base.c             |    2 +-
 fs/reiserfs/xattr.c        |   17 +++-----
 fs/sysfs/inode.c           |    2 +-
 fs/xfs/linux-2.6/xfs_acl.c |    8 +---
 include/linux/dcache.h     |   12 ++++-
 include/linux/fs.h         |    3 +-
 28 files changed, 123 insertions(+), 108 deletions(-)

-- 
1.7.6.233.gd79bc.dirty


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-22 17:37 VFS pathname walking cleanups (i_op and ACL access) Linus Torvalds
@ 2011-07-22 17:37 ` Linus Torvalds
  2011-07-22 17:47   ` Christoph Hellwig
                     ` (2 more replies)
  2011-07-22 17:40 ` VFS pathname walking cleanups (i_op and ACL access) Christoph Hellwig
  2011-07-22 17:45 ` [PATCH 2/2] vfs: move ACL cache lookup into generic code Linus Torvalds
  2 siblings, 3 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-22 17:37 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, linux-fsdevel


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 22 Jul 2011 08:44:51 -0700
Subject: [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking

One of the biggest remaining unnecessary costs in path walking is the
pointer chasing in inode operations.  We already avoided the
dentry->d_op derferences with the DCACHE_OP_xyz flags, this just starts
doing the same thing for the i_op->xyz cases.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/dcache.c            |    6 ++++++
 fs/namei.c             |   25 +++++++++++++------------
 include/linux/dcache.h |   12 +++++++++---
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index fbdcbca40725..2dacddb7b101 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1374,6 +1374,12 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 		if (unlikely(IS_AUTOMOUNT(inode)))
 			dentry->d_flags |= DCACHE_NEED_AUTOMOUNT;
 		list_add(&dentry->d_alias, &inode->i_dentry);
+		if (unlikely(inode->i_op->lookup))
+			dentry->d_flags |= DCACHE_OP_LOOKUP;
+		if (unlikely(inode->i_op->permission))
+			dentry->d_flags |= DCACHE_OP_PERMISSION;
+		if (unlikely(inode->i_op->follow_link))
+			dentry->d_flags |= DCACHE_OP_FOLLOW_LINK;
 	}
 	dentry->d_inode = inode;
 	dentry_rcuwalk_barrier(dentry);
diff --git a/fs/namei.c b/fs/namei.c
index 14ab8d3f2f0c..02b680e0e816 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -577,12 +577,12 @@ static int complete_walk(struct nameidata *nd)
  * short-cut DAC fails, then call ->permission() to do more
  * complete permission check.
  */
-static inline int exec_permission(struct inode *inode, unsigned int flags)
+static inline int exec_permission(struct dentry *dentry, struct inode *inode, unsigned int flags)
 {
 	int ret;
 	struct user_namespace *ns = inode_userns(inode);
 
-	if (inode->i_op->permission) {
+	if (dentry->d_flags & DCACHE_OP_PERMISSION) {
 		ret = inode->i_op->permission(inode, MAY_EXEC, flags);
 	} else {
 		ret = acl_permission_check(inode, MAY_EXEC, flags,
@@ -1234,13 +1234,13 @@ retry:
 static inline int may_lookup(struct nameidata *nd)
 {
 	if (nd->flags & LOOKUP_RCU) {
-		int err = exec_permission(nd->inode, IPERM_FLAG_RCU);
+		int err = exec_permission(nd->path.dentry, nd->inode, IPERM_FLAG_RCU);
 		if (err != -ECHILD)
 			return err;
 		if (unlazy_walk(nd, NULL))
 			return -ECHILD;
 	}
-	return exec_permission(nd->inode, 0);
+	return exec_permission(nd->path.dentry, nd->inode, 0);
 }
 
 static inline int handle_dots(struct nameidata *nd, int type)
@@ -1290,7 +1290,7 @@ static inline int walk_component(struct nameidata *nd, struct path *path,
 		terminate_walk(nd);
 		return -ENOENT;
 	}
-	if (unlikely(inode->i_op->follow_link) && follow) {
+	if (unlikely(path->dentry->d_flags & DCACHE_OP_FOLLOW_LINK) && follow) {
 		if (nd->flags & LOOKUP_RCU) {
 			if (unlikely(unlazy_walk(nd, path->dentry))) {
 				terminate_walk(nd);
@@ -1425,7 +1425,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 				return err;
 		}
 		err = -ENOTDIR; 
-		if (!nd->inode->i_op->lookup)
+		if (!(nd->path.dentry->d_flags & DCACHE_OP_LOOKUP))
 			break;
 		continue;
 		/* here ends the main loop */
@@ -1452,11 +1452,12 @@ static int path_init(int dfd, const char *name, unsigned int flags,
 	nd->flags = flags | LOOKUP_JUMPED;
 	nd->depth = 0;
 	if (flags & LOOKUP_ROOT) {
-		struct inode *inode = nd->root.dentry->d_inode;
+		struct dentry *dentry = nd->root.dentry;
+		struct inode *inode = dentry->d_inode;
 		if (*name) {
-			if (!inode->i_op->lookup)
+			if (!(dentry->d_flags & DCACHE_OP_LOOKUP))
 				return -ENOTDIR;
-			retval = inode_permission(inode, MAY_EXEC);
+			retval = exec_permission(dentry, inode, 0);
 			if (retval)
 				return retval;
 		}
@@ -1599,7 +1600,7 @@ static int path_lookupat(int dfd, const char *name,
 		err = complete_walk(nd);
 
 	if (!err && nd->flags & LOOKUP_DIRECTORY) {
-		if (!nd->inode->i_op->lookup) {
+		if (!(nd->path.dentry->d_flags & DCACHE_OP_LOOKUP)) {
 			path_put(&nd->path);
 			err = -ENOTDIR;
 		}
@@ -1672,7 +1673,7 @@ static struct dentry *__lookup_hash(struct qstr *name,
 	struct dentry *dentry;
 	int err;
 
-	err = exec_permission(inode, 0);
+	err = exec_permission(base, inode, 0);
 	if (err)
 		return ERR_PTR(err);
 
@@ -2099,7 +2100,7 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 
 		error = -ENOTDIR;
 		if (nd->flags & LOOKUP_DIRECTORY) {
-			if (!nd->inode->i_op->lookup)
+			if (!(nd->path.dentry->d_flags & DCACHE_OP_LOOKUP))
 				goto exit;
 		}
 		audit_inode(pathname, nd->path.dentry);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 19d90a55541d..8cd1a3d5b320 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -208,14 +208,20 @@ struct dentry_operations {
 #define DCACHE_CANT_MOUNT	0x0100
 #define DCACHE_GENOCIDE		0x0200
 
+/* Avoid 'dentry->d_op->op' dereference chain */
 #define DCACHE_OP_HASH		0x1000
 #define DCACHE_OP_COMPARE	0x2000
 #define DCACHE_OP_REVALIDATE	0x4000
 #define DCACHE_OP_DELETE	0x8000
 
-#define DCACHE_MOUNTED		0x10000	/* is a mountpoint */
-#define DCACHE_NEED_AUTOMOUNT	0x20000	/* handle automount on this dir */
-#define DCACHE_MANAGE_TRANSIT	0x40000	/* manage transit from this dirent */
+/* Avoid 'inode->i_op->op' dereference chain */
+#define DCACHE_OP_LOOKUP	0x10000
+#define DCACHE_OP_PERMISSION	0x20000
+#define DCACHE_OP_FOLLOW_LINK	0x40000
+
+#define DCACHE_MOUNTED		0x100000	/* is a mountpoint */
+#define DCACHE_NEED_AUTOMOUNT	0x200000	/* handle automount on this dir */
+#define DCACHE_MANAGE_TRANSIT	0x400000	/* manage transit from this dirent */
 #define DCACHE_MANAGED_DENTRY \
 	(DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT)
 
-- 
1.7.6.233.gd79bc.dirty


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: VFS pathname walking cleanups (i_op and ACL access)
  2011-07-22 17:37 VFS pathname walking cleanups (i_op and ACL access) Linus Torvalds
  2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
@ 2011-07-22 17:40 ` Christoph Hellwig
  2011-07-22 17:45 ` [PATCH 2/2] vfs: move ACL cache lookup into generic code Linus Torvalds
  2 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-07-22 17:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 10:37:06AM -0700, Linus Torvalds wrote:
> The second patch moves the generic ACL caching case into the VFS layer the 
> way it should have been done a long time ago (we already moved the cache 
> fields into the core inode data structure), and just removes the need to 
> call into the filesystem for 'check_acl()' for the common cached case.

I've been wanting to that for a while and already had local patches for
it that needed forward porting.  I'll compare them with your version.

> 
> I don't expect this to be really at all controversial, but doing this, I 
> noticed that some of the ACL cache setup was a bit odd in filesystems. In 
> the case of XFS, for example, the test for
> 
> 	if (!XFS_IFORK_Q(ip))
> 		return -EAGAIN;
> 
> still remains in the filesystem-specific ACL check routine, which means 
> that since it doesn't show up in the cache, that case is now going to kick 
> XFS out of RCU. The fix seems to be trivial (just do a set_acl_cache() of 
> NULL for that case), but I didn't want to go into locking issues, so I 
> just added a comment.

It really just means we don't have an ACL.  I'll happily fix this in a
late merge window update.  We'll already need to do one as I have a few
small changes that should go in ontop of Jens' and Als trees.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 2/2] vfs: move ACL cache lookup into generic code
  2011-07-22 17:37 VFS pathname walking cleanups (i_op and ACL access) Linus Torvalds
  2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
  2011-07-22 17:40 ` VFS pathname walking cleanups (i_op and ACL access) Christoph Hellwig
@ 2011-07-22 17:45 ` Linus Torvalds
  2011-07-22 17:50   ` Christoph Hellwig
  2011-07-23  2:34   ` [PATCH] " Linus Torvalds
  2 siblings, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-22 17:45 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, linux-fsdevel


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 22 Jul 2011 10:04:21 -0700
Subject: [PATCH 2/2] vfs: move ACL cache lookup into generic code

.. and streamline 'check_acl()' semantics for lowlevel file systems.

This removes the 'check_acl' argument from 'generic_permission()', and
just makes the rule be that it should be in inode->i_op, whether you
have your own filesystem-specific ->permission callback or not.

The end result is a streamlined ACL check that doesn't need to load the
inode->i_op->check_acl pointer at all for the common cached case.  So
now the pathname lookup never even needs to look up i_op for the common
case.

The filesystems also don't need to check for RCU walk case in their
acl_check() functions, because that is all handled at a VFS layer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

So with these two patches, profiles of very pathname intensive loads show 
that it's the actual name comparison and hash calculation that is the 
expensive part - rather than stupid work that we could avoid.

Which is as it should be.

But as mentioned, that will depend on the filesystem getting the ACL 
caching right, in order to not drop out of RCU. There's two choices: use 
the "i_op->permission()" function and handle RCU _and_ ACL on your own, or 
use just "check_acl()" and make sure that you use the ACL caches, and let 
generic_permission() DTRT.

 fs/9p/acl.c                |    3 --
 fs/afs/security.c          |    2 +-
 fs/btrfs/acl.c             |   19 ++++--------
 fs/btrfs/inode.c           |    7 ++++-
 fs/ceph/inode.c            |    2 +-
 fs/cifs/cifsfs.c           |    2 +-
 fs/ext2/acl.c              |    6 ----
 fs/ext3/acl.c              |    6 ----
 fs/ext4/acl.c              |    6 ----
 fs/fuse/dir.c              |    4 +-
 fs/gfs2/acl.c              |    6 ----
 fs/gfs2/inode.c            |    5 ++-
 fs/hostfs/hostfs_kern.c    |    2 +-
 fs/hpfs/namei.c            |    2 +-
 fs/jffs2/acl.c             |    3 --
 fs/jfs/acl.c               |    3 --
 fs/namei.c                 |   67 ++++++++++++++++++++++++++++++++++++--------
 fs/nfs/dir.c               |    2 +-
 fs/nilfs2/inode.c          |    2 +-
 fs/ocfs2/acl.c             |    3 --
 fs/ocfs2/file.c            |    4 ++-
 fs/proc/base.c             |    2 +-
 fs/reiserfs/xattr.c        |   17 ++++-------
 fs/sysfs/inode.c           |    2 +-
 fs/xfs/linux-2.6/xfs_acl.c |    8 +----
 include/linux/fs.h         |    3 +-
 26 files changed, 95 insertions(+), 93 deletions(-)

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index 535ab6eccb1a..d2b04743d7ab 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -101,9 +101,6 @@ int v9fs_check_acl(struct inode *inode, int mask, unsigned int flags)
 	struct posix_acl *acl;
 	struct v9fs_session_info *v9ses;
 
-	if (flags & IPERM_FLAG_RCU)
-		return -ECHILD;
-
 	v9ses = v9fs_inode2v9ses(inode);
 	if (((v9ses->flags & V9FS_ACCESS_MASK) != V9FS_ACCESS_CLIENT) ||
 			((v9ses->flags & V9FS_ACL_MASK) != V9FS_POSIX_ACL)) {
diff --git a/fs/afs/security.c b/fs/afs/security.c
index f44b9d355377..745ee654165f 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -350,7 +350,7 @@ int afs_permission(struct inode *inode, int mask, unsigned int flags)
 	}
 
 	key_put(key);
-	ret = generic_permission(inode, mask, flags, NULL);
+	ret = generic_permission(inode, mask, flags);
 	_leave(" = %d", ret);
 	return ret;
 
diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index f66fc9959733..a7978872c52b 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -198,20 +198,15 @@ out:
 int btrfs_check_acl(struct inode *inode, int mask, unsigned int flags)
 {
 	int error = -EAGAIN;
+	struct posix_acl *acl;
 
-	if (flags & IPERM_FLAG_RCU) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			error = -ECHILD;
+	acl = btrfs_get_acl(inode, ACL_TYPE_ACCESS);
+	if (IS_ERR(acl))
+		return PTR_ERR(acl);
 
-	} else {
-		struct posix_acl *acl;
-		acl = btrfs_get_acl(inode, ACL_TYPE_ACCESS);
-		if (IS_ERR(acl))
-			return PTR_ERR(acl);
-		if (acl) {
-			error = posix_acl_permission(inode, acl, mask);
-			posix_acl_release(acl);
-		}
+	if (acl) {
+		error = posix_acl_permission(inode, acl, mask);
+		posix_acl_release(acl);
 	}
 
 	return error;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3601f0aebddf..076d29aaceca 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7339,7 +7339,7 @@ static int btrfs_permission(struct inode *inode, int mask, unsigned int flags)
 		return -EROFS;
 	if ((BTRFS_I(inode)->flags & BTRFS_INODE_READONLY) && (mask & MAY_WRITE))
 		return -EACCES;
-	return generic_permission(inode, mask, flags, btrfs_check_acl);
+	return generic_permission(inode, mask, flags);
 }
 
 static const struct inode_operations btrfs_dir_inode_operations = {
@@ -7359,10 +7359,12 @@ static const struct inode_operations btrfs_dir_inode_operations = {
 	.listxattr	= btrfs_listxattr,
 	.removexattr	= btrfs_removexattr,
 	.permission	= btrfs_permission,
+	.check_acl	= btrfs_check_acl,
 };
 static const struct inode_operations btrfs_dir_ro_inode_operations = {
 	.lookup		= btrfs_lookup,
 	.permission	= btrfs_permission,
+	.check_acl	= btrfs_check_acl,
 };
 
 static const struct file_operations btrfs_dir_file_operations = {
@@ -7430,12 +7432,14 @@ static const struct inode_operations btrfs_file_inode_operations = {
 	.listxattr      = btrfs_listxattr,
 	.removexattr	= btrfs_removexattr,
 	.permission	= btrfs_permission,
+	.check_acl	= btrfs_check_acl,
 	.fiemap		= btrfs_fiemap,
 };
 static const struct inode_operations btrfs_special_inode_operations = {
 	.getattr	= btrfs_getattr,
 	.setattr	= btrfs_setattr,
 	.permission	= btrfs_permission,
+	.check_acl	= btrfs_check_acl,
 	.setxattr	= btrfs_setxattr,
 	.getxattr	= btrfs_getxattr,
 	.listxattr	= btrfs_listxattr,
@@ -7447,6 +7451,7 @@ static const struct inode_operations btrfs_symlink_inode_operations = {
 	.put_link	= page_put_link,
 	.getattr	= btrfs_getattr,
 	.permission	= btrfs_permission,
+	.check_acl	= btrfs_check_acl,
 	.setxattr	= btrfs_setxattr,
 	.getxattr	= btrfs_getxattr,
 	.listxattr	= btrfs_listxattr,
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index d8858e96ab18..beb5d55d6fd2 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1805,7 +1805,7 @@ int ceph_permission(struct inode *inode, int mask, unsigned int flags)
 	err = ceph_do_getattr(inode, CEPH_CAP_AUTH_SHARED);
 
 	if (!err)
-		err = generic_permission(inode, mask, flags, NULL);
+		err = generic_permission(inode, mask, flags);
 	return err;
 }
 
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index bc4b12ca537b..b79804fa410f 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -239,7 +239,7 @@ static int cifs_permission(struct inode *inode, int mask, unsigned int flags)
 		on the client (above and beyond ACL on servers) for
 		servers which do not support setting and viewing mode bits,
 		so allowing client to check permissions is useful */
-		return generic_permission(inode, mask, flags, NULL);
+		return generic_permission(inode, mask, flags);
 }
 
 static struct kmem_cache *cifs_inode_cachep;
diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c
index abea5a17c764..8bb00df4982e 100644
--- a/fs/ext2/acl.c
+++ b/fs/ext2/acl.c
@@ -236,12 +236,6 @@ ext2_check_acl(struct inode *inode, int mask, unsigned int flags)
 {
 	struct posix_acl *acl;
 
-	if (flags & IPERM_FLAG_RCU) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = ext2_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
index 9d021c0d472a..5de762eeb662 100644
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@@ -244,12 +244,6 @@ ext3_check_acl(struct inode *inode, int mask, unsigned int flags)
 {
 	struct posix_acl *acl;
 
-	if (flags & IPERM_FLAG_RCU) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = ext3_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 21eacd7b7d79..9a4fa6347270 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -242,12 +242,6 @@ ext4_check_acl(struct inode *inode, int mask, unsigned int flags)
 {
 	struct posix_acl *acl;
 
-	if (flags & IPERM_FLAG_RCU) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = ext4_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d50160714595..0a2fcd860ad6 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1018,7 +1018,7 @@ static int fuse_permission(struct inode *inode, int mask, unsigned int flags)
 	}
 
 	if (fc->flags & FUSE_DEFAULT_PERMISSIONS) {
-		err = generic_permission(inode, mask, flags, NULL);
+		err = generic_permission(inode, mask, flags);
 
 		/* If permission is denied, try to refresh file
 		   attributes.  This is also needed, because the root
@@ -1027,7 +1027,7 @@ static int fuse_permission(struct inode *inode, int mask, unsigned int flags)
 			err = fuse_perm_getattr(inode, flags);
 			if (!err)
 				err = generic_permission(inode, mask,
-							flags, NULL);
+							flags);
 		}
 
 		/* Note: the opposite of the above test does not
diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index cbc07155b1a0..71690e2dcada 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -80,12 +80,6 @@ int gfs2_check_acl(struct inode *inode, int mask, unsigned int flags)
 	struct posix_acl *acl;
 	int error;
 
-	if (flags & IPERM_FLAG_RCU) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = gfs2_acl_get(GFS2_I(inode), ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 03e0c529063e..49786f8228d9 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -1564,7 +1564,7 @@ int gfs2_permission(struct inode *inode, int mask, unsigned int flags)
 	if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
 		error = -EACCES;
 	else
-		error = generic_permission(inode, mask, flags, gfs2_check_acl);
+		error = generic_permission(inode, mask, flags);
 	if (unlock)
 		gfs2_glock_dq_uninit(&i_gh);
 
@@ -1847,6 +1847,7 @@ out:
 
 const struct inode_operations gfs2_file_iops = {
 	.permission = gfs2_permission,
+	.check_acl = gfs2_check_acl,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
@@ -1867,6 +1868,7 @@ const struct inode_operations gfs2_dir_iops = {
 	.mknod = gfs2_mknod,
 	.rename = gfs2_rename,
 	.permission = gfs2_permission,
+	.check_acl = gfs2_check_acl,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
@@ -1881,6 +1883,7 @@ const struct inode_operations gfs2_symlink_iops = {
 	.follow_link = gfs2_follow_link,
 	.put_link = gfs2_put_link,
 	.permission = gfs2_permission,
+	.check_acl = gfs2_check_acl,
 	.setattr = gfs2_setattr,
 	.getattr = gfs2_getattr,
 	.setxattr = gfs2_setxattr,
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index 2638c834ed28..a98d0d1aef65 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -770,7 +770,7 @@ int hostfs_permission(struct inode *ino, int desired, unsigned int flags)
 		err = access_file(name, r, w, x);
 	__putname(name);
 	if (!err)
-		err = generic_permission(ino, desired, flags, NULL);
+		err = generic_permission(ino, desired, flags);
 	return err;
 }
 
diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
index acf95dab2aac..bd2ce7dd8df3 100644
--- a/fs/hpfs/namei.c
+++ b/fs/hpfs/namei.c
@@ -398,7 +398,7 @@ again:
 			hpfs_unlock(dir->i_sb);
 			return -ENOSPC;
 		}
-		if (generic_permission(inode, MAY_WRITE, 0, NULL) ||
+		if (generic_permission(inode, MAY_WRITE, 0) ||
 		    !S_ISREG(inode->i_mode) ||
 		    get_write_access(inode)) {
 			d_rehash(dentry);
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index 828a0e1ea438..5648b0c1660c 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -264,9 +264,6 @@ int jffs2_check_acl(struct inode *inode, int mask, unsigned int flags)
 	struct posix_acl *acl;
 	int rc;
 
-	if (flags & IPERM_FLAG_RCU)
-		return -ECHILD;
-
 	acl = jffs2_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
index e5de9422fa32..787183eb2ce6 100644
--- a/fs/jfs/acl.c
+++ b/fs/jfs/acl.c
@@ -118,9 +118,6 @@ int jfs_check_acl(struct inode *inode, int mask, unsigned int flags)
 {
 	struct posix_acl *acl;
 
-	if (flags & IPERM_FLAG_RCU)
-		return -ECHILD;
-
 	acl = jfs_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/namei.c b/fs/namei.c
index 02b680e0e816..e05c335a9a27 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -32,6 +32,7 @@
 #include <linux/fcntl.h>
 #include <linux/device_cgroup.h>
 #include <linux/fs_struct.h>
+#include <linux/posix_acl.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -173,11 +174,57 @@ void putname(const char *name)
 EXPORT_SYMBOL(putname);
 #endif
 
+static int check_acl(struct inode *inode, int mask, unsigned int flags)
+{
+	struct posix_acl *acl;
+
+	/*
+	 * Under RCU walk, we cannot even do a "get_cached_acl()",
+	 * because that involves locking and getting a refcount on
+	 * a cached ACL.
+	 *
+	 * So the only case we handle during RCU walking is the
+	 * case of a cached "no ACL at all", which needs no locks
+	 * or refcounts.
+	 */
+	if (flags & IPERM_FLAG_RCU) {
+		if (negative_cached_acl(inode, ACL_TYPE_ACCESS))
+			return -EAGAIN;
+		return -ECHILD;
+	}
+
+	acl = get_cached_acl(inode, ACL_TYPE_ACCESS);
+
+	/*
+	 * A filesystem can force a ACL callback by just never
+	 * filling the ACL cache. But normally you'd fill the
+	 * cache either at inode instantiation time, or on the
+	 * first ->check_acl call.
+	 *
+	 * If the filesystem doesn't have a check_acl() function
+	 * at all, we'll just create the negative cache entry.
+	 */
+	if (acl == ACL_NOT_CACHED) {
+		if (inode->i_op->check_acl)
+			return inode->i_op->check_acl(inode, mask, flags);
+
+		set_cached_acl(inode, ACL_TYPE_ACCESS, NULL);
+		return -EAGAIN;
+	}
+
+	if (acl) {
+		int error = posix_acl_permission(inode, acl, mask);
+		posix_acl_release(acl);
+		return error;
+	}
+
+	return -EAGAIN;
+}
+
 /*
  * This does basic POSIX ACL permission checking
  */
-static int acl_permission_check(struct inode *inode, int mask, unsigned int flags,
-		int (*check_acl)(struct inode *inode, int mask, unsigned int flags))
+static int acl_permission_check(struct inode *inode, int mask, unsigned int flags)
 {
 	unsigned int mode = inode->i_mode;
 
@@ -189,7 +236,7 @@ static int acl_permission_check(struct inode *inode, int mask, unsigned int flag
 	if (current_fsuid() == inode->i_uid)
 		mode >>= 6;
 	else {
-		if (IS_POSIXACL(inode) && (mode & S_IRWXG) && check_acl) {
+		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
 			int error = check_acl(inode, mask, flags);
 			if (error != -EAGAIN)
 				return error;
@@ -212,7 +259,6 @@ other_perms:
  * generic_permission -  check for access rights on a Posix-like filesystem
  * @inode:	inode to check access rights for
  * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
- * @check_acl:	optional callback to check for Posix ACLs
  * @flags:	IPERM_FLAG_ flags.
  *
  * Used to check for read/write/execute permissions on a file.
@@ -220,19 +266,18 @@ other_perms:
  * for filesystem access without changing the "normal" uids which
  * are used for other things.
  *
- * generic_permission is rcu-walk aware. It returns -ECHILD in case an rcu-walk
+ * acl_permission_check() is rcu-walk aware. It returns -ECHILD in case an rcu-walk
  * request cannot be satisfied (eg. requires blocking or too much complexity).
  * It would then be called again in ref-walk mode.
  */
-int generic_permission(struct inode *inode, int mask, unsigned int flags,
-	int (*check_acl)(struct inode *inode, int mask, unsigned int flags))
+int generic_permission(struct inode *inode, int mask, unsigned int flags)
 {
 	int ret;
 
 	/*
 	 * Do the basic POSIX ACL permission checks.
 	 */
-	ret = acl_permission_check(inode, mask, flags, check_acl);
+	ret = acl_permission_check(inode, mask, flags);
 	if (ret != -EACCES)
 		return ret;
 
@@ -290,8 +335,7 @@ int inode_permission(struct inode *inode, int mask)
 	if (inode->i_op->permission)
 		retval = inode->i_op->permission(inode, mask, 0);
 	else
-		retval = generic_permission(inode, mask, 0,
-				inode->i_op->check_acl);
+		retval = generic_permission(inode, mask, 0);
 
 	if (retval)
 		return retval;
@@ -585,8 +629,7 @@ static inline int exec_permission(struct dentry *dentry, struct inode *inode, un
 	if (dentry->d_flags & DCACHE_OP_PERMISSION) {
 		ret = inode->i_op->permission(inode, MAY_EXEC, flags);
 	} else {
-		ret = acl_permission_check(inode, MAY_EXEC, flags,
-				inode->i_op->check_acl);
+		ret = acl_permission_check(inode, MAY_EXEC, flags);
 	}
 	if (likely(!ret))
 		goto ok;
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ededdbd0db38..0485dca34fb1 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2328,7 +2328,7 @@ out:
 out_notsup:
 	res = nfs_revalidate_inode(NFS_SERVER(inode), inode);
 	if (res == 0)
-		res = generic_permission(inode, mask, flags, NULL);
+		res = generic_permission(inode, mask, flags);
 	goto out;
 }
 
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index b9b45fc2903e..650aa7755003 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -806,7 +806,7 @@ int nilfs_permission(struct inode *inode, int mask, unsigned int flags)
 	    root->cno != NILFS_CPTREE_CURRENT_CNO)
 		return -EROFS; /* snapshot is not writable */
 
-	return generic_permission(inode, mask, flags, NULL);
+	return generic_permission(inode, mask, flags);
 }
 
 int nilfs_load_inode_block(struct inode *inode, struct buffer_head **pbh)
diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index e913ad130fdd..bf1c5baad160 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -297,9 +297,6 @@ int ocfs2_check_acl(struct inode *inode, int mask, unsigned int flags)
 	struct posix_acl *acl;
 	int ret = -EAGAIN;
 
-	if (flags & IPERM_FLAG_RCU)
-		return -ECHILD;
-
 	osb = OCFS2_SB(inode->i_sb);
 	if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
 		return ret;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index b1e35a392ca5..adb340e37785 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1293,7 +1293,7 @@ int ocfs2_permission(struct inode *inode, int mask, unsigned int flags)
 		goto out;
 	}
 
-	ret = generic_permission(inode, mask, flags, ocfs2_check_acl);
+	ret = generic_permission(inode, mask, flags);
 
 	ocfs2_inode_unlock(inode, 0);
 out:
@@ -2588,6 +2588,7 @@ const struct inode_operations ocfs2_file_iops = {
 	.setattr	= ocfs2_setattr,
 	.getattr	= ocfs2_getattr,
 	.permission	= ocfs2_permission,
+	.check_acl	= ocfs2_check_acl,
 	.setxattr	= generic_setxattr,
 	.getxattr	= generic_getxattr,
 	.listxattr	= ocfs2_listxattr,
@@ -2599,6 +2600,7 @@ const struct inode_operations ocfs2_special_file_iops = {
 	.setattr	= ocfs2_setattr,
 	.getattr	= ocfs2_getattr,
 	.permission	= ocfs2_permission,
+	.check_acl	= ocfs2_check_acl,
 };
 
 /*
diff --git a/fs/proc/base.c b/fs/proc/base.c
index fc5bc2767692..8b8470113576 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2169,7 +2169,7 @@ static const struct file_operations proc_fd_operations = {
  */
 static int proc_fd_permission(struct inode *inode, int mask, unsigned int flags)
 {
-	int rv = generic_permission(inode, mask, flags, NULL);
+	int rv = generic_permission(inode, mask, flags);
 	if (rv == 0)
 		return 0;
 	if (task_pid(current) == proc_pid(inode))
diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c
index d78089690965..5bc43c088169 100644
--- a/fs/reiserfs/xattr.c
+++ b/fs/reiserfs/xattr.c
@@ -873,8 +873,11 @@ static int reiserfs_check_acl(struct inode *inode, int mask, unsigned int flags)
 	struct posix_acl *acl;
 	int error = -EAGAIN; /* do regular unix permission checks by default */
 
-	if (flags & IPERM_FLAG_RCU)
-		return -ECHILD;
+	/*
+	 * Stat data v1 doesn't support ACLs.
+	 */
+	if (get_inode_sd_version(inode) != STAT_DATA_V1)
+		return -EAGAIN;
 
 	acl = reiserfs_get_acl(inode, ACL_TYPE_ACCESS);
 
@@ -961,15 +964,7 @@ int reiserfs_permission(struct inode *inode, int mask, unsigned int flags)
 	if (IS_PRIVATE(inode))
 		return 0;
 
-#ifdef CONFIG_REISERFS_FS_XATTR
-	/*
-	 * Stat data v1 doesn't support ACLs.
-	 */
-	if (get_inode_sd_version(inode) != STAT_DATA_V1)
-		return generic_permission(inode, mask, flags,
-					reiserfs_check_acl);
-#endif
-	return generic_permission(inode, mask, flags, NULL);
+	return generic_permission(inode, mask, flags);
 }
 
 static int xattr_hide_revalidate(struct dentry *dentry, struct nameidata *nd)
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 0a12eb89cd32..a37165c64757 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -362,5 +362,5 @@ int sysfs_permission(struct inode *inode, int mask, unsigned int flags)
 	sysfs_refresh_inode(sd, inode);
 	mutex_unlock(&sysfs_mutex);
 
-	return generic_permission(inode, mask, flags, NULL);
+	return generic_permission(inode, mask, flags);
 }
diff --git a/fs/xfs/linux-2.6/xfs_acl.c b/fs/xfs/linux-2.6/xfs_acl.c
index 39f4f809bb68..b550f05b1843 100644
--- a/fs/xfs/linux-2.6/xfs_acl.c
+++ b/fs/xfs/linux-2.6/xfs_acl.c
@@ -231,16 +231,12 @@ xfs_check_acl(struct inode *inode, int mask, unsigned int flags)
 	/*
 	 * If there is no attribute fork no ACL exists on this inode and
 	 * we can skip the whole exercise.
+	 *
+	 * FIXME! Fill the cache! Locking?
 	 */
 	if (!XFS_IFORK_Q(ip))
 		return -EAGAIN;
 
-	if (flags & IPERM_FLAG_RCU) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = xfs_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b5b979247863..f742f27bdcfa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2188,8 +2188,7 @@ extern sector_t bmap(struct inode *, sector_t);
 #endif
 extern int notify_change(struct dentry *, struct iattr *);
 extern int inode_permission(struct inode *, int);
-extern int generic_permission(struct inode *, int, unsigned int,
-		int (*check_acl)(struct inode *, int, unsigned int));
+extern int generic_permission(struct inode *, int, unsigned int);
 
 static inline bool execute_ok(struct inode *inode)
 {
-- 
1.7.6.233.gd79bc.dirty


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
@ 2011-07-22 17:47   ` Christoph Hellwig
  2011-07-22 23:40   ` Al Viro
  2011-07-23  3:55   ` [PATCH] " Linus Torvalds
  2 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-07-22 17:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 10:37:58AM -0700, Linus Torvalds wrote:
> +		if (unlikely(inode->i_op->lookup))
> +			dentry->d_flags |= DCACHE_OP_LOOKUP;

Would simply adding an S_ISDIR check first help the same without adding
another flag?

> +		if (unlikely(inode->i_op->permission))
> +			dentry->d_flags |= DCACHE_OP_PERMISSION;
> +		if (unlikely(inode->i_op->follow_link))
> +			dentry->d_flags |= DCACHE_OP_FOLLOW_LINK;

Now that the follow_link abuse for automoun is gone I wonder if we really
need to keep supporting it on anything but links.  If not we could
avoid the additional flag by simply checking of S_ISLNK first.


We can also add new dentry to inode pointers without going through
__d_instanciate in d_obtain_alias, so that case also needs to set your
flags.

> diff --git a/fs/namei.c b/fs/namei.c
> index 14ab8d3f2f0c..02b680e0e816 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -577,12 +577,12 @@ static int complete_walk(struct nameidata *nd)
>   * short-cut DAC fails, then call ->permission() to do more
>   * complete permission check.
>   */
> -static inline int exec_permission(struct inode *inode, unsigned int flags)
> +static inline int exec_permission(struct dentry *dentry, struct inode *inode, unsigned int flags)
>  {
>  	int ret;
>  	struct user_namespace *ns = inode_userns(inode);
>  
> -	if (inode->i_op->permission) {
> +	if (dentry->d_flags & DCACHE_OP_PERMISSION) {
>  		ret = inode->i_op->permission(inode, MAY_EXEC, flags);
>  	} else {

The permission bits clash with Al changes in that area, including the
removal of exec_permission and the ->permission prototype change.

I think you should rebase ontop of his for-next tree.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] vfs: move ACL cache lookup into generic code
  2011-07-22 17:45 ` [PATCH 2/2] vfs: move ACL cache lookup into generic code Linus Torvalds
@ 2011-07-22 17:50   ` Christoph Hellwig
  2011-07-22 17:54     ` Linus Torvalds
  2011-07-23  2:34   ` [PATCH] " Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2011-07-22 17:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 10:45:04AM -0700, Linus Torvalds wrote:
> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 22 Jul 2011 10:04:21 -0700
> Subject: [PATCH 2/2] vfs: move ACL cache lookup into generic code
> 
> .. and streamline 'check_acl()' semantics for lowlevel file systems.
> 
> This removes the 'check_acl' argument from 'generic_permission()', and
> just makes the rule be that it should be in inode->i_op, whether you
> have your own filesystem-specific ->permission callback or not.

Al has alread done that in his for-next tree, see:

http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commitdiff;h=178ea73521d64ba41d7aa5488fb9f549c6d4507d


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] vfs: move ACL cache lookup into generic code
  2011-07-22 17:50   ` Christoph Hellwig
@ 2011-07-22 17:54     ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-22 17:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Al Viro, linux-fsdevel

On Fri, Jul 22, 2011 at 10:50 AM, Christoph Hellwig <hch@lst.de> wrote:
>
> Al has alread done that in his for-next tree, see:
>
> http://git.kernel.org/?p=linux/kernel/git/viro/vfs-2.6.git;a=commitdiff;h=178ea73521d64ba41d7aa5488fb9f549c6d4507d

Goodie. Looks largely identical for that code. Works for me.

                      Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
  2011-07-22 17:47   ` Christoph Hellwig
@ 2011-07-22 23:40   ` Al Viro
  2011-07-22 23:54     ` Linus Torvalds
  2011-07-23  3:55   ` [PATCH] " Linus Torvalds
  2 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-22 23:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 10:37:58AM -0700, Linus Torvalds wrote:
> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 22 Jul 2011 08:44:51 -0700
> Subject: [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking
> 
> One of the biggest remaining unnecessary costs in path walking is the
> pointer chasing in inode operations.  We already avoided the
> dentry->d_op derferences with the DCACHE_OP_xyz flags, this just starts
> doing the same thing for the i_op->xyz cases.

> +		if (unlikely(inode->i_op->lookup))
> +			dentry->d_flags |= DCACHE_OP_LOOKUP;
> +		if (unlikely(inode->i_op->permission))
> +			dentry->d_flags |= DCACHE_OP_PERMISSION;
> +		if (unlikely(inode->i_op->follow_link))
> +			dentry->d_flags |= DCACHE_OP_FOLLOW_LINK;

I'm not sure about that one...  What happens after ln -s foo bar; rm bar;
touch bar;?  IOW, where do you clean them?

Other than that, it needs rebase on top of #for-next - conflicts with
->permission() series.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-22 23:40   ` Al Viro
@ 2011-07-22 23:54     ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-22 23:54 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 4:40 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I'm not sure about that one...  What happens after ln -s foo bar; rm bar;
> touch bar;?  IOW, where do you clean them?

Good point. I didn't think the inode would ever change, but you're
right, it does get cleared. Will look at it.

> Other than that, it needs rebase on top of #for-next - conflicts with
> ->permission() series.

.. and I'll do that after I see your pull. Most of the ACL patch goes
away at that point too (it looks close enough to your permission patch
that there will be very few conflicts, but I'll rebase rather than
merge).

                                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-22 17:45 ` [PATCH 2/2] vfs: move ACL cache lookup into generic code Linus Torvalds
  2011-07-22 17:50   ` Christoph Hellwig
@ 2011-07-23  2:34   ` Linus Torvalds
  2011-07-23  3:29     ` Al Viro
  1 sibling, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2011-07-23  2:34 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, linux-fsdevel


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 22 Jul 2011 19:30:19 -0700
Subject: [PATCH] vfs: move ACL cache lookup into generic code

This moves logic for checking the cached ACL values from low-level
filesystems into generic code.  The end result is a streamlined ACL
check that doesn't need to load the inode->i_op->check_acl pointer at
all for the common cached case.

The filesystems also don't need to check for a non-blocking RCU walk
case in their acl_check() functions, because that is all handled at a
VFS layer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This is the previous 2/2 patch, rebased on top of Al's VFS tree that I 
just pulled in.

I haven't done 1/2, because as Al correctly points out, I do need to clear 
the bits when d_inode is cleared. 

Comments on this part? It's untested, but it's a very straightforward 
conversion of my previous patch (that I did test).

 fs/9p/acl.c                |    3 --
 fs/btrfs/acl.c             |   19 ++++++----------
 fs/ext2/acl.c              |    6 -----
 fs/ext3/acl.c              |    6 -----
 fs/ext4/acl.c              |    6 -----
 fs/generic_acl.c           |   19 ++++++----------
 fs/gfs2/acl.c              |    6 -----
 fs/jffs2/acl.c             |    3 --
 fs/jfs/acl.c               |    3 --
 fs/namei.c                 |   52 +++++++++++++++++++++++++++++++++++++++++--
 fs/ocfs2/acl.c             |    3 --
 fs/xfs/linux-2.6/xfs_acl.c |    8 +-----
 12 files changed, 65 insertions(+), 69 deletions(-)

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index e98f56d3787d..59455bf0f5dd 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -101,9 +101,6 @@ int v9fs_check_acl(struct inode *inode, int mask)
 	struct posix_acl *acl;
 	struct v9fs_session_info *v9ses;
 
-	if (mask & MAY_NOT_BLOCK)
-		return -ECHILD;
-
 	v9ses = v9fs_inode2v9ses(inode);
 	if (((v9ses->flags & V9FS_ACCESS_MASK) != V9FS_ACCESS_CLIENT) ||
 			((v9ses->flags & V9FS_ACL_MASK) != V9FS_POSIX_ACL)) {
diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 9f62ab2a7282..c13ea9fbf36b 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -198,19 +198,14 @@ out:
 int btrfs_check_acl(struct inode *inode, int mask)
 {
 	int error = -EAGAIN;
+	struct posix_acl *acl;
 
-	if (mask & MAY_NOT_BLOCK) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			error = -ECHILD;
-	} else {
-		struct posix_acl *acl;
-		acl = btrfs_get_acl(inode, ACL_TYPE_ACCESS);
-		if (IS_ERR(acl))
-			return PTR_ERR(acl);
-		if (acl) {
-			error = posix_acl_permission(inode, acl, mask);
-			posix_acl_release(acl);
-		}
+	acl = btrfs_get_acl(inode, ACL_TYPE_ACCESS);
+	if (IS_ERR(acl))
+		return PTR_ERR(acl);
+	if (acl) {
+		error = posix_acl_permission(inode, acl, mask);
+		posix_acl_release(acl);
 	}
 
 	return error;
diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c
index bfe651f9ae16..ced1c478ebdb 100644
--- a/fs/ext2/acl.c
+++ b/fs/ext2/acl.c
@@ -236,12 +236,6 @@ ext2_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl;
 
-	if (mask & MAY_NOT_BLOCK) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = ext2_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
index edfeb293d4cb..5326038e8536 100644
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@@ -244,12 +244,6 @@ ext3_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl;
 
-	if (mask & MAY_NOT_BLOCK) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = ext3_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 60d900fcc3db..4cd9e2e4085e 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -242,12 +242,6 @@ ext4_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl;
 
-	if (mask & MAY_NOT_BLOCK) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = ext4_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/generic_acl.c b/fs/generic_acl.c
index 70e90b4974ce..4949473d3542 100644
--- a/fs/generic_acl.c
+++ b/fs/generic_acl.c
@@ -192,18 +192,13 @@ generic_acl_chmod(struct inode *inode)
 int
 generic_check_acl(struct inode *inode, int mask)
 {
-	if (mask & MAY_NOT_BLOCK) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-	} else {
-		struct posix_acl *acl;
-
-		acl = get_cached_acl(inode, ACL_TYPE_ACCESS);
-		if (acl) {
-			int error = posix_acl_permission(inode, acl, mask);
-			posix_acl_release(acl);
-			return error;
-		}
+	struct posix_acl *acl;
+
+	acl = get_cached_acl(inode, ACL_TYPE_ACCESS);
+	if (acl) {
+		int error = posix_acl_permission(inode, acl, mask);
+		posix_acl_release(acl);
+		return error;
 	}
 	return -EAGAIN;
 }
diff --git a/fs/gfs2/acl.c b/fs/gfs2/acl.c
index 8ef1079f1665..48171f4c943d 100644
--- a/fs/gfs2/acl.c
+++ b/fs/gfs2/acl.c
@@ -80,12 +80,6 @@ int gfs2_check_acl(struct inode *inode, int mask)
 	struct posix_acl *acl;
 	int error;
 
-	if (mask & MAY_NOT_BLOCK) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = gfs2_acl_get(GFS2_I(inode), ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index 3675b3cdee89..556daa797e0a 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -264,9 +264,6 @@ int jffs2_check_acl(struct inode *inode, int mask)
 	struct posix_acl *acl;
 	int rc;
 
-	if (mask & MAY_NOT_BLOCK)
-		return -ECHILD;
-
 	acl = jffs2_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
index 8a0a0666d5a6..ead200eef5e4 100644
--- a/fs/jfs/acl.c
+++ b/fs/jfs/acl.c
@@ -118,9 +118,6 @@ int jfs_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl;
 
-	if (mask & MAY_NOT_BLOCK)
-		return -ECHILD;
-
 	acl = jfs_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
diff --git a/fs/namei.c b/fs/namei.c
index b7fad009bbf6..120efc76d3d0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -32,6 +32,7 @@
 #include <linux/fcntl.h>
 #include <linux/device_cgroup.h>
 #include <linux/fs_struct.h>
+#include <linux/posix_acl.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -173,12 +174,58 @@ void putname(const char *name)
 EXPORT_SYMBOL(putname);
 #endif
 
+static int check_acl(struct inode *inode, int mask)
+{
+	struct posix_acl *acl;
+
+	/*
+	 * Under RCU walk, we cannot even do a "get_cached_acl()",
+	 * because that involves locking and getting a refcount on
+	 * a cached ACL.
+	 *
+	 * So the only case we handle during RCU walking is the
+	 * case of a cached "no ACL at all", which needs no locks
+	 * or refcounts.
+	 */
+	if (mask & MAY_NOT_BLOCK) {
+	        if (negative_cached_acl(inode, ACL_TYPE_ACCESS))
+	                return -EAGAIN;
+	        return -ECHILD;
+	}
+
+	acl = get_cached_acl(inode, ACL_TYPE_ACCESS);
+
+	/*
+	 * A filesystem can force a ACL callback by just never
+	 * filling the ACL cache. But normally you'd fill the
+	 * cache either at inode instantiation time, or on the
+	 * first ->check_acl call.
+	 *
+	 * If the filesystem doesn't have a check_acl() function
+	 * at all, we'll just create the negative cache entry.
+	 */
+	if (acl == ACL_NOT_CACHED) {
+	        if (inode->i_op->check_acl)
+	                return inode->i_op->check_acl(inode, mask);
+
+	        set_cached_acl(inode, ACL_TYPE_ACCESS, NULL);
+	        return -EAGAIN;
+	}
+
+	if (acl) {
+	        int error = posix_acl_permission(inode, acl, mask);
+	        posix_acl_release(acl);
+	        return error;
+	}
+
+	return -EAGAIN;
+}
+
 /*
  * This does basic POSIX ACL permission checking
  */
 static int acl_permission_check(struct inode *inode, int mask)
 {
-	int (*check_acl)(struct inode *inode, int mask);
 	unsigned int mode = inode->i_mode;
 
 	mask &= MAY_READ | MAY_WRITE | MAY_EXEC | MAY_NOT_BLOCK;
@@ -189,8 +236,7 @@ static int acl_permission_check(struct inode *inode, int mask)
 	if (current_fsuid() == inode->i_uid)
 		mode >>= 6;
 	else {
-		check_acl = inode->i_op->check_acl;
-		if (IS_POSIXACL(inode) && (mode & S_IRWXG) && check_acl) {
+		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
 			int error = check_acl(inode, mask);
 			if (error != -EAGAIN)
 				return error;
diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
index 1cee970eb55a..4de93fb7fd5f 100644
--- a/fs/ocfs2/acl.c
+++ b/fs/ocfs2/acl.c
@@ -297,9 +297,6 @@ int ocfs2_check_acl(struct inode *inode, int mask)
 	struct posix_acl *acl;
 	int ret = -EAGAIN;
 
-	if (mask & MAY_NOT_BLOCK)
-		return -ECHILD;
-
 	osb = OCFS2_SB(inode->i_sb);
 	if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
 		return ret;
diff --git a/fs/xfs/linux-2.6/xfs_acl.c b/fs/xfs/linux-2.6/xfs_acl.c
index cac48fe22ad5..f6d065ac56b5 100644
--- a/fs/xfs/linux-2.6/xfs_acl.c
+++ b/fs/xfs/linux-2.6/xfs_acl.c
@@ -231,16 +231,12 @@ xfs_check_acl(struct inode *inode, int mask)
 	/*
 	 * If there is no attribute fork no ACL exists on this inode and
 	 * we can skip the whole exercise.
+	 *
+	 * FIXME! Fill the cache! Locking?
 	 */
 	if (!XFS_IFORK_Q(ip))
 		return -EAGAIN;
 
-	if (mask & MAY_NOT_BLOCK) {
-		if (!negative_cached_acl(inode, ACL_TYPE_ACCESS))
-			return -ECHILD;
-		return -EAGAIN;
-	}
-
 	acl = xfs_get_acl(inode, ACL_TYPE_ACCESS);
 	if (IS_ERR(acl))
 		return PTR_ERR(acl);
-- 
1.7.6.233.gd79bc.dirty

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  2:34   ` [PATCH] " Linus Torvalds
@ 2011-07-23  3:29     ` Al Viro
  2011-07-23  3:42       ` Linus Torvalds
  2011-07-23  7:47       ` Al Viro
  0 siblings, 2 replies; 37+ messages in thread
From: Al Viro @ 2011-07-23  3:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 07:34:43PM -0700, Linus Torvalds wrote:

> Comments on this part? It's untested, but it's a very straightforward 
> conversion of my previous patch (that I did test).

Looks fine and I'm OK with applying and throwing into the next pull request, 
but...

> +	 * Under RCU walk, we cannot even do a "get_cached_acl()",
> +	 * because that involves locking and getting a refcount on
> +	 * a cached ACL.

... why is that a problem?  Locking there is mere ->i_lock and getting
a refcount is atomic_inc().  Grabbing a reference might be Not Nice from
the cacheline bouncing POV, but...

IIRC, these suckers (in-core ones) are copy-on-write.  If so we should be
reasonably safe here.  We obviously have no business trying to get ACL
from fs if it's not cached, but other than that...

Mind you, if they *are* C-O-W, we could do freeing them via RCU and avoid
even bothering with reference...

<checks> oh, my...  All callers of posix_acl_chmod_masq() have exactly the
same form:
	clone = posix_acl_clone(acl, some_flags);
	if (clone)
		error = -ENOMEM;
		sod off
	posix_acl_release(acl);
	error = posix_acl_chmod_masq(clone, mode);
	if (error)
		posix_acl_release(clone);
		sod off
	do something useful
Sounds like a really dumb API to me...

They are C-O-W, all right.  With too low-level helpers exposed...

posix_acl_create_masq(): same story, apparently.  Whee...  on ocfs2:
                clone = posix_acl_clone(acl, GFP_NOFS);
                ret = -ENOMEM;
                if (!clone)
                        goto cleanup;

                mode = inode->i_mode;
                ret = posix_acl_create_masq(clone, &mode);
                if (ret >= 0) {
                        ret2 = ocfs2_acl_set_mode(inode, di_bh, handle, mode);
                        if (ret2) {
                                mlog_errno(ret2);
                                ret = ret2;
                                goto cleanup;
	...
cleanup:
        posix_acl_release(acl);
        return ret;

Leaks'R'Us...

OK, unless there are serious objections, I
	* apply Linus' patch as-is
	* fix that ocfs2 leak
	* replace posix_acl_..._masq() with saner helpers (take original
acl + other arguments, return modified clone or ERR_PTR) and kill open-coded
instances...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  3:29     ` Al Viro
@ 2011-07-23  3:42       ` Linus Torvalds
  2011-07-23  4:31         ` Al Viro
  2011-07-23  7:47       ` Al Viro
  1 sibling, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2011-07-23  3:42 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 8:29 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>> +      * Under RCU walk, we cannot even do a "get_cached_acl()",
>> +      * because that involves locking and getting a refcount on
>> +      * a cached ACL.
>
> ... why is that a problem?  Locking there is mere ->i_lock and getting
> a refcount is atomic_inc().  Grabbing a reference might be Not Nice from
> the cacheline bouncing POV, but...

I agree in theory, but it's not something we've done before. Some of
the posix-acl code is pretty disgusting, I didn't even want to go
there.

And the case that tends to really *matter* is the "no acls" case
anyway, and that's the case that is guaranteed to have no nasty races
or odd issues with having to allocate/de-allocate any acl structures.
But yes, this could be looked at.

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
  2011-07-22 17:47   ` Christoph Hellwig
  2011-07-22 23:40   ` Al Viro
@ 2011-07-23  3:55   ` Linus Torvalds
  2011-07-23 13:35     ` Christoph Hellwig
       [not found]     ` <alpine.LFD.2.02.1107251852220.13796@i5.linux-foundation.org>
  2 siblings, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-23  3:55 UTC (permalink / raw)
  To: Al Viro, Christoph Hellwig, linux-fsdevel


One of the biggest remaining unnecessary costs in path walking is the 
pointer chasing in inode operations.  We already avoided the dentry->d_op 
derferences with the DCACHE_OP_xyz flags, this just starts doing the same 
thing for the i_op->xyz cases.

Maybe-signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This is a cut-down version of the earlier PATCH 1/2, forward-ported to the 
current vfs state.

It's tested, although I don't claim any serious amount of stressing of it. 
And I changed it so that those DCACHE_OP_{LOOKUP|FOLLOW_LINK} bits are 
only really valid when there is an inode at all. So the bits don't get 
invalidated when turning the dentry negative, they only get re-validated 
when the entry is instantiated again.

If something sets dentry->d_inode without going through __d_instantiate, 
that misses it. I'm looking at d_obtain_alias(), and wondering, for 
example.

This is a very minimal version of the previous patch: those *two* places 
where we now test the new DCACHE_OP_xyz bits are where we really see the 
dereferencing of the inode->i_op in the normal path walking. HOWEVER, the 
"inode_permission()" function also shows up pretty clearly on profiles, 
and the hottest part of that is the testing of "inode->i_op->permission".

However, with Al's other changes, the old special-case "exec_permission()" 
no longer exists. The whole reason it existed before was that the execute 
permission checking on directory entries is/was special from a performance 
standpoint. So it might make sense to resurrect it, and then add the
DCACHE_OP_PERMISSION case like in my older patch.

Anyway, it's not pretty, but that double indirect into a cold-cache
"inode->i_op->xyz" really does show up. So consider this an RFC. The "move 
ACL to generic" code was a serious patch, this is more a fodder for 
discussion.

                Linus

---
 fs/dcache.c            |   10 +++++++++-
 fs/namei.c             |    4 ++--
 include/linux/dcache.h |    6 ++++++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index be18598c7fd7..3f9d1480f349 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1303,12 +1303,20 @@ EXPORT_SYMBOL(d_set_d_op);
 
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
+	unsigned int flags;
+
 	spin_lock(&dentry->d_lock);
+	flags = dentry->d_flags & ~DCACHE_OP_INODE;
 	if (inode) {
 		if (unlikely(IS_AUTOMOUNT(inode)))
-			dentry->d_flags |= DCACHE_NEED_AUTOMOUNT;
+			flags |= DCACHE_NEED_AUTOMOUNT;
 		list_add(&dentry->d_alias, &inode->i_dentry);
+		if (inode->i_op->lookup)
+			flags |= DCACHE_OP_LOOKUP;
+		if (inode->i_op->follow_link)
+			flags |= DCACHE_OP_FOLLOW_LINK;
 	}
+	dentry->d_flags = flags;
 	dentry->d_inode = inode;
 	dentry_rcuwalk_barrier(dentry);
 	spin_unlock(&dentry->d_lock);
diff --git a/fs/namei.c b/fs/namei.c
index 120efc76d3d0..8c0955c3a4b2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1262,7 +1262,7 @@ static inline int walk_component(struct nameidata *nd, struct path *path,
 		terminate_walk(nd);
 		return -ENOENT;
 	}
-	if (unlikely(inode->i_op->follow_link) && follow) {
+	if (unlikely(path->dentry->d_flags & DCACHE_OP_FOLLOW_LINK) && follow) {
 		if (nd->flags & LOOKUP_RCU) {
 			if (unlikely(unlazy_walk(nd, path->dentry))) {
 				terminate_walk(nd);
@@ -1394,7 +1394,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 				return err;
 		}
 		err = -ENOTDIR; 
-		if (!nd->inode->i_op->lookup)
+		if (!(nd->path.dentry->d_flags & DCACHE_OP_LOOKUP))
 			break;
 		continue;
 		/* here ends the main loop */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3f22d8d6d8a3..c719566c4088 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -208,6 +208,12 @@ struct dentry_operations {
 #define DCACHE_CANT_MOUNT	0x0100
 #define DCACHE_GENOCIDE		0x0200
 
+/* Avoid dereferencing inode->i_op */
+#define DCACHE_OP_LOOKUP	0x0400
+#define DCACHE_OP_FOLLOW_LINK	0x0800
+#define DCACHE_OP_INODE	(DCACHE_OP_LOOKUP | DCACHE_OP_FOLLOW_LINK)
+
+/* Avoid dereferencing dentry->d_op */
 #define DCACHE_OP_HASH		0x1000
 #define DCACHE_OP_COMPARE	0x2000
 #define DCACHE_OP_REVALIDATE	0x4000

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  3:42       ` Linus Torvalds
@ 2011-07-23  4:31         ` Al Viro
  2011-07-23  6:06           ` Al Viro
  2011-07-25  8:16           ` Aneesh Kumar K.V
  0 siblings, 2 replies; 37+ messages in thread
From: Al Viro @ 2011-07-23  4:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 08:42:35PM -0700, Linus Torvalds wrote:
> > ... why is that a problem? ?Locking there is mere ->i_lock and getting
> > a refcount is atomic_inc(). ?Grabbing a reference might be Not Nice from
> > the cacheline bouncing POV, but...
> 
> I agree in theory, but it's not something we've done before. Some of
> the posix-acl code is pretty disgusting, I didn't even want to go
> there.
> 
> And the case that tends to really *matter* is the "no acls" case
> anyway, and that's the case that is guaranteed to have no nasty races
> or odd issues with having to allocate/de-allocate any acl structures.
> But yes, this could be looked at.

Heh...  In addition to ocfs2 leak: 9p leaks nicely if v9fs_acl_mode() is
called with !S_ISDIR(mode).  In that case acl reference is simply lost.
So yes, it's worth looking at.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  4:31         ` Al Viro
@ 2011-07-23  6:06           ` Al Viro
  2011-07-25  8:15             ` Aneesh Kumar K.V
  2011-07-25  8:16           ` Aneesh Kumar K.V
  1 sibling, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-23  6:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Christoph Hellwig, linux-fsdevel, Linus Torvalds

On Sat, Jul 23, 2011 at 05:31:20AM +0100, Al Viro wrote:

> Heh...  In addition to ocfs2 leak: 9p leaks nicely if v9fs_acl_mode() is
> called with !S_ISDIR(mode).  In that case acl reference is simply lost.
> So yes, it's worth looking at.

Damn...  FWIW, that code is seriously broken and not only on failure exits.
Guys, think what happens if we have a negative dentry and call e.g. mkdir().
Dentry is hashed.  Eventually we get to
        d_instantiate(dentry, inode);
        err = v9fs_fid_add(dentry, fid);
in v9fs_create() (or v9fs_vfs_mkdir_dotl()).  At the same time somebody does
lookup *in* that directory.  We do find that thing in dcache, it's (already)
positive and we hit
        dfid = v9fs_fid_lookup(dentry->d_parent);
at the same time.  That calls v9fs_fid_lookup_with_uid() and then
v9fs_fid_add().  Now, we have two tasks doing v9fs_fid_add() on the same
dentry.  Which is to say,
        dent = dentry->d_fsdata;
        if (!dent) {
                dent = kmalloc(sizeof(struct v9fs_dentry), GFP_KERNEL);
                if (!dent)
                        return -ENOMEM;

                spin_lock_init(&dent->lock);
                INIT_LIST_HEAD(&dent->fidlist);
                dentry->d_fsdata = dent;
        }
and that's an obvious leak.  It's not easy to hit, but...  Note that
->d_revalidate() does *not* help - v9fs_create() will force revaliation
of parent, but the second process might be already past that point.
Moreover, ->d_revalidate() would just talk to server and return 1, so
even if the second process does hit it, nothing will change except for
minor delay.  And the first one might be sleeping in blocking allocation
at that point...

I don't like that; we certainly can protect v9fs_fid_add() in standard way
(i.e. recheck ->d_fsdata after kmalloc(), if we have lost the race - kfree()
and go on, otherwise set ->d_fsdata and make that check+assignment atomic,
e.g. by use of ->d_lock).  However, that looks like a kludge.  Is there any
saner way to do it?  E.g. do we absolutely have to do that *after*
d_instantiate()?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  3:29     ` Al Viro
  2011-07-23  3:42       ` Linus Torvalds
@ 2011-07-23  7:47       ` Al Viro
  2011-07-23 14:50         ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-23  7:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Jul 23, 2011 at 04:29:44AM +0100, Al Viro wrote:

> OK, unless there are serious objections, I
> 	* apply Linus' patch as-is
> 	* fix that ocfs2 leak
> 	* replace posix_acl_..._masq() with saner helpers (take original
> acl + other arguments, return modified clone or ERR_PTR) and kill open-coded
> instances...

Done (along with 9p leak fixes).  NOTE: this is completely untested.
posix_acl_clone()/posix_acl_create_masq()/posix_acl_chmod_masq() all
became static in fs/posix_acl.c; replacement functions are
posix_acl_create() and posix_acl_chmod().  All callers converted, with
about a hundred lines of boilerplates gone.

And yes, it's pure copy-on-write.  The objects are refcounted, shared a lot
and the only primitives that did modifications had been *too* primitive -
their callers all started with creating a copy, then modifying it, then
letting somebody see the result.

Conclusion: that's an obvious free-by-RCU fodder.  I have not done that yet,
but it should be trivial to add.  posix_acl_release() would, instead of
kfree(), do kfree_rcu() (and we'd need to add rcu_head to struct posix_acl).

I'm not sure if merging the copying into ..._masq() would be worth doing.
posix_acl_clone() is basically kmemdup() and ..._masq() is a loop over all
array elements, with switch by element tag, modifying some of them.  Might
be worth merging with copying the elements...  Hell knows.

Anyway, that'll have to wait for tomorrow; I'm going down right now.  This
stuff (plus Tim's "mount lock scalability for internal mounts" patch) is
in #untested in usual place.  Comments/testing/etc. are welcome...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-23  3:55   ` [PATCH] " Linus Torvalds
@ 2011-07-23 13:35     ` Christoph Hellwig
  2011-07-23 14:46       ` Al Viro
       [not found]     ` <alpine.LFD.2.02.1107251852220.13796@i5.linux-foundation.org>
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2011-07-23 13:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Christoph Hellwig, linux-fsdevel

On Fri, Jul 22, 2011 at 08:55:48PM -0700, Linus Torvalds wrote:
> If something sets dentry->d_inode without going through __d_instantiate, 
> that misses it. I'm looking at d_obtain_alias(), and wondering, for 
> example.

As pointed out in my last mail d_obtain_alias will absolute need setting
up these flags as well.

> This is a very minimal version of the previous patch: those *two* places 
> where we now test the new DCACHE_OP_xyz bits are where we really see the 
> dereferencing of the inode->i_op in the normal path walking. HOWEVER, the 
> "inode_permission()" function also shows up pretty clearly on profiles, 
> and the hottest part of that is the testing of "inode->i_op->permission".
> 
> However, with Al's other changes, the old special-case "exec_permission()" 
> no longer exists. The whole reason it existed before was that the execute 
> permission checking on directory entries is/was special from a performance 
> standpoint. So it might make sense to resurrect it, and then add the
> DCACHE_OP_PERMISSION case like in my older patch.

I don't think we need to re-add exec_permission for it.  By checking
DCACHE_OP_PERMISSION instead of inode->i_op->permission you'll always
got directly to generic_permission in inode_permission, and in there
acl_permission_check -> check_acl should simply do the right thing with
your your patch to take the ACL cache checking into common code.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-23 13:35     ` Christoph Hellwig
@ 2011-07-23 14:46       ` Al Viro
  2011-07-23 14:51         ` Christoph Hellwig
  2011-07-23 15:45         ` Linus Torvalds
  0 siblings, 2 replies; 37+ messages in thread
From: Al Viro @ 2011-07-23 14:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, linux-fsdevel

On Sat, Jul 23, 2011 at 03:35:34PM +0200, Christoph Hellwig wrote:
> On Fri, Jul 22, 2011 at 08:55:48PM -0700, Linus Torvalds wrote:
> > If something sets dentry->d_inode without going through __d_instantiate, 
> > that misses it. I'm looking at d_obtain_alias(), and wondering, for 
> > example.
> 
> As pointed out in my last mail d_obtain_alias will absolute need setting
> up these flags as well.

... and dentry_unlink_inode() - clearing them (note that we already are
clearing DCACHE_CANT_MOUNT there, so additional price will be trivial).

> I don't think we need to re-add exec_permission for it.  By checking
> DCACHE_OP_PERMISSION instead of inode->i_op->permission you'll always
> got directly to generic_permission in inode_permission, and in there
> acl_permission_check -> check_acl should simply do the right thing with
> your your patch to take the ACL cache checking into common code.

Checking DCACHE_OP_PERMISSION on _what_?  inode_permission() gets inode,
not dentry...  Now, we could mirror those into struct inode itself (and
that might make more sense, actually), but I'd rather not go through
the equivalent of d_set_d_op() patchset for i_op, with no hope for per-sb
defaults reducing the size of mess this time.  Calculating those flags
at __d_instantiate/d_obtain_alias still looks like the least painful
way to do it...

Come to think of that, reintroducing exec_permission() and making it
take dentry has a problem - we would be asking for trouble with the
"clean the flags" side of things.  If we race with e.g. rmdir(),
dentry *can* become negative under us on those calls.  Inode will
remain allocated and, with MAY_NOT_BLOCK, the actual ->permission()
will hopefully not rely on anything that might have been freed in
->evict_inode() (we do need to document and verify that, BTW), but
dentry flags could be cleared...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  7:47       ` Al Viro
@ 2011-07-23 14:50         ` Christoph Hellwig
  2011-07-23 15:32           ` Al Viro
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2011-07-23 14:50 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Christoph Hellwig, linux-fsdevel

On Sat, Jul 23, 2011 at 08:47:22AM +0100, Al Viro wrote:
> Anyway, that'll have to wait for tomorrow; I'm going down right now.  This
> stuff (plus Tim's "mount lock scalability for internal mounts" patch) is
> in #untested in usual place.  Comments/testing/etc. are welcome...

Can't be just pass &inode->i_mode to posix_acl_create for those filesystems
that simply write directly into i_mode, instead of keeping it in a local
variable?

Also the posix_acl_create should be after the function body, not a lot
before it.

Otherwise looks fine, and passes QA on xfs.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-23 14:46       ` Al Viro
@ 2011-07-23 14:51         ` Christoph Hellwig
  2011-07-23 15:45         ` Linus Torvalds
  1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-07-23 14:51 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, Linus Torvalds, linux-fsdevel

On Sat, Jul 23, 2011 at 03:46:43PM +0100, Al Viro wrote:
> Checking DCACHE_OP_PERMISSION on _what_?  inode_permission() gets inode,

Doh, missed that we don't have the dentry.  That does indeed make it heard.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23 14:50         ` Christoph Hellwig
@ 2011-07-23 15:32           ` Al Viro
  2011-07-23 17:02             ` Al Viro
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-23 15:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, linux-fsdevel

On Sat, Jul 23, 2011 at 04:50:37PM +0200, Christoph Hellwig wrote:
> On Sat, Jul 23, 2011 at 08:47:22AM +0100, Al Viro wrote:
> > Anyway, that'll have to wait for tomorrow; I'm going down right now.  This
> > stuff (plus Tim's "mount lock scalability for internal mounts" patch) is
> > in #untested in usual place.  Comments/testing/etc. are welcome...
> 
> Can't be just pass &inode->i_mode to posix_acl_create for those filesystems
> that simply write directly into i_mode, instead of keeping it in a local
> variable?

umode_t vs. mode_t...
 
> Also the posix_acl_create should be after the function body, not a lot
> before it.

???
Oh, you mean the export for it?  Done (and pushed)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-23 14:46       ` Al Viro
  2011-07-23 14:51         ` Christoph Hellwig
@ 2011-07-23 15:45         ` Linus Torvalds
  1 sibling, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-23 15:45 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Jul 23, 2011 at 7:46 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Checking DCACHE_OP_PERMISSION on _what_?  inode_permission() gets inode,
> not dentry...  Now, we could mirror those into struct inode itself (and
> that might make more sense, actually), but I'd rather not go through
> the equivalent of d_set_d_op() patchset for i_op,

I thought about that, and indeed could not see a good place to hook
into. However, we do have "only" 317 places that do

    inode->i_op = &..myops..

and an alternative would be to simply do an automated replacement with

    set_inode_ops(inode, &..myops..);

because the scripting wouldn't be all that bad.

Then, if we just make sure that all the inode fields we touch are in
the same cacheline, we should be good. I think it's mainly i_mode,
i_uid and the acl cache entries. And there is space for some bits in
there, umode_t is just 16 bits, so we could easily put another 16-bit
"inode op flags" in next to it.

So that would be fairly straightforward.

                        Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23 15:32           ` Al Viro
@ 2011-07-23 17:02             ` Al Viro
  2011-07-23 17:31               ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-23 17:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, linux-fsdevel

On Sat, Jul 23, 2011 at 04:32:00PM +0100, Al Viro wrote:
> On Sat, Jul 23, 2011 at 04:50:37PM +0200, Christoph Hellwig wrote:
> > On Sat, Jul 23, 2011 at 08:47:22AM +0100, Al Viro wrote:
> > > Anyway, that'll have to wait for tomorrow; I'm going down right now.  This
> > > stuff (plus Tim's "mount lock scalability for internal mounts" patch) is
> > > in #untested in usual place.  Comments/testing/etc. are welcome...
> > 
> > Can't be just pass &inode->i_mode to posix_acl_create for those filesystems
> > that simply write directly into i_mode, instead of keeping it in a local
> > variable?
> 
> umode_t vs. mode_t...

We have some really stunning misuses of mode_t, BTW:
static mode_t
find_smbios_instance_string(struct pci_dev *pdev, char *buf,
                            enum smbios_attr_enum attribute)
{
...
                                        return scnprintf(buf, PAGE_SIZE,
                                                         "%d\n",
...
                                        return scnprintf(buf, PAGE_SIZE,
                                                         "%s\n",
                                                         dmi->name);
...
                        return strlen(dmi->name);
...
        return 0;
}

i.e. that one is misspelled size_t (drivers/pci/pci-label.c).  I don't
like how mode_t and umode_t are mixed up as well...

Why is the latter per-architecture, BTW?  It's unsigned short everywhere
and we'd already got some clowns exposing it in userland ABI (see
include/trace/events/ext4.h).  Is there any good reason for separate
mode_t and umode_t kernel-side?  We use mode_t in some syscall declarations
(not all, BTW - mknod() uses unsigned int), but inside the kernel we
almost immediately lose upper bits on conversions anyway...

And things like ->mkdir() are declare as passing int, treat it as mode_t and
end up storing it in umode_t...  I haven't found any obvious places where
that kind of stuff could cause problems, but that was just a casual look.

umode_t is *old* - it's been introduced in 0.95, back when a bunch of
->i_... got typedefed types.  I never looked into its history and that's
way before my time...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23 17:02             ` Al Viro
@ 2011-07-23 17:31               ` Linus Torvalds
  2011-07-23 18:20                 ` Al Viro
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2011-07-23 17:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Jul 23, 2011 at 10:02 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I don't like how mode_t and umode_t are mixed up as well...
>
> Why is the latter per-architecture, BTW?

I think it's some crazy legacy reason, due to simply the differences
between what shows up in 'struct stat' and what we've saved. It
probably made sense at the time it was introduced. It's now an unholy
mess, probably partly because the "obvious name" (mode_t) doesn't
match the obvious size (traditional 16-bit mode, which is umode_t).

It would be nice if we

 (a) got rid of umode_t entirely

 (b) changed 'mode_t' to the 16-bit traditional unix mode

 (c) made the users who now use 'mode_t' use the actual size they
want, since if they actually depend on the higher bits, it shouldn't
be called mode_t.

but we'll never do it, because it's such a pain.

So practically speaking, we'd probably be better off just getting rid
of 'umode_t' and replacing all users with u16.

                  Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23 17:31               ` Linus Torvalds
@ 2011-07-23 18:20                 ` Al Viro
  2011-07-23 18:29                   ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-23 18:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Jul 23, 2011 at 10:31:23AM -0700, Linus Torvalds wrote:

> It would be nice if we
> 
>  (a) got rid of umode_t entirely
> 
>  (b) changed 'mode_t' to the 16-bit traditional unix mode
> 
>  (c) made the users who now use 'mode_t' use the actual size they
> want, since if they actually depend on the higher bits, it shouldn't
> be called mode_t.
> 
> but we'll never do it, because it's such a pain.

Maybe, maybe not...  There's not *that* many of users of either of those.
BTW, what was fchmod(fd, -1) about?  It ends up using current ->i_mode,
but that's not only a Linuxism, it's an undocumented one...  Old one,
too - 0.99.14v, AFAICS.  Looking around on google gave no results (including
usenet search) and there's no changelog for that revision on kernel.org,
AFAICS...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23 18:20                 ` Al Viro
@ 2011-07-23 18:29                   ` Linus Torvalds
  2011-07-23 21:53                     ` Al Viro
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2011-07-23 18:29 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Jul 23, 2011 at 11:20 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> but we'll never do it, because it's such a pain.
>
> Maybe, maybe not...  There's not *that* many of users of either of those.

The problem is that current 'mode_t' and if it is ever exported to
user space (and I bet it is, in various structures), we're basically
screwed. You have a random architecture-specific size, that you can't
clean up to be the size you actually _use_, because it's now part of
the ABI.

              Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23 18:29                   ` Linus Torvalds
@ 2011-07-23 21:53                     ` Al Viro
  2011-07-23 22:38                       ` Al Viro
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-23 21:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Jul 23, 2011 at 11:29:39AM -0700, Linus Torvalds wrote:
> On Sat, Jul 23, 2011 at 11:20 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >>
> >> but we'll never do it, because it's such a pain.
> >
> > Maybe, maybe not... ?There's not *that* many of users of either of those.
> 
> The problem is that current 'mode_t' and if it is ever exported to
> user space (and I bet it is, in various structures), we're basically
> screwed. You have a random architecture-specific size, that you can't
> clean up to be the size you actually _use_, because it's now part of
> the ABI.

It's not that bad, really.  On part of targets struct stat has mode_t st_mode;
same for hpux_stat64 and ipc64_perm on parisc and stat64 on mips.  Other
just declare st_mode as unsigned int or unsigned short without any problems
(including x86, FWIW).  There's also __kernel_mode_t, which is arch-dependent;
not sure if it's always the same as what's used for st_mode in struct stat.
That's often used for ipc64_perm (->mode) and (AFAICS, always) for ipc_perm
(->mode, again).

We also have a few syscalls passing mode_t - chmod, fchmod, fchmodat,
mq_open.  open(), mknod() and mkdir() use int.

	That's about it - shouldn't be a problem to switch mode_t in ABI to
__kernel_mode_t (equivalent, since mode_t is typedefed to __kernel_mode_t)
and we are free to do whatever we want with internal mode_t.  I suspect
that starting with centralized typedef for umode_t (always u16) it wouldn't
take much to drive all internal uses to umode_t (or size_t, in at least one
case ;-), then replace all remaining (part of ABI) instances with
__kernel_mode_t and rename umode_t to mode_t.  I'll try to put such a series
together tonight; will push to #mode_t once done...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23 21:53                     ` Al Viro
@ 2011-07-23 22:38                       ` Al Viro
  0 siblings, 0 replies; 37+ messages in thread
From: Al Viro @ 2011-07-23 22:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, David Woodhouse

On Sat, Jul 23, 2011 at 10:53:37PM +0100, Al Viro wrote:
> 	That's about it - shouldn't be a problem to switch mode_t in ABI to
> __kernel_mode_t (equivalent, since mode_t is typedefed to __kernel_mode_t)
> and we are free to do whatever we want with internal mode_t.  I suspect
> that starting with centralized typedef for umode_t (always u16) it wouldn't
> take much to drive all internal uses to umode_t (or size_t, in at least one
> case ;-), then replace all remaining (part of ABI) instances with
> __kernel_mode_t and rename umode_t to mode_t.  I'll try to put such a series
> together tonight; will push to #mode_t once done...

And that has immediately caught a lovely bug on jffs2 with ACLs - any
big-endian platform with 16bit mode_t is fucked, since we have
jffs2_new_inode(..., int mode, ...) do
        ret = jffs2_init_acl_pre(dir_i, inode, &mode);
The latter expects int *, which is what it gets.  The it explicitly casts
that to mode_t * and passes to posix_acl_create_masq() (directly in current
mainline, through posix_acl_create() in #untested).  Back in jffs2_new_inode()
we use the damn thing afterwards.  Moreover, posix_acl_create_masq() uses
the value pointed to by mode_t * it gets.  And expects normal S_IF... in
it.

IOW, on sparc32 (at least) it's screwed - it'll see 0 what's actually upper
16 bits of jffs2_new_inode() mode argument instead of what should've seen
(the lower 16 bits).  It will leave them 0, so the caller won't notice anything
weird going on, but ACL will be buggered and mode won't be adjusted.

Also screwed: avr32, frv, h8300, m68k, parisc, s390 and, in their big-endian
variants, arm, m32r, microblaze and sh.

Fucked-up-by: commit cfc8dc6f6f69ede939e09c2af06a01adee577285

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  6:06           ` Al Viro
@ 2011-07-25  8:15             ` Aneesh Kumar K.V
  0 siblings, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2011-07-25  8:15 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Linus Torvalds

On Sat, 23 Jul 2011 07:06:26 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Sat, Jul 23, 2011 at 05:31:20AM +0100, Al Viro wrote:
> 
> > Heh...  In addition to ocfs2 leak: 9p leaks nicely if v9fs_acl_mode() is
> > called with !S_ISDIR(mode).  In that case acl reference is simply lost.
> > So yes, it's worth looking at.
> 
> Damn...  FWIW, that code is seriously broken and not only on failure exits.
> Guys, think what happens if we have a negative dentry and call e.g. mkdir().
> Dentry is hashed.  Eventually we get to
>         d_instantiate(dentry, inode);
>         err = v9fs_fid_add(dentry, fid);
> in v9fs_create() (or v9fs_vfs_mkdir_dotl()).  At the same time somebody does
> lookup *in* that directory.  We do find that thing in dcache, it's (already)
> positive and we hit
>         dfid = v9fs_fid_lookup(dentry->d_parent);
> at the same time.  That calls v9fs_fid_lookup_with_uid() and then
> v9fs_fid_add().  Now, we have two tasks doing v9fs_fid_add() on the same
> dentry.  Which is to say,
>         dent = dentry->d_fsdata;
>         if (!dent) {
>                 dent = kmalloc(sizeof(struct v9fs_dentry), GFP_KERNEL);
>                 if (!dent)
>                         return -ENOMEM;
> 
>                 spin_lock_init(&dent->lock);
>                 INIT_LIST_HEAD(&dent->fidlist);
>                 dentry->d_fsdata = dent;
>         }
> and that's an obvious leak.  It's not easy to hit, but...  Note that
> ->d_revalidate() does *not* help - v9fs_create() will force revaliation
> of parent, but the second process might be already past that point.
> Moreover, ->d_revalidate() would just talk to server and return 1, so
> even if the second process does hit it, nothing will change except for
> minor delay.  And the first one might be sleeping in blocking allocation
> at that point...
> 
> I don't like that; we certainly can protect v9fs_fid_add() in standard way
> (i.e. recheck ->d_fsdata after kmalloc(), if we have lost the race - kfree()
> and go on, otherwise set ->d_fsdata and make that check+assignment atomic,
> e.g. by use of ->d_lock).  However, that looks like a kludge.  Is there any
> saner way to do it?  E.g. do we absolutely have to do that *after*
> d_instantiate()?

We should be able to move that v9fs_fid_add before d_instantiate. That
way we can be sure that positive dentries have fids attached.
v9fs_vfs_lookup does that already. I guess we should audit all the
v9fs_fid_add usage to make sure we do that consistently all other place.

-aneesh


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] vfs: move ACL cache lookup into generic code
  2011-07-23  4:31         ` Al Viro
  2011-07-23  6:06           ` Al Viro
@ 2011-07-25  8:16           ` Aneesh Kumar K.V
  1 sibling, 0 replies; 37+ messages in thread
From: Aneesh Kumar K.V @ 2011-07-25  8:16 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, 23 Jul 2011 05:31:20 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Fri, Jul 22, 2011 at 08:42:35PM -0700, Linus Torvalds wrote:
> > > ... why is that a problem? ?Locking there is mere ->i_lock and getting
> > > a refcount is atomic_inc(). ?Grabbing a reference might be Not Nice from
> > > the cacheline bouncing POV, but...
> > 
> > I agree in theory, but it's not something we've done before. Some of
> > the posix-acl code is pretty disgusting, I didn't even want to go
> > there.
> > 
> > And the case that tends to really *matter* is the "no acls" case
> > anyway, and that's the case that is guaranteed to have no nasty races
> > or odd issues with having to allocate/de-allocate any acl structures.
> > But yes, this could be looked at.
> 
> Heh...  In addition to ocfs2 leak: 9p leaks nicely if v9fs_acl_mode() is
> called with !S_ISDIR(mode).  In that case acl reference is simply lost.
> So yes, it's worth looking at.

something like the below ? I will also audit rest of the code.

diff --git a/fs/9p/acl.c b/fs/9p/acl.c
index e98f56d..abad8b3 100644
--- a/fs/9p/acl.c
+++ b/fs/9p/acl.c
@@ -212,7 +212,7 @@ int v9fs_acl_mode(struct inode *dir, mode_t *modep,
 		struct posix_acl *clone;
 
 		if (S_ISDIR(mode))
-			*dpacl = acl;
+			*dpacl = posix_acl_clone(acl, GFP_NOFS);
 		clone = posix_acl_clone(acl, GFP_NOFS);
 		retval = -ENOMEM;
 		if (!clone)
@@ -227,7 +227,7 @@ int v9fs_acl_mode(struct inode *dir, mode_t *modep,
 			*pacl = clone;
 	}
 	*modep  = mode;
-	return 0;
+	retval = 0;
 cleanup:
 	posix_acl_release(acl);
 	return retval;

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
       [not found]     ` <alpine.LFD.2.02.1107251852220.13796@i5.linux-foundation.org>
@ 2011-07-26  3:05       ` Al Viro
  2011-07-26  3:23         ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-26  3:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Mon, Jul 25, 2011 at 07:04:11PM -0700, Linus Torvalds wrote:
> 
> 
> One of the biggest remaining unnecessary costs in path walking is the
> pointer chasing in inode operations.  We already avoided the
> dentry->d_op derferences with the DCACHE_OP_xyz flags, this just starts
> doing the same thing for the i_op->xyz cases and new IOP_xyz flags in 
> the new inode->i_opflag field.
> 
> To make sure the i_opflag field is consistent with i_op, we introduce a 
> simple wrapper function to do assignments to i_op. That makes the patch 
> big, but it was almost entirely mechanical and very straightforward.
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> 
> Ok, so this patch is pretty big, but the bulk if it is an entirely 
> mechanical replacement of
> 
> 	inode->i_op = xyz;
> 
> with
> 
> 	set_inode_ops(inode, xyz);
> 
> using a very stupid sed-script. It was followed by some manual editing 
> (a couple of fixups for my sed-script being stupid, and a couple of 
> whitespace cleanups for pre-existing whitespace problems that 'git diff' 
> just highlights).
> 
> This removes the 'inode->i_op->xyz' chain from the critical pathname walk, 
> although the *single* hottest instruction in link_path_walk on my Westmere 
> system remains the test of 'inode->i_opflag' for IOP_FOLLOW_LINK:
> 
>    14.70%: testb  $0x2,(%rax)
> 
> because it turns out that that instruction is what brings in the inode 
> into the L1 cache. But hey, it's certainly better than getting that i_op 
> and then following the pointer there and checking it for NULL.
> 
> And it's better to the tune of roughly 0.1s for my empty "make -j" test 
> (which is actually dominated by the user-space costs in 'make', but has a 
> largish pathname component to it too). That's out of about 4.3s, so it is 
> about 2%. Not huge, but it seems quite measurable.
> 
> Of course, those kinds of costs will depend very much on cache sizes and 
> microarchitectural details, and the "empty kernel make -j" may not be the 
> most relevant benchmark around, but it's more or a real load than some.
> 
> Hmm? Too painful?

We could actually cheat a bit.  *All* inodes that ever reach inode_permission()
have at some point been pointed to by ->d_inode of some dentry.  It's
inconvenient as hell (and inviting abuse) to pass such dentry instead
of inode; moreover, there would be nasty questions of ->d_inode stability
on RCU path.

However, we can greatly reduce that amount of changes if we do the
following:
	* one bit in your new field set when inode is put in ->d_inode.
Checked at inode_permission() time, BUG_ON() if not set.
	* other bit(s) set by checking ->i_op contents at the same time,
if bit hadn't been already set (->i_op can't change afterwards); checked
by inode_permission() and whatever else might want to (e.g. checks for
non-NULL ->lookup(); for those we also know that inode went through
some ->d_inode at some point).
	* we need to play with setting these flags only in two places -
__d_instantiate() and d_obtain_alias().

IOW, add

/* called with inode->i_lock held */
static inline void set_inode_flags(struct inode *inode)
{
	if (!(inode->i_opflag & Iop_valid)) {
		struct inode_operations *op = inode->i_op;
		int flags = Iop_valid;
		if (op->permission)
			flags |= Iop_permission;
		if (op->lookup)
			flags |= Iop_lookup;
		inode->i_opflag = flags;
	}
}

and turn __d_instantiate() into
static void __d_instantiate(struct dentry *dentry, struct inode *inode)
{
        spin_lock(&dentry->d_lock);
        if (inode) {
                if (unlikely(IS_AUTOMOUNT(inode)))
                        dentry->d_flags |= DCACHE_NEED_AUTOMOUNT;
		set_inode_flags(inode);
                list_add(&dentry->d_alias, &inode->i_dentry);
        }
        dentry->d_inode = inode;
        dentry_rcuwalk_barrier(dentry);
        spin_unlock(&dentry->d_lock);
        fsnotify_d_instantiate(dentry, inode);
}

with set_inode_flags(inode); also added right after
        tmp->d_inode = inode;
in d_obtain_alias()

Voila - inode_permission() would do
	int flags;
	...
	flags = inode->i_opflag;
	BUG_ON(!(flags & Iop_valid));
	if (unlikely(flags & Iop_permission))
		res = inode->i_op->permission(...);
	else
		res = generic_permission(...)
and we are all set.  Individual fs code is not affected at all...  Sure, it's
a bit of cheating, but it avoids a lot of churn *and* a nasty class of bugs
in the future - somebody assigning ->i_op directly.  Dunno...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-26  3:05       ` Al Viro
@ 2011-07-26  3:23         ` Linus Torvalds
  2011-07-26 18:41           ` Al Viro
  2011-08-07  6:06           ` Linus Torvalds
  0 siblings, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-26  3:23 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Mon, Jul 25, 2011 at 8:05 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> We could actually cheat a bit.  *All* inodes that ever reach inode_permission()
> have at some point been pointed to by ->d_inode of some dentry.  It's
> inconvenient as hell (and inviting abuse) to pass such dentry instead
> of inode; moreover, there would be nasty questions of ->d_inode stability
> on RCU path.
>
> However, we can greatly reduce that amount of changes if we do the
> following:
>        * one bit in your new field set when inode is put in ->d_inode.
> Checked at inode_permission() time, BUG_ON() if not set.
>        * other bit(s) set by checking ->i_op contents at the same time,
> if bit hadn't been already set (->i_op can't change afterwards); checked
> by inode_permission() and whatever else might want to (e.g. checks for
> non-NULL ->lookup(); for those we also know that inode went through
> some ->d_inode at some point).
>        * we need to play with setting these flags only in two places -
> __d_instantiate() and d_obtain_alias().

Ugh. That sounds more complex than I'd really like. Smaller change,
yes. But the end result is just more complicated.

What I *did* consider was to have the bits just be "slow case" bits.
And just set them all (blindly) at inode allocation time. Then, the
users would simply end up doing

  static inline int do_inode_permission(struct inode *inode, int mask)
  {
    if (inode->i_opflag & SLOW_PERMISSION) {
      if (inode->i_op->permission)
        return inode->i_op->permission(inode, mask);
      bit_clear(SLOW_PERMISSION, &inode->i_opflag);
    }
    return generic_permission(inode, mask);
  }

which basically would trigger the *really* slow case once (that atomic
bit clear that happens just once - or maybe a couple of concurrent
times for the harmless race), but would have very trivial complexity,
and you could tailor the bits arbitrarily for what you want to test as
the fast case.

The reason I didn't go any further along that way is that we don't
have any nice atomic operations for small fields like that. The bitops
want a "long" entity, and it becomes just a nightmare to work out bit
positions with big-endian etc. And I definitely didn't want to grow
the inode.

But I think it would be a reasonable approach with less complexity,
and we could even use a spinlock (ie i_lock) for the slow case, since
it really should only ever happen once per inode load.

It's absolutely ok to test the i_op fields when we actually use them,
because by then we'll obviously have the whole "populate the cache and
follow the pointer chain" issue anyway. So we have a clear "fast case"
when we simply don't want to even look at that pointer at all, and the
above kind of code would work fine, I think.

That said, the big patch is clearly the simplest approach of them all.
It does have the "did you set i_op with the proper function" of
course, but that' s at least not a *complex* bug, and if you get the
i_opflag value wrong, it will be mostly be quite obvious (eg a NULL
pointer dereference or simply not calling the filesystem permission
check at all).

But I'm not married to any particular solution. It's just annoying how
visible that i_op access is in profiles, and how close we are to not
needing it ;)

                           Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-26  3:23         ` Linus Torvalds
@ 2011-07-26 18:41           ` Al Viro
  2011-07-26 18:45             ` Linus Torvalds
  2011-08-07  6:06           ` Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-07-26 18:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Mon, Jul 25, 2011 at 08:23:38PM -0700, Linus Torvalds wrote:

> The reason I didn't go any further along that way is that we don't
> have any nice atomic operations for small fields like that. The bitops
> want a "long" entity, and it becomes just a nightmare to work out bit
> positions with big-endian etc. And I definitely didn't want to grow
> the inode.

Hmm...  We still have 19 spare bits in ->i_flags and at least 22 in ->i_state.
Do we need to bother with new field at all?  ->i_state is unsigned long...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-26 18:41           ` Al Viro
@ 2011-07-26 18:45             ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-07-26 18:45 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Tue, Jul 26, 2011 at 11:41 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Hmm...  We still have 19 spare bits in ->i_flags and at least 22 in ->i_state.
> Do we need to bother with new field at all?  ->i_state is unsigned long...

Do we *need* it? No. But it's often in the next cache-line. It would
be beautiful to put the bits right next to i_mode, and basically make
a lot of the common operations just touch those four bytes for doing
the S_ISDIR tests and the "can I use generic_permission()" tests etc.

And the whole point of that is to make for a smaller cacheline
footprint (and admittedly also the few cycles to avoid the dereference
- and that you'd get from using i_state).

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-07-26  3:23         ` Linus Torvalds
  2011-07-26 18:41           ` Al Viro
@ 2011-08-07  6:06           ` Linus Torvalds
  2011-08-07  6:51             ` Al Viro
  1 sibling, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2011-08-07  6:06 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Mon, Jul 25, 2011 at 8:23 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> What I *did* consider was to have the bits just be "slow case" bits.
> And just set them all (blindly) at inode allocation time. Then, the
> users would simply end up doing [ ... ]

I ended up with this, after all. Except rather than be "slow case"
bits, they are "fast case" bits. The code seemed more natural that
way: clear the bits initially, and then set a bit for the default case
we care about.

And I use inode->i_lock to protect the setting, to avoid the need for
atomic ops on an "unsigned short". It's going to happen once per bit
and per inode lifetime, it didn't seem worth it to try to be clever
about it.

I considered using i_state, which is already a proper "unsigned long"
exactly so that the bitops will work on it. But since the aim was to
be dense in the D$, it seemed more natural to use the 16-bit slot we
have next to i_mode.

My timings still showed roughly a 2% improvement (well 1.8% on the
"repeat ten times, do the average") but I didn't really do a lot of
deviation testing. The noise in the numbers is quite noticeable
between reboots, but I've seen that 1-2% several times over various
different implementations of the same "try to avoid touching
inode->i_op->xyz", so I'm pretty convinced it's not just random noise.

                         Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-08-07  6:06           ` Linus Torvalds
@ 2011-08-07  6:51             ` Al Viro
  2011-08-07 23:43               ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2011-08-07  6:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Aug 06, 2011 at 11:06:18PM -0700, Linus Torvalds wrote:

> My timings still showed roughly a 2% improvement (well 1.8% on the
> "repeat ten times, do the average") but I didn't really do a lot of
> deviation testing. The noise in the numbers is quite noticeable
> between reboots, but I've seen that 1-2% several times over various
> different implementations of the same "try to avoid touching
> inode->i_op->xyz", so I'm pretty convinced it's not just random noise.

Trivial nit: s/do_follow_link/should_follow_link/, I think...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
  2011-08-07  6:51             ` Al Viro
@ 2011-08-07 23:43               ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2011-08-07 23:43 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel

On Sat, Aug 6, 2011 at 11:51 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Trivial nit: s/do_follow_link/should_follow_link/, I think...

Done. I've also done more performance testing, and it seems that now
it would be a win again to do the "prefetch inode" in
__d_lookup_rcu().

But I'll leave it be for now. My test loads are ridiculously
pathname-intensive, I don't know if anything else really cares that
deeply.

At least there is no longer anything in the profiles that look
horribly stupid (except for the security subsystem - pointer chasing
through i_security, and I can't see anything we can do about that).

                             Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2011-08-07 23:43 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-22 17:37 VFS pathname walking cleanups (i_op and ACL access) Linus Torvalds
2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
2011-07-22 17:47   ` Christoph Hellwig
2011-07-22 23:40   ` Al Viro
2011-07-22 23:54     ` Linus Torvalds
2011-07-23  3:55   ` [PATCH] " Linus Torvalds
2011-07-23 13:35     ` Christoph Hellwig
2011-07-23 14:46       ` Al Viro
2011-07-23 14:51         ` Christoph Hellwig
2011-07-23 15:45         ` Linus Torvalds
     [not found]     ` <alpine.LFD.2.02.1107251852220.13796@i5.linux-foundation.org>
2011-07-26  3:05       ` Al Viro
2011-07-26  3:23         ` Linus Torvalds
2011-07-26 18:41           ` Al Viro
2011-07-26 18:45             ` Linus Torvalds
2011-08-07  6:06           ` Linus Torvalds
2011-08-07  6:51             ` Al Viro
2011-08-07 23:43               ` Linus Torvalds
2011-07-22 17:40 ` VFS pathname walking cleanups (i_op and ACL access) Christoph Hellwig
2011-07-22 17:45 ` [PATCH 2/2] vfs: move ACL cache lookup into generic code Linus Torvalds
2011-07-22 17:50   ` Christoph Hellwig
2011-07-22 17:54     ` Linus Torvalds
2011-07-23  2:34   ` [PATCH] " Linus Torvalds
2011-07-23  3:29     ` Al Viro
2011-07-23  3:42       ` Linus Torvalds
2011-07-23  4:31         ` Al Viro
2011-07-23  6:06           ` Al Viro
2011-07-25  8:15             ` Aneesh Kumar K.V
2011-07-25  8:16           ` Aneesh Kumar K.V
2011-07-23  7:47       ` Al Viro
2011-07-23 14:50         ` Christoph Hellwig
2011-07-23 15:32           ` Al Viro
2011-07-23 17:02             ` Al Viro
2011-07-23 17:31               ` Linus Torvalds
2011-07-23 18:20                 ` Al Viro
2011-07-23 18:29                   ` Linus Torvalds
2011-07-23 21:53                     ` Al Viro
2011-07-23 22:38                       ` Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.