All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] VFS name lookup permission checking cleanup
@ 2009-09-07 21:01 Linus Torvalds
  2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:01 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


This is a series of eight trivial patches that I'd like people to take a 
look at, because I am hoping to eventually do multiple path component 
lookups in one go without taking the per-dentry lock or incrementing (and 
then decrementing) the per-dentry atomic count for each component.

The aim would be to try to avoid getting that annoying cacheline ping-pong 
on the common top-level dentries that everybody looks up (ie root and home 
directories, /usr, /usr/bin etc).

Right now I have some simple (but real) loads that show the contention on 
dentry->d_lock to be a roughly 3% performance hit on a single-socket 
nehalem, and I assume it can be much worse on multi-socket machines.

And the thing is, it should be entirely possible to do everything but the 
last component lookup with just a single read_seqbegin()/read_seqretry() 
around the whole lookup. Yes, the last component is special and absolutely 
needs locking and counting - but the last component also doesn't tend to 
be shared, so locking it is fine.

Now, I may never actually get there, but when looking at it, the biggest 
problem is actually not so much the path lookup itself, as the security 
tests that are done for each path component. And it should be noted that 
in order for a lockless seq-lock only lookup make sense, any such 
operations would have to be totally lock-free too. They certainly can't 
take mutexes etc, but right now they do.

Those security tests fall into two categories:

 - actual security layer callouts: ima_path_check().

   This one looks totally pointless. Path component lookup is a horribly 
   timing-critical path, and we will only do a successful lookup on a 
   directory (inode needs to have a ->lookup operation), yet in the middle 
   of that is a call to "ima_path_check()".

   Now, it looks like ima_path_check() is very much designed to only check 
   the _final_ path anyway, and was never meant to be used to check the 
   directories we hit on the way. In fact, the whole function starts with

	if (!ima_initialized || !S_ISREG(inode->i_mode))
		return 0;

   so it's totally pointless to do that thing on a directory where 
   that !S_ISREG() test will trigger.

   So just remove it. IMA should never have put that check in there to 
   begin with, it's just way too performance-sensitive.

 - the real filesystem permission checks. 

   We used to do the common case entirely in the VFS layer, but these days 
   the common case includes POSIX ACL checking, and as a result, the 
   trivial short-circuit code in the VFS layer almost never triggers in
   practice, and we call down to the low-level filesystem for each 
   component. 

   We can't fix that by just removing the call, but what we _can_ do is to 
   at least avoid the silly calling back-and-forth: most filesystems will 
   just call back to the VFS layer to do the "generic_permission()" with 
   their own ACL-checking routines.

   That way we can flatten the call-chain out a bit, and avoid one 
   unnecessary indirect call in that timing-critical region. And 
   eventually, if we make the whole ACL caching thing be something that we 
   do at a VFS layer (Al Viro already worked on _some_ of that), we'll be 
   able to avoid the calls entirely when we can see the cached ACL 
   pointers directly.

So this series of 8 patches do all these preliminary things. As shown by 
the diffstat below, it actually reduces the lines of code (mainly by just 
removing the silly per-filesystem wrappers around "generic_permission()") 
and it also makes it a _lot_ clearer what actually gets called in that 
whole 'exec_permission_lite()' function that we use to check the 
permission of a pathname lookup.

Comments?  Especially from the IMA people (first patch) and from generic 
VFS, security and low-level FS people (the 'Simplify exec_permission_lite' 
series, and then the check_acl + per-filesystem changes).

Al?

I'm looking to merge these shortly after 2.6.31 is released, but comments 
welcome.

			Linus

----
Linus Torvalds (8):
  Do not call 'ima_path_check()' for each path component
  Simplify exec_permission_lite() logic
  Simplify exec_permission_lite() further
  Simplify exec_permission_lite(), part 3
  Make 'check_acl()' a first-class filesystem op
  shmfs: use 'check_acl' instead of 'permission'
  ext[234]: move over to 'check_acl' permission model
  jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'

 fs/ext2/acl.c               |    8 +----
 fs/ext2/acl.h               |    4 +-
 fs/ext2/file.c              |    2 +-
 fs/ext2/namei.c             |    4 +-
 fs/ext3/acl.c               |    8 +----
 fs/ext3/acl.h               |    4 +-
 fs/ext3/file.c              |    2 +-
 fs/ext3/namei.c             |    4 +-
 fs/ext4/acl.c               |    8 +----
 fs/ext4/acl.h               |    4 +-
 fs/ext4/file.c              |    2 +-
 fs/ext4/namei.c             |    4 +-
 fs/jffs2/acl.c              |    7 +---
 fs/jffs2/acl.h              |    4 +-
 fs/jffs2/dir.c              |    2 +-
 fs/jffs2/file.c             |    2 +-
 fs/jffs2/symlink.c          |    2 +-
 fs/jfs/acl.c                |    7 +---
 fs/jfs/file.c               |    2 +-
 fs/jfs/jfs_acl.h            |    2 +-
 fs/jfs/namei.c              |    2 +-
 fs/namei.c                  |   82 +++++++++++++++++++++---------------------
 fs/xfs/linux-2.6/xfs_iops.c |   16 ++------
 include/linux/fs.h          |    1 +
 include/linux/shmem_fs.h    |    2 +-
 mm/shmem.c                  |    6 ++--
 mm/shmem_acl.c              |   11 +-----
 27 files changed, 79 insertions(+), 123 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/8] Do not call 'ima_path_check()' for each path component
  2009-09-07 21:01 [PATCH 0/8] VFS name lookup permission checking cleanup Linus Torvalds
@ 2009-09-07 21:02 ` Linus Torvalds
  2009-09-07 21:02   ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
  2009-09-08  1:50   ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Christoph Hellwig
  2009-09-08 14:01 ` [PATCH 0/8] VFS name lookup permission checking cleanup Serge E. Hallyn
  2009-09-08 17:52 ` Mimi Zohar
  2 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:02 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 10:05:33 -0700

Not only is that a supremely timing-critical path, but it's hopefully some 
day going to be lockless for the common case, and ima can't do that.

Plus the integrity code doesn't even care about non-regular files,
so it was always a total waste of time and effort.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/namei.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f3c5b27..164aa15 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -856,9 +856,6 @@ static int __link_path_walk(const char *name, struct nameidata *nd)
 		if (err == -EAGAIN)
 			err = inode_permission(nd->path.dentry->d_inode,
 					       MAY_EXEC);
-		if (!err)
-			err = ima_path_check(&nd->path, MAY_EXEC,
-				             IMA_COUNT_UPDATE);
  		if (err)
 			break;
 
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/8] Simplify exec_permission_lite() logic
  2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
@ 2009-09-07 21:02   ` Linus Torvalds
  2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
  2009-09-07 22:12     ` [PATCH 2/8] Simplify exec_permission_lite() logic James Morris
  2009-09-08  1:50   ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Christoph Hellwig
  1 sibling, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:02 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 10:50:37 -0700

Instead of returning EAGAIN and having the caller do something
special for that case,  just do the special case directly.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/namei.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 164aa15..bf8aa95 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -435,7 +435,7 @@ static int exec_permission_lite(struct inode *inode)
 	umode_t	mode = inode->i_mode;
 
 	if (inode->i_op->permission)
-		return -EAGAIN;
+		return inode_permission(inode, MAY_EXEC);
 
 	if (current_fsuid() == inode->i_uid)
 		mode >>= 6;
@@ -853,9 +853,6 @@ static int __link_path_walk(const char *name, struct nameidata *nd)
 
 		nd->flags |= LOOKUP_CONTINUE;
 		err = exec_permission_lite(inode);
-		if (err == -EAGAIN)
-			err = inode_permission(nd->path.dentry->d_inode,
-					       MAY_EXEC);
  		if (err)
 			break;
 
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/8] Simplify exec_permission_lite() further
  2009-09-07 21:02   ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
@ 2009-09-07 21:03     ` Linus Torvalds
  2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
                         ` (2 more replies)
  2009-09-07 22:12     ` [PATCH 2/8] Simplify exec_permission_lite() logic James Morris
  1 sibling, 3 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:03 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 10:53:56 -0700

This function is only called for path components that are already known
to be directories (they have a '->lookup' method).  So don't bother
doing that whole S_ISDIR() testing, the whole point of the 'lite()'
version is that we know that we are looking at a directory component,
and that we're only checking name lookup permission.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/namei.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bf8aa95..e39e5cb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -445,13 +445,7 @@ static int exec_permission_lite(struct inode *inode)
 	if (mode & MAY_EXEC)
 		goto ok;
 
-	if ((inode->i_mode & S_IXUGO) && capable(CAP_DAC_OVERRIDE))
-		goto ok;
-
-	if (S_ISDIR(inode->i_mode) && capable(CAP_DAC_OVERRIDE))
-		goto ok;
-
-	if (S_ISDIR(inode->i_mode) && capable(CAP_DAC_READ_SEARCH))
+	if (capable(CAP_DAC_OVERRIDE) || capable(CAP_DAC_READ_SEARCH))
 		goto ok;
 
 	return -EACCES;
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/8] Simplify exec_permission_lite(), part 3
  2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
@ 2009-09-07 21:03       ` Linus Torvalds
  2009-09-07 21:04         ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
  2009-09-07 22:18         ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 James Morris
  2009-09-07 22:15       ` [PATCH 3/8] Simplify exec_permission_lite() further James Morris
  2009-09-08 14:40       ` Jamie Lokier
  2 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:03 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 11:08:31 -0700

Don't call down to the generic inode_permission() function just to
call the inode-specific permission function - just do it directly.

The generic inode_permission() code does things like checking MAY_WRITE
and devcgroup_inode_permission(), neither of which are relevant for the
light pathname walk permission checks (we always do just MAY_EXEC, and
the inode is never a special device).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/namei.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e39e5cb..7959e70 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -434,8 +434,12 @@ static int exec_permission_lite(struct inode *inode)
 {
 	umode_t	mode = inode->i_mode;
 
-	if (inode->i_op->permission)
-		return inode_permission(inode, MAY_EXEC);
+	if (inode->i_op->permission) {
+		int ret = inode->i_op->permission(inode, MAY_EXEC);
+		if (!ret)
+			goto ok;
+		return ret;
+	}
 
 	if (current_fsuid() == inode->i_uid)
 		mode >>= 6;
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/8] Make 'check_acl()' a first-class filesystem op
  2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
@ 2009-09-07 21:04         ` Linus Torvalds
  2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
  2009-09-07 22:20           ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op James Morris
  2009-09-07 22:18         ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 James Morris
  1 sibling, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:04 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 11:51:25 -0700

This is stage one in flattening out the callchains for the common
permission testing.  Rather than have most filesystem implemnt their
inode->i_op->permission own function that just calls back down to the
VFS layers 'generic_permission()' with the per-filesystem ACL checking
function, the filesystem can just expose its 'check_acl' function
directly, and let the VFS layer do everything for it.

This is all just preparatory - no filesystem actually enables this yet.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/namei.c         |   62 +++++++++++++++++++++++++++++----------------------
 include/linux/fs.h |    1 +
 2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7959e70..5953913 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -169,19 +169,10 @@ void putname(const char *name)
 EXPORT_SYMBOL(putname);
 #endif
 
-
-/**
- * generic_permission  -  check for access rights on a Posix-like filesystem
- * @inode:	inode to check access rights for
- * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
- * @check_acl:	optional callback to check for Posix ACLs
- *
- * Used to check for read/write/execute permissions on a file.
- * We use "fsuid" for this, letting us set arbitrary permissions
- * for filesystem access without changing the "normal" uids which
- * are used for other things..
+/*
+ * This does basic POSIX ACL permission checking
  */
-int generic_permission(struct inode *inode, int mask,
+static int acl_permission_check(struct inode *inode, int mask,
 		int (*check_acl)(struct inode *inode, int mask))
 {
 	umode_t			mode = inode->i_mode;
@@ -193,9 +184,7 @@ int generic_permission(struct inode *inode, int mask,
 	else {
 		if (IS_POSIXACL(inode) && (mode & S_IRWXG) && check_acl) {
 			int error = check_acl(inode, mask);
-			if (error == -EACCES)
-				goto check_capabilities;
-			else if (error != -EAGAIN)
+			if (error != -EAGAIN)
 				return error;
 		}
 
@@ -208,8 +197,32 @@ int generic_permission(struct inode *inode, int mask,
 	 */
 	if ((mask & ~mode) == 0)
 		return 0;
+	return -EACCES;
+}
+
+/**
+ * generic_permission  -  check for access rights on a Posix-like filesystem
+ * @inode:	inode to check access rights for
+ * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ * @check_acl:	optional callback to check for Posix ACLs
+ *
+ * Used to check for read/write/execute permissions on a file.
+ * We use "fsuid" for this, letting us set arbitrary permissions
+ * for filesystem access without changing the "normal" uids which
+ * are used for other things..
+ */
+int generic_permission(struct inode *inode, int mask,
+		int (*check_acl)(struct inode *inode, int mask))
+{
+	int ret;
+
+	/*
+	 * Do the basic POSIX ACL permission checks.
+	 */
+	ret = acl_permission_check(inode, mask, check_acl);
+	if (ret != -EACCES)
+		return ret;
 
- check_capabilities:
 	/*
 	 * Read/write DACs are always overridable.
 	 * Executable DACs are overridable if at least one exec bit is set.
@@ -262,7 +275,7 @@ int inode_permission(struct inode *inode, int mask)
 	if (inode->i_op->permission)
 		retval = inode->i_op->permission(inode, mask);
 	else
-		retval = generic_permission(inode, mask, NULL);
+		retval = generic_permission(inode, mask, inode->i_op->check_acl);
 
 	if (retval)
 		return retval;
@@ -432,27 +445,22 @@ static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name,
  */
 static int exec_permission_lite(struct inode *inode)
 {
-	umode_t	mode = inode->i_mode;
+	int ret;
 
 	if (inode->i_op->permission) {
-		int ret = inode->i_op->permission(inode, MAY_EXEC);
+		ret = inode->i_op->permission(inode, MAY_EXEC);
 		if (!ret)
 			goto ok;
 		return ret;
 	}
-
-	if (current_fsuid() == inode->i_uid)
-		mode >>= 6;
-	else if (in_group_p(inode->i_gid))
-		mode >>= 3;
-
-	if (mode & MAY_EXEC)
+	ret = acl_permission_check(inode, MAY_EXEC, inode->i_op->check_acl);
+	if (!ret)
 		goto ok;
 
 	if (capable(CAP_DAC_OVERRIDE) || capable(CAP_DAC_READ_SEARCH))
 		goto ok;
 
-	return -EACCES;
+	return ret;
 ok:
 	return security_inode_permission(inode, MAY_EXEC);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 73e9b64..c1f9935 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1528,6 +1528,7 @@ struct inode_operations {
 	void (*put_link) (struct dentry *, struct nameidata *, void *);
 	void (*truncate) (struct inode *);
 	int (*permission) (struct inode *, int);
+	int (*check_acl)(struct inode *, int);
 	int (*setattr) (struct dentry *, struct iattr *);
 	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
 	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission'
  2009-09-07 21:04         ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
@ 2009-09-07 21:04           ` Linus Torvalds
  2009-09-07 21:05             ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
                               ` (2 more replies)
  2009-09-07 22:20           ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op James Morris
  1 sibling, 3 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:04 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 12:04:28 -0700

shmfs wants purely standard POSIX ACL semantics, so we can use the new
generic VFS layer POSIX ACL checking rather than cooking our own
'permission()' function.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/shmem_fs.h |    2 +-
 mm/shmem.c               |    6 +++---
 mm/shmem_acl.c           |   11 +----------
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index abff6c9..6d3f2f4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -39,7 +39,7 @@ static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
 }
 
 #ifdef CONFIG_TMPFS_POSIX_ACL
-int shmem_permission(struct inode *, int);
+int shmem_check_acl(struct inode *, int);
 int shmem_acl_init(struct inode *, struct inode *);
 
 extern struct xattr_handler shmem_xattr_acl_access_handler;
diff --git a/mm/shmem.c b/mm/shmem.c
index d713239..5a0b3d4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2446,7 +2446,7 @@ static const struct inode_operations shmem_inode_operations = {
 	.getxattr	= generic_getxattr,
 	.listxattr	= generic_listxattr,
 	.removexattr	= generic_removexattr,
-	.permission	= shmem_permission,
+	.check_acl	= shmem_check_acl,
 #endif
 
 };
@@ -2469,7 +2469,7 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.getxattr	= generic_getxattr,
 	.listxattr	= generic_listxattr,
 	.removexattr	= generic_removexattr,
-	.permission	= shmem_permission,
+	.check_acl	= shmem_check_acl,
 #endif
 };
 
@@ -2480,7 +2480,7 @@ static const struct inode_operations shmem_special_inode_operations = {
 	.getxattr	= generic_getxattr,
 	.listxattr	= generic_listxattr,
 	.removexattr	= generic_removexattr,
-	.permission	= shmem_permission,
+	.check_acl	= shmem_check_acl,
 #endif
 };
 
diff --git a/mm/shmem_acl.c b/mm/shmem_acl.c
index 606a8e7..df2c87f 100644
--- a/mm/shmem_acl.c
+++ b/mm/shmem_acl.c
@@ -157,7 +157,7 @@ shmem_acl_init(struct inode *inode, struct inode *dir)
 /**
  * shmem_check_acl  -  check_acl() callback for generic_permission()
  */
-static int
+int
 shmem_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl = shmem_get_acl(inode, ACL_TYPE_ACCESS);
@@ -169,12 +169,3 @@ shmem_check_acl(struct inode *inode, int mask)
 	}
 	return -EAGAIN;
 }
-
-/**
- * shmem_permission  -  permission() inode operation
- */
-int
-shmem_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, shmem_check_acl);
-}
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/8] ext[234]: move over to 'check_acl' permission model
  2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
@ 2009-09-07 21:05             ` Linus Torvalds
  2009-09-07 21:05               ` [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' Linus Torvalds
  2009-09-07 22:23             ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' James Morris
  2009-09-08 18:05             ` Hugh Dickins
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:05 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 12:12:24 -0700

Don't implement per-filesystem 'extX_permission()' functions that have
to be called for every path component operation, and instead just expose
the actual ACL checking so that the VFS layer can now do it for us.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ext2/acl.c   |    8 +-------
 fs/ext2/acl.h   |    4 ++--
 fs/ext2/file.c  |    2 +-
 fs/ext2/namei.c |    4 ++--
 fs/ext3/acl.c   |    8 +-------
 fs/ext3/acl.h   |    4 ++--
 fs/ext3/file.c  |    2 +-
 fs/ext3/namei.c |    4 ++--
 fs/ext4/acl.c   |    8 +-------
 fs/ext4/acl.h   |    4 ++--
 fs/ext4/file.c  |    2 +-
 fs/ext4/namei.c |    4 ++--
 12 files changed, 18 insertions(+), 36 deletions(-)

diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c
index d636e12..a63d442 100644
--- a/fs/ext2/acl.c
+++ b/fs/ext2/acl.c
@@ -230,7 +230,7 @@ ext2_set_acl(struct inode *inode, int type, struct posix_acl *acl)
 	return error;
 }
 
-static int
+int
 ext2_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl = ext2_get_acl(inode, ACL_TYPE_ACCESS);
@@ -246,12 +246,6 @@ ext2_check_acl(struct inode *inode, int mask)
 	return -EAGAIN;
 }
 
-int
-ext2_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, ext2_check_acl);
-}
-
 /*
  * Initialize the ACLs of a new inode. Called from ext2_new_inode.
  *
diff --git a/fs/ext2/acl.h b/fs/ext2/acl.h
index ecefe47..3ff6cbb 100644
--- a/fs/ext2/acl.h
+++ b/fs/ext2/acl.h
@@ -54,13 +54,13 @@ static inline int ext2_acl_count(size_t size)
 #ifdef CONFIG_EXT2_FS_POSIX_ACL
 
 /* acl.c */
-extern int ext2_permission (struct inode *, int);
+extern int ext2_check_acl (struct inode *, int);
 extern int ext2_acl_chmod (struct inode *);
 extern int ext2_init_acl (struct inode *, struct inode *);
 
 #else
 #include <linux/sched.h>
-#define ext2_permission NULL
+#define ext2_check_acl	NULL
 #define ext2_get_acl	NULL
 #define ext2_set_acl	NULL
 
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 2b9e47d..a2f3afd 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -85,6 +85,6 @@ const struct inode_operations ext2_file_inode_operations = {
 	.removexattr	= generic_removexattr,
 #endif
 	.setattr	= ext2_setattr,
-	.permission	= ext2_permission,
+	.check_acl	= ext2_check_acl,
 	.fiemap		= ext2_fiemap,
 };
diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
index e1dedb0..58051df 100644
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@@ -396,7 +396,7 @@ const struct inode_operations ext2_dir_inode_operations = {
 	.removexattr	= generic_removexattr,
 #endif
 	.setattr	= ext2_setattr,
-	.permission	= ext2_permission,
+	.check_acl	= ext2_check_acl,
 };
 
 const struct inode_operations ext2_special_inode_operations = {
@@ -407,5 +407,5 @@ const struct inode_operations ext2_special_inode_operations = {
 	.removexattr	= generic_removexattr,
 #endif
 	.setattr	= ext2_setattr,
-	.permission	= ext2_permission,
+	.check_acl	= ext2_check_acl,
 };
diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
index e167bae..c9b0df3 100644
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@@ -238,7 +238,7 @@ ext3_set_acl(handle_t *handle, struct inode *inode, int type,
 	return error;
 }
 
-static int
+int
 ext3_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl = ext3_get_acl(inode, ACL_TYPE_ACCESS);
@@ -254,12 +254,6 @@ ext3_check_acl(struct inode *inode, int mask)
 	return -EAGAIN;
 }
 
-int
-ext3_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, ext3_check_acl);
-}
-
 /*
  * Initialize the ACLs of a new inode. Called from ext3_new_inode.
  *
diff --git a/fs/ext3/acl.h b/fs/ext3/acl.h
index 07d15a3..5973346 100644
--- a/fs/ext3/acl.h
+++ b/fs/ext3/acl.h
@@ -54,13 +54,13 @@ static inline int ext3_acl_count(size_t size)
 #ifdef CONFIG_EXT3_FS_POSIX_ACL
 
 /* acl.c */
-extern int ext3_permission (struct inode *, int);
+extern int ext3_check_acl (struct inode *, int);
 extern int ext3_acl_chmod (struct inode *);
 extern int ext3_init_acl (handle_t *, struct inode *, struct inode *);
 
 #else  /* CONFIG_EXT3_FS_POSIX_ACL */
 #include <linux/sched.h>
-#define ext3_permission NULL
+#define ext3_check_acl NULL
 
 static inline int
 ext3_acl_chmod(struct inode *inode)
diff --git a/fs/ext3/file.c b/fs/ext3/file.c
index 5b49704..2992532 100644
--- a/fs/ext3/file.c
+++ b/fs/ext3/file.c
@@ -137,7 +137,7 @@ const struct inode_operations ext3_file_inode_operations = {
 	.listxattr	= ext3_listxattr,
 	.removexattr	= generic_removexattr,
 #endif
-	.permission	= ext3_permission,
+	.check_acl	= ext3_check_acl,
 	.fiemap		= ext3_fiemap,
 };
 
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 6ff7b97..aad6400 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -2445,7 +2445,7 @@ const struct inode_operations ext3_dir_inode_operations = {
 	.listxattr	= ext3_listxattr,
 	.removexattr	= generic_removexattr,
 #endif
-	.permission	= ext3_permission,
+	.check_acl	= ext3_check_acl,
 };
 
 const struct inode_operations ext3_special_inode_operations = {
@@ -2456,5 +2456,5 @@ const struct inode_operations ext3_special_inode_operations = {
 	.listxattr	= ext3_listxattr,
 	.removexattr	= generic_removexattr,
 #endif
-	.permission	= ext3_permission,
+	.check_acl	= ext3_check_acl,
 };
diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index f6d8967..0df88b2 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -236,7 +236,7 @@ ext4_set_acl(handle_t *handle, struct inode *inode, int type,
 	return error;
 }
 
-static int
+int
 ext4_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl = ext4_get_acl(inode, ACL_TYPE_ACCESS);
@@ -252,12 +252,6 @@ ext4_check_acl(struct inode *inode, int mask)
 	return -EAGAIN;
 }
 
-int
-ext4_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, ext4_check_acl);
-}
-
 /*
  * Initialize the ACLs of a new inode. Called from ext4_new_inode.
  *
diff --git a/fs/ext4/acl.h b/fs/ext4/acl.h
index 949789d..9d843d5 100644
--- a/fs/ext4/acl.h
+++ b/fs/ext4/acl.h
@@ -54,13 +54,13 @@ static inline int ext4_acl_count(size_t size)
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 
 /* acl.c */
-extern int ext4_permission(struct inode *, int);
+extern int ext4_check_acl(struct inode *, int);
 extern int ext4_acl_chmod(struct inode *);
 extern int ext4_init_acl(handle_t *, struct inode *, struct inode *);
 
 #else  /* CONFIG_EXT4_FS_POSIX_ACL */
 #include <linux/sched.h>
-#define ext4_permission NULL
+#define ext4_check_acl NULL
 
 static inline int
 ext4_acl_chmod(struct inode *inode)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 3f1873f..27f3c53 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -207,7 +207,7 @@ const struct inode_operations ext4_file_inode_operations = {
 	.listxattr	= ext4_listxattr,
 	.removexattr	= generic_removexattr,
 #endif
-	.permission	= ext4_permission,
+	.check_acl	= ext4_check_acl,
 	.fallocate	= ext4_fallocate,
 	.fiemap		= ext4_fiemap,
 };
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index de04013..114abe5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2536,7 +2536,7 @@ const struct inode_operations ext4_dir_inode_operations = {
 	.listxattr	= ext4_listxattr,
 	.removexattr	= generic_removexattr,
 #endif
-	.permission	= ext4_permission,
+	.check_acl	= ext4_check_acl,
 	.fiemap         = ext4_fiemap,
 };
 
@@ -2548,5 +2548,5 @@ const struct inode_operations ext4_special_inode_operations = {
 	.listxattr	= ext4_listxattr,
 	.removexattr	= generic_removexattr,
 #endif
-	.permission	= ext4_permission,
+	.check_acl	= ext4_check_acl,
 };
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
  2009-09-07 21:05             ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
@ 2009-09-07 21:05               ` Linus Torvalds
  2009-09-08 18:34                 ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2009-09-07 21:05 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Al Viro, Linux Filesystem Mailing List
  Cc: Eric Paris, Mimi Zohar, James Morris


From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 28 Aug 2009 12:29:03 -0700

This avoids an indirect call in the VFS for each path component lookup.

Well, at least as long as you own the directory in question, and the ACL
check is unnecessary.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/jffs2/acl.c              |    7 +------
 fs/jffs2/acl.h              |    4 ++--
 fs/jffs2/dir.c              |    2 +-
 fs/jffs2/file.c             |    2 +-
 fs/jffs2/symlink.c          |    2 +-
 fs/jfs/acl.c                |    7 +------
 fs/jfs/file.c               |    2 +-
 fs/jfs/jfs_acl.h            |    2 +-
 fs/jfs/namei.c              |    2 +-
 fs/xfs/linux-2.6/xfs_iops.c |   16 ++++------------
 10 files changed, 14 insertions(+), 32 deletions(-)

diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c
index 8fcb623..7edb62e 100644
--- a/fs/jffs2/acl.c
+++ b/fs/jffs2/acl.c
@@ -258,7 +258,7 @@ static int jffs2_set_acl(struct inode *inode, int type, struct posix_acl *acl)
 	return rc;
 }
 
-static int jffs2_check_acl(struct inode *inode, int mask)
+int jffs2_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl;
 	int rc;
@@ -274,11 +274,6 @@ static int jffs2_check_acl(struct inode *inode, int mask)
 	return -EAGAIN;
 }
 
-int jffs2_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, jffs2_check_acl);
-}
-
 int jffs2_init_acl_pre(struct inode *dir_i, struct inode *inode, int *i_mode)
 {
 	struct posix_acl *acl, *clone;
diff --git a/fs/jffs2/acl.h b/fs/jffs2/acl.h
index fc929f2..f0ba63e 100644
--- a/fs/jffs2/acl.h
+++ b/fs/jffs2/acl.h
@@ -26,7 +26,7 @@ struct jffs2_acl_header {
 
 #ifdef CONFIG_JFFS2_FS_POSIX_ACL
 
-extern int jffs2_permission(struct inode *, int);
+extern int jffs2_check_acl(struct inode *, int);
 extern int jffs2_acl_chmod(struct inode *);
 extern int jffs2_init_acl_pre(struct inode *, struct inode *, int *);
 extern int jffs2_init_acl_post(struct inode *);
@@ -36,7 +36,7 @@ extern struct xattr_handler jffs2_acl_default_xattr_handler;
 
 #else
 
-#define jffs2_permission			(NULL)
+#define jffs2_check_acl				(NULL)
 #define jffs2_acl_chmod(inode)			(0)
 #define jffs2_init_acl_pre(dir_i,inode,mode)	(0)
 #define jffs2_init_acl_post(inode)		(0)
diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
index 6f60cc9..7aa4417 100644
--- a/fs/jffs2/dir.c
+++ b/fs/jffs2/dir.c
@@ -55,7 +55,7 @@ const struct inode_operations jffs2_dir_inode_operations =
 	.rmdir =	jffs2_rmdir,
 	.mknod =	jffs2_mknod,
 	.rename =	jffs2_rename,
-	.permission =	jffs2_permission,
+	.check_acl =	jffs2_check_acl,
 	.setattr =	jffs2_setattr,
 	.setxattr =	jffs2_setxattr,
 	.getxattr =	jffs2_getxattr,
diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index 23c9475..b7b74e2 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -56,7 +56,7 @@ const struct file_operations jffs2_file_operations =
 
 const struct inode_operations jffs2_file_inode_operations =
 {
-	.permission =	jffs2_permission,
+	.check_acl =	jffs2_check_acl,
 	.setattr =	jffs2_setattr,
 	.setxattr =	jffs2_setxattr,
 	.getxattr =	jffs2_getxattr,
diff --git a/fs/jffs2/symlink.c b/fs/jffs2/symlink.c
index b7339c3..4ec11e8 100644
--- a/fs/jffs2/symlink.c
+++ b/fs/jffs2/symlink.c
@@ -21,7 +21,7 @@ const struct inode_operations jffs2_symlink_inode_operations =
 {
 	.readlink =	generic_readlink,
 	.follow_link =	jffs2_follow_link,
-	.permission =	jffs2_permission,
+	.check_acl =	jffs2_check_acl,
 	.setattr =	jffs2_setattr,
 	.setxattr =	jffs2_setxattr,
 	.getxattr =	jffs2_getxattr,
diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
index a29c7c3..d66477c 100644
--- a/fs/jfs/acl.c
+++ b/fs/jfs/acl.c
@@ -114,7 +114,7 @@ out:
 	return rc;
 }
 
-static int jfs_check_acl(struct inode *inode, int mask)
+int jfs_check_acl(struct inode *inode, int mask)
 {
 	struct posix_acl *acl = jfs_get_acl(inode, ACL_TYPE_ACCESS);
 
@@ -129,11 +129,6 @@ static int jfs_check_acl(struct inode *inode, int mask)
 	return -EAGAIN;
 }
 
-int jfs_permission(struct inode *inode, int mask)
-{
-	return generic_permission(inode, mask, jfs_check_acl);
-}
-
 int jfs_init_acl(tid_t tid, struct inode *inode, struct inode *dir)
 {
 	struct posix_acl *acl = NULL;
diff --git a/fs/jfs/file.c b/fs/jfs/file.c
index 7f6063a..2b70fa7 100644
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@@ -96,7 +96,7 @@ const struct inode_operations jfs_file_inode_operations = {
 	.removexattr	= jfs_removexattr,
 #ifdef CONFIG_JFS_POSIX_ACL
 	.setattr	= jfs_setattr,
-	.permission	= jfs_permission,
+	.check_acl	= jfs_check_acl,
 #endif
 };
 
diff --git a/fs/jfs/jfs_acl.h b/fs/jfs/jfs_acl.h
index 88475f1..b07bd41 100644
--- a/fs/jfs/jfs_acl.h
+++ b/fs/jfs/jfs_acl.h
@@ -20,7 +20,7 @@
 
 #ifdef CONFIG_JFS_POSIX_ACL
 
-int jfs_permission(struct inode *, int);
+int jfs_check_acl(struct inode *, int);
 int jfs_init_acl(tid_t, struct inode *, struct inode *);
 int jfs_setattr(struct dentry *, struct iattr *);
 
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index 514ee2e..c79a427 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -1543,7 +1543,7 @@ const struct inode_operations jfs_dir_inode_operations = {
 	.removexattr	= jfs_removexattr,
 #ifdef CONFIG_JFS_POSIX_ACL
 	.setattr	= jfs_setattr,
-	.permission	= jfs_permission,
+	.check_acl	= jfs_check_acl,
 #endif
 };
 
diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 8070b34..6c32f1d 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -485,14 +485,6 @@ xfs_vn_put_link(
 }
 
 STATIC int
-xfs_vn_permission(
-	struct inode		*inode,
-	int			mask)
-{
-	return generic_permission(inode, mask, xfs_check_acl);
-}
-
-STATIC int
 xfs_vn_getattr(
 	struct vfsmount		*mnt,
 	struct dentry		*dentry,
@@ -696,7 +688,7 @@ xfs_vn_fiemap(
 }
 
 static const struct inode_operations xfs_inode_operations = {
-	.permission		= xfs_vn_permission,
+	.check_acl		= xfs_check_acl,
 	.truncate		= xfs_vn_truncate,
 	.getattr		= xfs_vn_getattr,
 	.setattr		= xfs_vn_setattr,
@@ -724,7 +716,7 @@ static const struct inode_operations xfs_dir_inode_operations = {
 	.rmdir			= xfs_vn_unlink,
 	.mknod			= xfs_vn_mknod,
 	.rename			= xfs_vn_rename,
-	.permission		= xfs_vn_permission,
+	.check_acl		= xfs_check_acl,
 	.getattr		= xfs_vn_getattr,
 	.setattr		= xfs_vn_setattr,
 	.setxattr		= generic_setxattr,
@@ -749,7 +741,7 @@ static const struct inode_operations xfs_dir_ci_inode_operations = {
 	.rmdir			= xfs_vn_unlink,
 	.mknod			= xfs_vn_mknod,
 	.rename			= xfs_vn_rename,
-	.permission		= xfs_vn_permission,
+	.check_acl		= xfs_check_acl,
 	.getattr		= xfs_vn_getattr,
 	.setattr		= xfs_vn_setattr,
 	.setxattr		= generic_setxattr,
@@ -762,7 +754,7 @@ static const struct inode_operations xfs_symlink_inode_operations = {
 	.readlink		= generic_readlink,
 	.follow_link		= xfs_vn_follow_link,
 	.put_link		= xfs_vn_put_link,
-	.permission		= xfs_vn_permission,
+	.check_acl		= xfs_check_acl,
 	.getattr		= xfs_vn_getattr,
 	.setattr		= xfs_vn_setattr,
 	.setxattr		= generic_setxattr,
-- 
1.6.4.1.209.g74b8


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/8] Simplify exec_permission_lite() logic
  2009-09-07 21:02   ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
  2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
@ 2009-09-07 22:12     ` James Morris
  1 sibling, 0 replies; 24+ messages in thread
From: James Morris @ 2009-09-07 22:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar

On Mon, 7 Sep 2009, Linus Torvalds wrote:

> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 28 Aug 2009 10:50:37 -0700
> 
> Instead of returning EAGAIN and having the caller do something
> special for that case,  just do the special case directly.
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Reviewed-by: James Morris <jmorris@namei.org>


-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] Simplify exec_permission_lite() further
  2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
  2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
@ 2009-09-07 22:15       ` James Morris
  2009-09-08 14:40       ` Jamie Lokier
  2 siblings, 0 replies; 24+ messages in thread
From: James Morris @ 2009-09-07 22:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar

On Mon, 7 Sep 2009, Linus Torvalds wrote:

> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Looks ok to me, but I'd be interested in any comments from others on the 
cc list.

Reviewed-by: James Morris <jmorris@namei.org>


-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] Simplify exec_permission_lite(), part 3
  2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
  2009-09-07 21:04         ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
@ 2009-09-07 22:18         ` James Morris
  1 sibling, 0 replies; 24+ messages in thread
From: James Morris @ 2009-09-07 22:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar

On Mon, 7 Sep 2009, Linus Torvalds wrote:

> The generic inode_permission() code does things like checking MAY_WRITE
> and devcgroup_inode_permission(), neither of which are relevant for the
> light pathname walk permission checks (we always do just MAY_EXEC, and
> the inode is never a special device).
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Reviewed-by: James Morris <jmorris@namei.org>

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] Make 'check_acl()' a first-class filesystem op
  2009-09-07 21:04         ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
  2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
@ 2009-09-07 22:20           ` James Morris
  1 sibling, 0 replies; 24+ messages in thread
From: James Morris @ 2009-09-07 22:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar

On Mon, 7 Sep 2009, Linus Torvalds wrote:

> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 28 Aug 2009 11:51:25 -0700
> 
> This is stage one in flattening out the callchains for the common
> permission testing.  Rather than have most filesystem implemnt their
> inode->i_op->permission own function that just calls back down to the
> VFS layers 'generic_permission()' with the per-filesystem ACL checking
> function, the filesystem can just expose its 'check_acl' function
> directly, and let the VFS layer do everything for it.
> 
> This is all just preparatory - no filesystem actually enables this yet.
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Reviewed-by: James Morris <jmorris@namei.org>

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission'
  2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
  2009-09-07 21:05             ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
@ 2009-09-07 22:23             ` James Morris
  2009-09-08  0:03               ` James Morris
  2009-09-08 18:05             ` Hugh Dickins
  2 siblings, 1 reply; 24+ messages in thread
From: James Morris @ 2009-09-07 22:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar

On Mon, 7 Sep 2009, Linus Torvalds wrote:

> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 28 Aug 2009 12:04:28 -0700
> 
> shmfs wants purely standard POSIX ACL semantics, so we can use the new
> generic VFS layer POSIX ACL checking rather than cooking our own
> 'permission()' function.
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Reviewed-by: James Morris <jmorris@namei.org>

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission'
  2009-09-07 22:23             ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' James Morris
@ 2009-09-08  0:03               ` James Morris
  0 siblings, 0 replies; 24+ messages in thread
From: James Morris @ 2009-09-08  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar

On Tue, 8 Sep 2009, James Morris wrote:

> Reviewed-by: James Morris <jmorris@namei.org>

Ditto for patches 7 & 8.


-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/8] Do not call 'ima_path_check()' for each path component
  2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
  2009-09-07 21:02   ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
@ 2009-09-08  1:50   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2009-09-08  1:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris

On Mon, Sep 07, 2009 at 02:02:16PM -0700, Linus Torvalds wrote:
> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 28 Aug 2009 10:05:33 -0700
> 
> Not only is that a supremely timing-critical path, but it's hopefully some 
> day going to be lockless for the common case, and ima can't do that.
> 
> Plus the integrity code doesn't even care about non-regular files,
> so it was always a total waste of time and effort.

Haha, nice one.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/8] VFS name lookup permission checking cleanup
  2009-09-07 21:01 [PATCH 0/8] VFS name lookup permission checking cleanup Linus Torvalds
  2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
@ 2009-09-08 14:01 ` Serge E. Hallyn
  2009-09-08 17:52 ` Mimi Zohar
  2 siblings, 0 replies; 24+ messages in thread
From: Serge E. Hallyn @ 2009-09-08 14:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris

Quoting Linus Torvalds (torvalds@linux-foundation.org):
> 
> This is a series of eight trivial patches that I'd like people to take a 
> look at, because I am hoping to eventually do multiple path component 
> lookups in one go without taking the per-dentry lock or incrementing (and 
> then decrementing) the per-dentry atomic count for each component.
> 
> The aim would be to try to avoid getting that annoying cacheline ping-pong 
> on the common top-level dentries that everybody looks up (ie root and home 
> directories, /usr, /usr/bin etc).
> 
> Right now I have some simple (but real) loads that show the contention on 
> dentry->d_lock to be a roughly 3% performance hit on a single-socket 
> nehalem, and I assume it can be much worse on multi-socket machines.
> 
> And the thing is, it should be entirely possible to do everything but the 
> last component lookup with just a single read_seqbegin()/read_seqretry() 
> around the whole lookup. Yes, the last component is special and absolutely 
> needs locking and counting - but the last component also doesn't tend to 
> be shared, so locking it is fine.
> 
> Now, I may never actually get there, but when looking at it, the biggest 
> problem is actually not so much the path lookup itself, as the security 
> tests that are done for each path component. And it should be noted that 
> in order for a lockless seq-lock only lookup make sense, any such 
> operations would have to be totally lock-free too. They certainly can't 
> take mutexes etc, but right now they do.
> 
> Those security tests fall into two categories:
> 
>  - actual security layer callouts: ima_path_check().
> 
>    This one looks totally pointless. Path component lookup is a horribly 
>    timing-critical path, and we will only do a successful lookup on a 
>    directory (inode needs to have a ->lookup operation), yet in the middle 
>    of that is a call to "ima_path_check()".
> 
>    Now, it looks like ima_path_check() is very much designed to only check 
>    the _final_ path anyway, and was never meant to be used to check the 
>    directories we hit on the way. In fact, the whole function starts with
> 
> 	if (!ima_initialized || !S_ISREG(inode->i_mode))
> 		return 0;
> 
>    so it's totally pointless to do that thing on a directory where 
>    that !S_ISREG() test will trigger.
> 
>    So just remove it. IMA should never have put that check in there to 
>    begin with, it's just way too performance-sensitive.
> 
>  - the real filesystem permission checks. 
> 
>    We used to do the common case entirely in the VFS layer, but these days 
>    the common case includes POSIX ACL checking, and as a result, the 
>    trivial short-circuit code in the VFS layer almost never triggers in
>    practice, and we call down to the low-level filesystem for each 
>    component. 
> 
>    We can't fix that by just removing the call, but what we _can_ do is to 
>    at least avoid the silly calling back-and-forth: most filesystems will 
>    just call back to the VFS layer to do the "generic_permission()" with 
>    their own ACL-checking routines.
> 
>    That way we can flatten the call-chain out a bit, and avoid one 
>    unnecessary indirect call in that timing-critical region. And 
>    eventually, if we make the whole ACL caching thing be something that we 
>    do at a VFS layer (Al Viro already worked on _some_ of that), we'll be 
>    able to avoid the calls entirely when we can see the cached ACL 
>    pointers directly.
> 
> So this series of 8 patches do all these preliminary things. As shown by 
> the diffstat below, it actually reduces the lines of code (mainly by just 
> removing the silly per-filesystem wrappers around "generic_permission()") 
> and it also makes it a _lot_ clearer what actually gets called in that 
> whole 'exec_permission_lite()' function that we use to check the 
> permission of a pathname lookup.
> 
> Comments?  Especially from the IMA people (first patch) and from generic 
> VFS, security and low-level FS people (the 'Simplify exec_permission_lite' 
> series, and then the check_acl + per-filesystem changes).
> 
> Al?
> 
> I'm looking to merge these shortly after 2.6.31 is released, but comments 
> welcome.

All of them seem good, and I don't see any thinkos, no resulting skipped
checks or anything.

Acked-by: Serge Hallyn <serue@us.ibm.com>

-serge

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] Simplify exec_permission_lite() further
  2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
  2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
  2009-09-07 22:15       ` [PATCH 3/8] Simplify exec_permission_lite() further James Morris
@ 2009-09-08 14:40       ` Jamie Lokier
  2009-09-08 14:58         ` Matthew Wilcox
  2009-09-08 15:02         ` Linus Torvalds
  2 siblings, 2 replies; 24+ messages in thread
From: Jamie Lokier @ 2009-09-08 14:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris

Linus Torvalds wrote:
> This function is only called for path components that are already known
> to be directories (they have a '->lookup' method).  So don't bother
> doing that whole S_ISDIR() testing,

A few years ago you briefly discussed diddling the VFS to accept
objects which are both files and directories.  That is, have
->lookup() and also can be read as regular files.

I don't know if your thinking has changed, and I don't particularly
care, only thought I'd remind in case it's something you'd like to
support at some point.

-- Jamie

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] Simplify exec_permission_lite() further
  2009-09-08 14:40       ` Jamie Lokier
@ 2009-09-08 14:58         ` Matthew Wilcox
  2009-09-08 15:02         ` Linus Torvalds
  1 sibling, 0 replies; 24+ messages in thread
From: Matthew Wilcox @ 2009-09-08 14:58 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Linus Torvalds, Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris

On Tue, Sep 08, 2009 at 03:40:12PM +0100, Jamie Lokier wrote:
> Linus Torvalds wrote:
> > This function is only called for path components that are already known
> > to be directories (they have a '->lookup' method).  So don't bother
> > doing that whole S_ISDIR() testing,
> 
> A few years ago you briefly discussed diddling the VFS to accept
> objects which are both files and directories.  That is, have
> ->lookup() and also can be read as regular files.
> 
> I don't know if your thinking has changed, and I don't particularly
> care, only thought I'd remind in case it's something you'd like to
> support at some point.

If it's being used as a component of a pathname, we're _using_ it as a
directory, so it won't matter for this case.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] Simplify exec_permission_lite() further
  2009-09-08 14:40       ` Jamie Lokier
  2009-09-08 14:58         ` Matthew Wilcox
@ 2009-09-08 15:02         ` Linus Torvalds
  1 sibling, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-08 15:02 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris



On Tue, 8 Sep 2009, Jamie Lokier wrote:
>
> Linus Torvalds wrote:
> > This function is only called for path components that are already known
> > to be directories (they have a '->lookup' method).  So don't bother
> > doing that whole S_ISDIR() testing,
> 
> A few years ago you briefly discussed diddling the VFS to accept
> objects which are both files and directories.  That is, have
> ->lookup() and also can be read as regular files.
> 
> I don't know if your thinking has changed, and I don't particularly
> care, only thought I'd remind in case it's something you'd like to
> support at some point.

No, I explicitly thought about that case too, and the thing is, that even 
_if_ we do that some day, the change is actually the RightThing(tm) to do.

In fact, it's even _more_ true in that case. The old code was broken for 
that case.

The thing is, if we ever implement the concept of using a regular file as 
a path component (say, we look up some attributes or we teach the kernel 
to look into tar-files or whatever), then the permission check for using 
it as a path component would be _different_ from the permission check for 
"executing" the file as a file.  When used as a path component, we'd want 
to use CAP_DAC_SEARCH, for example, not some "is it ok to execute this 
file" permission.

So "permission" for a path component really is fundamentally different 
from the same thing as an actual end-result file open/use. They do share 
some logic, of course, like just the whole owner/group/other and ACL 
logic. But at the same time they have conceptual - and actual - 
differences too.

Just think about what "execute" permission means: it's not just the 'x' 
bit, there are things like per-filesystem "NOEXEC" bits. We check that at 
a completely different level already - exactly because the "execute" 
permission for a pathname component is something fundamentally _different_ 
from executing the result of an actual final lookup.

The "exec_permission_lite()" function makes some of that difference more 
explicit - and also makes it explicit what is shared. We do check the 'x' 
bit, and we do honor ACL bits, but the whole Integrity check part was very 
obviously meant to be done at the "actual final lookup result" level 
rather than at the path component level.

In fact, as it was, "ima_path_check()" wouldn't know whether MAY_EXEC 
meant that we were going to actually run a program, or look up a pathname. 
The way it "solved" that was to just ignore non-regular files. And I'm 
convinced that the real solution is to just say that "ima_path_check()" is 
always checking the end result of a path, not the components, and then 
MAY_EXEC actually has a much more unambiguous meaning.

(Admittedly I think the IMA code could also have disambiguated the cases 
by looking at whether MAY_OPEN is set ot not - looking at whether the 
inode in question had S_ISDIR set probably had other reasons too).

Anyway, I do suspect that if we ever do the whole "able to treat regular 
files as a lookup target" we _will_ hit other issues, and it's going to 
require a lot of thought - but I suspect this series is actually going to 
help it by separating out the permission handling more clearly.

			Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/8] VFS name lookup permission checking cleanup
  2009-09-07 21:01 [PATCH 0/8] VFS name lookup permission checking cleanup Linus Torvalds
  2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
  2009-09-08 14:01 ` [PATCH 0/8] VFS name lookup permission checking cleanup Serge E. Hallyn
@ 2009-09-08 17:52 ` Mimi Zohar
  2 siblings, 0 replies; 24+ messages in thread
From: Mimi Zohar @ 2009-09-08 17:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Paris, James Morris, Linux Filesystem Mailing List,
	Linux Kernel Mailing List, Al Viro

Linus Torvalds <torvalds@linux-foundation.org> wrote on 09/07/2009 
05:01:14 PM:

> This is a series of eight trivial patches that I'd like people to take a 

> look at, because I am hoping to eventually do multiple path component 
> lookups in one go without taking the per-dentry lock or incrementing 
(and 
> then decrementing) the per-dentry atomic count for each component.
> 
> The aim would be to try to avoid getting that annoying cacheline 
ping-pong 
> on the common top-level dentries that everybody looks up (ie root and 
home 
> directories, /usr, /usr/bin etc).
> 
> Right now I have some simple (but real) loads that show the contention 
on 
> dentry->d_lock to be a roughly 3% performance hit on a single-socket 
> nehalem, and I assume it can be much worse on multi-socket machines.
> 
> And the thing is, it should be entirely possible to do everything but 
the 
> last component lookup with just a single read_seqbegin()/read_seqretry() 

> around the whole lookup. Yes, the last component is special and 
absolutely 
> needs locking and counting - but the last component also doesn't tend to 

> be shared, so locking it is fine.
> 
> Now, I may never actually get there, but when looking at it, the biggest 

> problem is actually not so much the path lookup itself, as the security 
> tests that are done for each path component. And it should be noted that 

> in order for a lockless seq-lock only lookup make sense, any such 
> operations would have to be totally lock-free too. They certainly can't 
> take mutexes etc, but right now they do.
> 
> Those security tests fall into two categories:
> 
>  - actual security layer callouts: ima_path_check().
> 
>    This one looks totally pointless. Path component lookup is a horribly 

>    timing-critical path, and we will only do a successful lookup on a 
>    directory (inode needs to have a ->lookup operation), yet in the 
middle 
>    of that is a call to "ima_path_check()".
> 
>    Now, it looks like ima_path_check() is very much designed to only 
check 
>    the _final_ path anyway, and was never meant to be used to check the 
>    directories we hit on the way. In fact, the whole function starts 
with
> 
>    if (!ima_initialized || !S_ISREG(inode->i_mode))
>       return 0;
> 
>    so it's totally pointless to do that thing on a directory where 
>    that !S_ISREG() test will trigger.
> 
>    So just remove it. IMA should never have put that check in there to 
>    begin with, it's just way too performance-sensitive.

You're right.  We don't need to call ima_path_check() here, as IMA
only measures the integrity of the file itself, and not directories.

Mimi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission'
  2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
  2009-09-07 21:05             ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
  2009-09-07 22:23             ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' James Morris
@ 2009-09-08 18:05             ` Hugh Dickins
  2 siblings, 0 replies; 24+ messages in thread
From: Hugh Dickins @ 2009-09-08 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris

On Mon, 7 Sep 2009, Linus Torvalds wrote:
> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 28 Aug 2009 12:04:28 -0700
> 
> shmfs wants purely standard POSIX ACL semantics, so we can use the new
> generic VFS layer POSIX ACL checking rather than cooking our own
> 'permission()' function.
> 
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>

> ---
>  include/linux/shmem_fs.h |    2 +-
>  mm/shmem.c               |    6 +++---
>  mm/shmem_acl.c           |   11 +----------
>  3 files changed, 5 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
  2009-09-07 21:05               ` [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' Linus Torvalds
@ 2009-09-08 18:34                 ` Christoph Hellwig
  2009-09-09 17:09                   ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2009-09-08 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris

The split of these patches is a bit odd, either do all in one patch or
one patch per filesystem instead of those groups.

That beeing said if we go down this way I would prefer if we go
down all the way, that is convert the remaining few filesystems that
pass a check_acl argument to generic_permission (btrfs, gfs2, ocfs2)
and just kill off that argument.

After that there is another step we can easily go:  as we now cache the
ACLs in the generic inode instead of the per-fs one we can move the
get_cached_acl call to your acl_permission_check helper (for gfs2/ocfs2
that don't cache ACLs it will always fail), and not call out to the fs
for the fast path at all.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
  2009-09-08 18:34                 ` Christoph Hellwig
@ 2009-09-09 17:09                   ` Linus Torvalds
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2009-09-09 17:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux Kernel Mailing List, Al Viro,
	Linux Filesystem Mailing List, Eric Paris, Mimi Zohar,
	James Morris



On Tue, 8 Sep 2009, Christoph Hellwig wrote:
>
> The split of these patches is a bit odd, either do all in one patch or
> one patch per filesystem instead of those groups.

Hmm. I actually did that somewhat on purpose. 

The reason? If there is something wrong, I want bisection to specify it 
fairly well, and I was thinking that maybe there would be some filesystem
specific issue. I know, it's unlikely, but hey, if I knew when bugs 
happened, I'd just make sure they never did..

So I wanted to keep the shmfs one separate, because people use that 
filesystem independently of - and generally differently from - the other 
on-disk ones, and maybe you could have a shmfs-specific bug but not a 
filesystem-specific one (or vice versa). So if some application suddenly 
breaks, anybody doing bisection would see which one it is.

Now, I could then have bundled up the rest of the filesystems as one 
commit, or done them all as individual ones, and there I don't really have 
any huge preferences. There were six filesystems that had "obviously just 
the wrapper function" (done by just doing a

	git grep -2 generic_permission

in case anybody cares), and quite frankly, if you do that grep, then the 
ext* group stands out very clearly (next to each other, same indentation 
due to filenames etc etc). So I just did them next.

And the third group was just "the rest". Not standing out in any 
particular way, but also not worth doing individually in any particular 
way. Bisectability in those groups doesn't much matter, because I don't 
think it's all that likely that a machine that shows problems would run a 
mix of filesystems, the way you'd mix shmfs with other filesystems and 
other patterns.

> That beeing said if we go down this way I would prefer if we go
> down all the way, that is convert the remaining few filesystems that
> pass a check_acl argument to generic_permission (btrfs, gfs2, ocfs2)
> and just kill off that argument.

And I do agree, eventually. But the series was really meant to be a 
standalone thing where I didn't force filesystems to change. I also hope 
that some filesystems, like btrfs, that don't use the "trivial wrapper", 
would look at _why_ they don't just use the trivial wrapper.

For some of them, they seem to have real major reasons not to just use 
'generic_permission()' (eg fuse), while others - btrfs - seem to be just 
odd, and eg have its own special case of btrfs-specific read-only crap. 
Which is a real downer, since that MAY_WRITE case isn't even relevant for 
pathname lookup, but even for other inodes I really don't see why btrfs 
doesn't just use the generic VFS-layer "S_IMMUTABLE" bit.

Now, for your particular suggestion it doesn't matter (we'd just drop the 
last argument, and have "generic_permission()" pick it up from the inode 
ops), but the larger point was that I really didn't want to change 
filesystem code unnecessarily. And I'd actually hope that eventually _no_ 
filesystem would use "generic_permission()" at all, and we'd just always 
do that whole check_acl thing at a VFS level if the filesystem provides 
acls.

Part of that is that I would also move the whole "get_cached_acl()" to 
the VFS layer entirely - and I think Al has patches to this. So then we'd 
only need to call some filesystem-specific "check_acl" in the cases where 
we do _not_ already have cached ACL's - and then we can finally do all the 
ACL checks without ever calling into the filesystem at all, which is 
pretty much required if we want to do lockless lookups.

> After that there is another step we can easily go:  as we now cache the
> ACLs in the generic inode instead of the per-fs one we can move the
> get_cached_acl call to your acl_permission_check helper (for gfs2/ocfs2
> that don't cache ACLs it will always fail), and not call out to the fs
> for the fast path at all.

Yes, see above. Al and I talked about this a couple of months ago, and 
that's why he already moved the ACL lists to the generic inode in 
preparation of this. The reason for that was that absolutely _all_ 
filesystems got the locking wrong (too much unnecessary locking for the 
common case of a cached "no acl" case), and some of them got the caching 
wrong (no caching at all).

The take-away message from this all is generally: "don't make individual 
filesystems do even simple things - they'll do it wrong". It's not just 
that the VFS layer will do it better, it's that if it's done in the VFS 
layer we can be clever about locking in a way that we can _not_ be when we 
have to rely on the filesystem writers being aware of subtle issues.

			Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-09-09 17:11 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-07 21:01 [PATCH 0/8] VFS name lookup permission checking cleanup Linus Torvalds
2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
2009-09-07 21:02   ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
2009-09-07 21:04         ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
2009-09-07 21:05             ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
2009-09-07 21:05               ` [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' Linus Torvalds
2009-09-08 18:34                 ` Christoph Hellwig
2009-09-09 17:09                   ` Linus Torvalds
2009-09-07 22:23             ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' James Morris
2009-09-08  0:03               ` James Morris
2009-09-08 18:05             ` Hugh Dickins
2009-09-07 22:20           ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op James Morris
2009-09-07 22:18         ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 James Morris
2009-09-07 22:15       ` [PATCH 3/8] Simplify exec_permission_lite() further James Morris
2009-09-08 14:40       ` Jamie Lokier
2009-09-08 14:58         ` Matthew Wilcox
2009-09-08 15:02         ` Linus Torvalds
2009-09-07 22:12     ` [PATCH 2/8] Simplify exec_permission_lite() logic James Morris
2009-09-08  1:50   ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Christoph Hellwig
2009-09-08 14:01 ` [PATCH 0/8] VFS name lookup permission checking cleanup Serge E. Hallyn
2009-09-08 17:52 ` Mimi Zohar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.