All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linux Filesystem Mailing List <linux-fsdevel@vger.kernel.org>,
	Eric Paris <eparis@redhat.com>, Mimi Zohar <zohar@us.ibm.com>,
	James Morris <jmorris@namei.org>
Subject: Re: [PATCH 0/8] VFS name lookup permission checking cleanup
Date: Tue, 8 Sep 2009 09:01:40 -0500	[thread overview]
Message-ID: <20090908140140.GB873@us.ibm.com> (raw)
In-Reply-To: <alpine.LFD.2.01.0909071337510.3419@localhost.localdomain>

Quoting Linus Torvalds (torvalds@linux-foundation.org):
> 
> This is a series of eight trivial patches that I'd like people to take a 
> look at, because I am hoping to eventually do multiple path component 
> lookups in one go without taking the per-dentry lock or incrementing (and 
> then decrementing) the per-dentry atomic count for each component.
> 
> The aim would be to try to avoid getting that annoying cacheline ping-pong 
> on the common top-level dentries that everybody looks up (ie root and home 
> directories, /usr, /usr/bin etc).
> 
> Right now I have some simple (but real) loads that show the contention on 
> dentry->d_lock to be a roughly 3% performance hit on a single-socket 
> nehalem, and I assume it can be much worse on multi-socket machines.
> 
> And the thing is, it should be entirely possible to do everything but the 
> last component lookup with just a single read_seqbegin()/read_seqretry() 
> around the whole lookup. Yes, the last component is special and absolutely 
> needs locking and counting - but the last component also doesn't tend to 
> be shared, so locking it is fine.
> 
> Now, I may never actually get there, but when looking at it, the biggest 
> problem is actually not so much the path lookup itself, as the security 
> tests that are done for each path component. And it should be noted that 
> in order for a lockless seq-lock only lookup make sense, any such 
> operations would have to be totally lock-free too. They certainly can't 
> take mutexes etc, but right now they do.
> 
> Those security tests fall into two categories:
> 
>  - actual security layer callouts: ima_path_check().
> 
>    This one looks totally pointless. Path component lookup is a horribly 
>    timing-critical path, and we will only do a successful lookup on a 
>    directory (inode needs to have a ->lookup operation), yet in the middle 
>    of that is a call to "ima_path_check()".
> 
>    Now, it looks like ima_path_check() is very much designed to only check 
>    the _final_ path anyway, and was never meant to be used to check the 
>    directories we hit on the way. In fact, the whole function starts with
> 
> 	if (!ima_initialized || !S_ISREG(inode->i_mode))
> 		return 0;
> 
>    so it's totally pointless to do that thing on a directory where 
>    that !S_ISREG() test will trigger.
> 
>    So just remove it. IMA should never have put that check in there to 
>    begin with, it's just way too performance-sensitive.
> 
>  - the real filesystem permission checks. 
> 
>    We used to do the common case entirely in the VFS layer, but these days 
>    the common case includes POSIX ACL checking, and as a result, the 
>    trivial short-circuit code in the VFS layer almost never triggers in
>    practice, and we call down to the low-level filesystem for each 
>    component. 
> 
>    We can't fix that by just removing the call, but what we _can_ do is to 
>    at least avoid the silly calling back-and-forth: most filesystems will 
>    just call back to the VFS layer to do the "generic_permission()" with 
>    their own ACL-checking routines.
> 
>    That way we can flatten the call-chain out a bit, and avoid one 
>    unnecessary indirect call in that timing-critical region. And 
>    eventually, if we make the whole ACL caching thing be something that we 
>    do at a VFS layer (Al Viro already worked on _some_ of that), we'll be 
>    able to avoid the calls entirely when we can see the cached ACL 
>    pointers directly.
> 
> So this series of 8 patches do all these preliminary things. As shown by 
> the diffstat below, it actually reduces the lines of code (mainly by just 
> removing the silly per-filesystem wrappers around "generic_permission()") 
> and it also makes it a _lot_ clearer what actually gets called in that 
> whole 'exec_permission_lite()' function that we use to check the 
> permission of a pathname lookup.
> 
> Comments?  Especially from the IMA people (first patch) and from generic 
> VFS, security and low-level FS people (the 'Simplify exec_permission_lite' 
> series, and then the check_acl + per-filesystem changes).
> 
> Al?
> 
> I'm looking to merge these shortly after 2.6.31 is released, but comments 
> welcome.

All of them seem good, and I don't see any thinkos, no resulting skipped
checks or anything.

Acked-by: Serge Hallyn <serue@us.ibm.com>

-serge

  parent reply	other threads:[~2009-09-08 14:21 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-07 21:01 [PATCH 0/8] VFS name lookup permission checking cleanup Linus Torvalds
2009-09-07 21:02 ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Linus Torvalds
2009-09-07 21:02   ` [PATCH 2/8] Simplify exec_permission_lite() logic Linus Torvalds
2009-09-07 21:03     ` [PATCH 3/8] Simplify exec_permission_lite() further Linus Torvalds
2009-09-07 21:03       ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 Linus Torvalds
2009-09-07 21:04         ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op Linus Torvalds
2009-09-07 21:04           ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' Linus Torvalds
2009-09-07 21:05             ` [PATCH 7/8] ext[234]: move over to 'check_acl' permission model Linus Torvalds
2009-09-07 21:05               ` [PATCH 8/8] jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' Linus Torvalds
2009-09-08 18:34                 ` Christoph Hellwig
2009-09-09 17:09                   ` Linus Torvalds
2009-09-07 22:23             ` [PATCH 6/8] shmfs: use 'check_acl' instead of 'permission' James Morris
2009-09-08  0:03               ` James Morris
2009-09-08 18:05             ` Hugh Dickins
2009-09-07 22:20           ` [PATCH 5/8] Make 'check_acl()' a first-class filesystem op James Morris
2009-09-07 22:18         ` [PATCH 4/8] Simplify exec_permission_lite(), part 3 James Morris
2009-09-07 22:15       ` [PATCH 3/8] Simplify exec_permission_lite() further James Morris
2009-09-08 14:40       ` Jamie Lokier
2009-09-08 14:58         ` Matthew Wilcox
2009-09-08 15:02         ` Linus Torvalds
2009-09-07 22:12     ` [PATCH 2/8] Simplify exec_permission_lite() logic James Morris
2009-09-08  1:50   ` [PATCH 1/8] Do not call 'ima_path_check()' for each path component Christoph Hellwig
2009-09-08 14:01 ` Serge E. Hallyn [this message]
2009-09-08 17:52 ` [PATCH 0/8] VFS name lookup permission checking cleanup Mimi Zohar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090908140140.GB873@us.ibm.com \
    --to=serue@us.ibm.com \
    --cc=eparis@redhat.com \
    --cc=jmorris@namei.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zohar@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.