All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH] VFS: Cut down inode->i_op->xyz accesses in path walking
Date: Mon, 25 Jul 2011 20:23:38 -0700	[thread overview]
Message-ID: <CA+55aFwNQLXT=4KPsq6xsv-pjk2a=8+mKAe5Jmd4QSnC7-+a3g@mail.gmail.com> (raw)
In-Reply-To: <20110726030531.GE22133@ZenIV.linux.org.uk>

On Mon, Jul 25, 2011 at 8:05 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> We could actually cheat a bit.  *All* inodes that ever reach inode_permission()
> have at some point been pointed to by ->d_inode of some dentry.  It's
> inconvenient as hell (and inviting abuse) to pass such dentry instead
> of inode; moreover, there would be nasty questions of ->d_inode stability
> on RCU path.
>
> However, we can greatly reduce that amount of changes if we do the
> following:
>        * one bit in your new field set when inode is put in ->d_inode.
> Checked at inode_permission() time, BUG_ON() if not set.
>        * other bit(s) set by checking ->i_op contents at the same time,
> if bit hadn't been already set (->i_op can't change afterwards); checked
> by inode_permission() and whatever else might want to (e.g. checks for
> non-NULL ->lookup(); for those we also know that inode went through
> some ->d_inode at some point).
>        * we need to play with setting these flags only in two places -
> __d_instantiate() and d_obtain_alias().

Ugh. That sounds more complex than I'd really like. Smaller change,
yes. But the end result is just more complicated.

What I *did* consider was to have the bits just be "slow case" bits.
And just set them all (blindly) at inode allocation time. Then, the
users would simply end up doing

  static inline int do_inode_permission(struct inode *inode, int mask)
  {
    if (inode->i_opflag & SLOW_PERMISSION) {
      if (inode->i_op->permission)
        return inode->i_op->permission(inode, mask);
      bit_clear(SLOW_PERMISSION, &inode->i_opflag);
    }
    return generic_permission(inode, mask);
  }

which basically would trigger the *really* slow case once (that atomic
bit clear that happens just once - or maybe a couple of concurrent
times for the harmless race), but would have very trivial complexity,
and you could tailor the bits arbitrarily for what you want to test as
the fast case.

The reason I didn't go any further along that way is that we don't
have any nice atomic operations for small fields like that. The bitops
want a "long" entity, and it becomes just a nightmare to work out bit
positions with big-endian etc. And I definitely didn't want to grow
the inode.

But I think it would be a reasonable approach with less complexity,
and we could even use a spinlock (ie i_lock) for the slow case, since
it really should only ever happen once per inode load.

It's absolutely ok to test the i_op fields when we actually use them,
because by then we'll obviously have the whole "populate the cache and
follow the pointer chain" issue anyway. So we have a clear "fast case"
when we simply don't want to even look at that pointer at all, and the
above kind of code would work fine, I think.

That said, the big patch is clearly the simplest approach of them all.
It does have the "did you set i_op with the proper function" of
course, but that' s at least not a *complex* bug, and if you get the
i_opflag value wrong, it will be mostly be quite obvious (eg a NULL
pointer dereference or simply not calling the filesystem permission
check at all).

But I'm not married to any particular solution. It's just annoying how
visible that i_op access is in profiles, and how close we are to not
needing it ;)

                           Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2011-07-26  3:24 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-22 17:37 VFS pathname walking cleanups (i_op and ACL access) Linus Torvalds
2011-07-22 17:37 ` [PATCH 1/2] VFS: Cut down inode->i_op->xyz accesses in path walking Linus Torvalds
2011-07-22 17:47   ` Christoph Hellwig
2011-07-22 23:40   ` Al Viro
2011-07-22 23:54     ` Linus Torvalds
2011-07-23  3:55   ` [PATCH] " Linus Torvalds
2011-07-23 13:35     ` Christoph Hellwig
2011-07-23 14:46       ` Al Viro
2011-07-23 14:51         ` Christoph Hellwig
2011-07-23 15:45         ` Linus Torvalds
     [not found]     ` <alpine.LFD.2.02.1107251852220.13796@i5.linux-foundation.org>
2011-07-26  3:05       ` Al Viro
2011-07-26  3:23         ` Linus Torvalds [this message]
2011-07-26 18:41           ` Al Viro
2011-07-26 18:45             ` Linus Torvalds
2011-08-07  6:06           ` Linus Torvalds
2011-08-07  6:51             ` Al Viro
2011-08-07 23:43               ` Linus Torvalds
2011-07-22 17:40 ` VFS pathname walking cleanups (i_op and ACL access) Christoph Hellwig
2011-07-22 17:45 ` [PATCH 2/2] vfs: move ACL cache lookup into generic code Linus Torvalds
2011-07-22 17:50   ` Christoph Hellwig
2011-07-22 17:54     ` Linus Torvalds
2011-07-23  2:34   ` [PATCH] " Linus Torvalds
2011-07-23  3:29     ` Al Viro
2011-07-23  3:42       ` Linus Torvalds
2011-07-23  4:31         ` Al Viro
2011-07-23  6:06           ` Al Viro
2011-07-25  8:15             ` Aneesh Kumar K.V
2011-07-25  8:16           ` Aneesh Kumar K.V
2011-07-23  7:47       ` Al Viro
2011-07-23 14:50         ` Christoph Hellwig
2011-07-23 15:32           ` Al Viro
2011-07-23 17:02             ` Al Viro
2011-07-23 17:31               ` Linus Torvalds
2011-07-23 18:20                 ` Al Viro
2011-07-23 18:29                   ` Linus Torvalds
2011-07-23 21:53                     ` Al Viro
2011-07-23 22:38                       ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFwNQLXT=4KPsq6xsv-pjk2a=8+mKAe5Jmd4QSnC7-+a3g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.