All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Moore <paul@paul-moore.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: viro@zeniv.linux.org.uk, serge@hallyn.com,
	torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [PATCH v2 2/2] vfs: avoid duplicating creds in faccessat if possible
Date: Mon, 23 Jan 2023 16:29:59 -0500	[thread overview]
Message-ID: <CAHC9VhTnpWKnKRu3wFTNfub_qdcDePdEXYZWOpvpqL0fcfS_Uw@mail.gmail.com> (raw)
In-Reply-To: <CAGudoHF+bg0qiq+ByVpysa9t8J=zpF8=d1CqDVS5GmOGpVM9rQ@mail.gmail.com>

On Fri, Jan 20, 2023 at 7:50 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 1/20/23, Paul Moore <paul@paul-moore.com> wrote:
> > On Mon, Jan 16, 2023 at 4:21 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >>
> >> access(2) remains commonly used, for example on exec:
> >> access("/etc/ld.so.preload", R_OK)
> >>
> >> or when running gcc: strace -c gcc empty.c
> >> % time     seconds  usecs/call     calls    errors syscall
> >> ------ ----------- ----------- --------- --------- ----------------
> >>   0.00    0.000000           0        42        26 access
> >>
> >> It falls down to do_faccessat without the AT_EACCESS flag, which in turn
> >> results in allocation of new creds in order to modify fsuid/fsgid and
> >> caps. This is a very expensive process single-threaded and most notably
> >> multi-threaded, with numerous structures getting refed and unrefed on
> >> imminent new cred destruction.
> >>
> >> Turns out for typical consumers the resulting creds would be identical
> >> and this can be checked upfront, avoiding the hard work.
> >>
> >> An access benchmark plugged into will-it-scale running on Cascade Lake
> >> shows:
> >> test    proc    before  after
> >> access1 1       1310582 2908735  (+121%)  # distinct files
> >> access1 24      4716491 63822173 (+1353%) # distinct files
> >> access2 24      2378041 5370335  (+125%)  # same file
> >
> > Out of curiosity, do you have any measurements of the impact this
> > patch has on the AT_EACCESS case when the creds do need to be
> > modified?
>
> I could not be arsed to bench that. I'm not saying there is literally 0
> impact, but it should not be high and the massive win in the case I
> patched imho justifies it.

That's one way to respond to an honest question asking if you've done
any tests on the other side of the change.  I agree the impact should
be less than the advantage you've shown, but sometimes it's nice to
see these things.

> Last week I got a private reply from Linus suggesting the new checks
> only happen once at commit_cred() time, which would mean there would be
> at most one extra branch for the case you are concerned with. However,
> this quickly turn out to be rather hairy as there are games being
> played for example in copy_creds() which assigns them *without* calling
> commit_creds(). I was not comfortable pre-computing without sorting out
> the mess first and he conceded the new branchfest is not necessarily a
> big deal.
>
> That said, if you want some performance recovered for this case, here
> is an easy one:
>
> static const struct cred *access_override_creds(void)
> [..]
>         old_cred = override_creds(override_cred);
>
>         /* override_cred() gets its own ref */
>         put_cred(override_cred);
>
> As in the new creds get refed only to get unrefed immediately after.
> Whacking the 2 atomics should make up for it no problem on x86-64.
>
> Also see below.
>
> >> The above benchmarks are not integrated into will-it-scale, but can be
> >> found in a pull request:
> >> https://github.com/antonblanchard/will-it-scale/pull/36/files
> >>
> >> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> >>
> >> v2:
> >> - fix current->cred usage warn reported by the kernel test robot
> >> Link: https://lore.kernel.org/all/202301150709.9EC6UKBT-lkp@intel.com/
> >> ---
> >>  fs/open.c | 32 +++++++++++++++++++++++++++++++-
> >>  1 file changed, 31 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/open.c b/fs/open.c
> >> index 82c1a28b3308..3c068a38044c 100644
> >> --- a/fs/open.c
> >> +++ b/fs/open.c
> >> @@ -367,7 +367,37 @@ COMPAT_SYSCALL_DEFINE6(fallocate, int, fd, int, mode,
> >> compat_arg_u64_dual(offset
> >>   * access() needs to use the real uid/gid, not the effective uid/gid.
> >>   * We do this by temporarily clearing all FS-related capabilities and
> >>   * switching the fsuid/fsgid around to the real ones.
> >> + *
> >> + * Creating new credentials is expensive, so we try to skip doing it,
> >> + * which we can if the result would match what we already got.
> >>   */
> >> +static bool access_need_override_creds(int flags)
> >> +{
> >> +       const struct cred *cred;
> >> +
> >> +       if (flags & AT_EACCESS)
> >> +               return false;
> >> +
> >> +       cred = current_cred();
> >> +       if (!uid_eq(cred->fsuid, cred->uid) ||
> >> +           !gid_eq(cred->fsgid, cred->gid))
> >> +               return true;
> >> +
> >> +       if (!issecure(SECURE_NO_SETUID_FIXUP)) {
> >> +               kuid_t root_uid = make_kuid(cred->user_ns, 0);
> >> +               if (!uid_eq(cred->uid, root_uid)) {
> >> +                       if (!cap_isclear(cred->cap_effective))
> >> +                               return true;
> >> +               } else {
> >> +                       if (!cap_isidentical(cred->cap_effective,
> >> +                           cred->cap_permitted))
> >> +                               return true;
> >> +               }
> >> +       }
> >> +
> >> +       return false;
> >> +}
> >
> > I worry a little that with nothing connecting
> > access_need_override_creds() to access_override_creds() there is a bug
> > waiting to happen if/when only one of the functions is updated.
>
> These funcs are literally next to each other, I don't think that is easy
> to miss. I concede a comment in access_override_creds to take a look at
> access_need_override_creds would not hurt, but I don't know if a resend
> to add it is justified.

Perhaps it's because I have to deal with a fair amount of code getting
changed in one place and not another, but I would think that a comment
would be the least one could do here and would justify a respin.

> > Given the limited credential changes in access_override_creds(), I
> > wonder if a better solution would be to see if we could create a
> > light(er)weight prepare_creds()/override_creds() that would avoid some
> > of the prepare_creds() hotspots (I'm assuming that is where most of
> > the time is being spent).  It's possible this could help improve the
> > performance of other, similar operations that need to modify task
> > creds for a brief, and synchronous, period of time.

...

> For a Real Solution(tm) for a general case I think has to start with an
> observartion creds either persist for a long time *OR* keep getting
> recreated. This would suggest holding on to them and looking them up
> instead just allocating, but all this opens another can of worms and
> I don't believe is worth the effort at this stage. But maybe someone
> has a better idea.
>
> That said, for the case of access(), I had the following in mind but
> once more considered it not justified at this stage.
>
> pseudocode-wise:
> struct cred *prepare_shallow_creds(void)
>         new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
>         old = task->cred;
>         memcpy(new, old, sizeof(struct cred));
>
> here new creds have all the same pointers as old, but the target objs
> are only kept alive by the old creds still refing them. So by API
> contract you are required to keep them around.
>
> after you temporarily assign them you call revert_shallow_creds():
>         if (tempcred->usage == 1)
>                 /* nobody refed them, do the non_rcu check */
>                 ...
>         else
>                 /* somebody grabbed them, legitimize creds by
>                  * grabbing the missing refs
>                  */
>                  get_uid(new->user);
>                  get_user_ns(new->user_ns);
>                  get_group_info(new->group_info);
>                  /* and so on */
>
> So this would shave work from the case you are concerned with probably
> for all calls.
>
> I do think this is an ok idea overall, but I felt like overengineering for the
> problem at hand *at the time*.

In my opinion a generalized shallow copy approach has more value than
a one-off solution that has the potential to fall out of sync and
cause a problem in the future (I recognize that you disagree on the
likelihood of this happening).

> For some context, I'm looking at performance of certain VFS stuff and
> there is some serious fish to fry in the fstat department.

I assumed it was part of some larger perf work, but I'm not sure the
context is that important for this particular discussion.

> The patch I
> posted is definitely worthwhile perf-wise and easy enough to reason
> about that I did no expect much opposition to it. If anything I expected
> opposition to the idea outlined above.

Ultimately it's a call for the VFS folks as they are responsible for
the access() code.

--
paul-moore.com

  parent reply	other threads:[~2023-01-23 21:30 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-16 21:21 [PATCH v2 1/2] capability: add cap_isidentical Mateusz Guzik
2023-01-16 21:21 ` [PATCH v2 2/2] vfs: avoid duplicating creds in faccessat if possible Mateusz Guzik
2023-01-20 20:45   ` Paul Moore
2023-01-21  0:50     ` Mateusz Guzik
2023-01-21  1:18       ` Mateusz Guzik
2023-01-23 21:29       ` Paul Moore [this message]
2023-01-24 10:16         ` Mateusz Guzik
2023-01-24 17:00           ` Paul Moore
2023-01-24 17:14             ` Linus Torvalds
2023-01-24 18:42               ` Mateusz Guzik
2023-01-24 20:37                 ` Linus Torvalds
2023-01-24 21:33               ` Paul Moore
2023-01-25 15:00                 ` Mateusz Guzik
2023-01-25 16:17                   ` Mateusz Guzik
2023-01-25 17:11                     ` Paul Moore
2023-01-25 17:07                   ` Paul Moore

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHC9VhTnpWKnKRu3wFTNfub_qdcDePdEXYZWOpvpqL0fcfS_Uw@mail.gmail.com \
    --to=paul@paul-moore.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=serge@hallyn.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.