All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Mateusz Guzik <mjguzik@gmail.com>, Yury Norov <yury.norov@gmail.com>
Cc: Alexander Potapenko <glider@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Kees Cook <keescook@chromium.org>,
	Eric Biggers <ebiggers@google.com>,
	Christian Brauner <brauner@kernel.org>,
	serge@hallyn.com, paul@paul-moore.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible
Date: Fri, 3 Mar 2023 17:29:30 -0800	[thread overview]
Message-ID: <CAHk-=wgfNrMFQCFWFtn+UXjAdJAGAAFFJZ1JpEomTneza32A6g@mail.gmail.com> (raw)
In-Reply-To: <CAGudoHGYaWTCnL4GOR+4Lbcfg5qrdOtNjestGZOkgtUaTwdGrQ@mail.gmail.com>

On Fri, Mar 3, 2023 at 12:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> as an example here is a one-liner to show crappers which do 0-sized ops:
> bpftrace -e 'kprobe:memset,kprobe:memcpy /arg2 == 0/ { @[probe,
> kstack(2)] = count(); }'

Looking not at 0-sized ops, but crazy small memset() ops, at least
some of these seem to have grown from the bitmap "optimizations".

In particular, 'cpumask_clear()' should just zero the cpumask, and on
the config I use, I have

    CONFIG_NR_CPUS=64

so it should literally just be a single "store zero to cpumask word".
And that's what it used to be.

But then we had commit aa47a7c215e7 ("lib/cpumask: deprecate
nr_cpumask_bits") and suddenly 'nr_cpumask_bits' isn't a simple
constant any more for the "small mask that fits on stack" case, and
instead you end up with code like

        movl    nr_cpu_ids(%rip), %edx
        addq    $63, %rdx
        shrq    $3, %rdx
        andl    $-8, %edx
        ..
        callq   memset@PLT

that does a 8-byte memset because I have 32 cores and 64 threads.

Now, at least some distro kernels seem to be built with CONFIG_MAXSMP,
so CONFIG_NR_CPUS is something insane (namely 8192), and then it is
indeed better to calculate some minimum size instead of doing a 1kB
memset().

But yes, it does look like we've had several regressions in various
areas, where we do insane "memset()" calls with variable size that
should never have been that. Both with kzalloc() and with cpumasks.

There are probably lots of other cases, I just happened to notice the
cpumask one when looking at "why is there a memset call in a function
that shouldn't have any at all".

I don't think this is a "rep stos" problem in general. Obviously it
might be in some cases, but I do think that the deeper problem is that
those small memset() calls generally should not have existed at all,
and were actual bugs. That cpumask_clear() should never have been a
function call in the first place for the "clear one or a couple of
words" case.

The kernel very seldom acts on byte data, so a memset() or memcpy()
that looks like individual bytes is some very odd stuff. We have lots
of often fairly small structures with fixed sizes, though. And maybe
there are other cases like that bogus "let's try to optimize the
bitmap range"...

Of course, then we *do* have distros that do actively detrimental
things. You have Ubuntu with that CONFIG_INIT_ON_ALLOC_DEFAULT_ON, and
if I didn't just rebuild my kernel with sane config options, I'd have
Fedora with CONFIG_MAXSMP that is meant to be a "check worst case"
option, not something you *use*.

Oh well.

          Linus

  parent reply	other threads:[~2023-03-04  1:29 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 15:55 [PATCH v3 1/2] capability: add cap_isidentical Mateusz Guzik
2023-01-25 15:55 ` [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible Mateusz Guzik
2023-02-28  0:44   ` Linus Torvalds
2023-03-02  8:30     ` Christian Brauner
2023-03-02 17:51       ` Linus Torvalds
2023-03-02 18:14         ` Mateusz Guzik
2023-03-02 18:18           ` Al Viro
2023-03-02 18:22             ` Mateusz Guzik
2023-03-02 18:43               ` Al Viro
2023-03-02 18:51                 ` Mateusz Guzik
2023-03-02 19:02                 ` Al Viro
2023-03-02 19:18                   ` Al Viro
2023-03-02 19:03               ` Linus Torvalds
2023-03-02 19:10                 ` Linus Torvalds
2023-03-02 19:19                   ` Al Viro
2023-03-02 19:54                     ` Kees Cook
2023-03-02 20:11                       ` Al Viro
2023-03-03 15:30                         ` Alexander Potapenko
2023-03-03 17:39                           ` Mateusz Guzik
2023-03-03 17:54                             ` Linus Torvalds
2023-03-03 19:37                               ` Mateusz Guzik
2023-03-03 19:38                                 ` Mateusz Guzik
2023-03-03 20:08                                 ` Linus Torvalds
2023-03-03 20:39                                   ` Mateusz Guzik
2023-03-03 20:58                                     ` Linus Torvalds
2023-03-03 21:09                                       ` Mateusz Guzik
2023-03-04 19:01                                       ` Mateusz Guzik
2023-03-04 20:31                                         ` Mateusz Guzik
2023-03-04 20:48                                           ` Linus Torvalds
2023-03-05 17:23                                             ` David Laight
2023-03-04  1:29                                     ` Linus Torvalds [this message]
2023-03-04  3:25                                       ` Yury Norov
2023-03-04  3:42                                         ` Linus Torvalds
2023-03-04  5:51                                           ` Yury Norov
2023-03-04 16:41                                             ` David Vernet
2023-03-04 19:02                                             ` Linus Torvalds
2023-03-04 19:19                                             ` Linus Torvalds
2023-03-04 20:34                                               ` Linus Torvalds
2023-03-04 20:51                                               ` Yury Norov
2023-03-04 21:01                                                 ` Linus Torvalds
2023-03-04 21:03                                                   ` Linus Torvalds
2023-03-04 21:10                                                   ` Linus Torvalds
2023-03-04 23:08                                                     ` Linus Torvalds
2023-03-04 23:52                                                       ` Linus Torvalds
2023-03-05  9:26                                                       ` Sedat Dilek
2023-03-05 18:17                                                         ` Linus Torvalds
2023-03-05 18:43                                                           ` Linus Torvalds
2023-03-06  5:43                                                       ` Yury Norov
2023-03-04 20:18                                     ` Al Viro
2023-03-04 20:42                                       ` Mateusz Guzik
2023-03-02 19:38                   ` Kees Cook
2023-03-02 19:48                     ` Eric Biggers
2023-03-02 18:41             ` Al Viro
2023-03-03 14:49         ` Christian Brauner
2023-03-02 18:11       ` Al Viro
2023-03-03 14:27         ` Christian Brauner
2023-02-28  1:14 ` [PATCH v3 1/2] capability: add cap_isidentical Linus Torvalds
2023-02-28  2:46   ` Casey Schaufler
2023-02-28 14:47     ` Mateusz Guzik
2023-02-28 19:39       ` Linus Torvalds
2023-02-28 19:51         ` Linus Torvalds
2023-02-28 20:48         ` Linus Torvalds
2023-02-28 21:21           ` Mateusz Guzik
2023-02-28 21:29             ` Linus Torvalds
2023-03-01 18:13               ` Linus Torvalds
2023-02-28 17:32     ` Serge E. Hallyn
2023-02-28 17:52       ` Casey Schaufler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgfNrMFQCFWFtn+UXjAdJAGAAFFJZ1JpEomTneza32A6g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=ebiggers@google.com \
    --cc=glider@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=paul@paul-moore.com \
    --cc=serge@hallyn.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.