linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Kees Cook <keescook@chromium.org>,
	Eric Biggers <ebiggers@google.com>,
	Christian Brauner <brauner@kernel.org>,
	serge@hallyn.com, paul@paul-moore.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible
Date: Fri, 3 Mar 2023 21:51:41 -0800	[thread overview]
Message-ID: <ZALcbQoKA7K8k2gJ@yury-laptop> (raw)
In-Reply-To: <CAHk-=wh1r3KfATA-JSdt3qt2y3sC=5U9+wZsbabW+dvPsqRCvA@mail.gmail.com>

On Fri, Mar 03, 2023 at 07:42:36PM -0800, Linus Torvalds wrote:
> ing: quoted-printable
> Status: O
> Content-Length: 1593
> Lines: 41
> 
> On Fri, Mar 3, 2023 at 7:25 PM Yury Norov <yury.norov@gmail.com> wrote:
> >
> > Did you enable CONFIG_FORCE_NR_CPUS? If you pick it, the kernel will
> > bind nr_cpu_ids to NR_CPUS at compile time, and the memset() call
> > should disappear.
> 
> I do not believe CONFIG_FORCE_NR_CPUS makes any sense, and I think I
> told you so at the time.

At that time you was OK with CONFIG_FORCE_NR_CPUS, only suggested to
hide it behind CONFIG_EXPERT:

https://lore.kernel.org/all/Yzx4fSmmr8bh6gdl@yury-laptop/T/#m92d405527636154c3b2000e0105379170d988315
 
> This all used to just work *without* some kind of config thing, First
> removing the automatic "do the right thing", and then adding a config
> option to "force" doing the right thing seems more than a bit silly to
> me.
> 
> I think CONFIG_FORCE_NR_CPUS should go away, and - once more - become
> just the "is the cpumask small enough to be just allocated directly"
> thing.

This all was just broken. For example, as I mentioned in commit message,
cpumask_full() was broken. I know because I wrote a test. There were no
a single user for the function, and nobody complained. Now we have one
in BPF code. So if we simply revert the aa47a7c215e, it will hurt real
users.

The pre-CONFIG_FORCE_NR_CPUS cpumask machinery would work only if you
set NR_CPUS to the number that matches to the actual number of CPUs as
detected at boot time.

In your example, if you have NR_CPUS == 64, and for some reason disable
hyper threading, nr_cpumask_bits will be set to 64 at compile time, but
nr_cpu_ids will be set to 32 at boot time, assuming
CONFIG_CPUMASK_OFFSTACK is disabled.

And the following code will be broken:

cpumask_t m1, m2;

cpumask_setall(m1); // m1 is ffff ffff ffff ffff because it uses
                    // compile-time optimized nr_cpumask_bits

for_each_cpu(cpu, m1) // 32 iterations because it relied on nr_cpu_ids
        cpumask_set_cpu(cpu, m2); // m2 is ffff ffff XXXX XXXX

BUG_ON(!cpumask_equal(m1, m2)); // Bug because it will test all 64 bits

Today with CONFIG_FORCE_NR_CPUS disabled, kernel consistently relies
on boot-time defined nr_cpu_ids in functions like cpumask_equal()
with the cost of disabled runtime optimizations.

If CONFIG_FORCE_NR_CPUS is enabled, it wires nr_cpu_ids to NR_CPUS
at compile time, which allows compile-time optimization.

If CONFIG_FORCE_NR_CPUS is enabled, but actual number of CPUs doesn't
match to NR_CPUS, the kernel throws a warning at boot time - better
than nothing.

I'm not happy bothering people with a new config parameter in such a
simple case. I just don't know how to fix it better. Is there a safe
way to teach compiler to optimize against NR_CPUS other than telling
it explicitly?

> Of course, the problem for others remain that distros will do that
> CONFIG_CPUMASK_OFFSTACK thing, and then things will suck regardless.
> 
> I was *so* happy with our clever "you can have large cpumasks, and
> we'll just allocate them off the stack" long long ago, because it
> meant that we could have one single source tree where this was all
> cleanly abstracted away, and we even had nice types and type safety
> for it all.
> 
> That meant that we could support all the fancy SGI machines with
> several thousand cores, and it all "JustWorked(tm)", and didn't make
> the normal case any worse.
> 
> I didn't expect distros to then go "ooh, we want that too", and enable
> it all by default, and make all our clever "you only see this
> indirection if you need it" go away, and now the normal case is the
> *bad* case, unless you just build your own kernel and pick sane
> defaults.
>
> Oh well.

From distro people's perspective, 'one size fits all' is the best
approach. It's hard to blame them.

Thanks,
Yury

  reply	other threads:[~2023-03-04  5:51 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 15:55 [PATCH v3 1/2] capability: add cap_isidentical Mateusz Guzik
2023-01-25 15:55 ` [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible Mateusz Guzik
2023-02-28  0:44   ` Linus Torvalds
2023-03-02  8:30     ` Christian Brauner
2023-03-02 17:51       ` Linus Torvalds
2023-03-02 18:14         ` Mateusz Guzik
2023-03-02 18:18           ` Al Viro
2023-03-02 18:22             ` Mateusz Guzik
2023-03-02 18:43               ` Al Viro
2023-03-02 18:51                 ` Mateusz Guzik
2023-03-02 19:02                 ` Al Viro
2023-03-02 19:18                   ` Al Viro
2023-03-02 19:03               ` Linus Torvalds
2023-03-02 19:10                 ` Linus Torvalds
2023-03-02 19:19                   ` Al Viro
2023-03-02 19:54                     ` Kees Cook
2023-03-02 20:11                       ` Al Viro
2023-03-03 15:30                         ` Alexander Potapenko
2023-03-03 17:39                           ` Mateusz Guzik
2023-03-03 17:54                             ` Linus Torvalds
2023-03-03 19:37                               ` Mateusz Guzik
2023-03-03 19:38                                 ` Mateusz Guzik
2023-03-03 20:08                                 ` Linus Torvalds
2023-03-03 20:39                                   ` Mateusz Guzik
2023-03-03 20:58                                     ` Linus Torvalds
2023-03-03 21:09                                       ` Mateusz Guzik
2023-03-04 19:01                                       ` Mateusz Guzik
2023-03-04 20:31                                         ` Mateusz Guzik
2023-03-04 20:48                                           ` Linus Torvalds
2023-03-05 17:23                                             ` David Laight
2023-03-04  1:29                                     ` Linus Torvalds
2023-03-04  3:25                                       ` Yury Norov
2023-03-04  3:42                                         ` Linus Torvalds
2023-03-04  5:51                                           ` Yury Norov [this message]
2023-03-04 16:41                                             ` David Vernet
2023-03-04 19:02                                             ` Linus Torvalds
2023-03-04 19:19                                             ` Linus Torvalds
2023-03-04 20:34                                               ` Linus Torvalds
2023-03-04 20:51                                               ` Yury Norov
2023-03-04 21:01                                                 ` Linus Torvalds
2023-03-04 21:03                                                   ` Linus Torvalds
2023-03-04 21:10                                                   ` Linus Torvalds
2023-03-04 23:08                                                     ` Linus Torvalds
2023-03-04 23:52                                                       ` Linus Torvalds
2023-03-05  9:26                                                       ` Sedat Dilek
2023-03-05 18:17                                                         ` Linus Torvalds
2023-03-05 18:43                                                           ` Linus Torvalds
2023-03-06  5:43                                                       ` Yury Norov
2023-03-04 20:18                                     ` Al Viro
2023-03-04 20:42                                       ` Mateusz Guzik
2023-03-02 19:38                   ` Kees Cook
2023-03-02 19:48                     ` Eric Biggers
2023-03-02 18:41             ` Al Viro
2023-03-03 14:49         ` Christian Brauner
2023-03-02 18:11       ` Al Viro
2023-03-03 14:27         ` Christian Brauner
2023-02-28  1:14 ` [PATCH v3 1/2] capability: add cap_isidentical Linus Torvalds
2023-02-28  2:46   ` Casey Schaufler
2023-02-28 14:47     ` Mateusz Guzik
2023-02-28 19:39       ` Linus Torvalds
2023-02-28 19:51         ` Linus Torvalds
2023-02-28 20:48         ` Linus Torvalds
2023-02-28 21:21           ` Mateusz Guzik
2023-02-28 21:29             ` Linus Torvalds
2023-03-01 18:13               ` Linus Torvalds
2023-02-28 17:32     ` Serge E. Hallyn
2023-02-28 17:52       ` Casey Schaufler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZALcbQoKA7K8k2gJ@yury-laptop \
    --to=yury.norov@gmail.com \
    --cc=brauner@kernel.org \
    --cc=ebiggers@google.com \
    --cc=glider@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=paul@paul-moore.com \
    --cc=serge@hallyn.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).