linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Wedson Almeida Filho <wedsonaf@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Kees Cook <keescook@chromium.org>,
	Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>,
	Konstantin Shelekhin <k.shelekhin@yadro.com>,
	ojeda@kernel.org, alex.gaynor@gmail.com, ark.email@gmail.com,
	bjorn3_gh@protonmail.com, bobo1239@web.de, bonifaido@gmail.com,
	boqun.feng@gmail.com, davidgow@google.com, dev@niklasmohrin.de,
	dsosnowski@dsosnowski.pl, foxhlchen@gmail.com, gary@garyguo.net,
	geofft@ldpreload.com, gregkh@linuxfoundation.org,
	jarkko@kernel.org, john.m.baublitz@gmail.com,
	leseulartichaut@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, m.falkowski@samsung.com,
	me@kloenk.de, milan@mdaverde.com, mjmouse9999@gmail.com,
	patches@lists.linux.dev, rust-for-linux@vger.kernel.org,
	thesven73@gmail.com, viktor@v-gar.de,
	Andreas Hindborg <andreas.hindborg@wdc.com>
Subject: Re: [PATCH v9 12/27] rust: add `kernel` crate
Date: Mon, 19 Sep 2022 09:09:40 -0700	[thread overview]
Message-ID: <CAHk-=wgaBaVaK2K=N05fwWSSLM6YJx=yLmP4f7j6d6o=nCAtdw@mail.gmail.com> (raw)
In-Reply-To: <Yyh3kFUvt2aMh4nq@wedsonaf-dev>

On Mon, Sep 19, 2022 at 7:07 AM Wedson Almeida Filho <wedsonaf@gmail.com> wrote:
>
> For GFP_ATOMIC, we could use preempt_count except that it isn't always
> enabled. Conveniently, it is already separated out into its own config.
> How do people feel about removing CONFIG_PREEMPT_COUNT and having the
> count always enabled?
>
> We would then have a way to reliably detect when we are in atomic
> context [..]

No.

First off, it's not true. There are non-preempt atomic contexts too,
like interrupts disabled etc. Can you enumerate all those? Possibly.

But more importantly, doing "depending on context, I silently and
automatically do different things" is simply WRONG. Don't do it. It's
a disaster.

Doing that for *debugging* is fine. So having a

        WARN_ON_ONCE(in_atomic_context());

is a perfectly fine debug check to find people who do bad bad things,
and we have lots of variations of that theme (ie might_sleep(), but
also things like lockdep_assert_held() and friends that assert other
constraints entirely).

But having *behavior changes* depending on context is a total
disaster. And that's invariably why people want this disgusting thing.

They want to do broken things like "I want to allocate memory, and I
don't want to care where I am, so I want the memory allocator to just
do the whole GFP_ATOMIC for me".

And that is FUNDAMENTALLY BROKEN.

If you want to allocate memory, and you don't want to care about what
context you are in, or whether you are holding spinlocks etc, then you
damn well shouldn't be doing kernel programming. Not in C, and not in
Rust.

It really is that simple. Contexts like this ("I am in a critical
region, I must not do memory allocation or use sleeping locks") is
*fundamental* to kernel programming. It has nothing to do with the
language, and everything to do with the problem space.

So don't go down this "let's have the allocator just know if you're in
an atomic context automatically" path. It's wrong. It's complete
garbage. It may generate kernel code that superficially "works", but
one that is fundamentally broken, and will fail and becaome unreliable
under memory pressure.

The thing is, when you do kernel programming, and you're holding a
spinlock and need to allocate memory, you generally shouldn't allocate
memory at all, you should go "Oh, maybe I need to do the allocation
*before* getting the lock".

And it's not just locks. It's also "I'm in a hardware interrupt", but
now the solution is fundamentally different. Maybe you still want to
do pre-allocation, but now you're a driver interrupt and the
pre-allocation has to happen in another code sequence entirely,
because obviously the interrupt itself is asynchronous.

But more commonly, you just want to use GFP_ATOMIC, and go "ok, I know
the VM layer tries to keep a _limited_ set of pre-allocated buffers
around".

But it needs to be explicit, because that GFP_ATOMIC pool of
allocations really is very limited, and you as the allocator need to
make it *explicit* that yeah, now you're not just doing a random
allocation, you are doing one of these *special* allocations that will
eat into that very limited global pool of allocations.

So no, you cannot and MUST NOT have an allocator that silently just
dips into that special pool, without the user being aware or
requesting it.

That really is very very fundamental. Allocators that "just work" in
different contexts are broken garbage within the context of a kernel.

Please just accept this, and really *internalize* it.  Because this
isn't actually just about allocators. Allocators may be one very
common special case of this kind of issue, and they come up quite
often as a result, but this whole "your code needs to *understand* the
special restrictions that the kernel is under" is something that is
quite fundamental in general.

It shows up in various other situations too, like "Oh, this code can
run in an interrupt handler" (which is *different* from the atomicity
of just "while holding a lock", because it implies a level of random
nesting that is very relevant for locking).

Or sometimes it's subtler things than just correctness, ie "I'm
running under a critical lock, so I must be very careful because there
are scalability issues". The code may *work* just fine, but things
like tasklist_lock are very very special.

So embrace that kernel requirement. Kernels are special. We do things,
and we have constraints that pretty much no other environment has.

It is often what makes kernel programming interesting - because it's
challenging. We have a very limited stack. We have some very direct
and deep interactions with the CPU, with things like IO and
interrupts. Some constraints are always there, and others are
context-dependent.

The whole "really know what context this code is running within" is
really important. You may want to write very explicit comments about
it.

And you definitely should always write your code knowing about it.

              Linus

  reply	other threads:[~2022-09-19 16:10 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-05 15:41 [PATCH v9 00/27] Rust support Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 01/27] kallsyms: use `sizeof` instead of hardcoded size Miguel Ojeda
2022-08-05 16:48   ` Geert Stappers
2022-08-05 18:46     ` Miguel Ojeda
2022-08-05 22:40   ` Konstantin Shelekhin
2022-08-17 19:36     ` Kees Cook
2022-08-18  9:03       ` Konstantin Shelekhin
2022-08-18 16:03         ` Kees Cook
2022-09-27 12:48           ` Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 02/27] kallsyms: avoid hardcoding buffer size Miguel Ojeda
2022-08-17 19:37   ` Kees Cook
2022-08-18 16:50     ` Geert Stappers
2022-08-05 15:41 ` [PATCH v9 03/27] kallsyms: add static relationship between `KSYM_NAME_LEN{,_BUFFER}` Miguel Ojeda
2022-08-17 19:39   ` Kees Cook
2022-08-17 19:50     ` Boqun Feng
2022-08-17 20:31       ` Kees Cook
2022-08-17 20:45         ` Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 04/27] kallsyms: support "big" kernel symbols Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 05/27] kallsyms: increase maximum kernel symbol length to 512 Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 06/27] rust: add C helpers Miguel Ojeda
2022-08-17 19:44   ` Kees Cook
2022-08-17 20:22     ` Miguel Ojeda
2022-08-17 20:34       ` Kees Cook
2022-08-17 21:44         ` Miguel Ojeda
2022-08-17 23:56           ` Kees Cook
2022-08-18 16:03             ` Miguel Ojeda
2022-08-18 16:08               ` Kees Cook
2022-08-18 17:01                 ` Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 07/27] rust: import upstream `alloc` crate Miguel Ojeda
2022-08-17 20:07   ` Kees Cook
2022-08-17 21:00     ` Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 08/27] rust: adapt `alloc` crate to the kernel Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 09/27] rust: add `compiler_builtins` crate Miguel Ojeda
2022-08-17 20:08   ` Kees Cook
2022-08-22 23:55   ` Nick Desaulniers
2022-08-24 18:38     ` Nick Desaulniers
2022-08-29 17:11       ` Gary Guo
2022-08-05 15:41 ` [PATCH v9 10/27] rust: add `macros` crate Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 11/27] rust: add `bindings` crate Miguel Ojeda
2022-08-05 15:41 ` [PATCH v9 12/27] rust: add `kernel` crate Miguel Ojeda
2022-08-06 10:24   ` Konstantin Shelekhin
2022-08-06 11:22     ` Miguel Ojeda
2022-08-06 12:15       ` Konstantin Shelekhin
2022-08-06 14:57       ` Matthew Wilcox
2022-09-19 14:07         ` Wedson Almeida Filho
2022-09-19 16:09           ` Linus Torvalds [this message]
2022-09-19 17:20             ` Linus Torvalds
2022-09-19 18:05               ` Wedson Almeida Filho
2022-09-19 20:42                 ` Linus Torvalds
2022-09-19 22:35                   ` Wedson Almeida Filho
2022-09-19 23:39                     ` Linus Torvalds
2022-09-19 23:50                       ` Alex Gaynor
2022-09-19 23:58                         ` Linus Torvalds
2022-09-20  0:15                           ` Linus Torvalds
2022-09-20 15:55                             ` Eric W. Biederman
2022-09-20 22:39                               ` Gary Guo
2022-09-21  6:42                                 ` comex
2022-09-21 14:19                                   ` Boqun Feng
2022-10-03  2:03                                     ` comex
2022-09-20  0:40                           ` Boqun Feng
2022-10-03  4:17                             ` Kyle Strand
2022-09-20  0:41                       ` Wedson Almeida Filho
2022-09-21 11:23       ` Konstantin Shelekhin
2022-09-21 11:46         ` Greg KH
2022-08-05 15:41 ` [PATCH v9 13/27] rust: export generated symbols Miguel Ojeda
2022-08-17 20:11   ` Kees Cook
2022-08-05 15:41 ` [PATCH v9 14/27] vsprintf: add new `%pA` format specifier Miguel Ojeda
2022-08-05 15:42 ` [PATCH v9 15/27] scripts: checkpatch: diagnose uses of `%pA` in the C side as errors Miguel Ojeda
2022-08-05 15:42 ` [PATCH v9 16/27] scripts: checkpatch: enable language-independent checks for Rust Miguel Ojeda
2022-08-17 20:12   ` Kees Cook
2022-08-05 15:42 ` [PATCH v9 17/27] scripts: decode_stacktrace: demangle Rust symbols Miguel Ojeda
2022-08-05 15:42 ` [PATCH v9 18/27] scripts: add `generate_rust_analyzer.py` Miguel Ojeda
2022-08-17 20:13   ` Kees Cook
2022-08-05 15:42 ` [PATCH v9 19/27] scripts: add `generate_rust_target.rs` Miguel Ojeda
2022-08-17 20:14   ` Kees Cook
2022-08-05 15:42 ` [PATCH v9 20/27] scripts: add `rust_is_available.sh` Miguel Ojeda
2022-08-17 20:18   ` Kees Cook
2022-08-17 20:40     ` Miguel Ojeda
2022-08-22 20:09   ` Nick Desaulniers
2022-08-23 12:12     ` Miguel Ojeda
2022-08-23 12:16       ` Miguel Ojeda
2022-08-05 15:42 ` [PATCH v9 21/27] scripts: add `is_rust_module.sh` Miguel Ojeda
2022-08-17 20:19   ` Kees Cook
2022-08-05 15:42 ` [PATCH v9 22/27] rust: add `.rustfmt.toml` Miguel Ojeda
2022-08-17 20:19   ` Kees Cook
2022-08-05 15:42 ` [PATCH v9 23/27] Kbuild: add Rust support Miguel Ojeda
2022-08-17 20:26   ` Kees Cook
2022-08-17 20:56     ` Miguel Ojeda
2022-08-22 22:35   ` Nick Desaulniers
2022-09-12 16:07   ` Masahiro Yamada
2022-09-12 16:18     ` Miguel Ojeda
2022-09-13  6:37       ` Masahiro Yamada
2022-08-05 15:42 ` [PATCH v9 24/27] docs: add Rust documentation Miguel Ojeda
2022-08-05 15:42 ` [PATCH v9 25/27] x86: enable initial Rust support Miguel Ojeda
2022-08-17 20:27   ` Kees Cook
2022-08-05 15:42 ` [PATCH v9 26/27] samples: add first Rust examples Miguel Ojeda
2022-08-06 13:14   ` Konstantin Shelekhin
2022-08-17 21:02     ` Miguel Ojeda
2022-08-18  9:04       ` Konstantin Shelekhin
2022-08-17 20:28   ` Kees Cook
2022-08-05 15:42 ` [PATCH v9 27/27] MAINTAINERS: Rust Miguel Ojeda
2022-08-17 20:28   ` Kees Cook
2022-08-17 20:43     ` Miguel Ojeda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgaBaVaK2K=N05fwWSSLM6YJx=yLmP4f7j6d6o=nCAtdw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=alex.gaynor@gmail.com \
    --cc=andreas.hindborg@wdc.com \
    --cc=ark.email@gmail.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=bobo1239@web.de \
    --cc=bonifaido@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=davidgow@google.com \
    --cc=dev@niklasmohrin.de \
    --cc=dsosnowski@dsosnowski.pl \
    --cc=foxhlchen@gmail.com \
    --cc=gary@garyguo.net \
    --cc=geofft@ldpreload.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jarkko@kernel.org \
    --cc=john.m.baublitz@gmail.com \
    --cc=k.shelekhin@yadro.com \
    --cc=keescook@chromium.org \
    --cc=leseulartichaut@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.falkowski@samsung.com \
    --cc=me@kloenk.de \
    --cc=miguel.ojeda.sandonis@gmail.com \
    --cc=milan@mdaverde.com \
    --cc=mjmouse9999@gmail.com \
    --cc=ojeda@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=thesven73@gmail.com \
    --cc=viktor@v-gar.de \
    --cc=wedsonaf@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).