linux-bcachefs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Kent Overstreet <kent.overstreet@linux.dev>,
	Kees Cook <keescook@chromium.org>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-bcachefs@vger.kernel.org" <linux-bcachefs@vger.kernel.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-hardening@vger.kernel.org"
	<linux-hardening@vger.kernel.org>
Subject: Re: [PATCH 07/32] mm: Bring back vmalloc_exec
Date: Fri, 16 Jun 2023 21:13:22 -0700	[thread overview]
Message-ID: <1d249326-e3dd-9c9d-7b53-2fffeb39bfb4@kernel.org> (raw)
In-Reply-To: <ZGPzocRpSlg+4vgN@moria.home.lan>

On 5/16/23 14:20, Kent Overstreet wrote:
> On Tue, May 16, 2023 at 02:02:11PM -0700, Kees Cook wrote:
>> For something that small, why not use the text_poke API?
> 
> This looks like it's meant for patching existing kernel text, which
> isn't what I want - I'm generating new functions on the fly, one per
> btree node.

Dynamically generating code is a giant can of worms.

Kees touched on a basic security thing: a linear address mapped W+X is a big
no-no.  And that's just scratching the surface -- ideally we would have a
strong protocol for generating code: the code is generated in some
extra-secure context, then it's made immutable and double-checked, then
it becomes live.  (And we would offer this to userspace, some day.)
Just having a different address for the W and X aliases is pretty weak.

(When x86 modifies itself at boot or for static keys, it changes out the
page tables temporarily.)

And even beyond security, we have correctness.  x86 is a fairly 
forgiving architecture.  If you go back in time about 20 years, modify
some code *at the same linear address at which you intend to execute 
it*, and jump to it, it works.  It may even work if you do it through
an alias (the manual is vague).  But it's not 20 years ago, and you have
multiple cores.  This does *not* work with multiple CPUs -- you need to 
serialize on the CPU executing the modified code.  On all the but the 
very newest CPUs, you need to kludge up the serialization, and that's
sloooooooooooooow.  Very new CPUs have the SERIALIZE instruction, which
is merely sloooooow.

(The manual is terrible.  It's clear that a way to do this without 
serializing must exist, because that's what happens when code is paged 
in from a user program.)

And remember that x86 is the forgiving architecture.  Other 
architectures have their own rules that may involve all kinds of 
terrifying cache management.  IIRC ARM (32-bit) is really quite nasty in 
this regard.  I've seen some references suggesting that RISC-V has a 
broken design of its cache management and this is a real mess.

x86 low level stuff on Linux gets away with it because the 
implementation is conservative and very slow, but it's very rarely invoked.

eBPF gets away with it in ways that probably no one really likes, but 
also no one expects eBPF to load programs particularly quickly.

You are proposing doing this when a btree node is loaded.  You could 
spend 20 *thousand* cycles, on *each CPU*, the first time you access 
that node, not to mention the extra branch to decide whether you need to 
spend those 20k cycles.  Or you could use IPIs.

Or you could just not do this.  I think you should just remove all this 
dynamic codegen stuff, at least for now.

> 
> I'm working up a new allocator - a (very simple) slab allocator where
> you pass a buffer, and it gives you a copy of that buffer mapped
> executable, but not writeable.
> 
> It looks like we'll be able to convert bpf, kprobes, and ftrace
> trampolines to it; it'll consolidate a fair amount of code (particularly
> in bpf), and they won't have to burn a full page per allocation anymore.
> 
> bpf has a neat trick where it maps the same page in two different
> locations, one is the executable location and the other is the writeable
> location - I'm stealing that.
> 
> external api will be:
> 
> void *jit_alloc(void *buf, size_t len, gfp_t gfp);
> void jit_free(void *buf);
> void jit_update(void *buf, void *new_code, size_t len); /* update an existing allocation */

Based on the above, I regret to inform you that jit_update() will either 
need to sync all cores via IPI or all cores will need to check whether a 
sync is needed and do it themselves.

That IPI could be, I dunno, 500k cycles?  1M cycles?  Depends on what 
cores are asleep at the time.  (I have some old Sandy Bridge machines 
where, if you tick all the boxes wrong, you might spend tens of 
milliseconds doing this due to power savings gone wrong.)  Or are you 
planning to implement a fancy mostly-lockless thing to track which cores 
actually need the IPI so you can avoid waking up sleeping cores?

Sorry to be a party pooper.

--Andy

P.S. I have given some thought to how to make a JIT API that was 
actually (somewhat) performant.  It's nontrivial, and it would involve 
having at least phone calls and possibly actual meetings with people who 
understand the microarchitecture of various CPUs to get all the details 
hammered out and documented properly.

I don't think it would be efficient for teeny little functions like 
bcachefs wants, but maybe?  That would be even more complex and messy.

  parent reply	other threads:[~2023-06-17  4:13 UTC|newest]

Thread overview: 186+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-09 16:56 [PATCH 00/32] bcachefs - a new COW filesystem Kent Overstreet
2023-05-09 16:56 ` [PATCH 01/32] Compiler Attributes: add __flatten Kent Overstreet
2023-05-09 17:04   ` Miguel Ojeda
2023-05-09 17:24     ` Kent Overstreet
2023-05-09 16:56 ` [PATCH 02/32] locking/lockdep: lock_class_is_held() Kent Overstreet
2023-05-09 19:30   ` Peter Zijlstra
2023-05-09 20:11     ` Kent Overstreet
2023-05-09 16:56 ` [PATCH 03/32] locking/lockdep: lockdep_set_no_check_recursion() Kent Overstreet
2023-05-09 19:31   ` Peter Zijlstra
2023-05-09 19:57     ` Kent Overstreet
2023-05-09 20:18     ` Kent Overstreet
2023-05-09 20:27       ` Waiman Long
2023-05-09 20:35         ` Kent Overstreet
2023-05-09 21:37           ` Waiman Long
2023-05-10  8:59       ` Peter Zijlstra
2023-05-10 20:38         ` Kent Overstreet
2023-05-11  8:25           ` Peter Zijlstra
2023-05-11  9:32             ` Kent Overstreet
2023-05-12 20:49         ` Kent Overstreet
2023-05-09 16:56 ` [PATCH 04/32] locking: SIX locks (shared/intent/exclusive) Kent Overstreet
2023-05-11 12:14   ` Jan Engelhardt
2023-05-12 20:58     ` Kent Overstreet
2023-05-12 22:39       ` Jan Engelhardt
2023-05-12 23:26         ` Kent Overstreet
2023-05-12 23:49           ` Randy Dunlap
2023-05-13  0:17             ` Kent Overstreet
2023-05-13  0:45               ` Eric Biggers
2023-05-13  0:51                 ` Kent Overstreet
2023-05-14 12:15   ` Jeff Layton
2023-05-15  2:39     ` Kent Overstreet
2023-05-09 16:56 ` [PATCH 05/32] MAINTAINERS: Add entry for six locks Kent Overstreet
2023-05-09 16:56 ` [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Kent Overstreet
2023-05-10  1:07   ` Jan Kara
2023-05-10  6:18     ` Kent Overstreet
2023-05-23 13:34       ` Jan Kara
2023-05-23 16:21         ` [Cluster-devel] " Christoph Hellwig
2023-05-23 16:35           ` Kent Overstreet
2023-05-24  6:43             ` Christoph Hellwig
2023-05-24  8:09               ` Kent Overstreet
2023-05-25  8:58                 ` Christoph Hellwig
2023-05-25 20:50                   ` Kent Overstreet
2023-05-26  8:06                     ` Christoph Hellwig
2023-05-26  8:34                       ` Kent Overstreet
2023-05-25 21:40                   ` Kent Overstreet
2023-05-25 22:25           ` Andreas Grünbacher
2023-05-25 23:20             ` Kent Overstreet
2023-05-26  0:05               ` Andreas Grünbacher
2023-05-26  0:39                 ` Kent Overstreet
2023-05-26  8:10               ` Christoph Hellwig
2023-05-26  8:38                 ` Kent Overstreet
2023-05-23 16:49         ` Kent Overstreet
2023-05-25  8:47           ` Jan Kara
2023-05-25 21:36             ` Kent Overstreet
2023-05-25 22:45             ` Andreas Grünbacher
2023-05-25 22:04         ` Andreas Grünbacher
2023-05-09 16:56 ` [PATCH 07/32] mm: Bring back vmalloc_exec Kent Overstreet
2023-05-09 18:19   ` Lorenzo Stoakes
2023-05-09 20:15     ` Kent Overstreet
2023-05-09 20:46   ` Christoph Hellwig
2023-05-09 21:12     ` Lorenzo Stoakes
2023-05-09 21:29       ` Kent Overstreet
2023-05-10  6:48         ` Eric Biggers
2023-05-12 18:36           ` Kent Overstreet
2023-05-13  1:57             ` Eric Biggers
2023-05-13 19:28               ` Kent Overstreet
2023-05-14  5:45               ` Kent Overstreet
2023-05-14 18:43                 ` Eric Biggers
2023-05-15  5:38                   ` Kent Overstreet
2023-05-15  6:13                     ` Eric Biggers
2023-05-15  6:18                       ` Kent Overstreet
2023-05-15  7:13                         ` Eric Biggers
2023-05-15  7:26                           ` Kent Overstreet
2023-05-21 21:33                             ` Eric Biggers
2023-05-21 22:04                               ` Kent Overstreet
2023-05-15 10:29                 ` David Laight
2023-05-10 11:56         ` David Laight
2023-05-09 21:43       ` Darrick J. Wong
2023-05-09 21:54         ` Kent Overstreet
2023-05-11  5:33           ` Theodore Ts'o
2023-05-11  5:44             ` Kent Overstreet
2023-05-13 13:25       ` Lorenzo Stoakes
2023-05-14 18:39         ` Christophe Leroy
2023-05-14 23:43           ` Kent Overstreet
2023-05-15  4:45             ` Christophe Leroy
2023-05-15  5:02               ` Kent Overstreet
2023-05-10 14:18   ` Christophe Leroy
2023-05-10 15:05   ` Johannes Thumshirn
2023-05-11 22:28     ` Kees Cook
2023-05-12 18:41       ` Kent Overstreet
2023-05-16 21:02         ` Kees Cook
2023-05-16 21:20           ` Kent Overstreet
2023-05-16 21:47             ` Matthew Wilcox
2023-05-16 21:57               ` Kent Overstreet
2023-05-17  5:28               ` Kent Overstreet
2023-05-17 14:04                 ` Mike Rapoport
2023-05-17 14:18                   ` Kent Overstreet
2023-05-17 15:44                     ` Mike Rapoport
2023-05-17 15:59                       ` Kent Overstreet
2023-06-17  4:13             ` Andy Lutomirski [this message]
2023-06-17 15:34               ` Kent Overstreet
2023-06-17 19:19                 ` Andy Lutomirski
2023-06-17 20:08                   ` Kent Overstreet
2023-06-17 20:35                     ` Andy Lutomirski
2023-06-19 19:45                 ` Kees Cook
2023-06-20  0:39                   ` Kent Overstreet
2023-06-19  9:19   ` Mark Rutland
2023-06-19 10:47     ` Kent Overstreet
2023-06-19 12:47       ` Mark Rutland
2023-06-19 19:17         ` Kent Overstreet
2023-06-20 17:42           ` Andy Lutomirski
2023-06-20 18:08             ` Kent Overstreet
2023-06-20 18:15               ` Andy Lutomirski
2023-06-20 18:48                 ` Dave Hansen
2023-06-20 20:18                   ` Kent Overstreet
2023-06-20 20:42                   ` Andy Lutomirski
2023-06-20 22:32                     ` Andy Lutomirski
2023-06-20 22:43                       ` Nadav Amit
2023-06-21  1:27                         ` Andy Lutomirski
2023-05-09 16:56 ` [PATCH 08/32] fs: factor out d_mark_tmpfile() Kent Overstreet
2023-05-09 16:56 ` [PATCH 09/32] block: Add some exports for bcachefs Kent Overstreet
2023-05-09 16:56 ` [PATCH 10/32] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset Kent Overstreet
2023-05-09 16:56 ` [PATCH 11/32] block: Bring back zero_fill_bio_iter Kent Overstreet
2023-05-09 16:56 ` [PATCH 12/32] block: Rework bio_for_each_segment_all() Kent Overstreet
2023-05-09 16:56 ` [PATCH 13/32] block: Rework bio_for_each_folio_all() Kent Overstreet
2023-05-09 16:56 ` [PATCH 14/32] block: Don't block on s_umount from __invalidate_super() Kent Overstreet
2023-05-09 16:56 ` [PATCH 15/32] bcache: move closures to lib/ Kent Overstreet
2023-05-10  1:10   ` Randy Dunlap
2023-05-09 16:56 ` [PATCH 16/32] MAINTAINERS: Add entry for closures Kent Overstreet
2023-05-09 17:05   ` Coly Li
2023-05-09 21:03   ` Randy Dunlap
2023-05-09 16:56 ` [PATCH 17/32] closures: closure_wait_event() Kent Overstreet
2023-05-09 16:56 ` [PATCH 18/32] closures: closure_nr_remaining() Kent Overstreet
2023-05-09 16:56 ` [PATCH 19/32] closures: Add a missing include Kent Overstreet
2023-05-09 16:56 ` [PATCH 20/32] vfs: factor out inode hash head calculation Kent Overstreet
2023-05-23  9:27   ` (subset) " Christian Brauner
2023-05-23 22:53     ` Dave Chinner
2023-05-24  6:44       ` Christoph Hellwig
2023-05-24  7:35         ` Dave Chinner
2023-05-24  8:31           ` Christian Brauner
2023-05-24  8:41             ` Kent Overstreet
2023-05-09 16:56 ` [PATCH 21/32] hlist-bl: add hlist_bl_fake() Kent Overstreet
2023-05-10  4:48   ` Dave Chinner
2023-05-23  9:27   ` (subset) " Christian Brauner
2023-05-09 16:56 ` [PATCH 22/32] vfs: inode cache conversion to hash-bl Kent Overstreet
2023-05-10  4:45   ` Dave Chinner
2023-05-16 15:45     ` Christian Brauner
2023-05-16 16:17       ` Kent Overstreet
2023-05-16 23:15         ` Dave Chinner
2023-05-22 13:04           ` Christian Brauner
2023-05-23  9:28   ` (subset) " Christian Brauner
2023-10-19 15:30     ` Mateusz Guzik
2023-10-19 15:59       ` Mateusz Guzik
2023-10-20 11:38         ` Dave Chinner
2023-10-20 17:49           ` Mateusz Guzik
2023-10-21 12:13             ` Mateusz Guzik
2023-10-23  5:10             ` Dave Chinner
2023-10-27 17:13               ` Mateusz Guzik
2023-10-27 18:36                 ` Darrick J. Wong
2023-10-31 11:02                 ` Christian Brauner
2023-10-31 11:31                   ` Mateusz Guzik
2023-11-02  2:36                   ` Kent Overstreet
2023-11-04 20:51                     ` Dave Chinner
2023-05-09 16:56 ` [PATCH 23/32] iov_iter: copy_folio_from_iter_atomic() Kent Overstreet
2023-05-10  2:20   ` kernel test robot
2023-05-11  2:08   ` kernel test robot
2023-05-09 16:56 ` [PATCH 24/32] MAINTAINERS: Add entry for generic-radix-tree Kent Overstreet
2023-05-09 21:03   ` Randy Dunlap
2023-05-09 16:56 ` [PATCH 25/32] lib/generic-radix-tree.c: Don't overflow in peek() Kent Overstreet
2023-05-09 16:56 ` [PATCH 26/32] lib/generic-radix-tree.c: Add a missing include Kent Overstreet
2023-05-09 16:56 ` [PATCH 27/32] lib/generic-radix-tree.c: Add peek_prev() Kent Overstreet
2023-05-09 16:56 ` [PATCH 28/32] stacktrace: Export stack_trace_save_tsk Kent Overstreet
2023-06-19  9:10   ` Mark Rutland
2023-06-19 11:16     ` Kent Overstreet
2023-05-09 16:56 ` [PATCH 29/32] lib/string_helpers: string_get_size() now returns characters wrote Kent Overstreet
2023-07-12 19:58   ` Kees Cook
2023-07-12 20:19     ` Kent Overstreet
2023-07-12 22:38       ` Kees Cook
2023-07-12 23:53         ` Kent Overstreet
2023-07-12 20:23     ` Kent Overstreet
2023-05-09 16:56 ` [PATCH 30/32] lib: Export errname Kent Overstreet
2023-05-09 16:56 ` [PATCH 31/32] lib: add mean and variance module Kent Overstreet
2023-05-09 16:56 ` [PATCH 32/32] MAINTAINERS: Add entry for bcachefs Kent Overstreet
2023-05-09 21:04   ` Randy Dunlap
2023-05-09 21:07     ` Kent Overstreet
2023-06-15 20:41 ` [PATCH 00/32] bcachefs - a new COW filesystem Pavel Machek
2023-06-15 21:26   ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1d249326-e3dd-9c9d-7b53-2fffeb39bfb4@kernel.org \
    --to=luto@kernel.org \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=keescook@chromium.org \
    --cc=kent.overstreet@gmail.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).