linux-modules.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-modules@vger.kernel.org,
	mcgrof@kernel.org, rostedt@goodmis.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, mhiramat@kernel.org,
	naveen.n.rao@linux.ibm.com, davem@davemloft.net,
	anil.s.keshavamurthy@intel.com, keescook@chromium.org,
	hch@infradead.org, dave@stgolabs.net, daniel@iogearbox.net,
	kernel-team@fb.com, x86@kernel.org, dave.hansen@linux.intel.com,
	rick.p.edgecombe@intel.com, akpm@linux-foundation.org
Subject: Re: [PATCH bpf-next 1/3] mm/vmalloc: introduce vmalloc_exec which allocates RO+X memory
Date: Wed, 13 Jul 2022 12:20:09 +0200
Message-ID: <Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net>
In-Reply-To: <20220713071846.3286727-2-song@kernel.org>

On Wed, Jul 13, 2022 at 12:18:44AM -0700, Song Liu wrote:
> Dynamically allocated kernel texts, such as module texts, bpf programs,
> and ftrace trampolines, are used in more and more scenarios. Currently,
> these users allocate memory with module_alloc, fill the memory with text,
> and then use set_memory_[ro|x] to protect the memory.
> 
> This approach has two issues:
>  1) each of these users occupies one or more RO+X pages, and thus one or
>     more entries in the page table and the iTLB;
>  2) frequent allocate/free of RO+X pages causes fragmentation of kernel
>     direct map [1].
> 
> BPF prog pack [2] addresses this from the BPF side. Now, make the same
> logic available to other users of dynamic kernel text.
> 
> The new API is like:
> 
>   void *vmalloc_exec(size_t size);
>   void vfree_exec(void *addr, size_t size);
> 
> vmalloc_exec handles small and big allocations differently, where big
> means > PMD_SIZE * num_possible_nodes. Bigger allocations get a dedicated
> vmalloc allocation, while small allocations share a vmalloc_exec_pack
> with other allocations.
> 
> Once allocated, the vmalloc_exec_pack is filled with invalid instructions

*sigh*, again, INT3 is a *VALID* instruction.

> and protected with RO+X. Some text_poke feature is required to make
> changes to the vmalloc_exec_pack. Therefore, vmalloc_exec requires changes
> from the arch (to provide text_poke family APIs), and the user (to use
> text poke APIs to make any changes to the memory).

I hate the naming; this isn't just vmalloc, this is a whole different
allocator built on top of things.

I'm also not convinced this is the right way to go about doing this;
much of the design here is because of how the module range is mixing
text and data and working around that.

So how about instead we separate them? Then much of the problem goes
away, you don't need to track these 2M chunks at all.

Start by adding VM_TOPDOWN_VMAP, which instead of returning the lowest
(leftmost) vmap_area that fits, picks the highest (rightmost).
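
To illustrate the difference between the two placement policies, here is
a minimal userspace sketch. It is not kernel code: struct gap,
alloc_lowest() and alloc_highest() are hypothetical names made up for
this example, and the real vmap_area search works on a tree rather than
a sorted array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one free gap [start, end) in an address range. */
struct gap {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Bottom-up policy: take the lowest (leftmost) gap that fits and
 * place the allocation at its start. Gaps must be sorted by
 * ascending address. Returns 0 if nothing fits.
 */
static uintptr_t alloc_lowest(const struct gap *gaps, size_t n, size_t size)
{
	assert(size > 0);
	for (size_t i = 0; i < n; i++)
		if (gaps[i].end - gaps[i].start >= size)
			return gaps[i].start;
	return 0;
}

/*
 * Top-down policy: take the highest (rightmost) gap that fits and
 * place the allocation at its top. Returns 0 if nothing fits.
 */
static uintptr_t alloc_highest(const struct gap *gaps, size_t n, size_t size)
{
	assert(size > 0);
	for (size_t i = n; i-- > 0; )
		if (gaps[i].end - gaps[i].start >= size)
			return gaps[i].end - size;
	return 0;
}
```

With the same set of gaps, the two helpers hand out addresses from
opposite ends of the range, which is what lets one kind of user fill
the range from the bottom while another fills it from the top.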

Then add module_alloc_data() that uses VM_TOPDOWN_VMAP and make
ARCH_WANTS_MODULE_DATA_IN_VMALLOC use that instead of vmalloc (with a
weak function doing the vmalloc).

This gets you a module range whose bottom is RO+X only, while the top is
shattered between the different !X types.

Then track the boundary between X and !X and ensure module_alloc_data()
and module_alloc() never cross over and stay strictly separated.
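
As a rough sketch of that boundary tracking, assuming a simple bump
allocator with no freeing: struct mod_range, model_alloc_text() and
model_alloc_data() below are hypothetical stand-ins for illustration,
not the kernel's module_alloc() or the proposed module_alloc_data().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a module range: text at the bottom, data on top. */
struct mod_range {
	uintptr_t base;		/* lowest address, assumed non-zero */
	uintptr_t limit;	/* one past the highest address */
	uintptr_t text_end;	/* X region is [base, text_end) */
	uintptr_t data_start;	/* !X region is [data_start, limit) */
};

static void mod_range_init(struct mod_range *r, uintptr_t base, uintptr_t limit)
{
	assert(base != 0 && base < limit);
	r->base = base;
	r->limit = limit;
	r->text_end = base;
	r->data_start = limit;
}

/* Grow the X region upward; return 0 rather than cross into data. */
static uintptr_t model_alloc_text(struct mod_range *r, size_t size)
{
	uintptr_t addr = r->text_end;

	if (size > r->data_start - r->text_end)
		return 0;
	r->text_end += size;
	return addr;
}

/* Grow the !X region downward; return 0 rather than cross into text. */
static uintptr_t model_alloc_data(struct mod_range *r, size_t size)
{
	if (size > r->data_start - r->text_end)
		return 0;
	r->data_start -= size;
	return r->data_start;
}
```

Both helpers check against the same free window between text_end and
data_start, so the invariant text_end <= data_start holds after every
call and the two regions stay strictly separated.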

Then change all module_alloc() users to expect RO+X memory, instead of
RW.

Then make sure any extension of the X range is 2M aligned.
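
The alignment arithmetic is the usual power-of-two round-up. A small
userspace sketch, where SZ_2M_MODEL, align_up_2m() and extend_x_range()
are hypothetical names for this example only:

```c
#include <assert.h>
#include <stdint.h>

/* 2M, the large-page size the text is meant to be mapped with. */
#define SZ_2M_MODEL (UINT64_C(2) << 20)

/* Round addr up to the next 2M boundary (no-op if already aligned). */
static uint64_t align_up_2m(uint64_t addr)
{
	return (addr + SZ_2M_MODEL - 1) & ~(SZ_2M_MODEL - 1);
}

/*
 * Given the current, already 2M-aligned end of the X range and the
 * number of extra bytes of text needed, return the new end of the
 * X range. The result is always 2M aligned, so the X range only
 * ever grows in whole 2M steps.
 */
static uint64_t extend_x_range(uint64_t x_end, uint64_t need)
{
	assert(x_end % SZ_2M_MODEL == 0);
	return align_up_2m(x_end + need);
}
```

A request for a single extra byte therefore still moves the boundary
by a full 2M, and later text allocations can be served from the slack
without touching the boundary again.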

And presto, *everybody* always uses 2M TLB for text, modules, bpf,
ftrace, the lot and nobody is tracking chunks.

Maybe migration can be eased by instead providing module_alloc_text()
and ARCH_WANTS_MODULE_ALLOC_TEXT.
