linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v4 00/15] mm: jit/text allocator
       [not found] <20240411160051.2093261-1-rppt@kernel.org>
@ 2024-04-11 18:00 ` Kent Overstreet
  2024-04-11 19:45 ` Luis Chamberlain
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2024-04-11 18:00 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, linux-kernel, Dinh Nguyen,
	Björn Töpel, Eric Chanudet, Palmer Dabbelt,
	Andrew Morton, Rick Edgecombe, linuxppc-dev, David S. Miller,
	linux-modules

On Thu, Apr 11, 2024 at 07:00:36PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> Hi,
> 
> Since v3 I looked into making execmem more of an utility toolbox, as we
> discussed at LPC with Mark Rutland, but it was getting more hairier than
> having a struct describing architecture constraints and a type identifying
> the consumer of execmem.
> 
> And I do think that having the description of architecture constraints for
> allocations of executable memory in a single place is better that having it
> spread all over the place.
> 
> The patches available via git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v4
> 
> v4 changes:
> * rebase on v6.9-rc2
> * rename execmem_params to execmem_info and execmem_arch_params() to
>   execmem_arch_setup()
> * use single execmem_alloc() API instead of execmem_{text,data}_alloc() (Song)
> * avoid extra copy of execmem parameters (Rick)
> * run execmem_init() as core_initcall() except for the architectures that
>   may allocated text really early (currently only x86) (Will)
> * add acks for some of arm64 and riscv changes, thanks Will and Alexandre
> * new commits:
>   - drop call to kasan_alloc_module_shadow() on arm64 because it's not
>     needed anymore
>   - rename MODULE_START to MODULES_VADDR on MIPS
>   - use CONFIG_EXECMEM instead of CONFIG_MODULES on powerpc as per Christophe:
>     https://lore.kernel.org/all/79062fa3-3402-47b3-8920-9231ad05e964@csgroup.eu/
> 
> v3: https://lore.kernel.org/all/20230918072955.2507221-1-rppt@kernel.org
> * add type parameter to execmem allocation APIs
> * remove BPF dependency on modules
> 
> v2: https://lore.kernel.org/all/20230616085038.4121892-1-rppt@kernel.org
> * Separate "module" and "others" allocations with execmem_text_alloc()
> and jit_text_alloc()
> * Drop ROX entailment on x86
> * Add ack for nios2 changes, thanks Dinh Nguyen
> 
> v1: https://lore.kernel.org/all/20230601101257.530867-1-rppt@kernel.org
> 
> = Cover letter from v1 (sligtly updated) =
> 
> module_alloc() is used everywhere as a mean to allocate memory for code.
> 
> Beside being semantically wrong, this unnecessarily ties all subsystmes
> that need to allocate code, such as ftrace, kprobes and BPF to modules and
> puts the burden of code allocation to the modules code.
> 
> Several architectures override module_alloc() because of various
> constraints where the executable memory can be located and this causes
> additional obstacles for improvements of code allocation.
> 
> A centralized infrastructure for code allocation allows allocations of
> executable memory as ROX, and future optimizations such as caching large
> pages for better iTLB performance and providing sub-page allocations for
> users that only need small jit code snippets.
> 
> Rick Edgecombe proposed perm_alloc extension to vmalloc [1] and Song Liu
> proposed execmem_alloc [2], but both these approaches were targeting BPF
> allocations and lacked the ground work to abstract executable allocations
> and split them from the modules core.
> 
> Thomas Gleixner suggested to express module allocation restrictions and
> requirements as struct mod_alloc_type_params [3] that would define ranges,
> protections and other parameters for different types of allocations used by
> modules and following that suggestion Song separated allocations of
> different types in modules (commit ac3b43283923 ("module: replace
> module_layout with module_memory")) and posted "Type aware module
> allocator" set [4].
> 
> I liked the idea of parametrising code allocation requirements as a
> structure, but I believe the original proposal and Song's module allocator
> was too module centric, so I came up with these patches.
> 
> This set splits code allocation from modules by introducing execmem_alloc()
> and and execmem_free(), APIs, replaces call sites of module_alloc() and
> module_memfree() with the new APIs and implements core text and related
> allocations in a central place.
> 
> Instead of architecture specific overrides for module_alloc(), the
> architectures that require non-default behaviour for text allocation must
> fill execmem_info structure and implement execmem_arch_setup() that returns
> a pointer to that structure. If an architecture does not implement
> execmem_arch_setup(), the defaults compatible with the current
> modules::module_alloc() are used.
> 
> Since architectures define different restrictions on placement,
> permissions, alignment and other parameters for memory that can be used by
> different subsystems that allocate executable memory, execmem APIs
> take a type argument, that will be used to identify the calling subsystem
> and to allow architectures to define parameters for ranges suitable for that
> subsystem.
> 
> The new infrastructure allows decoupling of BPF, kprobes and ftrace from
> modules, and most importantly it paves the way for ROX allocations for
> executable memory.

It looks like you're just doing API cleanup first, then improving the
implementation later?

Patch set looks nice and clean; previous versions did seem to leak too
much arch/module details (or perhaps we were just bikeshedding too much
;) - but the API first approach is nice.

Looking forward to seeing this merged.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
       [not found] ` <20240411160051.2093261-6-rppt@kernel.org>
@ 2024-04-11 19:42   ` Luis Chamberlain
  2024-04-14  6:53     ` Mike Rapoport
  2024-04-12  9:16   ` Ingo Molnar
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Luis Chamberlain @ 2024-04-11 19:42 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, sparclinux, linux-riscv, Nadav Amit, linux-arch,
	linux-s390, Helge Deller, Huacai Chen, Russell King,
	linux-trace-kernel, Alexandre Ghiti, Will Deacon, Heiko Carstens,
	Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> module_alloc() is used everywhere as a mean to allocate memory for code.
> 
> Beside being semantically wrong, this unnecessarily ties all subsystems
> that need to allocate code, such as ftrace, kprobes and BPF to modules and
> puts the burden of code allocation to the modules code.
> 
> Several architectures override module_alloc() because of various
> constraints where the executable memory can be located and this causes
> additional obstacles for improvements of code allocation.
> 
> Start splitting code allocation from modules by introducing execmem_alloc()
> and execmem_free() APIs.
> 
> Initially, execmem_alloc() is a wrapper for module_alloc() and
> execmem_free() is a replacement of module_memfree() to allow updating all
> call sites to use the new APIs.
> 
> Since architectures define different restrictions on placement,
> permissions, alignment and other parameters for memory that can be used by
> different subsystems that allocate executable memory, execmem_alloc() takes
> a type argument, that will be used to identify the calling subsystem and to
> allow architectures define parameters for ranges suitable for that
> subsystem.

It would be good to describe this is a non-fuctional change.

> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---

> diff --git a/mm/execmem.c b/mm/execmem.c
> new file mode 100644
> index 000000000000..ed2ea41a2543
> --- /dev/null
> +++ b/mm/execmem.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0

And this just needs to copy over the copyright notices from the main.c file.

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 00/15] mm: jit/text allocator
       [not found] <20240411160051.2093261-1-rppt@kernel.org>
  2024-04-11 18:00 ` [PATCH v4 00/15] mm: jit/text allocator Kent Overstreet
@ 2024-04-11 19:45 ` Luis Chamberlain
       [not found] ` <20240411160051.2093261-7-rppt@kernel.org>
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-04-11 19:45 UTC (permalink / raw)
  To: Mike Rapoport, Thomas Gleixner
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, sparclinux, linux-riscv, Nadav Amit, linux-arch,
	linux-s390, Helge Deller, Huacai Chen, Russell King,
	linux-trace-kernel, Alexandre Ghiti, Will Deacon, Heiko Carstens,
	Steven Rostedt, loongarch, bpf, linux-arm-kernel,
	Thomas Bogendoerfer, linux-parisc, Puranjay Mohan, linux-mm,
	netdev, Kent Overstreet, linux-kernel, Dinh Nguyen,
	=?iso-88 59-1?Q?Bj=F6rn_T=F6pel?=,
	Eric Chanudet, Palmer Dabbelt, Andrew Morton, Rick Edgecombe,
	linuxppc-dev, David S. Miller, linux-modules

On Thu, Apr 11, 2024 at 07:00:36PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> Hi,
> 
> Since v3 I looked into making execmem more of an utility toolbox, as we
> discussed at LPC with Mark Rutland, but it was getting more hairier than
> having a struct describing architecture constraints and a type identifying
> the consumer of execmem.
> 
> And I do think that having the description of architecture constraints for
> allocations of executable memory in a single place is better that having it
> spread all over the place.
> 
> The patches available via git:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/v4

I've taken the first 5 patches through modules-next for now to get early
exposure to testing. Of those I just had minor nit feedback on the 5th,
but the rest look good.

Let's wait for review for the rest of the patches 6-15.

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 06/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem
       [not found] ` <20240411160051.2093261-7-rppt@kernel.org>
@ 2024-04-11 20:53   ` Sam Ravnborg
  2024-04-14  7:26     ` Mike Rapoport
  2024-04-15  8:03   ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Sam Ravnborg @ 2024-04-11 20:53 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

Hi Mike.

On Thu, Apr 11, 2024 at 07:00:42PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> Several architectures override module_alloc() only to define address
> range for code allocations different than VMALLOC address space.
> 
> Provide a generic implementation in execmem that uses the parameters for
> address space ranges, required alignment and page protections provided
> by architectures.
> 
> The architectures must fill execmem_info structure and implement
> execmem_arch_setup() that returns a pointer to that structure. This way the
> execmem initialization won't be called from every architecture, but rather
> from a central place, namely a core_initcall() in execmem.
> 
> The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
> with the parameters defined by the architectures.  If an architecture does
> not implement execmem_arch_setup(), execmem_alloc() will fall back to
> module_alloc().
> 
> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---

This code snippet could be more readable ...
> diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> index 66c45a2764bc..b70047f944cc 100644
> --- a/arch/sparc/kernel/module.c
> +++ b/arch/sparc/kernel/module.c
> @@ -14,6 +14,7 @@
>  #include <linux/string.h>
>  #include <linux/ctype.h>
>  #include <linux/mm.h>
> +#include <linux/execmem.h>
>  
>  #include <asm/processor.h>
>  #include <asm/spitfire.h>
> @@ -21,34 +22,26 @@
>  
>  #include "entry.h"
>  
> +static struct execmem_info execmem_info __ro_after_init = {
> +	.ranges = {
> +		[EXECMEM_DEFAULT] = {
>  #ifdef CONFIG_SPARC64
> -
> -#include <linux/jump_label.h>
> -
> -static void *module_map(unsigned long size)
> -{
> -	if (PAGE_ALIGN(size) > MODULES_LEN)
> -		return NULL;
> -	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> -				GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> -				__builtin_return_address(0));
> -}
> +			.start = MODULES_VADDR,
> +			.end = MODULES_END,
>  #else
> -static void *module_map(unsigned long size)
> +			.start = VMALLOC_START,
> +			.end = VMALLOC_END,
> +#endif
> +			.alignment = 1,
> +		},
> +	},
> +};
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> -	return vmalloc(size);
> -}
> -#endif /* CONFIG_SPARC64 */
> -
> -void *module_alloc(unsigned long size)
> -{
> -	void *ret;
> -
> -	ret = module_map(size);
> -	if (ret)
> -		memset(ret, 0, size);
> +	execmem_info.ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL;
>  
> -	return ret;
> +	return &execmem_info;
>  }
>  
>  /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */

... if the following was added:

diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 9e85d57ac3f2..62bcafe38b1f 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,

 #define VMALLOC_START           _AC(0xfe600000,UL)
 #define VMALLOC_END             _AC(0xffc00000,UL)
+#define MODULES_VADDR           VMALLOC_START
+#define MODULES_END             VMALLOC_END


Then the #ifdef CONFIG_SPARC64 could be dropped and the code would be
the same for 32 and 64 bits.

Just a drive-by comment.

	Sam

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
       [not found] ` <20240411160051.2093261-6-rppt@kernel.org>
  2024-04-11 19:42   ` [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free() Luis Chamberlain
@ 2024-04-12  9:16   ` Ingo Molnar
  2024-04-14  6:54     ` Mike Rapoport
  2024-04-15  7:52   ` Peter Zijlstra
  2024-04-17 21:06   ` Masami Hiramatsu
  3 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2024-04-12  9:16 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules


* Mike Rapoport <rppt@kernel.org> wrote:

> +/**
> + * enum execmem_type - types of executable memory ranges
> + *
> + * There are several subsystems that allocate executable memory.
> + * Architectures define different restrictions on placement,
> + * permissions, alignment and other parameters for memory that can be used
> + * by these subsystems.
> + * Types in this enum identify subsystems that allocate executable memory
> + * and let architectures define parameters for ranges suitable for
> + * allocations by each subsystem.
> + *
> + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> + * are not explcitly defined.
> + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> + * @EXECMEM_KPROBES: parameters for kprobes
> + * @EXECMEM_FTRACE: parameters for ftrace
> + * @EXECMEM_BPF: parameters for BPF
> + * @EXECMEM_TYPE_MAX:
> + */
> +enum execmem_type {
> +	EXECMEM_DEFAULT,
> +	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> +	EXECMEM_KPROBES,
> +	EXECMEM_FTRACE,
> +	EXECMEM_BPF,
> +	EXECMEM_TYPE_MAX,
> +};

s/explcitly
 /explicitly

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
  2024-04-11 19:42   ` [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free() Luis Chamberlain
@ 2024-04-14  6:53     ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2024-04-14  6:53 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, sparclinux, linux-riscv, Nadav Amit, linux-arch,
	linux-s390, Helge Deller, Huacai Chen, Russell King,
	linux-trace-kernel, Alexandre Ghiti, Will Deacon, Heiko Carstens,
	Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Thu, Apr 11, 2024 at 12:42:05PM -0700, Luis Chamberlain wrote:
> On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > 
> > module_alloc() is used everywhere as a mean to allocate memory for code.
> > 
> > Beside being semantically wrong, this unnecessarily ties all subsystems
> > that need to allocate code, such as ftrace, kprobes and BPF to modules and
> > puts the burden of code allocation to the modules code.
> > 
> > Several architectures override module_alloc() because of various
> > constraints where the executable memory can be located and this causes
> > additional obstacles for improvements of code allocation.
> > 
> > Start splitting code allocation from modules by introducing execmem_alloc()
> > and execmem_free() APIs.
> > 
> > Initially, execmem_alloc() is a wrapper for module_alloc() and
> > execmem_free() is a replacement of module_memfree() to allow updating all
> > call sites to use the new APIs.
> > 
> > Since architectures define different restrictions on placement,
> > permissions, alignment and other parameters for memory that can be used by
> > different subsystems that allocate executable memory, execmem_alloc() takes
> > a type argument, that will be used to identify the calling subsystem and to
> > allow architectures define parameters for ranges suitable for that
> > subsystem.
> 
> It would be good to describe this is a non-fuctional change.

Ok.
 
> > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> > ---
> 
> > diff --git a/mm/execmem.c b/mm/execmem.c
> > new file mode 100644
> > index 000000000000..ed2ea41a2543
> > --- /dev/null
> > +++ b/mm/execmem.c
> > @@ -0,0 +1,26 @@
> > +// SPDX-License-Identifier: GPL-2.0
> 
> And this just needs to copy over the copyright notices from the main.c file.

Will do.
 
>   Luis

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
  2024-04-12  9:16   ` Ingo Molnar
@ 2024-04-14  6:54     ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2024-04-14  6:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Fri, Apr 12, 2024 at 11:16:10AM +0200, Ingo Molnar wrote:
> 
> * Mike Rapoport <rppt@kernel.org> wrote:
> 
> > +/**
> > + * enum execmem_type - types of executable memory ranges
> > + *
> > + * There are several subsystems that allocate executable memory.
> > + * Architectures define different restrictions on placement,
> > + * permissions, alignment and other parameters for memory that can be used
> > + * by these subsystems.
> > + * Types in this enum identify subsystems that allocate executable memory
> > + * and let architectures define parameters for ranges suitable for
> > + * allocations by each subsystem.
> > + *
> > + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> > + * are not explcitly defined.
> > + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> > + * @EXECMEM_KPROBES: parameters for kprobes
> > + * @EXECMEM_FTRACE: parameters for ftrace
> > + * @EXECMEM_BPF: parameters for BPF
> > + * @EXECMEM_TYPE_MAX:
> > + */
> > +enum execmem_type {
> > +	EXECMEM_DEFAULT,
> > +	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> > +	EXECMEM_KPROBES,
> > +	EXECMEM_FTRACE,
> > +	EXECMEM_BPF,
> > +	EXECMEM_TYPE_MAX,
> > +};
> 
> s/explcitly
>  /explicitly
 
Sure, thanks

> Thanks,
> 
> 	Ingo

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 06/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem
  2024-04-11 20:53   ` [PATCH v4 06/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem Sam Ravnborg
@ 2024-04-14  7:26     ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2024-04-14  7:26 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Thu, Apr 11, 2024 at 10:53:46PM +0200, Sam Ravnborg wrote:
> Hi Mike.
> 
> On Thu, Apr 11, 2024 at 07:00:42PM +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > 
> > Several architectures override module_alloc() only to define address
> > range for code allocations different than VMALLOC address space.
> > 
> > Provide a generic implementation in execmem that uses the parameters for
> > address space ranges, required alignment and page protections provided
> > by architectures.
> > 
> > The architectures must fill execmem_info structure and implement
> > execmem_arch_setup() that returns a pointer to that structure. This way the
> > execmem initialization won't be called from every architecture, but rather
> > from a central place, namely a core_initcall() in execmem.
> > 
> > The execmem provides execmem_alloc() API that wraps __vmalloc_node_range()
> > with the parameters defined by the architectures.  If an architecture does
> > not implement execmem_arch_setup(), execmem_alloc() will fall back to
> > module_alloc().
> > 
> > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> > ---
> 
> This code snippet could be more readable ...
> > diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> > index 66c45a2764bc..b70047f944cc 100644
> > --- a/arch/sparc/kernel/module.c
> > +++ b/arch/sparc/kernel/module.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/string.h>
> >  #include <linux/ctype.h>
> >  #include <linux/mm.h>
> > +#include <linux/execmem.h>
> >  
> >  #include <asm/processor.h>
> >  #include <asm/spitfire.h>
> > @@ -21,34 +22,26 @@
> >  
> >  #include "entry.h"
> >  
> > +static struct execmem_info execmem_info __ro_after_init = {
> > +	.ranges = {
> > +		[EXECMEM_DEFAULT] = {
> >  #ifdef CONFIG_SPARC64
> > -
> > -#include <linux/jump_label.h>
> > -
> > -static void *module_map(unsigned long size)
> > -{
> > -	if (PAGE_ALIGN(size) > MODULES_LEN)
> > -		return NULL;
> > -	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> > -				GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> > -				__builtin_return_address(0));
> > -}
> > +			.start = MODULES_VADDR,
> > +			.end = MODULES_END,
> >  #else
> > -static void *module_map(unsigned long size)
> > +			.start = VMALLOC_START,
> > +			.end = VMALLOC_END,
> > +#endif
> > +			.alignment = 1,
> > +		},
> > +	},
> > +};
> > +
> > +struct execmem_info __init *execmem_arch_setup(void)
> >  {
> > -	return vmalloc(size);
> > -}
> > -#endif /* CONFIG_SPARC64 */
> > -
> > -void *module_alloc(unsigned long size)
> > -{
> > -	void *ret;
> > -
> > -	ret = module_map(size);
> > -	if (ret)
> > -		memset(ret, 0, size);
> > +	execmem_info.ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL;
> >  
> > -	return ret;
> > +	return &execmem_info;
> >  }
> >  
> >  /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
> 
> ... if the following was added:
> 
> diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
> index 9e85d57ac3f2..62bcafe38b1f 100644
> --- a/arch/sparc/include/asm/pgtable_32.h
> +++ b/arch/sparc/include/asm/pgtable_32.h
> @@ -432,6 +432,8 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
> 
>  #define VMALLOC_START           _AC(0xfe600000,UL)
>  #define VMALLOC_END             _AC(0xffc00000,UL)
> +#define MODULES_VADDR           VMALLOC_START
> +#define MODULES_END             VMALLOC_END
> 
> 
> Then the #ifdef CONFIG_SPARC64 could be dropped and the code would be
> the same for 32 and 64 bits.
 
Yeah, the #ifdef there can be dropped even regardless of execmem.
I'll add a patch for that.

> Just a drive-by comment.
> 
> 	Sam
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
       [not found] ` <20240411160051.2093261-6-rppt@kernel.org>
  2024-04-11 19:42   ` [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free() Luis Chamberlain
  2024-04-12  9:16   ` Ingo Molnar
@ 2024-04-15  7:52   ` Peter Zijlstra
  2024-04-15 16:51     ` Mike Rapoport
  2024-04-15 17:36     ` Mark Rutland
  2024-04-17 21:06   ` Masami Hiramatsu
  3 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2024-04-15  7:52 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> +/**
> + * enum execmem_type - types of executable memory ranges
> + *
> + * There are several subsystems that allocate executable memory.
> + * Architectures define different restrictions on placement,
> + * permissions, alignment and other parameters for memory that can be used
> + * by these subsystems.
> + * Types in this enum identify subsystems that allocate executable memory
> + * and let architectures define parameters for ranges suitable for
> + * allocations by each subsystem.
> + *
> + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> + * are not explcitly defined.
> + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> + * @EXECMEM_KPROBES: parameters for kprobes
> + * @EXECMEM_FTRACE: parameters for ftrace
> + * @EXECMEM_BPF: parameters for BPF
> + * @EXECMEM_TYPE_MAX:
> + */
> +enum execmem_type {
> +	EXECMEM_DEFAULT,
> +	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> +	EXECMEM_KPROBES,
> +	EXECMEM_FTRACE,
> +	EXECMEM_BPF,
> +	EXECMEM_TYPE_MAX,
> +};

Can we please get a break-down of how all these types are actually
different from one another?

I'm thinking some platforms have a tiny immediate space (arm64 comes to
mind) and has less strict placement constraints for some of them?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 06/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem
       [not found] ` <20240411160051.2093261-7-rppt@kernel.org>
  2024-04-11 20:53   ` [PATCH v4 06/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem Sam Ravnborg
@ 2024-04-15  8:03   ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2024-04-15  8:03 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Thu, Apr 11, 2024 at 07:00:42PM +0300, Mike Rapoport wrote:
> +static struct execmem_info execmem_info __ro_after_init = {
> +	.ranges = {
> +		[EXECMEM_DEFAULT] = {
> +			.start = MODULES_VADDR,
> +			.end = MODULES_END,
> +			.alignment = 1,
> +		},
> +	},
> +};
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> +	execmem_info.ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL;
> +
> +	return &execmem_info;
>  }

> +static struct execmem_info execmem_info __ro_after_init = {
> +	.ranges = {
> +		[EXECMEM_DEFAULT] = {
> +			.start = MODULES_VADDR,
> +			.end = MODULES_END,
> +			.pgprot = PAGE_KERNEL_EXEC,
> +			.alignment = 1,
> +		},
> +	},
> +};
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> +	return &execmem_info;
>  }

> +static struct execmem_info execmem_info __ro_after_init = {
> +	.ranges = {
> +		[EXECMEM_DEFAULT] = {
> +			.pgprot = PAGE_KERNEL_RWX,
> +			.alignment = 1,
> +		},
> +	},
> +};
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> +	execmem_info.ranges[EXECMEM_DEFAULT].start = VMALLOC_START;
> +	execmem_info.ranges[EXECMEM_DEFAULT].end = VMALLOC_END;
> +
> +	return &execmem_info;
>  }

> +static struct execmem_info execmem_info __ro_after_init = {
> +	.ranges = {
> +		[EXECMEM_DEFAULT] = {
> +			.pgprot = PAGE_KERNEL,
> +			.alignment = 1,
> +		},
> +	},
> +};
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> +	execmem_info.ranges[EXECMEM_DEFAULT].start = MODULES_VADDR;
> +	execmem_info.ranges[EXECMEM_DEFAULT].end = MODULES_END;
> +
> +	return &execmem_info;
>  }

> +static struct execmem_info execmem_info __ro_after_init = {
> +	.ranges = {
> +		[EXECMEM_DEFAULT] = {
>  #ifdef CONFIG_SPARC64
> +			.start = MODULES_VADDR,
> +			.end = MODULES_END,
>  #else
> +			.start = VMALLOC_START,
> +			.end = VMALLOC_END,
> +#endif
> +			.alignment = 1,
> +		},
> +	},
> +};
> +
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> +	execmem_info.ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL;
>  
> +	return &execmem_info;
>  }

I'm amazed by the weird and inconsistent breakup of initializations.

What exactly is wrong with something like:

static struct execmem_info execmem_info __ro_after_init;

struct execmem_info __init *execmem_arch_setup(void)
{
	execmem_info = (struct execmem_info){
		.ranges = {
			[EXECMEM_DEFAULT] = {
				.start	= MODULES_VADDR,
				.end	= MODULES_END,
				.pgprot	= PAGE_KERNEL,
				.alignment = 1,
			},
		},
	};
	return &execmem_info;
}


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 07/15] mm/execmem, arch: convert remaining overrides of module_alloc to execmem
       [not found] ` <20240411160051.2093261-8-rppt@kernel.org>
@ 2024-04-15  9:36   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2024-04-15  9:36 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Thu, Apr 11, 2024 at 07:00:43PM +0300, Mike Rapoport wrote:

> +static struct execmem_info execmem_info __ro_after_init = {
> +	.ranges = {
> +		[EXECMEM_DEFAULT] = {
> +			.flags = EXECMEM_KASAN_SHADOW,
> +			.alignment = MODULE_ALIGN,
> +		},
> +	},
> +};
>  
> +struct execmem_info __init *execmem_arch_setup(void)
>  {
> +	unsigned long start, offset = 0;
>  
> +	if (kaslr_enabled())
> +		offset = get_random_u32_inclusive(1, 1024) * PAGE_SIZE;
>  
> +	start = MODULES_VADDR + offset;
> +	execmem_info.ranges[EXECMEM_DEFAULT].start = start;
> +	execmem_info.ranges[EXECMEM_DEFAULT].end = MODULES_END;
> +	execmem_info.ranges[EXECMEM_DEFAULT].pgprot = PAGE_KERNEL;
>  
> +	return &execmem_info;
>  }

struct execmem_info __init *execmem_arch_setup(void)
{
	unsigned long offset = 0;

	if (kaslr_enabled())
		offset = get_random_u32_inclusive(1, 1024) * PAGE_SIZE;

	execmem_info = (struct execmem_info){
		.ranges = {
			[EXECMEM_DEFAULT] = {
				.start     = MODULES_VADDR + offset,
				.end       = MODULES_END,
				.pgprot    = PAGE_KERNEL,
				.flags     = EXECMEM_KASAN_SHADOW,
				.alignment = 1,
			},
		},
	};

	return &execmem_info;
}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
  2024-04-15  7:52   ` Peter Zijlstra
@ 2024-04-15 16:51     ` Mike Rapoport
  2024-04-15 17:36     ` Mark Rutland
  1 sibling, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2024-04-15 16:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> > +/**
> > + * enum execmem_type - types of executable memory ranges
> > + *
> > + * There are several subsystems that allocate executable memory.
> > + * Architectures define different restrictions on placement,
> > + * permissions, alignment and other parameters for memory that can be used
> > + * by these subsystems.
> > + * Types in this enum identify subsystems that allocate executable memory
> > + * and let architectures define parameters for ranges suitable for
> > + * allocations by each subsystem.
> > + *
> > + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> > + * are not explcitly defined.
> > + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> > + * @EXECMEM_KPROBES: parameters for kprobes
> > + * @EXECMEM_FTRACE: parameters for ftrace
> > + * @EXECMEM_BPF: parameters for BPF
> > + * @EXECMEM_TYPE_MAX:
> > + */
> > +enum execmem_type {
> > +	EXECMEM_DEFAULT,
> > +	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> > +	EXECMEM_KPROBES,
> > +	EXECMEM_FTRACE,
> > +	EXECMEM_BPF,
> > +	EXECMEM_TYPE_MAX,
> > +};
> 
> Can we please get a break-down of how all these types are actually
> different from one another?
> 
> I'm thinking some platforms have a tiny immediate space (arm64 comes to
> mind) and has less strict placement constraints for some of them?

loongarch, mips, nios2 and sparc define modules address space different
from vmalloc and use that for modules, kprobes and bpf (where supported).

parisc uses vmalloc range for everything, but it sets permissions to
PAGE_KERNEL_RWX because it's PAGE_KERNEL_EXEC is read only and it lacks
set_memory_* APIs.

arm has an address space for modules, but it fall back to the entire
vmalloc with CONFIG_ARM_MODULE_PLTS=y.

arm64 uses different ranges for modules and bpf/kprobes. For kprobes it
does vmalloc(PAGE_KERNEL_ROX) and for bpf just plain vmalloc().
For modules arm64 first tries to allocated from 128M below kernel_end and
if that fails it uses 2G below kernel_end as a fallback.

powerpc uses vmalloc space for everything for some configurations. For
book3s-32 and 8xx it defines two ranges that are used for module text,
kprobes and bpf and the module data can be allocated anywhere in vmalloc.

riscv has an address space for modules, a different address space for bpf
and uses vmalloc space for kprobes.

s390 and x86 have modules address space and use that space for all
executable allocations.

The EXECMEM_FTRACE type is only used on s390 and x86 and for now it's there
more for completeness rather to denote special constraints or properties.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
  2024-04-15  7:52   ` Peter Zijlstra
  2024-04-15 16:51     ` Mike Rapoport
@ 2024-04-15 17:36     ` Mark Rutland
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Rutland @ 2024-04-15 17:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Catalin Marinas, linux-mips, Song Liu, Donald Dutile,
	Luis Chamberlain, sparclinux, linux-riscv, Nadav Amit,
	linux-arch, linux-s390, Helge Deller, Huacai Chen, Russell King,
	linux-trace-kernel, Alexandre Ghiti, Will Deacon, Heiko Carstens,
	Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Bj"orn T"opel, Eric Chanudet,
	Palmer Dabbelt, linux-modules, Andrew Morton, Rick Edgecombe,
	linuxppc-dev, David S. Miller, Mike Rapoport

On Mon, Apr 15, 2024 at 09:52:41AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 11, 2024 at 07:00:41PM +0300, Mike Rapoport wrote:
> > +/**
> > + * enum execmem_type - types of executable memory ranges
> > + *
> > + * There are several subsystems that allocate executable memory.
> > + * Architectures define different restrictions on placement,
> > + * permissions, alignment and other parameters for memory that can be used
> > + * by these subsystems.
> > + * Types in this enum identify subsystems that allocate executable memory
> > + * and let architectures define parameters for ranges suitable for
> > + * allocations by each subsystem.
> > + *
> > + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> > + * are not explcitly defined.
> > + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> > + * @EXECMEM_KPROBES: parameters for kprobes
> > + * @EXECMEM_FTRACE: parameters for ftrace
> > + * @EXECMEM_BPF: parameters for BPF
> > + * @EXECMEM_TYPE_MAX:
> > + */
> > +enum execmem_type {
> > +	EXECMEM_DEFAULT,
> > +	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> > +	EXECMEM_KPROBES,
> > +	EXECMEM_FTRACE,
> > +	EXECMEM_BPF,
> > +	EXECMEM_TYPE_MAX,
> > +};
> 
> Can we please get a break-down of how all these types are actually
> different from one another?
> 
> I'm thinking some platforms have a tiny immediate space (arm64 comes to
> mind) and has less strict placement constraints for some of them?

Yeah, and really I'd *much* rather deal with that in arch code, as I have said
several times.

For arm64 we have two bsaic restrictions: 

1) Direct branches can go +/-128M
   We can expand this range by having direct branches go to PLTs, at a
   performance cost.

2) PREL32 relocations can go +/-2G
   We cannot expand this further.

* We don't need to allocate memory for ftrace. We do not use trampolines.

* Kprobes XOL areas don't care about either of those; we don't place any
  PC-relative instructions in those. Maybe we want to in future.

* Modules care about both; we'd *prefer* to place them within +/-128M of all
  other kernel/module code, but if there's no space we can use PLTs and expand
  that to +/-2G. Since modules can refreence other modules, that ends up
  actually being halved, and modules have to fit within some 2G window that
  also covers the kernel.

* I'm not sure about BPF's requirements; it seems happy doing the same as
  modules.

So if we *must* use a common execmem allocator, what we'd reall want is our own
types, e.g.

	EXECMEM_ANYWHERE
	EXECMEM_NOPLT
	EXECMEM_PREL32

... and then we use those in arch code to implement module_alloc() and friends.

Mark.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
       [not found] ` <20240411160051.2093261-6-rppt@kernel.org>
                     ` (2 preceding siblings ...)
  2024-04-15  7:52   ` Peter Zijlstra
@ 2024-04-17 21:06   ` Masami Hiramatsu
  3 siblings, 0 replies; 22+ messages in thread
From: Masami Hiramatsu @ 2024-04-17 21:06 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Thu, 11 Apr 2024 19:00:41 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> module_alloc() is used everywhere as a mean to allocate memory for code.
> 
> Beside being semantically wrong, this unnecessarily ties all subsystems
> that need to allocate code, such as ftrace, kprobes and BPF to modules and
> puts the burden of code allocation to the modules code.
> 
> Several architectures override module_alloc() because of various
> constraints where the executable memory can be located and this causes
> additional obstacles for improvements of code allocation.
> 
> Start splitting code allocation from modules by introducing execmem_alloc()
> and execmem_free() APIs.
> 
> Initially, execmem_alloc() is a wrapper for module_alloc() and
> execmem_free() is a replacement of module_memfree() to allow updating all
> call sites to use the new APIs.
> 
> Since architectures define different restrictions on placement,
> permissions, alignment and other parameters for memory that can be used by
> different subsystems that allocate executable memory, execmem_alloc() takes
> a type argument, that will be used to identify the calling subsystem and to
> allow architectures define parameters for ranges suitable for that
> subsystem.
> 

This looks good to me for the kprobe part.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thank you,

> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
>  arch/powerpc/kernel/kprobes.c    |  6 ++--
>  arch/s390/kernel/ftrace.c        |  4 +--
>  arch/s390/kernel/kprobes.c       |  4 +--
>  arch/s390/kernel/module.c        |  5 +--
>  arch/sparc/net/bpf_jit_comp_32.c |  8 ++---
>  arch/x86/kernel/ftrace.c         |  6 ++--
>  arch/x86/kernel/kprobes/core.c   |  4 +--
>  include/linux/execmem.h          | 57 ++++++++++++++++++++++++++++++++
>  include/linux/moduleloader.h     |  3 --
>  kernel/bpf/core.c                |  6 ++--
>  kernel/kprobes.c                 |  8 ++---
>  kernel/module/Kconfig            |  1 +
>  kernel/module/main.c             | 25 +++++---------
>  mm/Kconfig                       |  3 ++
>  mm/Makefile                      |  1 +
>  mm/execmem.c                     | 26 +++++++++++++++
>  16 files changed, 122 insertions(+), 45 deletions(-)
>  create mode 100644 include/linux/execmem.h
>  create mode 100644 mm/execmem.c
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index bbca90a5e2ec..9fcd01bb2ce6 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -19,8 +19,8 @@
>  #include <linux/extable.h>
>  #include <linux/kdebug.h>
>  #include <linux/slab.h>
> -#include <linux/moduleloader.h>
>  #include <linux/set_memory.h>
> +#include <linux/execmem.h>
>  #include <asm/code-patching.h>
>  #include <asm/cacheflush.h>
>  #include <asm/sstep.h>
> @@ -130,7 +130,7 @@ void *alloc_insn_page(void)
>  {
>  	void *page;
>  
> -	page = module_alloc(PAGE_SIZE);
> +	page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
>  	if (!page)
>  		return NULL;
>  
> @@ -142,7 +142,7 @@ void *alloc_insn_page(void)
>  	}
>  	return page;
>  error:
> -	module_memfree(page);
> +	execmem_free(page);
>  	return NULL;
>  }
>  
> diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
> index c46381ea04ec..798249ef5646 100644
> --- a/arch/s390/kernel/ftrace.c
> +++ b/arch/s390/kernel/ftrace.c
> @@ -7,13 +7,13 @@
>   *   Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
>   */
>  
> -#include <linux/moduleloader.h>
>  #include <linux/hardirq.h>
>  #include <linux/uaccess.h>
>  #include <linux/ftrace.h>
>  #include <linux/kernel.h>
>  #include <linux/types.h>
>  #include <linux/kprobes.h>
> +#include <linux/execmem.h>
>  #include <trace/syscall.h>
>  #include <asm/asm-offsets.h>
>  #include <asm/text-patching.h>
> @@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
>  {
>  	const char *start, *end;
>  
> -	ftrace_plt = module_alloc(PAGE_SIZE);
> +	ftrace_plt = execmem_alloc(EXECMEM_FTRACE, PAGE_SIZE);
>  	if (!ftrace_plt)
>  		panic("cannot allocate ftrace plt\n");
>  
> diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
> index f0cf20d4b3c5..3c1b1be744de 100644
> --- a/arch/s390/kernel/kprobes.c
> +++ b/arch/s390/kernel/kprobes.c
> @@ -9,7 +9,6 @@
>  
>  #define pr_fmt(fmt) "kprobes: " fmt
>  
> -#include <linux/moduleloader.h>
>  #include <linux/kprobes.h>
>  #include <linux/ptrace.h>
>  #include <linux/preempt.h>
> @@ -21,6 +20,7 @@
>  #include <linux/slab.h>
>  #include <linux/hardirq.h>
>  #include <linux/ftrace.h>
> +#include <linux/execmem.h>
>  #include <asm/set_memory.h>
>  #include <asm/sections.h>
>  #include <asm/dis.h>
> @@ -38,7 +38,7 @@ void *alloc_insn_page(void)
>  {
>  	void *page;
>  
> -	page = module_alloc(PAGE_SIZE);
> +	page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
>  	if (!page)
>  		return NULL;
>  	set_memory_rox((unsigned long)page, 1);
> diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
> index 42215f9404af..ac97a905e8cd 100644
> --- a/arch/s390/kernel/module.c
> +++ b/arch/s390/kernel/module.c
> @@ -21,6 +21,7 @@
>  #include <linux/moduleloader.h>
>  #include <linux/bug.h>
>  #include <linux/memory.h>
> +#include <linux/execmem.h>
>  #include <asm/alternative.h>
>  #include <asm/nospec-branch.h>
>  #include <asm/facility.h>
> @@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
>  #ifdef CONFIG_FUNCTION_TRACER
>  void module_arch_cleanup(struct module *mod)
>  {
> -	module_memfree(mod->arch.trampolines_start);
> +	execmem_free(mod->arch.trampolines_start);
>  }
>  #endif
>  
> @@ -510,7 +511,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct module *me,
>  
>  	size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
>  	numpages = DIV_ROUND_UP(size, PAGE_SIZE);
> -	start = module_alloc(numpages * PAGE_SIZE);
> +	start = execmem_alloc(EXECMEM_FTRACE, numpages * PAGE_SIZE);
>  	if (!start)
>  		return -ENOMEM;
>  	set_memory_rox((unsigned long)start, numpages);
> diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
> index da2df1e84ed4..bda2dbd3f4c5 100644
> --- a/arch/sparc/net/bpf_jit_comp_32.c
> +++ b/arch/sparc/net/bpf_jit_comp_32.c
> @@ -1,10 +1,10 @@
>  // SPDX-License-Identifier: GPL-2.0
> -#include <linux/moduleloader.h>
>  #include <linux/workqueue.h>
>  #include <linux/netdevice.h>
>  #include <linux/filter.h>
>  #include <linux/cache.h>
>  #include <linux/if_vlan.h>
> +#include <linux/execmem.h>
>  
>  #include <asm/cacheflush.h>
>  #include <asm/ptrace.h>
> @@ -713,7 +713,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
>  				if (unlikely(proglen + ilen > oldproglen)) {
>  					pr_err("bpb_jit_compile fatal error\n");
>  					kfree(addrs);
> -					module_memfree(image);
> +					execmem_free(image);
>  					return;
>  				}
>  				memcpy(image + proglen, temp, ilen);
> @@ -736,7 +736,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
>  			break;
>  		}
>  		if (proglen == oldproglen) {
> -			image = module_alloc(proglen);
> +			image = execmem_alloc(EXECMEM_BPF, proglen);
>  			if (!image)
>  				goto out;
>  		}
> @@ -758,7 +758,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>  	if (fp->jited)
> -		module_memfree(fp->bpf_func);
> +		execmem_free(fp->bpf_func);
>  
>  	bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 70139d9d2e01..c8ddb7abda7c 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -25,6 +25,7 @@
>  #include <linux/memory.h>
>  #include <linux/vmalloc.h>
>  #include <linux/set_memory.h>
> +#include <linux/execmem.h>
>  
>  #include <trace/syscall.h>
>  
> @@ -261,15 +262,14 @@ void arch_ftrace_update_code(int command)
>  #ifdef CONFIG_X86_64
>  
>  #ifdef CONFIG_MODULES
> -#include <linux/moduleloader.h>
>  /* Module allocation simplifies allocating memory for code */
>  static inline void *alloc_tramp(unsigned long size)
>  {
> -	return module_alloc(size);
> +	return execmem_alloc(EXECMEM_FTRACE, size);
>  }
>  static inline void tramp_free(void *tramp)
>  {
> -	module_memfree(tramp);
> +	execmem_free(tramp);
>  }
>  #else
>  /* Trampolines can only be created if modules are supported */
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index d0e49bd7c6f3..72e6a45e7ec2 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -40,12 +40,12 @@
>  #include <linux/kgdb.h>
>  #include <linux/ftrace.h>
>  #include <linux/kasan.h>
> -#include <linux/moduleloader.h>
>  #include <linux/objtool.h>
>  #include <linux/vmalloc.h>
>  #include <linux/pgtable.h>
>  #include <linux/set_memory.h>
>  #include <linux/cfi.h>
> +#include <linux/execmem.h>
>  
>  #include <asm/text-patching.h>
>  #include <asm/cacheflush.h>
> @@ -495,7 +495,7 @@ void *alloc_insn_page(void)
>  {
>  	void *page;
>  
> -	page = module_alloc(PAGE_SIZE);
> +	page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
>  	if (!page)
>  		return NULL;
>  
> diff --git a/include/linux/execmem.h b/include/linux/execmem.h
> new file mode 100644
> index 000000000000..43e7995593a1
> --- /dev/null
> +++ b/include/linux/execmem.h
> @@ -0,0 +1,57 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_EXECMEM_ALLOC_H
> +#define _LINUX_EXECMEM_ALLOC_H
> +
> +#include <linux/types.h>
> +#include <linux/moduleloader.h>
> +
> +/**
> + * enum execmem_type - types of executable memory ranges
> + *
> + * There are several subsystems that allocate executable memory.
> + * Architectures define different restrictions on placement,
> + * permissions, alignment and other parameters for memory that can be used
> + * by these subsystems.
> + * Types in this enum identify subsystems that allocate executable memory
> + * and let architectures define parameters for ranges suitable for
> + * allocations by each subsystem.
> + *
> + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> + * are not explcitly defined.
> + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> + * @EXECMEM_KPROBES: parameters for kprobes
> + * @EXECMEM_FTRACE: parameters for ftrace
> + * @EXECMEM_BPF: parameters for BPF
> + * @EXECMEM_TYPE_MAX:
> + */
> +enum execmem_type {
> +	EXECMEM_DEFAULT,
> +	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> +	EXECMEM_KPROBES,
> +	EXECMEM_FTRACE,
> +	EXECMEM_BPF,
> +	EXECMEM_TYPE_MAX,
> +};
> +
> +/**
> + * execmem_alloc - allocate executable memory
> + * @type: type of the allocation
> + * @size: how many bytes of memory are required
> + *
> + * Allocates memory that will contain executable code, either generated or
> + * loaded from kernel modules.
> + *
> + * The memory will have protections defined by architecture for executable
> + * region of the @type.
> + *
> + * Return: a pointer to the allocated memory or %NULL
> + */
> +void *execmem_alloc(enum execmem_type type, size_t size);
> +
> +/**
> + * execmem_free - free executable memory
> + * @ptr: pointer to the memory that should be freed
> + */
> +void execmem_free(void *ptr);
> +
> +#endif /* _LINUX_EXECMEM_ALLOC_H */
> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> index 89b1e0ed9811..a3b8caee9405 100644
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -29,9 +29,6 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
>     sections.  Returns NULL on failure. */
>  void *module_alloc(unsigned long size);
>  
> -/* Free memory returned from module_alloc. */
> -void module_memfree(void *module_region);
> -
>  /* Determines if the section name is an init section (that is only used during
>   * module loading).
>   */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 696bc55de8e8..75a54024e2f4 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -22,7 +22,6 @@
>  #include <linux/skbuff.h>
>  #include <linux/vmalloc.h>
>  #include <linux/random.h>
> -#include <linux/moduleloader.h>
>  #include <linux/bpf.h>
>  #include <linux/btf.h>
>  #include <linux/objtool.h>
> @@ -37,6 +36,7 @@
>  #include <linux/nospec.h>
>  #include <linux/bpf_mem_alloc.h>
>  #include <linux/memcontrol.h>
> +#include <linux/execmem.h>
>  
>  #include <asm/barrier.h>
>  #include <asm/unaligned.h>
> @@ -1050,12 +1050,12 @@ void bpf_jit_uncharge_modmem(u32 size)
>  
>  void *__weak bpf_jit_alloc_exec(unsigned long size)
>  {
> -	return module_alloc(size);
> +	return execmem_alloc(EXECMEM_BPF, size);
>  }
>  
>  void __weak bpf_jit_free_exec(void *addr)
>  {
> -	module_memfree(addr);
> +	execmem_free(addr);
>  }
>  
>  struct bpf_binary_header *
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 9d9095e81792..047ca629ce49 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -26,7 +26,6 @@
>  #include <linux/slab.h>
>  #include <linux/stddef.h>
>  #include <linux/export.h>
> -#include <linux/moduleloader.h>
>  #include <linux/kallsyms.h>
>  #include <linux/freezer.h>
>  #include <linux/seq_file.h>
> @@ -39,6 +38,7 @@
>  #include <linux/jump_label.h>
>  #include <linux/static_call.h>
>  #include <linux/perf_event.h>
> +#include <linux/execmem.h>
>  
>  #include <asm/sections.h>
>  #include <asm/cacheflush.h>
> @@ -113,17 +113,17 @@ enum kprobe_slot_state {
>  void __weak *alloc_insn_page(void)
>  {
>  	/*
> -	 * Use module_alloc() so this page is within +/- 2GB of where the
> +	 * Use execmem_alloc() so this page is within +/- 2GB of where the
>  	 * kernel image and loaded module images reside. This is required
>  	 * for most of the architectures.
>  	 * (e.g. x86-64 needs this to handle the %rip-relative fixups.)
>  	 */
> -	return module_alloc(PAGE_SIZE);
> +	return execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
>  }
>  
>  static void free_insn_page(void *page)
>  {
> -	module_memfree(page);
> +	execmem_free(page);
>  }
>  
>  struct kprobe_insn_cache kprobe_insn_slots = {
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index f3e0329337f6..744383c1eed1 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -2,6 +2,7 @@
>  menuconfig MODULES
>  	bool "Enable loadable module support"
>  	modules
> +	select EXECMEM
>  	help
>  	  Kernel modules are small pieces of compiled code which can
>  	  be inserted in the running kernel, rather than being
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 5b82b069e0d3..d56b7df0cbb6 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -57,6 +57,7 @@
>  #include <linux/audit.h>
>  #include <linux/cfi.h>
>  #include <linux/debugfs.h>
> +#include <linux/execmem.h>
>  #include <uapi/linux/module.h>
>  #include "internal.h"
>  
> @@ -1179,16 +1180,6 @@ resolve_symbol_wait(struct module *mod,
>  	return ksym;
>  }
>  
> -void __weak module_memfree(void *module_region)
> -{
> -	/*
> -	 * This memory may be RO, and freeing RO memory in an interrupt is not
> -	 * supported by vmalloc.
> -	 */
> -	WARN_ON(in_interrupt());
> -	vfree(module_region);
> -}
> -
>  void __weak module_arch_cleanup(struct module *mod)
>  {
>  }
> @@ -1213,7 +1204,7 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
>  	if (mod_mem_use_vmalloc(type))
>  		ptr = vmalloc(size);
>  	else
> -		ptr = module_alloc(size);
> +		ptr = execmem_alloc(EXECMEM_MODULE_TEXT, size);
>  
>  	if (!ptr)
>  		return -ENOMEM;
> @@ -1244,7 +1235,7 @@ static void module_memory_free(struct module *mod, enum mod_mem_type type)
>  	if (mod_mem_use_vmalloc(type))
>  		vfree(ptr);
>  	else
> -		module_memfree(ptr);
> +		execmem_free(ptr);
>  }
>  
>  static void free_mod_mem(struct module *mod)
> @@ -2496,9 +2487,9 @@ static void do_free_init(struct work_struct *w)
>  
>  	llist_for_each_safe(pos, n, list) {
>  		initfree = container_of(pos, struct mod_initfree, node);
> -		module_memfree(initfree->init_text);
> -		module_memfree(initfree->init_data);
> -		module_memfree(initfree->init_rodata);
> +		execmem_free(initfree->init_text);
> +		execmem_free(initfree->init_data);
> +		execmem_free(initfree->init_rodata);
>  		kfree(initfree);
>  	}
>  }
> @@ -2608,10 +2599,10 @@ static noinline int do_init_module(struct module *mod)
>  	 * We want to free module_init, but be aware that kallsyms may be
>  	 * walking this with preempt disabled.  In all the failure paths, we
>  	 * call synchronize_rcu(), but we don't want to slow down the success
> -	 * path. module_memfree() cannot be called in an interrupt, so do the
> +	 * path. execmem_free() cannot be called in an interrupt, so do the
>  	 * work and call synchronize_rcu() in a work queue.
>  	 *
> -	 * Note that module_alloc() on most architectures creates W+X page
> +	 * Note that execmem_alloc() on most architectures creates W+X page
>  	 * mappings which won't be cleaned up until do_free_init() runs.  Any
>  	 * code such as mark_rodata_ro() which depends on those mappings to
>  	 * be cleaned up needs to sync with the queued work by invoking
> diff --git a/mm/Kconfig b/mm/Kconfig
> index b1448aa81e15..f08a216d4793 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1241,6 +1241,9 @@ config LOCK_MM_AND_FIND_VMA
>  config IOMMU_MM_DATA
>  	bool
>  
> +config EXECMEM
> +	bool
> +
>  source "mm/damon/Kconfig"
>  
>  endmenu
> diff --git a/mm/Makefile b/mm/Makefile
> index 4abb40b911ec..001336c91864 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -133,3 +133,4 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o
>  obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
>  obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
>  obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
> +obj-$(CONFIG_EXECMEM) += execmem.o
> diff --git a/mm/execmem.c b/mm/execmem.c
> new file mode 100644
> index 000000000000..ed2ea41a2543
> --- /dev/null
> +++ b/mm/execmem.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/execmem.h>
> +#include <linux/moduleloader.h>
> +
> +static void *__execmem_alloc(size_t size)
> +{
> +	return module_alloc(size);
> +}
> +
> +void *execmem_alloc(enum execmem_type type, size_t size)
> +{
> +	return __execmem_alloc(size);
> +}
> +
> +void execmem_free(void *ptr)
> +{
> +	/*
> +	 * This memory may be RO, and freeing RO memory in an interrupt is not
> +	 * supported by vmalloc.
> +	 */
> +	WARN_ON(in_interrupt());
> +	vfree(ptr);
> +}
> -- 
> 2.43.0
> 
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
       [not found] ` <20240411160051.2093261-15-rppt@kernel.org>
@ 2024-04-17 21:16   ` Masami Hiramatsu
  2024-04-18 15:37     ` Mike Rapoport
  2024-04-19 15:49     ` Mike Rapoport
  0 siblings, 2 replies; 22+ messages in thread
From: Masami Hiramatsu @ 2024-04-17 21:16 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

Hi Mike,

On Thu, 11 Apr 2024 19:00:50 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> kprobes depended on CONFIG_MODULES because it has to allocate memory for
> code.
> 
> Since code allocations are now implemented with execmem, kprobes can be
> enabled in non-modular kernels.
> 
> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> dependency of CONFIG_KPROBES on CONFIG_MODULES.

Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
function body? We have enough dummy functions for that, so it should
not make a problem.

Thank you,

> 
> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
>  arch/Kconfig                |  2 +-
>  kernel/kprobes.c            | 43 +++++++++++++++++++++----------------
>  kernel/trace/trace_kprobe.c | 11 ++++++++++
>  3 files changed, 37 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index bc9e8e5dccd5..68177adf61a0 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -52,9 +52,9 @@ config GENERIC_ENTRY
>  
>  config KPROBES
>  	bool "Kprobes"
> -	depends on MODULES
>  	depends on HAVE_KPROBES
>  	select KALLSYMS
> +	select EXECMEM
>  	select TASKS_RCU if PREEMPTION
>  	help
>  	  Kprobes allows you to trap at almost any kernel address and
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 047ca629ce49..90c056853e6f 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1580,6 +1580,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
>  		goto out;
>  	}
>  
> +#ifdef CONFIG_MODULES
>  	/* Check if 'p' is probing a module. */
>  	*probed_mod = __module_text_address((unsigned long) p->addr);
>  	if (*probed_mod) {
> @@ -1603,6 +1604,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
>  			ret = -ENOENT;
>  		}
>  	}
> +#endif
> +
>  out:
>  	preempt_enable();
>  	jump_label_unlock();
> @@ -2482,24 +2485,6 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
>  	return 0;
>  }
>  
> -/* Remove all symbols in given area from kprobe blacklist */
> -static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> -{
> -	struct kprobe_blacklist_entry *ent, *n;
> -
> -	list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
> -		if (ent->start_addr < start || ent->start_addr >= end)
> -			continue;
> -		list_del(&ent->list);
> -		kfree(ent);
> -	}
> -}
> -
> -static void kprobe_remove_ksym_blacklist(unsigned long entry)
> -{
> -	kprobe_remove_area_blacklist(entry, entry + 1);
> -}
> -
>  int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
>  				   char *type, char *sym)
>  {
> @@ -2564,6 +2549,25 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
>  	return ret ? : arch_populate_kprobe_blacklist();
>  }
>  
> +#ifdef CONFIG_MODULES
> +/* Remove all symbols in given area from kprobe blacklist */
> +static void kprobe_remove_area_blacklist(unsigned long start, unsigned long end)
> +{
> +	struct kprobe_blacklist_entry *ent, *n;
> +
> +	list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
> +		if (ent->start_addr < start || ent->start_addr >= end)
> +			continue;
> +		list_del(&ent->list);
> +		kfree(ent);
> +	}
> +}
> +
> +static void kprobe_remove_ksym_blacklist(unsigned long entry)
> +{
> +	kprobe_remove_area_blacklist(entry, entry + 1);
> +}
> +
>  static void add_module_kprobe_blacklist(struct module *mod)
>  {
>  	unsigned long start, end;
> @@ -2665,6 +2669,7 @@ static struct notifier_block kprobe_module_nb = {
>  	.notifier_call = kprobes_module_callback,
>  	.priority = 0
>  };
> +#endif
>  
>  void kprobe_free_init_mem(void)
>  {
> @@ -2724,8 +2729,10 @@ static int __init init_kprobes(void)
>  	err = arch_init_kprobes();
>  	if (!err)
>  		err = register_die_notifier(&kprobe_exceptions_nb);
> +#ifdef CONFIG_MODULES
>  	if (!err)
>  		err = register_module_notifier(&kprobe_module_nb);
> +#endif
>  
>  	kprobes_initialized = (err == 0);
>  	kprobe_sysctls_init();
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 14099cc17fc9..f0610137d6a3 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -111,6 +111,7 @@ static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk,
>  	return strncmp(module_name(mod), name, len) == 0 && name[len] == ':';
>  }
>  
> +#ifdef CONFIG_MODULES
>  static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
>  {
>  	char *p;
> @@ -129,6 +130,12 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
>  
>  	return ret;
>  }
> +#else
> +static inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
> +{
> +	return false;
> +}
> +#endif
>  
>  static bool trace_kprobe_is_busy(struct dyn_event *ev)
>  {
> @@ -670,6 +677,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
>  	return ret;
>  }
>  
> +#ifdef CONFIG_MODULES
>  /* Module notifier call back, checking event on the module */
>  static int trace_kprobe_module_callback(struct notifier_block *nb,
>  				       unsigned long val, void *data)
> @@ -704,6 +712,7 @@ static struct notifier_block trace_kprobe_module_nb = {
>  	.notifier_call = trace_kprobe_module_callback,
>  	.priority = 1	/* Invoked after kprobe module callback */
>  };
> +#endif
>  
>  static int count_symbols(void *data, unsigned long unused)
>  {
> @@ -1933,8 +1942,10 @@ static __init int init_kprobe_trace_early(void)
>  	if (ret)
>  		return ret;
>  
> +#ifdef CONFIG_MODULES
>  	if (register_module_notifier(&trace_kprobe_module_nb))
>  		return -EINVAL;
> +#endif
>  
>  	return 0;
>  }
> -- 
> 2.43.0
> 
> 


-- 
Masami Hiramatsu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
  2024-04-17 21:16   ` [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES Masami Hiramatsu
@ 2024-04-18 15:37     ` Mike Rapoport
  2024-04-19 15:49     ` Mike Rapoport
  1 sibling, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2024-04-18 15:37 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

Hi Masami,

On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> Hi Mike,
> 
> On Thu, 11 Apr 2024 19:00:50 +0300
> Mike Rapoport <rppt@kernel.org> wrote:
> 
> > From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > 
> > kprobes depended on CONFIG_MODULES because it has to allocate memory for
> > code.
> > 
> > Since code allocations are now implemented with execmem, kprobes can be
> > enabled in non-modular kernels.
> > 
> > Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> > modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> > dependency of CONFIG_KPROBES on CONFIG_MODULES.
> 
> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> function body? We have enough dummy functions for that, so it should
> not make a problem.

I'll rebase and will try to reduce ifdefery where possible.

> Thank you,
> 
> > 
> > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> > ---
> >  arch/Kconfig                |  2 +-
> >  kernel/kprobes.c            | 43 +++++++++++++++++++++----------------
> >  kernel/trace/trace_kprobe.c | 11 ++++++++++
> >  3 files changed, 37 insertions(+), 19 deletions(-)
> > 
> 
> -- 
> Masami Hiramatsu

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
  2024-04-17 21:16   ` [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES Masami Hiramatsu
  2024-04-18 15:37     ` Mike Rapoport
@ 2024-04-19 15:49     ` Mike Rapoport
  2024-04-19 15:59       ` Christophe Leroy
  1 sibling, 1 reply; 22+ messages in thread
From: Mike Rapoport @ 2024-04-19 15:49 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner, bpf,
	linux-arm-kernel, Thomas Bogendoerfer, linux-parisc,
	Puranjay Mohan, linux-mm, netdev, Kent Overstreet, linux-kernel,
	Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

Hi Masami,

On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> Hi Mike,
> 
> On Thu, 11 Apr 2024 19:00:50 +0300
> Mike Rapoport <rppt@kernel.org> wrote:
> 
> > From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > 
> > kprobes depended on CONFIG_MODULES because it has to allocate memory for
> > code.
> > 
> > Since code allocations are now implemented with execmem, kprobes can be
> > enabled in non-modular kernels.
> > 
> > Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> > modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> > dependency of CONFIG_KPROBES on CONFIG_MODULES.
> 
> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> function body? We have enough dummy functions for that, so it should
> not make a problem.

The code in check_kprobe_address_safe() that gets the module and checks for
__init functions does not compile with IS_ENABLED(CONFIG_MODULES). 
I can pull it out to a helper or leave #ifdef in the function body,
whichever you prefer.
 
> -- 
> Masami Hiramatsu

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
  2024-04-19 15:49     ` Mike Rapoport
@ 2024-04-19 15:59       ` Christophe Leroy
  2024-04-20  7:33         ` Mike Rapoport
  0 siblings, 1 reply; 22+ messages in thread
From: Christophe Leroy @ 2024-04-19 15:59 UTC (permalink / raw)
  To: Mike Rapoport, Masami Hiramatsu
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner,
	bpf@vger.kerne l.org, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, Puranjay Mohan, linux-mm, netdev, Kent Overstreet,
	linux-kernel, Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules



Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> Hi Masami,
> 
> On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
>> Hi Mike,
>>
>> On Thu, 11 Apr 2024 19:00:50 +0300
>> Mike Rapoport <rppt@kernel.org> wrote:
>>
>>> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
>>>
>>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
>>> code.
>>>
>>> Since code allocations are now implemented with execmem, kprobes can be
>>> enabled in non-modular kernels.
>>>
>>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
>>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
>>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
>>
>> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
>> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
>> function body? We have enough dummy functions for that, so it should
>> not make a problem.
> 
> The code in check_kprobe_address_safe() that gets the module and checks for
> __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> I can pull it out to a helper or leave #ifdef in the function body,
> whichever you prefer.

As far as I can see, the only problem is MODULE_STATE_COMING.
Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  ?


>   
>> -- 
>> Masami Hiramatsu
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
  2024-04-19 15:59       ` Christophe Leroy
@ 2024-04-20  7:33         ` Mike Rapoport
  2024-04-20  9:15           ` Masami Hiramatsu
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Rapoport @ 2024-04-20  7:33 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner,
	Masami Hiramats u, bpf, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, Puranjay Mohan, linux-mm, netdev, Kent Overstreet,
	linux-kernel, Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Fri, Apr 19, 2024 at 03:59:40PM +0000, Christophe Leroy wrote:
> 
> 
> Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> > Hi Masami,
> > 
> > On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> >> Hi Mike,
> >>
> >> On Thu, 11 Apr 2024 19:00:50 +0300
> >> Mike Rapoport <rppt@kernel.org> wrote:
> >>
> >>> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> >>>
> >>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
> >>> code.
> >>>
> >>> Since code allocations are now implemented with execmem, kprobes can be
> >>> enabled in non-modular kernels.
> >>>
> >>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> >>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> >>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
> >>
> >> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> >> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> >> function body? We have enough dummy functions for that, so it should
> >> not make a problem.
> > 
> > The code in check_kprobe_address_safe() that gets the module and checks for
> > __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> > I can pull it out to a helper or leave #ifdef in the function body,
> > whichever you prefer.
> 
> As far as I can see, the only problem is MODULE_STATE_COMING.
> Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  ?

There's dereference of 'struct module' there:
 
		(*probed_mod)->state != MODULE_STATE_COMING) {
			...
		}

so moving out 'enum module_state' won't be enough.
 
> >   
> >> -- 
> >> Masami Hiramatsu
> > 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
  2024-04-20  7:33         ` Mike Rapoport
@ 2024-04-20  9:15           ` Masami Hiramatsu
  2024-04-20 10:52             ` Mike Rapoport
  0 siblings, 1 reply; 22+ messages in thread
From: Masami Hiramatsu @ 2024-04-20  9:15 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner,
	bpf@vger.kerne l.org, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, Puranjay Mohan, linux-mm, netdev, Kent Overstreet,
	linux-kernel, Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Sat, 20 Apr 2024 10:33:38 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> On Fri, Apr 19, 2024 at 03:59:40PM +0000, Christophe Leroy wrote:
> > 
> > 
> > Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> > > Hi Masami,
> > > 
> > > On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> > >> Hi Mike,
> > >>
> > >> On Thu, 11 Apr 2024 19:00:50 +0300
> > >> Mike Rapoport <rppt@kernel.org> wrote:
> > >>
> > >>> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > >>>
> > >>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
> > >>> code.
> > >>>
> > >>> Since code allocations are now implemented with execmem, kprobes can be
> > >>> enabled in non-modular kernels.
> > >>>
> > >>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> > >>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> > >>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
> > >>
> > >> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> > >> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> > >> function body? We have enough dummy functions for that, so it should
> > >> not make a problem.
> > > 
> > > The code in check_kprobe_address_safe() that gets the module and checks for
> > > __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> > > I can pull it out to a helper or leave #ifdef in the function body,
> > > whichever you prefer.
> > 
> > As far as I can see, the only problem is MODULE_STATE_COMING.
> > Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  ?
> 
> There's dereference of 'struct module' there:
>  
> 		(*probed_mod)->state != MODULE_STATE_COMING) {
> 			...
> 		}
> 
> so moving out 'enum module_state' won't be enough.

Hmm, this part should be inline functions like;

#ifdef CONFIG_MODULES
static inline bool module_is_coming(struct module *mod)
{
	return mod->state == MODULE_STATE_COMING;
}
#else
#define module_is_coming(mod) (false)
#endif

Then we don't need the enum.
Thank you,

>  
> > >   
> > >> -- 
> > >> Masami Hiramatsu
> > > 
> 
> -- 
> Sincerely yours,
> Mike.
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES
  2024-04-20  9:15           ` Masami Hiramatsu
@ 2024-04-20 10:52             ` Mike Rapoport
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Rapoport @ 2024-04-20 10:52 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mark Rutland, x86, Catalin Marinas, linux-mips, Song Liu,
	Donald Dutile, Luis Chamberlain, sparclinux, linux-riscv,
	Nadav Amit, linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Russell King, linux-trace-kernel, Alexandre Ghiti, Will Deacon,
	Heiko Carstens, Steven Rostedt, loongarch, Thomas Gleixner,
	bpf@vger.kerne l.org, linux-arm-kernel, Thomas Bogendoerfer,
	linux-parisc, Puranjay Mohan, linux-mm, netdev, Kent Overstreet,
	linux-kernel, Dinh Nguyen, Björn Töpel, Eric Chanudet,
	Palmer Dabbelt, Andrew Morton, Rick Edgecombe, linuxppc-dev,
	David S. Miller, linux-modules

On Sat, Apr 20, 2024 at 06:15:00PM +0900, Masami Hiramatsu wrote:
> On Sat, 20 Apr 2024 10:33:38 +0300
> Mike Rapoport <rppt@kernel.org> wrote:
> 
> > On Fri, Apr 19, 2024 at 03:59:40PM +0000, Christophe Leroy wrote:
> > > 
> > > 
> > > Le 19/04/2024 à 17:49, Mike Rapoport a écrit :
> > > > Hi Masami,
> > > > 
> > > > On Thu, Apr 18, 2024 at 06:16:15AM +0900, Masami Hiramatsu wrote:
> > > >> Hi Mike,
> > > >>
> > > >> On Thu, 11 Apr 2024 19:00:50 +0300
> > > >> Mike Rapoport <rppt@kernel.org> wrote:
> > > >>
> > > >>> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > > >>>
> > > >>> kprobes depended on CONFIG_MODULES because it has to allocate memory for
> > > >>> code.
> > > >>>
> > > >>> Since code allocations are now implemented with execmem, kprobes can be
> > > >>> enabled in non-modular kernels.
> > > >>>
> > > >>> Add #ifdef CONFIG_MODULE guards for the code dealing with kprobes inside
> > > >>> modules, make CONFIG_KPROBES select CONFIG_EXECMEM and drop the
> > > >>> dependency of CONFIG_KPROBES on CONFIG_MODULES.
> > > >>
> > > >> Thanks for this work, but this conflicts with the latest fix in v6.9-rc4.
> > > >> Also, can you use IS_ENABLED(CONFIG_MODULES) instead of #ifdefs in
> > > >> function body? We have enough dummy functions for that, so it should
> > > >> not make a problem.
> > > > 
> > > > The code in check_kprobe_address_safe() that gets the module and checks for
> > > > __init functions does not compile with IS_ENABLED(CONFIG_MODULES).
> > > > I can pull it out to a helper or leave #ifdef in the function body,
> > > > whichever you prefer.
> > > 
> > > As far as I can see, the only problem is MODULE_STATE_COMING.
> > > Can we move 'enum module_state' out of #ifdef CONFIG_MODULES in module.h  ?
> > 
> > There's dereference of 'struct module' there:
> >  
> > 		(*probed_mod)->state != MODULE_STATE_COMING) {
> > 			...
> > 		}
> > 
> > so moving out 'enum module_state' won't be enough.
> 
> Hmm, this part should be inline functions like;
> 
> #ifdef CONFIG_MODULES
> static inline bool module_is_coming(struct module *mod)
> {
> 	return mod->state == MODULE_STATE_COMING;
> }
> #else
> #define module_is_coming(mod) (false)

I'd prefer

static inline module_is_coming(struct module *mod)
{
	return false;
}

> #endif
>
> Then we don't need the enum.
> Thank you,
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()
       [not found]                   ` <ZiNDGjkcqEPqruza@kernel.org>
@ 2024-04-20  9:11                     ` Masami Hiramatsu
  0 siblings, 0 replies; 22+ messages in thread
From: Masami Hiramatsu @ 2024-04-20  9:11 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, x86, Peter Zijlstra, Catalin Marinas, Russell King,
	Song Liu, Donald Dutile, sparclinux, linux-riscv, Nadav Amit,
	linux-arch, linux-s390, Helge Deller, Huacai Chen,
	Luis Chamberlain, linux-mips, linux-trace-kernel,
	Alexandre Ghiti, Will Deacon, Heiko Carstens, Steven Rostedt,
	loongarch, Thomas Gleixner, bpf, linux-arm-kernel,
	Thomas Bogendoerfer, linux-parisc, Puranjay Mohan, linux-mm,
	netdev, Ken t Overstreet, linux-kernel, Dinh Nguyen, Bjorn Topel,
	Eric Chanudet, Palmer Dabbelt, Andrew Morton, Rick Edgecombe,
	linuxppc-dev, David S. Miller, linux-modules

On Sat, 20 Apr 2024 07:22:50 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> On Fri, Apr 19, 2024 at 02:42:16PM -0700, Song Liu wrote:
> > On Fri, Apr 19, 2024 at 1:00 PM Mike Rapoport <rppt@kernel.org> wrote:
> > >
> > > On Fri, Apr 19, 2024 at 10:32:39AM -0700, Song Liu wrote:
> > > > On Fri, Apr 19, 2024 at 10:03 AM Mike Rapoport <rppt@kernel.org> wrote:
> > > > [...]
> > > > > > >
> > > > > > > [1] https://lore.kernel.org/all/20240411160526.2093408-1-rppt@kernel.org
> > > > > >
> > > > > > For the ROX to work, we need different users (module text, kprobe, etc.) to have
> > > > > > the same execmem_range. From [1]:
> > > > > >
> > > > > > static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
> > > > > > {
> > > > > > ...
> > > > > >        p = __execmem_cache_alloc(size);
> > > > > >        if (p)
> > > > > >                return p;
> > > > > >       err = execmem_cache_populate(range, size);
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > We are calling __execmem_cache_alloc() without range. For this to work,
> > > > > > we can only call execmem_cache_alloc() with one execmem_range.
> > > > >
> > > > > Actually, on x86 this will "just work" because everything shares the same
> > > > > address space :)
> > > > >
> > > > > The 2M pages in the cache will be in the modules space, so
> > > > > __execmem_cache_alloc() will always return memory from that address space.
> > > > >
> > > > > For other architectures this indeed needs to be fixed with passing the
> > > > > range to __execmem_cache_alloc() and limiting search in the cache for that
> > > > > range.
> > > >
> > > > I think we at least need the "map to" concept (initially proposed by Thomas)
> > > > to get this work. For example, EXECMEM_BPF and EXECMEM_KPROBE
> > > > maps to EXECMEM_MODULE_TEXT, so that all these actually share
> > > > the same range.
> > >
> > > Why?
> > 
> > IIUC, we need to update __execmem_cache_alloc() to take a range pointer as
> > input. module text will use "range" for EXECMEM_MODULE_TEXT, while kprobe
> > will use "range" for EXECMEM_KPROBE. Without "map to" concept or sharing
> > the "range" object, we will have to compare different range parameters to check
> > we can share cached pages between module text and kprobe, which is not
> > efficient. Did I miss something?

Song, thanks for trying to eplain. I think I need to explain why I used
module_alloc() originally.

This depends on how kprobe features are implemented on the architecture, and
how much features are supported on kprobes.

Because kprobe jump optimization and kprobe jump-back optimization need to
use a jump instruction to jump into the trampoline and jump back from the
trampoline directly, if the architecuture jmp instruction supports +-2GB range
like x86, it needs to allocate the trampoline buffer inside such address space.
This requirement is similar to the modules (because module function needs to
call other functions in the kernel etc.), at least kprobes on x86 used
module_alloc().

However, if an architecture only supports breakpoint/trap based kprobe,
it does not need to consider whether the execmem is allocated.

> 
> We can always share large ROX pages as long as they are within the correct
> address space. The permissions for them are ROX and the alignment
> differences are due to KASAN and this is handled during allocation of the
> large page to refill the cache. __execmem_cache_alloc() only needs to limit
> the search for the address space of the range.

So I don't think EXECMEM_KPROBE always same as EXECMEM_MODULE_TEXT, it
should be configured for each arch. Especially, if it is only used for
searching parameter, it looks OK to me.

Thank you,

> 
> And regardless, they way we deal with sharing of the cache can be sorted
> out later.
> 
> > Thanks,
> > Song
> 
> -- 
> Sincerely yours,
> Mike.
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-04-20 10:54 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20240411160051.2093261-1-rppt@kernel.org>
2024-04-11 18:00 ` [PATCH v4 00/15] mm: jit/text allocator Kent Overstreet
2024-04-11 19:45 ` Luis Chamberlain
     [not found] ` <20240411160051.2093261-7-rppt@kernel.org>
2024-04-11 20:53   ` [PATCH v4 06/15] mm/execmem, arch: convert simple overrides of module_alloc to execmem Sam Ravnborg
2024-04-14  7:26     ` Mike Rapoport
2024-04-15  8:03   ` Peter Zijlstra
     [not found] ` <20240411160051.2093261-8-rppt@kernel.org>
2024-04-15  9:36   ` [PATCH v4 07/15] mm/execmem, arch: convert remaining " Peter Zijlstra
     [not found] ` <20240411160051.2093261-6-rppt@kernel.org>
2024-04-11 19:42   ` [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free() Luis Chamberlain
2024-04-14  6:53     ` Mike Rapoport
2024-04-12  9:16   ` Ingo Molnar
2024-04-14  6:54     ` Mike Rapoport
2024-04-15  7:52   ` Peter Zijlstra
2024-04-15 16:51     ` Mike Rapoport
2024-04-15 17:36     ` Mark Rutland
2024-04-17 21:06   ` Masami Hiramatsu
     [not found] ` <20240411160051.2093261-15-rppt@kernel.org>
2024-04-17 21:16   ` [PATCH v4 14/15] kprobes: remove dependency on CONFIG_MODULES Masami Hiramatsu
2024-04-18 15:37     ` Mike Rapoport
2024-04-19 15:49     ` Mike Rapoport
2024-04-19 15:59       ` Christophe Leroy
2024-04-20  7:33         ` Mike Rapoport
2024-04-20  9:15           ` Masami Hiramatsu
2024-04-20 10:52             ` Mike Rapoport
     [not found] <ZiE91CJcNw7gBj9g@kernel.org>
     [not found] ` <CAPhsuW4au6v8k8Ab7Ff6Yj64rGvZ7wkz=Xrgh8ZZtLyscpChqQ@mail.gmail.com>
     [not found]   ` <ZiFd567L4Zzm2okO@kernel.org>
     [not found]     ` <CAPhsuW5SL4_=ZXdHZV8o0KS+5Vf25UMvEKhRgFQLioFtf2pgoQ@mail.gmail.com>
     [not found]       ` <ZiIVVBgaDN4RsroT@kernel.org>
     [not found]         ` <CAPhsuW7WoU+a46FhqqH8f-3=ehxeD4wSgKDWegMin1pT49OSWw@mail.gmail.com>
     [not found]           ` <ZiKjmaDgz_56ovbv@kernel.org>
     [not found]             ` <CAPhsuW7Nj1Sa_9xQtTgHz9AmX39zdh2x2COqA-qmkfpfX9hNWw@mail.gmail.com>
     [not found]               ` <ZiLNGgVSQ7_cg58y@kernel.org>
     [not found]                 ` <CAPhsuW4KRM4O4RFbYQrt=Coqyh9w29WiF2YF=8soDfauLFsKBA@mail.gmail.com>
     [not found]                   ` <ZiNDGjkcqEPqruza@kernel.org>
2024-04-20  9:11                     ` [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free() Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).