All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Song Liu <song@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"David S. Miller" <davem@davemloft.net>,
	Dinh Nguyen <dinguyen@kernel.org>,
	Heiko Carstens <hca@linux.ibm.com>, Helge Deller <deller@gmx.de>,
	Huacai Chen <chenhuacai@kernel.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Russell King <linux@armlinux.org.uk>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-mips@vger.kernel.org, linux-mm@kvack.org,
	linux-modules@vger.kernel.org, linux-parisc@vger.kernel.org,
	linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev,
	netdev@vger.kernel.org, sparclinux@vger.kernel.org,
	x86@kernel.org
Subject: Re: [PATCH 00/13] mm: jit/text allocator
Date: Tue, 13 Jun 2023 00:34:11 +0300	[thread overview]
Message-ID: <20230612213411.GP52412@kernel.org> (raw)
In-Reply-To: <CAPhsuW5YYa6nQhO2=zor75XkdKpFysZD42DgDRkKZvQT6aMqcA@mail.gmail.com>

On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > >      jit_text_alloc(size);
> > >      jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want allocate memory for
> > the generated code.
> 
> Yeah, this makes sense.
> 
> >
> > > > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the callsites
> > > > can pass that information.
> > >
> > > I think we only two type of users: module and everything else (ftrace, kprobe,
> > > bpf stuff). The key differences are:
> > >
> > >   1. module uses text and data; while everything else only uses text.
> > >   2. module code is generated by the compiler, and thus has stronger
> > >   requirements in address ranges; everything else are generated via some
> > >   JIT or manual written assembly, so they are more flexible with address
> > >   ranges (in JIT, we can avoid using instructions that requires a specific
> > >   address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> >         struct execmem_range {
> >                 unsigned long   start;
> >                 unsigned long   end;
> >                 unsigned long   fallback_start;
> >                 unsigned long   fallback_end;
> >                 pgprot_t        pgprot;
> >                 unsigned int    alignment;
> >         };
> >
> >         struct execmem_modules_range {
> >                 enum execmem_module_flags flags;
> >                 struct execmem_range text;
> >                 struct execmem_range data;
> >         };
> >
> >         struct execmem_jit_range {
> >                 struct execmem_range text;
> >         };
> >
> >         struct execmem_params {
> >                 struct execmem_modules_range    modules;
> >                 struct execmem_jit_range        jit;
> >         };
> >
> >         struct execmem_params *execmem_arch_params(void);
> >
> >         void *execmem_text_alloc(size_t size);
> >         void *execmem_data_alloc(size_t size);
> >         void execmem_free(void *ptr);
> 
> With the jit variation, maybe we can just call these
> module_[text|data]_alloc()?

I was thinking about "execmem_*_alloc()" for allocations that must be close to kernel
image, like modules, ftrace on x86 and s390 and maybe something else in the
future.

And jit_text_alloc() for allocations that can reside anywhere.

I tried to find a different name for 'struct execmem_modules_range' but
couldn't think of anything better than 'struct execmem_close_to_kernel', so
I've left modules in the name.
 
> btw: Depending on the implementation of the allocator, we may also
> need separate free()s for text and data.
> 
> >
> >         void *jit_text_alloc(size_t size);
> >         void jit_free(void *ptr);
> >

Let's just add jit_free() for completeness even if it will be the same as
execmem_free() for now.
 
> [...]
> 
> How should we move ahead from here?
> 
> AFAICT, all these changes can be easily extended and refactored
> in the future, so we don't have to make it perfect the first time.
> OTOH, having the interface committed (either this set or my
> module_alloc_type version) can unblock works in the binpack
> allocator and the users side. Therefore, I think we can move
> relatively fast here?

Once the interface and architecture abstraction is ready we can work on the
allocator and the users. We also need to update text_poking/alternatives on
architectures that would allocate executable memory as ROX. I did some
quick tests and with these patches 'modprobe xfs' takes tens time more than
before.
 
> Thanks,
> Song

-- 
Sincerely yours,
Mike.

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Song Liu <song@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"David S. Miller" <davem@davemloft.net>,
	Dinh Nguyen <dinguyen@kernel.org>,
	Heiko Carstens <hca@linux.ibm.com>, Helge Deller <deller@gmx.de>,
	Huacai Chen <chenhuacai@kernel.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Russell King <linux@armlinux.org.uk>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-mips@vger.kernel.org, linux-mm@kvack.org,
	linux-modules@vger.kernel.org, linux-parisc@vger.kernel.org,
	linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev,
	netdev@vger.kernel.org, sparclinux@vger.kernel.org,
	x86@kernel.org
Subject: Re: [PATCH 00/13] mm: jit/text allocator
Date: Tue, 13 Jun 2023 00:34:11 +0300	[thread overview]
Message-ID: <20230612213411.GP52412@kernel.org> (raw)
In-Reply-To: <CAPhsuW5YYa6nQhO2=zor75XkdKpFysZD42DgDRkKZvQT6aMqcA@mail.gmail.com>

On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > >      jit_text_alloc(size);
> > >      jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want allocate memory for
> > the generated code.
> 
> Yeah, this makes sense.
> 
> >
> > > > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the callsites
> > > > can pass that information.
> > >
> > > I think we only two type of users: module and everything else (ftrace, kprobe,
> > > bpf stuff). The key differences are:
> > >
> > >   1. module uses text and data; while everything else only uses text.
> > >   2. module code is generated by the compiler, and thus has stronger
> > >   requirements in address ranges; everything else are generated via some
> > >   JIT or manual written assembly, so they are more flexible with address
> > >   ranges (in JIT, we can avoid using instructions that requires a specific
> > >   address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> >         struct execmem_range {
> >                 unsigned long   start;
> >                 unsigned long   end;
> >                 unsigned long   fallback_start;
> >                 unsigned long   fallback_end;
> >                 pgprot_t        pgprot;
> >                 unsigned int    alignment;
> >         };
> >
> >         struct execmem_modules_range {
> >                 enum execmem_module_flags flags;
> >                 struct execmem_range text;
> >                 struct execmem_range data;
> >         };
> >
> >         struct execmem_jit_range {
> >                 struct execmem_range text;
> >         };
> >
> >         struct execmem_params {
> >                 struct execmem_modules_range    modules;
> >                 struct execmem_jit_range        jit;
> >         };
> >
> >         struct execmem_params *execmem_arch_params(void);
> >
> >         void *execmem_text_alloc(size_t size);
> >         void *execmem_data_alloc(size_t size);
> >         void execmem_free(void *ptr);
> 
> With the jit variation, maybe we can just call these
> module_[text|data]_alloc()?

I was thinking about "execmem_*_alloc()" for allocations that must be close to kernel
image, like modules, ftrace on x86 and s390 and maybe something else in the
future.

And jit_text_alloc() for allocations that can reside anywhere.

I tried to find a different name for 'struct execmem_modules_range' but
couldn't think of anything better than 'struct execmem_close_to_kernel', so
I've left modules in the name.
 
> btw: Depending on the implementation of the allocator, we may also
> need separate free()s for text and data.
> 
> >
> >         void *jit_text_alloc(size_t size);
> >         void jit_free(void *ptr);
> >

Let's just add jit_free() for completeness even if it will be the same as
execmem_free() for now.
 
> [...]
> 
> How should we move ahead from here?
> 
> AFAICT, all these changes can be easily extended and refactored
> in the future, so we don't have to make it perfect the first time.
> OTOH, having the interface committed (either this set or my
> module_alloc_type version) can unblock works in the binpack
> allocator and the users side. Therefore, I think we can move
> relatively fast here?

Once the interface and architecture abstraction is ready we can work on the
allocator and the users. We also need to update text_poking/alternatives on
architectures that would allocate executable memory as ROX. I did some
quick tests and with these patches 'modprobe xfs' takes tens time more than
before.
 
> Thanks,
> Song

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Song Liu <song@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"David S. Miller" <davem@davemloft.net>,
	Dinh Nguyen <dinguyen@kernel.org>,
	Heiko Carstens <hca@linux.ibm.com>, Helge Deller <deller@gmx.de>,
	Huacai Chen <chenhuacai@kernel.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Russell King <linux@armlinux.org.uk>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-mips@vger.kernel.org, linux-mm@kvack.org,
	linux-modules@vger.kernel.org, linux-parisc@vger.kernel.org,
	linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev,
	netdev@vger.kernel.org, sparclinux@vger.kernel.org,
	x86@kernel.org
Subject: Re: [PATCH 00/13] mm: jit/text allocator
Date: Tue, 13 Jun 2023 00:34:11 +0300	[thread overview]
Message-ID: <20230612213411.GP52412@kernel.org> (raw)
In-Reply-To: <CAPhsuW5YYa6nQhO2=zor75XkdKpFysZD42DgDRkKZvQT6aMqcA@mail.gmail.com>

On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > >      jit_text_alloc(size);
> > >      jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want allocate memory for
> > the generated code.
> 
> Yeah, this makes sense.
> 
> >
> > > > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the callsites
> > > > can pass that information.
> > >
> > > I think we only two type of users: module and everything else (ftrace, kprobe,
> > > bpf stuff). The key differences are:
> > >
> > >   1. module uses text and data; while everything else only uses text.
> > >   2. module code is generated by the compiler, and thus has stronger
> > >   requirements in address ranges; everything else are generated via some
> > >   JIT or manual written assembly, so they are more flexible with address
> > >   ranges (in JIT, we can avoid using instructions that requires a specific
> > >   address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> >         struct execmem_range {
> >                 unsigned long   start;
> >                 unsigned long   end;
> >                 unsigned long   fallback_start;
> >                 unsigned long   fallback_end;
> >                 pgprot_t        pgprot;
> >                 unsigned int    alignment;
> >         };
> >
> >         struct execmem_modules_range {
> >                 enum execmem_module_flags flags;
> >                 struct execmem_range text;
> >                 struct execmem_range data;
> >         };
> >
> >         struct execmem_jit_range {
> >                 struct execmem_range text;
> >         };
> >
> >         struct execmem_params {
> >                 struct execmem_modules_range    modules;
> >                 struct execmem_jit_range        jit;
> >         };
> >
> >         struct execmem_params *execmem_arch_params(void);
> >
> >         void *execmem_text_alloc(size_t size);
> >         void *execmem_data_alloc(size_t size);
> >         void execmem_free(void *ptr);
> 
> With the jit variation, maybe we can just call these
> module_[text|data]_alloc()?

I was thinking about "execmem_*_alloc()" for allocations that must be close to kernel
image, like modules, ftrace on x86 and s390 and maybe something else in the
future.

And jit_text_alloc() for allocations that can reside anywhere.

I tried to find a different name for 'struct execmem_modules_range' but
couldn't think of anything better than 'struct execmem_close_to_kernel', so
I've left modules in the name.
 
> btw: Depending on the implementation of the allocator, we may also
> need separate free()s for text and data.
> 
> >
> >         void *jit_text_alloc(size_t size);
> >         void jit_free(void *ptr);
> >

Let's just add jit_free() for completeness even if it will be the same as
execmem_free() for now.
 
> [...]
> 
> How should we move ahead from here?
> 
> AFAICT, all these changes can be easily extended and refactored
> in the future, so we don't have to make it perfect the first time.
> OTOH, having the interface committed (either this set or my
> module_alloc_type version) can unblock works in the binpack
> allocator and the users side. Therefore, I think we can move
> relatively fast here?

Once the interface and architecture abstraction is ready we can work on the
allocator and the users. We also need to update text_poking/alternatives on
architectures that would allocate executable memory as ROX. I did some
quick tests and with these patches 'modprobe xfs' takes tens time more than
before.
 
> Thanks,
> Song

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Mike Rapoport <rppt@kernel.org>
To: Song Liu <song@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	x86@kernel.org, Catalin Marinas <catalin.marinas@arm.com>,
	linux-mips@vger.kernel.org, linux-mm@kvack.org,
	sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org,
	Will Deacon <will@kernel.org>,
	linux-s390@vger.kernel.org, Helge Deller <deller@gmx.de>,
	Huacai Chen <chenhuacai@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	linux-trace-kernel@vger.kernel.org,
	Heiko Carstens <hca@linux.ibm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	loongarch@lists.linux.dev, Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-arm-kernel@lists.infradead.org,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	linux-parisc@vger.kernel.org, netdev@vger.kernel.org,
	Kent Overstreet <kent.overstreet@linux.dev>,
	linux-kernel@vger.kernel.org, Dinh Nguyen <dinguyen@kernel.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	bpf@vger.kernel.org, linuxppc-dev@l ists.ozlabs.org,
	"David S. Miller" <davem@davemloft.net>,
	linux-modules@vger.kernel.org
Subject: Re: [PATCH 00/13] mm: jit/text allocator
Date: Tue, 13 Jun 2023 00:34:11 +0300	[thread overview]
Message-ID: <20230612213411.GP52412@kernel.org> (raw)
In-Reply-To: <CAPhsuW5YYa6nQhO2=zor75XkdKpFysZD42DgDRkKZvQT6aMqcA@mail.gmail.com>

On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > >      jit_text_alloc(size);
> > >      jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want allocate memory for
> > the generated code.
> 
> Yeah, this makes sense.
> 
> >
> > > > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the callsites
> > > > can pass that information.
> > >
> > > I think we only two type of users: module and everything else (ftrace, kprobe,
> > > bpf stuff). The key differences are:
> > >
> > >   1. module uses text and data; while everything else only uses text.
> > >   2. module code is generated by the compiler, and thus has stronger
> > >   requirements in address ranges; everything else are generated via some
> > >   JIT or manual written assembly, so they are more flexible with address
> > >   ranges (in JIT, we can avoid using instructions that requires a specific
> > >   address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> >         struct execmem_range {
> >                 unsigned long   start;
> >                 unsigned long   end;
> >                 unsigned long   fallback_start;
> >                 unsigned long   fallback_end;
> >                 pgprot_t        pgprot;
> >                 unsigned int    alignment;
> >         };
> >
> >         struct execmem_modules_range {
> >                 enum execmem_module_flags flags;
> >                 struct execmem_range text;
> >                 struct execmem_range data;
> >         };
> >
> >         struct execmem_jit_range {
> >                 struct execmem_range text;
> >         };
> >
> >         struct execmem_params {
> >                 struct execmem_modules_range    modules;
> >                 struct execmem_jit_range        jit;
> >         };
> >
> >         struct execmem_params *execmem_arch_params(void);
> >
> >         void *execmem_text_alloc(size_t size);
> >         void *execmem_data_alloc(size_t size);
> >         void execmem_free(void *ptr);
> 
> With the jit variation, maybe we can just call these
> module_[text|data]_alloc()?

I was thinking about "execmem_*_alloc()" for allocations that must be close to kernel
image, like modules, ftrace on x86 and s390 and maybe something else in the
future.

And jit_text_alloc() for allocations that can reside anywhere.

I tried to find a different name for 'struct execmem_modules_range' but
couldn't think of anything better than 'struct execmem_close_to_kernel', so
I've left modules in the name.
 
> btw: Depending on the implementation of the allocator, we may also
> need separate free()s for text and data.
> 
> >
> >         void *jit_text_alloc(size_t size);
> >         void jit_free(void *ptr);
> >

Let's just add jit_free() for completeness even if it will be the same as
execmem_free() for now.
 
> [...]
> 
> How should we move ahead from here?
> 
> AFAICT, all these changes can be easily extended and refactored
> in the future, so we don't have to make it perfect the first time.
> OTOH, having the interface committed (either this set or my
> module_alloc_type version) can unblock works in the binpack
> allocator and the users side. Therefore, I think we can move
> relatively fast here?

Once the interface and architecture abstraction is ready we can work on the
allocator and the users. We also need to update text_poking/alternatives on
architectures that would allocate executable memory as ROX. I did some
quick tests and with these patches 'modprobe xfs' takes tens time more than
before.
 
> Thanks,
> Song

-- 
Sincerely yours,
Mike.

  reply	other threads:[~2023-06-12 21:34 UTC|newest]

Thread overview: 220+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-01 10:12 [PATCH 00/13] mm: jit/text allocator Mike Rapoport
2023-06-01 10:12 ` Mike Rapoport
2023-06-01 10:12 ` Mike Rapoport
2023-06-01 10:12 ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 01/13] nios2: define virtual address space for modules Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-13 22:16   ` Dinh Nguyen
2023-06-13 22:16     ` Dinh Nguyen
2023-06-13 22:16     ` Dinh Nguyen
2023-06-13 22:16     ` Dinh Nguyen
2023-06-01 10:12 ` [PATCH 02/13] mm: introduce jit_text_alloc() and use it instead of module_alloc() Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 03/13] mm/jitalloc, arch: convert simple overrides of module_alloc to jitalloc Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 04/13] mm/jitalloc, arch: convert remaining " Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 22:35   ` Song Liu
2023-06-01 22:35     ` Song Liu
2023-06-01 22:35     ` Song Liu
2023-06-01 22:35     ` Song Liu
2023-06-01 10:12 ` [PATCH 05/13] module, jitalloc: drop module_alloc Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 06/13] mm/jitalloc: introduce jit_data_alloc() Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 07/13] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 08/13] arch: make jitalloc setup available regardless of CONFIG_MODULES Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 09/13] kprobes: remove dependcy on CONFIG_MODULES Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 10/13] modules, jitalloc: prepare to allocate executable memory as ROX Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 11/13] ftrace: Add swap_func to ftrace_process_locs() Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12 ` [PATCH 12/13] x86/jitalloc: prepare to allocate exectuatble memory as ROX Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:30   ` Peter Zijlstra
2023-06-01 10:30     ` Peter Zijlstra
2023-06-01 10:30     ` Peter Zijlstra
2023-06-01 10:30     ` Peter Zijlstra
2023-06-01 11:07     ` Mike Rapoport
2023-06-01 11:07       ` Mike Rapoport
2023-06-01 11:07       ` Mike Rapoport
2023-06-01 11:07       ` Mike Rapoport
2023-06-02  0:02       ` Song Liu
2023-06-02  0:02         ` Song Liu
2023-06-02  0:02         ` Song Liu
2023-06-02  0:02         ` Song Liu
2023-06-01 17:52     ` Kent Overstreet
2023-06-01 17:52       ` Kent Overstreet
2023-06-01 17:52       ` Kent Overstreet
2023-06-01 17:52       ` Kent Overstreet
2023-06-01 16:54   ` Edgecombe, Rick P
2023-06-01 16:54     ` Edgecombe, Rick P
2023-06-01 16:54     ` Edgecombe, Rick P
2023-06-01 16:54     ` Edgecombe, Rick P
2023-06-01 18:00     ` Kent Overstreet
2023-06-01 18:00       ` Kent Overstreet
2023-06-01 18:00       ` Kent Overstreet
2023-06-01 18:00       ` Kent Overstreet
2023-06-01 18:13       ` Edgecombe, Rick P
2023-06-01 18:13         ` Edgecombe, Rick P
2023-06-01 18:13         ` Edgecombe, Rick P
2023-06-01 18:13         ` Edgecombe, Rick P
2023-06-01 18:38         ` Kent Overstreet
2023-06-01 18:38           ` Kent Overstreet
2023-06-01 18:38           ` Kent Overstreet
2023-06-01 18:38           ` Kent Overstreet
2023-06-01 20:50           ` Edgecombe, Rick P
2023-06-01 20:50             ` Edgecombe, Rick P
2023-06-01 20:50             ` Edgecombe, Rick P
2023-06-01 20:50             ` Edgecombe, Rick P
2023-06-01 23:54             ` Nadav Amit
2023-06-01 23:54               ` Nadav Amit
2023-06-01 23:54               ` Nadav Amit
2023-06-01 23:54               ` Nadav Amit
2023-06-05  2:52               ` Steven Rostedt
2023-06-05  2:52                 ` Steven Rostedt
2023-06-05  2:52                 ` Steven Rostedt
2023-06-05  2:52                 ` Steven Rostedt
2023-06-05  8:11                 ` Mike Rapoport
2023-06-05  8:11                   ` Mike Rapoport
2023-06-05  8:11                   ` Mike Rapoport
2023-06-05  8:11                   ` Mike Rapoport
2023-06-05 16:10                   ` Edgecombe, Rick P
2023-06-05 16:10                     ` Edgecombe, Rick P
2023-06-05 16:10                     ` Edgecombe, Rick P
2023-06-05 16:10                     ` Edgecombe, Rick P
2023-06-05 20:42                     ` Mike Rapoport
2023-06-05 20:42                       ` Mike Rapoport
2023-06-05 20:42                       ` Mike Rapoport
2023-06-05 20:42                       ` Mike Rapoport
2023-06-05 21:01                       ` Edgecombe, Rick P
2023-06-05 21:01                         ` Edgecombe, Rick P
2023-06-05 21:01                         ` Edgecombe, Rick P
2023-06-05 21:01                         ` Edgecombe, Rick P
2023-06-05 21:11                     ` Nadav Amit
2023-06-05 21:11                       ` Nadav Amit
2023-06-05 21:11                       ` Nadav Amit
2023-06-05 21:11                       ` Nadav Amit
2023-06-04 21:47             ` Kent Overstreet
2023-06-04 21:47               ` Kent Overstreet
2023-06-04 21:47               ` Kent Overstreet
2023-06-04 21:47               ` Kent Overstreet
2023-06-01 22:49   ` Song Liu
2023-06-01 22:49     ` Song Liu
2023-06-01 22:49     ` Song Liu
2023-06-01 22:49     ` Song Liu
2023-06-01 10:12 ` [PATCH 13/13] x86/jitalloc: make memory allocated for code ROX Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 10:12   ` Mike Rapoport
2023-06-01 16:12 ` [PATCH 00/13] mm: jit/text allocator Mark Rutland
2023-06-01 16:12   ` Mark Rutland
2023-06-01 16:12   ` Mark Rutland
2023-06-01 16:12   ` Mark Rutland
2023-06-01 18:14   ` Kent Overstreet
2023-06-01 18:14     ` Kent Overstreet
2023-06-01 18:14     ` Kent Overstreet
2023-06-01 18:14     ` Kent Overstreet
2023-06-02  9:35     ` Mark Rutland
2023-06-02  9:35       ` Mark Rutland
2023-06-02  9:35       ` Mark Rutland
2023-06-02  9:35       ` Mark Rutland
2023-06-02 18:20       ` Song Liu
2023-06-02 18:20         ` Song Liu
2023-06-02 18:20         ` Song Liu
2023-06-02 18:20         ` Song Liu
2023-06-03 21:11         ` Puranjay Mohan
2023-06-03 21:11           ` Puranjay Mohan
2023-06-03 21:11           ` Puranjay Mohan
2023-06-03 21:11           ` Puranjay Mohan
2023-06-04 18:02         ` Kent Overstreet
2023-06-04 18:02           ` Kent Overstreet
2023-06-04 18:02           ` Kent Overstreet
2023-06-04 18:02           ` Kent Overstreet
2023-06-04 21:22           ` Song Liu
2023-06-04 21:22             ` Song Liu
2023-06-04 21:22             ` Song Liu
2023-06-04 21:22             ` Song Liu
2023-06-04 21:40             ` Kent Overstreet
2023-06-04 21:40               ` Kent Overstreet
2023-06-04 21:40               ` Kent Overstreet
2023-06-04 21:40               ` Kent Overstreet
2023-06-05  4:05               ` Song Liu
2023-06-05  4:05                 ` Song Liu
2023-06-05  4:05                 ` Song Liu
2023-06-05  4:05                 ` Song Liu
2023-06-05  9:20       ` Mike Rapoport
2023-06-05  9:20         ` Mike Rapoport
2023-06-05  9:20         ` Mike Rapoport
2023-06-05  9:20         ` Mike Rapoport
2023-06-05 10:09         ` Mark Rutland
2023-06-05 10:09           ` Mark Rutland
2023-06-05 10:09           ` Mark Rutland
2023-06-05 10:09           ` Mark Rutland
2023-06-06 10:16           ` Mike Rapoport
2023-06-06 10:16             ` Mike Rapoport
2023-06-06 10:16             ` Mike Rapoport
2023-06-06 10:16             ` Mike Rapoport
2023-06-06 18:21           ` Song Liu
2023-06-06 18:21             ` Song Liu
2023-06-06 18:21             ` Song Liu
2023-06-06 18:21             ` Song Liu
2023-06-08 18:41             ` Mike Rapoport
2023-06-08 18:41               ` Mike Rapoport
2023-06-08 18:41               ` Mike Rapoport
2023-06-08 18:41               ` Mike Rapoport
2023-06-09 17:02               ` Song Liu
2023-06-09 17:02                 ` Song Liu
2023-06-09 17:02                 ` Song Liu
2023-06-09 17:02                 ` Song Liu
2023-06-12 21:34                 ` Mike Rapoport [this message]
2023-06-12 21:34                   ` Mike Rapoport
2023-06-12 21:34                   ` Mike Rapoport
2023-06-12 21:34                   ` Mike Rapoport
2023-06-13 18:56               ` Kent Overstreet
2023-06-13 18:56                 ` Kent Overstreet
2023-06-13 18:56                 ` Kent Overstreet
2023-06-13 18:56                 ` Kent Overstreet
2023-06-13 21:09                 ` Mike Rapoport
2023-06-13 21:09                   ` Mike Rapoport
2023-06-13 21:09                   ` Mike Rapoport
2023-06-13 21:09                   ` Mike Rapoport
2023-07-20  8:53           ` Mike Rapoport
2023-07-20  8:53             ` Mike Rapoport
2023-07-20  8:53             ` Mike Rapoport
2023-07-20  8:53             ` Mike Rapoport
2023-06-05 21:13         ` Kent Overstreet
2023-06-05 21:13           ` Kent Overstreet
2023-06-05 21:13           ` Kent Overstreet
2023-06-05 21:13           ` Kent Overstreet
2023-06-02  0:36 ` Song Liu
2023-06-02  0:36   ` Song Liu
2023-06-02  0:36   ` Song Liu
2023-06-02  0:36   ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230612213411.GP52412@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bpf@vger.kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chenhuacai@kernel.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=dinguyen@kernel.org \
    --cc=hca@linux.ibm.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=loongarch@lists.linux.dev \
    --cc=mark.rutland@arm.com \
    --cc=mcgrof@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=netdev@vger.kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=rostedt@goodmis.org \
    --cc=song@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tsbogend@alpha.franken.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.