* [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
@ 2022-10-25 20:06 Peter Zijlstra
  2022-10-25 20:06 ` [PATCH 1/5] mm: Move mm_cachep initialization to mm_init() Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-25 20:06 UTC (permalink / raw)
  To: torvalds, rostedt, dave.hansen
  Cc: linux-kernel, peterz, x86, keescook, seanjc

Hi,

These few patches re-work and re-order boot things enough to avoid ftrace
creating boot time W+X maps.

The patches compile and boot for the one config I tested things on (with
ftrace=function enabled; *slooooow*).

I've pushed them out for the robots to have a go at here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/mm.poke_me



* [PATCH 1/5] mm: Move mm_cachep initialization to mm_init()
  2022-10-25 20:06 [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Peter Zijlstra
@ 2022-10-25 20:06 ` Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  2022-10-25 20:06 ` [PATCH 2/5] x86/mm: Use mm_alloc() in poking_init() Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-25 20:06 UTC (permalink / raw)
  To: torvalds, rostedt, dave.hansen
  Cc: linux-kernel, peterz, x86, keescook, seanjc

In order to allow using mm_alloc() much earlier, move initializing
mm_cachep into mm_init().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched/task.h |    1 +
 init/main.c                |    1 +
 kernel/fork.c              |   32 ++++++++++++++++++--------------
 3 files changed, 20 insertions(+), 14 deletions(-)

--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -65,6 +65,7 @@ extern void sched_dead(struct task_struc
 void __noreturn do_task_dead(void);
 void __noreturn make_task_dead(int signr);
 
+extern void mm_cache_init(void);
 extern void proc_caches_init(void);
 
 extern void fork_init(void);
--- a/init/main.c
+++ b/init/main.c
@@ -860,6 +860,7 @@ static void __init mm_init(void)
 	/* Should be run after espfix64 is set up. */
 	pti_init();
 	kmsan_init_runtime();
+	mm_cache_init();
 }
 
 #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3015,10 +3015,27 @@ static void sighand_ctor(void *data)
 	init_waitqueue_head(&sighand->signalfd_wqh);
 }
 
-void __init proc_caches_init(void)
+void __init mm_cache_init(void)
 {
 	unsigned int mm_size;
 
+	/*
+	 * The mm_cpumask is located at the end of mm_struct, and is
+	 * dynamically sized based on the maximum CPU number this system
+	 * can have, taking hotplug into account (nr_cpu_ids).
+	 */
+	mm_size = sizeof(struct mm_struct) + cpumask_size();
+
+	mm_cachep = kmem_cache_create_usercopy("mm_struct",
+			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
+			offsetof(struct mm_struct, saved_auxv),
+			sizeof_field(struct mm_struct, saved_auxv),
+			NULL);
+}
+
+void __init proc_caches_init(void)
+{
 	sighand_cachep = kmem_cache_create("sighand_cache",
 			sizeof(struct sighand_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
@@ -3036,19 +3053,6 @@ void __init proc_caches_init(void)
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
 
-	/*
-	 * The mm_cpumask is located at the end of mm_struct, and is
-	 * dynamically sized based on the maximum CPU number this system
-	 * can have, taking hotplug into account (nr_cpu_ids).
-	 */
-	mm_size = sizeof(struct mm_struct) + cpumask_size();
-
-	mm_cachep = kmem_cache_create_usercopy("mm_struct",
-			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
-			offsetof(struct mm_struct, saved_auxv),
-			sizeof_field(struct mm_struct, saved_auxv),
-			NULL);
 	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
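
The sizing comment that moves with this hunk can be illustrated in user space. The following is a minimal sketch, not the kernel code: `struct toy_mm`, `toy_cpumask_size()`, `toy_mm_size()` and `toy_mm_alloc()` are hypothetical names, and `calloc()` stands in for the slab cache.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mm_struct: fixed fields, then a bitmap. */
struct toy_mm {
	int users;
	unsigned long cpu_bitmap[];	/* sized at runtime, one bit per possible CPU */
};

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap covering nr_cpu_ids CPUs, rounded up to whole longs. */
static size_t toy_cpumask_size(unsigned int nr_cpu_ids)
{
	size_t longs = (nr_cpu_ids + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}

/* Object size: the fixed part plus the runtime-sized trailing bitmap. */
static size_t toy_mm_size(unsigned int nr_cpu_ids)
{
	return sizeof(struct toy_mm) + toy_cpumask_size(nr_cpu_ids);
}

/* Allocate one zeroed object; the kernel would take it from a slab cache instead. */
static struct toy_mm *toy_mm_alloc(unsigned int nr_cpu_ids)
{
	return calloc(1, toy_mm_size(nr_cpu_ids));
}
```

Because the object size depends on a value known only at boot, the cache has to be created at runtime, which is why where that creation happens in the boot sequence matters for this series.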




* [PATCH 2/5] x86/mm: Use mm_alloc() in poking_init()
  2022-10-25 20:06 [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Peter Zijlstra
  2022-10-25 20:06 ` [PATCH 1/5] mm: Move mm_cachep initialization to mm_init() Peter Zijlstra
@ 2022-10-25 20:06 ` Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  2022-10-25 20:06 ` [PATCH 3/5] x86/mm: Initialize text poking earlier Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-25 20:06 UTC (permalink / raw)
  To: torvalds, rostedt, dave.hansen
  Cc: linux-kernel, peterz, x86, keescook, seanjc

Instead of duplicating init_mm, allocate a fresh mm. The advantage is
that mm_alloc() has much simpler dependencies. Additionally, it makes
more conceptual sense: init_mm has no (and must not have) user state
to duplicate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/mm/init.c         |    2 +-
 include/linux/sched/task.h |    1 -
 kernel/fork.c              |    5 -----
 3 files changed, 1 insertion(+), 7 deletions(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -801,7 +801,7 @@ void __init poking_init(void)
 	spinlock_t *ptl;
 	pte_t *ptep;
 
-	poking_mm = copy_init_mm();
+	poking_mm = mm_alloc();
 	BUG_ON(!poking_mm);
 
 	/*
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -91,7 +91,6 @@ extern void exit_itimers(struct task_str
 extern pid_t kernel_clone(struct kernel_clone_args *kargs);
 struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
 struct task_struct *fork_idle(int);
-struct mm_struct *copy_init_mm(void);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern pid_t user_mode_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2592,11 +2592,6 @@ struct task_struct * __init fork_idle(in
 	return task;
 }
 
-struct mm_struct *copy_init_mm(void)
-{
-	return dup_mm(NULL, &init_mm);
-}
-
 /*
  * This is like kernel_clone(), but shaved down and tailored to just
  * creating io_uring workers. It returns a created task, or an error pointer.




* [PATCH 3/5] x86/mm: Initialize text poking earlier
  2022-10-25 20:06 [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Peter Zijlstra
  2022-10-25 20:06 ` [PATCH 1/5] mm: Move mm_cachep initialization to mm_init() Peter Zijlstra
  2022-10-25 20:06 ` [PATCH 2/5] x86/mm: Use mm_alloc() in poking_init() Peter Zijlstra
@ 2022-10-25 20:06 ` Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  2022-10-25 20:07 ` [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-25 20:06 UTC (permalink / raw)
  To: torvalds, rostedt, dave.hansen
  Cc: linux-kernel, peterz, x86, keescook, seanjc

Move poking_init() up a bunch; specifically move it right after
mm_init() which is right before ftrace_init().

This will allow simplifying ftrace text poking which currently has
a bunch of exceptions for early boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 init/main.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/init/main.c
+++ b/init/main.c
@@ -996,7 +996,7 @@ asmlinkage __visible void __init __no_sa
 	sort_main_extable();
 	trap_init();
 	mm_init();
-
+	poking_init();
 	ftrace_init();
 
 	/* trace_printk can be enabled here */
@@ -1135,7 +1135,6 @@ asmlinkage __visible void __init __no_sa
 	taskstats_init_early();
 	delayacct_init();
 
-	poking_init();
 	check_bugs();
 
 	acpi_subsystem_init();
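
The ordering that patches 1 to 3 establish can be modelled with a few flags. This is only a sketch under stated assumptions: the `toy_*` functions and flags are hypothetical stand-ins, and the "too early" case simply returns false where the real code could not work at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for real boot-time state. */
static bool toy_mm_cache_ready;	/* set by the mm_cache_init() stand-in */
static bool toy_poking_ready;	/* set by the poking_init() stand-in */

static void toy_mm_init(void)
{
	/* Patch 1: the mm_struct cache is created as the last step of mm_init(). */
	toy_mm_cache_ready = true;
}

/* Returns false when called before the cache exists. */
static bool toy_poking_init(void)
{
	if (!toy_mm_cache_ready)
		return false;	/* the mm_alloc() stand-in has nothing to allocate from */
	toy_poking_ready = true;
	return true;
}

/* Returns true when ftrace could patch text through the poking mm. */
static bool toy_ftrace_init(void)
{
	return toy_poking_ready;
}
```

Calling `toy_mm_init()`, `toy_poking_init()` and `toy_ftrace_init()` in that order mirrors the sequence in start_kernel() after this patch; any other order fails one of the checks.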




* [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions
  2022-10-25 20:06 [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Peter Zijlstra
                   ` (2 preceding siblings ...)
  2022-10-25 20:06 ` [PATCH 3/5] x86/mm: Initialize text poking earlier Peter Zijlstra
@ 2022-10-25 20:07 ` Peter Zijlstra
  2022-10-25 20:59   ` Steven Rostedt
                     ` (2 more replies)
  2022-10-25 20:07 ` [PATCH 5/5] x86/mm: Do verify W^X at boot up Peter Zijlstra
  2022-10-25 23:07 ` [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Linus Torvalds
  5 siblings, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-25 20:07 UTC (permalink / raw)
  To: torvalds, rostedt, dave.hansen
  Cc: linux-kernel, peterz, x86, keescook, seanjc

Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.

Specifically, this cures a W+X case during boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |   10 ----------
 arch/x86/kernel/ftrace.c      |    3 +--
 2 files changed, 1 insertion(+), 12 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1681,11 +1681,6 @@ void __ref text_poke_queue(void *addr, c
 {
 	struct text_poke_loc *tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_flush(addr);
 
 	tp = &tp_vec[tp_vec_nr++];
@@ -1707,11 +1702,6 @@ void __ref text_poke_bp(void *addr, cons
 {
 	struct text_poke_loc tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops
 
 	set_vm_flush_reset_perms(trampoline);
 
-	if (likely(system_state != SYSTEM_BOOTING))
-		set_memory_ro((unsigned long)trampoline, npages);
+	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:




* [PATCH 5/5] x86/mm: Do verify W^X at boot up
  2022-10-25 20:06 [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Peter Zijlstra
                   ` (3 preceding siblings ...)
  2022-10-25 20:07 ` [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions Peter Zijlstra
@ 2022-10-25 20:07 ` Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  2022-10-25 23:07 ` [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Linus Torvalds
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-25 20:07 UTC (permalink / raw)
  To: torvalds, rostedt, dave.hansen
  Cc: linux-kernel, peterz, x86, keescook, seanjc

Straight up revert of commit:

  a970174d7a10 ("x86/mm: Do not verify W^X at boot up")

now that the root cause has been fixed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/mm/pat/set_memory.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -587,10 +587,6 @@ static inline pgprot_t verify_rwx(pgprot
 {
 	unsigned long end;
 
-	/* Kernel text is rw at boot up */
-	if (system_state == SYSTEM_BOOTING)
-		return new;
-
 	/*
 	 * 32-bit has some unfixable W+X issues, like EFI code
 	 * and writeable data being in the same page.  Disable
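
The property that the restored check looks for is that a mapping must never be writable and executable at the same time. A minimal sketch of that rule follows; the `TOY_*` bit values and helper names are assumptions made for illustration and are not the x86 definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits; the real x86 PTE bits are _PAGE_RW and _PAGE_NX. */
#define TOY_RW 0x1u	/* mapping is writable */
#define TOY_NX 0x2u	/* mapping is not executable */

/* A mapping violates W^X when it is writable and executable at once. */
static bool toy_is_wx(unsigned int prot)
{
	return (prot & TOY_RW) && !(prot & TOY_NX);
}

/* Drop write permission. */
static unsigned int toy_set_ro(unsigned int prot)
{
	return prot & ~TOY_RW;
}

/* Grant execute permission by clearing the no-execute bit. */
static unsigned int toy_set_x(unsigned int prot)
{
	return prot & ~TOY_NX;
}
```

Starting from a writable, non-executable mapping, `toy_set_x()` alone produces a W+X mapping, while `toy_set_ro()` followed by `toy_set_x()` does not. That is the situation the unconditional set_memory_ro() in create_trampoline() avoids.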




* Re: [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions
  2022-10-25 20:07 ` [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions Peter Zijlstra
@ 2022-10-25 20:59   ` Steven Rostedt
  2022-10-26  7:02     ` Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2022-10-25 20:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: torvalds, dave.hansen, linux-kernel, x86, keescook, seanjc

On Tue, 25 Oct 2022 22:07:00 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Now that text_poke is available before ftrace, remove the
> SYSTEM_BOOTING exceptions.
> 
> Specifically, this cures a W+X case during boot.

We have W+X all over the place (the entire kernel text). And I don't think
we really want this.

This will slow down boots in general, as it will cause all static_branches
to use this memory page logic. And I don't think we really want to do
that at boot up when we don't need to.

I would change this to:

	if (unlikely(system_state == SYSTEM_BOOTING) &&
	    core_kernel_text((unsigned long)addr)) {

This way we still do memcpy() on all core kernel text which is still
writable. It was the ftrace allocated trampoline that caused issues, not
the locations that were being updated.

-- Steve



> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/alternative.c |   10 ----------
>  arch/x86/kernel/ftrace.c      |    3 +--
>  2 files changed, 1 insertion(+), 12 deletions(-)
> 
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1681,11 +1681,6 @@ void __ref text_poke_queue(void *addr, c
>  {
>  	struct text_poke_loc *tp;
>  
> -	if (unlikely(system_state == SYSTEM_BOOTING)) {
> -		text_poke_early(addr, opcode, len);
> -		return;
> -	}
> -
>  	text_poke_flush(addr);
>  
>  	tp = &tp_vec[tp_vec_nr++];
> @@ -1707,11 +1702,6 @@ void __ref text_poke_bp(void *addr, cons
>  {
>  	struct text_poke_loc tp;
>  
> -	if (unlikely(system_state == SYSTEM_BOOTING)) {
> -		text_poke_early(addr, opcode, len);
> -		return;
> -	}
> -
>  	text_poke_loc_init(&tp, addr, opcode, len, emulate);
>  	text_poke_bp_batch(&tp, 1);
>  }
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops
>  
>  	set_vm_flush_reset_perms(trampoline);
>  
> -	if (likely(system_state != SYSTEM_BOOTING))
> -		set_memory_ro((unsigned long)trampoline, npages);
> +	set_memory_ro((unsigned long)trampoline, npages);
>  	set_memory_x((unsigned long)trampoline, npages);
>  	return (unsigned long)trampoline;
>  fail:
> 



* Re: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
  2022-10-25 20:06 [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Peter Zijlstra
                   ` (4 preceding siblings ...)
  2022-10-25 20:07 ` [PATCH 5/5] x86/mm: Do verify W^X at boot up Peter Zijlstra
@ 2022-10-25 23:07 ` Linus Torvalds
  2022-10-25 23:17   ` Steven Rostedt
  2022-10-26  7:15   ` Peter Zijlstra
  5 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2022-10-25 23:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: rostedt, dave.hansen, linux-kernel, x86, keescook, seanjc

On Tue, Oct 25, 2022 at 1:11 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> These few patches re-work and re-order boot things enough to avoid ftrace
> creating boot time W+X maps.

Thanks, looks fine.

> The patches compile and boot for the one config I tested things on (with
> ftrace=function enabled; *slooooow*).

So this might be just tracing overhead, but it might also be that you
slowed down text_poke() at bootup a _lot_.

The only part that the W^X checking cared about was that

-       if (likely(system_state != SYSTEM_BOOTING))
-               set_memory_ro((unsigned long)trampoline, npages);
+       set_memory_ro((unsigned long)trampoline, npages);
        set_memory_x((unsigned long)trampoline, npages);

for the create_trampoline(), because without the 'set_memory_ro()',
the 'set_memory_x()' will complain.

It does strike me that it's stupid to make those be two calls that do
exactly the same thing, and we should have a combined "set it
read-only and executable" function, but that's a separate issue.

The slowness is probably not the trampolines, but just the regular
"text_poke of kernel text" that we probably want to keep special just
because otherwise it's _so_ slow to do for every alternative etc.

                Linus


* Re: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
  2022-10-25 23:07 ` [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Linus Torvalds
@ 2022-10-25 23:17   ` Steven Rostedt
  2022-10-26  7:15   ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2022-10-25 23:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, dave.hansen, linux-kernel, x86, keescook, seanjc

On Tue, 25 Oct 2022 16:07:25 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The slowness is probably not the trampolines, but just the regular
> "text_poke of kernel text" that we probably want to keep special just
> because otherwise it's _so_ slow to do for every alternative etc.

Yes. That's why I recommended changing patch 4 to:

	if (unlikely(system_state == SYSTEM_BOOTING) &&
	    core_kernel_text((unsigned long)addr)) {
		text_poke_early(addr, opcode, len);
		return;
	}

The only special case we had was the ftrace trampoline that was dynamically
allocated, and there are paths that require updating it (when you add two
function callbacks to the same location).

-- Steve
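
The proposed condition can be read as a two-input decision. Below is a small user-space sketch of it; the enum values, the text bounds and `toy_pick_path()` are all hypothetical and chosen only to make the decision testable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_system_state { TOY_BOOTING, TOY_RUNNING };
enum toy_poke_path { TOY_POKE_EARLY, TOY_POKE_BP };

/* Hypothetical bounds of the core kernel text, for illustration only. */
#define TOY_TEXT_START 0x1000u
#define TOY_TEXT_END   0x2000u

/* Stand-in for core_kernel_text(): is addr inside the core text range? */
static bool toy_core_kernel_text(uintptr_t addr)
{
	return addr >= TOY_TEXT_START && addr < TOY_TEXT_END;
}

/* Pick the plain-copy path only during boot and only for core kernel text. */
static enum toy_poke_path toy_pick_path(enum toy_system_state state, uintptr_t addr)
{
	if (state == TOY_BOOTING && toy_core_kernel_text(addr))
		return TOY_POKE_EARLY;
	return TOY_POKE_BP;
}
```

With this shape, a dynamically allocated trampoline, which lies outside the core text range, always takes the slower path, even during boot.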


* Re: [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions
  2022-10-25 20:59   ` Steven Rostedt
@ 2022-10-26  7:02     ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-26  7:02 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: torvalds, dave.hansen, linux-kernel, x86, keescook, seanjc

On Tue, Oct 25, 2022 at 04:59:56PM -0400, Steven Rostedt wrote:
> On Tue, 25 Oct 2022 22:07:00 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Now that text_poke is available before ftrace, remove the
> > SYSTEM_BOOTING exceptions.
> > 
> > Specifically, this cures a W+X case during boot.
> 
> We have W+X all over the place (the entire kernel text). And I don't think
> we really want this.
> 
> This will slow down boots in general, as it will cause all static_branches
> to use this memory page logic. And I don't think we really want to do
> that at boot up when we don't need to.

Both static_call and jump_label explicitly call text_poke_early() when
appropriate.

> I would change this to:
> 
> 	if (unlikely(system_state == SYSTEM_BOOTING) &&
> 	    core_kernel_text((unsigned long)addr)) {
> 
> This way we still do memcpy() on all core kernel text which is still
> writable. It was the ftrace allocated trampoline that caused issues, not
> the locations that were being updated.

I would suggest changing ftrace to call text_poke_early() when
appropriate if it matters (it already does a little of that). A boot
test with and without my patch 4 applied shows no noticeable difference:
it is horribly slow either way.


* Re: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
  2022-10-25 23:07 ` [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Linus Torvalds
  2022-10-25 23:17   ` Steven Rostedt
@ 2022-10-26  7:15   ` Peter Zijlstra
  2022-10-26 17:59     ` Linus Torvalds
                       ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-26  7:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: rostedt, dave.hansen, linux-kernel, x86, keescook, seanjc

On Tue, Oct 25, 2022 at 04:07:25PM -0700, Linus Torvalds wrote:

> It does strike me that it's stupid to make those be two calls that do
> exactly the same thing, and we should have a combined "set it
> read-only and executable" function, but that's a separate issue.

Right, and we have it all over the place. Something like the below
perhaps? I'll feed it to the robots, see if something breaks.

> The slowness is probably not the trampolines, but just the regular
> "text_poke of kernel text" that we probably want to keep special just
> because otherwise it's _so_ slow to do for every alternative etc.

I tried with and without the patches; it's dead slow either way and I
couldn't spot a noticeable difference between the two -- so I'm assuming
it's simply the trace overhead, not the trace-enable time.


---
--- a/arch/arm/mach-omap1/sram-init.c
+++ b/arch/arm/mach-omap1/sram-init.c
@@ -74,8 +74,7 @@ void *omap_sram_push(void *funcp, unsign
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -126,8 +125,7 @@ static void __init omap_detect_and_map_s
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap_sram_reprogram_clock)(u32 dpllctl, u32 ckctl);
--- a/arch/arm/mach-omap2/sram.c
+++ b/arch/arm/mach-omap2/sram.c
@@ -96,8 +96,7 @@ void *omap_sram_push(void *funcp, unsign
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -217,8 +216,7 @@ static void __init omap2_map_sram(void)
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap2_sram_ddr_init)(u32 *slow_dll_ctrl, u32 fast_dll_ctrl,
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -134,10 +134,9 @@ void *alloc_insn_page(void)
 	if (!page)
 		return NULL;
 
-	if (strict_module_rwx_enabled()) {
-		set_memory_ro((unsigned long)page, 1);
-		set_memory_x((unsigned long)page, 1);
-	}
+	if (strict_module_rwx_enabled())
+		set_memory_rox((unsigned long)page, 1);
+
 	return page;
 }
 
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops
 
 	set_vm_flush_reset_perms(trampoline);
 
-	set_memory_ro((unsigned long)trampoline, npages);
-	set_memory_x((unsigned long)trampoline, npages);
+	set_memory_rox((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
 	tramp_free(trampoline);
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -415,17 +415,12 @@ void *alloc_insn_page(void)
 		return NULL;
 
 	set_vm_flush_reset_perms(page);
-	/*
-	 * First make the page read-only, and only then make it executable to
-	 * prevent it from being W+X in between.
-	 */
-	set_memory_ro((unsigned long)page, 1);
 
 	/*
 	 * TODO: Once additional kernel code protection mechanisms are set, ensure
 	 * that the page was not maliciously altered and it is still zeroed.
 	 */
-	set_memory_x((unsigned long)page, 1);
+	set_memory_rox((unsigned long)page, 1);
 
 	return page;
 }
--- a/drivers/misc/sram-exec.c
+++ b/drivers/misc/sram-exec.c
@@ -106,10 +106,7 @@ void *sram_exec_copy(struct gen_pool *po
 
 	dst_cpy = fncpy(dst, src, size);
 
-	ret = set_memory_ro((unsigned long)base, pages);
-	if (ret)
-		goto error_out;
-	ret = set_memory_x((unsigned long)base, pages);
+	ret = set_memory_rox((unsigned long)base, pages);
 	if (ret)
 		goto error_out;
 
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -860,8 +860,7 @@ static inline void bpf_prog_lock_ro(stru
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
 	set_vm_flush_reset_perms(hdr);
-	set_memory_ro((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
-	set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
+	set_memory_rox((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
 }
 
 int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap);
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -14,6 +14,14 @@ static inline int set_memory_x(unsigned
 static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
 #endif
 
+static inline int set_memory_rox(unsigned long addr, int numpages)
+{
+	int ret = set_memory_ro(addr, numpages);
+	if (ret)
+		return ret;
+	return set_memory_x(addr, numpages);
+}
+
 #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
 static inline int set_direct_map_invalid_noflush(struct page *page)
 {
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -494,8 +494,7 @@ static int bpf_struct_ops_map_update_ele
 	refcount_set(&kvalue->refcnt, 1);
 	bpf_map_inc(map);
 
-	set_memory_ro((long)st_map->image, 1);
-	set_memory_x((long)st_map->image, 1);
+	set_memory_rox((long)st_map->image, 1);
 	err = st_ops->reg(kdata);
 	if (likely(!err)) {
 		/* Pair with smp_load_acquire() during lookup_elem().
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -864,8 +864,7 @@ static struct bpf_prog_pack *alloc_new_p
 	list_add_tail(&pack->list, &pack_list);
 
 	set_vm_flush_reset_perms(pack->ptr);
-	set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
+	set_memory_rox((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
 	return pack;
 }
 
@@ -883,8 +882,7 @@ void *bpf_prog_pack_alloc(u32 size, bpf_
 		if (ptr) {
 			bpf_fill_ill_insns(ptr, size);
 			set_vm_flush_reset_perms(ptr);
-			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
-			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
+			set_memory_rox((unsigned long)ptr, size / PAGE_SIZE);
 		}
 		goto out;
 	}
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -468,8 +468,7 @@ static int bpf_trampoline_update(struct
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)im->image, 1);
-	set_memory_x((long)im->image, 1);
+	set_memory_rox((long)im->image, 1);
 
 	WARN_ON(tr->cur_image && tr->selector == 0);
 	WARN_ON(!tr->cur_image && tr->selector);
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -124,8 +124,7 @@ int bpf_struct_ops_test_run(struct bpf_p
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)image, 1);
-	set_memory_x((long)image, 1);
+	set_memory_rox((long)image, 1);
 	prog_ret = dummy_ops_call_op(image, args);
 
 	err = dummy_ops_copy_args(args);
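
One visible effect of the generic helper is that the result of the read-only step is no longer dropped, which most of the converted call sites did. The sketch below mirrors the helper's shape in user space; the `toy_*` stubs, their counters and the preset results are hypothetical.

```c
#include <assert.h>

/* Hypothetical stubs: each records that it ran and returns a preset result. */
static int toy_ro_result, toy_x_result;
static int toy_ro_calls, toy_x_calls;

static int toy_set_memory_ro(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	toy_ro_calls++;
	return toy_ro_result;
}

static int toy_set_memory_x(unsigned long addr, int numpages)
{
	(void)addr;
	(void)numpages;
	toy_x_calls++;
	return toy_x_result;
}

/* Same shape as the generic helper in the patch: stop at the first failure. */
static int toy_set_memory_rox(unsigned long addr, int numpages)
{
	int ret = toy_set_memory_ro(addr, numpages);

	if (ret)
		return ret;
	return toy_set_memory_x(addr, numpages);
}
```

If the read-only step fails, the executable step is never attempted, so a failed call cannot leave the range writable and executable.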


* Re: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
  2022-10-26  7:15   ` Peter Zijlstra
@ 2022-10-26 17:59     ` Linus Torvalds
  2022-10-27  6:59       ` Peter Zijlstra
  2022-11-02  9:12     ` [tip: x86/mm] mm: Introduce set_memory_rox() tip-bot2 for Peter Zijlstra
  2022-12-17 18:55     ` tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2022-10-26 17:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: rostedt, dave.hansen, linux-kernel, x86, keescook, seanjc

On Wed, Oct 26, 2022 at 12:15 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Right, and we have it all over the place. Something like the below
> perhaps? I'll feed it to the robots, see if something breaks.

I was nodding along with the patches like this:

> -       set_memory_ro(base, pages);
> -       set_memory_x(base, pages);
> +       set_memory_rox(base, pages);

but then I got to this part:

> +static inline int set_memory_rox(unsigned long addr, int numpages)
> +{
> +       int ret = set_memory_ro(addr, numpages);
> +       if (ret)
> +               return ret;
> +       return set_memory_x(addr, numpages);
> +}

and that's when I went "no, I really meant make it one single call".

set_memory_ro() and set_memory_x() basically end up doing the exact
same thing, just with different bits.  So it's not only silly to have
the callers do two different calls, it's silly to have the
*implementation* do two different scans of the page tables and page
merging/splitting.

I think in the case of x86, the set_memory_rox() function would
basically just be

    int set_memory_rox(unsigned long addr, int numpages)
    {
        pgprot_t clr = __pgprot(_PAGE_RW);
        pgprot_t set = { 0 };

        if (__supported_pte_mask & _PAGE_NX)
                set.pgprot |= _PAGE_NX;

        return change_page_attr_set_clr(&addr, numpages, set, clr, 0, NULL);
    }

or something close to that. (NOTE! The above was cobbled together in
the MUA, hasn't seen a compiler much less been booted, and might be
completely broken for some reason - it's meant to be the concept, not
some kind of final word).

IOW, the whole "set_rox" is really just a _single_
change_page_attr_set_clr() call.

Maybe you meant to do that, and this patch was just prep-work for the
arch code being the second stage?

                  Linus
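
The "one call instead of two" argument can be made concrete by counting walks. The following is a rough user-space model, assuming a single function that applies a set mask and a clear mask in one pass; `toy_change_attr()` is only loosely modelled on change_page_attr_set_clr(), and the bit values are invented.

```c
#include <assert.h>

/* Hypothetical permission bits, not the x86 definitions. */
#define TOY_RW 0x1u	/* writable */
#define TOY_NX 0x2u	/* not executable */

static unsigned int toy_walks;	/* counts simulated page-table walks */

/* One walk applies both masks: clear first, then set. */
static unsigned int toy_change_attr(unsigned int prot, unsigned int set,
				    unsigned int clr)
{
	toy_walks++;
	return (prot & ~clr) | set;
}

/* Two separate calls, as in set_memory_ro() followed by set_memory_x(). */
static unsigned int toy_rox_two_calls(unsigned int prot)
{
	prot = toy_change_attr(prot, 0, TOY_RW);
	return toy_change_attr(prot, 0, TOY_NX);
}

/* One combined call that clears both bits in a single walk. */
static unsigned int toy_rox_one_call(unsigned int prot)
{
	return toy_change_attr(prot, 0, TOY_RW | TOY_NX);
}
```

Both variants end with the same permissions; the combined one gets there with half the walks in this model.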


* Re: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
  2022-10-26 17:59     ` Linus Torvalds
@ 2022-10-27  6:59       ` Peter Zijlstra
  2022-10-29 11:30         ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-27  6:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: rostedt, dave.hansen, linux-kernel, x86, keescook, seanjc

On Wed, Oct 26, 2022 at 10:59:29AM -0700, Linus Torvalds wrote:

> Maybe you meant to do that, and this patch was just prep-work for the
> arch code being the second stage?

Yeah; also, since this is cross arch, we need a fallback. Anyway;
robots hated on me for missing a few includes. I'll go prod at this
more.


* Re: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
  2022-10-27  6:59       ` Peter Zijlstra
@ 2022-10-29 11:30         ` Peter Zijlstra
  2022-10-29 17:35           ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2022-10-29 11:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: rostedt, dave.hansen, linux-kernel, x86, keescook, seanjc

On Thu, Oct 27, 2022 at 08:59:45AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 26, 2022 at 10:59:29AM -0700, Linus Torvalds wrote:
> 
> > Maybe you meant to do that, and this patch was just prep-work for the
> > arch code being the second stage?
> 
> Yeah; also, since this is cross arch, we need a fallback. Anyway;
> robots hated on me for missing a few includes. I'll go prod at this
> more.

Got around to it; I added the below patch on top and things seem to
still boot so it must be good :-)

---
Subject: x86/mm: Implement native set_memory_rox()
From: Peter Zijlstra <peterz@infradead.org>
Date: Sat Oct 29 13:19:31 CEST 2022

Provide a native implementation of set_memory_rox(), avoiding the
double set_memory_ro();set_memory_x(); calls.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/set_memory.h |    3 +++
 arch/x86/mm/pat/set_memory.c      |   10 ++++++++++
 include/linux/set_memory.h        |    2 ++
 3 files changed, 15 insertions(+)

--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -6,6 +6,9 @@
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
 
+#define set_memory_rox set_memory_rox
+int set_memory_rox(unsigned long addr, int numpages);
+
 /*
  * The set_memory_* API can be used to change various attributes of a virtual
  * address range. The attributes include:
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2025,6 +2025,16 @@ int set_memory_ro(unsigned long addr, in
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
 }
 
+int set_memory_rox(unsigned long addr, int numpages)
+{
+	pgprot_t clr = __pgprot(_PAGE_RW);
+
+	if (__supported_pte_mask & _PAGE_NX)
+		clr.pgprot |= _PAGE_NX;
+
+	return change_page_attr_clear(&addr, numpages, clr, 0);
+}
+
 int set_memory_rw(unsigned long addr, int numpages)
 {
 	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0);
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -14,6 +14,7 @@ static inline int set_memory_x(unsigned
 static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
 #endif
 
+#ifndef set_memory_rox
 static inline int set_memory_rox(unsigned long addr, int numpages)
 {
 	int ret = set_memory_ro(addr, numpages);
@@ -21,6 +22,7 @@ static inline int set_memory_rox(unsigne
 		return ret;
 	return set_memory_x(addr, numpages);
 }
+#endif
 
 #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
 static inline int set_direct_map_invalid_noflush(struct page *page)
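
The "#define set_memory_rox set_memory_rox" line in the arch header pairs with the "#ifndef set_memory_rox" guard in the generic header: an architecture announces its own implementation by defining a macro with the function's name, and the generic fallback is compiled only when nobody did. A minimal single-file sketch of that override idiom follows; the toy_* names are hypothetical and the "headers" are just comment-delimited sections of one file.

```c
#include <assert.h>

/* --- what a (hypothetical) architecture header would provide --- */
#define toy_set_rox toy_set_rox	/* tells generic code: do not define yours */
static inline int toy_set_rox(int *calls)
{
	*calls += 1;		/* native: one call does both changes */
	return 0;
}

/* --- what the generic header provides --- */
static inline int toy_set_ro(int *calls) { *calls += 1; return 0; }
static inline int toy_set_x(int *calls)  { *calls += 1; return 0; }

#ifndef toy_set_rox
/* Fallback, compiled only when no architecture defined the macro. */
static inline int toy_set_rox(int *calls)
{
	int ret = toy_set_ro(calls);

	if (ret)
		return ret;
	return toy_set_x(calls);
}
#endif
```

With the #define present, toy_set_rox() resolves to the one-call version; deleting the "architecture" section makes the two-call fallback take over without any caller changing.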

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping
  2022-10-29 11:30         ` Peter Zijlstra
@ 2022-10-29 17:35           ` Linus Torvalds
  0 siblings, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2022-10-29 17:35 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: rostedt, dave.hansen, linux-kernel, x86, keescook, seanjc

On Sat, Oct 29, 2022 at 4:30 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Got around to it; I added the below patch on top and things seem to
> still boot so it must be good :-)

Thanks, looks good to me, and as I see your simpler version, I
realized that my broken MUA version wasn't "rox", it was "ronx".

                Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [tip: x86/mm] mm: Introduce set_memory_rox()
  2022-10-26  7:15   ` Peter Zijlstra
  2022-10-26 17:59     ` Linus Torvalds
@ 2022-11-02  9:12     ` tip-bot2 for Peter Zijlstra
  2022-12-17 18:55     ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-11-02  9:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Linus Torvalds, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     1f6eae43052889579dae56eae275003b9a876c21
Gitweb:        https://git.kernel.org/tip/1f6eae43052889579dae56eae275003b9a876c21
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 26 Oct 2022 12:13:03 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Nov 2022 13:43:58 +01:00

mm: Introduce set_memory_rox()

Because endlessly repeating:

	set_memory_ro()
	set_memory_x()

is getting tedious.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
---
 arch/arm/mach-omap1/sram-init.c |  8 +++-----
 arch/arm/mach-omap2/sram.c      |  8 +++-----
 arch/powerpc/kernel/kprobes.c   |  9 ++++-----
 arch/x86/kernel/ftrace.c        |  5 ++---
 arch/x86/kernel/kprobes/core.c  |  9 ++-------
 drivers/misc/sram-exec.c        |  7 ++-----
 include/linux/filter.h          |  3 +--
 include/linux/set_memory.h      |  8 ++++++++
 kernel/bpf/bpf_struct_ops.c     |  3 +--
 kernel/bpf/core.c               |  6 ++----
 kernel/bpf/trampoline.c         |  3 +--
 net/bpf/bpf_dummy_struct_ops.c  |  3 +--
 12 files changed, 30 insertions(+), 42 deletions(-)

diff --git a/arch/arm/mach-omap1/sram-init.c b/arch/arm/mach-omap1/sram-init.c
index 27c42e2..dabf0c4 100644
--- a/arch/arm/mach-omap1/sram-init.c
+++ b/arch/arm/mach-omap1/sram-init.c
@@ -10,11 +10,11 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/io.h>
+#include <linux/set_memory.h>
 
 #include <asm/fncpy.h>
 #include <asm/tlb.h>
 #include <asm/cacheflush.h>
-#include <asm/set_memory.h>
 
 #include <asm/mach/map.h>
 
@@ -74,8 +74,7 @@ void *omap_sram_push(void *funcp, unsigned long size)
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -126,8 +125,7 @@ static void __init omap_detect_and_map_sram(void)
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap_sram_reprogram_clock)(u32 dpllctl, u32 ckctl);
diff --git a/arch/arm/mach-omap2/sram.c b/arch/arm/mach-omap2/sram.c
index 39cf270..815d390 100644
--- a/arch/arm/mach-omap2/sram.c
+++ b/arch/arm/mach-omap2/sram.c
@@ -14,11 +14,11 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/io.h>
+#include <linux/set_memory.h>
 
 #include <asm/fncpy.h>
 #include <asm/tlb.h>
 #include <asm/cacheflush.h>
-#include <asm/set_memory.h>
 
 #include <asm/mach/map.h>
 
@@ -96,8 +96,7 @@ void *omap_sram_push(void *funcp, unsigned long size)
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -217,8 +216,7 @@ static void __init omap2_map_sram(void)
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap2_sram_ddr_init)(u32 *slow_dll_ctrl, u32 fast_dll_ctrl,
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bd7b1a0..7a89de3 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -20,12 +20,12 @@
 #include <linux/kdebug.h>
 #include <linux/slab.h>
 #include <linux/moduleloader.h>
+#include <linux/set_memory.h>
 #include <asm/code-patching.h>
 #include <asm/cacheflush.h>
 #include <asm/sstep.h>
 #include <asm/sections.h>
 #include <asm/inst.h>
-#include <asm/set_memory.h>
 #include <linux/uaccess.h>
 
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
@@ -134,10 +134,9 @@ void *alloc_insn_page(void)
 	if (!page)
 		return NULL;
 
-	if (strict_module_rwx_enabled()) {
-		set_memory_ro((unsigned long)page, 1);
-		set_memory_x((unsigned long)page, 1);
-	}
+	if (strict_module_rwx_enabled())
+		set_memory_rox((unsigned long)page, 1);
+
 	return page;
 }
 
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 43628b8..0357946 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -24,10 +24,10 @@
 #include <linux/module.h>
 #include <linux/memory.h>
 #include <linux/vmalloc.h>
+#include <linux/set_memory.h>
 
 #include <trace/syscall.h>
 
-#include <asm/set_memory.h>
 #include <asm/kprobes.h>
 #include <asm/ftrace.h>
 #include <asm/nops.h>
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	set_vm_flush_reset_perms(trampoline);
 
-	set_memory_ro((unsigned long)trampoline, npages);
-	set_memory_x((unsigned long)trampoline, npages);
+	set_memory_rox((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
 	tramp_free(trampoline);
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index eb8bc82..e7b7ca6 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -43,6 +43,7 @@
 #include <linux/objtool.h>
 #include <linux/vmalloc.h>
 #include <linux/pgtable.h>
+#include <linux/set_memory.h>
 
 #include <asm/text-patching.h>
 #include <asm/cacheflush.h>
@@ -51,7 +52,6 @@
 #include <asm/alternative.h>
 #include <asm/insn.h>
 #include <asm/debugreg.h>
-#include <asm/set_memory.h>
 #include <asm/ibt.h>
 
 #include "common.h"
@@ -415,17 +415,12 @@ void *alloc_insn_page(void)
 		return NULL;
 
 	set_vm_flush_reset_perms(page);
-	/*
-	 * First make the page read-only, and only then make it executable to
-	 * prevent it from being W+X in between.
-	 */
-	set_memory_ro((unsigned long)page, 1);
 
 	/*
 	 * TODO: Once additional kernel code protection mechanisms are set, ensure
 	 * that the page was not maliciously altered and it is still zeroed.
 	 */
-	set_memory_x((unsigned long)page, 1);
+	set_memory_rox((unsigned long)page, 1);
 
 	return page;
 }
diff --git a/drivers/misc/sram-exec.c b/drivers/misc/sram-exec.c
index a948e95..b71dbbd 100644
--- a/drivers/misc/sram-exec.c
+++ b/drivers/misc/sram-exec.c
@@ -10,9 +10,9 @@
 #include <linux/genalloc.h>
 #include <linux/mm.h>
 #include <linux/sram.h>
+#include <linux/set_memory.h>
 
 #include <asm/fncpy.h>
-#include <asm/set_memory.h>
 
 #include "sram.h"
 
@@ -106,10 +106,7 @@ void *sram_exec_copy(struct gen_pool *pool, void *dst, void *src,
 
 	dst_cpy = fncpy(dst, src, size);
 
-	ret = set_memory_ro((unsigned long)base, pages);
-	if (ret)
-		goto error_out;
-	ret = set_memory_x((unsigned long)base, pages);
+	ret = set_memory_rox((unsigned long)base, pages);
 	if (ret)
 		goto error_out;
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index efc42a6..f0b17af 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -860,8 +860,7 @@ static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
 	set_vm_flush_reset_perms(hdr);
-	set_memory_ro((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
-	set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
+	set_memory_rox((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
 }
 
 int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap);
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 369769c..023ebc6 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -14,6 +14,14 @@ static inline int set_memory_x(unsigned long addr,  int numpages) { return 0; }
 static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
 #endif
 
+static inline int set_memory_rox(unsigned long addr, int numpages)
+{
+	int ret = set_memory_ro(addr, numpages);
+	if (ret)
+		return ret;
+	return set_memory_x(addr, numpages);
+}
+
 #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
 static inline int set_direct_map_invalid_noflush(struct page *page)
 {
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 84b2d9d..ece9870 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -494,8 +494,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
 	refcount_set(&kvalue->refcnt, 1);
 	bpf_map_inc(map);
 
-	set_memory_ro((long)st_map->image, 1);
-	set_memory_x((long)st_map->image, 1);
+	set_memory_rox((long)st_map->image, 1);
 	err = st_ops->reg(kdata);
 	if (likely(!err)) {
 		/* Pair with smp_load_acquire() during lookup_elem().
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 25a54e0..b0525ea 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -864,8 +864,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
 	list_add_tail(&pack->list, &pack_list);
 
 	set_vm_flush_reset_perms(pack->ptr);
-	set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
+	set_memory_rox((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
 	return pack;
 }
 
@@ -883,8 +882,7 @@ void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
 		if (ptr) {
 			bpf_fill_ill_insns(ptr, size);
 			set_vm_flush_reset_perms(ptr);
-			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
-			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
+			set_memory_rox((unsigned long)ptr, size / PAGE_SIZE);
 		}
 		goto out;
 	}
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index bf0906e..a848922 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -468,8 +468,7 @@ again:
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)im->image, 1);
-	set_memory_x((long)im->image, 1);
+	set_memory_rox((long)im->image, 1);
 
 	WARN_ON(tr->cur_image && tr->selector == 0);
 	WARN_ON(!tr->cur_image && tr->selector);
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index e78dadf..9ff3232 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -124,8 +124,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)image, 1);
-	set_memory_x((long)image, 1);
+	set_memory_rox((long)image, 1);
 	prog_ret = dummy_ops_call_op(image, args);
 
 	err = dummy_ops_copy_args(args);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip: x86/mm] x86/mm: Do verify W^X at boot up
  2022-10-25 20:07 ` [PATCH 5/5] x86/mm: Do verify W^X at boot up Peter Zijlstra
@ 2022-11-02  9:12   ` tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-11-02  9:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     237c7e967566ca7048fd7e74951fccb026f92df0
Gitweb:        https://git.kernel.org/tip/237c7e967566ca7048fd7e74951fccb026f92df0
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:39:43 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Nov 2022 13:43:58 +01:00

x86/mm: Do verify W^X at boot up

Straight up revert of commit:

  a970174d7a10 ("x86/mm: Do not verify W^X at boot up")

now that the root cause has been fixed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201058.011279208@infradead.org
---
 arch/x86/mm/pat/set_memory.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 2e5a045..97342c4 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -587,10 +587,6 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
 {
 	unsigned long end;
 
-	/* Kernel text is rw at boot up */
-	if (system_state == SYSTEM_BOOTING)
-		return new;
-
 	/*
 	 * 32-bit has some unfixable W+X issues, like EFI code
 	 * and writeable data being in the same page.  Disable

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip: x86/mm] x86/ftrace: Remove SYSTEM_BOOTING exceptions
  2022-10-25 20:07 ` [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions Peter Zijlstra
  2022-10-25 20:59   ` Steven Rostedt
@ 2022-11-02  9:12   ` tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-11-02  9:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     52a56f20bb7c34ed4b48466ad2d443165fad942f
Gitweb:        https://git.kernel.org/tip/52a56f20bb7c34ed4b48466ad2d443165fad942f
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:39:47 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Nov 2022 13:43:58 +01:00

x86/ftrace: Remove SYSTEM_BOOTING exceptions

Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.

Specifically, this cures a W+X case during boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
---
 arch/x86/kernel/alternative.c | 10 ----------
 arch/x86/kernel/ftrace.c      |  3 +--
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5cadcea..e240351 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1681,11 +1681,6 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
 {
 	struct text_poke_loc *tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_flush(addr);
 
 	tp = &tp_vec[tp_vec_nr++];
@@ -1707,11 +1702,6 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *
 {
 	struct text_poke_loc tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index bd16500..43628b8 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	set_vm_flush_reset_perms(trampoline);
 
-	if (likely(system_state != SYSTEM_BOOTING))
-		set_memory_ro((unsigned long)trampoline, npages);
+	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip: x86/mm] x86/mm: Initialize text poking earlier
  2022-10-25 20:06 ` [PATCH 3/5] x86/mm: Initialize text poking earlier Peter Zijlstra
@ 2022-11-02  9:12   ` tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-11-02  9:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     4b6f3a4cd681fa54e67c2987dfebe413cd8d5a59
Gitweb:        https://git.kernel.org/tip/4b6f3a4cd681fa54e67c2987dfebe413cd8d5a59
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:38:25 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Nov 2022 13:43:57 +01:00

x86/mm: Initialize text poking earlier

Move poking_init() up a bunch; specifically move it right after
mm_init() which is right before ftrace_init().

This will allow simplifying ftrace text poking which currently has
a bunch of exceptions for early boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.881703081@infradead.org
---
 init/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/init/main.c b/init/main.c
index f1d1a54..5372ea2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -996,7 +996,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	sort_main_extable();
 	trap_init();
 	mm_init();
-
+	poking_init();
 	ftrace_init();
 
 	/* trace_printk can be enabled here */
@@ -1135,7 +1135,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	taskstats_init_early();
 	delayacct_init();
 
-	poking_init();
 	check_bugs();
 
 	acpi_subsystem_init();
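
To make the ordering argument concrete, here is a toy userspace model of the boot sequence. It is only a sketch: all toy_* names are invented, the "pokes" are counters rather than real text patching, and the early path loosely stands in for the text_poke_early() fallback that the SYSTEM_BOOTING checks used to select. It shows that with poking set up after ftrace, the ftrace patching has to take the early direct-write path, while the new order avoids it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's own symbols. */
static int toy_mm_cache_ready;	/* set by toy_mm_init() */
static int toy_poking_ready;	/* set by toy_poking_init() */
static int toy_early_pokes;	/* writes that bypassed the poking mm */
static int toy_mapped_pokes;	/* writes that went through the poking mm */

static void toy_mm_init(void) { toy_mm_cache_ready = 1; }

/* Needs the mm cache, so it cannot run before toy_mm_init(). */
static void toy_poking_init(void)
{
	assert(toy_mm_cache_ready);
	toy_poking_ready = 1;
}

/* Patch one site: use the poking mm when it exists, else write directly. */
static void toy_text_poke(void)
{
	if (toy_poking_ready)
		toy_mapped_pokes++;
	else
		toy_early_pokes++;	/* direct write: text must be writable */
}

static void toy_ftrace_init(void) { toy_text_poke(); }

typedef void (*toy_initcall)(void);

/* Run a boot sequence from a clean state; return the number of early pokes. */
static int toy_boot(const toy_initcall *seq, size_t n)
{
	toy_mm_cache_ready = toy_poking_ready = 0;
	toy_early_pokes = toy_mapped_pokes = 0;
	for (size_t i = 0; i < n; i++)
		seq[i]();
	return toy_early_pokes;
}
```

Running toy_boot() over { toy_mm_init, toy_ftrace_init, toy_poking_init } yields one early poke; over { toy_mm_init, toy_poking_init, toy_ftrace_init } it yields none.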

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip: x86/mm] x86/mm: Use mm_alloc() in poking_init()
  2022-10-25 20:06 ` [PATCH 2/5] x86/mm: Use mm_alloc() in poking_init() Peter Zijlstra
@ 2022-11-02  9:12   ` tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-11-02  9:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     107b6828a7cde2de0bb293588a59892831cef78b
Gitweb:        https://git.kernel.org/tip/107b6828a7cde2de0bb293588a59892831cef78b
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:38:21 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Nov 2022 13:43:57 +01:00

x86/mm: Use mm_alloc() in poking_init()

Instead of duplicating init_mm, allocate a fresh mm. The advantage is
that mm_alloc() has much simpler dependencies. Additionally it makes
more conceptual sense, init_mm has no (and must not have) user state
to duplicate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
---
 arch/x86/mm/init.c         | 2 +-
 include/linux/sched/task.h | 1 -
 kernel/fork.c              | 5 -----
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 9121bc1..d398735 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -801,7 +801,7 @@ void __init poking_init(void)
 	spinlock_t *ptl;
 	pte_t *ptep;
 
-	poking_mm = copy_init_mm();
+	poking_mm = mm_alloc();
 	BUG_ON(!poking_mm);
 
 	/*
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 8431558..357e006 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -91,7 +91,6 @@ extern void exit_itimers(struct task_struct *);
 extern pid_t kernel_clone(struct kernel_clone_args *kargs);
 struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
 struct task_struct *fork_idle(int);
-struct mm_struct *copy_init_mm(void);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern pid_t user_mode_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
diff --git a/kernel/fork.c b/kernel/fork.c
index 451ce80..6142c58 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2592,11 +2592,6 @@ struct task_struct * __init fork_idle(int cpu)
 	return task;
 }
 
-struct mm_struct *copy_init_mm(void)
-{
-	return dup_mm(NULL, &init_mm);
-}
-
 /*
  * This is like kernel_clone(), but shaved down and tailored to just
  * creating io_uring workers. It returns a created task, or an error pointer.

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip: x86/mm] mm: Move mm_cachep initialization to mm_init()
  2022-10-25 20:06 ` [PATCH 1/5] mm: Move mm_cachep initialization to mm_init() Peter Zijlstra
@ 2022-11-02  9:12   ` tip-bot2 for Peter Zijlstra
  2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-11-02  9:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     a2e87952bf54b99e8d560c095a2c75ebc676e1fb
Gitweb:        https://git.kernel.org/tip/a2e87952bf54b99e8d560c095a2c75ebc676e1fb
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:38:18 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Nov 2022 13:43:56 +01:00

mm: Move mm_cachep initialization to mm_init()

In order to allow using mm_alloc() much earlier, move initializing
mm_cachep into mm_init().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
---
 include/linux/sched/task.h |  1 +
 init/main.c                |  1 +
 kernel/fork.c              | 32 ++++++++++++++++++--------------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index d6c4816..8431558 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -65,6 +65,7 @@ extern void sched_dead(struct task_struct *p);
 void __noreturn do_task_dead(void);
 void __noreturn make_task_dead(int signr);
 
+extern void mm_cache_init(void);
 extern void proc_caches_init(void);
 
 extern void fork_init(void);
diff --git a/init/main.c b/init/main.c
index aa21add..f1d1a54 100644
--- a/init/main.c
+++ b/init/main.c
@@ -860,6 +860,7 @@ static void __init mm_init(void)
 	/* Should be run after espfix64 is set up. */
 	pti_init();
 	kmsan_init_runtime();
+	mm_cache_init();
 }
 
 #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5..451ce80 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3015,10 +3015,27 @@ static void sighand_ctor(void *data)
 	init_waitqueue_head(&sighand->signalfd_wqh);
 }
 
-void __init proc_caches_init(void)
+void __init mm_cache_init(void)
 {
 	unsigned int mm_size;
 
+	/*
+	 * The mm_cpumask is located at the end of mm_struct, and is
+	 * dynamically sized based on the maximum CPU number this system
+	 * can have, taking hotplug into account (nr_cpu_ids).
+	 */
+	mm_size = sizeof(struct mm_struct) + cpumask_size();
+
+	mm_cachep = kmem_cache_create_usercopy("mm_struct",
+			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
+			offsetof(struct mm_struct, saved_auxv),
+			sizeof_field(struct mm_struct, saved_auxv),
+			NULL);
+}
+
+void __init proc_caches_init(void)
+{
 	sighand_cachep = kmem_cache_create("sighand_cache",
 			sizeof(struct sighand_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
@@ -3036,19 +3053,6 @@ void __init proc_caches_init(void)
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
 
-	/*
-	 * The mm_cpumask is located at the end of mm_struct, and is
-	 * dynamically sized based on the maximum CPU number this system
-	 * can have, taking hotplug into account (nr_cpu_ids).
-	 */
-	mm_size = sizeof(struct mm_struct) + cpumask_size();
-
-	mm_cachep = kmem_cache_create_usercopy("mm_struct",
-			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
-			offsetof(struct mm_struct, saved_auxv),
-			sizeof_field(struct mm_struct, saved_auxv),
-			NULL);
 	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
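
The comment moved by this patch explains why mm_size is computed at boot: the cpumask sits at the end of the structure and its size depends on the number of possible CPUs. A small userspace sketch of that layout, assuming a hypothetical struct toy_mm with a C99 flexible array member and calloc() in place of the slab cache:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a struct with a trailing, runtime-sized bitmap. */
struct toy_mm {
	int users;
	unsigned long cpu_bitmap[];	/* one bit per possible CPU */
};

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes for a bitmap covering nr_cpus bits, rounded up to whole longs. */
static size_t toy_cpumask_size(unsigned int nr_cpus)
{
	size_t longs = (nr_cpus + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}

/* Object size is only known at run time, so it is computed, not constant. */
static size_t toy_mm_size(unsigned int nr_cpus)
{
	return sizeof(struct toy_mm) + toy_cpumask_size(nr_cpus);
}

/* Allocate a zeroed object sized for nr_cpus; the caller frees it. */
static struct toy_mm *toy_mm_alloc(unsigned int nr_cpus)
{
	return calloc(1, toy_mm_size(nr_cpus));
}
```

Because the size feeds the allocator once, at cache-creation time, the CPU count has to be known before the cache is set up; the sketch makes the same point by taking nr_cpus as an argument.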

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip: x86/mm] x86/mm: Do verify W^X at boot up
  2022-10-25 20:07 ` [PATCH 5/5] x86/mm: Do verify W^X at boot up Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
@ 2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-12-17 18:55 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     414ebf148cb5c5fa727ec51fdb69c4ab82dccf3b
Gitweb:        https://git.kernel.org/tip/414ebf148cb5c5fa727ec51fdb69c4ab82dccf3b
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:39:43 +02:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Thu, 15 Dec 2022 10:37:26 -08:00

x86/mm: Do verify W^X at boot up

Straight up revert of commit:

  a970174d7a10 ("x86/mm: Do not verify W^X at boot up")

now that the root cause has been fixed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201058.011279208@infradead.org
---
 arch/x86/mm/pat/set_memory.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 2e5a045..97342c4 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -587,10 +587,6 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
 {
 	unsigned long end;
 
-	/* Kernel text is rw at boot up */
-	if (system_state == SYSTEM_BOOTING)
-		return new;
-
 	/*
 	 * 32-bit has some unfixable W+X issues, like EFI code
 	 * and writeable data being in the same page.  Disable

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip: x86/mm] mm: Introduce set_memory_rox()
  2022-10-26  7:15   ` Peter Zijlstra
  2022-10-26 17:59     ` Linus Torvalds
  2022-11-02  9:12     ` [tip: x86/mm] mm: Introduce set_memory_rox() tip-bot2 for Peter Zijlstra
@ 2022-12-17 18:55     ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-12-17 18:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Linus Torvalds, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     d48567c9a0d1e605639f8a8705a61bbb55fb4e84
Gitweb:        https://git.kernel.org/tip/d48567c9a0d1e605639f8a8705a61bbb55fb4e84
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 26 Oct 2022 12:13:03 +02:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Thu, 15 Dec 2022 10:37:26 -08:00

mm: Introduce set_memory_rox()

Because endlessly repeating:

	set_memory_ro()
	set_memory_x()

is getting tedious.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
---
 arch/arm/mach-omap1/sram-init.c |  8 +++-----
 arch/arm/mach-omap2/sram.c      |  8 +++-----
 arch/powerpc/kernel/kprobes.c   |  9 ++++-----
 arch/x86/kernel/ftrace.c        |  5 ++---
 arch/x86/kernel/kprobes/core.c  |  9 ++-------
 drivers/misc/sram-exec.c        |  7 ++-----
 include/linux/filter.h          |  3 +--
 include/linux/set_memory.h      |  8 ++++++++
 kernel/bpf/bpf_struct_ops.c     |  3 +--
 kernel/bpf/core.c               |  6 ++----
 kernel/bpf/trampoline.c         |  3 +--
 net/bpf/bpf_dummy_struct_ops.c  |  3 +--
 12 files changed, 30 insertions(+), 42 deletions(-)

diff --git a/arch/arm/mach-omap1/sram-init.c b/arch/arm/mach-omap1/sram-init.c
index 27c42e2..dabf0c4 100644
--- a/arch/arm/mach-omap1/sram-init.c
+++ b/arch/arm/mach-omap1/sram-init.c
@@ -10,11 +10,11 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/io.h>
+#include <linux/set_memory.h>
 
 #include <asm/fncpy.h>
 #include <asm/tlb.h>
 #include <asm/cacheflush.h>
-#include <asm/set_memory.h>
 
 #include <asm/mach/map.h>
 
@@ -74,8 +74,7 @@ void *omap_sram_push(void *funcp, unsigned long size)
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -126,8 +125,7 @@ static void __init omap_detect_and_map_sram(void)
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap_sram_reprogram_clock)(u32 dpllctl, u32 ckctl);
diff --git a/arch/arm/mach-omap2/sram.c b/arch/arm/mach-omap2/sram.c
index 39cf270..815d390 100644
--- a/arch/arm/mach-omap2/sram.c
+++ b/arch/arm/mach-omap2/sram.c
@@ -14,11 +14,11 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/io.h>
+#include <linux/set_memory.h>
 
 #include <asm/fncpy.h>
 #include <asm/tlb.h>
 #include <asm/cacheflush.h>
-#include <asm/set_memory.h>
 
 #include <asm/mach/map.h>
 
@@ -96,8 +96,7 @@ void *omap_sram_push(void *funcp, unsigned long size)
 
 	dst = fncpy(sram, funcp, size);
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 
 	return dst;
 }
@@ -217,8 +216,7 @@ static void __init omap2_map_sram(void)
 	base = (unsigned long)omap_sram_base;
 	pages = PAGE_ALIGN(omap_sram_size) / PAGE_SIZE;
 
-	set_memory_ro(base, pages);
-	set_memory_x(base, pages);
+	set_memory_rox(base, pages);
 }
 
 static void (*_omap2_sram_ddr_init)(u32 *slow_dll_ctrl, u32 fast_dll_ctrl,
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bd7b1a0..7a89de3 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -20,12 +20,12 @@
 #include <linux/kdebug.h>
 #include <linux/slab.h>
 #include <linux/moduleloader.h>
+#include <linux/set_memory.h>
 #include <asm/code-patching.h>
 #include <asm/cacheflush.h>
 #include <asm/sstep.h>
 #include <asm/sections.h>
 #include <asm/inst.h>
-#include <asm/set_memory.h>
 #include <linux/uaccess.h>
 
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
@@ -134,10 +134,9 @@ void *alloc_insn_page(void)
 	if (!page)
 		return NULL;
 
-	if (strict_module_rwx_enabled()) {
-		set_memory_ro((unsigned long)page, 1);
-		set_memory_x((unsigned long)page, 1);
-	}
+	if (strict_module_rwx_enabled())
+		set_memory_rox((unsigned long)page, 1);
+
 	return page;
 }
 
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 43628b8..0357946 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -24,10 +24,10 @@
 #include <linux/module.h>
 #include <linux/memory.h>
 #include <linux/vmalloc.h>
+#include <linux/set_memory.h>
 
 #include <trace/syscall.h>
 
-#include <asm/set_memory.h>
 #include <asm/kprobes.h>
 #include <asm/ftrace.h>
 #include <asm/nops.h>
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	set_vm_flush_reset_perms(trampoline);
 
-	set_memory_ro((unsigned long)trampoline, npages);
-	set_memory_x((unsigned long)trampoline, npages);
+	set_memory_rox((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
 	tramp_free(trampoline);
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index eb8bc82..e7b7ca6 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -43,6 +43,7 @@
 #include <linux/objtool.h>
 #include <linux/vmalloc.h>
 #include <linux/pgtable.h>
+#include <linux/set_memory.h>
 
 #include <asm/text-patching.h>
 #include <asm/cacheflush.h>
@@ -51,7 +52,6 @@
 #include <asm/alternative.h>
 #include <asm/insn.h>
 #include <asm/debugreg.h>
-#include <asm/set_memory.h>
 #include <asm/ibt.h>
 
 #include "common.h"
@@ -415,17 +415,12 @@ void *alloc_insn_page(void)
 		return NULL;
 
 	set_vm_flush_reset_perms(page);
-	/*
-	 * First make the page read-only, and only then make it executable to
-	 * prevent it from being W+X in between.
-	 */
-	set_memory_ro((unsigned long)page, 1);
 
 	/*
 	 * TODO: Once additional kernel code protection mechanisms are set, ensure
 	 * that the page was not maliciously altered and it is still zeroed.
 	 */
-	set_memory_x((unsigned long)page, 1);
+	set_memory_rox((unsigned long)page, 1);
 
 	return page;
 }
diff --git a/drivers/misc/sram-exec.c b/drivers/misc/sram-exec.c
index a948e95..b71dbbd 100644
--- a/drivers/misc/sram-exec.c
+++ b/drivers/misc/sram-exec.c
@@ -10,9 +10,9 @@
 #include <linux/genalloc.h>
 #include <linux/mm.h>
 #include <linux/sram.h>
+#include <linux/set_memory.h>
 
 #include <asm/fncpy.h>
-#include <asm/set_memory.h>
 
 #include "sram.h"
 
@@ -106,10 +106,7 @@ void *sram_exec_copy(struct gen_pool *pool, void *dst, void *src,
 
 	dst_cpy = fncpy(dst, src, size);
 
-	ret = set_memory_ro((unsigned long)base, pages);
-	if (ret)
-		goto error_out;
-	ret = set_memory_x((unsigned long)base, pages);
+	ret = set_memory_rox((unsigned long)base, pages);
 	if (ret)
 		goto error_out;
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index efc42a6..f0b17af 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -860,8 +860,7 @@ static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
 	set_vm_flush_reset_perms(hdr);
-	set_memory_ro((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
-	set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
+	set_memory_rox((unsigned long)hdr, hdr->size >> PAGE_SHIFT);
 }
 
 int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap);
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 369769c..023ebc6 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -14,6 +14,14 @@ static inline int set_memory_x(unsigned long addr,  int numpages) { return 0; }
 static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
 #endif
 
+static inline int set_memory_rox(unsigned long addr, int numpages)
+{
+	int ret = set_memory_ro(addr, numpages);
+	if (ret)
+		return ret;
+	return set_memory_x(addr, numpages);
+}
+
 #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP
 static inline int set_direct_map_invalid_noflush(struct page *page)
 {
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 84b2d9d..ece9870 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -494,8 +494,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
 	refcount_set(&kvalue->refcnt, 1);
 	bpf_map_inc(map);
 
-	set_memory_ro((long)st_map->image, 1);
-	set_memory_x((long)st_map->image, 1);
+	set_memory_rox((long)st_map->image, 1);
 	err = st_ops->reg(kdata);
 	if (likely(!err)) {
 		/* Pair with smp_load_acquire() during lookup_elem().
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 25a54e0..b0525ea 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -864,8 +864,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
 	list_add_tail(&pack->list, &pack_list);
 
 	set_vm_flush_reset_perms(pack->ptr);
-	set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
-	set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
+	set_memory_rox((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE);
 	return pack;
 }
 
@@ -883,8 +882,7 @@ void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
 		if (ptr) {
 			bpf_fill_ill_insns(ptr, size);
 			set_vm_flush_reset_perms(ptr);
-			set_memory_ro((unsigned long)ptr, size / PAGE_SIZE);
-			set_memory_x((unsigned long)ptr, size / PAGE_SIZE);
+			set_memory_rox((unsigned long)ptr, size / PAGE_SIZE);
 		}
 		goto out;
 	}
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index bf0906e..a848922 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -468,8 +468,7 @@ again:
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)im->image, 1);
-	set_memory_x((long)im->image, 1);
+	set_memory_rox((long)im->image, 1);
 
 	WARN_ON(tr->cur_image && tr->selector == 0);
 	WARN_ON(!tr->cur_image && tr->selector);
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index e78dadf..9ff3232 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -124,8 +124,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (err < 0)
 		goto out;
 
-	set_memory_ro((long)image, 1);
-	set_memory_x((long)image, 1);
+	set_memory_rox((long)image, 1);
 	prog_ret = dummy_ops_call_op(image, args);
 
 	err = dummy_ops_copy_args(args);
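
For readers following the conversions above, here is a minimal userspace
sketch of the ordering that set_memory_rox() encodes: clear write
permission first, then grant execute, and return early if the first step
fails. It is only a model, not kernel code. struct model_page, the PERM_*
bits and every model_* function are hypothetical names invented for
illustration; the real set_memory_ro() and set_memory_x() operate on
kernel page tables.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one page's protection bits (not a kernel type). */
enum { PERM_W = 1 << 0, PERM_X = 1 << 1 };

struct model_page {
	unsigned int perms;	/* current PERM_* bits */
	int fail_ro;		/* when non-zero, model_set_ro() fails */
	int saw_wx;		/* set if the page was ever W and X at once */
};

/* Record whether the page is writable and executable at the same time. */
static void model_check(struct model_page *p)
{
	if ((p->perms & PERM_W) && (p->perms & PERM_X))
		p->saw_wx = 1;
}

/* Stand-in for set_memory_ro(): drop write permission, or fail. */
static int model_set_ro(struct model_page *p)
{
	if (p->fail_ro)
		return -ENOMEM;
	p->perms &= ~PERM_W;
	model_check(p);
	return 0;
}

/* Stand-in for set_memory_x(): grant execute permission. */
static int model_set_x(struct model_page *p)
{
	p->perms |= PERM_X;
	model_check(p);
	return 0;
}

/* Same shape as the helper in the patch: read-only first, then executable. */
static int model_set_rox(struct model_page *p)
{
	int ret = model_set_ro(p);
	if (ret)
		return ret;
	return model_set_x(p);
}

/* Exercise the success path and the early-return path; returns 0. */
int model_rox_demo(void)
{
	struct model_page ok = { .perms = PERM_W };
	struct model_page bad = { .perms = PERM_W, .fail_ro = 1 };

	assert(model_set_rox(&ok) == 0);
	assert(ok.perms == PERM_X && !ok.saw_wx);

	assert(model_set_rox(&bad) == -ENOMEM);
	assert(bad.perms == PERM_W);	/* untouched: writable, not executable */
	return 0;
}
```

Because write permission is dropped before execute is granted, the
modelled page is never writable and executable at the same time, and a
failure in the first step leaves the page non-executable.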


* [tip: x86/mm] x86/mm: Initialize text poking earlier
  2022-10-25 20:06 ` [PATCH 3/5] x86/mm: Initialize text poking earlier Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
@ 2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-12-17 18:55 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     5b93a83649c7cba3a15eb7e8959b250841acb1b1
Gitweb:        https://git.kernel.org/tip/5b93a83649c7cba3a15eb7e8959b250841acb1b1
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:38:25 +02:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Thu, 15 Dec 2022 10:37:26 -08:00

x86/mm: Initialize text poking earlier

Move poking_init() up a bunch; specifically move it right after
mm_init() which is right before ftrace_init().

This will allow simplifying ftrace text poking which currently has
a bunch of exceptions for early boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.881703081@infradead.org
---
 init/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/init/main.c b/init/main.c
index f1d1a54..5372ea2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -996,7 +996,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	sort_main_extable();
 	trap_init();
 	mm_init();
-
+	poking_init();
 	ftrace_init();
 
 	/* trace_printk can be enabled here */
@@ -1135,7 +1135,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	taskstats_init_early();
 	delayacct_init();
 
-	poking_init();
 	check_bugs();
 
 	acpi_subsystem_init();


* [tip: x86/mm] x86/ftrace: Remove SYSTEM_BOOTING exceptions
  2022-10-25 20:07 ` [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions Peter Zijlstra
  2022-10-25 20:59   ` Steven Rostedt
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
@ 2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-12-17 18:55 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     eb7d389d5b2b3c453332abc41c3eea73290cc006
Gitweb:        https://git.kernel.org/tip/eb7d389d5b2b3c453332abc41c3eea73290cc006
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:39:47 +02:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Thu, 15 Dec 2022 10:37:26 -08:00

x86/ftrace: Remove SYSTEM_BOOTING exceptions

Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.

Specifically, this cures a W+X case during boot.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
---
 arch/x86/kernel/alternative.c | 10 ----------
 arch/x86/kernel/ftrace.c      |  3 +--
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5cadcea..e240351 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1681,11 +1681,6 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
 {
 	struct text_poke_loc *tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_flush(addr);
 
 	tp = &tp_vec[tp_vec_nr++];
@@ -1707,11 +1702,6 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *
 {
 	struct text_poke_loc tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index bd16500..43628b8 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	set_vm_flush_reset_perms(trampoline);
 
-	if (likely(system_state != SYSTEM_BOOTING))
-		set_memory_ro((unsigned long)trampoline, npages);
+	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
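
The create_trampoline() hunk above can be illustrated with a small
userspace model of why the removed SYSTEM_BOOTING check produced a W+X
mapping. This is a sketch under stated assumptions, not the kernel's
code: the PERM_* bits, enum model_state and the trampoline_perms_*()
functions are hypothetical names, and a freshly allocated trampoline is
assumed to start out writable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits and boot states used only by this model. */
enum { PERM_W = 1 << 0, PERM_X = 1 << 1 };
enum model_state { MODEL_BOOTING, MODEL_RUNNING };

static bool model_is_wx(unsigned int perms)
{
	return (perms & PERM_W) && (perms & PERM_X);
}

/* Before the patch: the read-only step is skipped while booting. */
static unsigned int trampoline_perms_old(enum model_state state)
{
	unsigned int perms = PERM_W;	/* assumed: new trampoline is writable */

	if (state != MODEL_BOOTING)
		perms &= ~PERM_W;	/* models set_memory_ro() */
	perms |= PERM_X;		/* models set_memory_x() */
	return perms;
}

/* After the patch: the read-only step runs regardless of boot state. */
static unsigned int trampoline_perms_new(enum model_state state)
{
	unsigned int perms = PERM_W;

	(void)state;			/* no longer consulted */
	perms &= ~PERM_W;		/* models set_memory_ro() */
	perms |= PERM_X;		/* models set_memory_x() */
	return perms;
}

/* Compare both variants in both states; returns 0. */
int model_trampoline_demo(void)
{
	assert(model_is_wx(trampoline_perms_old(MODEL_BOOTING)));
	assert(!model_is_wx(trampoline_perms_old(MODEL_RUNNING)));
	assert(!model_is_wx(trampoline_perms_new(MODEL_BOOTING)));
	assert(!model_is_wx(trampoline_perms_new(MODEL_RUNNING)));
	return 0;
}
```

In the model, only the old variant called during boot ends up both
writable and executable, which matches the case the commit message says
is cured.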


* [tip: x86/mm] x86/mm: Use mm_alloc() in poking_init()
  2022-10-25 20:06 ` [PATCH 2/5] x86/mm: Use mm_alloc() in poking_init() Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
@ 2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-12-17 18:55 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     3f4c8211d982099be693be9aa7d6fc4607dff290
Gitweb:        https://git.kernel.org/tip/3f4c8211d982099be693be9aa7d6fc4607dff290
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:38:21 +02:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Thu, 15 Dec 2022 10:37:26 -08:00

x86/mm: Use mm_alloc() in poking_init()

Instead of duplicating init_mm, allocate a fresh mm. The advantage is
that mm_alloc() has much simpler dependencies. Additionally it makes
more conceptual sense, init_mm has no (and must not have) user state
to duplicate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
---
 arch/x86/mm/init.c         | 2 +-
 include/linux/sched/task.h | 1 -
 kernel/fork.c              | 5 -----
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 9121bc1..d398735 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -801,7 +801,7 @@ void __init poking_init(void)
 	spinlock_t *ptl;
 	pte_t *ptep;
 
-	poking_mm = copy_init_mm();
+	poking_mm = mm_alloc();
 	BUG_ON(!poking_mm);
 
 	/*
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 8431558..357e006 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -91,7 +91,6 @@ extern void exit_itimers(struct task_struct *);
 extern pid_t kernel_clone(struct kernel_clone_args *kargs);
 struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
 struct task_struct *fork_idle(int);
-struct mm_struct *copy_init_mm(void);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern pid_t user_mode_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
diff --git a/kernel/fork.c b/kernel/fork.c
index 451ce80..6142c58 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2592,11 +2592,6 @@ struct task_struct * __init fork_idle(int cpu)
 	return task;
 }
 
-struct mm_struct *copy_init_mm(void)
-{
-	return dup_mm(NULL, &init_mm);
-}
-
 /*
  * This is like kernel_clone(), but shaved down and tailored to just
  * creating io_uring workers. It returns a created task, or an error pointer.


* [tip: x86/mm] mm: Move mm_cachep initialization to mm_init()
  2022-10-25 20:06 ` [PATCH 1/5] mm: Move mm_cachep initialization to mm_init() Peter Zijlstra
  2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
@ 2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2022-12-17 18:55 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     af80602799681c78f14fbe20b6185a56020dedee
Gitweb:        https://git.kernel.org/tip/af80602799681c78f14fbe20b6185a56020dedee
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 25 Oct 2022 21:38:18 +02:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Thu, 15 Dec 2022 10:37:26 -08:00

mm: Move mm_cachep initialization to mm_init()

In order to allow using mm_alloc() much earlier, move initializing
mm_cachep into mm_init().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
---
 include/linux/sched/task.h |  1 +
 init/main.c                |  1 +
 kernel/fork.c              | 32 ++++++++++++++++++--------------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index d6c4816..8431558 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -65,6 +65,7 @@ extern void sched_dead(struct task_struct *p);
 void __noreturn do_task_dead(void);
 void __noreturn make_task_dead(int signr);
 
+extern void mm_cache_init(void);
 extern void proc_caches_init(void);
 
 extern void fork_init(void);
diff --git a/init/main.c b/init/main.c
index aa21add..f1d1a54 100644
--- a/init/main.c
+++ b/init/main.c
@@ -860,6 +860,7 @@ static void __init mm_init(void)
 	/* Should be run after espfix64 is set up. */
 	pti_init();
 	kmsan_init_runtime();
+	mm_cache_init();
 }
 
 #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5..451ce80 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3015,10 +3015,27 @@ static void sighand_ctor(void *data)
 	init_waitqueue_head(&sighand->signalfd_wqh);
 }
 
-void __init proc_caches_init(void)
+void __init mm_cache_init(void)
 {
 	unsigned int mm_size;
 
+	/*
+	 * The mm_cpumask is located at the end of mm_struct, and is
+	 * dynamically sized based on the maximum CPU number this system
+	 * can have, taking hotplug into account (nr_cpu_ids).
+	 */
+	mm_size = sizeof(struct mm_struct) + cpumask_size();
+
+	mm_cachep = kmem_cache_create_usercopy("mm_struct",
+			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
+			offsetof(struct mm_struct, saved_auxv),
+			sizeof_field(struct mm_struct, saved_auxv),
+			NULL);
+}
+
+void __init proc_caches_init(void)
+{
 	sighand_cachep = kmem_cache_create("sighand_cache",
 			sizeof(struct sighand_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
@@ -3036,19 +3053,6 @@ void __init proc_caches_init(void)
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
 
-	/*
-	 * The mm_cpumask is located at the end of mm_struct, and is
-	 * dynamically sized based on the maximum CPU number this system
-	 * can have, taking hotplug into account (nr_cpu_ids).
-	 */
-	mm_size = sizeof(struct mm_struct) + cpumask_size();
-
-	mm_cachep = kmem_cache_create_usercopy("mm_struct",
-			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
-			offsetof(struct mm_struct, saved_auxv),
-			sizeof_field(struct mm_struct, saved_auxv),
-			NULL);
 	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
 	mmap_init();
 	nsproxy_cache_init();
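
The comment moved into mm_cache_init() describes a structure whose CPU
mask is stored at its end and sized at run time. A rough userspace sketch
of that sizing follows. It is an illustration only: struct model_mm and
the model_* helpers are hypothetical, and the rounding in
model_cpumask_size() is an assumption about how a bitmap of nr_cpu_ids
bits would be sized, not a copy of the kernel's cpumask_size().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for mm_struct: fixed fields, then a trailing bitmap. */
struct model_mm {
	int users;
	unsigned long cpu_bitmap[];	/* flexible array member, sized at allocation */
};

/* Bytes needed for a bitmap of nr_cpu_ids bits, rounded up to whole longs. */
static size_t model_cpumask_size(unsigned int nr_cpu_ids)
{
	size_t longs = (nr_cpu_ids + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}

/* Mirrors "sizeof(struct mm_struct) + cpumask_size()" from the patch. */
static size_t model_mm_size(unsigned int nr_cpu_ids)
{
	return sizeof(struct model_mm) + model_cpumask_size(nr_cpu_ids);
}

/* Allocate one zeroed object large enough for the trailing bitmap. */
static struct model_mm *model_mm_alloc(unsigned int nr_cpu_ids)
{
	return calloc(1, model_mm_size(nr_cpu_ids));
}

/* Allocate for one CPU more than fits in a single long; returns 0. */
int model_mm_demo(void)
{
	unsigned int nr = (unsigned int)MODEL_BITS_PER_LONG + 1;
	struct model_mm *mm = model_mm_alloc(nr);

	if (!mm)
		return -1;

	/* The last CPU's bit lives in the second long of the bitmap. */
	mm->cpu_bitmap[1] |= 1UL;
	assert(model_mm_size(nr) == sizeof(*mm) + 2 * sizeof(unsigned long));

	free(mm);
	return 0;
}
```

The point of the sketch is that the object size depends on a run-time CPU
count, so the cache has to be created with the computed size rather than
sizeof() alone.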


Thread overview: 27+ messages
2022-10-25 20:06 [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Peter Zijlstra
2022-10-25 20:06 ` [PATCH 1/5] mm: Move mm_cachep initialization to mm_init() Peter Zijlstra
2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
2022-10-25 20:06 ` [PATCH 2/5] x86/mm: Use mm_alloc() in poking_init() Peter Zijlstra
2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
2022-10-25 20:06 ` [PATCH 3/5] x86/mm: Initialize text poking earlier Peter Zijlstra
2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
2022-10-25 20:07 ` [PATCH 4/5] x86/ftrace: Remove SYSTEM_BOOTING exceptions Peter Zijlstra
2022-10-25 20:59   ` Steven Rostedt
2022-10-26  7:02     ` Peter Zijlstra
2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
2022-10-25 20:07 ` [PATCH 5/5] x86/mm: Do verify W^X at boot up Peter Zijlstra
2022-11-02  9:12   ` [tip: x86/mm] " tip-bot2 for Peter Zijlstra
2022-12-17 18:55   ` tip-bot2 for Peter Zijlstra
2022-10-25 23:07 ` [PATCH 0/5] x86/ftrace: Cure boot time W+X mapping Linus Torvalds
2022-10-25 23:17   ` Steven Rostedt
2022-10-26  7:15   ` Peter Zijlstra
2022-10-26 17:59     ` Linus Torvalds
2022-10-27  6:59       ` Peter Zijlstra
2022-10-29 11:30         ` Peter Zijlstra
2022-10-29 17:35           ` Linus Torvalds
2022-11-02  9:12     ` [tip: x86/mm] mm: Introduce set_memory_rox() tip-bot2 for Peter Zijlstra
2022-12-17 18:55     ` tip-bot2 for Peter Zijlstra
