From: Steven Rostedt <rostedt@goodmis.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Kees Cook <keescook@chromium.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH] x86/mm: Do not verify W^X at boot up
Date: Mon, 24 Oct 2022 18:04:08 -0400
Message-ID: <20221024180408.7cd8dad7@gandalf.local.home>
In-Reply-To: <CAHk-=wjRpCS3oAJHVfByDoaj0-tAhV5a5YKV7QssUdMOAm8bAg@mail.gmail.com>

On Mon, 24 Oct 2022 12:08:49 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> And then all these special cases beget other special cases.

Well, I was able to get it to work without these special cases, but
getting poking_init() into mm_init() needed a different kind of special
case.

To get poking_init() working in mm_init() I had to pull in:

proc_caches_init(), as poking_init() uses some fork code that requires its
caches to be initialized.

Then dup_mm() is called, which uses maple tree code, which requires
maple_tree_init() to be there too. (I pulled in radix_tree_init() just to
be consistent.) But the maple tree code calls kmem_cache_alloc_bulk(),
which specifically states:

  /* Note that interrupts must be enabled when calling this function. */

and lockdep confirmed it.

So I did some hacking in maple_tree.c to make that work.
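
As a rough user-space illustration of that fallback (not the kernel code
itself): while an "early boot" flag is set, hand out a single node per call
instead of using the bulk path. The names early_boot, alloc_bulk() and
alloc_nodes() are hypothetical stand-ins for early_boot_irqs_disabled,
mt_alloc_bulk() and mas_alloc_nodes(), and malloc() stands in for the slab
allocator. The sketch assumes the caller keeps asking until it has all the
nodes it needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's early_boot_irqs_disabled flag. */
static bool early_boot = true;

/* Hypothetical bulk allocator; models a path that needs IRQs enabled. */
static size_t alloc_bulk(size_t want, void **slots)
{
	size_t i;

	for (i = 0; i < want; i++) {
		slots[i] = malloc(64);
		if (!slots[i])
			break;
	}
	return i;
}

/* Fill slots[] and return how many nodes were actually allocated. */
static size_t alloc_nodes(size_t want, void **slots)
{
	assert(slots != NULL);

	if (want == 0)
		return 0;

	if (early_boot) {
		/* Early boot: avoid the bulk path, allocate one node only. */
		slots[0] = malloc(64);
		return slots[0] ? 1 : 0;
	}

	return alloc_bulk(want, slots);
}
```

With early_boot set, a request for four nodes returns one; once it is
cleared, the same request goes through the bulk path.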

And finally, dup_mm() calls dup_mmap(), which calls flush_tlb_mm() for the
old mm, but since this is early boot up, there's really no need for that. I
added a hack to skip it.
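
The ordering constraints described above can be modeled in user space like
this. It is only a sketch: the model_*() functions and the three flags are
hypothetical and merely record which init step has run, they do not do
what the real proc_caches_init(), maple_tree_init() and poking_init() do.

```c
#include <stdbool.h>

/* Flags recording which (modeled) init steps have completed. */
static bool caches_ready, maple_ready, poking_ready;

static void model_proc_caches_init(void) { caches_ready = true; }
static void model_maple_tree_init(void)  { maple_ready = true; }

/* Returns 0 on success, -1 if a prerequisite has not run yet. */
static int model_poking_init(void)
{
	/* The fork code needs its caches, and dup_mm() needs the maple tree. */
	if (!caches_ready || !maple_ready)
		return -1;

	poking_ready = true;
	return 0;
}

/* Models the reordered mm_init(): prerequisites first, then poking. */
static int model_mm_init(void)
{
	model_proc_caches_init();
	model_maple_tree_init();
	return model_poking_init();
}
```

Calling model_poking_init() on its own fails; it only succeeds once both
prerequisites have run, which is why they had to move into mm_init() along
with it.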

Thus, I guess you get to choose your poison. Either we have special ftrace
cases in x86 that beget other special cases to keep it working, or we make
text_poke() work early by moving poking_init() into mm_init() and then
creating more generic special cases that beget other special cases (and I
have no idea if this works on other architectures, which could beget more
special cases).

Your call.

-- Steve

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5cadcea035e0..e240351e0bc1 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1681,11 +1681,6 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi
 {
 	struct text_poke_loc *tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_flush(addr);
 
 	tp = &tp_vec[tp_vec_nr++];
@@ -1707,11 +1702,6 @@ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *
 {
 	struct text_poke_loc tp;
 
-	if (unlikely(system_state == SYSTEM_BOOTING)) {
-		text_poke_early(addr, opcode, len);
-		return;
-	}
-
 	text_poke_loc_init(&tp, addr, opcode, len, emulate);
 	text_poke_bp_batch(&tp, 1);
 }
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index bd165004776d..43628b8480fa 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -415,8 +415,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	set_vm_flush_reset_perms(trampoline);
 
-	if (likely(system_state != SYSTEM_BOOTING))
-		set_memory_ro((unsigned long)trampoline, npages);
+	set_memory_ro((unsigned long)trampoline, npages);
 	set_memory_x((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
diff --git a/init/main.c b/init/main.c
index aa21add5f7c5..e5f4ae2d4cca 100644
--- a/init/main.c
+++ b/init/main.c
@@ -860,6 +860,10 @@ static void __init mm_init(void)
 	/* Should be run after espfix64 is set up. */
 	pti_init();
 	kmsan_init_runtime();
+	proc_caches_init();
+	radix_tree_init();
+	maple_tree_init();
+	poking_init();
 }
 
 #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
@@ -1011,8 +1015,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	if (WARN(!irqs_disabled(),
 		 "Interrupts were enabled *very* early, fixing it\n"))
 		local_irq_disable();
-	radix_tree_init();
-	maple_tree_init();
 
 	/*
 	 * Set up housekeeping before setting up workqueues to allow the unbound
@@ -1117,7 +1119,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	thread_stack_cache_init();
 	cred_init();
 	fork_init();
-	proc_caches_init();
 	uts_ns_init();
 	key_init();
 	security_init();
@@ -1134,7 +1135,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	taskstats_init_early();
 	delayacct_init();
 
-	poking_init();
 	check_bugs();
 
 	acpi_subsystem_init();
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d..e24fb3ddcf9f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -702,7 +702,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	mas_destroy(&mas);
 out:
 	mmap_write_unlock(mm);
-	flush_tlb_mm(oldmm);
+	if (likely(!early_boot_irqs_disabled))
+		flush_tlb_mm(oldmm);
 	mmap_write_unlock(oldmm);
 	dup_userfaultfd_complete(&uf);
 fail_uprobe_end:
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index e1743803c851..6fc72ca62c7d 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1253,7 +1253,12 @@ static inline void mas_alloc_nodes(struct ma_state *mas, gfp_t gfp)
 		}
 
 		max_req = min(requested, max_req);
-		count = mt_alloc_bulk(gfp, max_req, slots);
+		if (unlikely(early_boot_irqs_disabled)) {
+			slots[0] = mt_alloc_one(gfp | GFP_ATOMIC);
+			count = slots[0] ? 1 : 0;
+		} else {
+			count = mt_alloc_bulk(gfp, max_req, slots);
+		}
 		if (!count)
 			goto nomem_bulk;
 

Thread overview: 14+ messages
2022-10-24 15:45 [PATCH] x86/mm: Do not verify W^X at boot up Steven Rostedt
2022-10-24 16:14 ` Dave Hansen
2022-10-24 18:13   ` Steven Rostedt
2022-10-24 19:26   ` Steven Rostedt
2022-10-24 18:19 ` Linus Torvalds
2022-10-24 18:52   ` Steven Rostedt
2022-10-24 19:08     ` Linus Torvalds
2022-10-24 22:04       ` Steven Rostedt [this message]
2022-10-25  9:39       ` Peter Zijlstra
2022-10-25 10:16         ` Peter Zijlstra
2022-10-25 16:53           ` Linus Torvalds
2022-10-25 17:47             ` Peter Zijlstra
2022-10-25 18:14               ` Linus Torvalds
2022-10-25 18:46                 ` Peter Zijlstra
