archive mirror
 help / color / mirror / Atom feed
From: Michael Ellerman <>
Subject: [PATCH] powerpc/mm: Fix set_memory_*() against concurrent accesses
Date: Tue, 17 Aug 2021 23:25:52 +1000	[thread overview]
Message-ID: <> (raw)

Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
on one of his systems:

  kernel tried to execute exec-protected page (c008000004073278) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch
  Faulting instruction address: 0xc008000004073278
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
  CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
  Workqueue: events control_work_handler [virtio_console]
  NIP:  c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
  REGS: c00000002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
  MSR:  800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002822 XER: 200400cf
  NIP fill_queue+0xf0/0x210 [virtio_console]
  LR  fill_queue+0xf0/0x210 [virtio_console]
  Call Trace:
    fill_queue+0xb4/0x210 [virtio_console] (unreliable)
    add_port+0x1a8/0x470 [virtio_console]
    control_work_handler+0xbc/0x1e8 [virtio_console]

Jordan, Fabiano & Murilo were able to reproduce and identify that the
problem is caused by the call to module_enable_ro() in do_init_module(),
which happens after the module's init function has already been called.

Our current implementation of change_page_attr() is not safe against
concurrent accesses, because it invalidates the PTE before flushing the
TLB and then installing the new PTE. That leaves a window in time where
there is no valid PTE for the page, if another CPU tries to access the
page at that time we see something like the fault above.

We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
code doesn't handle a set_pte_at() of a valid PTE. See [1].

But we do have pte_update(), which replaces the old PTE with the new,
meaning there's no window where the PTE is invalid. And the hash MMU
version hash__pte_update() deals with synchronising the hash page table

Because pte_update() takes the set of PTE bits to set and clear we can't
use our existing helpers, eg. pte_wrprotect() etc. and instead have to
open code the set of flags. We will clean that up somehow in a future


Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
Reported-by: Laurent Vivier <>
Signed-off-by: Michael Ellerman <>
 arch/powerpc/mm/pageattr.c | 45 +++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 0876216ceee6..72425b61eb7e 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -18,52 +18,61 @@
  * Updates the attributes of a page in three steps:
- * 1. invalidate the page table entry
- * 2. flush the TLB
- * 3. install the new entry with the updated attributes
- *
- * Invalidating the pte means there are situations where this will not work
- * when in theory it should.
- * For example:
- * - removing write from page whilst it is being executed
- * - setting a page read-only whilst it is being read by another CPU
+ * 1. take the page_table_lock
+ * 2. install the new entry with the updated attributes
+ * 3. flush the TLB
+ * This sequence is safe against concurrent updates, and also allows updating the
+ * attributes of a page currently being executed or accessed.
 static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 	long action = (long)data;
-	pte_t pte;
+	unsigned long set, clear;
-	/* invalidate the PTE so it's safe to modify */
-	pte = ptep_get_and_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	set = clear = 0;
 	/* modify the PTE bits as desired, then apply */
 	switch (action) {
-		pte = pte_wrprotect(pte);
+#ifdef CONFIG_PPC_BOOK3S_64
+		clear = _PAGE_WRITE;
+#elif defined(CONFIG_PPC_8xx)
+		set = _PAGE_RO;
+		clear = _PAGE_RW;
-		pte = pte_mkwrite(pte_mkdirty(pte));
+#ifdef CONFIG_PPC_8xx
+		clear = _PAGE_RO;
+#elif defined(CONFIG_PPC_BOOK3S_64)
+		set = _PAGE_RW | _PAGE_DIRTY;
-		pte = pte_exprotect(pte);
+		clear = _PAGE_EXEC;
 	case SET_MEMORY_X:
-		pte = pte_mkexec(pte);
+		set = _PAGE_EXEC;
-	set_pte_at(&init_mm, addr, ptep, pte);
+	pte_update(&init_mm, addr, ptep, clear, set, 0);
 	/* See ptesync comment in radix__set_pte_at() */
 	if (radix_enabled())
 		asm volatile("ptesync": : :"memory");
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 	return 0;

base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30

             reply	other threads:[~2021-08-17 13:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-17 13:25 Michael Ellerman [this message]
2021-08-17 14:20 ` [PATCH] powerpc/mm: Fix set_memory_*() against concurrent accesses Christophe Leroy
2021-08-17 14:21 ` Fabiano Rosas
2021-08-17 14:28   ` Christophe Leroy
2021-08-18  7:46   ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).