From: "Christopher M. Riedl" <cmr@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: tglx@linutronix.de, x86@kernel.org,
	linux-hardening@vger.kernel.org, keescook@chromium.org
Subject: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
Date: Wed,  5 May 2021 23:34:49 -0500	[thread overview]
Message-ID: <20210506043452.9674-9-cmr@linux.ibm.com> (raw)
In-Reply-To: <20210506043452.9674-1-cmr@linux.ibm.com>

When code patching a STRICT_KERNEL_RWX kernel, the page containing the
address to be patched is temporarily mapped as writeable. Currently, a
per-cpu vmalloc patch area is used for this purpose. While the patch
area is per-cpu, the temporary page mapping is inserted into the kernel
page tables for the duration of patching. The mapping is exposed to CPUs
other than the patching CPU - this is undesirable from a hardening
perspective. Use a temporary mm instead which keeps the mapping local to
the CPU doing the patching.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
space. The patching address is randomized between PAGE_SIZE and
DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
the Book3s64 Hash MMU operates - by default the space above
DEFAULT_MAP_WINDOW is not available. For now, the patching address for
all platforms/MMUs is randomized inside this range. The number of
possible random addresses depends on PAGE_SIZE and is limited by
DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

        bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

        PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
        bits of entropy = log2(128TB / 64K)
        bits of entropy = 31

Randomization occurs only once during initialization at boot.
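
As a sketch, the address selection in poking_init() below reduces to
picking a page-aligned offset below DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE
and shifting it up by PAGE_SIZE to stay clear of the zero page:

	/* Condensed from poking_init() in the hunk below. */
	patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
			% (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));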

Introduce two new functions, map_patch() and unmap_patch(), to
respectively create and remove the temporary mapping with write
permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
the page for patching with PAGE_SHARED since the kernel cannot access
userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
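
Condensed from map_patch() in the hunk below, the write mapping is
installed in the temporary mm roughly as follows (PTE locking and error
handling omitted):

	pgprot = radix_enabled() ? PAGE_KERNEL : PAGE_SHARED;
	pte = pte_mkdirty(mk_pte(page, pgprot));
	set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
	use_temporary_mm(&patch_mapping->temp_mm);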

Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
for the patching_addr when using the Hash MMU on Book3s64 to avoid
taking an SLB and Hash fault during patching.
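
On Hash this amounts to (condensed from hash_prefault_mapping() in the
hunk below, return value checks omitted):

	slb_allocate_user(patching_mm, patching_addr);
	hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
		     HPTE_USE_KERNEL_KEY);
	isync();	/* see the comment in switch_slb() in mm/book3s64/slb.c */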

Since patching_addr is now a userspace address, lock/unlock KUAP on
non-Book3s64 platforms. On Book3s64 with a Radix MMU, mapping the page
with PAGE_KERNEL sets EAA[0] for the PTE which ignores the AMR (KUAP)
according to PowerISA v3.0b Figure 35. On Book3s64 with a Hash MMU, the
hash PTE for the mapping is inserted with HPTE_USE_KERNEL_KEY which
similarly avoids the need for switching KUAP.
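
The KUAP handling in do_patch_instruction() therefore reduces to
(condensed from the hunk below):

	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
		allow_read_write_user(patch_addr, patch_addr, ppc_inst_len(instr));
	err = __patch_instruction(addr, instr, patch_addr);
	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
		prevent_read_write_user(patch_addr, patch_addr, ppc_inst_len(instr));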

Finally, add a new WARN_ON() to check that the instruction was patched
as intended after the temporary mapping is torn down.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

and:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl <cmr@linux.ibm.com>

---

v4:  * In the previous series this was two separate patches: one to init
       the temporary mm in poking_init() (unused in powerpc at the time)
       and the other to use it for patching (which removed all the
       per-cpu vmalloc code). Now that we use poking_init() in the
       existing per-cpu vmalloc approach, that separation doesn't work
       as nicely anymore so I just merged the two patches into one.
     * Preload the SLB entry and hash the page for the patching_addr
       when using Hash on book3s64 to avoid taking an SLB and Hash fault
       during patching. The previous implementation was a hack which
       changed current->mm to allow the SLB and Hash fault handlers to
       work with the temporary mm since both of those code-paths always
       assume mm == current->mm.
     * Also (hmm - seeing a trend here) with the book3s64 Hash MMU we
       have to manage the mm->context.active_cpus counter and mm cpumask
       since they determine (via mm_is_thread_local()) if the TLB flush
       in pte_clear() is local or not - it should always be local when
       we're using the temporary mm. On book3s64's Radix MMU we can
       just call local_flush_tlb_mm().
     * Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of
       KUAP.
---
 arch/powerpc/lib/code-patching.c | 209 ++++++++++++++++++-------------
 1 file changed, 121 insertions(+), 88 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index cbdfba8a39360..7e15abc09ec04 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include <linux/cpuhotplug.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/sched/task.h>
+#include <linux/random.h>
 
 #include <asm/tlbflush.h>
 #include <asm/page.h>
@@ -19,6 +21,7 @@
 #include <asm/inst.h>
 #include <asm/mmu_context.h>
 #include <asm/debug.h>
+#include <asm/tlb.h>
 
 static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
 			       struct ppc_inst *patch_addr)
@@ -113,113 +116,142 @@ static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
 	}
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+	spinlock_t *ptl; /* for protecting pte table */
+	pte_t *ptep;
+
+	/*
+	 * Some parts of the kernel (static keys for example) depend on
+	 * successful code patching. Code patching under STRICT_KERNEL_RWX
+	 * requires this setup - otherwise we cannot patch at all. We use
+	 * BUG_ON() here and later since an early failure is preferred to
+	 * buggy behavior and/or strange crashes later.
+	 */
+	patching_mm = copy_init_mm();
+	BUG_ON(!patching_mm);
+
+	/*
+	 * Choose a randomized, page-aligned address from the range:
+	 * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
+	 * The lower address bound is PAGE_SIZE to avoid the zero-page.
+	 * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
+	 * under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
+	 */
+	patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
+			% (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
+
+	/*
+	 * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
+	 * the PTE here. We cannot do the allocation during patching with IRQs
+	 * disabled (ie. "atomic" context).
+	 */
+	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+	BUG_ON(!ptep);
+	pte_unmap_unlock(ptep, ptl);
+}
 
 #if IS_BUILTIN(CONFIG_LKDTM)
 unsigned long read_cpu_patching_addr(unsigned int cpu)
 {
-	return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+	return patching_addr;
 }
 #endif
 
-static int text_area_cpu_up(unsigned int cpu)
+struct patch_mapping {
+	spinlock_t *ptl; /* for protecting pte table */
+	pte_t *ptep;
+	struct temp_mm temp_mm;
+};
+
+#ifdef CONFIG_PPC_BOOK3S_64
+
+static inline int hash_prefault_mapping(pgprot_t pgprot)
 {
-	struct vm_struct *area;
+	int err;
 
-	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-	if (!area) {
-		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-			cpu);
-		return -1;
-	}
-	this_cpu_write(text_poke_area, area);
+	if (radix_enabled())
+		return 0;
 
-	return 0;
-}
+	err = slb_allocate_user(patching_mm, patching_addr);
+	if (err)
+		pr_warn("map patch: failed to allocate slb entry\n");
 
-static int text_area_cpu_down(unsigned int cpu)
-{
-	free_vm_area(this_cpu_read(text_poke_area));
-	return 0;
+	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
+			   HPTE_USE_KERNEL_KEY);
+	if (err)
+		pr_warn("map patch: failed to insert hashed page\n");
+
+	/* See comment in switch_slb() in mm/book3s64/slb.c */
+	isync();
+
+	return err;
 }
 
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-		"powerpc/text_poke:online", text_area_cpu_up,
-		text_area_cpu_down));
+#else
 
+static inline int hash_prefault_mapping(pgprot_t pgprot)
+{
 	return 0;
 }
-late_initcall(setup_text_poke_area);
+
+#endif /* CONFIG_PPC_BOOK3S_64 */
 
 /*
  * This can be called for kernel text or a module.
  */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 {
-	unsigned long pfn;
-	int err;
+	struct page *page;
+	pte_t pte;
+	pgprot_t pgprot;
 
 	if (is_vmalloc_or_module_addr(addr))
-		pfn = vmalloc_to_pfn(addr);
+		page = vmalloc_to_page(addr);
 	else
-		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+		page = virt_to_page(addr);
 
-	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
+	if (radix_enabled())
+		pgprot = PAGE_KERNEL;
+	else
+		pgprot = PAGE_SHARED;
 
-	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
-	if (err)
+	patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
+					     &patch_mapping->ptl);
+	if (unlikely(!patch_mapping->ptep)) {
+		pr_warn("map patch: failed to allocate pte for patching\n");
 		return -1;
+	}
 
-	return 0;
-}
-
-static inline int unmap_patch_area(unsigned long addr)
-{
-	pte_t *ptep;
-	pmd_t *pmdp;
-	pud_t *pudp;
-	p4d_t *p4dp;
-	pgd_t *pgdp;
-
-	pgdp = pgd_offset_k(addr);
-	if (unlikely(!pgdp))
-		return -EINVAL;
-
-	p4dp = p4d_offset(pgdp, addr);
-	if (unlikely(!p4dp))
-		return -EINVAL;
+	pte = mk_pte(page, pgprot);
+	pte = pte_mkdirty(pte);
+	set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
 
-	pudp = pud_offset(p4dp, addr);
-	if (unlikely(!pudp))
-		return -EINVAL;
+	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+	use_temporary_mm(&patch_mapping->temp_mm);
 
-	pmdp = pmd_offset(pudp, addr);
-	if (unlikely(!pmdp))
-		return -EINVAL;
+	/*
+	 * On Book3s64 with the Hash MMU we have to manually insert the SLB
+	 * entry and HPTE to prevent taking faults on the patching_addr later.
+	 */
+	return hash_prefault_mapping(pgprot);
+}
 
-	ptep = pte_offset_kernel(pmdp, addr);
-	if (unlikely(!ptep))
-		return -EINVAL;
+static void unmap_patch(struct patch_mapping *patch_mapping)
+{
+	/* Book3s64 Hash MMU: pte_clear() flushes the TLB */
+	pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
 
-	pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
+	/* Book3s64 Radix MMU: explicitly flush the TLB (no-op in Hash MMU) */
+	local_flush_tlb_mm(patching_mm);
 
-	/*
-	 * In hash, pte_clear flushes the tlb, in radix, we have to
-	 */
-	pte_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
 
-	return 0;
+	/* Book3s64 Hash MMU: switch_mm_irqs_off() invalidates the SLB */
+	unuse_temporary_mm(&patch_mapping->temp_mm);
 }
 
 static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
@@ -227,32 +259,33 @@ static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 	int err;
 	struct ppc_inst *patch_addr = NULL;
 	unsigned long flags;
-	unsigned long text_poke_addr;
-	unsigned long kaddr = (unsigned long)addr;
+	struct patch_mapping patch_mapping;
 
 	/*
-	 * During early early boot patch_instruction is called
-	 * when text_poke_area is not ready, but we still need
-	 * to allow patching. We just do the plain old patching
+	 * The patching_mm is initialized before calling mark_rodata_ro. Prior
+	 * to this, patch_instruction is called when we don't have (and don't
+	 * need) the patching_mm so just do plain old patching.
 	 */
-	if (!this_cpu_read(text_poke_area))
+	if (!patching_mm)
 		return raw_patch_instruction(addr, instr);
 
 	local_irq_save(flags);
 
-	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
-	if (map_patch_area(addr, text_poke_addr)) {
-		err = -1;
+	err = map_patch(addr, &patch_mapping);
+	if (err)
 		goto out;
-	}
 
-	patch_addr = (struct ppc_inst *)(text_poke_addr + (kaddr & ~PAGE_MASK));
+	patch_addr = (struct ppc_inst *)(patching_addr | offset_in_page(addr));
 
-	__patch_instruction(addr, instr, patch_addr);
+	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+		allow_read_write_user(patch_addr, patch_addr, ppc_inst_len(instr));
+	err = __patch_instruction(addr, instr, patch_addr);
+	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+		prevent_read_write_user(patch_addr, patch_addr, ppc_inst_len(instr));
 
-	err = unmap_patch_area(text_poke_addr);
-	if (err)
-		pr_warn("failed to unmap %lx\n", text_poke_addr);
+	unmap_patch(&patch_mapping);
+
+	WARN_ON(!ppc_inst_equal(ppc_inst_read(addr), instr));
 
 out:
 	local_irq_restore(flags);
-- 
2.26.1


Thread overview: 45+ messages
2021-05-06  4:34 [RESEND PATCH v4 00/11] Use per-CPU temporary mappings for patching Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 01/11] powerpc: Add LKDTM accessor for patching addr Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 02/11] lkdtm/powerpc: Add test to hijack a patch mapping Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 03/11] x86_64: Add LKDTM accessor for patching addr Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 04/11] lkdtm/x86_64: Add test to hijack a patch mapping Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload Christopher M. Riedl
2021-06-21  3:13   ` Daniel Axtens
2021-07-01  3:48     ` Christopher M. Riedl
2021-07-01  4:15       ` Nicholas Piggin
2021-07-01  5:28         ` Christopher M. Riedl
2021-07-01  6:04           ` Nicholas Piggin
2021-07-01  6:53             ` Christopher M. Riedl
2021-07-01  7:37               ` Nicholas Piggin
2021-07-01 11:30                 ` Nicholas Piggin
2021-07-09  4:55                 ` Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 06/11] powerpc: Introduce temporary mm Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 07/11] powerpc/64s: Make slb_allocate_user() non-static Christopher M. Riedl
2021-05-06  4:34 ` Christopher M. Riedl [this message]
2021-06-21  3:19   ` [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching Daniel Axtens
2021-07-01  5:11     ` Christopher M. Riedl
2021-07-01  6:12   ` Nicholas Piggin
2021-07-01  7:02     ` Christopher M. Riedl
2021-07-01  7:51       ` Nicholas Piggin
2021-07-09  5:03         ` Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 09/11] lkdtm/powerpc: Fix code patching hijack test Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 10/11] powerpc: Protect patching_mm with a lock Christopher M. Riedl
2021-05-06 10:51   ` Peter Zijlstra
2021-05-07 20:03     ` Christopher M. Riedl
2021-05-07 22:26       ` Peter Zijlstra
2021-05-06  4:34 ` [RESEND PATCH v4 11/11] powerpc: Use patch_instruction_unlocked() in loops Christopher M. Riedl
