* [RFC PATCH 0/3] Use per-CPU temporary mappings for patching
@ 2020-03-23  4:52 Christopher M. Riedl
  2020-03-23  4:52 ` [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm Christopher M. Riedl
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Christopher M. Riedl @ 2020-03-23  4:52 UTC (permalink / raw)
  To: linuxppc-dev

When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
temporary mappings when patching itself. These mappings temporarily
override the strict RWX text protections to permit a write. Currently,
powerpc allocates a per-CPU VM area for patching. Patching occurs as
follows:

	1. Map page of text to be patched to per-CPU VM area w/
	   PAGE_KERNEL protection
	2. Patch text
	3. Remove the temporary mapping
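
For illustration, a condensed sketch of this flow (adapted from the
existing do_patch_instruction()/map_patch_area()/unmap_patch_area() in
arch/powerpc/lib/code-patching.c; locking and error handling omitted):

	/* Sketch only - patch one instruction via the per-CPU VM area. */
	unsigned long poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
	unsigned int *patch_addr;

	map_patch_area(addr, poke_addr);		/* 1. temporary mapping */
	patch_addr = (unsigned int *)poke_addr +
		(((unsigned long)addr & ~PAGE_MASK) / sizeof(unsigned int));
	__patch_instruction(addr, instr, patch_addr);	/* 2. patch text */
	unmap_patch_area(poke_addr);			/* 3. remove mapping */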

While the VM area is per-CPU, the mapping is actually inserted into the
kernel page tables. Presumably, this could allow another CPU to access
the normally write-protected text - either maliciously or accidentally -
via this same mapping if the address of the VM area is known. Ideally,
the mapping should be kept local to the CPU doing the patching (or any
other sensitive operations requiring temporarily overriding memory
protections) [0].

x86 introduced "temporary mm" structs which allow the creation of
mappings local to a particular CPU [1]. This series intends to bring the
notion of a temporary mm to powerpc and harden powerpc by using such a
mapping for patching a kernel with strict RWX permissions.

The first patch introduces the temporary mm struct and API for powerpc
along with a new function to retrieve a current hw breakpoint.

The second patch uses the `poking_init` init hook added by the x86
patches to initialize a temporary mm and patching address. The patching
address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
upper limit is necessary due to how the hash MMU operates - by default
the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
and radix randomize inside this range. The number of possible random
addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

	bits-o-entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
	bits-o-entropy = log2(128TB / 64K)
	bits-o-entropy = 31

Currently, randomization occurs only once during initialization at boot.
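
For reference, the address selection (from poking_init() in the second
patch) boils down to:

	/*
	 * DEFAULT_MAP_WINDOW - PAGE_SIZE is a multiple of PAGE_SIZE, so
	 * taking the modulo of the page-masked random value keeps the
	 * resulting address page-aligned and below DEFAULT_MAP_WINDOW.
	 */
	patching_addr = (get_random_long() & PAGE_MASK) %
		(DEFAULT_MAP_WINDOW - PAGE_SIZE);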

The third patch replaces the VM area with the temporary mm in the
patching code. The page for patching has to be mapped PAGE_SHARED with
the hash MMU since hash prevents the kernel from accessing userspace
pages with the PAGE_PRIVILEGED bit set. There is ongoing work on my
side to explore whether this is actually necessary in the hash codepath.
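
Concretely, the protection choice in the third patch is simply:

	if (radix_enabled())
		pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
	else
		pgprot = PAGE_SHARED;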

Testing so far is limited to booting on QEMU (power8 and power9 targets)
and a POWER8 VM along with setting some simple xmon breakpoints (which
makes use of code-patching). A POC lkdtm test is in progress to actually
exploit the existing vulnerability (i.e. the mapping during patching is
exposed in the kernel page tables and accessible by other CPUs) - this will
accompany a future v1 of this series.

[0]: https://github.com/linuxppc/issues/issues/224
[1]: https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.amit@gmail.com/

Christopher M. Riedl (3):
  powerpc/mm: Introduce temporary mm
  powerpc/lib: Initialize a temporary mm for code patching
  powerpc/lib: Use a temporary mm for code patching

 arch/powerpc/include/asm/debug.h       |   1 +
 arch/powerpc/include/asm/mmu_context.h |  56 +++++++++-
 arch/powerpc/kernel/process.c          |   5 +
 arch/powerpc/lib/code-patching.c       | 140 ++++++++++++++-----------
 4 files changed, 137 insertions(+), 65 deletions(-)

-- 
2.25.1



* [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm
  2020-03-23  4:52 [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christopher M. Riedl
@ 2020-03-23  4:52 ` Christopher M. Riedl
  2020-03-23  6:10   ` kbuild test robot
                     ` (2 more replies)
  2020-03-23  4:52 ` [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching Christopher M. Riedl
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 23+ messages in thread
From: Christopher M. Riedl @ 2020-03-23  4:52 UTC (permalink / raw)
  To: linuxppc-dev

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm and restored when unloading it. Note that all
HW breakpoints are disabled indiscriminately while the temporary mm is
in use - not only those that match addresses within it.
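
A minimal usage sketch of the API introduced below (interrupts are
disabled by the caller; `mm' stands for any mm_struct prepared in
advance, e.g. the patching_mm added in a later patch):

	struct temp_mm temp_mm;
	unsigned long flags;

	init_temp_mm(&temp_mm, mm);

	local_irq_save(flags);
	use_temporary_mm(&temp_mm);	/* switch mm; save + disable HW breakpoints */
	/* ... access mappings local to this CPU ... */
	unuse_temporary_mm(&temp_mm);	/* switch back; restore HW breakpoints */
	local_irq_restore(flags);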

Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
---
 arch/powerpc/include/asm/debug.h       |  1 +
 arch/powerpc/include/asm/mmu_context.h | 56 +++++++++++++++++++++++++-
 arch/powerpc/kernel/process.c          |  5 +++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index 7756026b95ca..b945bc16c932 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; }
 static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
 #endif
 
+void __get_breakpoint(struct arch_hw_breakpoint *brk);
 void __set_breakpoint(struct arch_hw_breakpoint *brk);
 bool ppc_breakpoint_available(void);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 360367c579de..3e6381d04c28 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -7,9 +7,10 @@
 #include <linux/mm.h>
 #include <linux/sched.h>
 #include <linux/spinlock.h>
-#include <asm/mmu.h>	
+#include <asm/mmu.h>
 #include <asm/cputable.h>
 #include <asm/cputhreads.h>
+#include <asm/hw_breakpoint.h>
 
 /*
  * Most if the context management is out of line
@@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
 	return 0;
 }
 
+struct temp_mm {
+	struct mm_struct *temp;
+	struct mm_struct *prev;
+	bool is_kernel_thread;
+	struct arch_hw_breakpoint brk;
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+	temp_mm->temp = mm;
+	temp_mm->prev = NULL;
+	temp_mm->is_kernel_thread = false;
+	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	temp_mm->is_kernel_thread = current->mm == NULL;
+	if (temp_mm->is_kernel_thread)
+		temp_mm->prev = current->active_mm;
+	else
+		temp_mm->prev = current->mm;
+
+	/*
+	 * Hash requires a non-NULL current->mm to allocate a userspace address
+	 * when handling a page fault. Does not appear to hurt in Radix either.
+	 */
+	current->mm = temp_mm->temp;
+	switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+	if (ppc_breakpoint_available()) {
+		__get_breakpoint(&temp_mm->brk);
+		if (temp_mm->brk.type != 0)
+			hw_breakpoint_disable();
+	}
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (temp_mm->is_kernel_thread)
+		current->mm = NULL;
+	else
+		current->mm = temp_mm->prev;
+	switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+	if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
+		__set_breakpoint(&temp_mm->brk);
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db9dcf2..5e5cf33fc358 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -793,6 +793,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
 	return 0;
 }
 
+void __get_breakpoint(struct arch_hw_breakpoint *brk)
+{
+	memcpy(brk, this_cpu_ptr(&current_brk), sizeof(*brk));
+}
+
 void __set_breakpoint(struct arch_hw_breakpoint *brk)
 {
 	memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
-- 
2.25.1



* [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
  2020-03-23  4:52 [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christopher M. Riedl
  2020-03-23  4:52 ` [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm Christopher M. Riedl
@ 2020-03-23  4:52 ` Christopher M. Riedl
  2020-03-24 16:10   ` Christophe Leroy
  2020-03-23  4:52 ` [RFC PATCH 3/3] powerpc/lib: Use " Christopher M. Riedl
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Christopher M. Riedl @ 2020-03-23  4:52 UTC (permalink / raw)
  To: linuxppc-dev

When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
---
 arch/powerpc/lib/code-patching.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 3345f039a876..18b88ecfc5a8 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include <linux/cpuhotplug.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/sched/task.h>
+#include <linux/random.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
+__ro_after_init struct mm_struct *patching_mm;
+__ro_after_init unsigned long patching_addr;
+
+void __init poking_init(void)
+{
+	spinlock_t *ptl; /* for protecting pte table */
+	pte_t *ptep;
+
+	patching_mm = copy_init_mm();
+	BUG_ON(!patching_mm);
+
+	/*
+	 * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
+	 * XXX: Do we want additional bits of entropy for radix?
+	 */
+	patching_addr = (get_random_long() & PAGE_MASK) %
+		(DEFAULT_MAP_WINDOW - PAGE_SIZE);
+
+	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+	BUG_ON(!ptep);
+	pte_unmap_unlock(ptep, ptl);
+}
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
 static int text_area_cpu_up(unsigned int cpu)
-- 
2.25.1



* [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
  2020-03-23  4:52 [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christopher M. Riedl
  2020-03-23  4:52 ` [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm Christopher M. Riedl
  2020-03-23  4:52 ` [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching Christopher M. Riedl
@ 2020-03-23  4:52 ` Christopher M. Riedl
  2020-03-24 16:25   ` Christophe Leroy
  2020-03-23 11:30 ` [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christophe Leroy
  2020-03-25  2:51 ` Andrew Donnellan
  4 siblings, 1 reply; 23+ messages in thread
From: Christopher M. Riedl @ 2020-03-23  4:52 UTC (permalink / raw)
  To: linuxppc-dev

Currently, code patching a STRICT_KERNEL_RWX kernel exposes the
temporary mappings to other CPUs. These mappings should be kept local
to the CPU doing the patching. Use the pre-initialized temporary mm
and patching address for this purpose. Also add a check after patching
to ensure the
patch succeeded.
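
The resulting patching sequence, in rough sketch form (simplified from
do_patch_instruction() below; error paths omitted):

	struct patch_mapping patch_mapping;

	local_irq_save(flags);
	map_patch(addr, &patch_mapping);	/* pte in patching_mm + temporary mm */
	__patch_instruction(addr, instr, patch_addr);
	unmap_patch(&patch_mapping);		/* clear pte, flush TLB, switch back */
	WARN_ON(memcmp(addr, &instr, sizeof(instr)));	/* verify the write */
	local_irq_restore(flags);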

Based on x86 implementation:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
---
 arch/powerpc/lib/code-patching.c | 128 ++++++++++++++-----------------
 1 file changed, 57 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 18b88ecfc5a8..f156132e8975 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -19,6 +19,7 @@
 #include <asm/page.h>
 #include <asm/code-patching.h>
 #include <asm/setup.h>
+#include <asm/mmu_context.h>
 
 static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
 			       unsigned int *patch_addr)
@@ -65,99 +66,79 @@ void __init poking_init(void)
 	pte_unmap_unlock(ptep, ptl);
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
-
-static int text_area_cpu_up(unsigned int cpu)
-{
-	struct vm_struct *area;
-
-	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-	if (!area) {
-		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-			cpu);
-		return -1;
-	}
-	this_cpu_write(text_poke_area, area);
-
-	return 0;
-}
-
-static int text_area_cpu_down(unsigned int cpu)
-{
-	free_vm_area(this_cpu_read(text_poke_area));
-	return 0;
-}
-
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-		"powerpc/text_poke:online", text_area_cpu_up,
-		text_area_cpu_down));
-
-	return 0;
-}
-late_initcall(setup_text_poke_area);
+struct patch_mapping {
+	spinlock_t *ptl; /* for protecting pte table */
+	struct temp_mm temp_mm;
+};
 
 /*
  * This can be called for kernel text or a module.
  */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 {
-	unsigned long pfn;
-	int err;
+	struct page *page;
+	pte_t pte, *ptep;
+	pgprot_t pgprot;
 
 	if (is_vmalloc_addr(addr))
-		pfn = vmalloc_to_pfn(addr);
+		page = vmalloc_to_page(addr);
 	else
-		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+		page = virt_to_page(addr);
 
-	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
+	if (radix_enabled())
+		pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
+	else
+		pgprot = PAGE_SHARED;
 
-	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
-	if (err)
+	ptep = get_locked_pte(patching_mm, patching_addr, &patch_mapping->ptl);
+	if (unlikely(!ptep)) {
+		pr_warn("map patch: failed to allocate pte for patching\n");
 		return -1;
+	}
+
+	pte = mk_pte(page, pgprot);
+	set_pte_at(patching_mm, patching_addr, ptep, pte);
+
+	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+	use_temporary_mm(&patch_mapping->temp_mm);
 
 	return 0;
 }
 
-static inline int unmap_patch_area(unsigned long addr)
+static int unmap_patch(struct patch_mapping *patch_mapping)
 {
 	pte_t *ptep;
 	pmd_t *pmdp;
 	pud_t *pudp;
 	pgd_t *pgdp;
 
-	pgdp = pgd_offset_k(addr);
+	pgdp = pgd_offset(patching_mm, patching_addr);
 	if (unlikely(!pgdp))
 		return -EINVAL;
 
-	pudp = pud_offset(pgdp, addr);
+	pudp = pud_offset(pgdp, patching_addr);
 	if (unlikely(!pudp))
 		return -EINVAL;
 
-	pmdp = pmd_offset(pudp, addr);
+	pmdp = pmd_offset(pudp, patching_addr);
 	if (unlikely(!pmdp))
 		return -EINVAL;
 
-	ptep = pte_offset_kernel(pmdp, addr);
+	ptep = pte_offset_kernel(pmdp, patching_addr);
 	if (unlikely(!ptep))
 		return -EINVAL;
 
-	pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
+	/*
+	 * In hash, pte_clear flushes the tlb
+	 */
+	pte_clear(patching_mm, patching_addr, ptep);
+	unuse_temporary_mm(&patch_mapping->temp_mm);
 
 	/*
-	 * In hash, pte_clear flushes the tlb, in radix, we have to
+	 * In radix, we have to explicitly flush the tlb (no-op in hash)
 	 */
-	pte_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	local_flush_tlb_mm(patching_mm);
+	pte_unmap_unlock(ptep, patch_mapping->ptl);
 
 	return 0;
 }
@@ -167,33 +148,38 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
 	int err;
 	unsigned int *patch_addr = NULL;
 	unsigned long flags;
-	unsigned long text_poke_addr;
-	unsigned long kaddr = (unsigned long)addr;
+	struct patch_mapping patch_mapping;
 
 	/*
-	 * During early early boot patch_instruction is called
-	 * when text_poke_area is not ready, but we still need
-	 * to allow patching. We just do the plain old patching
+	 * The patching_mm is initialized before calling mark_rodata_ro. Prior
+	 * to this, patch_instruction is called when we don't have (and don't
+	 * need) the patching_mm so just do plain old patching.
 	 */
-	if (!this_cpu_read(text_poke_area))
+	if (!patching_mm)
 		return raw_patch_instruction(addr, instr);
 
 	local_irq_save(flags);
 
-	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
-	if (map_patch_area(addr, text_poke_addr)) {
-		err = -1;
+	err = map_patch(addr, &patch_mapping);
+	if (err)
 		goto out;
-	}
 
-	patch_addr = (unsigned int *)(text_poke_addr) +
-			((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
+	patch_addr = (unsigned int *)(patching_addr) +
+			(offset_in_page((unsigned long)addr) /
+				sizeof(unsigned int));
 
 	__patch_instruction(addr, instr, patch_addr);
 
-	err = unmap_patch_area(text_poke_addr);
+	err = unmap_patch(&patch_mapping);
 	if (err)
-		pr_warn("failed to unmap %lx\n", text_poke_addr);
+		pr_warn("unmap patch: failed to unmap patch\n");
+
+	/*
+	 * Something is wrong if what we just wrote doesn't match what we
+	 * think we just wrote.
+	 * XXX: BUG_ON() instead?
+	 */
+	WARN_ON(memcmp(addr, &instr, sizeof(instr)));
 
 out:
 	local_irq_restore(flags);
-- 
2.25.1



* Re: [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm
  2020-03-23  4:52 ` [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm Christopher M. Riedl
@ 2020-03-23  6:10   ` kbuild test robot
  2020-03-24 16:07   ` Christophe Leroy
  2020-04-18 10:36   ` Christophe Leroy
  2 siblings, 0 replies; 23+ messages in thread
From: kbuild test robot @ 2020-03-23  6:10 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6611 bytes --]

Hi "Christopher,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.6-rc7 next-20200320]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Christopher-M-Riedl/Use-per-CPU-temporary-mappings-for-patching/20200323-130744
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allnoconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/asm-prototypes.h:17,
                    from arch/powerpc/kernel/ptrace.c:44:
   arch/powerpc/include/asm/mmu_context.h: In function 'use_temporary_mm':
>> arch/powerpc/include/asm/mmu_context.h:306:6: error: implicit declaration of function 'ppc_breakpoint_available'; did you mean 'hw_breakpoint_disable'? [-Werror=implicit-function-declaration]
     306 |  if (ppc_breakpoint_available()) {
         |      ^~~~~~~~~~~~~~~~~~~~~~~~
         |      hw_breakpoint_disable
>> arch/powerpc/include/asm/mmu_context.h:307:3: error: implicit declaration of function '__get_breakpoint' [-Werror=implicit-function-declaration]
     307 |   __get_breakpoint(&temp_mm->brk);
         |   ^~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/mmu_context.h: In function 'unuse_temporary_mm':
>> arch/powerpc/include/asm/mmu_context.h:324:3: error: implicit declaration of function '__set_breakpoint' [-Werror=implicit-function-declaration]
     324 |   __set_breakpoint(&temp_mm->brk);
         |   ^~~~~~~~~~~~~~~~
   In file included from arch/powerpc/kernel/ptrace.c:45:
   arch/powerpc/include/asm/debug.h: At top level:
   arch/powerpc/include/asm/debug.h:48:6: error: conflicting types for '__get_breakpoint' [-Werror]
      48 | void __get_breakpoint(struct arch_hw_breakpoint *brk);
         |      ^~~~~~~~~~~~~~~~
   In file included from arch/powerpc/include/asm/asm-prototypes.h:17,
                    from arch/powerpc/kernel/ptrace.c:44:
   arch/powerpc/include/asm/mmu_context.h:307:3: note: previous implicit declaration of '__get_breakpoint' was here
     307 |   __get_breakpoint(&temp_mm->brk);
         |   ^~~~~~~~~~~~~~~~
   In file included from arch/powerpc/kernel/ptrace.c:45:
   arch/powerpc/include/asm/debug.h:49:6: error: conflicting types for '__set_breakpoint' [-Werror]
      49 | void __set_breakpoint(struct arch_hw_breakpoint *brk);
         |      ^~~~~~~~~~~~~~~~
   In file included from arch/powerpc/include/asm/asm-prototypes.h:17,
                    from arch/powerpc/kernel/ptrace.c:44:
   arch/powerpc/include/asm/mmu_context.h:324:3: note: previous implicit declaration of '__set_breakpoint' was here
     324 |   __set_breakpoint(&temp_mm->brk);
         |   ^~~~~~~~~~~~~~~~
   In file included from arch/powerpc/kernel/ptrace.c:45:
>> arch/powerpc/include/asm/debug.h:50:6: error: conflicting types for 'ppc_breakpoint_available'
      50 | bool ppc_breakpoint_available(void);
         |      ^~~~~~~~~~~~~~~~~~~~~~~~
   In file included from arch/powerpc/include/asm/asm-prototypes.h:17,
                    from arch/powerpc/kernel/ptrace.c:44:
   arch/powerpc/include/asm/mmu_context.h:306:6: note: previous implicit declaration of 'ppc_breakpoint_available' was here
     306 |  if (ppc_breakpoint_available()) {
         |      ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors
--
   In file included from arch/powerpc/include/asm/asm-prototypes.h:17,
                    from arch/powerpc/kernel/syscalls.c:38:
   arch/powerpc/include/asm/mmu_context.h: In function 'use_temporary_mm':
>> arch/powerpc/include/asm/mmu_context.h:306:6: error: implicit declaration of function 'ppc_breakpoint_available'; did you mean 'hw_breakpoint_disable'? [-Werror=implicit-function-declaration]
     306 |  if (ppc_breakpoint_available()) {
         |      ^~~~~~~~~~~~~~~~~~~~~~~~
         |      hw_breakpoint_disable
>> arch/powerpc/include/asm/mmu_context.h:307:3: error: implicit declaration of function '__get_breakpoint' [-Werror=implicit-function-declaration]
     307 |   __get_breakpoint(&temp_mm->brk);
         |   ^~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/mmu_context.h: In function 'unuse_temporary_mm':
>> arch/powerpc/include/asm/mmu_context.h:324:3: error: implicit declaration of function '__set_breakpoint' [-Werror=implicit-function-declaration]
     324 |   __set_breakpoint(&temp_mm->brk);
         |   ^~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors

vim +306 arch/powerpc/include/asm/mmu_context.h

   288	
   289	static inline void use_temporary_mm(struct temp_mm *temp_mm)
   290	{
   291		lockdep_assert_irqs_disabled();
   292	
   293		temp_mm->is_kernel_thread = current->mm == NULL;
   294		if (temp_mm->is_kernel_thread)
   295			temp_mm->prev = current->active_mm;
   296		else
   297			temp_mm->prev = current->mm;
   298	
   299		/*
   300		 * Hash requires a non-NULL current->mm to allocate a userspace address
   301		 * when handling a page fault. Does not appear to hurt in Radix either.
   302		 */
   303		current->mm = temp_mm->temp;
   304		switch_mm_irqs_off(NULL, temp_mm->temp, current);
   305	
 > 306		if (ppc_breakpoint_available()) {
 > 307			__get_breakpoint(&temp_mm->brk);
   308			if (temp_mm->brk.type != 0)
   309				hw_breakpoint_disable();
   310		}
   311	}
   312	
   313	static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
   314	{
   315		lockdep_assert_irqs_disabled();
   316	
   317		if (temp_mm->is_kernel_thread)
   318			current->mm = NULL;
   319		else
   320			current->mm = temp_mm->prev;
   321		switch_mm_irqs_off(NULL, temp_mm->prev, current);
   322	
   323		if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
 > 324			__set_breakpoint(&temp_mm->brk);
   325	}
   326	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 6413 bytes --]


* Re: [RFC PATCH 0/3] Use per-CPU temporary mappings for patching
  2020-03-23  4:52 [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christopher M. Riedl
                   ` (2 preceding siblings ...)
  2020-03-23  4:52 ` [RFC PATCH 3/3] powerpc/lib: Use " Christopher M. Riedl
@ 2020-03-23 11:30 ` Christophe Leroy
  2020-03-23 14:04   ` Christophe Leroy
  2020-03-25  2:51 ` Andrew Donnellan
  4 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2020-03-23 11:30 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev



On 03/23/2020 04:52 AM, Christopher M. Riedl wrote:
> When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
> temporary mappings when patching itself. These mappings temporarily
> override the strict RWX text protections to permit a write. Currently,
> powerpc allocates a per-CPU VM area for patching. Patching occurs as
> follows:
> 
> 	1. Map page of text to be patched to per-CPU VM area w/
> 	   PAGE_KERNEL protection
> 	2. Patch text
> 	3. Remove the temporary mapping
> 
> While the VM area is per-CPU, the mapping is actually inserted into the
> kernel page tables. Presumably, this could allow another CPU to access
> the normally write-protected text - either maliciously or accidentally -
> via this same mapping if the address of the VM area is known. Ideally,
> the mapping should be kept local to the CPU doing the patching (or any
> other sensitive operations requiring temporarily overriding memory
> protections) [0].
> 
> x86 introduced "temporary mm" structs which allow the creation of
> mappings local to a particular CPU [1]. This series intends to bring the
> notion of a temporary mm to powerpc and harden powerpc by using such a
> mapping for patching a kernel with strict RWX permissions.
> 
> The first patch introduces the temporary mm struct and API for powerpc
> along with a new function to retrieve a current hw breakpoint.
> 
> The second patch uses the `poking_init` init hook added by the x86
> patches to initialize a temporary mm and patching address. The patching
> address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
> upper limit is necessary due to how the hash MMU operates - by default
> the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
> and radix randomize inside this range. The number of possible random
> addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
> 
> Bits of entropy with 64K page size on BOOK3S_64:
> 
> 	bits-o-entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> 
> 	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> 	bits-o-entropy = log2(128TB / 64K)
> 	bits-o-entropy = 31
> 
> Currently, randomization occurs only once during initialization at boot.
> 
> The third patch replaces the VM area with the temporary mm in the
> patching code. The page for patching has to be mapped PAGE_SHARED with
> the hash MMU since hash prevents the kernel from accessing userspace
> pages with PAGE_PRIVILEGED bit set. There is on-going work on my side to
> explore if this is actually necessary in the hash codepath.
> 
> Testing so far is limited to booting on QEMU (power8 and power9 targets)
> and a POWER8 VM along with setting some simple xmon breakpoints (which
> makes use of code-patching). A POC lkdtm test is in progress to actually
> exploit the existing vulnerability (i.e. the mapping during patching is
> exposed in the kernel page tables and accessible by other CPUs) - this will
> accompany a future v1 of this series.

Got the following failures on an 8xx. Note that "fault blocked by AP 
register !" means an unauthorised access from Kernel to Userspace.

[    6.113538] ------------[ cut here ]------------
[    6.117852] Bug: fault blocked by AP register !
[    6.117977] WARNING: CPU: 0 PID: 1 at 
./arch/powerpc/include/asm/nohash/32/kup-8xx.h:67 do_page_fault+0x660/0x6ec
[    6.132532] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3524
[    6.142484] NIP:  c000f148 LR: c000f148 CTR: 00000000
[    6.147490] REGS: c9023ca8 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.157185] MSR:  00021032 <ME,IR,DR,RI>  CR: 39023333  XER: a0002100
[    6.163553]
[    6.163553] GPR00: c000f148 c9023d60 c60ec000 00000023 c0924862 
0000000b c0921f7a 6c742062
[    6.163553] GPR08: 00001032 c088e534 00000000 00000004 39023333 
00000000 c0003964 00000000
[    6.163553] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.163553] GPR24: 00000000 00000000 c0730000 00000300 c60dc000 
23085188 82000000 c9023db0
[    6.198284] NIP [c000f148] do_page_fault+0x660/0x6ec
[    6.203182] LR [c000f148] do_page_fault+0x660/0x6ec
[    6.207958] Call Trace:
[    6.210412] [c9023d60] [c000f148] do_page_fault+0x660/0x6ec (unreliable)
[    6.217037] [c9023da0] [c000e2f0] handle_page_fault+0x8/0x34
[    6.222684] --- interrupt: 301 at __patch_instruction+0x4/0x2c
[    6.222684]     LR = patch_instruction+0x144/0x324
[    6.233138] [c9023e68] [c0013408] patch_instruction+0x120/0x324 
(unreliable)
[    6.240108] [c9023ee8] [c00114fc] mmu_mark_initmem_nx+0x44/0x124
[    6.246039] [c9023f18] [c000f42c] free_initmem+0x20/0x58
[    6.251292] [c9023f28] [c0003980] kernel_init+0x1c/0x130
[    6.256538] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[    6.262605] Instruction dump:
[    6.265542] 4182fc30 39200002 912a0610 7fa5eb78 38800002 38600007 
4801e8a9 38600000
[    6.273200] 4bfffa78 3c60c06d 38632f0c 4800eb95 <0fe00000> 3860000b 
4bfffa60 73490040
[    6.281034] ---[ end trace 18702eef58b6f5ff ]---
[    6.285638] ------------[ cut here ]------------
[    6.290233] WARNING: CPU: 0 PID: 1 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[    6.299575] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3524
[    6.309527] NIP:  c00134f4 LR: c00134ec CTR: 00000000
[    6.314533] REGS: c9023db0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.324229] MSR:  00021032 <ME,IR,DR,RI>  CR: 55000933  XER: a0002100
[    6.330597]
[    6.330597] GPR00: 3d6a4000 c9023e68 c60ec000 00000001 c9023ea8 
00000004 c0001188 00000004
[    6.330597] GPR08: 00000000 00000000 00000000 c53720f0 33000333 
00000000 c0003964 00000000
[    6.330597] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.330597] GPR24: 00000000 00000000 c0730000 00009032 c0734b50 
c0730000 c0001188 00000000
[    6.365337] NIP [c00134f4] patch_instruction+0x20c/0x324
[    6.370576] LR [c00134ec] patch_instruction+0x204/0x324
[    6.375690] Call Trace:
[    6.378150] [c9023e68] [c00134b8] patch_instruction+0x1d0/0x324 
(unreliable)
[    6.385121] [c9023ee8] [c00114fc] mmu_mark_initmem_nx+0x44/0x124
[    6.391051] [c9023f18] [c000f42c] free_initmem+0x20/0x58
[    6.396303] [c9023f28] [c0003980] kernel_init+0x1c/0x130
[    6.401550] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[    6.407617] Instruction dump:
[    6.410554] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[    6.418212] 7fc3f378 4800086d 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[    6.426046] ---[ end trace 18702eef58b6f600 ]---
[    6.430941] ------------[ cut here ]------------
[    6.435243] Bug: fault blocked by AP register !
[    6.435363] WARNING: CPU: 0 PID: 1 at 
./arch/powerpc/include/asm/nohash/32/kup-8xx.h:67 do_page_fault+0x660/0x6ec
[    6.449923] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3524
[    6.459875] NIP:  c000f148 LR: c000f148 CTR: 00000000
[    6.464881] REGS: c9023ca8 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.474577] MSR:  00021032 <ME,IR,DR,RI>  CR: 39023333  XER: a0002100
[    6.480945]
[    6.480945] GPR00: c000f148 c9023d60 c60ec000 00000023 c0924862 
0000000b c0921f7a 6c742062
[    6.480945] GPR08: 00001032 c088e534 00000000 00000004 39023333 
00000000 c0003964 00000000
[    6.480945] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.480945] GPR24: 00000000 00000000 c0730000 00000300 c60dc000 
2308511c 82000000 c9023db0
[    6.515670] NIP [c000f148] do_page_fault+0x660/0x6ec
[    6.520573] LR [c000f148] do_page_fault+0x660/0x6ec
[    6.525350] Call Trace:
[    6.527804] [c9023d60] [c000f148] do_page_fault+0x660/0x6ec (unreliable)
[    6.534428] [c9023da0] [c000e2f0] handle_page_fault+0x8/0x34
[    6.540072] --- interrupt: 301 at __patch_instruction+0x4/0x2c
[    6.540072]     LR = patch_instruction+0x144/0x324
[    6.550531] [c9023e68] [c0013408] patch_instruction+0x120/0x324 
(unreliable)
[    6.557500] [c9023ee8] [c0011534] mmu_mark_initmem_nx+0x7c/0x124
[    6.563431] [c9023f18] [c000f42c] free_initmem+0x20/0x58
[    6.568683] [c9023f28] [c0003980] kernel_init+0x1c/0x130
[    6.573931] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[    6.579996] Instruction dump:
[    6.582934] 4182fc30 39200002 912a0610 7fa5eb78 38800002 38600007 
4801e8a9 38600000
[    6.590592] 4bfffa78 3c60c06d 38632f0c 4800eb95 <0fe00000> 3860000b 
4bfffa60 73490040
[    6.598425] ---[ end trace 18702eef58b6f601 ]---
[    6.603026] ------------[ cut here ]------------
[    6.607623] WARNING: CPU: 0 PID: 1 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[    6.616968] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3524
[    6.626919] NIP:  c00134f4 LR: c00134ec CTR: 00000000
[    6.631925] REGS: c9023db0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.641621] MSR:  00021032 <ME,IR,DR,RI>  CR: 55000933  XER: a0002100
[    6.647988]
[    6.647988] GPR00: 2b8ac060 c9023e68 c60ec000 00000001 c9023ea8 
00000004 c000111c 00000004
[    6.647988] GPR08: 00000000 00000000 00000000 c53720f0 33000333 
00000000 c0003964 00000000
[    6.647988] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.647988] GPR24: 00000000 00000000 c0730000 00009032 c0734b50 
c0730000 c000111c 00000000
[    6.682728] NIP [c00134f4] patch_instruction+0x20c/0x324
[    6.687968] LR [c00134ec] patch_instruction+0x204/0x324
[    6.693082] Call Trace:
[    6.695542] [c9023e68] [c00134b8] patch_instruction+0x1d0/0x324 
(unreliable)
[    6.702512] [c9023ee8] [c0011534] mmu_mark_initmem_nx+0x7c/0x124
[    6.708443] [c9023f18] [c000f42c] free_initmem+0x20/0x58
[    6.713694] [c9023f28] [c0003980] kernel_init+0x1c/0x130
[    6.718942] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[    6.725008] Instruction dump:
[    6.727946] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[    6.735604] 7fc3f378 4800086d 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[    6.743437] ---[ end trace 18702eef58b6f602 ]---
[    6.759669] Freeing unused kernel memory: 496K
[    6.764014] ------------[ cut here ]------------
[    6.768382] Bug: fault blocked by AP register !
[    6.768515] WARNING: CPU: 0 PID: 1 at 
./arch/powerpc/include/asm/nohash/32/kup-8xx.h:67 do_page_fault+0x660/0x6ec
[    6.783065] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3524
[    6.793016] NIP:  c000f148 LR: c000f148 CTR: 00000000
[    6.798022] REGS: c9023c98 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.807717] MSR:  00021032 <ME,IR,DR,RI>  CR: 39023333  XER: a0002100
[    6.814085]
[    6.814085] GPR00: c000f148 c9023d50 c60ec000 00000023 c0924862 
0000000b c0921f7a 6c742062
[    6.814085] GPR08: 00001032 c088e534 00000000 00000004 39023333 
00000000 c0003964 00000000
[    6.814085] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.814085] GPR24: 00000000 00000000 c0730000 00000300 c60dc000 
230852b0 82000000 c9023da0
[    6.848811] NIP [c000f148] do_page_fault+0x660/0x6ec
[    6.853714] LR [c000f148] do_page_fault+0x660/0x6ec
[    6.858491] Call Trace:
[    6.860944] [c9023d50] [c000f148] do_page_fault+0x660/0x6ec (unreliable)
[    6.867569] [c9023d90] [c000e2f0] handle_page_fault+0x8/0x34
[    6.873215] --- interrupt: 301 at __patch_instruction+0x4/0x2c
[    6.873215]     LR = patch_instruction+0x144/0x324
[    6.883670] [c9023e58] [c0013408] patch_instruction+0x120/0x324 
(unreliable)
[    6.890640] [c9023ed8] [c0011624] mmu_mark_rodata_ro+0x48/0xf8
[    6.896401] [c9023f08] [c000fe50] mark_rodata_ro+0xc4/0xd8
[    6.901824] [c9023f28] [c0003998] kernel_init+0x34/0x130
[    6.907072] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[    6.913137] Instruction dump:
[    6.916074] 4182fc30 39200002 912a0610 7fa5eb78 38800002 38600007 
4801e8a9 38600000
[    6.923732] 4bfffa78 3c60c06d 38632f0c 4800eb95 <0fe00000> 3860000b 
4bfffa60 73490040
[    6.931565] ---[ end trace 18702eef58b6f603 ]---
[    6.936166] ------------[ cut here ]------------
[    6.940763] WARNING: CPU: 0 PID: 1 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[    6.950108] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3524
[    6.960059] NIP:  c00134f4 LR: c00134ec CTR: 00000000
[    6.965067] REGS: c9023da0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.974761] MSR:  00021032 <ME,IR,DR,RI>  CR: 99000333  XER: a0002100
[    6.981129]
[    6.981129] GPR00: 3d6aff80 c9023e58 c60ec000 00000001 c9023e98 
00000004 c00012b0 00000004
[    6.981129] GPR08: 00000000 fffffffe 00000000 00000000 39000335 
00000000 c0003964 00000000
[    6.981129] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.981129] GPR24: 00000000 00000000 c0730000 00009032 c0734b50 
c0730000 c00012b0 00000000
[    7.015870] NIP [c00134f4] patch_instruction+0x20c/0x324
[    7.021109] LR [c00134ec] patch_instruction+0x204/0x324
[    7.026222] Call Trace:
[    7.028683] [c9023e58] [c00134b8] patch_instruction+0x1d0/0x324 
(unreliable)
[    7.035653] [c9023ed8] [c0011624] mmu_mark_rodata_ro+0x48/0xf8
[    7.041415] [c9023f08] [c000fe50] mark_rodata_ro+0xc4/0xd8
[    7.046835] [c9023f28] [c0003998] kernel_init+0x34/0x130
[    7.052083] [c9023f38] [c000e184] ret_from_kernel_thread+0x14/0x1c
[    7.058149] Instruction dump:
[    7.061086] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[    7.068744] 7fc3f378 4800086d 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[    7.076578] ---[ end trace 18702eef58b6f604 ]---

Christophe


* Re: [RFC PATCH 0/3] Use per-CPU temporary mappings for patching
  2020-03-23 11:30 ` [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christophe Leroy
@ 2020-03-23 14:04   ` Christophe Leroy
       [not found]     ` <738663735.45060.1584982175261@privateemail.com>
  2020-03-24 16:02     ` Christophe Leroy
  0 siblings, 2 replies; 23+ messages in thread
From: Christophe Leroy @ 2020-03-23 14:04 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev



On 03/23/2020 11:30 AM, Christophe Leroy wrote:
> 
> 
> On 03/23/2020 04:52 AM, Christopher M. Riedl wrote:
>> When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
>> temporary mappings when patching itself. These mappings temporarily
>> override the strict RWX text protections to permit a write. Currently,
>> powerpc allocates a per-CPU VM area for patching. Patching occurs as
>> follows:
>>
>>     1. Map page of text to be patched to per-CPU VM area w/
>>        PAGE_KERNEL protection
>>     2. Patch text
>>     3. Remove the temporary mapping
>>
>> While the VM area is per-CPU, the mapping is actually inserted into the
>> kernel page tables. Presumably, this could allow another CPU to access
>> the normally write-protected text - either maliciously or accidentally -
>> via this same mapping if the address of the VM area is known. Ideally,
>> the mapping should be kept local to the CPU doing the patching (or any
>> other sensitive operations requiring temporarily overriding memory
>> protections) [0].
>>
>> x86 introduced "temporary mm" structs which allow the creation of
>> mappings local to a particular CPU [1]. This series intends to bring the
>> notion of a temporary mm to powerpc and harden powerpc by using such a
>> mapping for patching a kernel with strict RWX permissions.
>>
>> The first patch introduces the temporary mm struct and API for powerpc
>> along with a new function to retrieve a current hw breakpoint.
>>
>> The second patch uses the `poking_init` init hook added by the x86
>> patches to initialize a temporary mm and patching address. The patching
>> address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
>> upper limit is necessary due to how the hash MMU operates - by default
>> the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
>> and radix randomize inside this range. The number of possible random
>> addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
>>
>> Bits of entropy with 64K page size on BOOK3S_64:
>>
>>     bits-o-entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
>>
>>     PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
>>     bits-o-entropy = log2(128TB / 64K)
>>     bits-o-entropy = 31
>>
>> Currently, randomization occurs only once during initialization at boot.
>>
>> The third patch replaces the VM area with the temporary mm in the
>> patching code. The page for patching has to be mapped PAGE_SHARED with
>> the hash MMU since hash prevents the kernel from accessing userspace
>> pages with the PAGE_PRIVILEGED bit set. There is ongoing work on my
>> side to explore whether this is actually necessary in the hash codepath.
>>
>> Testing so far is limited to booting on QEMU (power8 and power9 targets)
>> and a POWER8 VM along with setting some simple xmon breakpoints (which
>> makes use of code-patching). A POC lkdtm test is in progress to actually
>> exploit the existing vulnerability (i.e. the mapping during patching is
>> exposed in the kernel page tables and accessible by other CPUs) - this will
>> accompany a future v1 of this series.
> 
> Got the following failures on an 8xx. Note that "fault blocked by AP 
> register !" means an unauthorised access from Kernel to Userspace.
> 

Still a problem even without CONFIG_PPC_KUAP:

[    5.901899] ------------[ cut here ]------------
[    5.906329] WARNING: CPU: 0 PID: 1 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[    5.915658] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[    5.925610] NIP:  c0012468 LR: c0012460 CTR: 00000000
[    5.930614] REGS: c9023db0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    5.940310] MSR:  00021032 <ME,IR,DR,RI>  CR: 44000822  XER: 20000000
[    5.946677]
[    5.946677] GPR00: 3d6a4000 c9023e68 c60ec000 00000001 c9023ea8 
00000004 c0001188 00000004
[    5.946677] GPR08: 00000000 00000000 00000000 c53720f0 22000222 
00000000 c0003964 00000000
[    5.946677] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    5.946677] GPR24: 00000000 00000000 c0730000 00009032 c0734a88 
c0730000 c0001188 00000000
[    5.981416] NIP [c0012468] patch_instruction+0x20c/0x324
[    5.986656] LR [c0012460] patch_instruction+0x204/0x324
[    5.991772] Call Trace:
[    5.994231] [c9023e68] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[    6.001201] [c9023ee8] [c0010470] mmu_mark_initmem_nx+0x44/0x124
[    6.007132] [c9023f18] [c000e3cc] free_initmem+0x20/0x58
[    6.012383] [c9023f28] [c0003980] kernel_init+0x1c/0x130
[    6.017630] [c9023f38] [c000d174] ret_from_kernel_thread+0x14/0x1c
[    6.023698] Instruction dump:
[    6.026635] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[    6.034293] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[    6.042128] ---[ end trace c46738768244c84e ]---
[    6.047021] ------------[ cut here ]------------
[    6.051422] WARNING: CPU: 0 PID: 1 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[    6.060756] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[    6.070708] NIP:  c0012468 LR: c0012460 CTR: 00000000
[    6.075712] REGS: c9023db0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.085408] MSR:  00021032 <ME,IR,DR,RI>  CR: 44000822  XER: 20000000
[    6.091775]
[    6.091775] GPR00: 2b8ac060 c9023e68 c60ec000 00000001 c9023ea8 
00000004 c000111c 00000004
[    6.091775] GPR08: 00000000 00000000 00000000 c53720f0 22000222 
00000000 c0003964 00000000
[    6.091775] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.091775] GPR24: 00000000 00000000 c0730000 00009032 c0734a88 
c0730000 c000111c 00000000
[    6.126510] NIP [c0012468] patch_instruction+0x20c/0x324
[    6.131754] LR [c0012460] patch_instruction+0x204/0x324
[    6.136870] Call Trace:
[    6.139329] [c9023e68] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[    6.146299] [c9023ee8] [c00104a8] mmu_mark_initmem_nx+0x7c/0x124
[    6.152229] [c9023f18] [c000e3cc] free_initmem+0x20/0x58
[    6.157480] [c9023f28] [c0003980] kernel_init+0x1c/0x130
[    6.162729] [c9023f38] [c000d174] ret_from_kernel_thread+0x14/0x1c
[    6.168796] Instruction dump:
[    6.171733] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[    6.179391] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[    6.187226] ---[ end trace c46738768244c84f ]---
[    6.203450] Freeing unused kernel memory: 496K
[    6.207700] ------------[ cut here ]------------
[    6.212280] WARNING: CPU: 0 PID: 1 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[    6.221605] CPU: 0 PID: 1 Comm: swapper Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[    6.231556] NIP:  c0012468 LR: c0012460 CTR: 00000000
[    6.236559] REGS: c9023da0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[    6.246256] MSR:  00021032 <ME,IR,DR,RI>  CR: 88000222  XER: 20000000
[    6.252622]
[    6.252622] GPR00: 3d6aff80 c9023e58 c60ec000 00000001 c9023e98 
00000004 c00012b0 00000004
[    6.252622] GPR08: 00000000 fffffffe 00000000 00000000 28000224 
00000000 c0003964 00000000
[    6.252622] GPR16: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
[    6.252622] GPR24: 00000000 00000000 c0730000 00009032 c0734a88 
c0730000 c00012b0 00000000
[    6.287362] NIP [c0012468] patch_instruction+0x20c/0x324
[    6.292602] LR [c0012460] patch_instruction+0x204/0x324
[    6.297718] Call Trace:
[    6.300178] [c9023e58] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[    6.307148] [c9023ed8] [c0010598] mmu_mark_rodata_ro+0x48/0xf8
[    6.312907] [c9023f08] [c000edf0] mark_rodata_ro+0xc4/0xd8
[    6.318327] [c9023f28] [c0003998] kernel_init+0x34/0x130
[    6.323576] [c9023f38] [c000d174] ret_from_kernel_thread+0x14/0x1c
[    6.329643] Instruction dump:
[    6.332580] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[    6.340238] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[    6.348074] ---[ end trace c46738768244c850 ]---

[   10.024271] ------------[ cut here ]------------
[   10.028719] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   10.038218] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   10.047999] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   10.053003] REGS: ca473960 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   10.062699] MSR:  00021032 <ME,IR,DR,RI>  CR: 48008842  XER: 20000000
[   10.069066]
[   10.069066] GPR00: 48000028 ca473a18 c65f2ae0 00000001 ca473a58 
00000004 c045b304 00000004
[   10.069066] GPR08: 00000000 00000000 00000000 00000004 c6276b2c 
10093080 00000000 00000000
[   10.069066] GPR16: 00000000 c0665cac 00000000 c6619050 c6619070 
00000000 c63e5800 ca473bb0
[   10.069066] GPR24: 00000001 00000001 c0730000 00009032 c0734a88 
c0730000 c045b304 00000000
[   10.103800] NIP [c0012468] patch_instruction+0x20c/0x324
[   10.109041] LR [c0012460] patch_instruction+0x204/0x324
[   10.114161] Call Trace:
[   10.116617] [ca473a18] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   10.123592] [ca473a98] [c00bfdd8] jump_label_update+0xe0/0x128
[   10.129353] [ca473ac8] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   10.136324] [ca473ad8] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   10.142512] [ca473b08] [c041c6c8] nf_register_net_hook+0x28/0x94
[   10.148458] [ca473b28] [c0440230] nf_tables_newchain+0x894/0xc04
[   10.154395] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   10.160415] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   10.165943] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   10.171610] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   10.177281] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   10.182873] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   10.188304] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   10.193638] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   10.199143] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   10.204645] --- interrupt: c01 at 0xfd95304
[   10.204645]     LR = 0xfd952e4
[   10.211752] Instruction dump:
[   10.214688] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   10.222346] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   10.230182] ---[ end trace c46738768244c851 ]---
[   10.235177] ------------[ cut here ]------------
[   10.239561] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   10.249067] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   10.258848] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   10.263852] REGS: ca473960 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   10.273547] MSR:  00021032 <ME,IR,DR,RI>  CR: 48008842  XER: 20000000
[   10.279915]
[   10.279915] GPR00: 4800001c ca473a18 c65f2ae0 00000001 ca473a58 
00000004 c045f8b8 00000004
[   10.279915] GPR08: 00000000 00000000 00000000 00000004 c6276b4c 
10093080 00000000 00000000
[   10.279915] GPR16: 00000000 c0665cac 00000000 c661909c c66190bc 
00000000 c63e5800 ca473bb0
[   10.279915] GPR24: 00000001 00000001 c0730000 00009032 c0734a88 
c0730000 c045f8b8 00000000
[   10.314647] NIP [c0012468] patch_instruction+0x20c/0x324
[   10.319890] LR [c0012460] patch_instruction+0x204/0x324
[   10.325009] Call Trace:
[   10.327465] [ca473a18] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   10.334439] [ca473a98] [c00bfdd8] jump_label_update+0xe0/0x128
[   10.340202] [ca473ac8] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   10.347173] [ca473ad8] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   10.353361] [ca473b08] [c041c6c8] nf_register_net_hook+0x28/0x94
[   10.359308] [ca473b28] [c0440230] nf_tables_newchain+0x894/0xc04
[   10.365243] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   10.371263] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   10.376792] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   10.382459] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   10.388130] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   10.393721] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   10.399150] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   10.404486] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   10.409993] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   10.415494] --- interrupt: c01 at 0xfd95304
[   10.415494]     LR = 0xfd952e4
[   10.422600] Instruction dump:
[   10.425536] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   10.433195] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   10.441030] ---[ end trace c46738768244c852 ]---
[   10.445997] ------------[ cut here ]------------
[   10.450411] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   10.459916] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   10.469696] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   10.474700] REGS: ca473960 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   10.484396] MSR:  00021032 <ME,IR,DR,RI>  CR: 48008842  XER: 20000000
[   10.490763]
[   10.490763] GPR00: 480000f8 ca473a18 c65f2ae0 00000001 ca473a58 
00000004 c048eaf0 00000004
[   10.490763] GPR08: 00000000 00000000 00000000 00000004 c6276b4c 
10093080 00000000 00000000
[   10.490763] GPR16: 00000000 c0665cac 00000000 c661909c c66190bc 
00000000 c63e5800 ca473bb0
[   10.490763] GPR24: 00000001 00000001 c0730000 00009032 c0734a88 
c0730000 c048eaf0 00000000
[   10.525498] NIP [c0012468] patch_instruction+0x20c/0x324
[   10.530738] LR [c0012460] patch_instruction+0x204/0x324
[   10.535858] Call Trace:
[   10.538313] [ca473a18] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   10.545287] [ca473a98] [c00bfdd8] jump_label_update+0xe0/0x128
[   10.551049] [ca473ac8] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   10.558022] [ca473ad8] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   10.564209] [ca473b08] [c041c6c8] nf_register_net_hook+0x28/0x94
[   10.570155] [ca473b28] [c0440230] nf_tables_newchain+0x894/0xc04
[   10.576092] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   10.582112] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   10.587639] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   10.593307] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   10.598978] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   10.604569] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   10.609999] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   10.615334] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   10.620841] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   10.626342] --- interrupt: c01 at 0xfd95304
[   10.626342]     LR = 0xfd952e4
[   10.633449] Instruction dump:
[   10.636385] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   10.644043] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   10.651879] ---[ end trace c46738768244c853 ]---
[   10.656931] ------------[ cut here ]------------
[   10.661347] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   10.670850] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   10.680631] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   10.685635] REGS: ca473960 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   10.695330] MSR:  00021032 <ME,IR,DR,RI>  CR: 48008842  XER: 20000000
[   10.701699]
[   10.701699] GPR00: 4800008c ca473a18 c65f2ae0 00000001 ca473a58 
00000004 c045ca54 00000004
[   10.701699] GPR08: 00000000 00000000 00000000 00000004 c6276b6c 
10093080 00000000 00000000
[   10.701699] GPR16: 00000000 c0665cac 00000000 c66190e8 c6619108 
00000000 c63e5800 ca473bb0
[   10.701699] GPR24: 00000001 00000001 c0730000 00009032 c0734a88 
c0730000 c045ca54 00000000
[   10.736428] NIP [c0012468] patch_instruction+0x20c/0x324
[   10.741673] LR [c0012460] patch_instruction+0x204/0x324
[   10.746792] Call Trace:
[   10.749249] [ca473a18] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   10.756222] [ca473a98] [c00bfdd8] jump_label_update+0xe0/0x128
[   10.761986] [ca473ac8] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   10.768956] [ca473ad8] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   10.775144] [ca473b08] [c041c6c8] nf_register_net_hook+0x28/0x94
[   10.781091] [ca473b28] [c0440230] nf_tables_newchain+0x894/0xc04
[   10.787026] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   10.793046] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   10.798576] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   10.804241] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   10.809913] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   10.815505] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   10.820933] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   10.826268] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   10.831776] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   10.837277] --- interrupt: c01 at 0xfd95304
[   10.837277]     LR = 0xfd952e4
[   10.844383] Instruction dump:
[   10.847319] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   10.854978] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   10.862813] ---[ end trace c46738768244c854 ]---
[   10.867709] ------------[ cut here ]------------
[   10.872107] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   10.881612] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   10.891393] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   10.896397] REGS: ca473960 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   10.906092] MSR:  00021032 <ME,IR,DR,RI>  CR: 48008842  XER: 20000000
[   10.912460]
[   10.912460] GPR00: 48000008 ca473a18 c65f2ae0 00000001 ca473a58 
00000004 c04bdcf8 00000004
[   10.912460] GPR08: 00000000 00000000 00000000 00000004 c6276b6c 
10093080 00000000 00000000
[   10.912460] GPR16: 00000000 c0665cac 00000000 c66190e8 c6619108 
00000000 c63e5800 ca473bb0
[   10.912460] GPR24: 00000001 00000001 c0730000 00009032 c0734a88 
c0730000 c04bdcf8 00000000
[   10.947193] NIP [c0012468] patch_instruction+0x20c/0x324
[   10.952435] LR [c0012460] patch_instruction+0x204/0x324
[   10.957555] Call Trace:
[   10.960011] [ca473a18] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   10.966984] [ca473a98] [c00bfdd8] jump_label_update+0xe0/0x128
[   10.972747] [ca473ac8] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   10.979720] [ca473ad8] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   10.985907] [ca473b08] [c041c6c8] nf_register_net_hook+0x28/0x94
[   10.991853] [ca473b28] [c0440230] nf_tables_newchain+0x894/0xc04
[   10.997790] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   11.003808] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   11.009337] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   11.015005] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   11.020675] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   11.026267] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   11.031696] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   11.037031] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   11.042537] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   11.048039] --- interrupt: c01 at 0xfd95304
[   11.048039]     LR = 0xfd952e4
[   11.055146] Instruction dump:
[   11.058082] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   11.065741] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   11.073576] ---[ end trace c46738768244c855 ]---
[   11.078537] ------------[ cut here ]------------
[   11.082954] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   11.092461] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   11.102242] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   11.107246] REGS: ca473960 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   11.116941] MSR:  00021032 <ME,IR,DR,RI>  CR: 48008842  XER: 20000000
[   11.123309]
[   11.123309] GPR00: 4bfffdd8 ca473a18 c65f2ae0 00000001 ca473a58 
00000004 c04bdf28 00000004
[   11.123309] GPR08: 00000000 00000000 00000000 00000004 c6276b6c 
10093080 00000000 00000000
[   11.123309] GPR16: 00000000 c0665cac 00000000 c66190e8 c6619108 
00000000 c63e5800 ca473bb0
[   11.123309] GPR24: 00000001 00000001 c0730000 00009032 c0734a88 
c0730000 c04bdf28 00000000
[   11.158041] NIP [c0012468] patch_instruction+0x20c/0x324
[   11.163283] LR [c0012460] patch_instruction+0x204/0x324
[   11.168403] Call Trace:
[   11.170859] [ca473a18] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   11.177834] [ca473a98] [c00bfdd8] jump_label_update+0xe0/0x128
[   11.183596] [ca473ac8] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   11.190567] [ca473ad8] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   11.196754] [ca473b08] [c041c6c8] nf_register_net_hook+0x28/0x94
[   11.202701] [ca473b28] [c0440230] nf_tables_newchain+0x894/0xc04
[   11.208637] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   11.214656] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   11.220185] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   11.225852] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   11.231524] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   11.237115] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   11.242545] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   11.247880] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   11.253386] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   11.258888] --- interrupt: c01 at 0xfd95304
[   11.258888]     LR = 0xfd952e4
[   11.265994] Instruction dump:
[   11.268931] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   11.276588] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   11.284424] ---[ end trace c46738768244c856 ]---
[   11.289492] ------------[ cut here ]------------
[   11.293889] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   11.303396] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   11.313176] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   11.318180] REGS: ca4738d0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   11.327876] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   11.334243]
[   11.334243] GPR00: 48000104 ca473988 c65f2ae0 00000001 ca4739c8 
00000004 c045a954 00000004
[   11.334243] GPR08: 00000000 00000000 00000000 00000004 c6276b8c 
10093080 00000000 00000000
[   11.334243] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   11.334243] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c045a954 00000000
[   11.368977] NIP [c0012468] patch_instruction+0x20c/0x324
[   11.374218] LR [c0012460] patch_instruction+0x204/0x324
[   11.379338] Call Trace:
[   11.381794] [ca473988] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   11.388767] [ca473a08] [c00bfdd8] jump_label_update+0xe0/0x128
[   11.394530] [ca473a38] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   11.401500] [ca473a48] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   11.407689] [ca473a78] [c041c6c8] nf_register_net_hook+0x28/0x94
[   11.413628] [ca473a98] [c041c778] nf_register_net_hooks+0x44/0xac
[   11.419665] [ca473ab8] [c04caaf4] nf_defrag_ipv4_enable+0x80/0x94
[   11.425704] [ca473ad8] [c0426a00] nf_ct_netns_do_get+0x164/0x1d4
[   11.431642] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   11.437898] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   11.443234] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   11.449003] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   11.455023] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   11.460548] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   11.466220] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   11.471891] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   11.477482] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   11.482910] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   11.488246] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   11.493753] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   11.499255] --- interrupt: c01 at 0xfd95304
[   11.499255]     LR = 0xfd952e4
[   11.506361] Instruction dump:
[   11.509298] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   11.516956] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   11.524792] ---[ end trace c46738768244c857 ]---
[   11.529687] ------------[ cut here ]------------
[   11.534086] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   11.543591] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   11.553372] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   11.558377] REGS: ca4738d0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   11.568071] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   11.574438]
[   11.574438] GPR00: 48000010 ca473988 c65f2ae0 00000001 ca4739c8 
00000004 c045b3dc 00000004
[   11.574438] GPR08: 00000000 00000000 00000000 00000004 c6276b8c 
10093080 00000000 00000000
[   11.574438] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   11.574438] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c045b3dc 00000000
[   11.609176] NIP [c0012468] patch_instruction+0x20c/0x324
[   11.614413] LR [c0012460] patch_instruction+0x204/0x324
[   11.619533] Call Trace:
[   11.621989] [ca473988] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   11.628963] [ca473a08] [c00bfdd8] jump_label_update+0xe0/0x128
[   11.634725] [ca473a38] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   11.641697] [ca473a48] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   11.647885] [ca473a78] [c041c6c8] nf_register_net_hook+0x28/0x94
[   11.653823] [ca473a98] [c041c778] nf_register_net_hooks+0x44/0xac
[   11.659862] [ca473ab8] [c04caaf4] nf_defrag_ipv4_enable+0x80/0x94
[   11.665899] [ca473ad8] [c0426a00] nf_ct_netns_do_get+0x164/0x1d4
[   11.671837] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   11.678092] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   11.683430] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   11.689200] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   11.695219] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   11.700744] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   11.706417] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   11.712087] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   11.717678] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   11.723106] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   11.728442] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   11.733949] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   11.739450] --- interrupt: c01 at 0xfd95304
[   11.739450]     LR = 0xfd952e4
[   11.746556] Instruction dump:
[   11.749492] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   11.757151] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   11.764987] ---[ end trace c46738768244c858 ]---
[   11.769879] ------------[ cut here ]------------
[   11.774281] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   11.783787] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   11.793567] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   11.798571] REGS: ca4738d0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   11.808266] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   11.814633]
[   11.814633] GPR00: 48000048 ca473988 c65f2ae0 00000001 ca4739c8 
00000004 c04ce9d8 00000004
[   11.814633] GPR08: 00000000 00000000 00000000 00000004 c6276b8c 
10093080 00000000 00000000
[   11.814633] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   11.814633] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c04ce9d8 00000000
[   11.849366] NIP [c0012468] patch_instruction+0x20c/0x324
[   11.854610] LR [c0012460] patch_instruction+0x204/0x324
[   11.859728] Call Trace:
[   11.862184] [ca473988] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   11.869161] [ca473a08] [c00bfdd8] jump_label_update+0xe0/0x128
[   11.874920] [ca473a38] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   11.881891] [ca473a48] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   11.888080] [ca473a78] [c041c6c8] nf_register_net_hook+0x28/0x94
[   11.894018] [ca473a98] [c041c778] nf_register_net_hooks+0x44/0xac
[   11.900055] [ca473ab8] [c04caaf4] nf_defrag_ipv4_enable+0x80/0x94
[   11.906096] [ca473ad8] [c0426a00] nf_ct_netns_do_get+0x164/0x1d4
[   11.912034] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   11.918287] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   11.923625] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   11.929393] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   11.935414] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   11.940938] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   11.946610] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   11.952282] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   11.957873] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   11.963303] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   11.968637] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   11.974144] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   11.979646] --- interrupt: c01 at 0xfd95304
[   11.979646]     LR = 0xfd952e4
[   11.986751] Instruction dump:
[   11.989688] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   11.997346] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   12.005182] ---[ end trace c46738768244c859 ]---
[   12.010129] ------------[ cut here ]------------
[   12.014564] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   12.024068] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   12.033847] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   12.038852] REGS: ca4738f0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   12.048547] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   12.054914]
[   12.054914] GPR00: 48000008 ca4739a8 c65f2ae0 00000001 ca4739e8 
00000004 c0460db8 00000004
[   12.054914] GPR08: 00000000 00000000 00000000 c66a4508 c6276bac 
10093080 00000000 00000000
[   12.054914] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   12.054914] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c0460db8 00000000
[   12.089649] NIP [c0012468] patch_instruction+0x20c/0x324
[   12.094891] LR [c0012460] patch_instruction+0x204/0x324
[   12.100009] Call Trace:
[   12.102466] [ca4739a8] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   12.109439] [ca473a28] [c00bfdd8] jump_label_update+0xe0/0x128
[   12.115202] [ca473a58] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   12.122172] [ca473a68] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   12.128361] [ca473a98] [c041c6c8] nf_register_net_hook+0x28/0x94
[   12.134299] [ca473ab8] [c041c778] nf_register_net_hooks+0x44/0xac
[   12.140354] [ca473ad8] [c0426a28] nf_ct_netns_do_get+0x18c/0x1d4
[   12.146290] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   12.152545] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   12.157883] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   12.163650] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   12.169671] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   12.175195] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   12.180868] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   12.186539] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   12.192130] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   12.197558] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   12.202894] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   12.208401] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   12.213903] --- interrupt: c01 at 0xfd95304
[   12.213903]     LR = 0xfd952e4
[   12.221008] Instruction dump:
[   12.223945] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   12.231603] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   12.239438] ---[ end trace c46738768244c85a ]---
[   12.244413] ------------[ cut here ]------------
[   12.248824] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   12.258325] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   12.268105] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   12.273110] REGS: ca4738f0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   12.282804] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   12.289171]
[   12.289171] GPR00: 4800009c ca4739a8 c65f2ae0 00000001 ca4739e8 
00000004 c0460e6c 00000004
[   12.289171] GPR08: 00000000 00000000 00000000 c66a4508 c6276bac 
10093080 00000000 00000000
[   12.289171] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   12.289171] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c0460e6c 00000000
[   12.323909] NIP [c0012468] patch_instruction+0x20c/0x324
[   12.329146] LR [c0012460] patch_instruction+0x204/0x324
[   12.334267] Call Trace:
[   12.336722] [ca4739a8] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   12.343696] [ca473a28] [c00bfdd8] jump_label_update+0xe0/0x128
[   12.349458] [ca473a58] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   12.356430] [ca473a68] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   12.362618] [ca473a98] [c041c6c8] nf_register_net_hook+0x28/0x94
[   12.368556] [ca473ab8] [c041c778] nf_register_net_hooks+0x44/0xac
[   12.374607] [ca473ad8] [c0426a28] nf_ct_netns_do_get+0x18c/0x1d4
[   12.380551] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   12.386801] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   12.392139] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   12.397909] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   12.403928] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   12.409452] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   12.415125] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   12.420795] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   12.426387] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   12.431814] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   12.437151] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   12.442658] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   12.448160] --- interrupt: c01 at 0xfd95304
[   12.448160]     LR = 0xfd952e4
[   12.455265] Instruction dump:
[   12.458202] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   12.465860] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   12.473696] ---[ end trace c46738768244c85b ]---
[   12.478748] ------------[ cut here ]------------
[   12.483165] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   12.492667] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   12.502449] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   12.507453] REGS: ca4738f0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   12.517147] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   12.523515]
[   12.523515] GPR00: 48000008 ca4739a8 c65f2ae0 00000001 ca4739e8 
00000004 c0460e84 00000004
[   12.523515] GPR08: 00000000 00000000 00000000 c66a4508 c6276bac 
10093080 00000000 00000000
[   12.523515] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   12.523515] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c0460e84 00000000
[   12.558246] NIP [c0012468] patch_instruction+0x20c/0x324
[   12.563490] LR [c0012460] patch_instruction+0x204/0x324
[   12.568609] Call Trace:
[   12.571066] [ca4739a8] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   12.578040] [ca473a28] [c00bfdd8] jump_label_update+0xe0/0x128
[   12.583802] [ca473a58] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   12.590772] [ca473a68] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   12.596961] [ca473a98] [c041c6c8] nf_register_net_hook+0x28/0x94
[   12.602899] [ca473ab8] [c041c778] nf_register_net_hooks+0x44/0xac
[   12.608952] [ca473ad8] [c0426a28] nf_ct_netns_do_get+0x18c/0x1d4
[   12.614892] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   12.621144] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   12.626482] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   12.632251] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   12.638271] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   12.643795] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   12.649467] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   12.655139] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   12.660730] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   12.666159] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   12.671495] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   12.677000] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   12.682503] --- interrupt: c01 at 0xfd95304
[   12.682503]     LR = 0xfd952e4
[   12.689609] Instruction dump:
[   12.692545] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   12.700203] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   12.708039] ---[ end trace c46738768244c85c ]---
[   12.713016] ------------[ cut here ]------------
[   12.717423] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   12.726924] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   12.736705] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   12.741710] REGS: ca4738f0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   12.751404] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   12.757772]
[   12.757772] GPR00: 48000028 ca4739a8 c65f2ae0 00000001 ca4739e8 
00000004 c046102c 00000004
[   12.757772] GPR08: 00000000 00000000 00000000 c66a4508 c6276bac 
10093080 00000000 00000000
[   12.757772] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   12.757772] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c046102c 00000000
[   12.792504] NIP [c0012468] patch_instruction+0x20c/0x324
[   12.797747] LR [c0012460] patch_instruction+0x204/0x324
[   12.802867] Call Trace:
[   12.805323] [ca4739a8] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   12.812295] [ca473a28] [c00bfdd8] jump_label_update+0xe0/0x128
[   12.818059] [ca473a58] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   12.825030] [ca473a68] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   12.831217] [ca473a98] [c041c6c8] nf_register_net_hook+0x28/0x94
[   12.837156] [ca473ab8] [c041c778] nf_register_net_hooks+0x44/0xac
[   12.843211] [ca473ad8] [c0426a28] nf_ct_netns_do_get+0x18c/0x1d4
[   12.849148] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   12.855401] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   12.860740] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   12.866507] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   12.872528] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   12.878053] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   12.883725] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   12.889395] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   12.894987] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   12.900414] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   12.905751] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   12.911258] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   12.916760] --- interrupt: c01 at 0xfd95304
[   12.916760]     LR = 0xfd952e4
[   12.923865] Instruction dump:
[   12.926802] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   12.934460] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   12.942296] ---[ end trace c46738768244c85d ]---
[   12.947193] ------------[ cut here ]------------
[   12.951591] WARNING: CPU: 0 PID: 153 at 
arch/powerpc/lib/code-patching.c:182 patch_instruction+0x20c/0x324
[   12.961095] CPU: 0 PID: 153 Comm: nft Tainted: G        W 
5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5 #3526
[   12.970876] NIP:  c0012468 LR: c0012460 CTR: 00000000
[   12.975881] REGS: ca4738f0 TRAP: 0700   Tainted: G        W 
(5.6.0-rc6-s3k-dev-00903-g70f8a9483ed5)
[   12.985575] MSR:  00021032 <ME,IR,DR,RI>  CR: 44008442  XER: 20000000
[   12.991942]
[   12.991942] GPR00: 48000028 ca4739a8 c65f2ae0 00000001 ca4739e8 
00000004 c04cf028 00000004
[   12.991942] GPR08: 00000000 00000000 00000000 c66a4508 c6276bac 
10093080 00000000 00000000
[   12.991942] GPR16: 00000000 c0665dd8 00000000 c6619134 c08e7340 
c63e5800 ca473c34 c625d6b0
[   12.991942] GPR24: 00000002 00000001 c0730000 00009032 c0734a88 
c0730000 c04cf028 00000000
[   13.026676] NIP [c0012468] patch_instruction+0x20c/0x324
[   13.031919] LR [c0012460] patch_instruction+0x204/0x324
[   13.037037] Call Trace:
[   13.039493] [ca4739a8] [c001242c] patch_instruction+0x1d0/0x324 
(unreliable)
[   13.046468] [ca473a28] [c00bfdd8] jump_label_update+0xe0/0x128
[   13.052229] [ca473a58] [c00bfff0] 
static_key_slow_inc_cpuslocked+0x108/0x114
[   13.059197] [ca473a68] [c041c3e8] __nf_register_net_hook+0xb0/0x1a4
[   13.065389] [ca473a98] [c041c6c8] nf_register_net_hook+0x28/0x94
[   13.071328] [ca473ab8] [c041c778] nf_register_net_hooks+0x44/0xac
[   13.077378] [ca473ad8] [c0426a28] nf_ct_netns_do_get+0x18c/0x1d4
[   13.083323] [ca473af8] [c044e894] nft_ct_helper_obj_init+0x154/0x1d0
[   13.089572] [ca473b28] [c043f450] nft_obj_init+0xd4/0x178
[   13.094910] [ca473b48] [c0442b80] nf_tables_newobj+0x2e8/0x444
[   13.100680] [ca473ba8] [c041ede0] nfnetlink_rcv_batch+0x438/0x4c0
[   13.106699] [ca473ca8] [c041ef80] nfnetlink_rcv+0x118/0x138
[   13.112223] [ca473cd8] [c040f138] netlink_unicast+0x18c/0x240
[   13.117895] [ca473d08] [c040fa14] netlink_sendmsg+0x278/0x398
[   13.123566] [ca473d58] [c03ace48] ____sys_sendmsg+0xac/0x1e4
[   13.129158] [ca473db8] [c03ad174] ___sys_sendmsg+0x64/0x88
[   13.134585] [ca473ea8] [c03aeea0] __sys_sendmsg+0x44/0x88
[   13.139922] [ca473f08] [c03af3d8] sys_socketcall+0xf4/0x1fc
[   13.145429] [ca473f38] [c000d0c0] ret_from_syscall+0x0/0x34
[   13.150931] --- interrupt: c01 at 0xfd95304
[   13.150931]     LR = 0xfd952e4
[   13.158036] Instruction dump:
[   13.160973] 2f890000 91220000 409e0010 81220070 712a0004 40820120 
38a00004 38810040
[   13.168631] 7fc3f378 48000855 3123ffff 7c691910 <0f030000> 7f600124 
7fe9fb78 80010084
[   13.176467] ---[ end trace c46738768244c85e ]---


Christophe



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 0/3] Use per-CPU temporary mappings for patching
       [not found]     ` <738663735.45060.1584982175261@privateemail.com>
@ 2020-03-23 16:59       ` Christophe Leroy
  0 siblings, 0 replies; 23+ messages in thread
From: Christophe Leroy @ 2020-03-23 16:59 UTC (permalink / raw)
  To: Christopher M Riedl; +Cc: linuxppc-dev



On 23/03/2020 at 17:49, Christopher M Riedl wrote:
> 
>> On March 23, 2020 9:04 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>>   
>> On 03/23/2020 11:30 AM, Christophe Leroy wrote:
>>>
>>>
>>> On 03/23/2020 04:52 AM, Christopher M. Riedl wrote:
>>>> When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
>>>> temporary mappings when patching itself. These mappings temporarily
>>>> override the strict RWX text protections to permit a write. Currently,
>>>> powerpc allocates a per-CPU VM area for patching. Patching occurs as
>>>> follows:
>>>>
>>>>      1. Map page of text to be patched to per-CPU VM area w/
>>>>         PAGE_KERNEL protection
>>>>      2. Patch text
>>>>      3. Remove the temporary mapping
>>>>
>>>> While the VM area is per-CPU, the mapping is actually inserted into the
>>>> kernel page tables. Presumably, this could allow another CPU to access
>>>> the normally write-protected text - either malicously or accidentally -
>>>> via this same mapping if the address of the VM area is known. Ideally,
>>>> the mapping should be kept local to the CPU doing the patching (or any
>>>> other sensitive operations requiring temporarily overriding memory
>>>> protections) [0].
>>>>
>>>> x86 introduced "temporary mm" structs which allow the creation of
>>>> mappings local to a particular CPU [1]. This series intends to bring the
>>>> notion of a temporary mm to powerpc and harden powerpc by using such a
>>>> mapping for patching a kernel with strict RWX permissions.
>>>>
>>>> The first patch introduces the temporary mm struct and API for powerpc
>>>> along with a new function to retrieve a current hw breakpoint.
>>>>
>>>> The second patch uses the `poking_init` init hook added by the x86
>>>> patches to initialize a temporary mm and patching address. The patching
>>>> address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
>>>> upper limit is necessary due to how the hash MMU operates - by default
>>>> the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
>>>> and radix randomize inside this range. The number of possible random
>>>> addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
>>>>
>>>> Bits of entropy with 64K page size on BOOK3S_64:
>>>>
>>>>      bits-o-entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
>>>>
>>>>      PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
>>>>      bits-o-entropy = log2(128TB / 64K)
>>>>      bits-o-entropy = 31
>>>>
>>>> Currently, randomization occurs only once during initialization at boot.
>>>>
>>>> The third patch replaces the VM area with the temporary mm in the
>>>> patching code. The page for patching has to be mapped PAGE_SHARED with
>>>> the hash MMU since hash prevents the kernel from accessing userspace
>>>> pages with PAGE_PRIVILEGED bit set. There is on-going work on my side to
>>>> explore if this is actually necessary in the hash codepath.
>>>>
>>>> Testing so far is limited to booting on QEMU (power8 and power9 targets)
>>>> and a POWER8 VM along with setting some simple xmon breakpoints (which
>>>> makes use of code-patching). A POC lkdtm test is in-progress to actually
>>>> exploit the existing vulnerability (ie. the mapping during patching is
>>>> exposed in kernel page tables and accessible by other CPUS) - this will
>>>> accompany a future v1 of this series.
>>>
>>> Got the following failures on an 8xx. Note that "fault blocked by AP
>>> register!" means an unauthorised access from the kernel to userspace.
>>>
> 
> So I am pretty ignorant on 8xx. Can you share a config I can build?
> Can I test this on my end w/ QEMU? I suspect there are some subtleties
> in the 8xx MMU I did not consider.
> 
> 

For a config you can take mpc885_ads_defconfig.

Unfortunately the 8xx is not supported by QEMU AFAIK.

By the way, the first point is that you are using userspace for the
mapping; this means you have to modify the patching code to use
put_user(), otherwise accesses are blocked when CONFIG_PPC_KUAP is set.
That should also be the case on power9 with Radix.
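
For illustration, a minimal sketch of what that could look like. The
function name here is hypothetical, and on powerpc put_user() already
does the KUAP allow/prevent dance internally around the store:

static int __patch_instruction_uaccess(unsigned int *exec_addr,
				       unsigned int instr,
				       unsigned int __user *patch_addr)
{
	/* put_user() opens and closes the KUAP window for us. */
	if (put_user(instr, patch_addr))
		return -EFAULT;

	/* Same cache maintenance as the existing __patch_instruction(). */
	asm volatile("dcbst 0, %0; sync; icbi 0, %1; sync; isync"
		     :: "r" (patch_addr), "r" (exec_addr));
	return 0;
}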

Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 0/3] Use per-CPU temporary mappings for patching
  2020-03-23 14:04   ` Christophe Leroy
       [not found]     ` <738663735.45060.1584982175261@privateemail.com>
@ 2020-03-24 16:02     ` Christophe Leroy
  1 sibling, 0 replies; 23+ messages in thread
From: Christophe Leroy @ 2020-03-24 16:02 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev



On 23/03/2020 at 15:04, Christophe Leroy wrote:
> 
> 
> On 03/23/2020 11:30 AM, Christophe Leroy wrote:
>>
>>
>> On 03/23/2020 04:52 AM, Christopher M. Riedl wrote:
>>> When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
>>> temporary mappings when patching itself. These mappings temporarily
>>> override the strict RWX text protections to permit a write. Currently,
>>> powerpc allocates a per-CPU VM area for patching. Patching occurs as
>>> follows:
>>>
>>>     1. Map page of text to be patched to per-CPU VM area w/
>>>        PAGE_KERNEL protection
>>>     2. Patch text
>>>     3. Remove the temporary mapping
>>>
>>> While the VM area is per-CPU, the mapping is actually inserted into the
>>> kernel page tables. Presumably, this could allow another CPU to access
>>> the normally write-protected text - either malicously or accidentally -
>>> via this same mapping if the address of the VM area is known. Ideally,
>>> the mapping should be kept local to the CPU doing the patching (or any
>>> other sensitive operations requiring temporarily overriding memory
>>> protections) [0].
>>>
>>> x86 introduced "temporary mm" structs which allow the creation of
>>> mappings local to a particular CPU [1]. This series intends to bring the
>>> notion of a temporary mm to powerpc and harden powerpc by using such a
>>> mapping for patching a kernel with strict RWX permissions.
>>>
>>> The first patch introduces the temporary mm struct and API for powerpc
>>> along with a new function to retrieve a current hw breakpoint.
>>>
>>> The second patch uses the `poking_init` init hook added by the x86
>>> patches to initialize a temporary mm and patching address. The patching
>>> address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
>>> upper limit is necessary due to how the hash MMU operates - by default
>>> the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
>>> and radix randomize inside this range. The number of possible random
>>> addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
>>>
>>> Bits of entropy with 64K page size on BOOK3S_64:
>>>
>>>     bits-o-entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
>>>
>>>     PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
>>>     bits-o-entropy = log2(128TB / 64K)
>>>     bits-o-entropy = 31
>>>
>>> Currently, randomization occurs only once during initialization at boot.
>>>
>>> The third patch replaces the VM area with the temporary mm in the
>>> patching code. The page for patching has to be mapped PAGE_SHARED with
>>> the hash MMU since hash prevents the kernel from accessing userspace
>>> pages with PAGE_PRIVILEGED bit set. There is on-going work on my side to
>>> explore if this is actually necessary in the hash codepath.
>>>
>>> Testing so far is limited to booting on QEMU (power8 and power9 targets)
>>> and a POWER8 VM along with setting some simple xmon breakpoints (which
>>> makes use of code-patching). A POC lkdtm test is in-progress to actually
>>> exploit the existing vulnerability (ie. the mapping during patching is
>>> exposed in kernel page tables and accessible by other CPUS) - this will
>>> accompany a future v1 of this series.
>>
>> Got the following failures on an 8xx. Note that "fault blocked by AP
>> register!" means an unauthorised access from the kernel to userspace.
>>
> 
> Still a problem even without CONFIG_PPC_KUAP:
> 

I've been able to dig into the problem.

With CONFIG_PPC_KUAP, it definitely cannot work. See why in commit 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ef296729b735e083d8919e76ac213b8ff237eb78

Without CONFIG_PPC_KUAP, on the 8xx, __put_user_asm() in
__patch_instruction() returns -EFAULT. That's because _PAGE_DIRTY is not
set on the page. Normally this should be a minor fault and the fault
handler should set the _PAGE_DIRTY flag. It must be something in the way
the page is allocated and mapped that prevents that. If I forge
_PAGE_DIRTY in addition to PAGE_SHARED, it works. But I don't think that
is a valid approach to solving the issue.
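
For reference, the forged-dirty experiment would look roughly like this
in map_patch(). This is a workaround sketch only, not a proposed fix:

	/* Pre-dirty the PTE so the 8xx store does not take the minor
	 * fault that currently fails with -EFAULT. Workaround only.
	 */
	pte = mk_pte(page, PAGE_SHARED);
	pte = pte_mkdirty(pte);
	set_pte_at(patching_mm, patching_addr, ptep, pte);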

Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm
  2020-03-23  4:52 ` [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm Christopher M. Riedl
  2020-03-23  6:10   ` kbuild test robot
@ 2020-03-24 16:07   ` Christophe Leroy
  2020-03-31  2:41     ` Christopher M Riedl
  2020-04-18 10:36   ` Christophe Leroy
  2 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2020-03-24 16:07 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev



On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> x86 supports the notion of a temporary mm which restricts access to
> temporary PTEs to a single CPU. A temporary mm is useful for situations
> where a CPU needs to perform sensitive operations (such as patching a
> STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> said mappings to other CPUs. A side benefit is that other CPU TLBs do
> not need to be flushed when the temporary mm is torn down.
> 
> Mappings in the temporary mm can be set in the userspace portion of the
> address-space.
> 
> Interrupts must be disabled while the temporary mm is in use. HW
> breakpoints, which may have been set by userspace as watchpoints on
> addresses now within the temporary mm, are saved and disabled when
> loading the temporary mm. The HW breakpoints are restored when unloading
> the temporary mm. All HW breakpoints are indiscriminately disabled while
> the temporary mm is in use.
> 
> Based on x86 implementation:
> 
> commit cefa929c034e
> ("x86/mm: Introduce temporary mm structs")
> 
> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> ---
>   arch/powerpc/include/asm/debug.h       |  1 +
>   arch/powerpc/include/asm/mmu_context.h | 56 +++++++++++++++++++++++++-
>   arch/powerpc/kernel/process.c          |  5 +++
>   3 files changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> index 7756026b95ca..b945bc16c932 100644
> --- a/arch/powerpc/include/asm/debug.h
> +++ b/arch/powerpc/include/asm/debug.h
> @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; }
>   static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
>   #endif
>   
> +void __get_breakpoint(struct arch_hw_breakpoint *brk);
>   void __set_breakpoint(struct arch_hw_breakpoint *brk);
>   bool ppc_breakpoint_available(void);
>   #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 360367c579de..3e6381d04c28 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -7,9 +7,10 @@
>   #include <linux/mm.h>
>   #include <linux/sched.h>
>   #include <linux/spinlock.h>
> -#include <asm/mmu.h>	
> +#include <asm/mmu.h>

What's this change?
I see you are removing a trailing space at the end of the line, but that
shouldn't be part of this patch.

>   #include <asm/cputable.h>
>   #include <asm/cputhreads.h>
> +#include <asm/hw_breakpoint.h>
>   
>   /*
>    * Most if the context management is out of line
> @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
>   	return 0;
>   }
>   
> +struct temp_mm {
> +	struct mm_struct *temp;
> +	struct mm_struct *prev;
> +	bool is_kernel_thread;
> +	struct arch_hw_breakpoint brk;
> +};
> +
> +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> +{
> +	temp_mm->temp = mm;
> +	temp_mm->prev = NULL;
> +	temp_mm->is_kernel_thread = false;
> +	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> +}
> +
> +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	temp_mm->is_kernel_thread = current->mm == NULL;
> +	if (temp_mm->is_kernel_thread)
> +		temp_mm->prev = current->active_mm;
> +	else
> +		temp_mm->prev = current->mm;
> +
> +	/*
> +	 * Hash requires a non-NULL current->mm to allocate a userspace address
> +	 * when handling a page fault. Does not appear to hurt in Radix either.
> +	 */
> +	current->mm = temp_mm->temp;
> +	switch_mm_irqs_off(NULL, temp_mm->temp, current);
> +
> +	if (ppc_breakpoint_available()) {
> +		__get_breakpoint(&temp_mm->brk);
> +		if (temp_mm->brk.type != 0)
> +			hw_breakpoint_disable();
> +	}
> +}
> +
> +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	if (temp_mm->is_kernel_thread)
> +		current->mm = NULL;
> +	else
> +		current->mm = temp_mm->prev;
> +	switch_mm_irqs_off(NULL, temp_mm->prev, current);
> +
> +	if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
> +		__set_breakpoint(&temp_mm->brk);
> +}
> +
>   #endif /* __KERNEL__ */
>   #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fad50db9dcf2..5e5cf33fc358 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -793,6 +793,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
>   	return 0;
>   }
>   
> +void __get_breakpoint(struct arch_hw_breakpoint *brk)
> +{
> +	memcpy(brk, this_cpu_ptr(&current_brk), sizeof(*brk));
> +}
> +
>   void __set_breakpoint(struct arch_hw_breakpoint *brk)
>   {
>   	memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
> 
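
For clarity, here is the usage pattern this API appears designed for,
pieced together from the hunks above. It is a sketch, not code from the
series; patching_mm is introduced by patch 2 and error handling is
elided:

	struct temp_mm temp_mm;
	unsigned long flags;

	/* Interrupts must stay off while the temporary mm is loaded. */
	local_irq_save(flags);

	init_temp_mm(&temp_mm, patching_mm);
	use_temporary_mm(&temp_mm);

	/* ... access the CPU-local mappings in patching_mm ... */

	unuse_temporary_mm(&temp_mm);
	local_irq_restore(flags);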


Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
  2020-03-23  4:52 ` [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching Christopher M. Riedl
@ 2020-03-24 16:10   ` Christophe Leroy
  2020-03-31  3:19     ` Christopher M Riedl
  0 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2020-03-24 16:10 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev



On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> When code patching a STRICT_KERNEL_RWX kernel the page containing the
> address to be patched is temporarily mapped with permissive memory
> protections. Currently, a per-cpu vmalloc patch area is used for this
> purpose. While the patch area is per-cpu, the temporary page mapping is
> inserted into the kernel page tables for the duration of the patching.
> The mapping is exposed to CPUs other than the patching CPU - this is
> undesirable from a hardening perspective.
> 
> Use the `poking_init` init hook to prepare a temporary mm and patching
> address. Initialize the temporary mm by copying the init mm. Choose a
> randomized patching address inside the temporary mm userspace address
> portion. The next patch uses the temporary mm and patching address for
> code patching.
> 
> Based on x86 implementation:
> 
> commit 4fc19708b165
> ("x86/alternatives: Initialize temporary mm for patching")
> 
> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> ---
>   arch/powerpc/lib/code-patching.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
> 
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 3345f039a876..18b88ecfc5a8 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -11,6 +11,8 @@
>   #include <linux/cpuhotplug.h>
>   #include <linux/slab.h>
>   #include <linux/uaccess.h>
> +#include <linux/sched/task.h>
> +#include <linux/random.h>
>   
>   #include <asm/pgtable.h>
>   #include <asm/tlbflush.h>
> @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
>   }
>   
>   #ifdef CONFIG_STRICT_KERNEL_RWX
> +
> +__ro_after_init struct mm_struct *patching_mm;
> +__ro_after_init unsigned long patching_addr;

Can we make those static?

> +
> +void __init poking_init(void)
> +{
> +	spinlock_t *ptl; /* for protecting pte table */
> +	pte_t *ptep;
> +
> +	patching_mm = copy_init_mm();
> +	BUG_ON(!patching_mm);

Does it need to be a BUG_ON()? Can't we fail gracefully with just a
WARN_ON()?

> +
> +	/*
> +	 * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
> +	 * XXX: Do we want additional bits of entropy for radix?
> +	 */
> +	patching_addr = (get_random_long() & PAGE_MASK) %
> +		(DEFAULT_MAP_WINDOW - PAGE_SIZE);
> +
> +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> +	BUG_ON(!ptep);

Same here, can we fail gracefully instead?

> +	pte_unmap_unlock(ptep, ptl);
> +}
> +
>   static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
>   
>   static int text_area_cpu_up(unsigned int cpu)
> 
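
A sketch of the graceful-failure variant being suggested. This is
hypothetical and relies on patch 3's NULL check of patching_mm to keep
using the raw-patching fallback when initialization fails:

void __init poking_init(void)
{
	spinlock_t *ptl; /* for protecting pte table */
	pte_t *ptep;

	patching_mm = copy_init_mm();
	if (WARN_ON(!patching_mm))
		return;

	patching_addr = (get_random_long() & PAGE_MASK) %
		(DEFAULT_MAP_WINDOW - PAGE_SIZE);

	/* Pre-allocate the page tables for the patching address. */
	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
	if (WARN_ON(!ptep)) {
		mmput(patching_mm);
		patching_mm = NULL;
		return;
	}
	pte_unmap_unlock(ptep, ptl);
}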

Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
  2020-03-23  4:52 ` [RFC PATCH 3/3] powerpc/lib: Use " Christopher M. Riedl
@ 2020-03-24 16:25   ` Christophe Leroy
  2020-04-15  5:11     ` Christopher M Riedl
  0 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2020-03-24 16:25 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev



On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
> mappings to other CPUs. These mappings should be kept local to the CPU
> doing the patching. Use the pre-initialized temporary mm and patching
> address for this purpose. Also add a check after patching to ensure the
> patch succeeded.
> 
> Based on x86 implementation:
> 
> commit b3fd8e83ada0
> ("x86/alternatives: Use temporary mm for text poking")
> 
> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> ---
>   arch/powerpc/lib/code-patching.c | 128 ++++++++++++++-----------------
>   1 file changed, 57 insertions(+), 71 deletions(-)
> 
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 18b88ecfc5a8..f156132e8975 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -19,6 +19,7 @@
>   #include <asm/page.h>
>   #include <asm/code-patching.h>
>   #include <asm/setup.h>
> +#include <asm/mmu_context.h>
>   
>   static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
>   			       unsigned int *patch_addr)
> @@ -65,99 +66,79 @@ void __init poking_init(void)
>   	pte_unmap_unlock(ptep, ptl);
>   }
>   
> -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> -
> -static int text_area_cpu_up(unsigned int cpu)
> -{
> -	struct vm_struct *area;
> -
> -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> -	if (!area) {
> -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> -			cpu);
> -		return -1;
> -	}
> -	this_cpu_write(text_poke_area, area);
> -
> -	return 0;
> -}
> -
> -static int text_area_cpu_down(unsigned int cpu)
> -{
> -	free_vm_area(this_cpu_read(text_poke_area));
> -	return 0;
> -}
> -
> -/*
> - * Run as a late init call. This allows all the boot time patching to be done
> - * simply by patching the code, and then we're called here prior to
> - * mark_rodata_ro(), which happens after all init calls are run. Although
> - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
> - * it as being preferable to a kernel that will crash later when someone tries
> - * to use patch_instruction().
> - */
> -static int __init setup_text_poke_area(void)
> -{
> -	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> -		"powerpc/text_poke:online", text_area_cpu_up,
> -		text_area_cpu_down));
> -
> -	return 0;
> -}
> -late_initcall(setup_text_poke_area);
> +struct patch_mapping {
> +	spinlock_t *ptl; /* for protecting pte table */
> +	struct temp_mm temp_mm;
> +};
>   
>   /*
>    * This can be called for kernel text or a module.
>    */
> -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> +static int map_patch(const void *addr, struct patch_mapping *patch_mapping)

Why change the name?

>   {
> -	unsigned long pfn;
> -	int err;
> +	struct page *page;
> +	pte_t pte, *ptep;
> +	pgprot_t pgprot;
>   
>   	if (is_vmalloc_addr(addr))
> -		pfn = vmalloc_to_pfn(addr);
> +		page = vmalloc_to_page(addr);
>   	else
> -		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +		page = virt_to_page(addr);
>   
> -	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> +	if (radix_enabled())
> +		pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
> +	else
> +		pgprot = PAGE_SHARED;

Can you explain the difference between radix and non-radix here?

Why PAGE_KERNEL for a page that is mapped in userspace?

Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just
using PAGE_KERNEL?
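
i.e., presumably the simpler (untested) form:

	if (radix_enabled())
		pgprot = PAGE_KERNEL;
	else
		pgprot = PAGE_SHARED;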

>   
> -	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
> -	if (err)
> +	ptep = get_locked_pte(patching_mm, patching_addr, &patch_mapping->ptl);
> +	if (unlikely(!ptep)) {
> +		pr_warn("map patch: failed to allocate pte for patching\n");
>   		return -1;
> +	}
> +
> +	pte = mk_pte(page, pgprot);
> +	set_pte_at(patching_mm, patching_addr, ptep, pte);
> +
> +	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> +	use_temporary_mm(&patch_mapping->temp_mm);
>   
>   	return 0;
>   }
>   
> -static inline int unmap_patch_area(unsigned long addr)
> +static int unmap_patch(struct patch_mapping *patch_mapping)
>   {
>   	pte_t *ptep;
>   	pmd_t *pmdp;
>   	pud_t *pudp;
>   	pgd_t *pgdp;
>   
> -	pgdp = pgd_offset_k(addr);
> +	pgdp = pgd_offset(patching_mm, patching_addr);
>   	if (unlikely(!pgdp))
>   		return -EINVAL;
>   
> -	pudp = pud_offset(pgdp, addr);
> +	pudp = pud_offset(pgdp, patching_addr);
>   	if (unlikely(!pudp))
>   		return -EINVAL;
>   
> -	pmdp = pmd_offset(pudp, addr);
> +	pmdp = pmd_offset(pudp, patching_addr);
>   	if (unlikely(!pmdp))
>   		return -EINVAL;
>   
> -	ptep = pte_offset_kernel(pmdp, addr);
> +	ptep = pte_offset_kernel(pmdp, patching_addr);

ptep should be stored in the patch_mapping struct instead of walking
the page tables again.
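
Something like this, reusing the names from the patch (sketch only),
with map_patch() saving the pointer right after get_locked_pte():

struct patch_mapping {
	spinlock_t *ptl;	/* for protecting pte table */
	pte_t *ptep;		/* saved by map_patch(), reused here */
	struct temp_mm temp_mm;
};

static int unmap_patch(struct patch_mapping *patch_mapping)
{
	/* In hash, pte_clear flushes the tlb */
	pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
	unuse_temporary_mm(&patch_mapping->temp_mm);

	/* In radix, we have to explicitly flush the tlb (no-op in hash) */
	local_flush_tlb_mm(patching_mm);
	pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);

	return 0;
}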

>   	if (unlikely(!ptep))
>   		return -EINVAL;
>   
> -	pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
> +	/*
> +	 * In hash, pte_clear flushes the tlb
> +	 */
> +	pte_clear(patching_mm, patching_addr, ptep);
> +	unuse_temporary_mm(&patch_mapping->temp_mm);
>   
>   	/*
> -	 * In hash, pte_clear flushes the tlb, in radix, we have to
> +	 * In radix, we have to explicitly flush the tlb (no-op in hash)
>   	 */
> -	pte_clear(&init_mm, addr, ptep);
> -	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +	local_flush_tlb_mm(patching_mm);
> +	pte_unmap_unlock(ptep, patch_mapping->ptl);
>   
>   	return 0;
>   }
> @@ -167,33 +148,38 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>   	int err;
>   	unsigned int *patch_addr = NULL;
>   	unsigned long flags;
> -	unsigned long text_poke_addr;
> -	unsigned long kaddr = (unsigned long)addr;
> +	struct patch_mapping patch_mapping;
>   
>   	/*
> -	 * During early early boot patch_instruction is called
> -	 * when text_poke_area is not ready, but we still need
> -	 * to allow patching. We just do the plain old patching
> +	 * The patching_mm is initialized before calling mark_rodata_ro. Prior
> +	 * to this, patch_instruction is called when we don't have (and don't
> +	 * need) the patching_mm so just do plain old patching.
>   	 */
> -	if (!this_cpu_read(text_poke_area))
> +	if (!patching_mm)
>   		return raw_patch_instruction(addr, instr);
>   
>   	local_irq_save(flags);
>   
> -	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
> -	if (map_patch_area(addr, text_poke_addr)) {
> -		err = -1;
> +	err = map_patch(addr, &patch_mapping);
> +	if (err)
>   		goto out;
> -	}
>   
> -	patch_addr = (unsigned int *)(text_poke_addr) +
> -			((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
> +	patch_addr = (unsigned int *)(patching_addr) +
> +			(offset_in_page((unsigned long)addr) /
> +				sizeof(unsigned int));
>   
>   	__patch_instruction(addr, instr, patch_addr);

The error returned by __patch_instruction() should be handled.

>   
> -	err = unmap_patch_area(text_poke_addr);
> +	err = unmap_patch(&patch_mapping);
>   	if (err)
> -		pr_warn("failed to unmap %lx\n", text_poke_addr);
> +		pr_warn("unmap patch: failed to unmap patch\n");
> +
> +	/*
> +	 * Something is wrong if what we just wrote doesn't match what we
> +	 * think we just wrote.
> +	 * XXX: BUG_ON() instead?

No, not a BUG_ON(). If patching fails, that's not a fatal error; we can
fail gracefully. You should return an error instead.

> +	 */
> +	WARN_ON(memcmp(addr, &instr, sizeof(instr)));

Come on. addr is an int *, instr is an int. By doing a memcmp() on
&instr, you force the compiler to write instr to the stack, whereas
local variables mostly live in registers on RISC processors like
powerpc. The following should do it:

	WARN_ON(*addr != instr);

>   
>   out:
>   	local_irq_restore(flags);
> 
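
Taken together, the suggestions above amount to something like this for
the tail of do_patch_instruction(). A sketch only; the error code is an
arbitrary choice:

	err = __patch_instruction(addr, instr, patch_addr);

	/* Check the real mapping before tearing down the temporary one. */
	if (!err && WARN_ON(*addr != instr))
		err = -EPERM;

	if (unmap_patch(&patch_mapping))
		pr_warn("unmap patch: failed to unmap patch\n");

out:
	local_irq_restore(flags);
	return err;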

Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 0/3] Use per-CPU temporary mappings for patching
  2020-03-23  4:52 [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christopher M. Riedl
                   ` (3 preceding siblings ...)
  2020-03-23 11:30 ` [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christophe Leroy
@ 2020-03-25  2:51 ` Andrew Donnellan
  4 siblings, 0 replies; 23+ messages in thread
From: Andrew Donnellan @ 2020-03-25  2:51 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev

On 23/3/20 3:52 pm, Christopher M. Riedl wrote:
> When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
> temporary mappings when patching itself. These mappings temporarily
> override the strict RWX text protections to permit a write. Currently,
> powerpc allocates a per-CPU VM area for patching. Patching occurs as
> follows:
> 
> 	1. Map page of text to be patched to per-CPU VM area w/
> 	   PAGE_KERNEL protection
> 	2. Patch text
> 	3. Remove the temporary mapping
> 
> While the VM area is per-CPU, the mapping is actually inserted into the
> kernel page tables. Presumably, this could allow another CPU to access
> the normally write-protected text - either malicously or accidentally -
> via this same mapping if the address of the VM area is known. Ideally,
> the mapping should be kept local to the CPU doing the patching (or any
> other sensitive operations requiring temporarily overriding memory
> protections) [0].
> 
> x86 introduced "temporary mm" structs which allow the creation of
> mappings local to a particular CPU [1]. This series intends to bring the
> notion of a temporary mm to powerpc and harden powerpc by using such a
> mapping for patching a kernel with strict RWX permissions.
> 
> The first patch introduces the temporary mm struct and API for powerpc
> along with a new function to retrieve a current hw breakpoint.
> 
> The second patch uses the `poking_init` init hook added by the x86
> patches to initialize a temporary mm and patching address. The patching
> address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
> upper limit is necessary due to how the hash MMU operates - by default
> the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
> and radix randomize inside this range. The number of possible random
> addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
> 
> Bits of entropy with 64K page size on BOOK3S_64:
> 
> 	bits-o-entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> 
> 	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> 	bits-o-entropy = log2(128TB / 64K)
> 	bits-o-entropy = 31
> 
> Currently, randomization occurs only once during initialization at boot.
> 
> The third patch replaces the VM area with the temporary mm in the
> patching code. The page for patching has to be mapped PAGE_SHARED with
> the hash MMU since hash prevents the kernel from accessing userspace
> pages with PAGE_PRIVILEGED bit set. There is on-going work on my side to
> explore if this is actually necessary in the hash codepath.
> 
> Testing so far is limited to booting on QEMU (power8 and power9 targets)
> and a POWER8 VM along with setting some simple xmon breakpoints (which
> makes use of code-patching). A POC lkdtm test is in-progress to actually
> exploit the existing vulnerability (i.e. the mapping during patching is
> exposed in kernel page tables and accessible by other CPUs) - this will
> accompany a future v1 of this series.
> 
> [0]: https://github.com/linuxppc/issues/issues/224
> [1]: https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.amit@gmail.com/
> 
> Christopher M. Riedl (3):
>    powerpc/mm: Introduce temporary mm
>    powerpc/lib: Initialize a temporary mm for code patching
>    powerpc/lib: Use a temporary mm for code patching
> 
>   arch/powerpc/include/asm/debug.h       |   1 +
>   arch/powerpc/include/asm/mmu_context.h |  56 +++++++++-
>   arch/powerpc/kernel/process.c          |   5 +
>   arch/powerpc/lib/code-patching.c       | 140 ++++++++++++++-----------
>   4 files changed, 137 insertions(+), 65 deletions(-)

This series causes a build failure with ppc64e_defconfig

https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/16478//artifact/linux/report.txt

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited



* Re: [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm
  2020-03-24 16:07   ` Christophe Leroy
@ 2020-03-31  2:41     ` Christopher M Riedl
  0 siblings, 0 replies; 23+ messages in thread
From: Christopher M Riedl @ 2020-03-31  2:41 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

> On March 24, 2020 11:07 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>  
On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> > x86 supports the notion of a temporary mm which restricts access to
> > temporary PTEs to a single CPU. A temporary mm is useful for situations
> > where a CPU needs to perform sensitive operations (such as patching a
> > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> > said mappings to other CPUs. A side benefit is that other CPU TLBs do
> > not need to be flushed when the temporary mm is torn down.
> > 
> > Mappings in the temporary mm can be set in the userspace portion of the
> > address-space.
> > 
> > Interrupts must be disabled while the temporary mm is in use. HW
> > breakpoints, which may have been set by userspace as watchpoints on
> > addresses now within the temporary mm, are saved and disabled when
> > loading the temporary mm. The HW breakpoints are restored when unloading
> > the temporary mm. All HW breakpoints are indiscriminately disabled while
> > the temporary mm is in use.
> > 
> > Based on x86 implementation:
> > 
> > commit cefa929c034e
> > ("x86/mm: Introduce temporary mm structs")
> > 
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> >   arch/powerpc/include/asm/debug.h       |  1 +
> >   arch/powerpc/include/asm/mmu_context.h | 56 +++++++++++++++++++++++++-
> >   arch/powerpc/kernel/process.c          |  5 +++
> >   3 files changed, 61 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> > index 7756026b95ca..b945bc16c932 100644
> > --- a/arch/powerpc/include/asm/debug.h
> > +++ b/arch/powerpc/include/asm/debug.h
> > @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; }
> >   static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
> >   #endif
> >   
> > +void __get_breakpoint(struct arch_hw_breakpoint *brk);
> >   void __set_breakpoint(struct arch_hw_breakpoint *brk);
> >   bool ppc_breakpoint_available(void);
> >   #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> > index 360367c579de..3e6381d04c28 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -7,9 +7,10 @@
> >   #include <linux/mm.h>
> >   #include <linux/sched.h>
> >   #include <linux/spinlock.h>
> > -#include <asm/mmu.h>	
> > +#include <asm/mmu.h>
> 
> What's this change ?
> I see you are removing a space at the end of the line, but it shouldn't 
> be part of this patch.
> 

Overly aggressive "helpful" editor setting apparently.
Removed this change in the next version.

> >   #include <asm/cputable.h>
> >   #include <asm/cputhreads.h>
> > +#include <asm/hw_breakpoint.h>
> >   
> >   /*
> >    * Most if the context management is out of line
> > @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
> >   	return 0;
> >   }
> >   
> > +struct temp_mm {
> > +	struct mm_struct *temp;
> > +	struct mm_struct *prev;
> > +	bool is_kernel_thread;
> > +	struct arch_hw_breakpoint brk;
> > +};
> > +
> > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> > +{
> > +	temp_mm->temp = mm;
> > +	temp_mm->prev = NULL;
> > +	temp_mm->is_kernel_thread = false;
> > +	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> > +}
> > +
> > +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	temp_mm->is_kernel_thread = current->mm == NULL;
> > +	if (temp_mm->is_kernel_thread)
> > +		temp_mm->prev = current->active_mm;
> > +	else
> > +		temp_mm->prev = current->mm;
> > +
> > +	/*
> > +	 * Hash requires a non-NULL current->mm to allocate a userspace address
> > +	 * when handling a page fault. Does not appear to hurt in Radix either.
> > +	 */
> > +	current->mm = temp_mm->temp;
> > +	switch_mm_irqs_off(NULL, temp_mm->temp, current);
> > +
> > +	if (ppc_breakpoint_available()) {
> > +		__get_breakpoint(&temp_mm->brk);
> > +		if (temp_mm->brk.type != 0)
> > +			hw_breakpoint_disable();
> > +	}
> > +}
> > +
> > +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	if (temp_mm->is_kernel_thread)
> > +		current->mm = NULL;
> > +	else
> > +		current->mm = temp_mm->prev;
> > +	switch_mm_irqs_off(NULL, temp_mm->prev, current);
> > +
> > +	if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
> > +		__set_breakpoint(&temp_mm->brk);
> > +}
> > +
> >   #endif /* __KERNEL__ */
> >   #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index fad50db9dcf2..5e5cf33fc358 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -793,6 +793,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
> >   	return 0;
> >   }
> >   
> > +void __get_breakpoint(struct arch_hw_breakpoint *brk)
> > +{
> > +	memcpy(brk, this_cpu_ptr(&current_brk), sizeof(*brk));
> > +}
> > +
> >   void __set_breakpoint(struct arch_hw_breakpoint *brk)
> >   {
> >   	memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
> > 
> 
> 
> Christophe


* Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
  2020-03-24 16:10   ` Christophe Leroy
@ 2020-03-31  3:19     ` Christopher M Riedl
  2020-04-08 11:01       ` Christophe Leroy
  0 siblings, 1 reply; 23+ messages in thread
From: Christopher M Riedl @ 2020-03-31  3:19 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

> On March 24, 2020 11:10 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>  
On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> > address to be patched is temporarily mapped with permissive memory
> > protections. Currently, a per-cpu vmalloc patch area is used for this
> > purpose. While the patch area is per-cpu, the temporary page mapping is
> > inserted into the kernel page tables for the duration of the patching.
> > The mapping is exposed to CPUs other than the patching CPU - this is
> > undesirable from a hardening perspective.
> > 
> > Use the `poking_init` init hook to prepare a temporary mm and patching
> > address. Initialize the temporary mm by copying the init mm. Choose a
> > randomized patching address inside the temporary mm userspace address
> > portion. The next patch uses the temporary mm and patching address for
> > code patching.
> > 
> > Based on x86 implementation:
> > 
> > commit 4fc19708b165
> > ("x86/alternatives: Initialize temporary mm for patching")
> > 
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> >   arch/powerpc/lib/code-patching.c | 26 ++++++++++++++++++++++++++
> >   1 file changed, 26 insertions(+)
> > 
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 3345f039a876..18b88ecfc5a8 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -11,6 +11,8 @@
> >   #include <linux/cpuhotplug.h>
> >   #include <linux/slab.h>
> >   #include <linux/uaccess.h>
> > +#include <linux/sched/task.h>
> > +#include <linux/random.h>
> >   
> >   #include <asm/pgtable.h>
> >   #include <asm/tlbflush.h>
> > @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
> >   }
> >   
> >   #ifdef CONFIG_STRICT_KERNEL_RWX
> > +
> > +__ro_after_init struct mm_struct *patching_mm;
> > +__ro_after_init unsigned long patching_addr;
> 
> Can we make those static ?
> 

Yes, makes sense to me.

> > +
> > +void __init poking_init(void)
> > +{
> > +	spinlock_t *ptl; /* for protecting pte table */
> > +	pte_t *ptep;
> > +
> > +	patching_mm = copy_init_mm();
> > +	BUG_ON(!patching_mm);
> 
> Does it need to be a BUG_ON() ? Can't we fail gracefully with just a 
> WARN_ON ?
>

I'm not sure what failing gracefully means here? The main reason this could
fail is if there is not enough memory to allocate the patching_mm. The
previous implementation had this justification for BUG_ON():

/*
 * Run as a late init call. This allows all the boot time patching to be done
 * simply by patching the code, and then we're called here prior to
 * mark_rodata_ro(), which happens after all init calls are run. Although
 * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
 * it as being preferable to a kernel that will crash later when someone tries
 * to use patch_instruction().
 */
static int __init setup_text_poke_area(void)
{
        BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                "powerpc/text_poke:online", text_area_cpu_up,
                text_area_cpu_down));

        return 0;
}
late_initcall(setup_text_poke_area);

I think the BUG_ON() is appropriate even if only to adhere to the previous
judgement call. I can add a similar comment explaining the reasoning if
that helps.

> > +
> > +	/*
> > +	 * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
> > +	 * XXX: Do we want additional bits of entropy for radix?
> > +	 */
> > +	patching_addr = (get_random_long() & PAGE_MASK) %
> > +		(DEFAULT_MAP_WINDOW - PAGE_SIZE);
> > +
> > +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > +	BUG_ON(!ptep);
> 
> Same here, can we fail gracefully instead ?
>

Same reasoning as above.

> > +	pte_unmap_unlock(ptep, ptl);
> > +}
> > +
> >   static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >   
> >   static int text_area_cpu_up(unsigned int cpu)
> > 
> 
> Christophe


* Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
  2020-03-31  3:19     ` Christopher M Riedl
@ 2020-04-08 11:01       ` Christophe Leroy
  2020-04-15  4:39         ` Christopher M Riedl
  2020-04-17  0:57         ` Michael Ellerman
  0 siblings, 2 replies; 23+ messages in thread
From: Christophe Leroy @ 2020-04-08 11:01 UTC (permalink / raw)
  To: Christopher M Riedl, linuxppc-dev



On 31/03/2020 at 05:19, Christopher M Riedl wrote:
>> On March 24, 2020 11:10 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>>   
>> On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
>>> When code patching a STRICT_KERNEL_RWX kernel the page containing the
>>> address to be patched is temporarily mapped with permissive memory
>>> protections. Currently, a per-cpu vmalloc patch area is used for this
>>> purpose. While the patch area is per-cpu, the temporary page mapping is
>>> inserted into the kernel page tables for the duration of the patching.
>>> The mapping is exposed to CPUs other than the patching CPU - this is
>>> undesirable from a hardening perspective.
>>>
>>> Use the `poking_init` init hook to prepare a temporary mm and patching
>>> address. Initialize the temporary mm by copying the init mm. Choose a
>>> randomized patching address inside the temporary mm userspace address
>>> portion. The next patch uses the temporary mm and patching address for
>>> code patching.
>>>
>>> Based on x86 implementation:
>>>
>>> commit 4fc19708b165
>>> ("x86/alternatives: Initialize temporary mm for patching")
>>>
>>> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
>>> ---
>>>    arch/powerpc/lib/code-patching.c | 26 ++++++++++++++++++++++++++
>>>    1 file changed, 26 insertions(+)
>>>
>>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
>>> index 3345f039a876..18b88ecfc5a8 100644
>>> --- a/arch/powerpc/lib/code-patching.c
>>> +++ b/arch/powerpc/lib/code-patching.c
>>> @@ -11,6 +11,8 @@
>>>    #include <linux/cpuhotplug.h>
>>>    #include <linux/slab.h>
>>>    #include <linux/uaccess.h>
>>> +#include <linux/sched/task.h>
>>> +#include <linux/random.h>
>>>    
>>>    #include <asm/pgtable.h>
>>>    #include <asm/tlbflush.h>
>>> @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
>>>    }
>>>    
>>>    #ifdef CONFIG_STRICT_KERNEL_RWX
>>> +
>>> +__ro_after_init struct mm_struct *patching_mm;
>>> +__ro_after_init unsigned long patching_addr;
>>
>> Can we make those static ?
>>
> 
> Yes, makes sense to me.
> 
>>> +
>>> +void __init poking_init(void)
>>> +{
>>> +	spinlock_t *ptl; /* for protecting pte table */
>>> +	pte_t *ptep;
>>> +
>>> +	patching_mm = copy_init_mm();
>>> +	BUG_ON(!patching_mm);
>>
>> Does it need to be a BUG_ON() ? Can't we fail gracefully with just a
>> WARN_ON ?
>>
> 
> I'm not sure what failing gracefully means here? The main reason this could
> fail is if there is not enough memory to allocate the patching_mm. The
> previous implementation had this justification for BUG_ON():

But the system can continue running just fine after this failure.
Only the things that make use of code patching will fail (ftrace, kgdb, ...)

Checkpatch says: "Avoid crashing the kernel - try using WARN_ON & 
recovery code rather than BUG() or BUG_ON()"

All vital code patching has already been done previously, so I think a 
WARN_ON() should be enough, plus returning non-zero to indicate that the 
late_initcall failed.
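
Something along these lines maybe (completely untested sketch, reusing 
the names from your patch; poking_init() itself is void, but since 
do_patch_instruction() already falls back to raw_patch_instruction() 
when patching_mm is NULL, leaving it NULL on failure is enough to 
disable the new path):

void __init poking_init(void)
{
	spinlock_t *ptl; /* for protecting pte table */
	pte_t *ptep;

	patching_mm = copy_init_mm();
	if (WARN_ON(!patching_mm))
		return;

	patching_addr = (get_random_long() & PAGE_MASK) %
		(DEFAULT_MAP_WINDOW - PAGE_SIZE);

	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
	if (WARN_ON(!ptep)) {
		mmput(patching_mm); /* assuming mmput() is the right teardown */
		patching_mm = NULL;
		return;
	}
	pte_unmap_unlock(ptep, ptl);
}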


> 
> /*
>   * Run as a late init call. This allows all the boot time patching to be done
>   * simply by patching the code, and then we're called here prior to
>   * mark_rodata_ro(), which happens after all init calls are run. Although
>   * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
>   * it as being preferable to a kernel that will crash later when someone tries
>   * to use patch_instruction().
>   */
> static int __init setup_text_poke_area(void)
> {
>          BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
>                  "powerpc/text_poke:online", text_area_cpu_up,
>                  text_area_cpu_down));
> 
>          return 0;
> }
> late_initcall(setup_text_poke_area);
> 
> I think the BUG_ON() is appropriate even if only to adhere to the previous
> judgement call. I can add a similar comment explaining the reasoning if
> that helps.
> 
>>> +
>>> +	/*
>>> +	 * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
>>> +	 * XXX: Do we want additional bits of entropy for radix?
>>> +	 */
>>> +	patching_addr = (get_random_long() & PAGE_MASK) %
>>> +		(DEFAULT_MAP_WINDOW - PAGE_SIZE);
>>> +
>>> +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
>>> +	BUG_ON(!ptep);
>>
>> Same here, can we fail gracefully instead ?
>>
> 
> Same reasoning as above.

Here as well, a WARN_ON() should be enough; the system will continue 
running after that.

> 
>>> +	pte_unmap_unlock(ptep, ptl);
>>> +}
>>> +
>>>    static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
>>>    
>>>    static int text_area_cpu_up(unsigned int cpu)
>>>
>>
>> Christophe

Christophe


* Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
  2020-04-08 11:01       ` Christophe Leroy
@ 2020-04-15  4:39         ` Christopher M Riedl
  2020-04-17  0:57         ` Michael Ellerman
  1 sibling, 0 replies; 23+ messages in thread
From: Christopher M Riedl @ 2020-04-15  4:39 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

> On April 8, 2020 6:01 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>  
> On 31/03/2020 at 05:19, Christopher M Riedl wrote:
> >> On March 24, 2020 11:10 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>
> >>   
> >> On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> >>> When code patching a STRICT_KERNEL_RWX kernel the page containing the
> >>> address to be patched is temporarily mapped with permissive memory
> >>> protections. Currently, a per-cpu vmalloc patch area is used for this
> >>> purpose. While the patch area is per-cpu, the temporary page mapping is
> >>> inserted into the kernel page tables for the duration of the patching.
> >>> The mapping is exposed to CPUs other than the patching CPU - this is
> >>> undesirable from a hardening perspective.
> >>>
> >>> Use the `poking_init` init hook to prepare a temporary mm and patching
> >>> address. Initialize the temporary mm by copying the init mm. Choose a
> >>> randomized patching address inside the temporary mm userspace address
> >>> portion. The next patch uses the temporary mm and patching address for
> >>> code patching.
> >>>
> >>> Based on x86 implementation:
> >>>
> >>> commit 4fc19708b165
> >>> ("x86/alternatives: Initialize temporary mm for patching")
> >>>
> >>> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> >>> ---
> >>>    arch/powerpc/lib/code-patching.c | 26 ++++++++++++++++++++++++++
> >>>    1 file changed, 26 insertions(+)
> >>>
> >>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> >>> index 3345f039a876..18b88ecfc5a8 100644
> >>> --- a/arch/powerpc/lib/code-patching.c
> >>> +++ b/arch/powerpc/lib/code-patching.c
> >>> @@ -11,6 +11,8 @@
> >>>    #include <linux/cpuhotplug.h>
> >>>    #include <linux/slab.h>
> >>>    #include <linux/uaccess.h>
> >>> +#include <linux/sched/task.h>
> >>> +#include <linux/random.h>
> >>>    
> >>>    #include <asm/pgtable.h>
> >>>    #include <asm/tlbflush.h>
> >>> @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
> >>>    }
> >>>    
> >>>    #ifdef CONFIG_STRICT_KERNEL_RWX
> >>> +
> >>> +__ro_after_init struct mm_struct *patching_mm;
> >>> +__ro_after_init unsigned long patching_addr;
> >>
> >> Can we make those static ?
> >>
> > 
> > Yes, makes sense to me.
> > 
> >>> +
> >>> +void __init poking_init(void)
> >>> +{
> >>> +	spinlock_t *ptl; /* for protecting pte table */
> >>> +	pte_t *ptep;
> >>> +
> >>> +	patching_mm = copy_init_mm();
> >>> +	BUG_ON(!patching_mm);
> >>
> >> Does it need to be a BUG_ON() ? Can't we fail gracefully with just a
> >> WARN_ON ?
> >>
> > 
> > I'm not sure what failing gracefully means here? The main reason this could
> > fail is if there is not enough memory to allocate the patching_mm. The
> > previous implementation had this justification for BUG_ON():
> 
> But the system can continue running just fine after this failure.
> Only the things that make use of code patching will fail (ftrace, kgdb, ...)
> 
> Checkpatch says: "Avoid crashing the kernel - try using WARN_ON & 
> recovery code rather than BUG() or BUG_ON()"
> 
> All vital code patching has already been done previously, so I think a 
> WARN_ON() should be enough, plus returning non-zero to indicate that the 
> late_initcall failed.
> 
> 

Got it, makes sense to me. I will make these changes in the next version.
Thanks!

> > 
> > /*
> >   * Run as a late init call. This allows all the boot time patching to be done
> >   * simply by patching the code, and then we're called here prior to
> >   * mark_rodata_ro(), which happens after all init calls are run. Although
> >   * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
> >   * it as being preferable to a kernel that will crash later when someone tries
> >   * to use patch_instruction().
> >   */
> > static int __init setup_text_poke_area(void)
> > {
> >          BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> >                  "powerpc/text_poke:online", text_area_cpu_up,
> >                  text_area_cpu_down));
> > 
> >          return 0;
> > }
> > late_initcall(setup_text_poke_area);
> > 
> > I think the BUG_ON() is appropriate even if only to adhere to the previous
> > judgement call. I can add a similar comment explaining the reasoning if
> > that helps.
> > 
> >>> +
> >>> +	/*
> >>> +	 * In hash we cannot go above DEFAULT_MAP_WINDOW easily.
> >>> +	 * XXX: Do we want additional bits of entropy for radix?
> >>> +	 */
> >>> +	patching_addr = (get_random_long() & PAGE_MASK) %
> >>> +		(DEFAULT_MAP_WINDOW - PAGE_SIZE);
> >>> +
> >>> +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> >>> +	BUG_ON(!ptep);
> >>
> >> Same here, can we fail gracefully instead ?
> >>
> > 
> > Same reasoning as above.
> 
> Here as well, a WARN_ON() should be enough; the system will continue 
> running after that.
> 
> > 
> >>> +	pte_unmap_unlock(ptep, ptl);
> >>> +}
> >>> +
> >>>    static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >>>    
> >>>    static int text_area_cpu_up(unsigned int cpu)
> >>>
> >>
> >> Christophe
> 
> Christophe


* Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
  2020-03-24 16:25   ` Christophe Leroy
@ 2020-04-15  5:11     ` Christopher M Riedl
  2020-04-15  8:45       ` Christophe Leroy
  0 siblings, 1 reply; 23+ messages in thread
From: Christopher M Riedl @ 2020-04-15  5:11 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

> On March 24, 2020 11:25 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>  
> On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> > Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
> > mappings to other CPUs. These mappings should be kept local to the CPU
> > doing the patching. Use the pre-initialized temporary mm and patching
> > address for this purpose. Also add a check after patching to ensure the
> > patch succeeded.
> > 
> > Based on x86 implementation:
> > 
> > commit b3fd8e83ada0
> > ("x86/alternatives: Use temporary mm for text poking")
> > 
> > Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> > ---
> >   arch/powerpc/lib/code-patching.c | 128 ++++++++++++++-----------------
> >   1 file changed, 57 insertions(+), 71 deletions(-)
> > 
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> > index 18b88ecfc5a8..f156132e8975 100644
> > --- a/arch/powerpc/lib/code-patching.c
> > +++ b/arch/powerpc/lib/code-patching.c
> > @@ -19,6 +19,7 @@
> >   #include <asm/page.h>
> >   #include <asm/code-patching.h>
> >   #include <asm/setup.h>
> > +#include <asm/mmu_context.h>
> >   
> >   static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
> >   			       unsigned int *patch_addr)
> > @@ -65,99 +66,79 @@ void __init poking_init(void)
> >   	pte_unmap_unlock(ptep, ptl);
> >   }
> >   
> > -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> > -
> > -static int text_area_cpu_up(unsigned int cpu)
> > -{
> > -	struct vm_struct *area;
> > -
> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -	if (!area) {
> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -			cpu);
> > -		return -1;
> > -	}
> > -	this_cpu_write(text_poke_area, area);
> > -
> > -	return 0;
> > -}
> > -
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -	free_vm_area(this_cpu_read(text_poke_area));
> > -	return 0;
> > -}
> > -
> > -/*
> > - * Run as a late init call. This allows all the boot time patching to be done
> > - * simply by patching the code, and then we're called here prior to
> > - * mark_rodata_ro(), which happens after all init calls are run. Although
> > - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
> > - * it as being preferable to a kernel that will crash later when someone tries
> > - * to use patch_instruction().
> > - */
> > -static int __init setup_text_poke_area(void)
> > -{
> > -	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> > -		"powerpc/text_poke:online", text_area_cpu_up,
> > -		text_area_cpu_down));
> > -
> > -	return 0;
> > -}
> > -late_initcall(setup_text_poke_area);
> > +struct patch_mapping {
> > +	spinlock_t *ptl; /* for protecting pte table */
> > +	struct temp_mm temp_mm;
> > +};
> >   
> >   /*
> >    * This can be called for kernel text or a module.
> >    */
> > -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> > +static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> 
> Why change the name ?
> 

It's not really an "area" anymore.

> >   {
> > -	unsigned long pfn;
> > -	int err;
> > +	struct page *page;
> > +	pte_t pte, *ptep;
> > +	pgprot_t pgprot;
> >   
> >   	if (is_vmalloc_addr(addr))
> > -		pfn = vmalloc_to_pfn(addr);
> > +		page = vmalloc_to_page(addr);
> >   	else
> > -		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > +		page = virt_to_page(addr);
> >   
> > -	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> > +	if (radix_enabled())
> > +		pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
> > +	else
> > +		pgprot = PAGE_SHARED;
> 
> Can you explain the difference between radix and non radix ?
> 
> Why PAGE_KERNEL for a page that is mapped in userspace ?
> 
> Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just 
> using PAGE_KERNEL ?
> 

On hash there is a manual check in __hash_page which prevents kernel
access to userspace mappings that have _PAGE_PRIVILEGED set - hence we
cannot access the mapping if the page is mapped PAGE_KERNEL on hash.
However, I would like to use PAGE_KERNEL here as well and am working on
understanding why hash does this check and whether it can change. On
radix this works just fine.

The page is mapped at what is technically a userspace address, but only
to keep the mapping local to the CPU doing the patching. PAGE_KERNEL
makes it clear, both in intent and in protection, that this is really a
kernel mapping.

I think the correct way is pgprot_val(PAGE_KERNEL) since PAGE_KERNEL is defined
as:

#define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)

and __pgprot() is defined as:

typedef struct { unsigned long pgprot; } pgprot_t;
#define pgprot_val(x)   ((x).pgprot)
#define __pgprot(x)     ((pgprot_t) { (x) })

> >   
> > -	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
> > -	if (err)
> > +	ptep = get_locked_pte(patching_mm, patching_addr, &patch_mapping->ptl);
> > +	if (unlikely(!ptep)) {
> > +		pr_warn("map patch: failed to allocate pte for patching\n");
> >   		return -1;
> > +	}
> > +
> > +	pte = mk_pte(page, pgprot);
> > +	set_pte_at(patching_mm, patching_addr, ptep, pte);
> > +
> > +	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > +	use_temporary_mm(&patch_mapping->temp_mm);
> >   
> >   	return 0;
> >   }
> >   
> > -static inline int unmap_patch_area(unsigned long addr)
> > +static int unmap_patch(struct patch_mapping *patch_mapping)
> >   {
> >   	pte_t *ptep;
> >   	pmd_t *pmdp;
> >   	pud_t *pudp;
> >   	pgd_t *pgdp;
> >   
> > -	pgdp = pgd_offset_k(addr);
> > +	pgdp = pgd_offset(patching_mm, patching_addr);
> >   	if (unlikely(!pgdp))
> >   		return -EINVAL;
> >   
> > -	pudp = pud_offset(pgdp, addr);
> > +	pudp = pud_offset(pgdp, patching_addr);
> >   	if (unlikely(!pudp))
> >   		return -EINVAL;
> >   
> > -	pmdp = pmd_offset(pudp, addr);
> > +	pmdp = pmd_offset(pudp, patching_addr);
> >   	if (unlikely(!pmdp))
> >   		return -EINVAL;
> >   
> > -	ptep = pte_offset_kernel(pmdp, addr);
> > +	ptep = pte_offset_kernel(pmdp, patching_addr);
> 
> ptep should be stored in the patch_mapping struct instead of walking 
> the page tables again.
> 

Oh yes - this will be in the next version.
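
E.g. (quick sketch, just extending the struct from this patch):

struct patch_mapping {
	spinlock_t *ptl; /* for protecting pte table */
	pte_t *ptep;	 /* saved at map time so unmap doesn't re-walk */
	struct temp_mm temp_mm;
};

map_patch() then stores the pointer it gets back from get_locked_pte(),
and unmap_patch() uses patch_mapping->ptep directly.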

> >   	if (unlikely(!ptep))
> >   		return -EINVAL;
> >   
> > -	pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
> > +	/*
> > +	 * In hash, pte_clear flushes the tlb
> > +	 */
> > +	pte_clear(patching_mm, patching_addr, ptep);
> > +	unuse_temporary_mm(&patch_mapping->temp_mm);
> >   
> >   	/*
> > -	 * In hash, pte_clear flushes the tlb, in radix, we have to
> > +	 * In radix, we have to explicitly flush the tlb (no-op in hash)
> >   	 */
> > -	pte_clear(&init_mm, addr, ptep);
> > -	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > +	local_flush_tlb_mm(patching_mm);
> > +	pte_unmap_unlock(ptep, patch_mapping->ptl);
> >   
> >   	return 0;
> >   }
> > @@ -167,33 +148,38 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
> >   	int err;
> >   	unsigned int *patch_addr = NULL;
> >   	unsigned long flags;
> > -	unsigned long text_poke_addr;
> > -	unsigned long kaddr = (unsigned long)addr;
> > +	struct patch_mapping patch_mapping;
> >   
> >   	/*
> > -	 * During early early boot patch_instruction is called
> > -	 * when text_poke_area is not ready, but we still need
> > -	 * to allow patching. We just do the plain old patching
> > +	 * The patching_mm is initialized before calling mark_rodata_ro. Prior
> > +	 * to this, patch_instruction is called when we don't have (and don't
> > +	 * need) the patching_mm so just do plain old patching.
> >   	 */
> > -	if (!this_cpu_read(text_poke_area))
> > +	if (!patching_mm)
> >   		return raw_patch_instruction(addr, instr);
> >   
> >   	local_irq_save(flags);
> >   
> > -	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
> > -	if (map_patch_area(addr, text_poke_addr)) {
> > -		err = -1;
> > +	err = map_patch(addr, &patch_mapping);
> > +	if (err)
> >   		goto out;
> > -	}
> >   
> > -	patch_addr = (unsigned int *)(text_poke_addr) +
> > -			((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
> > +	patch_addr = (unsigned int *)(patching_addr) +
> > +			(offset_in_page((unsigned long)addr) /
> > +				sizeof(unsigned int));
> >   
> >   	__patch_instruction(addr, instr, patch_addr);
> 
> The error returned by __patch_instruction() should be handled.
> 

Agreed, will do something in the next spin.
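
Probably something roughly like this for the tail of
do_patch_instruction() (sketch only, untested):

	err = __patch_instruction(addr, instr, patch_addr);

	/* tear down the mapping regardless, but don't lose the patch error */
	if (unmap_patch(&patch_mapping))
		pr_warn("unmap patch: failed to unmap patch\n");

out:
	local_irq_restore(flags);
	return err;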

> >   
> > -	err = unmap_patch_area(text_poke_addr);
> > +	err = unmap_patch(&patch_mapping);
> >   	if (err)
> > -		pr_warn("failed to unmap %lx\n", text_poke_addr);
> > +		pr_warn("unmap patch: failed to unmap patch\n");
> > +
> > +	/*
> > +	 * Something is wrong if what we just wrote doesn't match what we
> > +	 * think we just wrote.
> > +	 * XXX: BUG_ON() instead?
> 
> No, not a BUG_ON(). If patching fails, that's not a vital fault; we can 
> fail gracefully. You should return a failure instead.
> 

Yup - will make these changes in the next version.

> > +	 */
> > +	WARN_ON(memcmp(addr, &instr, sizeof(instr)));
> 
> Come on. addr is an *int, instr is an int. By doing a memcmp() on 
> &instr, you force the compiler to write instr onto the stack, whereas 
> local vars are mainly in registers on RISC processors like powerpc. The 
> following should do it:
> 
> 	WARN_ON(*addr != instr);
> 

Oh man - I agree, that's just embarrassing.
Appreciate your feedback on this RFC series, thanks!

> >   
> >   out:
> >   	local_irq_restore(flags);
> > 
> 
> Christophe


* Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
  2020-04-15  5:11     ` Christopher M Riedl
@ 2020-04-15  8:45       ` Christophe Leroy
  2020-04-15 16:24         ` Christopher M Riedl
  0 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2020-04-15  8:45 UTC (permalink / raw)
  To: Christopher M Riedl, linuxppc-dev



On 15/04/2020 at 07:11, Christopher M Riedl wrote:
>> On March 24, 2020 11:25 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>>   
>> On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
>>> Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
>>> mappings to other CPUs. These mappings should be kept local to the CPU
>>> doing the patching. Use the pre-initialized temporary mm and patching
>>> address for this purpose. Also add a check after patching to ensure the
>>> patch succeeded.
>>>
>>> Based on x86 implementation:
>>>
>>> commit b3fd8e83ada0
>>> ("x86/alternatives: Use temporary mm for text poking")
>>>
>>> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
>>> ---
>>>    arch/powerpc/lib/code-patching.c | 128 ++++++++++++++-----------------
>>>    1 file changed, 57 insertions(+), 71 deletions(-)
>>>
>>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
>>> index 18b88ecfc5a8..f156132e8975 100644
>>> --- a/arch/powerpc/lib/code-patching.c
>>> +++ b/arch/powerpc/lib/code-patching.c
>>> @@ -19,6 +19,7 @@
>>>    #include <asm/page.h>
>>>    #include <asm/code-patching.h>
>>>    #include <asm/setup.h>
>>> +#include <asm/mmu_context.h>
>>>    
>>>    static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
>>>    			       unsigned int *patch_addr)
>>> @@ -65,99 +66,79 @@ void __init poking_init(void)
>>>    	pte_unmap_unlock(ptep, ptl);
>>>    }
>>>    
>>> -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
>>> -
>>> -static int text_area_cpu_up(unsigned int cpu)
>>> -{
>>> -	struct vm_struct *area;
>>> -
>>> -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
>>> -	if (!area) {
>>> -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
>>> -			cpu);
>>> -		return -1;
>>> -	}
>>> -	this_cpu_write(text_poke_area, area);
>>> -
>>> -	return 0;
>>> -}
>>> -
>>> -static int text_area_cpu_down(unsigned int cpu)
>>> -{
>>> -	free_vm_area(this_cpu_read(text_poke_area));
>>> -	return 0;
>>> -}
>>> -
>>> -/*
>>> - * Run as a late init call. This allows all the boot time patching to be done
>>> - * simply by patching the code, and then we're called here prior to
>>> - * mark_rodata_ro(), which happens after all init calls are run. Although
>>> - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
>>> - * it as being preferable to a kernel that will crash later when someone tries
>>> - * to use patch_instruction().
>>> - */
>>> -static int __init setup_text_poke_area(void)
>>> -{
>>> -	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
>>> -		"powerpc/text_poke:online", text_area_cpu_up,
>>> -		text_area_cpu_down));
>>> -
>>> -	return 0;
>>> -}
>>> -late_initcall(setup_text_poke_area);
>>> +struct patch_mapping {
>>> +	spinlock_t *ptl; /* for protecting pte table */
>>> +	struct temp_mm temp_mm;
>>> +};
>>>    
>>>    /*
>>>     * This can be called for kernel text or a module.
>>>     */
>>> -static int map_patch_area(void *addr, unsigned long text_poke_addr)
>>> +static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
>>
>> Why change the name ?
>>
> 
> It's not really an "area" anymore.
> 
>>>    {
>>> -	unsigned long pfn;
>>> -	int err;
>>> +	struct page *page;
>>> +	pte_t pte, *ptep;
>>> +	pgprot_t pgprot;
>>>    
>>>    	if (is_vmalloc_addr(addr))
>>> -		pfn = vmalloc_to_pfn(addr);
>>> +		page = vmalloc_to_page(addr);
>>>    	else
>>> -		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
>>> +		page = virt_to_page(addr);
>>>    
>>> -	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
>>> +	if (radix_enabled())
>>> +		pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
>>> +	else
>>> +		pgprot = PAGE_SHARED;
>>
>> Can you explain the difference between radix and non radix ?
>>
>> Why PAGE_KERNEL for a page that is mapped in userspace ?
>>
>> Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just
>> using PAGE_KERNEL ?
>>
> 
> On hash there is a manual check in __hash_page which prevents kernel
> access to userspace mappings that have _PAGE_PRIVILEGED set - hence we
> cannot access the mapping if the page is mapped PAGE_KERNEL on hash.
> However, I would like to use PAGE_KERNEL here as well and am working on
> understanding why hash does this check and whether it can change. On
> radix this works just fine.
> 
> The page is mapped at what is technically a userspace address, but only
> to keep the mapping local to the CPU doing the patching. PAGE_KERNEL
> makes it clear, both in intent and in protection, that this is really a
> kernel mapping.
> 
> I think the correct way is pgprot_val(PAGE_KERNEL) since PAGE_KERNEL is defined
> as:
> 
> #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> 
> and __pgprot() is defined as:
> 
> typedef struct { unsigned long pgprot; } pgprot_t;
> #define pgprot_val(x)   ((x).pgprot)
> #define __pgprot(x)     ((pgprot_t) { (x) })


Yes, so:
	pgprot_val(__pgprot(x)) == x


You do:

	pgprot = __pgprot(pgprot_val(PAGE_KERNEL));

Which is:

	pgprot = __pgprot(pgprot_val(__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)));

Which is equivalent to:

	pgprot = __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW);

So at the end it should simply be:

	pgprot = PAGE_KERNEL;




Christophe


* Re: [RFC PATCH 3/3] powerpc/lib: Use a temporary mm for code patching
  2020-04-15  8:45       ` Christophe Leroy
@ 2020-04-15 16:24         ` Christopher M Riedl
  0 siblings, 0 replies; 23+ messages in thread
From: Christopher M Riedl @ 2020-04-15 16:24 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

> On April 15, 2020 3:45 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>  
> On 15/04/2020 at 07:11, Christopher M Riedl wrote:
> >> On March 24, 2020 11:25 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >>
> >>   
> >> On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> >>> Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
> >>> mappings to other CPUs. These mappings should be kept local to the CPU
> >>> doing the patching. Use the pre-initialized temporary mm and patching
> >>> address for this purpose. Also add a check after patching to ensure the
> >>> patch succeeded.
> >>>
> >>> Based on x86 implementation:
> >>>
> >>> commit b3fd8e83ada0
> >>> ("x86/alternatives: Use temporary mm for text poking")
> >>>
> >>> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> >>> ---
> >>>    arch/powerpc/lib/code-patching.c | 128 ++++++++++++++-----------------
> >>>    1 file changed, 57 insertions(+), 71 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> >>> index 18b88ecfc5a8..f156132e8975 100644
> >>> --- a/arch/powerpc/lib/code-patching.c
> >>> +++ b/arch/powerpc/lib/code-patching.c
> >>> @@ -19,6 +19,7 @@
> >>>    #include <asm/page.h>
> >>>    #include <asm/code-patching.h>
> >>>    #include <asm/setup.h>
> >>> +#include <asm/mmu_context.h>
> >>>    
> >>>    static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
> >>>    			       unsigned int *patch_addr)
> >>> @@ -65,99 +66,79 @@ void __init poking_init(void)
> >>>    	pte_unmap_unlock(ptep, ptl);
> >>>    }
> >>>    
> >>> -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
> >>> -
> >>> -static int text_area_cpu_up(unsigned int cpu)
> >>> -{
> >>> -	struct vm_struct *area;
> >>> -
> >>> -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >>> -	if (!area) {
> >>> -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> >>> -			cpu);
> >>> -		return -1;
> >>> -	}
> >>> -	this_cpu_write(text_poke_area, area);
> >>> -
> >>> -	return 0;
> >>> -}
> >>> -
> >>> -static int text_area_cpu_down(unsigned int cpu)
> >>> -{
> >>> -	free_vm_area(this_cpu_read(text_poke_area));
> >>> -	return 0;
> >>> -}
> >>> -
> >>> -/*
> >>> - * Run as a late init call. This allows all the boot time patching to be done
> >>> - * simply by patching the code, and then we're called here prior to
> >>> - * mark_rodata_ro(), which happens after all init calls are run. Although
> >>> - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
> >>> - * it as being preferable to a kernel that will crash later when someone tries
> >>> - * to use patch_instruction().
> >>> - */
> >>> -static int __init setup_text_poke_area(void)
> >>> -{
> >>> -	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> >>> -		"powerpc/text_poke:online", text_area_cpu_up,
> >>> -		text_area_cpu_down));
> >>> -
> >>> -	return 0;
> >>> -}
> >>> -late_initcall(setup_text_poke_area);
> >>> +struct patch_mapping {
> >>> +	spinlock_t *ptl; /* for protecting pte table */
> >>> +	struct temp_mm temp_mm;
> >>> +};
> >>>    
> >>>    /*
> >>>     * This can be called for kernel text or a module.
> >>>     */
> >>> -static int map_patch_area(void *addr, unsigned long text_poke_addr)
> >>> +static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> >>
> >> Why change the name ?
> >>
> > 
> > It's not really an "area" anymore.
> > 
> >>>    {
> >>> -	unsigned long pfn;
> >>> -	int err;
> >>> +	struct page *page;
> >>> +	pte_t pte, *ptep;
> >>> +	pgprot_t pgprot;
> >>>    
> >>>    	if (is_vmalloc_addr(addr))
> >>> -		pfn = vmalloc_to_pfn(addr);
> >>> +		page = vmalloc_to_page(addr);
> >>>    	else
> >>> -		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> >>> +		page = virt_to_page(addr);
> >>>    
> >>> -	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
> >>> +	if (radix_enabled())
> >>> +		pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
> >>> +	else
> >>> +		pgprot = PAGE_SHARED;
> >>
> >> Can you explain the difference between radix and non radix ?
> >>
> >> Why PAGE_KERNEL for a page that is mapped in userspace ?
> >>
> >> Why do you need to do __pgprot(pgprot_val(PAGE_KERNEL)) instead of just
> >> using PAGE_KERNEL ?
> >>
> > 
> > On hash there is a manual check in __hash_page which prevents kernel
> > access to userspace mappings that have _PAGE_PRIVILEGED set - hence we
> > cannot access the mapping if the page is mapped PAGE_KERNEL on hash.
> > However, I would like to use PAGE_KERNEL here as well and am working on
> > understanding why hash does this check and whether it can change. On
> > radix this works just fine.
> > 
> > The page is mapped at what is technically a userspace address, but only
> > to keep the mapping local to the CPU doing the patching. PAGE_KERNEL
> > makes it clear, both in intent and in protection, that this is really a
> > kernel mapping.
> > 
> > I think the correct way is pgprot_val(PAGE_KERNEL) since PAGE_KERNEL is defined
> > as:
> > 
> > #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> > 
> > and __pgprot() is defined as:
> > 
> > typedef struct { unsigned long pgprot; } pgprot_t;
> > #define pgprot_val(x)   ((x).pgprot)
> > #define __pgprot(x)     ((pgprot_t) { (x) })
> 
> 
> Yes, so:
> 	pgprot_val(__pgprot(x)) == x
> 
> 
> You do:
> 
> 	pgprot = __pgprot(pgprot_val(PAGE_KERNEL));
> 
> Which is:
> 
> 	pgprot = __pgprot(pgprot_val(__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)));
> 
> Which is equivalent to:
> 
> 	pgprot = __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW);
> 
> So at the end it should simply be:
> 
> 	pgprot = PAGE_KERNEL;
> 

Yes you're correct. Picking this up in the next spin.
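
So for the next spin the selection simply becomes:

	if (radix_enabled())
		pgprot = PAGE_KERNEL;
	else
		pgprot = PAGE_SHARED;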

> 
> 
> 
> Christophe


* Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
  2020-04-08 11:01       ` Christophe Leroy
  2020-04-15  4:39         ` Christopher M Riedl
@ 2020-04-17  0:57         ` Michael Ellerman
  2020-04-24 13:11           ` Steven Rostedt
  1 sibling, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2020-04-17  0:57 UTC (permalink / raw)
  To: Christophe Leroy, Christopher M Riedl, linuxppc-dev

Christophe Leroy <christophe.leroy@c-s.fr> writes:
> On 31/03/2020 at 05:19, Christopher M Riedl wrote:
>>> On March 24, 2020 11:10 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>> On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
>>>> When code patching a STRICT_KERNEL_RWX kernel the page containing the
>>>> address to be patched is temporarily mapped with permissive memory
>>>> protections. Currently, a per-cpu vmalloc patch area is used for this
>>>> purpose. While the patch area is per-cpu, the temporary page mapping is
>>>> inserted into the kernel page tables for the duration of the patching.
>>>> The mapping is exposed to CPUs other than the patching CPU - this is
>>>> undesirable from a hardening perspective.
>>>>
>>>> Use the `poking_init` init hook to prepare a temporary mm and patching
>>>> address. Initialize the temporary mm by copying the init mm. Choose a
>>>> randomized patching address inside the temporary mm userspace address
>>>> portion. The next patch uses the temporary mm and patching address for
>>>> code patching.
>>>>
>>>> Based on x86 implementation:
>>>>
>>>> commit 4fc19708b165
>>>> ("x86/alternatives: Initialize temporary mm for patching")
>>>>
>>>> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
>>>> ---
>>>>    arch/powerpc/lib/code-patching.c | 26 ++++++++++++++++++++++++++
>>>>    1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
>>>> index 3345f039a876..18b88ecfc5a8 100644
>>>> --- a/arch/powerpc/lib/code-patching.c
>>>> +++ b/arch/powerpc/lib/code-patching.c
>>>> @@ -11,6 +11,8 @@
>>>>    #include <linux/cpuhotplug.h>
>>>>    #include <linux/slab.h>
>>>>    #include <linux/uaccess.h>
>>>> +#include <linux/sched/task.h>
>>>> +#include <linux/random.h>
>>>>    
>>>>    #include <asm/pgtable.h>
>>>>    #include <asm/tlbflush.h>
>>>> @@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr)
>>>>    }
>>>>    
>>>>    #ifdef CONFIG_STRICT_KERNEL_RWX
>>>> +
>>>> +__ro_after_init struct mm_struct *patching_mm;
>>>> +__ro_after_init unsigned long patching_addr;
>>>
>>> Can we make those static ?
>>>
>> 
>> Yes, makes sense to me.
>> 
>>>> +
>>>> +void __init poking_init(void)
>>>> +{
>>>> +	spinlock_t *ptl; /* for protecting pte table */
>>>> +	pte_t *ptep;
>>>> +
>>>> +	patching_mm = copy_init_mm();
>>>> +	BUG_ON(!patching_mm);
>>>
>>> Does it need to be a BUG_ON() ? Can't we fail gracefully with just a
>>> WARN_ON ?
>>>
>> 
>> I'm not sure what failing gracefully means here? The main reason this could
>> fail is if there is not enough memory to allocate the patching_mm. The
>> previous implementation had this justification for BUG_ON():
>
> But the system can continue running just fine after this failure.
> Only the things that make use of code patching will fail (ftrace, kgdb, ...)

That's probably true of ftrace, but we can't fail patching for jump
labels (static keys).

See:

void arch_jump_label_transform(struct jump_entry *entry,
			       enum jump_label_type type)
{
	u32 *addr = (u32 *)(unsigned long)entry->code;

	if (type == JUMP_LABEL_JMP)
		patch_branch(addr, entry->target, 0);
	else
		patch_instruction(addr, PPC_INST_NOP);
}

cheers


* Re: [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm
  2020-03-23  4:52 ` [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm Christopher M. Riedl
  2020-03-23  6:10   ` kbuild test robot
  2020-03-24 16:07   ` Christophe Leroy
@ 2020-04-18 10:36   ` Christophe Leroy
  2 siblings, 0 replies; 23+ messages in thread
From: Christophe Leroy @ 2020-04-18 10:36 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev



On 23/03/2020 at 05:52, Christopher M. Riedl wrote:
> x86 supports the notion of a temporary mm which restricts access to
> temporary PTEs to a single CPU. A temporary mm is useful for situations
> where a CPU needs to perform sensitive operations (such as patching a
> STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> said mappings to other CPUs. A side benefit is that other CPU TLBs do
> not need to be flushed when the temporary mm is torn down.
> 
> Mappings in the temporary mm can be set in the userspace portion of the
> address-space.
> 
> Interrupts must be disabled while the temporary mm is in use. HW
> breakpoints, which may have been set by userspace as watchpoints on
> addresses now within the temporary mm, are saved and disabled when
> loading the temporary mm. The HW breakpoints are restored when unloading
> the temporary mm. All HW breakpoints are indiscriminately disabled while
> the temporary mm is in use.
> 
> Based on x86 implementation:
> 
> commit cefa929c034e
> ("x86/mm: Introduce temporary mm structs")
> 
> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
> ---
>   arch/powerpc/include/asm/debug.h       |  1 +
>   arch/powerpc/include/asm/mmu_context.h | 56 +++++++++++++++++++++++++-
>   arch/powerpc/kernel/process.c          |  5 +++
>   3 files changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> index 7756026b95ca..b945bc16c932 100644
> --- a/arch/powerpc/include/asm/debug.h
> +++ b/arch/powerpc/include/asm/debug.h
> @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; }
>   static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
>   #endif
>   
> +void __get_breakpoint(struct arch_hw_breakpoint *brk);
>   void __set_breakpoint(struct arch_hw_breakpoint *brk);
>   bool ppc_breakpoint_available(void);
>   #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 360367c579de..3e6381d04c28 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -7,9 +7,10 @@
>   #include <linux/mm.h>
>   #include <linux/sched.h>
>   #include <linux/spinlock.h>
> -#include <asm/mmu.h>	
> +#include <asm/mmu.h>
>   #include <asm/cputable.h>
>   #include <asm/cputhreads.h>
> +#include <asm/hw_breakpoint.h>
>   
>   /*
>    * Most if the context management is out of line
> @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
>   	return 0;
>   }
>   
> +struct temp_mm {
> +	struct mm_struct *temp;
> +	struct mm_struct *prev;
> +	bool is_kernel_thread;
> +	struct arch_hw_breakpoint brk;
> +};
> +
> +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> +{
> +	temp_mm->temp = mm;
> +	temp_mm->prev = NULL;
> +	temp_mm->is_kernel_thread = false;
> +	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> +}
> +
> +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	temp_mm->is_kernel_thread = current->mm == NULL;
> +	if (temp_mm->is_kernel_thread)
> +		temp_mm->prev = current->active_mm;
> +	else
> +		temp_mm->prev = current->mm;

Could be:
	temp_mm->prev = current->mm ? : current->active_mm;

> +
> +	/*
> +	 * Hash requires a non-NULL current->mm to allocate a userspace address
> +	 * when handling a page fault. Does not appear to hurt in Radix either.
> +	 */
> +	current->mm = temp_mm->temp;
> +	switch_mm_irqs_off(NULL, temp_mm->temp, current);
> +
> +	if (ppc_breakpoint_available()) {
> +		__get_breakpoint(&temp_mm->brk);
> +		if (temp_mm->brk.type != 0)
> +			hw_breakpoint_disable();
> +	}

There is a series for having more than one breakpoint, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=170073&state=*

> +}
> +
> +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	if (temp_mm->is_kernel_thread)
> +		current->mm = NULL;
> +	else
> +		current->mm = temp_mm->prev;

Could be:
	current->mm = !temp_mm->is_kernel_thread ? temp_mm->prev : NULL;

> +	switch_mm_irqs_off(NULL, temp_mm->prev, current);
> +
> +	if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
> +		__set_breakpoint(&temp_mm->brk);
> +}
> +
>   #endif /* __KERNEL__ */
>   #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fad50db9dcf2..5e5cf33fc358 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -793,6 +793,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
>   	return 0;
>   }
>   
> +void __get_breakpoint(struct arch_hw_breakpoint *brk)
> +{
> +	memcpy(brk, this_cpu_ptr(&current_brk), sizeof(*brk));
> +}
> +
>   void __set_breakpoint(struct arch_hw_breakpoint *brk)
>   {
>   	memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
> 

Christophe


* Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching
  2020-04-17  0:57         ` Michael Ellerman
@ 2020-04-24 13:11           ` Steven Rostedt
  0 siblings, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2020-04-24 13:11 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Christopher M Riedl

On Fri, 17 Apr 2020 10:57:10 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> >>> Does it needs to be a BUG_ON() ? Can't we fail gracefully with just a
> >>> WARN_ON ?
> >>>  
> >> 
> >> I'm not sure what failing gracefully means here? The main reason this could
> >> fail is if there is not enough memory to allocate the patching_mm. The
> >> previous implementation had this justification for BUG_ON():  
> >
> > But the system can continue running just fine after this failure.
> > Only the things that make use of code patching will fail (ftrace, kgdb, ...)  
> 
> That's probably true of ftrace, but we can't fail patching for jump
> labels (static keys).
> 
> See:
> 
> void arch_jump_label_transform(struct jump_entry *entry,
> 			       enum jump_label_type type)
> {
> 	u32 *addr = (u32 *)(unsigned long)entry->code;
> 
> 	if (type == JUMP_LABEL_JMP)
> 		patch_branch(addr, entry->target, 0);
> 	else
> 		patch_instruction(addr, PPC_INST_NOP);
> }

I would still err on the side of a WARN_ON() as a lot of static keys should still
work if they don't get switched over.

If a user is concerned about something like this, they can always set
panic_on_warn.

-- Steve



Thread overview: 23+ messages
2020-03-23  4:52 [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christopher M. Riedl
2020-03-23  4:52 ` [RFC PATCH 1/3] powerpc/mm: Introduce temporary mm Christopher M. Riedl
2020-03-23  6:10   ` kbuild test robot
2020-03-24 16:07   ` Christophe Leroy
2020-03-31  2:41     ` Christopher M Riedl
2020-04-18 10:36   ` Christophe Leroy
2020-03-23  4:52 ` [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching Christopher M. Riedl
2020-03-24 16:10   ` Christophe Leroy
2020-03-31  3:19     ` Christopher M Riedl
2020-04-08 11:01       ` Christophe Leroy
2020-04-15  4:39         ` Christopher M Riedl
2020-04-17  0:57         ` Michael Ellerman
2020-04-24 13:11           ` Steven Rostedt
2020-03-23  4:52 ` [RFC PATCH 3/3] powerpc/lib: Use " Christopher M. Riedl
2020-03-24 16:25   ` Christophe Leroy
2020-04-15  5:11     ` Christopher M Riedl
2020-04-15  8:45       ` Christophe Leroy
2020-04-15 16:24         ` Christopher M Riedl
2020-03-23 11:30 ` [RFC PATCH 0/3] Use per-CPU temporary mappings for patching Christophe Leroy
2020-03-23 14:04   ` Christophe Leroy
     [not found]     ` <738663735.45060.1584982175261@privateemail.com>
2020-03-23 16:59       ` Christophe Leroy
2020-03-24 16:02     ` Christophe Leroy
2020-03-25  2:51 ` Andrew Donnellan
