* [PATCH v3 0/6] Use per-CPU temporary mappings for patching
From: Christopher M. Riedl @ 2020-08-27  5:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kernel-hardening

When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
temporary mappings when patching itself. These mappings temporarily
override the strict RWX text protections to permit a write. Currently,
powerpc allocates a per-CPU VM area for patching. Patching occurs as
follows:

	1. Map page of text to be patched to per-CPU VM area w/
	   PAGE_KERNEL protection
	2. Patch text
	3. Remove the temporary mapping
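
In code, the current approach corresponds roughly to the sketch below
(paraphrasing arch/powerpc/lib/code-patching.c; the function name is
made up, and error handling plus the vmalloc/module case are omitted):

	/* Rough sketch only - see the hunks removed by patch 6/6 */
	static int patch_via_vm_area(struct ppc_inst *addr, struct ppc_inst instr)
	{
		unsigned long text_poke_addr =
			(unsigned long)__this_cpu_read(text_poke_area)->addr;
		unsigned long pfn = __pa_symbol(addr) >> PAGE_SHIFT;
		struct ppc_inst *patch_addr;

		/* 1. Map the page of text into this CPU's VM area */
		map_kernel_page(text_poke_addr, pfn << PAGE_SHIFT, PAGE_KERNEL);

		/* 2. Patch text through the temporary, writable alias */
		patch_addr = (struct ppc_inst *)(text_poke_addr +
						 offset_in_page(addr));
		__patch_instruction(addr, instr, patch_addr);

		/* 3. Remove the temporary mapping and flush the TLB */
		unmap_patch_area(text_poke_addr);
		return 0;
	}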

While the VM area is per-CPU, the mapping is actually inserted into the
kernel page tables. Presumably, this could allow another CPU to access
the normally write-protected text - either maliciously or accidentally -
via this same mapping if the address of the VM area is known. Ideally,
the mapping should be kept local to the CPU doing the patching (or any
other sensitive operations requiring temporarily overriding memory
protections) [0].

x86 introduced "temporary mm" structs which allow the creation of
mappings local to a particular CPU [1]. This series intends to bring the
notion of a temporary mm to powerpc and harden powerpc by using such a
mapping for patching a kernel with strict RWX permissions.
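
For reference, the x86 usage pattern looks roughly like the following
(paraphrased from __text_poke() in arch/x86/kernel/alternative.c; PTE
setup and teardown are omitted):

	temp_mm_state_t prev;
	unsigned long flags;

	local_irq_save(flags);
	/* ... install PTE(s) for the page(s) to patch into poking_mm ... */
	prev = use_temporary_mm(poking_mm);	/* CPU-local mm switch */
	/* ... write the new instruction bytes via poking_addr ... */
	unuse_temporary_mm(prev);		/* restore the previous mm */
	/* ... clear the PTE(s); only the local TLB needs flushing ... */
	local_irq_restore(flags);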

The first, second, and third patches implement an LKDTM test
"proof-of-concept" which exploits the potential vulnerability (ie. the
mapping during patching is exposed in kernel page tables and accessible
by other CPUs). The LKDTM test is somewhat "rough" in that it uses a
brute-force approach - I am open to any suggestions and/or ideas to
improve this. Currently, the LKDTM test passes with this series on
POWER8 (hash) and POWER9 (radix, hash) and fails without this series
(ie. the temporary mapping for patching is exposed to CPUs other than
the patching CPU).

The test is also implemented on x86_64 where it passes with a current
tree and fails on a tree from before the introduction of the x86
temporary mm. I have such a tree here which intentionally fails:
https://github.com/cmr-informatik/linux/tree/x86_64-non-percpu-lkdtm

The fourth patch introduces the temporary mm struct and API for powerpc
along with a new function to retrieve a current hw breakpoint.

The fifth patch uses the `poking_init` init hook added by the x86
patches to initialize a temporary mm and patching address. The patching
address is randomized between PAGE_SIZE and DEFAULT_MAP_WINDOW-PAGE_SIZE.
The upper limit is necessary due to how the hash MMU operates - by default
the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
and radix randomize inside this range. The number of possible random
addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

	bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
	bits of entropy = log2(128TB / 64K)
	bits of entropy = 31
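
	For comparison, assuming DEFAULT_MAP_WINDOW_USER64=64TB in the 4K
	page size configuration:

	PAGE_SIZE=4K, DEFAULT_MAP_WINDOW_USER64=64TB
	bits of entropy = log2(64TB / 4K)
	bits of entropy = 34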

Randomization occurs only once during initialization at boot.

The sixth patch replaces the VM area with the temporary mm in the
patching code. The page for patching has to be mapped PAGE_SHARED under
the hash MMU since hash prevents the kernel from accessing userspace
pages with the PAGE_PRIVILEGED bit set. On the radix MMU the page is
mapped with PAGE_KERNEL, which has the added benefit that we can skip
KUAP.

Tested on Blackbird (8-core POWER9) w/ Hash (`disable_radix`) and Radix
MMUs, QEMU (TCG) POWER8 and POWER9, POWER8 VM.

Tested LKDTM test (passing and failing situations) on QEMU x86_64.

v3:	* Rebase on linuxppc/next:
	  commit 9123e3a74ec7 ("Linux 5.9-rc1")
	* Move temporary mm implementation into code-patching.c where it
	  belongs
	* Implement LKDTM hijacker test on x86_64 (on IBM time oof)
	* Do not use address zero for the patching address in the
	  temporary mm (thanks @dja for pointing this out!)
	* Wrap the LKDTM test w/ CONFIG_SMP as suggested by Christophe
	  Leroy
	* Comments to clarify PTE pre-allocation and patching addr
	  selection

v2:	* Rebase on linuxppc/next:
	  commit 105fb38124a4 ("powerpc/8xx: Modify ptep_get()")
	* Always dirty pte when mapping patch
	* Use `ppc_inst_len` instead of `sizeof` on instructions
	* Declare LKDTM patching addr accessor in header where it
	  belongs	

v1:	* Rebase on linuxppc/next (4336b9337824)
	* Save and restore second hw watchpoint
	* Use new ppc_inst_* functions for patching check and in LKDTM
	  test

rfc-v2:	* Many fixes and improvements mostly based on extensive feedback and
          testing by Christophe Leroy (thanks!).
	* Make patching_mm and patching_addr static and move '__ro_after_init'
	  to after the variable name (more common in other parts of the kernel)
	* Use 'asm/debug.h' header instead of 'asm/hw_breakpoint.h' to fix
	  PPC64e compile
	* Add comment explaining why we use BUG_ON() during the init call to
	  setup for patching later
	* Move ptep into patch_mapping to avoid walking page tables a second
	  time when unmapping the temporary mapping
	* Use KUAP under non-radix, also manually dirty the PTE for patch
	  mapping on non-BOOK3S_64 platforms
	* Properly return any error from __patch_instruction
	* Do not use 'memcmp' where a simple comparison is appropriate
	* Simplify expression for patch address by removing pointer maths
	* Add LKDTM test


[0]: https://github.com/linuxppc/issues/issues/224
[1]: https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.amit@gmail.com/



Christopher M. Riedl (6):
  powerpc: Add LKDTM accessor for patching addr
  x86: Add LKDTM accessor for patching addr
  Add LKDTM test to hijack a patch mapping (powerpc,x86_64)
  powerpc: Introduce temporary mm
  powerpc: Initialize a temporary mm for code patching
  powerpc: Use a temporary mm for code patching

 arch/powerpc/include/asm/code-patching.h |   4 +
 arch/powerpc/include/asm/debug.h         |   1 +
 arch/powerpc/kernel/process.c            |   5 +
 arch/powerpc/lib/code-patching.c         | 239 +++++++++++++++--------
 arch/x86/include/asm/text-patching.h     |   4 +
 arch/x86/kernel/alternative.c            |   7 +
 drivers/misc/lkdtm/core.c                |   1 +
 drivers/misc/lkdtm/lkdtm.h               |   1 +
 drivers/misc/lkdtm/perms.c               | 146 ++++++++++++++
 9 files changed, 322 insertions(+), 86 deletions(-)

-- 
2.28.0



* [PATCH v3 1/6] powerpc: Add LKDTM accessor for patching addr
From: Christopher M. Riedl @ 2020-08-27  5:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kernel-hardening

When live patching a STRICT_RWX kernel, a mapping is installed at a
"patching address" with temporary write permissions. Provide a
LKDTM-only accessor function for this address in preparation for a LKDTM
test which attempts to "hijack" this mapping by writing to it from
another CPU.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/powerpc/include/asm/code-patching.h | 4 ++++
 arch/powerpc/lib/code-patching.c         | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index eacc9102c251..7216d6a6bb0a 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -187,4 +187,8 @@ static inline unsigned long ppc_kallsyms_lookup_name(const char *name)
 				 ___PPC_RA(__REG_R1) | PPC_LR_STKOFF)
 #endif /* CONFIG_PPC64 */
 
+#if defined(CONFIG_LKDTM) && defined(CONFIG_STRICT_KERNEL_RWX)
+unsigned long read_cpu_patching_addr(unsigned int cpu);
+#endif
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 8c3934ea6220..85d3fdca9452 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -46,6 +46,13 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 #ifdef CONFIG_STRICT_KERNEL_RWX
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+	return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+}
+#endif
+
 static int text_area_cpu_up(unsigned int cpu)
 {
 	struct vm_struct *area;
-- 
2.28.0



* [PATCH v3 2/6] x86: Add LKDTM accessor for patching addr
From: Christopher M. Riedl @ 2020-08-27  5:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kernel-hardening

When live patching a STRICT_RWX kernel, a mapping is installed at a
"patching address" with temporary write permissions. Provide a
LKDTM-only accessor function for this address in preparation for a LKDTM
test which attempts to "hijack" this mapping by writing to it from
another CPU.

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/x86/include/asm/text-patching.h | 4 ++++
 arch/x86/kernel/alternative.c        | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 6593b42cb379..f67b4dd30bf8 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -148,4 +148,8 @@ void int3_emulate_call(struct pt_regs *regs, unsigned long func)
 }
 #endif /* !CONFIG_UML_X86 */
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu);
+#endif
+
 #endif /* _ASM_X86_TEXT_PATCHING_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c3daf0aaa0ee..c16833f35a1f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -843,6 +843,13 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+	return poking_addr;
+}
+#endif
+
 static void *__text_poke(void *addr, const void *opcode, size_t len)
 {
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
-- 
2.28.0



* [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc, x86_64)
From: Christopher M. Riedl @ 2020-08-27  5:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kernel-hardening

When live patching with STRICT_KERNEL_RWX, the CPU doing the patching
must temporarily remap the page(s) containing the patch site with +W
permissions. While this temporary mapping is in use another CPU could
write to the same mapping and maliciously alter kernel text. Implement an
LKDTM test to attempt to exploit such an opening from another (ie. not
the patching) CPU. The test is implemented on x86_64 and powerpc only.

The LKDTM "hijack" test works as follows:

	1. A CPU executes an infinite loop to patch an instruction.
	   This is the "patching" CPU.
	2. Another CPU attempts to write to the address of the temporary
	   mapping used by the "patching" CPU. This other CPU is the
	   "hijacker" CPU. The hijack either fails with a segfault or
	   succeeds, in which case some kernel text is now overwritten.

How to run the test:

	mount -t debugfs none /sys/kernel/debug
	(echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT)
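
On a vulnerable kernel the output, assembled from the pr_*() calls in
this patch, looks roughly like the following (CPU numbers and the
attempt count are illustrative):

	lkdtm: Performing direct entry HIJACK_PATCH
	lkdtm: starting patching_cpu=1
	lkdtm: starting hijacker_cpu=0
	lkdtm: hijack attempts: 13
	lkdtm: overwrote kernel text
	lkdtm: FAIL: wrote to another cpu's patching area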

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 drivers/misc/lkdtm/core.c  |   1 +
 drivers/misc/lkdtm/lkdtm.h |   1 +
 drivers/misc/lkdtm/perms.c | 146 +++++++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df9166..482e72f6a1e1 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -145,6 +145,7 @@ static const struct crashtype crashtypes[] = {
 	CRASHTYPE(WRITE_RO),
 	CRASHTYPE(WRITE_RO_AFTER_INIT),
 	CRASHTYPE(WRITE_KERN),
+	CRASHTYPE(HIJACK_PATCH),
 	CRASHTYPE(REFCOUNT_INC_OVERFLOW),
 	CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
 	CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 8878538b2c13..8bd98e8f0443 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -60,6 +60,7 @@ void lkdtm_EXEC_USERSPACE(void);
 void lkdtm_EXEC_NULL(void);
 void lkdtm_ACCESS_USERSPACE(void);
 void lkdtm_ACCESS_NULL(void);
+void lkdtm_HIJACK_PATCH(void);
 
 /* lkdtm_refcount.c */
 void lkdtm_REFCOUNT_INC_OVERFLOW(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 2dede2ef658f..0ed32aba5216 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mman.h>
 #include <linux/uaccess.h>
+#include <linux/kthread.h>
 #include <asm/cacheflush.h>
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -222,6 +223,151 @@ void lkdtm_ACCESS_NULL(void)
 	pr_err("FAIL: survived bad write\n");
 }
 
+#if defined(CONFIG_PPC) || defined(CONFIG_X86_64)
+#if defined(CONFIG_STRICT_KERNEL_RWX) && defined(CONFIG_SMP)
+/*
+ * This is just a dummy location to patch-over.
+ */
+static void patching_target(void)
+{
+	return;
+}
+
+#ifdef CONFIG_PPC
+#include <asm/code-patching.h>
+struct ppc_inst * const patch_site = (struct ppc_inst *)&patching_target;
+#endif
+
+#ifdef CONFIG_X86_64
+#include <asm/text-patching.h>
+int * const patch_site = (int *)&patching_target;
+#endif
+
+static inline int lkdtm_do_patch(int data)
+{
+#ifdef CONFIG_PPC
+	return patch_instruction(patch_site, ppc_inst(data));
+#endif
+#ifdef CONFIG_X86_64
+	text_poke(patch_site, &data, sizeof(int));
+	return 0;
+#endif
+}
+
+static inline bool lkdtm_verify_patch(int data)
+{
+#ifdef CONFIG_PPC
+	return ppc_inst_equal(ppc_inst_read(READ_ONCE(patch_site)),
+			ppc_inst(data));
+#endif
+#ifdef CONFIG_X86_64
+	return READ_ONCE(*patch_site) == data;
+#endif
+}
+
+static int lkdtm_patching_cpu(void *data)
+{
+	int err = 0;
+	int val = 0xdeadbeef;
+
+	pr_info("starting patching_cpu=%d\n", smp_processor_id());
+	do {
+		err = lkdtm_do_patch(val);
+	} while (lkdtm_verify_patch(val) && !err && !kthread_should_stop());
+
+	if (err)
+		pr_warn("patch_instruction returned error: %d\n", err);
+
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+
+	return err;
+}
+
+void lkdtm_HIJACK_PATCH(void)
+{
+#ifdef CONFIG_PPC
+	struct ppc_inst original_insn = ppc_inst_read(READ_ONCE(patch_site));
+#endif
+#ifdef CONFIG_X86_64
+	int original_insn = READ_ONCE(*patch_site);
+#endif
+	struct task_struct *patching_kthrd;
+	int patching_cpu, hijacker_cpu, attempts;
+	unsigned long addr;
+	bool hijacked;
+	const int bad_data = 0xbad00bad;
+
+	if (num_online_cpus() < 2) {
+		pr_warn("need at least two cpus\n");
+		return;
+	}
+
+	hijacker_cpu = smp_processor_id();
+	patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu);
+
+	patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL,
+						cpu_to_node(patching_cpu),
+						"lkdtm_patching_cpu");
+	kthread_bind(patching_kthrd, patching_cpu);
+	wake_up_process(patching_kthrd);
+
+	addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu);
+
+	pr_info("starting hijacker_cpu=%d\n", hijacker_cpu);
+	for (attempts = 0; attempts < 100000; ++attempts) {
+		/* Use __put_user to catch faults without an Oops */
+		hijacked = !__put_user(bad_data, (int *)addr);
+
+		if (hijacked) {
+			if (kthread_stop(patching_kthrd))
+				pr_err("error trying to stop patching thread\n");
+			break;
+		}
+	}
+	pr_info("hijack attempts: %d\n", attempts);
+
+	if (hijacked) {
+		if (lkdtm_verify_patch(bad_data))
+			pr_err("overwrote kernel text\n");
+		/*
+		 * There are window conditions where the hijacker cpu manages to
+		 * write to the patch site but the site gets overwritten again by
+		 * the patching cpu. We still consider that a "successful" hijack
+		 * since the hijacker cpu did not fault on the write.
+		 */
+		pr_err("FAIL: wrote to another cpu's patching area\n");
+	} else {
+		kthread_stop(patching_kthrd);
+	}
+
+	/* Restore the original insn for any future lkdtm tests */
+#ifdef CONFIG_PPC
+	patch_instruction(patch_site, original_insn);
+#endif
+#ifdef CONFIG_X86_64
+	lkdtm_do_patch(original_insn);
+#endif
+}
+
+#else
+
+void lkdtm_HIJACK_PATCH(void)
+{
+	if (!IS_ENABLED(CONFIG_PPC) && !IS_ENABLED(CONFIG_X86_64))
+		pr_err("XFAIL: this test only runs on x86_64 or powerpc\n");
+	if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
+		pr_err("XFAIL: this test requires CONFIG_STRICT_KERNEL_RWX\n");
+	if (!IS_ENABLED(CONFIG_SMP))
+		pr_err("XFAIL: this test requires CONFIG_SMP\n");
+}
+
+#endif /* CONFIG_STRICT_KERNEL_RWX && CONFIG_SMP */
+#endif /* CONFIG_PPC || CONFIG_X86_64 */
+
 void __init lkdtm_perms_init(void)
 {
 	/* Make sure we can write to __ro_after_init values during __init */
-- 
2.28.0



* [PATCH v3 4/6] powerpc: Introduce temporary mm
From: Christopher M. Riedl @ 2020-08-27  5:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kernel-hardening

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and indiscriminately
disabled when loading the temporary mm and restored when unloading it.
In other words, no HW breakpoints are active while the temporary mm is
in use.
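
The intended calling pattern, mirroring how the patching code uses this
API later in the series (patching_mm is set up in the next patch), is:

	struct temp_mm temp_mm;
	unsigned long flags;

	local_irq_save(flags);

	init_temp_mm(&temp_mm, patching_mm);
	use_temporary_mm(&temp_mm);	/* switch mm, save + disable HW
					 * breakpoints */

	/* ... access the CPU-local mappings in patching_mm ... */

	unuse_temporary_mm(&temp_mm);	/* restore previous mm + HW
					 * breakpoints */
	local_irq_restore(flags);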

Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/powerpc/include/asm/debug.h |  1 +
 arch/powerpc/kernel/process.c    |  5 +++
 arch/powerpc/lib/code-patching.c | 65 ++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index ec57daf87f40..827350c9bcf3 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
 #endif
 
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
 bool ppc_breakpoint_available(void);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 extern void do_send_trap(struct pt_regs *regs, unsigned long address,
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 016bd831908e..0758a8db6342 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -843,6 +843,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk)
 	return 0;
 }
 
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)
+{
+	memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
+}
+
 void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
 {
 	memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 85d3fdca9452..89b37ece6d2f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -17,6 +17,7 @@
 #include <asm/code-patching.h>
 #include <asm/setup.h>
 #include <asm/inst.h>
+#include <asm/mmu_context.h>
 
 static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
 			       struct ppc_inst *patch_addr)
@@ -44,6 +45,70 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
+struct temp_mm {
+	struct mm_struct *temp;
+	struct mm_struct *prev;
+	bool is_kernel_thread;
+	struct arch_hw_breakpoint brk[HBP_NUM_MAX];
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+	temp_mm->temp = mm;
+	temp_mm->prev = NULL;
+	temp_mm->is_kernel_thread = false;
+	memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	temp_mm->is_kernel_thread = current->mm == NULL;
+	if (temp_mm->is_kernel_thread)
+		temp_mm->prev = current->active_mm;
+	else
+		temp_mm->prev = current->mm;
+
+	/*
+	 * Hash requires a non-NULL current->mm to allocate a userspace address
+	 * when handling a page fault. Does not appear to hurt in Radix either.
+	 */
+	current->mm = temp_mm->temp;
+	switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+	if (ppc_breakpoint_available()) {
+		struct arch_hw_breakpoint null_brk = {0};
+		int i = 0;
+
+		for (; i < nr_wp_slots(); ++i) {
+			__get_breakpoint(i, &temp_mm->brk[i]);
+			if (temp_mm->brk[i].type != 0)
+				__set_breakpoint(i, &null_brk);
+		}
+	}
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (temp_mm->is_kernel_thread)
+		current->mm = NULL;
+	else
+		current->mm = temp_mm->prev;
+	switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+	if (ppc_breakpoint_available()) {
+		int i = 0;
+
+		for (; i < nr_wp_slots(); ++i)
+			if (temp_mm->brk[i].type != 0)
+				__set_breakpoint(i, &temp_mm->brk[i]);
+	}
+}
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
 #ifdef CONFIG_LKDTM
-- 
2.28.0



* [PATCH v3 5/6] powerpc: Initialize a temporary mm for code patching
From: Christopher M. Riedl @ 2020-08-27  5:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kernel-hardening

When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/powerpc/lib/code-patching.c | 40 ++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 89b37ece6d2f..051d7ae6d8ee 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include <linux/cpuhotplug.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/sched/task.h>
+#include <linux/random.h>
 
 #include <asm/tlbflush.h>
 #include <asm/page.h>
@@ -109,6 +111,44 @@ static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
 	}
 }
 
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+	spinlock_t *ptl; /* for protecting pte table */
+	pte_t *ptep;
+
+	/*
+	 * Some parts of the kernel (static keys for example) depend on
+	 * successful code patching. Code patching under STRICT_KERNEL_RWX
+	 * requires this setup - otherwise we cannot patch at all. We use
+	 * BUG_ON() here and later since an early failure is preferred to
+	 * buggy behavior and/or strange crashes later.
+	 */
+	patching_mm = copy_init_mm();
+	BUG_ON(!patching_mm);
+
+	/*
+	 * Choose a randomized, page-aligned address from the range:
+	 * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
+	 * The lower address bound is PAGE_SIZE to avoid the zero-page.
+	 * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
+	 * under DEFAULT_MAP_WINDOW in hash.
+	 */
+	patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
+			% (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
+
+	/*
+	 * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
+	 * the PTE here. We cannot do the allocation during patching with IRQs
+	 * disabled (ie. "atomic" context).
+	 */
+	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+	BUG_ON(!ptep);
+	pte_unmap_unlock(ptep, ptl);
+}
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
 #ifdef CONFIG_LKDTM
-- 
2.28.0



* [PATCH v3 6/6] powerpc: Use a temporary mm for code patching
From: Christopher M. Riedl @ 2020-08-27  5:26 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kernel-hardening

Currently, code patching a STRICT_KERNEL_RWX kernel exposes the temporary
mappings to other CPUs. These mappings should be kept local to the CPU
doing the patching. Use the pre-initialized temporary mm and patching
address for this purpose. Also add a check after patching to ensure the
patch succeeded.

Toggle KUAP on non-radix MMU platforms when patching since the temporary
mapping for patching uses a userspace address. With a radix MMU this is
not required since mapping a page with PAGE_KERNEL sets EAA[0] for the
PTE which means the AMR (KUAP) protection is ignored (see PowerISA
v3.0b, Fig. 35).

Based on x86 implementation:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
 arch/powerpc/lib/code-patching.c | 153 +++++++++++--------------------
 1 file changed, 54 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 051d7ae6d8ee..9fb098680da8 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -149,113 +149,64 @@ void __init poking_init(void)
 	pte_unmap_unlock(ptep, ptl);
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
-
 #ifdef CONFIG_LKDTM
 unsigned long read_cpu_patching_addr(unsigned int cpu)
 {
-	return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+	return patching_addr;
 }
 #endif
 
-static int text_area_cpu_up(unsigned int cpu)
-{
-	struct vm_struct *area;
-
-	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-	if (!area) {
-		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-			cpu);
-		return -1;
-	}
-	this_cpu_write(text_poke_area, area);
-
-	return 0;
-}
-
-static int text_area_cpu_down(unsigned int cpu)
-{
-	free_vm_area(this_cpu_read(text_poke_area));
-	return 0;
-}
-
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-	BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-		"powerpc/text_poke:online", text_area_cpu_up,
-		text_area_cpu_down));
-
-	return 0;
-}
-late_initcall(setup_text_poke_area);
+struct patch_mapping {
+	spinlock_t *ptl; /* for protecting pte table */
+	pte_t *ptep;
+	struct temp_mm temp_mm;
+};
 
 /*
  * This can be called for kernel text or a module.
  */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 {
-	unsigned long pfn;
-	int err;
+	struct page *page;
+	pte_t pte;
+	pgprot_t pgprot;
 
 	if (is_vmalloc_or_module_addr(addr))
-		pfn = vmalloc_to_pfn(addr);
+		page = vmalloc_to_page(addr);
 	else
-		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+		page = virt_to_page(addr);
 
-	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
+	if (radix_enabled())
+		pgprot = PAGE_KERNEL;
+	else
+		pgprot = PAGE_SHARED;
 
-	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
-	if (err)
+	patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
+					     &patch_mapping->ptl);
+	if (unlikely(!patch_mapping->ptep)) {
+		pr_warn("map patch: failed to allocate pte for patching\n");
 		return -1;
+	}
+
+	pte = mk_pte(page, pgprot);
+	pte = pte_mkdirty(pte);
+	set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
+
+	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+	use_temporary_mm(&patch_mapping->temp_mm);
 
 	return 0;
 }
 
-static inline int unmap_patch_area(unsigned long addr)
+static void unmap_patch(struct patch_mapping *patch_mapping)
 {
-	pte_t *ptep;
-	pmd_t *pmdp;
-	pud_t *pudp;
-	p4d_t *p4dp;
-	pgd_t *pgdp;
-
-	pgdp = pgd_offset_k(addr);
-	if (unlikely(!pgdp))
-		return -EINVAL;
-
-	p4dp = p4d_offset(pgdp, addr);
-	if (unlikely(!p4dp))
-		return -EINVAL;
-
-	pudp = pud_offset(p4dp, addr);
-	if (unlikely(!pudp))
-		return -EINVAL;
-
-	pmdp = pmd_offset(pudp, addr);
-	if (unlikely(!pmdp))
-		return -EINVAL;
+	/* In hash, pte_clear flushes the tlb */
+	pte_clear(patching_mm, patching_addr, patch_mapping->ptep);
+	unuse_temporary_mm(&patch_mapping->temp_mm);
 
-	ptep = pte_offset_kernel(pmdp, addr);
-	if (unlikely(!ptep))
-		return -EINVAL;
-
-	pr_devel("clearing mm %p, pte %p, addr %lx\n", &init_mm, ptep, addr);
-
-	/*
-	 * In hash, pte_clear flushes the tlb, in radix, we have to
-	 */
-	pte_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-
-	return 0;
+	/* In radix, we have to explicitly flush the tlb (no-op in hash) */
+	local_flush_tlb_mm(patching_mm);
+	pte_unmap_unlock(patch_mapping->ptep, patch_mapping->ptl);
 }
 
 static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
@@ -263,32 +214,36 @@ static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 	int err;
 	struct ppc_inst *patch_addr = NULL;
 	unsigned long flags;
-	unsigned long text_poke_addr;
-	unsigned long kaddr = (unsigned long)addr;
+	struct patch_mapping patch_mapping;
 
 	/*
-	 * During early early boot patch_instruction is called
-	 * when text_poke_area is not ready, but we still need
-	 * to allow patching. We just do the plain old patching
+	 * The patching_mm is initialized before calling mark_rodata_ro. Prior
+	 * to this, patch_instruction is called when we don't have (and don't
+	 * need) the patching_mm so just do plain old patching.
 	 */
-	if (!this_cpu_read(text_poke_area))
+	if (!patching_mm)
 		return raw_patch_instruction(addr, instr);
 
 	local_irq_save(flags);
 
-	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;
-	if (map_patch_area(addr, text_poke_addr)) {
-		err = -1;
+	err = map_patch(addr, &patch_mapping);
+	if (err)
 		goto out;
-	}
 
-	patch_addr = (struct ppc_inst *)(text_poke_addr + (kaddr & ~PAGE_MASK));
+	patch_addr = (struct ppc_inst *)(patching_addr | offset_in_page(addr));
 
-	__patch_instruction(addr, instr, patch_addr);
+	if (!radix_enabled())
+		allow_write_to_user(patch_addr, ppc_inst_len(instr));
+	err = __patch_instruction(addr, instr, patch_addr);
+	if (!radix_enabled())
+		prevent_write_to_user(patch_addr, ppc_inst_len(instr));
 
-	err = unmap_patch_area(text_poke_addr);
-	if (err)
-		pr_warn("failed to unmap %lx\n", text_poke_addr);
+	unmap_patch(&patch_mapping);
+	/*
+	 * Something is wrong if what we just wrote doesn't match what we
+	 * think we just wrote.
+	 */
+	WARN_ON(!ppc_inst_equal(ppc_inst_read(addr), instr));
 
 out:
 	local_irq_restore(flags);
-- 
2.28.0



* Re: [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc, x86_64)
From: kernel test robot @ 2020-08-27 10:11 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev; +Cc: kbuild-all, kernel-hardening


Hi "Christopher,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on char-misc/char-misc-testing tip/x86/core v5.9-rc2 next-20200827]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Christopher-M-Riedl/Use-per-CPU-temporary-mappings-for-patching/20200827-161532
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/misc/lkdtm/perms.c: In function 'lkdtm_HIJACK_PATCH':
>> drivers/misc/lkdtm/perms.c:318:38: error: implicit declaration of function 'read_cpu_patching_addr' [-Werror=implicit-function-declaration]
     318 |  addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu);
         |                                      ^~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

# https://github.com/0day-ci/linux/commit/36a98d779ee4620e6e091cbe3b438b52faa108ad
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Christopher-M-Riedl/Use-per-CPU-temporary-mappings-for-patching/20200827-161532
git checkout 36a98d779ee4620e6e091cbe3b438b52faa108ad
vim +/read_cpu_patching_addr +318 drivers/misc/lkdtm/perms.c

   289	
   290	void lkdtm_HIJACK_PATCH(void)
   291	{
   292	#ifdef CONFIG_PPC
   293		struct ppc_inst original_insn = ppc_inst_read(READ_ONCE(patch_site));
   294	#endif
   295	#ifdef CONFIG_X86_64
   296		int original_insn = READ_ONCE(*patch_site);
   297	#endif
   298		struct task_struct *patching_kthrd;
   299		int patching_cpu, hijacker_cpu, attempts;
   300		unsigned long addr;
   301		bool hijacked;
   302		const int bad_data = 0xbad00bad;
   303	
   304		if (num_online_cpus() < 2) {
   305			pr_warn("need at least two cpus\n");
   306			return;
   307		}
   308	
   309		hijacker_cpu = smp_processor_id();
   310		patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu);
   311	
   312		patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL,
   313							cpu_to_node(patching_cpu),
   314							"lkdtm_patching_cpu");
   315		kthread_bind(patching_kthrd, patching_cpu);
   316		wake_up_process(patching_kthrd);
   317	
 > 318		addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu);
   319	
   320		pr_info("starting hijacker_cpu=%d\n", hijacker_cpu);
   321		for (attempts = 0; attempts < 100000; ++attempts) {
   322			/* Use __put_user to catch faults without an Oops */
   323			hijacked = !__put_user(bad_data, (int *)addr);
   324	
   325			if (hijacked) {
   326				if (kthread_stop(patching_kthrd))
   327					pr_err("error trying to stop patching thread\n");
   328				break;
   329			}
   330		}
   331		pr_info("hijack attempts: %d\n", attempts);
   332	
   333		if (hijacked) {
   334			if (lkdtm_verify_patch(bad_data))
   335				pr_err("overwrote kernel text\n");
   336			/*
   337			 * There are window conditions where the hijacker cpu manages to
   338			 * write to the patch site but the site gets overwritten again by
   339			 * the patching cpu. We still consider that a "successful" hijack
   340			 * since the hijacker cpu did not fault on the write.
   341			 */
   342			pr_err("FAIL: wrote to another cpu's patching area\n");
   343		} else {
   344			kthread_stop(patching_kthrd);
   345		}
   346	
   347		/* Restore the original insn for any future lkdtm tests */
   348	#ifdef CONFIG_PPC
   349		patch_instruction(patch_site, original_insn);
   350	#endif
   351	#ifdef CONFIG_X86_64
   352		lkdtm_do_patch(original_insn);
   353	#endif
   354	}
   355	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [PATCH v3 4/6] powerpc: Introduce temporary mm
From: Jann Horn @ 2020-08-27 14:15 UTC (permalink / raw)
  To: Christopher M. Riedl; +Cc: linuxppc-dev, Kernel Hardening

On Thu, Aug 27, 2020 at 7:24 AM Christopher M. Riedl <cmr@codefail.de> wrote:
> x86 supports the notion of a temporary mm which restricts access to
> temporary PTEs to a single CPU. A temporary mm is useful for situations
> where a CPU needs to perform sensitive operations (such as patching a
> STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> said mappings to other CPUs. A side benefit is that other CPU TLBs do
> not need to be flushed when the temporary mm is torn down.
>
> Mappings in the temporary mm can be set in the userspace portion of the
> address-space.
[...]
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
[...]
> @@ -44,6 +45,70 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
>  }
>
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> +
> +struct temp_mm {
> +       struct mm_struct *temp;
> +       struct mm_struct *prev;
> +       bool is_kernel_thread;
> +       struct arch_hw_breakpoint brk[HBP_NUM_MAX];
> +};
> +
> +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> +{
> +       temp_mm->temp = mm;
> +       temp_mm->prev = NULL;
> +       temp_mm->is_kernel_thread = false;
> +       memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> +}
> +
> +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> +{
> +       lockdep_assert_irqs_disabled();
> +
> +       temp_mm->is_kernel_thread = current->mm == NULL;

(That's a somewhat misleading variable name - kernel threads can have
a non-NULL ->mm, too.)

> +       if (temp_mm->is_kernel_thread)
> +               temp_mm->prev = current->active_mm;
> +       else
> +               temp_mm->prev = current->mm;

Why the branch? Shouldn't current->active_mm work in both cases?


> +       /*
> +        * Hash requires a non-NULL current->mm to allocate a userspace address
> +        * when handling a page fault. Does not appear to hurt in Radix either.
> +        */
> +       current->mm = temp_mm->temp;

This looks dangerous to me. There are various places that attempt to
find all userspace tasks that use a given mm by iterating through all
tasks on the system and comparing each task's ->mm pointer to
current's. Things like current_is_single_threaded() as part of various
security checks, mm_update_next_owner(), zap_threads(), and so on. So
if this is reachable from userspace task context (which I think it
is?), I don't think we're allowed to switch out the ->mm pointer here.


> +       switch_mm_irqs_off(NULL, temp_mm->temp, current);

switch_mm_irqs_off() calls switch_mmu_context(), which in the nohash
implementation increments next->context.active and decrements
prev->context.active if prev is non-NULL, right? So this would
increase temp_mm->temp->context.active...

> +       if (ppc_breakpoint_available()) {
> +               struct arch_hw_breakpoint null_brk = {0};
> +               int i = 0;
> +
> +               for (; i < nr_wp_slots(); ++i) {
> +                       __get_breakpoint(i, &temp_mm->brk[i]);
> +                       if (temp_mm->brk[i].type != 0)
> +                               __set_breakpoint(i, &null_brk);
> +               }
> +       }
> +}
> +
> +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> +{
> +       lockdep_assert_irqs_disabled();
> +
> +       if (temp_mm->is_kernel_thread)
> +               current->mm = NULL;
> +       else
> +               current->mm = temp_mm->prev;
> +       switch_mm_irqs_off(NULL, temp_mm->prev, current);

... whereas this would increase temp_mm->prev->context.active. As far
as I can tell, that'll mean that both the original mm and the patching
mm will have their .active counts permanently too high after
use_temporary_mm()+unuse_temporary_mm()?
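
Concretely, if I'm reading it right, one cycle starting from mm A does:

  use_temporary_mm():   switch_mm_irqs_off(NULL, temp, ...)
                        -> temp->context.active++ (prev == NULL, so
                           A->context.active is never decremented)
  unuse_temporary_mm(): switch_mm_irqs_off(NULL, A, ...)
                        -> A->context.active++ (again, no decrement)

i.e. one leaked reference on each mm per patch cycle.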

> +       if (ppc_breakpoint_available()) {
> +               int i = 0;
> +
> +               for (; i < nr_wp_slots(); ++i)
> +                       if (temp_mm->brk[i].type != 0)
> +                               __set_breakpoint(i, &temp_mm->brk[i]);
> +       }
> +}


* Re: [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc, x86_64)
  2020-08-27  5:26   ` [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc,x86_64) Christopher M. Riedl
@ 2020-08-27 18:10     ` kernel test robot
  -1 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2020-08-27 18:10 UTC (permalink / raw)
  To: Christopher M. Riedl, linuxppc-dev; +Cc: kbuild-all, kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

Hi "Christopher,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on char-misc/char-misc-testing tip/x86/core v5.9-rc2 next-20200827]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Christopher-M-Riedl/Use-per-CPU-temporary-mappings-for-patching/20200827-161532
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: parisc-allyesconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> hppa-linux-ld: drivers/misc/lkdtm/core.o:(.rodata+0x1b4): undefined reference to `lkdtm_HIJACK_PATCH'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65969 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/6] powerpc: Introduce temporary mm
  2020-08-27 14:15     ` Jann Horn
@ 2020-09-07  0:15       ` Christopher M. Riedl
  -1 siblings, 0 replies; 16+ messages in thread
From: Christopher M. Riedl @ 2020-09-07  0:15 UTC (permalink / raw)
  To: Jann Horn; +Cc: linuxppc-dev, Kernel Hardening

On Thu Aug 27, 2020 at 11:15 AM CDT, Jann Horn wrote:
> On Thu, Aug 27, 2020 at 7:24 AM Christopher M. Riedl <cmr@codefail.de>
> wrote:
> > x86 supports the notion of a temporary mm which restricts access to
> > temporary PTEs to a single CPU. A temporary mm is useful for situations
> > where a CPU needs to perform sensitive operations (such as patching a
> > STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> > said mappings to other CPUs. A side benefit is that other CPU TLBs do
> > not need to be flushed when the temporary mm is torn down.
> >
> > Mappings in the temporary mm can be set in the userspace portion of the
> > address-space.
> [...]
> > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> [...]
> > @@ -44,6 +45,70 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
> >  }
> >
> >  #ifdef CONFIG_STRICT_KERNEL_RWX
> > +
> > +struct temp_mm {
> > +       struct mm_struct *temp;
> > +       struct mm_struct *prev;
> > +       bool is_kernel_thread;
> > +       struct arch_hw_breakpoint brk[HBP_NUM_MAX];
> > +};
> > +
> > +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> > +{
> > +       temp_mm->temp = mm;
> > +       temp_mm->prev = NULL;
> > +       temp_mm->is_kernel_thread = false;
> > +       memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> > +}
> > +
> > +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > +       lockdep_assert_irqs_disabled();
> > +
> > +       temp_mm->is_kernel_thread = current->mm == NULL;
>
> (That's a somewhat misleading variable name - kernel threads can have
> a non-NULL ->mm, too.)
>

Oh, I didn't know that; in that case, yes, this is not a good name. I am
considering some changes (based on your comments about current->mm
below) that would make this variable superfluous.

> > +       if (temp_mm->is_kernel_thread)
> > +               temp_mm->prev = current->active_mm;
> > +       else
> > +               temp_mm->prev = current->mm;
>
> Why the branch? Shouldn't current->active_mm work in both cases?
>
>

Yes, you are correct.

> > +       /*
> > +        * Hash requires a non-NULL current->mm to allocate a userspace address
> > +        * when handling a page fault. Does not appear to hurt in Radix either.
> > +        */
> > +       current->mm = temp_mm->temp;
>
> This looks dangerous to me. There are various places that attempt to
> find all userspace tasks that use a given mm by iterating through all
> tasks on the system and comparing each task's ->mm pointer to
> current's. Things like current_is_single_threaded() as part of various
> security checks, mm_update_next_owner(), zap_threads(), and so on. So
> if this is reachable from userspace task context (which I think it
> is?), I don't think we're allowed to switch out the ->mm pointer here.
>
>

Thanks for pointing this out! I took a step back and looked at this
again in more detail. The only reason for reassigning the ->mm pointer
is that, when patching, we need to hash the page and allocate an SLB
entry with the hash MMU. That codepath includes a check to ensure that
->mm is not NULL. Overwriting ->mm temporarily and restoring it is
pretty crappy in retrospect. I _think_ a better approach is to call
the hashing and SLB-allocation functions from `map_patch` directly; this
both removes the need to overwrite ->mm (since those functions take an
mm parameter) and avoids taking two exceptions when doing the actual
patching.

This works fine on Power9 and Power8 at least, but it needs some testing
on PPC32 before I can send a v4.
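
Roughly, the idea is something like this untested sketch (patching_mm and
patching_addr stand in for the objects set up in poking_init(),
slb_allocate_user() would have to be made non-static, and the exact
access/trap/flags arguments still need to be double-checked):

	/* In map_patch(), after inserting the PTE into patching_mm: */
	if (!radix_enabled()) {
		int err;

		/* Pre-fault the SLB entry and hash the page for the
		 * patching address so the later store to the mapping
		 * does not fault and hit the NULL ->mm check. */
		err = slb_allocate_user(patching_mm, patching_addr);
		if (!err)
			err = hash_page_mm(patching_mm, patching_addr,
					   _PAGE_PRESENT | _PAGE_READ |
					   _PAGE_WRITE, 0x300, 0);
		if (err)
			return -1;
	}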

> > +       switch_mm_irqs_off(NULL, temp_mm->temp, current);
>
> switch_mm_irqs_off() calls switch_mmu_context(), which in the nohash
> implementation increments next->context.active and decrements
> prev->context.active if prev is non-NULL, right? So this would
> increase temp_mm->temp->context.active...
>
> > +       if (ppc_breakpoint_available()) {
> > +               struct arch_hw_breakpoint null_brk = {0};
> > +               int i = 0;
> > +
> > +               for (; i < nr_wp_slots(); ++i) {
> > +                       __get_breakpoint(i, &temp_mm->brk[i]);
> > +                       if (temp_mm->brk[i].type != 0)
> > +                               __set_breakpoint(i, &null_brk);
> > +               }
> > +       }
> > +}
> > +
> > +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> > +{
> > +       lockdep_assert_irqs_disabled();
> > +
> > +       if (temp_mm->is_kernel_thread)
> > +               current->mm = NULL;
> > +       else
> > +               current->mm = temp_mm->prev;
> > +       switch_mm_irqs_off(NULL, temp_mm->prev, current);
>
> ... whereas this would increase temp_mm->prev->context.active. As far
> as I can tell, that'll mean that both the original mm and the patching
> mm will have their .active counts permanently too high after
> use_temporary_mm()+unuse_temporary_mm()?
>

Yes, you are correct. Hmm, I can't immediately recall why prev=NULL here,
and I can't find anything in the various powerpc
switch_mm_irqs_off()/switch_mmu_context() implementations that would break
if we pass the actual previous mm as prev. I will fix this for v4. Thanks!
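
i.e. something like this (untested):

	/* use_temporary_mm() */
	switch_mm_irqs_off(temp_mm->prev, temp_mm->temp, current);

	/* ... patch ... */

	/* unuse_temporary_mm() */
	switch_mm_irqs_off(temp_mm->temp, temp_mm->prev, current);

That way, on nohash the context.active increment for the temporary mm on
entry is paired with a decrement on exit, and the previous mm's count ends
up back where it started.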

> > +       if (ppc_breakpoint_available()) {
> > +               int i = 0;
> > +
> > +               for (; i < nr_wp_slots(); ++i)
> > +                       if (temp_mm->brk[i].type != 0)
> > +                               __set_breakpoint(i, &temp_mm->brk[i]);
> > +       }
> > +}


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-09-07  7:01 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-27  5:26 [PATCH v3 0/6] Use per-CPU temporary mappings for patching Christopher M. Riedl
2020-08-27  5:26 ` [PATCH v3 1/6] powerpc: Add LKDTM accessor for patching addr Christopher M. Riedl
2020-08-27  5:26 ` [PATCH v3 2/6] x86: " Christopher M. Riedl
2020-08-27  5:26 ` [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc, x86_64) Christopher M. Riedl
2020-08-27  5:26   ` [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc,x86_64) Christopher M. Riedl
2020-08-27 10:11   ` [PATCH v3 3/6] Add LKDTM test to hijack a patch mapping (powerpc, x86_64) kernel test robot
2020-08-27 10:11     ` kernel test robot
2020-08-27 18:10   ` kernel test robot
2020-08-27 18:10     ` kernel test robot
2020-08-27  5:26 ` [PATCH v3 4/6] powerpc: Introduce temporary mm Christopher M. Riedl
2020-08-27 14:15   ` Jann Horn
2020-08-27 14:15     ` Jann Horn
2020-09-07  0:15     ` Christopher M. Riedl
2020-09-07  0:15       ` Christopher M. Riedl
2020-08-27  5:26 ` [PATCH v3 5/6] powerpc: Initialize a temporary mm for code patching Christopher M. Riedl
2020-08-27  5:26 ` [PATCH v3 6/6] powerpc: Use " Christopher M. Riedl
