* [kernel-hardening] [RFC] Introduce rare_write() infrastructure
@ 2017-02-27 20:42 Kees Cook
  2017-02-27 20:42 ` [kernel-hardening] [RFC][PATCH 1/8] " Kees Cook
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:42 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

This is an RFC series to demonstrate a possible infrastructure for the
"write rarely" memory storage type in the kernel (patch 1). The intent
is to further reduce the internal attack surface of the kernel by making
more variables read-only while "at rest". This is strongly based on the
"__read_only" portion of the KERNEXEC infrastructure from PaX/grsecurity,
though I tried to adjust it to be more in line with ideas Mark Rutland had
about how it might work upstream.

Also included is the PaX/grsecurity constify plugin (patch 7) which will
automatically make all instances of certain structures read-only, to help
demonstrate more complex cases of "write rarely" targets. (The plugin in
this series is altered to only operate on marked structures, rather than
the full automatic constification.)

As part of the series I've included both x86 support (patch 4), exactly
as done in PaX, and ARM support (patch 5), similar to what is done in
grsecurity but without support for earlier ARM CPUs. Both are lightly
tested by me, but have lived through 0-day build testing over the weekend.

I've added an lkdtm test (patch 2), though it needs to be reorganized
since its failure case is inverted from what would normally be expected
for lkdtm. It does, however, serve as a stand-alone example of the new
infrastructure.

Included are two example "conversions" to the rare_write()-style of
variable manipulation: a simple one, which switches the inet diag handler
table to write-rarely during register/unregister calls (patch 3), and
a more complex one: cgroup types (patch 8), which is made read-only via
the constify plugin. The latter uses rare-write linked lists (patch 6)
and multi-field updates. Both examples are refactorings of what already
appears in PaX/grsecurity.

It may make sense to also return to PaX's original interface (using
assignments instead of a function-like macro), to avoid false positives
from coccinelle[1], and to allow for assignment operators instead of
longer-form assignments ("rare_write(thing->field, thing->field | FLAG)"
is ugly compared to "const_cast(thing->field) |= FLAG").

The patches are:

	[PATCH 1/8] Introduce rare_write() infrastructure
	[PATCH 2/8] lkdtm: add test for rare_write() infrastructure
	[PATCH 3/8] net: switch sock_diag handlers to rare_write()
	[PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
	[PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()
	[PATCH 6/8] list: add rare_write() list helpers
	[PATCH 7/8] gcc-plugins: Add constify plugin
	[PATCH 8/8] cgroups: force all struct cftype const

Let's hammer out the issues...

-Kees

[1] https://lists.01.org/pipermail/kbuild-all/2017-February/031316.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
@ 2017-02-27 20:42 ` Kees Cook
  2017-02-28  8:22   ` [kernel-hardening] " Hoeun Ryu
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 2/8] lkdtm: add test for " Kees Cook
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:42 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

Several types of data storage exist in the kernel: read-write data (.data,
.bss), read-only data (.rodata), and RO-after-init. This introduces the
infrastructure for another type: write-rarely, which is intended for data
that is either only rarely modified or especially security-sensitive. The
intent is to further reduce the internal attack surface of the kernel by
making this storage read-only when "at rest". This makes it much harder
for attackers with a kernel-write flaw to subvert, since they cannot
directly change the memory contents.

Variables declared __wr_rare will be made read-only when an architecture
supports HAVE_ARCH_RARE_WRITE. To change these variables, either the
rare_write() macro can be used, or multiple __rare_write() calls can be
wrapped in rare_write_enable()/rare_write_disable() pairs. These macros
are handled by the arch-specific functions that perform the actions needed
to write to otherwise read-only memory.

The arch-specific helpers must not allow non-current CPUs to write the
memory area, must run non-preemptibly to avoid accidentally leaving
memory writable, and must be defined inline to avoid making them
desirable ROP targets for attackers.
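
As a rough usage sketch (the variable names below are made up purely for
illustration and are not part of this series):

	/* Placed in read-only memory when the arch supports rare writes. */
	static unsigned long max_widgets __wr_rare = 16;
	static bool widgets_locked __wr_rare;

	/* Single update: rare_write() opens and closes the window itself. */
	rare_write(max_widgets, 32);

	/* Grouped updates: one enable/disable pair around several writes. */
	rare_write_enable();
	__rare_write(max_widgets, 64);
	__rare_write(widgets_locked, true);
	rare_write_disable();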

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig             | 15 +++++++++++++++
 include/linux/compiler.h | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/preempt.h  |  6 ++++--
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 99839c23d453..2446de19f66d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -781,4 +781,19 @@ config VMAP_STACK
 	  the stack to map directly to the KASAN shadow map using a formula
 	  that is incorrect if the stack is in vmalloc space.
 
+config HAVE_ARCH_RARE_WRITE
+	def_bool n
+	help
+	  An arch should select this option if it has defined the functions
+	  __arch_rare_write_map() and __arch_rare_write_unmap() to
+	  respectively enable and disable writing to read-only memory. The
+	  routines must meet the following requirements:
+	  - read-only memory writing must only be available on the current
+	    CPU (to make sure other CPUs can't race to make changes too).
+	  - the routines must be declared inline (to discourage ROP use).
+	  - the routines must not be preemptible (likely they will call
+	    preempt_disable() and preempt_enable_no_resched() respectively).
+	  - the routines must validate expected state (e.g. when enabling
+	    writes, BUG() if writes are already enabled).
+
 source "kernel/gcov/Kconfig"
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index cf0fa5d86059..f95603a8ee72 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -325,6 +325,44 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 	__u.__val;					\
 })
 
+/*
+ * Build "write rarely" infrastructure for flipping memory r/w
+ * on a per-CPU basis.
+ */
+#ifndef CONFIG_HAVE_ARCH_RARE_WRITE
+# define __wr_rare
+# define __wr_rare_type
+# define __rare_write_type(v)	typeof(v)
+# define __rare_write_ptr(v)	(&(v))
+# define __rare_write(__var, __val)	({		\
+	__var = __val;					\
+	__var;						\
+})
+# define rare_write_enable()	do { } while (0)
+# define rare_write_disable()	do { } while (0)
+#else
+# define __wr_rare		__ro_after_init
+# define __wr_rare_type		const
+# define __rare_write_type(v)	typeof((typeof(v))0)
+# define __rare_write_ptr(v)	((__rare_write_type(v) *)&(v))
+# define __rare_write(__var, __val) ({			\
+	__rare_write_type(__var) *__rw_var;		\
+							\
+	__rw_var = __rare_write_ptr(__var);		\
+	*__rw_var = (__val);				\
+	__var;						\
+})
+# define rare_write_enable()	__arch_rare_write_map()
+# define rare_write_disable()	__arch_rare_write_unmap()
+#endif
+
+#define rare_write(__var, __val) ({			\
+	rare_write_enable();				\
+	__rare_write(__var, __val);			\
+	rare_write_disable();				\
+	__var;						\
+})
+
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 7eeceac52dea..183c1d7a8594 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -237,10 +237,12 @@ do { \
 /*
  * Modules have no business playing preemption tricks.
  */
-#undef sched_preempt_enable_no_resched
-#undef preempt_enable_no_resched
 #undef preempt_enable_no_resched_notrace
 #undef preempt_check_resched
+#ifndef CONFIG_HAVE_ARCH_RARE_WRITE
+#undef sched_preempt_enable_no_resched
+#undef preempt_enable_no_resched
+#endif
 #endif
 
 #define preempt_set_need_resched() \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 2/8] lkdtm: add test for rare_write() infrastructure
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
  2017-02-27 20:42 ` [kernel-hardening] [RFC][PATCH 1/8] " Kees Cook
@ 2017-02-27 20:43 ` Kees Cook
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 3/8] net: switch sock_diag handlers to rare_write() Kees Cook
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:43 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

This adds the WRITE_RARE_WRITE test to validate variables marked with
__wr_rare. This isn't the final form of the test, since right now the
result is inverted from what is normally expected from LKDTM: it should
BUG on success...

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/misc/lkdtm.h       |  1 +
 drivers/misc/lkdtm_core.c  |  1 +
 drivers/misc/lkdtm_perms.c | 21 ++++++++++++++++++++-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h
index cfa1039c62e7..42b5bb1f0062 100644
--- a/drivers/misc/lkdtm.h
+++ b/drivers/misc/lkdtm.h
@@ -35,6 +35,7 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void __init lkdtm_perms_init(void);
 void lkdtm_WRITE_RO(void);
 void lkdtm_WRITE_RO_AFTER_INIT(void);
+void lkdtm_WRITE_RARE_WRITE(void);
 void lkdtm_WRITE_KERN(void);
 void lkdtm_EXEC_DATA(void);
 void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c
index 7eeb71a75549..cc5a0186d80b 100644
--- a/drivers/misc/lkdtm_core.c
+++ b/drivers/misc/lkdtm_core.c
@@ -219,6 +219,7 @@ struct crashtype crashtypes[] = {
 	CRASHTYPE(ACCESS_USERSPACE),
 	CRASHTYPE(WRITE_RO),
 	CRASHTYPE(WRITE_RO_AFTER_INIT),
+	CRASHTYPE(WRITE_RARE_WRITE),
 	CRASHTYPE(WRITE_KERN),
 	CRASHTYPE(ATOMIC_UNDERFLOW),
 	CRASHTYPE(ATOMIC_OVERFLOW),
diff --git a/drivers/misc/lkdtm_perms.c b/drivers/misc/lkdtm_perms.c
index c7635a79341f..70559c76592e 100644
--- a/drivers/misc/lkdtm_perms.c
+++ b/drivers/misc/lkdtm_perms.c
@@ -20,12 +20,15 @@
 /* This is non-const, so it will end up in the .data section. */
 static u8 data_area[EXEC_SIZE];
 
-/* This is cost, so it will end up in the .rodata section. */
+/* This is const, so it will end up in the .rodata section. */
 static const unsigned long rodata = 0xAA55AA55;
 
 /* This is marked __ro_after_init, so it should ultimately be .rodata. */
 static unsigned long ro_after_init __ro_after_init = 0x55AA5500;
 
+/* This is marked __wr_rare, so it should ultimately be .rodata. */
+static unsigned long wr_rare __wr_rare = 0xAA66AA66;
+
 /*
  * This just returns to the caller. It is designed to be copied into
  * non-executable memory regions.
@@ -103,6 +106,22 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
 	*ptr ^= 0xabcd1234;
 }
 
+void lkdtm_WRITE_RARE_WRITE(void)
+{
+	/* Explicitly cast away "const" for the test. */
+	unsigned long *ptr = (unsigned long *)&wr_rare;
+
+#ifdef CONFIG_HAVE_ARCH_RARE_WRITE
+	pr_info("attempting good rare write at %p\n", ptr);
+	rare_write(*ptr, 0x11335577);
+	if (wr_rare != 0x11335577)
+		pr_warn("Yikes: wr_rare did not actually change!\n");
+#endif
+
+	pr_info("attempting bad rare write at %p\n", ptr);
+	*ptr ^= 0xbcd12345;
+}
+
 void lkdtm_WRITE_KERN(void)
 {
 	size_t size;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 3/8] net: switch sock_diag handlers to rare_write()
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
  2017-02-27 20:42 ` [kernel-hardening] [RFC][PATCH 1/8] " Kees Cook
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 2/8] lkdtm: add test for " Kees Cook
@ 2017-02-27 20:43 ` Kees Cook
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap() Kees Cook
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:43 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

This is a simple example of a register/unregister case for __wr_rare
markings, which only needs a single rare_write() call to make updates.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/core/sock_diag.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 6b10573cc9fa..67253026106f 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -14,7 +14,7 @@
 #include <linux/inet_diag.h>
 #include <linux/sock_diag.h>
 
-static const struct sock_diag_handler *sock_diag_handlers[AF_MAX];
+static const struct sock_diag_handler *sock_diag_handlers[AF_MAX] __wr_rare;
 static int (*inet_rcv_compat)(struct sk_buff *skb, struct nlmsghdr *nlh);
 static DEFINE_MUTEX(sock_diag_table_mutex);
 static struct workqueue_struct *broadcast_wq;
@@ -194,7 +194,7 @@ int sock_diag_register(const struct sock_diag_handler *hndl)
 	if (sock_diag_handlers[hndl->family])
 		err = -EBUSY;
 	else
-		sock_diag_handlers[hndl->family] = hndl;
+		rare_write(sock_diag_handlers[hndl->family], hndl);
 	mutex_unlock(&sock_diag_table_mutex);
 
 	return err;
@@ -210,7 +210,7 @@ void sock_diag_unregister(const struct sock_diag_handler *hnld)
 
 	mutex_lock(&sock_diag_table_mutex);
 	BUG_ON(sock_diag_handlers[family] != hnld);
-	sock_diag_handlers[family] = NULL;
+	rare_write(sock_diag_handlers[family], NULL);
 	mutex_unlock(&sock_diag_table_mutex);
 }
 EXPORT_SYMBOL_GPL(sock_diag_unregister);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
                   ` (2 preceding siblings ...)
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 3/8] net: switch sock_diag handlers to rare_write() Kees Cook
@ 2017-02-27 20:43 ` Kees Cook
  2017-02-28 19:33   ` [kernel-hardening] " Andy Lutomirski
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 5/8] ARM: " Kees Cook
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:43 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

Based on PaX's x86 pax_{open,close}_kernel() implementation, this
allows HAVE_ARCH_RARE_WRITE to work on x86.

There is still missing work to sort out some header file issues where
preempt.h is not available, though it can't be included in pgtable.h
unconditionally... some other solution will be needed, perhaps an entirely
separate header file for the rare_write()-related defines...

This patch is also missing paravirt support.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig               |  1 +
 arch/x86/include/asm/pgtable.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e487493bbd47..6d4d6f73d4b8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
+	select HAVE_ARCH_RARE_WRITE
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 437feb436efa..86e78e97738f 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -90,6 +90,37 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
 
 #endif	/* CONFIG_PARAVIRT */
 
+/* TODO: Bad hack to deal with preempt macros being missing sometimes. */
+#ifndef preempt_disable
+#include <linux/preempt.h>
+#endif
+
+static inline unsigned long __arch_rare_write_map(void)
+{
+	unsigned long cr0;
+
+	preempt_disable();
+	barrier();
+	cr0 = read_cr0() ^ X86_CR0_WP;
+	BUG_ON(cr0 & X86_CR0_WP);
+	write_cr0(cr0);
+	barrier();
+	return cr0 ^ X86_CR0_WP;
+}
+
+static inline unsigned long __arch_rare_write_unmap(void)
+{
+	unsigned long cr0;
+
+	barrier();
+	cr0 = read_cr0() ^ X86_CR0_WP;
+	BUG_ON(!(cr0 & X86_CR0_WP));
+	write_cr0(cr0);
+	barrier();
+	preempt_enable_no_resched();
+	return cr0 ^ X86_CR0_WP;
+}
+
 /*
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
                   ` (3 preceding siblings ...)
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap() Kees Cook
@ 2017-02-27 20:43 ` Kees Cook
  2017-03-01  1:04   ` [kernel-hardening] " Russell King - ARM Linux
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 6/8] list: add rare_write() list helpers Kees Cook
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:43 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

Based on grsecurity's ARM pax_{open,close}_kernel() implementation, this
allows HAVE_ARCH_RARE_WRITE to work on ARM.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm/Kconfig               |  1 +
 arch/arm/include/asm/domain.h  |  3 ++-
 arch/arm/include/asm/pgtable.h | 27 +++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 186c4c214e0a..c3e1369e7429 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -39,6 +39,7 @@ config ARM
 	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
+	select HAVE_ARCH_RARE_WRITE if MMU && !ARM_LPAE && !CPU_USE_DOMAINS
 	select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARM_SMCCC if CPU_V7
diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index 99d9f630d6b6..6c72f533382c 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -53,6 +53,7 @@
 #define DOMAIN_MANAGER	3
 #else
 #define DOMAIN_MANAGER	1
+#define DOMAIN_RARE_WRITE 3
 #endif
 
 #define domain_mask(dom)	((3) << (2 * (dom)))
@@ -115,7 +116,7 @@ static inline void set_domain(unsigned val)
 }
 #endif
 
-#ifdef CONFIG_CPU_USE_DOMAINS
+#if defined(CONFIG_CPU_USE_DOMAINS) || defined(CONFIG_HAVE_ARCH_RARE_WRITE)
 #define modify_domain(dom,type)					\
 	do {							\
 		unsigned int domain = get_domain();		\
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index a8d656d9aec7..c9492f3e9581 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -56,6 +56,33 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 #define pmd_ERROR(pmd)		__pmd_error(__FILE__, __LINE__, pmd)
 #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd)
 
+#ifdef CONFIG_HAVE_ARCH_RARE_WRITE
+#include <asm/domain.h>
+#include <linux/preempt.h>
+
+static inline int test_domain(int domain, int domaintype)
+{
+	return (get_domain() & domain_val(domain, 3)) ==
+		domain_val(domain, domaintype);
+}
+
+static inline unsigned long __arch_rare_write_map(void)
+{
+	preempt_disable();
+	BUG_ON(test_domain(DOMAIN_KERNEL, DOMAIN_RARE_WRITE));
+	modify_domain(DOMAIN_KERNEL, DOMAIN_RARE_WRITE);
+	return 0;
+}
+
+static inline unsigned long __arch_rare_write_unmap(void)
+{
+	BUG_ON(test_domain(DOMAIN_KERNEL, DOMAIN_MANAGER));
+	modify_domain(DOMAIN_KERNEL, DOMAIN_MANAGER);
+	preempt_enable_no_resched();
+	return 0;
+}
+#endif
+
 /*
  * This is the lowest virtual address we can permit any user space
  * mapping to be mapped at.  This is particularly important for
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 6/8] list: add rare_write() list helpers
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
                   ` (4 preceding siblings ...)
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 5/8] ARM: " Kees Cook
@ 2017-02-27 20:43 ` Kees Cook
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 7/8] gcc-plugins: Add constify plugin Kees Cook
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 8/8] cgroups: force all struct cftype const Kees Cook
  7 siblings, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:43 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

Some structures that are intended to be write-rarely are designed to be
linked together on lists. As a result, rare_write()-aware linked list
primitives are needed.

As found in PaX, this adds list management helpers for doing updates to
rarely-changed lists.
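
As a rough usage sketch (the structure and list below are made up purely
for illustration; the helpers open and close the write window
internally):

	struct foo {
		struct list_head node;
		int value;
	};

	static LIST_HEAD(foo_list);		/* ordinary, writable list head */
	static struct foo foo_item __wr_rare = {
		.value = 42,			/* instance is read-only "at rest" */
	};

	static void foo_register(void)
	{
		rare_list_add_tail(&foo_item.node, &foo_list);
	}

	static void foo_unregister(void)
	{
		rare_list_del(&foo_item.node);
	}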

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/list.h | 17 +++++++++++++++++
 lib/Makefile         |  2 +-
 lib/list_debug.c     | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index d1039ecaf94f..548b95546793 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -126,6 +126,23 @@ static inline void list_del(struct list_head *entry)
 	entry->prev = LIST_POISON2;
 }
 
+extern void __rare_list_add(struct list_head *new,
+			    struct list_head *prev,
+			    struct list_head *next);
+
+static inline void
+rare_list_add(__wr_rare_type struct list_head *new, struct list_head *head)
+{
+	__rare_list_add((struct list_head *)new, head, head->next);
+}
+static inline void
+rare_list_add_tail(__wr_rare_type struct list_head *new, struct list_head *head)
+{
+	__rare_list_add((struct list_head *)new, head->prev, head);
+}
+
+extern void rare_list_del(__wr_rare_type struct list_head *entry);
+
 /**
  * list_replace - replace old entry by new one
  * @old : the element to be replaced
diff --git a/lib/Makefile b/lib/Makefile
index bc4073a8cd08..edee0bcc660a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -76,7 +76,7 @@ obj-$(CONFIG_BTREE) += btree.o
 obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
 obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
-obj-$(CONFIG_DEBUG_LIST) += list_debug.o
+obj-y += list_debug.o
 obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
 
 ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 7f7bfa55eb6d..1ff3c5bb926a 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -10,7 +10,9 @@
 #include <linux/bug.h>
 #include <linux/kernel.h>
 #include <linux/rculist.h>
+#include <linux/mm.h>
 
+#ifdef CONFIG_DEBUG_LIST
 /*
  * Check that the data structures for the list manipulations are reasonably
  * valid. Failures here indicate memory corruption (and possibly an exploit
@@ -57,3 +59,35 @@ bool __list_del_entry_valid(struct list_head *entry)
 
 }
 EXPORT_SYMBOL(__list_del_entry_valid);
+
+#endif /* CONFIG_DEBUG_LIST */
+
+void __rare_list_add(struct list_head *new, struct list_head *prev,
+		     struct list_head *next)
+{
+	if (!__list_add_valid(new, prev, next))
+		return;
+
+	rare_write_enable();
+	next->prev = new;
+	new->next = next;
+	new->prev = prev;
+	prev->next = new;
+	rare_write_disable();
+}
+EXPORT_SYMBOL(__rare_list_add);
+
+void rare_list_del(__wr_rare_type struct list_head *entry_const)
+{
+	struct list_head *entry = (struct list_head *)entry_const;
+
+	if (!__list_del_entry_valid(entry))
+		return;
+
+	rare_write_enable();
+	__list_del(entry->prev, entry->next);
+	entry->next = LIST_POISON1;
+	entry->prev = LIST_POISON2;
+	rare_write_disable();
+}
+EXPORT_SYMBOL(rare_list_del);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 7/8] gcc-plugins: Add constify plugin
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
                   ` (5 preceding siblings ...)
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 6/8] list: add rare_write() list helpers Kees Cook
@ 2017-02-27 20:43 ` Kees Cook
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 8/8] cgroups: force all struct cftype const Kees Cook
  7 siblings, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:43 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

This is a port of the PaX/grsecurity constify plugin. However, it has
the automatic function-pointer struct detection temporarily disabled. As
a result, this will only recognize the __do_const annotation, which
makes all instances of a marked structure read-only. The rare_write()
infrastructure can be used to make changes to such variables.
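
As a rough illustration (the structure and functions below are made up,
not taken from this series):

	/* With the plugin enabled, every instance becomes read-only. */
	struct foo_ops {
		int (*setup)(void);
		void (*teardown)(void);
	} __do_const;

	static struct foo_ops default_foo_ops = {
		.setup		= foo_setup,
		.teardown	= foo_teardown,
	};

	/* Any later change must go through the rare_write() helpers: */
	rare_write(default_foo_ops.setup, foo_setup_v2);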

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig                          |  13 +
 include/linux/compiler-gcc.h          |   5 +
 include/linux/compiler.h              |   8 +
 scripts/Makefile.gcc-plugins          |   2 +
 scripts/gcc-plugins/constify_plugin.c | 585 ++++++++++++++++++++++++++++++++++
 5 files changed, 613 insertions(+)
 create mode 100644 scripts/gcc-plugins/constify_plugin.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 2446de19f66d..a40e38740d8a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -410,6 +410,19 @@ config GCC_PLUGIN_LATENT_ENTROPY
 	   * https://grsecurity.net/
 	   * https://pax.grsecurity.net/
 
+config GCC_PLUGIN_CONSTIFY
+	bool "Make function pointer structures and marked variables read-only"
+	depends on GCC_PLUGINS && HAVE_ARCH_RARE_WRITE && !UML
+	help
+	  By saying Y here the compiler will automatically constify a class
+	  of types that contain only function pointers, as well as any
+	  manually annotated structures. This reduces the kernel's attack
+	  surface and also produces a better memory layout.
+
+	  This plugin was ported from grsecurity/PaX. More information at:
+	   * https://grsecurity.net/
+	   * https://pax.grsecurity.net/
+
 config HAVE_CC_STACKPROTECTOR
 	bool
 	help
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 0444b1336268..edeb82a2c043 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -189,6 +189,11 @@
 
 #if GCC_VERSION >= 40500
 
+#ifdef CONSTIFY_PLUGIN
+#define __no_const __attribute__((no_const))
+#define __do_const __attribute__((do_const))
+#endif
+
 #ifndef __CHECKER__
 #ifdef LATENT_ENTROPY_PLUGIN
 #define __latent_entropy __attribute__((latent_entropy))
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index f95603a8ee72..1510844ed6a7 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -471,6 +471,14 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 # define __latent_entropy
 #endif
 
+#ifndef __do_const
+# define __do_const
+#endif
+
+#ifndef __no_const
+# define __no_const
+#endif
+
 /*
  * Tell gcc if a function is cold. The compiler will assume any path
  * directly leading to the call is unlikely.
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index 060d2cb373db..4a1d0a4c6563 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -5,6 +5,8 @@ ifdef CONFIG_GCC_PLUGINS
   SANCOV_PLUGIN := -fplugin=$(objtree)/scripts/gcc-plugins/sancov_plugin.so
 
   gcc-plugin-$(CONFIG_GCC_PLUGIN_CYC_COMPLEXITY)	+= cyc_complexity_plugin.so
+  gcc-plugin-$(CONFIG_GCC_PLUGIN_CONSTIFY)	+= constify_plugin.so
+  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_CONSTIFY)	+= -DCONSTIFY_PLUGIN
 
   gcc-plugin-$(CONFIG_GCC_PLUGIN_LATENT_ENTROPY)	+= latent_entropy_plugin.so
   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_LATENT_ENTROPY)	+= -DLATENT_ENTROPY_PLUGIN
diff --git a/scripts/gcc-plugins/constify_plugin.c b/scripts/gcc-plugins/constify_plugin.c
new file mode 100644
index 000000000000..e7d6f3140e87
--- /dev/null
+++ b/scripts/gcc-plugins/constify_plugin.c
@@ -0,0 +1,585 @@
+/*
+ * Copyright 2011 by Emese Revfy <re.emese@gmail.com>
+ * Copyright 2011-2016 by PaX Team <pageexec@freemail.hu>
+ * Licensed under the GPL v2, or (at your option) v3
+ *
+ * This gcc plugin constifies all structures which contain only function pointers or are explicitly marked for constification.
+ *
+ * Homepage:
+ * http://www.grsecurity.net/~ephox/const_plugin/
+ *
+ * Usage:
+ * $ gcc -I`gcc -print-file-name=plugin`/include -fPIC -shared -O2 -o constify_plugin.so constify_plugin.c
+ * $ gcc -fplugin=constify_plugin.so test.c -O2
+ */
+
+#include "gcc-common.h"
+
+// unused C type flag in all versions 4.5-6
+#define TYPE_CONSTIFY_VISITED(TYPE) TYPE_LANG_FLAG_4(TYPE)
+
+__visible int plugin_is_GPL_compatible;
+
+static bool enabled = true;
+
+static struct plugin_info const_plugin_info = {
+	.version	= "201607241840vanilla",
+	.help		= "disable\tturn off constification\n",
+};
+
+static struct {
+	const char *name;
+	const char *asm_op;
+} const_sections[] = {
+	{".init.rodata",     "\t.section\t.init.rodata,\"a\""},
+	{".ref.rodata",      "\t.section\t.ref.rodata,\"a\""},
+	{".devinit.rodata",  "\t.section\t.devinit.rodata,\"a\""},
+	{".devexit.rodata",  "\t.section\t.devexit.rodata,\"a\""},
+	{".cpuinit.rodata",  "\t.section\t.cpuinit.rodata,\"a\""},
+	{".cpuexit.rodata",  "\t.section\t.cpuexit.rodata,\"a\""},
+	{".meminit.rodata",  "\t.section\t.meminit.rodata,\"a\""},
+	{".memexit.rodata",  "\t.section\t.memexit.rodata,\"a\""},
+	{".data..read_only", "\t.section\t.data..read_only,\"a\""},
+};
+
+typedef struct {
+	bool has_fptr_field;
+	bool has_writable_field;
+	bool has_do_const_field;
+	bool has_no_const_field;
+} constify_info;
+
+static const_tree get_field_type(const_tree field)
+{
+	return strip_array_types(TREE_TYPE(field));
+}
+
+static bool is_fptr(const_tree field)
+{
+	/* XXX: disable automatic constification. */
+	return false;
+
+	const_tree ptr = get_field_type(field);
+
+	if (TREE_CODE(ptr) != POINTER_TYPE)
+		return false;
+
+	return TREE_CODE(TREE_TYPE(ptr)) == FUNCTION_TYPE;
+}
+
+/*
+ * determine whether the given structure type meets the requirements for automatic constification,
+ * including the constification attributes on nested structure types
+ */
+static void constifiable(const_tree node, constify_info *cinfo)
+{
+	const_tree field;
+
+	gcc_assert(TREE_CODE(node) == RECORD_TYPE || TREE_CODE(node) == UNION_TYPE);
+
+	// e.g., pointer to structure fields while still constructing the structure type
+	if (TYPE_FIELDS(node) == NULL_TREE)
+		return;
+
+	for (field = TYPE_FIELDS(node); field; field = TREE_CHAIN(field)) {
+		const_tree type = get_field_type(field);
+		enum tree_code code = TREE_CODE(type);
+
+		if (node == type)
+			continue;
+
+		if (is_fptr(field))
+			cinfo->has_fptr_field = true;
+		else if (code == RECORD_TYPE || code == UNION_TYPE) {
+			if (lookup_attribute("do_const", TYPE_ATTRIBUTES(type)))
+				cinfo->has_do_const_field = true;
+			else if (lookup_attribute("no_const", TYPE_ATTRIBUTES(type)))
+				cinfo->has_no_const_field = true;
+			else
+				constifiable(type, cinfo);
+		} else if (!TREE_READONLY(field))
+			cinfo->has_writable_field = true;
+	}
+}
+
+static bool constified(const_tree node)
+{
+	constify_info cinfo = {
+		.has_fptr_field = false,
+		.has_writable_field = false,
+		.has_do_const_field = false,
+		.has_no_const_field = false
+	};
+
+	gcc_assert(TREE_CODE(node) == RECORD_TYPE || TREE_CODE(node) == UNION_TYPE);
+
+	if (lookup_attribute("no_const", TYPE_ATTRIBUTES(node))) {
+//		gcc_assert(!TYPE_READONLY(node));
+		return false;
+	}
+
+	if (lookup_attribute("do_const", TYPE_ATTRIBUTES(node))) {
+		gcc_assert(TYPE_READONLY(node));
+		return true;
+	}
+
+	constifiable(node, &cinfo);
+	if ((!cinfo.has_fptr_field || cinfo.has_writable_field || cinfo.has_no_const_field) && !cinfo.has_do_const_field)
+		return false;
+
+	return TYPE_READONLY(node);
+}
+
+static void deconstify_tree(tree node);
+
+static void deconstify_type(tree type)
+{
+	tree field;
+
+	gcc_assert(TREE_CODE(type) == RECORD_TYPE || TREE_CODE(type) == UNION_TYPE);
+
+	for (field = TYPE_FIELDS(type); field; field = TREE_CHAIN(field)) {
+		const_tree fieldtype = get_field_type(field);
+
+		// special case handling of simple ptr-to-same-array-type members
+		if (TREE_CODE(TREE_TYPE(field)) == POINTER_TYPE) {
+			tree ptrtype = TREE_TYPE(TREE_TYPE(field));
+
+			if (TREE_TYPE(TREE_TYPE(field)) == type)
+				continue;
+			if (TREE_CODE(ptrtype) != RECORD_TYPE && TREE_CODE(ptrtype) != UNION_TYPE)
+				continue;
+			if (!constified(ptrtype))
+				continue;
+			if (TYPE_MAIN_VARIANT(ptrtype) == TYPE_MAIN_VARIANT(type))
+				TREE_TYPE(field) = build_pointer_type(build_qualified_type(type, TYPE_QUALS(ptrtype) & ~TYPE_QUAL_CONST));
+			continue;
+		}
+		if (TREE_CODE(fieldtype) != RECORD_TYPE && TREE_CODE(fieldtype) != UNION_TYPE)
+			continue;
+		if (!constified(fieldtype))
+			continue;
+
+		deconstify_tree(field);
+		TREE_READONLY(field) = 0;
+	}
+	TYPE_READONLY(type) = 0;
+	C_TYPE_FIELDS_READONLY(type) = 0;
+	if (lookup_attribute("do_const", TYPE_ATTRIBUTES(type))) {
+		TYPE_ATTRIBUTES(type) = copy_list(TYPE_ATTRIBUTES(type));
+		TYPE_ATTRIBUTES(type) = remove_attribute("do_const", TYPE_ATTRIBUTES(type));
+	}
+}
+
+static void deconstify_tree(tree node)
+{
+	tree old_type, new_type, field;
+
+	old_type = TREE_TYPE(node);
+	while (TREE_CODE(old_type) == ARRAY_TYPE && TREE_CODE(TREE_TYPE(old_type)) != ARRAY_TYPE) {
+		node = TREE_TYPE(node) = copy_node(old_type);
+		old_type = TREE_TYPE(old_type);
+	}
+
+	gcc_assert(TREE_CODE(old_type) == RECORD_TYPE || TREE_CODE(old_type) == UNION_TYPE);
+	gcc_assert(TYPE_READONLY(old_type) && (TYPE_QUALS(old_type) & TYPE_QUAL_CONST));
+
+	new_type = build_qualified_type(old_type, TYPE_QUALS(old_type) & ~TYPE_QUAL_CONST);
+	TYPE_FIELDS(new_type) = copy_list(TYPE_FIELDS(new_type));
+	for (field = TYPE_FIELDS(new_type); field; field = TREE_CHAIN(field))
+		DECL_FIELD_CONTEXT(field) = new_type;
+
+	deconstify_type(new_type);
+
+	TREE_TYPE(node) = new_type;
+}
+
+static tree handle_no_const_attribute(tree *node, tree name, tree args, int flags, bool *no_add_attrs)
+{
+	tree type;
+	constify_info cinfo = {
+		.has_fptr_field = false,
+		.has_writable_field = false,
+		.has_do_const_field = false,
+		.has_no_const_field = false
+	};
+
+	*no_add_attrs = true;
+	if (TREE_CODE(*node) == FUNCTION_DECL) {
+		error("%qE attribute does not apply to functions (%qF)", name, *node);
+		return NULL_TREE;
+	}
+
+	if (TREE_CODE(*node) == PARM_DECL) {
+		error("%qE attribute does not apply to function parameters (%qD)", name, *node);
+		return NULL_TREE;
+	}
+
+	if (TREE_CODE(*node) == VAR_DECL) {
+		error("%qE attribute does not apply to variables (%qD)", name, *node);
+		return NULL_TREE;
+	}
+
+	if (TYPE_P(*node)) {
+		type = *node;
+	} else {
+		if (TREE_CODE(*node) != TYPE_DECL) {
+			error("%qE attribute does not apply to %qD (%qT)", name, *node, TREE_TYPE(*node));
+			return NULL_TREE;
+		}
+		type = TREE_TYPE(*node);
+	}
+
+	if (TREE_CODE(type) != RECORD_TYPE && TREE_CODE(type) != UNION_TYPE) {
+		error("%qE attribute used on %qT applies to struct and union types only", name, type);
+		return NULL_TREE;
+	}
+
+	if (lookup_attribute(IDENTIFIER_POINTER(name), TYPE_ATTRIBUTES(type))) {
+		error("%qE attribute is already applied to the type %qT", name, type);
+		return NULL_TREE;
+	}
+
+	if (TYPE_P(*node)) {
+		if (lookup_attribute("do_const", TYPE_ATTRIBUTES(type)))
+			error("%qE attribute used on type %qT is incompatible with 'do_const'", name, type);
+		else
+			*no_add_attrs = false;
+		return NULL_TREE;
+	}
+
+	constifiable(type, &cinfo);
+	if ((cinfo.has_fptr_field && !cinfo.has_writable_field && !cinfo.has_no_const_field) || lookup_attribute("do_const", TYPE_ATTRIBUTES(type))) {
+		if (enabled) {
+			if TYPE_P(*node)
+				deconstify_type(*node);
+			else
+				deconstify_tree(*node);
+		}
+		if (TYPE_P(*node))
+			TYPE_CONSTIFY_VISITED(*node) = 1;
+		else
+			TYPE_CONSTIFY_VISITED(TREE_TYPE(*node)) = 1;
+		return NULL_TREE;
+	}
+
+	if (enabled && TYPE_FIELDS(type))
+		error("%qE attribute used on type %qT that is not constified", name, type);
+	return NULL_TREE;
+}
+
+static void constify_type(tree type)
+{
+	gcc_assert(type == TYPE_MAIN_VARIANT(type));
+	TYPE_READONLY(type) = 1;
+	C_TYPE_FIELDS_READONLY(type) = 1;
+	TYPE_CONSTIFY_VISITED(type) = 1;
+//	TYPE_ATTRIBUTES(type) = copy_list(TYPE_ATTRIBUTES(type));
+//	TYPE_ATTRIBUTES(type) = tree_cons(get_identifier("do_const"), NULL_TREE, TYPE_ATTRIBUTES(type));
+}
+
+static tree handle_do_const_attribute(tree *node, tree name, tree args, int flags, bool *no_add_attrs)
+{
+	*no_add_attrs = true;
+	if (!TYPE_P(*node)) {
+		error("%qE attribute applies to types only (%qD)", name, *node);
+		return NULL_TREE;
+	}
+
+	if (TREE_CODE(*node) != RECORD_TYPE && TREE_CODE(*node) != UNION_TYPE) {
+		error("%qE attribute used on %qT applies to struct and union types only", name, *node);
+		return NULL_TREE;
+	}
+
+	if (lookup_attribute(IDENTIFIER_POINTER(name), TYPE_ATTRIBUTES(*node))) {
+		error("%qE attribute used on %qT is already applied to the type", name, *node);
+		return NULL_TREE;
+	}
+
+	if (lookup_attribute("no_const", TYPE_ATTRIBUTES(*node))) {
+		error("%qE attribute used on %qT is incompatible with 'no_const'", name, *node);
+		return NULL_TREE;
+	}
+
+	*no_add_attrs = false;
+	return NULL_TREE;
+}
+
+static struct attribute_spec no_const_attr = {
+	.name			= "no_const",
+	.min_length		= 0,
+	.max_length		= 0,
+	.decl_required		= false,
+	.type_required		= false,
+	.function_type_required	= false,
+	.handler		= handle_no_const_attribute,
+#if BUILDING_GCC_VERSION >= 4007
+	.affects_type_identity	= true
+#endif
+};
+
+static struct attribute_spec do_const_attr = {
+	.name			= "do_const",
+	.min_length		= 0,
+	.max_length		= 0,
+	.decl_required		= false,
+	.type_required		= false,
+	.function_type_required	= false,
+	.handler		= handle_do_const_attribute,
+#if BUILDING_GCC_VERSION >= 4007
+	.affects_type_identity	= true
+#endif
+};
+
+static void register_attributes(void *event_data, void *data)
+{
+	register_attribute(&no_const_attr);
+	register_attribute(&do_const_attr);
+}
+
+static void finish_type(void *event_data, void *data)
+{
+	tree type = (tree)event_data;
+	constify_info cinfo = {
+		.has_fptr_field = false,
+		.has_writable_field = false,
+		.has_do_const_field = false,
+		.has_no_const_field = false
+	};
+
+	if (type == NULL_TREE || type == error_mark_node)
+		return;
+
+#if BUILDING_GCC_VERSION >= 5000
+	if (TREE_CODE(type) == ENUMERAL_TYPE)
+		return;
+#endif
+
+	if (TYPE_FIELDS(type) == NULL_TREE || TYPE_CONSTIFY_VISITED(type))
+		return;
+
+	constifiable(type, &cinfo);
+
+	if (lookup_attribute("no_const", TYPE_ATTRIBUTES(type))) {
+		if ((cinfo.has_fptr_field && !cinfo.has_writable_field && !cinfo.has_no_const_field) || cinfo.has_do_const_field) {
+			deconstify_type(type);
+			TYPE_CONSTIFY_VISITED(type) = 1;
+		} else
+			error("'no_const' attribute used on type %qT that is not constified", type);
+		return;
+	}
+
+	if (lookup_attribute("do_const", TYPE_ATTRIBUTES(type))) {
+		if (!cinfo.has_writable_field && !cinfo.has_no_const_field) {
+			error("'do_const' attribute used on type %qT that is%sconstified", type, cinfo.has_fptr_field ? " " : " not ");
+			return;
+		}
+		constify_type(type);
+		return;
+	}
+
+	if (cinfo.has_fptr_field && !cinfo.has_writable_field && !cinfo.has_no_const_field) {
+		if (lookup_attribute("do_const", TYPE_ATTRIBUTES(type))) {
+			error("'do_const' attribute used on type %qT that is constified", type);
+			return;
+		}
+		constify_type(type);
+		return;
+	}
+
+	deconstify_type(type);
+	TYPE_CONSTIFY_VISITED(type) = 1;
+}
+
+static bool is_constified_var(varpool_node_ptr node)
+{
+	tree var = NODE_DECL(node);
+	tree type = TREE_TYPE(var);
+
+	if (node->alias)
+		return false;
+
+	if (DECL_EXTERNAL(var))
+		return false;
+
+	// XXX handle more complex nesting of arrays/structs
+	if (TREE_CODE(type) == ARRAY_TYPE)
+		type = TREE_TYPE(type);
+
+	if (TREE_CODE(type) != RECORD_TYPE && TREE_CODE(type) != UNION_TYPE)
+		return false;
+
+	if (!TYPE_READONLY(type) || !C_TYPE_FIELDS_READONLY(type))
+		return false;
+
+	if (!TYPE_CONSTIFY_VISITED(type))
+		return false;
+
+	return true;
+}
+
+static void check_section_mismatch(varpool_node_ptr node)
+{
+	tree var, section;
+	size_t i;
+	const char *name;
+
+	var = NODE_DECL(node);
+	name = get_decl_section_name(var);
+	section = lookup_attribute("section", DECL_ATTRIBUTES(var));
+	if (!section) {
+		if (name) {
+			fprintf(stderr, "DECL_SECTION [%s] ", name);
+			dump_varpool_node(stderr, node);
+			gcc_unreachable();
+		}
+		return;
+	} else
+		gcc_assert(name);
+
+//fprintf(stderr, "SECTIONAME: [%s] ", get_decl_section_name(var));
+//debug_tree(var);
+
+	gcc_assert(!TREE_CHAIN(section));
+	gcc_assert(TREE_VALUE(section));
+
+	section = TREE_VALUE(TREE_VALUE(section));
+	gcc_assert(!strcmp(TREE_STRING_POINTER(section), name));
+//debug_tree(section);
+
+	for (i = 0; i < ARRAY_SIZE(const_sections); i++)
+		if (!strcmp(const_sections[i].name, name))
+			return;
+
+	error_at(DECL_SOURCE_LOCATION(var), "constified variable %qD placed into writable section %E", var, section);
+}
+
+// this works around a gcc bug/feature where uninitialized globals
+// are moved into the .bss section regardless of any constification
+// see gcc/varasm.c:bss_initializer_p()
+static void fix_initializer(varpool_node_ptr node)
+{
+	tree var = NODE_DECL(node);
+	tree type = TREE_TYPE(var);
+
+	if (DECL_INITIAL(var))
+		return;
+
+	DECL_INITIAL(var) = build_constructor(type, NULL);
+//	inform(DECL_SOURCE_LOCATION(var), "constified variable %qE moved into .rodata", var);
+}
+
+static void check_global_variables(void *event_data, void *data)
+{
+	varpool_node_ptr node;
+
+	FOR_EACH_VARIABLE(node) {
+		if (!is_constified_var(node))
+			continue;
+
+		check_section_mismatch(node);
+		fix_initializer(node);
+	}
+}
+
+static unsigned int check_local_variables_execute(void)
+{
+	unsigned int ret = 0;
+	tree var;
+
+	unsigned int i;
+
+	FOR_EACH_LOCAL_DECL(cfun, i, var) {
+		tree type = TREE_TYPE(var);
+
+		gcc_assert(DECL_P(var));
+		if (is_global_var(var))
+			continue;
+
+		if (TREE_CODE(type) != RECORD_TYPE && TREE_CODE(type) != UNION_TYPE)
+			continue;
+
+		if (!TYPE_READONLY(type) || !C_TYPE_FIELDS_READONLY(type))
+			continue;
+
+		if (!TYPE_CONSTIFY_VISITED(type))
+			continue;
+
+		error_at(DECL_SOURCE_LOCATION(var), "constified variable %qE cannot be local", var);
+		ret = 1;
+	}
+	return ret;
+}
+
+#define PASS_NAME check_local_variables
+#define NO_GATE
+#include "gcc-generate-gimple-pass.h"
+
+static unsigned int (*old_section_type_flags)(tree decl, const char *name, int reloc);
+
+static unsigned int constify_section_type_flags(tree decl, const char *name, int reloc)
+{
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(const_sections); i++)
+		if (!strcmp(const_sections[i].name, name))
+			return 0;
+
+	return old_section_type_flags(decl, name, reloc);
+}
+
+static void constify_start_unit(void *gcc_data, void *user_data)
+{
+//	size_t i;
+
+//	for (i = 0; i < ARRAY_SIZE(const_sections); i++)
+//		const_sections[i].section = get_unnamed_section(0, output_section_asm_op, const_sections[i].asm_op);
+//		const_sections[i].section = get_section(const_sections[i].name, 0, NULL);
+
+	old_section_type_flags = targetm.section_type_flags;
+	targetm.section_type_flags = constify_section_type_flags;
+}
+
+__visible int plugin_init(struct plugin_name_args *plugin_info, struct plugin_gcc_version *version)
+{
+	const char * const plugin_name = plugin_info->base_name;
+	const int argc = plugin_info->argc;
+	const struct plugin_argument * const argv = plugin_info->argv;
+	int i;
+
+	struct register_pass_info check_local_variables_pass_info;
+
+	check_local_variables_pass_info.pass				= make_check_local_variables_pass();
+	check_local_variables_pass_info.reference_pass_name		= "ssa";
+	check_local_variables_pass_info.ref_pass_instance_number	= 1;
+	check_local_variables_pass_info.pos_op				= PASS_POS_INSERT_BEFORE;
+
+	if (!plugin_default_version_check(version, &gcc_version)) {
+		error(G_("incompatible gcc/plugin versions"));
+		return 1;
+	}
+
+	for (i = 0; i < argc; ++i) {
+		if (!(strcmp(argv[i].key, "disable"))) {
+			enabled = false;
+			continue;
+		}
+		error(G_("unknown option '-fplugin-arg-%s-%s'"), plugin_name, argv[i].key);
+	}
+
+	if (strncmp(lang_hooks.name, "GNU C", 5) && !strncmp(lang_hooks.name, "GNU C+", 6)) {
+		inform(UNKNOWN_LOCATION, G_("%s supports C only, not %s"), plugin_name, lang_hooks.name);
+		enabled = false;
+	}
+
+	register_callback(plugin_name, PLUGIN_INFO, NULL, &const_plugin_info);
+	if (enabled) {
+		register_callback(plugin_name, PLUGIN_ALL_IPA_PASSES_START, check_global_variables, NULL);
+		register_callback(plugin_name, PLUGIN_FINISH_TYPE, finish_type, NULL);
+		register_callback(plugin_name, PLUGIN_PASS_MANAGER_SETUP, NULL, &check_local_variables_pass_info);
+		register_callback(plugin_name, PLUGIN_START_UNIT, constify_start_unit, NULL);
+	}
+	register_callback(plugin_name, PLUGIN_ATTRIBUTES, register_attributes, NULL);
+
+	return 0;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] [RFC][PATCH 8/8] cgroups: force all struct cftype const
  2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
                   ` (6 preceding siblings ...)
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 7/8] gcc-plugins: Add constify plugin Kees Cook
@ 2017-02-27 20:43 ` Kees Cook
  7 siblings, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-02-27 20:43 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Kees Cook, Mark Rutland, Andy Lutomirski, Hoeun Ryu, PaX Team,
	Emese Revfy, Russell King, x86

As found in PaX, mark struct cftype with __do_const and add helpers
to deal with rare writes. This is a more complex example of a write-rarely
structure, which needs to use list helpers and blocks of enable/disable
pairs to perform the needed updates.
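
The general shape of such a grouped update, as used below in
cgroup_init_cftypes(), is one enable/disable pair wrapped around several
__rare_write() calls:

	rare_write_enable();
	__rare_write(cft->kf_ops, kf_ops);
	__rare_write(cft->ss, ss);
	rare_write_disable();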

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/cgroup-defs.h |  2 +-
 kernel/cgroup.c             | 35 +++++++++++++++++++++++------------
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 861b4677fc5b..aa15b3132bce 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -431,7 +431,7 @@ struct cftype {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lock_class_key	lockdep_key;
 #endif
-};
+} __do_const;
 
 /*
  * Control Group subsystem type.
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b794bcadefa4..19a8e1baeb6b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3653,11 +3653,11 @@ static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp,
 	int ret;
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-	key = &cft->lockdep_key;
+	key = (struct lock_class_key *)&cft->lockdep_key;
 #endif
 	kn = __kernfs_create_file(cgrp->kn, cgroup_file_name(cgrp, cft, name),
-				  cgroup_file_mode(cft), 0, cft->kf_ops, cft,
-				  NULL, key);
+				  cgroup_file_mode(cft), 0, cft->kf_ops,
+				  (void *)cft, NULL, key);
 	if (IS_ERR(kn))
 		return PTR_ERR(kn);
 
@@ -3760,11 +3760,16 @@ static void cgroup_exit_cftypes(struct cftype *cfts)
 		/* free copy for custom atomic_write_len, see init_cftypes() */
 		if (cft->max_write_len && cft->max_write_len != PAGE_SIZE)
 			kfree(cft->kf_ops);
-		cft->kf_ops = NULL;
-		cft->ss = NULL;
+
+		rare_write_enable();
+		__rare_write(cft->kf_ops, NULL);
+		__rare_write(cft->ss, NULL);
 
 		/* revert flags set by cgroup core while adding @cfts */
-		cft->flags &= ~(__CFTYPE_ONLY_ON_DFL | __CFTYPE_NOT_ON_DFL);
+		__rare_write(cft->flags,
+			     cft->flags & ~(__CFTYPE_ONLY_ON_DFL |
+					    __CFTYPE_NOT_ON_DFL));
+		rare_write_disable();
 	}
 }
 
@@ -3795,8 +3800,10 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 			kf_ops->atomic_write_len = cft->max_write_len;
 		}
 
-		cft->kf_ops = kf_ops;
-		cft->ss = ss;
+		rare_write_enable();
+		__rare_write(cft->kf_ops, kf_ops);
+		__rare_write(cft->ss, ss);
+		rare_write_disable();
 	}
 
 	return 0;
@@ -3809,7 +3816,7 @@ static int cgroup_rm_cftypes_locked(struct cftype *cfts)
 	if (!cfts || !cfts[0].ss)
 		return -ENOENT;
 
-	list_del(&cfts->node);
+	rare_list_del(&cfts->node);
 	cgroup_apply_cftypes(cfts, false);
 	cgroup_exit_cftypes(cfts);
 	return 0;
@@ -3866,7 +3873,7 @@ static int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 
 	mutex_lock(&cgroup_mutex);
 
-	list_add_tail(&cfts->node, &ss->cfts);
+	rare_list_add_tail(&cfts->node, &ss->cfts);
 	ret = cgroup_apply_cftypes(cfts, true);
 	if (ret)
 		cgroup_rm_cftypes_locked(cfts);
@@ -3887,8 +3894,10 @@ int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 {
 	struct cftype *cft;
 
+	rare_write_enable();
 	for (cft = cfts; cft && cft->name[0] != '\0'; cft++)
-		cft->flags |= __CFTYPE_ONLY_ON_DFL;
+		__rare_write(cft->flags, cft->flags | __CFTYPE_ONLY_ON_DFL);
+	rare_write_disable();
 	return cgroup_add_cftypes(ss, cfts);
 }
 
@@ -3904,8 +3913,10 @@ int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 {
 	struct cftype *cft;
 
+	rare_write_enable();
 	for (cft = cfts; cft && cft->name[0] != '\0'; cft++)
-		cft->flags |= __CFTYPE_NOT_ON_DFL;
+		__rare_write(cft->flags, cft->flags | __CFTYPE_NOT_ON_DFL);
+	rare_write_disable();
 	return cgroup_add_cftypes(ss, cfts);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-02-27 20:42 ` [kernel-hardening] [RFC][PATCH 1/8] " Kees Cook
@ 2017-02-28  8:22   ` Hoeun Ryu
  2017-02-28 15:05     ` Kees Cook
  0 siblings, 1 reply; 33+ messages in thread
From: Hoeun Ryu @ 2017-02-28  8:22 UTC (permalink / raw)
  To: Kees Cook, kernel-hardening
  Cc: Mark Rutland, Andy Lutomirski, PaX Team, Emese Revfy, Russell King, x86

On Mon, 2017-02-27 at 12:42 -0800, Kees Cook wrote:
> Several types of data storage exist in the kernel: read-write data
> (.data,
> .bss), read-only data (.rodata), and RO-after-init. This introduces
> the
> infrastructure for another type: write-rarely, which intended for
> data
> that is either only rarely modified or especially security-sensitive. 
> The
> intent is to further reduce the internal attack surface of the kernel
> by
> making this storage read-only when "at rest". This makes it much
> harder
> to be subverted by attackers who have a kernel-write flaw, since they
> cannot directly change the memory contents.
> 
> Variables declared __wr_rare will be made const when an architecture
> supports HAVE_ARCH_WRITE_RARE. To change these variables, either the
> rare_write() macro can be used, or multiple uses of __rare_write(),
> wrapped in rare_write_enable()/rare_write_disable() macros. These
> macros
> are handled by the arch-specific functions that perform the actions
> needed
> to write to otherwise read-only memory.
> 
> The arch-specific helpers must be not allow non-current CPUs to write
> the memory area, run non-preemptible to avoid accidentally leaving
> memory writable, and defined as inline to avoid making them desirable
> ROP targets for attackers.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/Kconfig             | 15 +++++++++++++++
>  include/linux/compiler.h | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/preempt.h  |  6 ++++--
>  3 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 99839c23d453..2446de19f66d 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -781,4 +781,19 @@ config VMAP_STACK
>  	  the stack to map directly to the KASAN shadow map using a
> formula
>  	  that is incorrect if the stack is in vmalloc space.
>  
> +config HAVE_ARCH_RARE_WRITE
> +	def_bool n
> +	help
> +	  An arch should select this option if it has defined the
> functions
> +	  __arch_rare_write_map() and __arch_rare_write_unmap() to
> +	  respectively enable and disable writing to read-only
> memory.The
> +	  routines must meet the following requirements:
> +	  - read-only memory writing must only be available on the
> current
> +	    CPU (to make sure other CPUs can't race to make changes
> too).
> +	  - the routines must be declared inline (to discourage ROP
> use).
> +	  - the routines must not be preemptible (likely they will
> call
> +	    preempt_disable() and preempt_enable_no_resched()
> respectively).
> +	  - the routines must validate expected state (e.g. when
> enabling
> +	    writes, BUG() if writes are already be enabled).
> +
>  source "kernel/gcov/Kconfig"
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index cf0fa5d86059..f95603a8ee72 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -325,6 +325,44 @@ static __always_inline void
> __write_once_size(volatile void *p, void *res, int s
>  	__u.__val;					\
>  })
>  
> +/*
> + * Build "write rarely" infrastructure for flipping memory r/w
> + * on a per-CPU basis.
> + */
> +#ifndef CONFIG_HAVE_ARCH_RARE_WRITE
> +# define __wr_rare
> +# define __wr_rare_type
> +# define __rare_write_type(v)	typeof(v)
> +# define __rare_write_ptr(v)	(&(v))
> +# define __rare_write(__var, __val)	({		\
> +	__var = __val;					\
> +	__var;						\
> +})
> +# define rare_write_enable()	do { } while (0)
> +# define rare_write_disable()	do { } while (0)
> +#else
> +# define __wr_rare		__ro_after_init
> +# define __wr_rare_type		const
> +# define __rare_write_type(v)	typeof((typeof(v))0)
> +# define __rare_write_ptr(v)	((__rare_write_type(v) *)&(v))

Should we make __rare_write_ptr arch-specific so architectures can
convert the pointer to an rw alias of the ro address, like this? [1]

#ifndef __arch_rare_write_ptr
# define __rare_write_ptr(v)	((__rare_write_type(v) *)&(v))
#else
# define __rare_write_ptr(v)	__arch_rare_write_ptr(v)
#endif

[1] https://lkml.org/lkml/2017/2/22/254

> +# define __rare_write(__var, __val) ({			\
> +	__rare_write_type(__var) *__rw_var;		\
> +							\
> +	__rw_var = __rare_write_ptr(__var);		\
> +	*__rw_var = (__val);				\
> +	__var;						\
> +})
> +# define rare_write_enable()	__arch_rare_write_map()
> +# define rare_write_disable()	__arch_rare_write_unmap()
> +#endif
> +
> +#define rare_write(__var, __val) ({			\
> +	rare_write_enable();				\
> +	__rare_write(__var, __val);			\
> +	rare_write_disable();				\
> +	__var;						\
> +})
> +
>  #endif /* __KERNEL__ */
>  
>  #endif /* __ASSEMBLY__ */
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 7eeceac52dea..183c1d7a8594 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -237,10 +237,12 @@ do { \
>  /*
>   * Modules have no business playing preemption tricks.
>   */
> -#undef sched_preempt_enable_no_resched
> -#undef preempt_enable_no_resched
>  #undef preempt_enable_no_resched_notrace
>  #undef preempt_check_resched
> +#ifndef CONFIG_HAVE_ARCH_RARE_WRITE
> +#undef sched_preempt_enable_no_resched
> +#undef preempt_enable_no_resched
> +#endif
>  #endif
>  
>  #define preempt_set_need_resched() \

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-02-28  8:22   ` [kernel-hardening] " Hoeun Ryu
@ 2017-02-28 15:05     ` Kees Cook
  2017-03-01 10:43       ` Mark Rutland
  0 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2017-02-28 15:05 UTC (permalink / raw)
  To: Hoeun Ryu
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, PaX Team,
	Emese Revfy, Russell King, x86

On Tue, Feb 28, 2017 at 12:22 AM, Hoeun Ryu <hoeun.ryu@gmail.com> wrote:
> On Mon, 2017-02-27 at 12:42 -0800, Kees Cook wrote:
>> Several types of data storage exist in the kernel: read-write data (.data,
>> .bss), read-only data (.rodata), and RO-after-init. This introduces the
>> infrastructure for another type: write-rarely, which is intended for data
>> that is either only rarely modified or especially security-sensitive. The
>> intent is to further reduce the internal attack surface of the kernel by
>> making this storage read-only when "at rest". This makes it much harder
>> to be subverted by attackers who have a kernel-write flaw, since they
>> cannot directly change the memory contents.
>>
>> Variables declared __wr_rare will be made const when an architecture
>> supports HAVE_ARCH_RARE_WRITE. To change these variables, either the
>> rare_write() macro can be used, or multiple uses of __rare_write(),
>> wrapped in rare_write_enable()/rare_write_disable() macros. These macros
>> are handled by the arch-specific functions that perform the actions needed
>> to write to otherwise read-only memory.
>>
>> The arch-specific helpers must not allow non-current CPUs to write
>> the memory area, must run non-preemptibly to avoid accidentally leaving
>> memory writable, and must be defined inline to avoid making them
>> desirable ROP targets for attackers.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>>  arch/Kconfig             | 15 +++++++++++++++
>>  include/linux/compiler.h | 38 ++++++++++++++++++++++++++++++++++++++
>>  include/linux/preempt.h  |  6 ++++--
>>  3 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 99839c23d453..2446de19f66d 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -781,4 +781,19 @@ config VMAP_STACK
>>         the stack to map directly to the KASAN shadow map using a formula
>>         that is incorrect if the stack is in vmalloc space.
>>
>> +config HAVE_ARCH_RARE_WRITE
>> +     def_bool n
>> +     help
>> +       An arch should select this option if it has defined the functions
>> +       __arch_rare_write_map() and __arch_rare_write_unmap() to
>> +       respectively enable and disable writing to read-only memory. The
>> +       routines must meet the following requirements:
>> +       - read-only memory writing must only be available on the current
>> +         CPU (to make sure other CPUs can't race to make changes too).
>> +       - the routines must be declared inline (to discourage ROP use).
>> +       - the routines must not be preemptible (likely they will call
>> +         preempt_disable() and preempt_enable_no_resched() respectively).
>> +       - the routines must validate expected state (e.g. when enabling
>> +         writes, BUG() if writes are already enabled).
>> +
>>  source "kernel/gcov/Kconfig"
>> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
>> index cf0fa5d86059..f95603a8ee72 100644
>> --- a/include/linux/compiler.h
>> +++ b/include/linux/compiler.h
>> @@ -325,6 +325,44 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
>>       __u.__val;                                      \
>>  })
>>
>> +/*
>> + * Build "write rarely" infrastructure for flipping memory r/w
>> + * on a per-CPU basis.
>> + */
>> +#ifndef CONFIG_HAVE_ARCH_RARE_WRITE
>> +# define __wr_rare
>> +# define __wr_rare_type
>> +# define __rare_write_type(v)        typeof(v)
>> +# define __rare_write_ptr(v) (&(v))
>> +# define __rare_write(__var, __val)  ({              \
>> +     __var = __val;                                  \
>> +     __var;                                          \
>> +})
>> +# define rare_write_enable() do { } while (0)
>> +# define rare_write_disable()        do { } while (0)
>> +#else
>> +# define __wr_rare           __ro_after_init
>> +# define __wr_rare_type              const
>> +# define __rare_write_type(v)        typeof((typeof(v))0)
>> +# define __rare_write_ptr(v) ((__rare_write_type(v) *)&(v))
>
> Should we make __rare_write_ptr arch-specific so architectures can
> convert the pointer to rw alias from ro address like this ? [1]
>
> #ifndef __arch_rare_write_ptr
> # define __rare_write_ptr(v)    ((__rare_write_type(v) *)&(v))
> #else
> # define __rare_write_ptr(v)    __arch_rare_write_ptr(v)
> #endif

I think that was Mark's idea too, but I'm not sure how to handle the
complex cases of doing multiple updates at once (e.g. linked list
updates, etc). I think we're better off doing a full disabling of RO
protection, but there have been objections to that idea in the past,
which is why I included the "complex" example; I don't see a way
around it.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap() Kees Cook
@ 2017-02-28 19:33   ` Andy Lutomirski
  2017-02-28 21:35     ` Kees Cook
  0 siblings, 1 reply; 33+ messages in thread
From: Andy Lutomirski @ 2017-02-28 19:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Mon, Feb 27, 2017 at 12:43 PM, Kees Cook <keescook@chromium.org> wrote:
> Based on PaX's x86 pax_{open,close}_kernel() implementation, this
> allows HAVE_ARCH_RARE_WRITE to work on x86.
>
> There is missing work to sort out some header file issues where preempt.h
> is missing, though it can't be included in pgtable.h unconditionally...
> some other solution will be needed, perhaps an entirely separate header
> file for rare_write()-related defines...
>
> This patch is also missing paravirt support.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/Kconfig               |  1 +
>  arch/x86/include/asm/pgtable.h | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index e487493bbd47..6d4d6f73d4b8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -102,6 +102,7 @@ config X86
>         select HAVE_ARCH_KMEMCHECK
>         select HAVE_ARCH_MMAP_RND_BITS          if MMU
>         select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
> +       select HAVE_ARCH_RARE_WRITE
>         select HAVE_ARCH_SECCOMP_FILTER
>         select HAVE_ARCH_TRACEHOOK
>         select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 437feb436efa..86e78e97738f 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -90,6 +90,37 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
>
>  #endif /* CONFIG_PARAVIRT */
>
> +/* TODO: Bad hack to deal with preempt macros being missing sometimes. */
> +#ifndef preempt_disable
> +#include <linux/preempt.h>
> +#endif
> +
> +static inline unsigned long __arch_rare_write_map(void)
> +{
> +       unsigned long cr0;
> +
> +       preempt_disable();
> +       barrier();
> +       cr0 = read_cr0() ^ X86_CR0_WP;
> +       BUG_ON(cr0 & X86_CR0_WP);
> +       write_cr0(cr0);
> +       barrier();
> +       return cr0 ^ X86_CR0_WP;
> +}
> +
> +static inline unsigned long __arch_rare_write_unmap(void)
> +{
> +       unsigned long cr0;
> +
> +       barrier();
> +       cr0 = read_cr0() ^ X86_CR0_WP;
> +       BUG_ON(!(cr0 & X86_CR0_WP));
> +       write_cr0(cr0);
> +       barrier();
> +       preempt_enable_no_resched();
> +       return cr0 ^ X86_CR0_WP;
> +}

This seems like a wonderful ROP target.  Off-hand, I'd guess the
reason it's never been a problem (that I've heard of) in grsecurity is
that grsecurity isn't quite popular enough to be a big publicly
discussed target.

Can't we at least make this:

struct rare_write_mapping {
  void *addr;
  struct arch_rare_write_state arch_state;
};

static inline struct rare_write_mapping __arch_rare_write_map(void
*addr, size_t len);
static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);

or similar, thus allowing a non-terrifying implementation (e.g. switch
CR3 to a specialized pagetable) down the road?

I realize that if you get exactly the code generation you want, the
CR0.WP approach *might* be okay, but that seems quite fragile.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-02-28 19:33   ` [kernel-hardening] " Andy Lutomirski
@ 2017-02-28 21:35     ` Kees Cook
  2017-02-28 22:54       ` Andy Lutomirski
  2017-03-01 10:59       ` Mark Rutland
  0 siblings, 2 replies; 33+ messages in thread
From: Kees Cook @ 2017-02-28 21:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Tue, Feb 28, 2017 at 11:33 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Feb 27, 2017 at 12:43 PM, Kees Cook <keescook@chromium.org> wrote:
>> Based on PaX's x86 pax_{open,close}_kernel() implementation, this
>> allows HAVE_ARCH_RARE_WRITE to work on x86.
>>
>> There is missing work to sort out some header file issues where preempt.h
>> is missing, though it can't be included in pg_table.h unconditionally...
>> some other solution will be needed, perhaps an entirely separate header
>> file for rare_write()-related defines...
>>
>> This patch is also missing paravirt support.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>> [...]
>> +static inline unsigned long __arch_rare_write_unmap(void)
>> +{
>> +       unsigned long cr0;
>> +
>> +       barrier();
>> +       cr0 = read_cr0() ^ X86_CR0_WP;
>> +       BUG_ON(!(cr0 & X86_CR0_WP));
>> +       write_cr0(cr0);
>> +       barrier();
>> +       preempt_enable_no_resched();
>> +       return cr0 ^ X86_CR0_WP;
>> +}
>
> This seems like a wonderful ROP target.  Off-hand, I'd guess the
> reason it's never been a problem (that I've heard of) in grsecurity is
> that grsecurity isn't quite popular enough to be a big publicly
> discussed target.

The reason it's less risky is due to the inlining. This will always
produce code where WP is left enabled again before a "ret" path is
taken. That said, it is still a minor ROP target in my mind, since it
still has the form:

turn off read-only;
write a thing;
turn on read-only;

But frankly, if an attacker already has enough control over the stack
to build up registers for the "write a thing" step to do what they
want it to do, they likely already have vast control over the kernel
already.

> Can't we at least make this:
>
> struct rare_write_mapping {
>   void *addr;
>   struct arch_rare_write_state arch_state;
> };
>
> static inline struct rare_write_mapping __arch_rare_write_map(void
> *addr, size_t len);
> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);

How do you envision this working with the more complex things I
included in the latter patches of the series, namely doing linked list
updates across multiple structure instances, etc?

ie, poor list manipulation pseudo-code:

turn off read-only;
> struct_one->next = struct_three->node;
struct_three->prev = struct_one->node;
struct_two->prev = struct_two->next = NULL;
turn on read-only;

That's three separate memory areas involved...

> or similar, this allowing a non-terrifying implementation (e.g. switch
> CR3 to a specialized pagetable) down the road?

We'd need some kind of per-CPU mapping that other CPUs couldn't get
at... which is kind of what the WP bit gets us already. (And ARM is
similar: it elevates permissions on the kernel memory domain to ignore
restrictions.)

> I realize that if you get exactly the code generation you want, the
> CR0.WP approach *might* be okay, but that seems quite fragile.

I think with preempt off, barriers, and inlining, this should be pretty safe...

(Which reminds me that my x86 implementation needs to use
__always_inline, rather than just inline...)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-02-28 21:35     ` Kees Cook
@ 2017-02-28 22:54       ` Andy Lutomirski
  2017-02-28 23:52         ` Kees Cook
  2017-03-01 10:59       ` Mark Rutland
  1 sibling, 1 reply; 33+ messages in thread
From: Andy Lutomirski @ 2017-02-28 22:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>> Can't we at least make this:
>>
>> struct rare_write_mapping {
>>   void *addr;
>>   struct arch_rare_write_state arch_state;
>> };
>>
>> static inline struct rare_write_mapping __arch_rare_write_map(void
>> *addr, size_t len);
>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>
> How do you envision this working with the more complex things I
> included in the latter patches of the series, namely doing linked list
> updates across multiple structure instances, etc?
>
> ie, poor list manipulation pseudo-code:
>
> turn off read-only;
> struct_one->next = struct_three->node;
> struct_three->prev = struct_one->node;
> struct_two->prev = struct_two->next = NULL;
> turn on read-only;
>
> That's three separate memory areas involved...

Fair enough.  That being said, how is this supposed to work on
architectures that don't have a global "disable write protection" bit?
 Surely these architectures exist.

>
>> or similar, this allowing a non-terrifying implementation (e.g. switch
>> CR3 to a specialized pagetable) down the road?
>
> We'd need some kind of per-CPU mapping that other CPUs couldn't get
> at... which is kind of what the WP bit gets us already. (And ARM is
> similar: it elevates permissions on the kernel memory domain to ignore
> restrictions.)
>
>> I realize that if you get exactly the code generation you want, the
>> CR0.WP approach *might* be okay, but that seems quite fragile.
>
> I think with preempt off, barriers, and inlining, this should be pretty safe...
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-02-28 22:54       ` Andy Lutomirski
@ 2017-02-28 23:52         ` Kees Cook
  2017-03-01 11:24           ` Mark Rutland
  0 siblings, 1 reply; 33+ messages in thread
From: Kees Cook @ 2017-02-28 23:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>>> Can't we at least make this:
>>>
>>> struct rare_write_mapping {
>>>   void *addr;
>>>   struct arch_rare_write_state arch_state;
>>> };
>>>
>>> static inline struct rare_write_mapping __arch_rare_write_map(void
>>> *addr, size_t len);
>>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>>
>> How do you envision this working with the more complex things I
>> included in the latter patches of the series, namely doing linked list
>> updates across multiple structure instances, etc?
>>
>> ie, poor list manipulation pseudo-code:
>>
>> turn off read-only;
>> struct_one->next = struct_three->node;
>> struct_three->prev = struct_one->node;
>> struct_two->prev = struct_two->next = NULL;
>> turn on read-only;
>>
>> That's three separate memory areas involved...
>
> Fair enough.  That being said, how is this supposed to work on
> architectures that don't have a global "disable write protection" bit?
> Surely these architectures exist.

I don't know. :) That's part of the reason for putting up this series:
looking to see what's possible on other architectures. It's not clear
to me what arm64 can do, for example. Without domains there didn't
seem to be an obvious global override. My intention is to make sure we
get a viable interface for the architectures that are interested in
these kinds of self-protection thingies. :)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()
  2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 5/8] ARM: " Kees Cook
@ 2017-03-01  1:04   ` Russell King - ARM Linux
  2017-03-01  5:41     ` Kees Cook
  0 siblings, 1 reply; 33+ messages in thread
From: Russell King - ARM Linux @ 2017-03-01  1:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, x86

On Mon, Feb 27, 2017 at 12:43:03PM -0800, Kees Cook wrote:
> Based on grsecurity's ARM pax_{open,close}_kernel() implementation, this
> allows HAVE_ARCH_RARE_WRITE to work on ARM.

This has the effect that any memory mapped with DOMAIN_KERNEL will
lose its NX status, and may end up being read into the I-cache.

We used to do exactly this to support set_fs(KERNEL_DS) but it was
deemed to be way too problematical (for speculative prefetching)
to use it on ARMv6+.

As vmalloc space ends up with a random mixture of DOMAIN_KERNEL and
DOMAIN_IO mappings (due to the order of ioremap() vs vmalloc()), this
means DOMAIN_KERNEL can cover devices... which, with DOMAIN_KERNEL
switched to manager mode, results in the NX being removed for
device mappings, which (iirc) is unpredictable.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()
  2017-03-01  1:04   ` [kernel-hardening] " Russell King - ARM Linux
@ 2017-03-01  5:41     ` Kees Cook
  2017-03-01 11:30       ` Russell King - ARM Linux
  2017-03-01 11:50       ` Mark Rutland
  0 siblings, 2 replies; 33+ messages in thread
From: Kees Cook @ 2017-03-01  5:41 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, x86

On Tue, Feb 28, 2017 at 5:04 PM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Mon, Feb 27, 2017 at 12:43:03PM -0800, Kees Cook wrote:
>> Based on grsecurity's ARM pax_{open,close}_kernel() implementation, this
>> allows HAVE_ARCH_RARE_WRITE to work on ARM.
>
> This has the effect that any memory mapped with DOMAIN_KERNEL will
> loose it's NX status, and may end up being read into the I-cache.

Arbitrarily so, or only memory accessed/pre-fetched by the CPU when in
this state? i.e. since this is non-preempt, only touches the needed
memory, and has the original domain state restored within a few
instructions, does this avoid the problem? It seems like the chance
for a speculative prefetch from device memory under these conditions
should be approaching zero.

> We used to do exactly this to support set_fs(KERNEL_DS) but it was
> deemed to be way too problematical (for speculative prefetching)
> to use it on ARMv6+.
>
> As vmalloc space ends up with a random mixture of DOMAIN_KERNEL and
> DOMAIN_IO mappings (due to the order of ioremap() vs vmalloc()), this
> means DOMAIN_KERNEL can cover devices... which with switching
> DOMAIN_KERNEL to manager mode result in the NX being removed for
> device mappings, which (iirc) is unpredictable.

Just to make sure I understand: it was only speculative prefetch vs
icache, right? Would an icache flush restore the correct permissions?
I'm just pondering alternatives. Also, is there a maximum distance the
prefetching spans? i.e. could device memory be guaranteed to be
vmapped far enough away from kernel memory to avoid prefetches?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-02-28 15:05     ` Kees Cook
@ 2017-03-01 10:43       ` Mark Rutland
  2017-03-01 20:13         ` Kees Cook
  0 siblings, 1 reply; 33+ messages in thread
From: Mark Rutland @ 2017-03-01 10:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Hoeun Ryu, kernel-hardening, Andy Lutomirski, PaX Team,
	Emese Revfy, Russell King, x86

Hi,

On Tue, Feb 28, 2017 at 07:05:32AM -0800, Kees Cook wrote:
> On Tue, Feb 28, 2017 at 12:22 AM, Hoeun Ryu <hoeun.ryu@gmail.com> wrote:
> > On Mon, 2017-02-27 at 12:42 -0800, Kees Cook wrote:

> >> +/*
> >> + * Build "write rarely" infrastructure for flipping memory r/w
> >> + * on a per-CPU basis.
> >> + */
> >> +#ifndef CONFIG_HAVE_ARCH_RARE_WRITE
> >> +# define __wr_rare
> >> +# define __wr_rare_type
> >> +# define __rare_write_type(v)        typeof(v)
> >> +# define __rare_write_ptr(v) (&(v))
> >> +# define __rare_write(__var, __val)  ({              \
> >> +     __var = __val;                                  \
> >> +     __var;                                          \
> >> +})
> >> +# define rare_write_enable() do { } while (0)
> >> +# define rare_write_disable()        do { } while (0)
> >> +#else
> >> +# define __wr_rare           __ro_after_init
> >> +# define __wr_rare_type              const
> >> +# define __rare_write_type(v)        typeof((typeof(v))0)
> >> +# define __rare_write_ptr(v) ((__rare_write_type(v) *)&(v))
> >
> > Should we make __rare_write_ptr arch-specific so architectures can
> > convert the pointer to rw alias from ro address like this ? [1]
> >
> > #ifndef __arch_rare_write_ptr
> > # define __rare_write_ptr(v)    ((__rare_write_type(v) *)&(v))
> > #else
> > # define __rare_write_ptr(v)    __arch_rare_write_ptr(v)
> > #endif
> 
> I think that was Mark's idea too, but I'm not sure to handle the
> complex cases of doing multiple updates at once (e.g. linked list
> updates, etc). 

Assuming there aren't that many places with complex updates, we could
allow code to explicitly do the map/unmap and use __rare_write_ptr
directly. That does mean that the list code has to be specialised for
__wr_rare, which is less than ideal.

That would also depend on what is __wr_rare in those cases, too.  i.e.
is it just the list_head for the list itself, the list_head of all
elements, or only some of them?

For the latter mixed case, the __rare_write_ptr() approach probably
doesn't work, and that rules out the mm-based approach, too, I guess. :/

> I think we're better doing a full disabling of RO protection, but
> there had been objections to this idea in the past, which is why I
> included the "complex" example, so I don't see a way around it.

My objection to that was that we can't implement CPU-local full
disabling of RO protection for the kernel page tables on some
architectures and configurations, e.g. arm64, or 32-bit arm with LPAE.

The RW alias rare_write(var, val) approach only relies on what arches
already have to implement for userspace to work, so if we can figure out
how to make that work API-wise, we can probably implement that
generically with switch_mm() and {get,put}_user().

The only other way I can think to make this work would be to make a copy
of the whole swapper page tables, with the write-rarely data marked RW.
For arm64 at least, that'll be incredibly painful to keep in-sync with
the usual tables, in addition to being very expensive to switch to/from.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-02-28 21:35     ` Kees Cook
  2017-02-28 22:54       ` Andy Lutomirski
@ 2017-03-01 10:59       ` Mark Rutland
  1 sibling, 0 replies; 33+ messages in thread
From: Mark Rutland @ 2017-03-01 10:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, kernel-hardening, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Tue, Feb 28, 2017 at 01:35:13PM -0800, Kees Cook wrote:
> On Tue, Feb 28, 2017 at 11:33 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Mon, Feb 27, 2017 at 12:43 PM, Kees Cook <keescook@chromium.org> wrote:
> >> Based on PaX's x86 pax_{open,close}_kernel() implementation, this
> >> allows HAVE_ARCH_RARE_WRITE to work on x86.
> >>
> >> There is missing work to sort out some header file issues where preempt.h
> >> is missing, though it can't be included in pg_table.h unconditionally...
> >> some other solution will be needed, perhaps an entirely separate header
> >> file for rare_write()-related defines...
> >>
> >> This patch is also missing paravirt support.
> >>
> >> Signed-off-by: Kees Cook <keescook@chromium.org>
> >> ---
> >> [...]
> >> +static inline unsigned long __arch_rare_write_unmap(void)
> >> +{
> >> +       unsigned long cr0;
> >> +
> >> +       barrier();
> >> +       cr0 = read_cr0() ^ X86_CR0_WP;
> >> +       BUG_ON(!(cr0 & X86_CR0_WP));
> >> +       write_cr0(cr0);
> >> +       barrier();
> >> +       preempt_enable_no_resched();
> >> +       return cr0 ^ X86_CR0_WP;
> >> +}

> > Can't we at least make this:
> >
> > struct rare_write_mapping {
> >   void *addr;
> >   struct arch_rare_write_state arch_state;
> > };
> >
> > static inline struct rare_write_mapping __arch_rare_write_map(void
> > *addr, size_t len);
> > static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
> 
> How do you envision this working with the more complex things I
> included in the latter patches of the series, namely doing linked list
> updates across multiple structure instances, etc?
> 
> ie, poor list manipulation pseudo-code:
> 
> turn off read-only;
> struct_one->next = struct_three->node;
> struct_three->prev = struct_one->node;
> struct_two->prev = struct_two->next = NULL;
> turn on read-only;
> 
> That's three separate memory areas involved...

Do we need all list manipulation primitives, or is is just list_add()
and list_del() that we really care about?

If we only need to support a limited set of primitives, then we could
special-case those, e.g. for the above:

	rare_write_map();
	__rare_write(struct_one->next, struct_tree->node);
	__rare_write(struct_three->prev, struct_one->node);
	__rare_write(struct_two->prev, NULL);
	__rare_write(struct_two->next, NULL);
	rare_write_unmap();

... then the __rare_write() can map/unmap a separate page table if
required. We can have separate rare_write() and __rare_write() to fold
in the rare_write_{map,unmap}(), as with {__,}copy_*_user().

That doesn't work if we need arbitrary primitives, certainly.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-02-28 23:52         ` Kees Cook
@ 2017-03-01 11:24           ` Mark Rutland
  2017-03-01 20:25             ` Kees Cook
  2017-03-03  0:59             ` Hoeun Ryu
  0 siblings, 2 replies; 33+ messages in thread
From: Mark Rutland @ 2017-03-01 11:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, kernel-hardening, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Tue, Feb 28, 2017 at 03:52:26PM -0800, Kees Cook wrote:
> On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
> >>> Can't we at least make this:
> >>>
> >>> struct rare_write_mapping {
> >>>   void *addr;
> >>>   struct arch_rare_write_state arch_state;
> >>> };
> >>>
> >>> static inline struct rare_write_mapping __arch_rare_write_map(void
> >>> *addr, size_t len);
> >>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
> >>
> >> How do you envision this working with the more complex things I
> >> included in the latter patches of the series, namely doing linked list
> >> updates across multiple structure instances, etc?
> >>
> >> ie, poor list manipulation pseudo-code:
> >>
> >> turn off read-only;
> >> struct_one->next = struct_three->node;
> >> struct_three->prev = struct_one->node;
> >> struct_two->prev = struct_two->next = NULL;
> >> turn on read-only;
> >>
> >> That's three separate memory areas involved...
> >
> > Fair enough.  That being said, how is this supposed to work on
> > architectures that don't have a global "disable write protection" bit?
> > Surely these architectures exist.
> 
> I don't know. :) That's part of the reason for putting up this series:
> looking to see what's possible on other architectures. It's not clear
> to me what arm64 can do, for example.

There is no global override of this sort on arm64. Just having map/unmap,
open/close, shed/unshed, etc, won't work.

The options I can think of for arm64 are:

* Have a separate RW alias of just the write_rarely data, that we
  temporarily map-in on a given CPU (using TTBR0) to perform the write.
  The RW alias is at a different VA to the usual RO alias, so we have to
  convert each pointer to its RW alias to perform the write. That's why
  we need __rare_write_ptr() to hide this, and can't have uninstrumented
  writes.
  
  Since this would *only* map the write_rarely data, it's simple to set
  up, and we don't need to modify the tables at runtime.
  
  I also think we can implement this generically using switch_mm() and
  {get,put}_user(), or specialised variants thereof.

  Assuming we can figure out how to handle those complex cases, this is
  my preferred solution. :)
  
* Have a copy of the entire swapper page tables, which differs only in
  the write_rarely data being RW. We then switch TTBR1 over to this for
  the duration of the rare_write_map() .. rare_write_unmap() sequence.

  While this sounds conceptually simple, it's far more complex in
  practice. Keeping the not-quite-clone of the swapper in-sync is going
  to be very painful and ripe for error.

  Additionally, the TTBR1 switch will be very expensive, far more so
  than the TTBR0 switch due to TLB maintenance and other work (including
  switching TTBR0 twice per TTBR1 switch, itself requiring synchronised
  TLB maintenance and other work).

  I am not fond of this approach.

* Give up on this being per-cpu. Either change the permissions in place,
  or fixmap an RW alias as required. In either case, all CPUs will have
  RW access.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()
  2017-03-01  5:41     ` Kees Cook
@ 2017-03-01 11:30       ` Russell King - ARM Linux
  2017-03-02  0:08         ` Kees Cook
  2017-03-01 11:50       ` Mark Rutland
  1 sibling, 1 reply; 33+ messages in thread
From: Russell King - ARM Linux @ 2017-03-01 11:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, x86

On Tue, Feb 28, 2017 at 09:41:07PM -0800, Kees Cook wrote:
> On Tue, Feb 28, 2017 at 5:04 PM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Mon, Feb 27, 2017 at 12:43:03PM -0800, Kees Cook wrote:
> >> Based on grsecurity's ARM pax_{open,close}_kernel() implementation, this
> >> allows HAVE_ARCH_RARE_WRITE to work on ARM.
> >
> > This has the effect that any memory mapped with DOMAIN_KERNEL will
> > loose it's NX status, and may end up being read into the I-cache.
> 
> Arbitrarily so, or only memory accessed/pre-fetched by the CPU when in
> this state? i.e. since this is non-preempt, only touches the needed
> memory, and has the original domain state restored within a few
> instructions, does this avoid the problem? It seems like the chance
> for a speculative prefetch from device memory under these conditions
> should be approaching zero.

"The software that defines a translation table must mark any region of
 memory that is read-sensitive as execute-never, to avoid the possibility
 of a speculative fetch accessing the memory region. For example, it must
 mark any memory region that corresponds to a read-sensitive peripheral
 as Execute-never."

Also see:

commit 247055aa21ffef1c49dd64710d5e94c2aee19b58
Author: Catalin Marinas <catalin.marinas@arm.com>
Date:   Mon Sep 13 16:03:21 2010 +0100

    ARM: 6384/1: Remove the domain switching on ARMv6k/v7 CPUs

which removed the domain switching I referred to previously.

The way the ARM ARM looks at instruction speculative prefetch is that it
can happen to any location that is not explicitly marked as Execute-never.
(This is because the ARM ARM doesn't define an implementation.)  So, we
have to assume that any location that is not marked XN may be speculatively
prefetched by the processor.

Device memory can be read-sensitive - eg, reading an interrupt status
 register can clear the pending interrupt bits.

A speculative prefetch is a read as far as a device is concerned, so
bypassing the XN permission by switching the domain to manager mode has
the effect that the processor can then _legally_ speculatively prefetch
from a device, and if it happens to hit a device that contains a read
sensitive location, the side effects of reading that location will
happen, even though the program did not perform an explicit read.

> Just to make sure I understand: it was only speculative prefetch vs
> icache, right? Would an icache flush restore the correct permissions?

It's not about permissions, it's about the side effects at the device
of a read created by the speculative prefetch.

> I'm just pondering alternatives. Also, is there a maximum distance the
> prefetching spans? i.e. could device memory be guaranteed to be
> vmapped far enough away from kernel memory to avoid prefetches?

The root cause of this problem is the way we lump both vmalloc() and
ioremap() mappings into the same memory space (vmalloc region) without
caring about the domain.

If all device memory was guaranteed to be placed under a different
domain, then this problem would not exist.  In order to achieve that,
there's several ways I can think of doing it:

1) Have separate virtual memory regions for ioremap() and vmalloc()
   We would need to choose an arbitrary limit on the size of these
   memory pools, which may not suit everyone.

2) Have vmalloc() grow up as a heap, ioremap() grow down as a stack
   and a dynamic boundary (aligned to 1 or 2MB) between the two, no
   mixing allowed.  This avoids the problem with (1) but still results
   in the required separation.

3) Align vmalloc region allocations to 2MB, but this would be very
   wasteful.

4) Only permit same type (ioremap/vmalloc) of mapping within a 2MB block
   of vmalloc space.  In other words, a primary allocator of 2MB blocks
   and a sub-allocator of page-sized blocks (think of the way our
   page allocator vs slab works.)  Probably going to be subject to
   fragmentation problems.

5) Place all vmalloc() and ioremap() mappings under a separate domain,
   so that all these mappings would be unaffected by the change of
   domain settings (the resulting permissions would never change.)
   In other words, DOMAIN_IO becomes DOMAIN_VMALLOC and is used for all
   mappings in vmalloc space.

The problem with (2) and (5) is teaching pte_alloc_kernel() down to
pmd_populate_kernel() about the differences - currently, this only ever
sets up DOMAIN_KERNEL mappings because there's no way for it to know
what kind of mapping is required.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()
  2017-03-01  5:41     ` Kees Cook
  2017-03-01 11:30       ` Russell King - ARM Linux
@ 2017-03-01 11:50       ` Mark Rutland
  1 sibling, 0 replies; 33+ messages in thread
From: Mark Rutland @ 2017-03-01 11:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Russell King - ARM Linux, kernel-hardening, Andy Lutomirski,
	Hoeun Ryu, PaX Team, Emese Revfy, x86

Hi,

On Tue, Feb 28, 2017 at 09:41:07PM -0800, Kees Cook wrote:
> On Tue, Feb 28, 2017 at 5:04 PM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Mon, Feb 27, 2017 at 12:43:03PM -0800, Kees Cook wrote:
> >> Based on grsecurity's ARM pax_{open,close}_kernel() implementation, this
> >> allows HAVE_ARCH_RARE_WRITE to work on ARM.
> >
> > This has the effect that any memory mapped with DOMAIN_KERNEL will
> > loose it's NX status, and may end up being read into the I-cache.
> 
> Arbitrarily so, or only memory accessed/pre-fetched by the CPU when in
> this state? 

It can happen while the MMU says the VA is executable, though "while" is
difficult to define; see below.

> i.e. since this is non-preempt, only touches the needed memory, and
> has the original domain state restored within a few instructions, does
> this avoid the problem? It seems like the chance for a speculative
> prefetch from device memory under these conditions should be
> approaching zero.

It's entirely possible for the I-cache to fetch within this window, even
if unlikely.

It is a bug to map devices without NX (on cores that have it), even
instantaneously.

You can't reason about this in terms of number of instructions, since
bits of the CPU can operate asynchronously anyhow. For example, the CPU
might stall immediately after the domain switch, fetching the next
instruction, and in the meantime the I-cache decides to send off
requests for some other arbitrary locations while waiting for a
response.

> > We used to do exactly this to support set_fs(KERNEL_DS) but it was
> > deemed to be way too problematical (for speculative prefetching)
> > to use it on ARMv6+.
> >
> > As vmalloc space ends up with a random mixture of DOMAIN_KERNEL and
> > DOMAIN_IO mappings (due to the order of ioremap() vs vmalloc()), this
> > means DOMAIN_KERNEL can cover devices... which with switching
> > DOMAIN_KERNEL to manager mode result in the NX being removed for
> > device mappings, which (iirc) is unpredictable.
> 
> Just to make sure I understand: it was only speculative prefetch vs
> icache, right? Would an icache flush restore the correct permissions?

The problem is that the fetch itself can be destructive. It can change
the state of a device (see below for an example), or trigger
(asynchronous) errors from the endpoint or interconnect.

No amount of cache maintenance can avoid this.

> I'm just pondering alternatives. Also, is there a maximum distance the
> prefetching spans? i.e. could device memory be guaranteed to be
> vmapped far enough away from kernel memory to avoid prefetches?

There is no practical limitation. The architecture permits a CPU's
I-cache to fetch from any mapping which does not have NX, at any point
in time that mapping is live, for any reason it sees fit.

For example, see commit b6ccb9803e90c16b ("ARM: 7954/1: mm: remove
remaining domain support from ARMv6").

In that case, while executing some kernel code (e.g. the sys_exec()
path), Cortex-A15's instruction fetch would occasionally fetch from the
GIC, ACKing interrupts in the process.

The only solution is to never map devices without NX.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-03-01 10:43       ` Mark Rutland
@ 2017-03-01 20:13         ` Kees Cook
  2017-03-01 20:31           ` Kees Cook
  2017-03-01 21:00           ` Andy Lutomirski
  0 siblings, 2 replies; 33+ messages in thread
From: Kees Cook @ 2017-03-01 20:13 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Hoeun Ryu, kernel-hardening, Andy Lutomirski, PaX Team,
	Emese Revfy, Russell King, x86

On Wed, Mar 1, 2017 at 2:43 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Tue, Feb 28, 2017 at 07:05:32AM -0800, Kees Cook wrote:
>> On Tue, Feb 28, 2017 at 12:22 AM, Hoeun Ryu <hoeun.ryu@gmail.com> wrote:
>> > On Mon, 2017-02-27 at 12:42 -0800, Kees Cook wrote:
>
>> >> +/*
>> >> + * Build "write rarely" infrastructure for flipping memory r/w
>> >> + * on a per-CPU basis.
>> >> + */
>> >> +#ifndef CONFIG_HAVE_ARCH_RARE_WRITE
>> >> +# define __wr_rare
>> >> +# define __wr_rare_type
>> >> +# define __rare_write_type(v)        typeof(v)
>> >> +# define __rare_write_ptr(v) (&(v))
>> >> +# define __rare_write(__var, __val)  ({              \
>> >> +     __var = __val;                                  \
>> >> +     __var;                                          \
>> >> +})
>> >> +# define rare_write_enable() do { } while (0)
>> >> +# define rare_write_disable()        do { } while (0)
>> >> +#else
>> >> +# define __wr_rare           __ro_after_init
>> >> +# define __wr_rare_type              const
>> >> +# define __rare_write_type(v)        typeof((typeof(v))0)
>> >> +# define __rare_write_ptr(v) ((__rare_write_type(v) *)&(v))
>> >
>> > Should we make __rare_write_ptr arch-specific so architectures can
>> > convert the pointer to rw alias from ro address like this ? [1]
>> >
>> > #ifndef __arch_rare_write_ptr
>> > # define __rare_write_ptr(v)    ((__rare_write_type(v) *)&(v))
>> > #else
>> > # define __rare_write_ptr(v)    __arch_rare_write_ptr(v)
>> > #endif
>>
>> I think that was Mark's idea too, but I'm not sure to handle the
>> complex cases of doing multiple updates at once (e.g. linked list
>> updates, etc).
>
> Assuming there aren't that many places with complex updates, we could
> allow code to explicitly do the map/unmap and use __rare_write_ptr
> directly. That does mean that the list code has to be specialised for
> __wr_rare, which is less than ideal.

Here's some sed output: http://paste.ubuntu.com/24092015/

grsecurity currently has 314 instances of using
pax_open/close_kernel(). About half of them have just a single line of
code between the open/close calls, but there is a lot of variation in
how they're used:

count  lines-of-code
    164 1
     72 2
     21 3
     20 4
      2 5
      8 6
      3 7
      2 8
      1 9
     18 10+

The obvious helpers are in the list code (for double, rcu, and single
lists), with most header-file uses operating on page tables.

The list code is already specialized (see the rare_list_add()/del()
functions), and could be made even more explicit if needed.

> That would also depend on what is __rw_rare in those cases, too.  i.e.
> is it just the list_head for the list itself, the list_head of all
> elements, or only some of them?

lib/list_debug.c:EXPORT_SYMBOL(__pax_list_add);
lib/list_debug.c:EXPORT_SYMBOL(pax_list_del);
lib/list_debug.c:EXPORT_SYMBOL(pax_list_del_init);
lib/list_debug.c:EXPORT_SYMBOL(__pax_list_add_rcu);
lib/list_debug.c:EXPORT_SYMBOL(pax_list_del_rcu);
lib/llist.c:EXPORT_SYMBOL_GPL(pax_llist_add_batch);

I haven't exhaustively checked, but I assume it could touch any of the
list pieces? I suspect there are mixtures of read-only and
non-read-only elements, but I haven't dug that far yet. From what I
can see in the cgroup case I showed, all the struct cftype variables
were in .data (static struct cftype foo[] = { ... }) so no kmalloc nor
vmalloc ones are mixed in. But I doubt we can entirely depend on
that...

> For the latter mixed case, the __rare_write_ptr() approach probably
> doesn't work, and that rules out the mm-based approach, too, I guess. :/
>
>> I think we're better doing a full disabling of RO protection, but
>> there had been objections to this idea in the past, which is why I
>> included the "complex" example, so I don't see a way around it.
>
> My objection to that was that we can't implement CPU-local full
> disabling of RO protection for the kernel page tables on some
> architectures and configurations, e.g. arm64, or 32-bit arm with LPAE.

The CPU-local part is a pretty important requirement... Hrmm

> The RW alias write_write(var, val) approach only relies on what arches
> already have to implement for userspace to work, so if we can figure out
> how to make that work API-wise, we can probably implement that
> generically with switch_mm() and {get,put}_user().

With a third mm? I maybe misunderstood what you meant about userspace...

> The only other way I can think to make this work would be to make a copy
> of the whole swapper page tables, with the write-rarely data marked RW.
> For arm64 at least, that'll be incredibly painful to keep in-sync with
> the usual tables, in addition to being very expensive to switch to/from.

Well, I think sync shouldn't be hard since it'd just be a mirror of
the .rodata section, which is configured once. (Well, and for
modules... oof.)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-03-01 11:24           ` Mark Rutland
@ 2017-03-01 20:25             ` Kees Cook
  2017-03-02 11:20               ` Mark Rutland
  2017-03-03  0:59             ` Hoeun Ryu
  1 sibling, 1 reply; 33+ messages in thread
From: Kees Cook @ 2017-03-01 20:25 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andy Lutomirski, kernel-hardening, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Wed, Mar 1, 2017 at 3:24 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Feb 28, 2017 at 03:52:26PM -0800, Kees Cook wrote:
>> On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>> >>> Can't we at least make this:
>> >>>
>> >>> struct rare_write_mapping {
>> >>>   void *addr;
>> >>>   struct arch_rare_write_state arch_state;
>> >>> };
>> >>>
>> >>> static inline struct rare_write_mapping __arch_rare_write_map(void
>> >>> *addr, size_t len);
>> >>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>> >>
>> >> How do you envision this working with the more complex things I
>> >> included in the latter patches of the series, namely doing linked list
>> >> updates across multiple structure instances, etc?
>> >>
>> >> ie, poor list manipulation pseudo-code:
>> >>
>> >> turn off read-only;
>> >> struct_one->next = struct_three->node;
>> >> struct_three->prev = struct_one->node;
>> >> struct_two->prev = struct_two->next = NULL;
>> >> turn on read-only;
>> >>
>> >> That's three separate memory areas involved...
>> >
>> > Fair enough.  That being said, how is this supposed to work on
>> > architectures that don't have a global "disable write protection" bit?
>> > Surely these architectures exist.
>>
>> I don't know. :) That's part of the reason for putting up this series:
>> looking to see what's possible on other architectures. It's not clear
>> to me what arm64 can do, for example.
>
> There is no global override of this sort on arm64. Just having map/unap,
> open/close, shed/unshed, etc, won't work.
>
> The options I can think of for arm64 are:
>
> * Have a separate RW alias of just the write_rarely data, that we
>   temporarily map-in on a given CPU (using TTBR0) to perform the write.
>   The RW alias is at a different VA to the usual RO alias, so we have to
>   convert each pointer to its RW alias to perform the write. That's why
>   we need __rare_write_ptr() to hide this, and can't have uninstrumented
>   writes.

I think only the list code isn't instrumented, and that's just because
it discards casts outside the function. There's no reason it couldn't
be instrumented. All the other places I can see are using the
const_cast() explicitly because otherwise gcc freaks out (since the
constify plugin forces the variables const). There are a few places
where things aren't explicitly const_cast() (like in the modules).

Oh hey, I just found a bug in the open/close stuff; that's why a giant
routine was misdetected by my scripts... it just changes one of the
outliers though. I'll send a separate email with the correction...

>   Since this would *only* map the write_rarely data, it's simple to set
>   up, and we don't need to modify the tables at runtime.
>
>   I also think we can implement this generically using switch_mm() and
>   {get,put}_user(), or specialised variants thereof.
>
>   Assuming we can figure out how to handle those complex cases, this is
>   my preferred solution. :)

Would this alias be CPU-local? (I assume yes, given the "give up on
being per-cpu" option below...)

> * Have a copy of the entire swapper page tables, which differs only in
>   the write_rarely data being RW. We then switch TTBR1 over to this for
>   the duration of the rare_write_map() .. write_write_unmap() sequence.
>
>   While this sounds conceptually simple, it's far more complex in
>   practice. Keeping the not-quite-clone of the swapper in-sync is going
>   to be very painful and ripe for error.
>
>   Additionally, the TTBR1 switch will be very expensive, far more so
>   than the TTBR0 switch due to TLB maintenance and other work (including
>   switching TTBR0 twice per TTBR1 switch, itself requiring synchronised
>   TLB maintenance and other work).
>
>   I am not fond of this approach.
>
> * Give up on this being per-cpu. Either change the permissions in place,
>   or fixmap an RW alias as required. In either case, all CPUs will have
>   RW access.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-03-01 20:13         ` Kees Cook
@ 2017-03-01 20:31           ` Kees Cook
  2017-03-01 21:00           ` Andy Lutomirski
  1 sibling, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-03-01 20:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Hoeun Ryu, kernel-hardening, Andy Lutomirski, PaX Team,
	Emese Revfy, Russell King, x86

On Wed, Mar 1, 2017 at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:

> Here's some sed output: http://paste.ubuntu.com/24092015/
>
> grsecurity currently has 314 instances of using
> pax_open/close_kernel(). The number of lines of code between them is
> about half a single line, but there is a lot of variation on how it's
> used:
>
> count  lines-of-code
>     164 1
>      72 2
>      21 3
>      20 4
>       2 5
>       8 6
>       3 7
>       2 8
>       1 9
>      18 10+

Oops, a bug in grsecurity fooled my scripts and evaded detection.
There's another 3-line use. If you search the pastebin for
pax_open_kernel, you can see a giant bit in the middle that isn't
supposed to be there:

drivers/pci/hotplug/cpcihp_zt5550.c
       pax_open_kernel();
       const_cast(zt5550_hpc_ops.enable_irq) = zt5550_hc_enable_irq;
       const_cast(zt5550_hpc_ops.disable_irq) = zt5550_hc_disable_irq;
       const_cast(zt5550_hpc_ops.check_irq) = zt5550_hc_check_irq;
       pax_open_kernel();

I'll send a patch...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-03-01 20:13         ` Kees Cook
  2017-03-01 20:31           ` Kees Cook
@ 2017-03-01 21:00           ` Andy Lutomirski
  2017-03-01 23:14             ` Kees Cook
  2017-03-02 11:19             ` Mark Rutland
  1 sibling, 2 replies; 33+ messages in thread
From: Andy Lutomirski @ 2017-03-01 21:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: Mark Rutland, Hoeun Ryu, kernel-hardening, Andy Lutomirski,
	PaX Team, Emese Revfy, Russell King, x86

On Wed, Mar 1, 2017 at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Mar 1, 2017 at 2:43 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> My objection to that was that we can't implement CPU-local full
>> disabling of RO protection for the kernel page tables on some
>> architectures and configurations, e.g. arm64, or 32-bit arm with LPAE.
>
> The CPU-local is a pretty important requirement... Hrmm

Why?  Presumably we'd do pretty well if we made an alias somewhere at
a random address and got rid of it quickly.

>
>> The RW alias write_write(var, val) approach only relies on what arches
>> already have to implement for userspace to work, so if we can figure out
>> how to make that work API-wise, we can probably implement that
>> generically with switch_mm() and {get,put}_user().
>
> With a third mm? I maybe misunderstood what you meant about userspace...
>

I think I'd like this series a lot better if the arch hooks were
designed in such a way that a generic implementation could be backed
by kmap, use_mm, or similar.  This would *require* that the arch hooks
be able to return a different address.  Also, I see no reason that the
list manipulation needs to be particularly special.  If the arch hook
sets up an alias, couldn't the generic code just call it twice?

So here's a new proposal for how the hooks could look:

void __arch_rare_write_begin(void);
void *__arch_rare_write_map(void *addr, size_t len);
void *__arch_rare_write_unmap(void *addr, size_t len);
void __arch_rare_write_end(void);

or an even simpler variant:

void __arch_rare_write_begin(void);
void __arch_rare_write(void *dest, const void *source, size_t len);
void __arch_rare_write_end(void);

Now a generic implementation could work by allocating a percpu
mm_struct that contains a single giant VMA.  __arch_rare_write_begin()
switches to that mm.  __arch_rare_write pokes some PTEs into the mm
and calls copy_to_user().  __arch_rare_write_end() switches back to
the original mm.  An x86 implementation could just fiddle with CR0.WP.
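
Very roughly, a sketch of that generic variant (rare_write_mm and
rare_write_alias() are made-up names, the boot-time VMA/PTE setup is
elided, and use_mm()/unuse_mm() would likely need specialised variants):

/* Hypothetical per-CPU patching mm, populated at boot with one big VMA. */
static DEFINE_PER_CPU(struct mm_struct *, rare_write_mm);

void __arch_rare_write_begin(void)
{
	preempt_disable();
	use_mm(this_cpu_read(rare_write_mm));
}

void __arch_rare_write(void *dest, const void *src, size_t len)
{
	/* Made-up helper: poke PTEs so 'dest' has an RW alias in the
	 * patching mm, and return that alias. */
	void __user *alias = rare_write_alias(dest, len);

	BUG_ON(copy_to_user(alias, src, len));
}

void __arch_rare_write_end(void)
{
	unuse_mm(this_cpu_read(rare_write_mm));
	preempt_enable_no_resched();
}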

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-03-01 21:00           ` Andy Lutomirski
@ 2017-03-01 23:14             ` Kees Cook
  2017-03-02 11:19             ` Mark Rutland
  1 sibling, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-03-01 23:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mark Rutland, Hoeun Ryu, kernel-hardening, Andy Lutomirski,
	PaX Team, Emese Revfy, Russell King, x86

On Wed, Mar 1, 2017 at 1:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Mar 1, 2017 at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Mar 1, 2017 at 2:43 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> My objection to that was that we can't implement CPU-local full
>>> disabling of RO protection for the kernel page tables on some
>>> architectures and configurations, e.g. arm64, or 32-bit arm with LPAE.
>>
>> The CPU-local is a pretty important requirement... Hrmm
>
> Why?  Presumably we'd do pretty well if we made an alias somewhere at
> a random address and got rid of it quickly.

I'd much rather avoid any risk like this, just so the protection could
be safely reasoned about...

>>> The RW alias write_write(var, val) approach only relies on what arches
>>> already have to implement for userspace to work, so if we can figure out
>>> how to make that work API-wise, we can probably implement that
>>> generically with switch_mm() and {get,put}_user().
>>
>> With a third mm? I maybe misunderstood what you meant about userspace...
>
> I think I'd like this series a lot better if the arch hooks were
> designed in such a way that a generic implementation could be backed
> by kmap, use_mm, or similar.  This would *require* that the arch hooks
> be able to return a different address.  Also, I see no reason that the
> list manipulation needs to be particularly special.  If the arch hook

Yeah, there's nothing special about the list manipulation except that
it already dropped the const casts, so adding them back is trivial.

> sets up an alias, couldn't the generic code just call it twice.
>
> So here's a new proposal for how the hooks could look:
>
> void __arch_rare_write_begin(void);
> void *__arch_rare_write_map(void *addr, size_t len);
> void *__arch_rare_write_unmap(void *addr, size_t len);
> void __arch_rare_write_end(void);
>
> or an even simpler variant:
>
> void __arch_rare_write_begin(void);
> void __arch_rare_write(void *dest, const void *source, size_t len);
> void __arch_rare_write_end(void);

Based on your and Mark's feedback from last year, I'd probably create
rare_write() as a macro that examines types and calls some other
macro that has the above parameters but enforces the
builtin_const-ness of "len", before ultimately resolving to
__arch_rare_write() as above, just to be as type-safe as we can
manage...

#define __rare_write_n(dst, src, len)    ({ \
    BUILD_BUG_ON(!__builtin_constant_p(len));   \
    __arch_rare_write((dst), (src), (len)); \
})
#define __rare_write(var, val)     __rare_write_n(&(var), &(val), sizeof(var))
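
(and a top-level rare_write() could then fold in the begin/end calls
around that, along the lines of:

#define rare_write(var, val) ({			\
	__arch_rare_write_begin();		\
	__rare_write(var, val);			\
	__arch_rare_write_end();		\
	(var);					\
})

...keeping rare_write() for the simple one-shot case while leaving the
begin/end pair available for the multi-write cases.)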

> Now a generic implementation could work by allocating a percpu
> mm_struct that contains a single giant VMA.  __arch_rare_write_begin()
> switches to that mm.  __arch_rare_write pokes some PTEs into the mm
> and calls copy_to_user().  __arch_rare_write_end() switches back to
> the original mm.  An x86 implementation could just fiddle with CR0.WP.

If that works for Russell and Mark, cool by me. :) I'll reorganize my
series with the new API...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()
  2017-03-01 11:30       ` Russell King - ARM Linux
@ 2017-03-02  0:08         ` Kees Cook
  0 siblings, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-03-02  0:08 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: kernel-hardening, Mark Rutland, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, x86

On Wed, Mar 1, 2017 at 3:30 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Feb 28, 2017 at 09:41:07PM -0800, Kees Cook wrote:
>> On Tue, Feb 28, 2017 at 5:04 PM, Russell King - ARM Linux
>> <linux@armlinux.org.uk> wrote:
>> > On Mon, Feb 27, 2017 at 12:43:03PM -0800, Kees Cook wrote:
>> >> Based on grsecurity's ARM pax_{open,close}_kernel() implementation, this
>> >> allows HAVE_ARCH_RARE_WRITE to work on ARM.
>> >
>> > This has the effect that any memory mapped with DOMAIN_KERNEL will
> lose its NX status, and may end up being read into the I-cache.
>>
>> Arbitrarily so, or only memory accessed/pre-fetched by the CPU when in
>> this state? i.e. since this is non-preempt, only touches the needed
>> memory, and has the original domain state restored within a few
>> instructions, does this avoid the problem? It seems like the chance
>> for a speculative prefetch from device memory under these conditions
>> should be approaching zero.
>
> "The software that defines a translation table must mark any region of
>  memory that is read-sensitive as execute-never, to avoid the possibility
>  of a speculative fetch accessing the memory region. For example, it must
>  mark any memory region that corresponds to a read-sensitive peripheral
>  as Execute-never."
>
> Also see:
>
> commit 247055aa21ffef1c49dd64710d5e94c2aee19b58
> Author: Catalin Marinas <catalin.marinas@arm.com>
> Date:   Mon Sep 13 16:03:21 2010 +0100
>
>     ARM: 6384/1: Remove the domain switching on ARMv6k/v7 CPUs
>
> which removed the domain switching I referred to previously.
>
> The way the ARM ARM looks at instruction speculative prefetch is that it
> can happen to any location that is not explicitly marked as Execute-never.
> (This is because the ARM ARM doesn't define an implementation.)  So, we
> have to assume that any location that is not marked XN may be speculatively
> prefetched by the processor.
>
> Device memory can be read-sensitive - eg, reading an interrupt status
> register can clear the ending interrupt bits.

OOoh, yes, this is the part that wasn't getting through my head. I was
stuck thinking it was an XN bypass (due to the icache mention). Got
it.

> A speculative prefetch is a read as far as a device is concerned, so
> bypassing the XN permission by switching the domain to manager mode has
> the effect that the processor can then _legally_ speculatively prefetch
> from a device, and if it happens to hit a device that contains a read
> sensitive location, the side effects of reading that location will
> happen, even though the program did not perform an explicit read.
>
>> Just to make sure I understand: it was only speculative prefetch vs
>> icache, right? Would an icache flush restore the correct permissions?
>
> It's not about permissions, it's about the side effects at the device
> of a read created by the speculative prefetch.
>
>> I'm just pondering alternatives. Also, is there a maximum distance the
>> prefetching spans? i.e. could device memory be guaranteed to be
>> vmapped far enough away from kernel memory to avoid prefetches?
>
> The root cause of this problem is the way we lump both vmalloc() and
> ioremap() mappings into the same memory space (vmalloc region) without
> caring about the domain.

Okay, I patched ptdump (badly) to answer some questions I had on this,
but it looks like everything I was curious about is DOMAIN_KERNEL.

The only memory that write-rarely would need to touch is the .rodata
segment, which we could put under a different domain that is otherwise
identical to DOMAIN_KERNEL. This would apply to modules too, since those
appear to be mapped below the kernel and are not part of the vmalloc
area. Instead of dealing with vmalloc vs ioremap, don't we just need to
adjust the domain of the kernel (or just rodata) and module mappings
then? We don't need to touch vmalloc at all.
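
(i.e. something vaguely like the below, where DOMAIN_WR_RARE is a made-up
new domain that only the rodata/module-rodata mappings would use, so
DOMAIN_KERNEL and everything in the vmalloc/ioremap space stays untouched:)

static inline void __arch_rare_write_map(void)
{
	preempt_disable();
	/* manager access on the write-rarely domain: RO permissions ignored */
	modify_domain(DOMAIN_WR_RARE, DOMAIN_MANAGER);
}

static inline void __arch_rare_write_unmap(void)
{
	/* back to client access: the RO permissions apply again */
	modify_domain(DOMAIN_WR_RARE, DOMAIN_CLIENT);
	preempt_enable();
}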

Maybe I'm (still) missing something...

-Kees

> If all device memory was guaranteed to be placed under a different
> domain, then this problem would not exist.  In order to achieve that,
> there's several ways I can think of doing it:
>
> 1) Have separate virtual memory regions for ioremap() and vmalloc()
>    We would need to choose an arbitary limit on the size of these
>    memory pools, which may not suit everyone.
>
> 2) Have vmalloc() grow up as a heap, ioremap() grow down as a stack
>    and a dynamic boundary (aligned to 1 or 2MB) between the two, no
>    mixing allowed.  This avoids the problem with (1) but still results
>    in the required separation.
>
> 3) Align vmalloc region allocations to 2MB, but this would be very
>    wasteful.
>
> 4) Only permit same type (ioremap/vmalloc) of mapping within a 2MB block
>    of vmalloc space.  In other words, a primary allocator of 2MB blocks
>    and a sub-allocator of page-sized blocks (think of the way our
>    page allocator vs slab works.)  Probably going to be subject to
>    fragmentation problems.
>
> 5) Place all vmalloc() and ioremap() mappings under a separate domain,
>    so that all these mappings would be unaffected by the change of
>    domain settings (the resulting permissions would never change.)
>    In other words, DOMAIN_IO becomes DOMAIN_VMALLOC and is used for all
>    mappings in vmalloc space.
>
> The problem with (2) and (5) is teaching pte_alloc_kernel() down to
> pmd_populate_kernel() about the differences - currently, this only ever
> sets up DOMAIN_KERNEL mappings because there's no way for it to know
> what kind of mapping is required.
>
> --
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-03-01 21:00           ` Andy Lutomirski
  2017-03-01 23:14             ` Kees Cook
@ 2017-03-02 11:19             ` Mark Rutland
  2017-03-02 16:33               ` Andy Lutomirski
  1 sibling, 1 reply; 33+ messages in thread
From: Mark Rutland @ 2017-03-02 11:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Hoeun Ryu, kernel-hardening, Andy Lutomirski,
	PaX Team, Emese Revfy, Russell King, x86

On Wed, Mar 01, 2017 at 01:00:05PM -0800, Andy Lutomirski wrote:
> On Wed, Mar 1, 2017 at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Mar 1, 2017 at 2:43 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> >> The RW alias write_write(var, val) approach only relies on what arches
> >> already have to implement for userspace to work, so if we can figure out
> >> how to make that work API-wise, we can probably implement that
> >> generically with switch_mm() and {get,put}_user().
> >
> > With a third mm? I maybe misunderstood what you meant about userspace...
> 
> I think I'd like this series a lot better if the arch hooks were
> designed in such a way that a generic implementation could be backed
> by kmap, use_mm, or similar.  This would *require* that the arch hooks
> be able to return a different address.  Also, I see no reason that the
> list manipulation needs to be particularly special.  If the arch hook
> sets up an alias, couldn't the generic code just call it twice.

I completely agree.

There's some fun to be had with switch_mm/use_mm (e.g. with arm64's
TTBR0_SW_PAN), but I think we can solve that generically.

> So here's a new proposal for how the hooks could look:

> void __arch_rare_write_begin(void);
> void __arch_rare_write(void *dest, const void *source, size_t len);
> void __arch_rare_write_end(void);

I think we're on the same page, API-wise.

Modulo naming, and the len argument to the write function, this is
exactly the same as my original proposal.

I had assumed that we could derive the len argument implicitly from the
object being assigned to, but it doesn't really matter either way.

> Now a generic implementation could work by allocating a percpu
> mm_struct that contains a single giant VMA.  __arch_rare_write_begin()
> switches to that mm.  __arch_rare_write pokes some PTEs into the mm
> and calls copy_to_user().  __arch_rare_write_end() switches back to
> the original mm.  An x86 implementation could just fiddle with CR0.WP.

I'd expected that we'd know where the write_rarely data was up-front, so
we could set up the mapping statically, and just map it in at map/begin,
but otherwise this sounds like what I had in mind.
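
(e.g. if all the write-rarely objects get collected into one section, the
alias only ever has to cover a single contiguous range -- illustrative
only, the attribute and symbol names below are invented:)

/* tag an object as write-rarely; the linker script keeps these together */
#define __wr_rare __attribute__((__section__(".data..rare_write")))

/* hypothetical linker-provided bounds that the begin hook would map RW */
extern char __start_rare_write[], __end_rare_write[];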

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-03-01 20:25             ` Kees Cook
@ 2017-03-02 11:20               ` Mark Rutland
  0 siblings, 0 replies; 33+ messages in thread
From: Mark Rutland @ 2017-03-02 11:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, kernel-hardening, Andy Lutomirski, Hoeun Ryu,
	PaX Team, Emese Revfy, Russell King, X86 ML

On Wed, Mar 01, 2017 at 12:25:11PM -0800, Kees Cook wrote:
> On Wed, Mar 1, 2017 at 3:24 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > There is no global override of this sort on arm64. Just having map/unmap,
> > open/close, shed/unshed, etc, won't work.
> >
> > The options I can think of for arm64 are:
> >
> > * Have a separate RW alias of just the write_rarely data, that we
> >   temporarily map-in on a given CPU (using TTBR0) to perform the write.
> >   The RW alias is at a different VA to the usual RO alias, so we have to
> >   convert each pointer to its RW alias to perform the write. That's why
> >   we need __rare_write_ptr() to hide this, and can't have uninstrumented
> >   writes.
> 
> I think only the list code isn't instrumented, and that's just because
> it discards casts outside the function. There's no reason it couldn't
> be instrumented. 

Ok, it sounds like we could make this work, then.

> >   Since this would *only* map the write_rarely data, it's simple to set
> >   up, and we don't need to modify the tables at runtime.
> >
> >   I also think we can implement this generically using switch_mm() and
> >   {get,put}_user(), or specialised variants thereof.
> >
> >   Assuming we can figure out how to handle those complex cases, this is
> >   my preferred solution. :)
> 
> Would this alias be CPU-local? (I assume yes, given the "give up on on
> being per-cpu" option below..)

Yes, this would be CPU-local. It would be like mapping the idmap, or
userspace.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-03-02 11:19             ` Mark Rutland
@ 2017-03-02 16:33               ` Andy Lutomirski
  2017-03-02 19:48                 ` Kees Cook
  0 siblings, 1 reply; 33+ messages in thread
From: Andy Lutomirski @ 2017-03-02 16:33 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, Hoeun Ryu, kernel-hardening, Andy Lutomirski,
	PaX Team, Emese Revfy, Russell King, x86

On Thu, Mar 2, 2017 at 3:19 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Mar 01, 2017 at 01:00:05PM -0800, Andy Lutomirski wrote:
>> On Wed, Mar 1, 2017 at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
>> > On Wed, Mar 1, 2017 at 2:43 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> >> The RW alias write_write(var, val) approach only relies on what arches
>> >> already have to implement for userspace to work, so if we can figure out
>> >> how to make that work API-wise, we can probably implement that
>> >> generically with switch_mm() and {get,put}_user().
>> >
>> > With a third mm? I maybe misunderstood what you meant about userspace...
>>
>> I think I'd like this series a lot better if the arch hooks were
>> designed in such a way that a generic implementation could be backed
>> by kmap, use_mm, or similar.  This would *require* that the arch hooks
>> be able to return a different address.  Also, I see no reason that the
>> list manipulation needs to be particularly special.  If the arch hook
>> sets up an alias, couldn't the generic code just call it twice.
>
> I completely agree.
>
> There's some fun to be had with switch_mm/use_mm (e.g. with arm64's
> TTBR0_SW_PAN), but I think we can solve that generically.
>
>> So here's a new proposal for how the hooks could look:
>
>> void __arch_rare_write_begin(void);
>> void __arch_rare_write(void *dest, const void *source, size_t len);
>> void __arch_rare_write_end(void);
>
> I think we're on the same page, API-wise.
>
> Modulo naming, and the len argument to the write function, this is
> exactly the same as my original proposal.
>
> I had assumed that we could derive the len argument implicitly from the
> object being assigned to, but it doesn't really matter either way.

I think we can, but I was imagining that this kind of logic would live
in generic code that called __arch_rare_write (or whatever it's
called).
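
(Roughly like this, purely as an illustration -- the generic side derives
the length and the arch hook only ever sees a (dest, src, len) triple:)

#define rare_write(var, val) ({					\
	typeof(var) __rw_val = (val);				\
	__arch_rare_write_begin();				\
	/* the cast drops the const that keeps the target RO at rest */ \
	__arch_rare_write((void *)&(var), &__rw_val, sizeof(var)); \
	__arch_rare_write_end();				\
	__rw_val;						\
})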

>
>> Now a generic implementation could work by allocating a percpu
>> mm_struct that contains a single giant VMA.  __arch_rare_write_begin()
>> switches to that mm.  __arch_rare_write pokes some PTEs into the mm
>> and calls copy_to_user().  __arch_rare_write_end() switches back to
>> the original mm.  An x86 implementation could just fiddle with CR0.WP.
>
> I'd expected that we'd know where the write_rarely data was up-front, so
> we could set up the mapping statically, and just map it in at map/begin,
> but otherwise this sounds like what I had in mind.
>

Duh.  That works too and would be considerably simpler :)

--Andy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 1/8] Introduce rare_write() infrastructure
  2017-03-02 16:33               ` Andy Lutomirski
@ 2017-03-02 19:48                 ` Kees Cook
  0 siblings, 0 replies; 33+ messages in thread
From: Kees Cook @ 2017-03-02 19:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mark Rutland, Hoeun Ryu, kernel-hardening, Andy Lutomirski,
	PaX Team, Emese Revfy, Russell King, x86

On Thu, Mar 2, 2017 at 8:33 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Mar 2, 2017 at 3:19 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> On Wed, Mar 01, 2017 at 01:00:05PM -0800, Andy Lutomirski wrote:
>>> On Wed, Mar 1, 2017 at 12:13 PM, Kees Cook <keescook@chromium.org> wrote:
>>> > On Wed, Mar 1, 2017 at 2:43 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>> Now a generic implementation could work by allocating a percpu
>>> mm_struct that contains a single giant VMA.  __arch_rare_write_begin()
>>> switches to that mm.  __arch_rare_write pokes some PTEs into the mm
>>> and calls copy_to_user().  __arch_rare_write_end() switches back to
>>> the original mm.  An x86 implementation could just fiddle with CR0.WP.
>>
>> I'd expected that we'd know where the write_rarely data was up-front, so
>> we could set up the mapping statically, and just map it in at map/begin,
>> but otherwise this sounds like what I had in mind.
>>
>
> Duh.  That works too and would be considerably simpler :)

It may be slightly complicated by needing to track the module .rodata
areas, but ultimately, yeah, it seems like that could work.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [kernel-hardening] Re: [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap()
  2017-03-01 11:24           ` Mark Rutland
  2017-03-01 20:25             ` Kees Cook
@ 2017-03-03  0:59             ` Hoeun Ryu
  1 sibling, 0 replies; 33+ messages in thread
From: Hoeun Ryu @ 2017-03-03  0:59 UTC (permalink / raw)
  To: Mark Rutland, Kees Cook
  Cc: kernel-hardening, Andy Lutomirski, PaX Team, Emese Revfy,
	Russell King, X86 ML

On Mar 1, 2017, at 8:24 PM, Mark Rutland <mark.rutland@arm.com> wrote:

> On Tue, Feb 28, 2017 at 03:52:26PM -0800, Kees Cook wrote:
>> On Tue, Feb 28, 2017 at 2:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Tue, Feb 28, 2017 at 1:35 PM, Kees Cook <keescook@chromium.org> wrote:
>>>>> Can't we at least make this:
>>>>> 
>>>>> struct rare_write_mapping {
>>>>>         void *addr;
>>>>>         struct arch_rare_write_state arch_state;
>>>>> };
>>>>> 
>>>>> static inline struct rare_write_mapping __arch_rare_write_map(void
>>>>> *addr, size_t len);
>>>>> static inline void __arch_rare_write_unmap(struct rare_write_mapping mapping);
>>>> 
>>>> How do you envision this working with the more complex things I
>>>> included in the latter patches of the series, namely doing linked list
>>>> updates across multiple structure instances, etc?
>>>> 
>>>> ie, poor list manipulation pseudo-code:
>>>> 
>>>> turn off read-only;
>>>> struct_one->next = struct_tree->node;
>>>> struct_three->prev = struct_one->node;
>>>> struct_two->prev = struct_two->next = NULL;
>>>> turn on read-only;
>>>> 
>>>> That's three separate memory areas involved...
>>> 
>>> Fair enough.  That being said, how is this supposed to work on
>>> architectures that don't have a global "disable write protection" bit?
>>> Surely these architectures exist.
>> 
>> I don't know. :) That's part of the reason for putting up this series:
>> looking to see what's possible on other architectures. It's not clear
>> to me what arm64 can do, for example.
> 
> There is no global override of this sort on arm64. Just having map/unmap,
> open/close, shed/unshed, etc, won't work.
> 
> The options I can think of for arm64 are:
> 
> * Have a separate RW alias of just the write_rarely data, that we
> temporarily map-in on a given CPU (using TTBR0) to perform the write.
> The RW alias is at a different VA to the usual RO alias, so we have to
> convert each pointer to its RW alias to perform the write. That's why
> we need __rare_write_ptr() to hide this, and can't have uninstrumented
> writes.
> 
> Since this would *only* map the write_rarely data, it's simple to set
> up, and we don't need to modify the tables at runtime.
> 
> I also think we can implement this generically using switch_mm() and
> {get,put}_user(), or specialised variants thereof.
> 
> Assuming we can figure out how to handle those complex cases, this is
> my preferred solution. :)

I implemented an RFC version of option #1.
It would be appreciated if you could review the code [1].

[1] http://www.openwall.com/lists/kernel-hardening/2017/03/02/5

> 
> * Have a copy of the entire swapper page tables, which differs only in
> the write_rarely data being RW. We then switch TTBR1 over to this for
> the duration of the rare_write_map() .. rare_write_unmap() sequence.
> 
> While this sounds conceptually simple, it's far more complex in
> practice. Keeping the not-quite-clone of the swapper in-sync is going
> to be very painful and ripe for error.
> 
> Additionally, the TTBR1 switch will be very expensive, far more so
> than the TTBR0 switch due to TLB maintenance and other work (including
> switching TTBR0 twice per TTBR1 switch, itself requiring synchronised
> TLB maintenance and other work).
> 
> I am not fond of this approach.
> 
> * Give up on this being per-cpu. Either change the permissions in place,
> or fixmap an RW alias as required. In either case, all CPUs will have
> RW access.
> 
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2017-03-03  0:59 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-27 20:42 [kernel-hardening] [RFC] Introduce rare_write() infrastructure Kees Cook
2017-02-27 20:42 ` [kernel-hardening] [RFC][PATCH 1/8] " Kees Cook
2017-02-28  8:22   ` [kernel-hardening] " Hoeun Ryu
2017-02-28 15:05     ` Kees Cook
2017-03-01 10:43       ` Mark Rutland
2017-03-01 20:13         ` Kees Cook
2017-03-01 20:31           ` Kees Cook
2017-03-01 21:00           ` Andy Lutomirski
2017-03-01 23:14             ` Kees Cook
2017-03-02 11:19             ` Mark Rutland
2017-03-02 16:33               ` Andy Lutomirski
2017-03-02 19:48                 ` Kees Cook
2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 2/8] lkdtm: add test for " Kees Cook
2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 3/8] net: switch sock_diag handlers to rare_write() Kees Cook
2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 4/8] x86: Implement __arch_rare_write_map/unmap() Kees Cook
2017-02-28 19:33   ` [kernel-hardening] " Andy Lutomirski
2017-02-28 21:35     ` Kees Cook
2017-02-28 22:54       ` Andy Lutomirski
2017-02-28 23:52         ` Kees Cook
2017-03-01 11:24           ` Mark Rutland
2017-03-01 20:25             ` Kees Cook
2017-03-02 11:20               ` Mark Rutland
2017-03-03  0:59             ` Hoeun Ryu
2017-03-01 10:59       ` Mark Rutland
2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 5/8] ARM: " Kees Cook
2017-03-01  1:04   ` [kernel-hardening] " Russell King - ARM Linux
2017-03-01  5:41     ` Kees Cook
2017-03-01 11:30       ` Russell King - ARM Linux
2017-03-02  0:08         ` Kees Cook
2017-03-01 11:50       ` Mark Rutland
2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 6/8] list: add rare_write() list helpers Kees Cook
2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 7/8] gcc-plugins: Add constify plugin Kees Cook
2017-02-27 20:43 ` [kernel-hardening] [RFC][PATCH 8/8] cgroups: force all struct cftype const Kees Cook
