* [RFC v4 0/3] Add support for eXclusive Page Frame Ownership
From: Tycho Andersen @ 2017-06-07 21:16 UTC
  To: linux-mm; +Cc: Juerg Haefliger, kernel-hardening, Tycho Andersen

Hi all,

I have talked with Juerg about picking up the torch for XPFO [1], and have
been working with the set for a bit. I've fixed one memory corruption issue
since v3, and have also tried, so far without success, to integrate hugepage
support. The code in patch 3 seems to split up the page and apply the right
protections, but somehow the lkdtm test read still succeeds without
generating a fault, and I don't understand why.
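
In case it helps anyone reproduce this, here is a minimal x86-only check
(illustrative only; virt_addr is the variable from the lkdtm test in
patch 2) that could be dropped in just before the bad read, to confirm
whether the linear-map PTE really lost _PAGE_PRESENT after the split:

	unsigned int level;
	pte_t *pte = lookup_address((unsigned long)virt_addr, &level);

	if (pte)
		pr_info("linear map: level %u, present %d\n", level,
			!!(pte_val(*pte) & _PAGE_PRESENT));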

[1]: https://lkml.org/lkml/2016/11/4/245

Thoughts welcome,

Tycho

Juerg Haefliger (2):
  mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)
  lkdtm: Add tests for XPFO

Tycho Andersen (1):
  xpfo: add support for hugepages

 Documentation/admin-guide/kernel-parameters.txt |   2 +
 arch/x86/Kconfig                                |   1 +
 arch/x86/include/asm/pgtable.h                  |  22 +++
 arch/x86/mm/Makefile                            |   1 +
 arch/x86/mm/pageattr.c                          |  21 +--
 arch/x86/mm/xpfo.c                              |  82 +++++++++
 drivers/misc/Makefile                           |   1 +
 drivers/misc/lkdtm.h                            |   3 +
 drivers/misc/lkdtm_core.c                       |   1 +
 drivers/misc/lkdtm_xpfo.c                       | 105 ++++++++++++
 include/linux/highmem.h                         |  15 +-
 include/linux/xpfo.h                            |  38 +++++
 mm/Makefile                                     |   1 +
 mm/page_alloc.c                                 |   2 +
 mm/page_ext.c                                   |   4 +
 mm/xpfo.c                                       | 210 ++++++++++++++++++++++++
 security/Kconfig                                |  19 +++
 17 files changed, 508 insertions(+), 20 deletions(-)
 create mode 100644 arch/x86/mm/xpfo.c
 create mode 100644 drivers/misc/lkdtm_xpfo.c
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c

-- 
2.11.0


* [RFC v4 1/3] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)
From: Tycho Andersen @ 2017-06-07 21:16 UTC
  To: linux-mm
  Cc: Juerg Haefliger, kernel-hardening, Juerg Haefliger, Tycho Andersen

From: Juerg Haefliger <juerg.haefliger@hpe.com>

This patch adds support for XPFO, which protects against 'ret2dir' kernel
attacks. The basic idea is to enforce exclusive ownership of page frames
by either the kernel or userspace, unless explicitly requested by the
kernel. Whenever a page destined for userspace is allocated, it is
unmapped from physmap (the kernel's direct mapping of physical memory).
When such a page is reclaimed from userspace, it is mapped back into
physmap.

Additional fields in the page_ext struct are used for XPFO housekeeping,
specifically:
  - two flags to distinguish user vs. kernel pages and to tag unmapped
    pages.
  - a reference counter to balance kmap/kunmap operations.
  - a lock to serialize access to the XPFO fields.
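
As a rough sketch (illustrative only, not literal kernel code), the
intended lifecycle of a user page is then:

	page = alloc_pages(GFP_HIGHUSER, 0);
				/* xpfo_alloc_pages(): tag XPFO_PAGE_USER */
	kaddr = kmap(page);	/* xpfo_kmap(): mapcount 0 -> 1, map the
				   page back into physmap if necessary */
	/* ... kernel accesses the page through kaddr ... */
	kunmap(page);		/* xpfo_kunmap(): mapcount 1 -> 0, clear the
				   PTE, flush the TLB, tag XPFO_PAGE_UNMAPPED */
	__free_pages(page, 0);	/* xpfo_free_pages(): map back w/ PAGE_KERNEL */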

This patch is based on the work of Vasileios P. Kemerlis et al.,
published in the following paper:
  http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf

Suggested-by: Vasileios P. Kemerlis <vpk@cs.columbia.edu>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
Signed-off-by: Tycho Andersen <tycho@docker.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   2 +
 arch/x86/Kconfig                                |   1 +
 arch/x86/mm/Makefile                            |   1 +
 arch/x86/mm/xpfo.c                              |  23 +++
 include/linux/highmem.h                         |  15 +-
 include/linux/xpfo.h                            |  37 ++++
 mm/Makefile                                     |   1 +
 mm/page_alloc.c                                 |   2 +
 mm/page_ext.c                                   |   4 +
 mm/xpfo.c                                       | 214 ++++++++++++++++++++++++
 security/Kconfig                                |  19 +++
 11 files changed, 317 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 15f79c27748d..e719071ff45b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2679,6 +2679,8 @@
 
 	nox2apic	[X86-64,APIC] Do not enable x2APIC mode.
 
+	noxpfo		[X86-64] Disable XPFO when CONFIG_XPFO is on.
+
 	cpu0_hotplug	[X86] Turn on CPU0 hotplug feature when
 			CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
 			Some features depend on CPU0. Known dependencies are:
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cd18994a9555..17235bd44036 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -176,6 +176,7 @@ config X86
 	select USER_STACKTRACE_SUPPORT
 	select VIRT_TO_BUS
 	select X86_FEATURE_NAMES		if PROC_FS
+	select ARCH_SUPPORTS_XPFO		if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 96d2b847e09e..756cf39c29f2 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
+obj-$(CONFIG_XPFO)		+= xpfo.o
diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c
new file mode 100644
index 000000000000..c24b06c9b4ab
--- /dev/null
+++ b/arch/x86/mm/xpfo.c
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2017 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/mm.h>
+
+/* Update a single kernel page table entry */
+inline void set_kpte(void *kaddr, struct page *page, pgprot_t prot)
+{
+	unsigned int level;
+	pte_t *pte = lookup_address((unsigned long)kaddr, &level);
+
+	set_pte_atomic(pte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
+}
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f3297062a..7a17c166532f 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/xpfo.h>
 
 #include <asm/cacheflush.h>
 
@@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr)
 #ifndef ARCH_HAS_KMAP
 static inline void *kmap(struct page *page)
 {
+	void *kaddr;
+
 	might_sleep();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 
 static inline void kunmap(struct page *page)
 {
+	xpfo_kunmap(page_address(page), page);
 }
 
 static inline void *kmap_atomic(struct page *page)
 {
+	void *kaddr;
+
 	preempt_disable();
 	pagefault_disable();
-	return page_address(page);
+	kaddr = page_address(page);
+	xpfo_kmap(kaddr, page);
+	return kaddr;
 }
 #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
 
 static inline void __kunmap_atomic(void *addr)
 {
+	xpfo_kunmap(addr, virt_to_page(addr));
 	pagefault_enable();
 	preempt_enable();
 }
diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
new file mode 100644
index 000000000000..031cbee22a41
--- /dev/null
+++ b/include/linux/xpfo.h
@@ -0,0 +1,37 @@
+/*
+ * Copyright (C) 2017 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef _LINUX_XPFO_H
+#define _LINUX_XPFO_H
+
+#ifdef CONFIG_XPFO
+
+extern struct page_ext_operations page_xpfo_ops;
+
+void set_kpte(void *kaddr, struct page *page, pgprot_t prot);
+
+void xpfo_kmap(void *kaddr, struct page *page);
+void xpfo_kunmap(void *kaddr, struct page *page);
+void xpfo_alloc_pages(struct page *page, int order, gfp_t gfp);
+void xpfo_free_pages(struct page *page, int order);
+
+#else /* !CONFIG_XPFO */
+
+static inline void xpfo_kmap(void *kaddr, struct page *page) { }
+static inline void xpfo_kunmap(void *kaddr, struct page *page) { }
+static inline void xpfo_alloc_pages(struct page *page, int order, gfp_t gfp) { }
+static inline void xpfo_free_pages(struct page *page, int order) { }
+
+#endif /* CONFIG_XPFO */
+
+#endif /* _LINUX_XPFO_H */
diff --git a/mm/Makefile b/mm/Makefile
index 026f6a828a50..b5f2244f8293 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_XPFO) += xpfo.o
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f9e450c6b6e4..8250814ae6de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1049,6 +1049,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	kernel_poison_pages(page, 1 << order, 0);
 	kernel_map_pages(page, 1 << order, 0);
 	kasan_free_pages(page, order);
+	xpfo_free_pages(page, order);
 
 	return true;
 }
@@ -1740,6 +1741,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	kernel_map_pages(page, 1 << order, 1);
 	kernel_poison_pages(page, 1 << order, 1);
 	kasan_alloc_pages(page, order);
+	xpfo_alloc_pages(page, order, gfp_flags);
 	set_page_owner(page, order, gfp_flags);
 }
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 88ccc044b09a..4899df1f5d66 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/xpfo.h>
 
 /*
  * struct page extension
@@ -65,6 +66,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_XPFO
+	&page_xpfo_ops,
+#endif
 };
 
 static unsigned long total_usage;
diff --git a/mm/xpfo.c b/mm/xpfo.c
new file mode 100644
index 000000000000..8384058136b1
--- /dev/null
+++ b/mm/xpfo.c
@@ -0,0 +1,214 @@
+/*
+ * Copyright (C) Docker Inc.
+ * Copyright (C) 2017 Hewlett Packard Enterprise Development, L.P.
+ * Copyright (C) 2016 Brown University. All rights reserved.
+ *
+ * Authors:
+ *   Tycho Andersen <tycho@docker.com>
+ *   Juerg Haefliger <juerg.haefliger@hpe.com>
+ *   Vasileios P. Kemerlis <vpk@cs.brown.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/page_ext.h>
+#include <linux/xpfo.h>
+
+#include <asm/tlbflush.h>
+
+/* XPFO page state flags */
+enum xpfo_flags {
+	XPFO_PAGE_USER,		/* Page is allocated to user-space */
+	XPFO_PAGE_UNMAPPED,	/* Page is unmapped from the linear map */
+};
+
+/* Per-page XPFO house-keeping data */
+struct xpfo {
+	unsigned long flags;	/* Page state */
+	bool inited;		/* Map counter and lock initialized */
+	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
+	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
+};
+
+DEFINE_STATIC_KEY_FALSE(xpfo_inited);
+
+static bool xpfo_disabled __initdata;
+
+static int __init noxpfo_param(char *str)
+{
+	xpfo_disabled = true;
+
+	return 0;
+}
+
+early_param("noxpfo", noxpfo_param);
+
+static bool __init need_xpfo(void)
+{
+	if (xpfo_disabled) {
+		printk(KERN_INFO "XPFO disabled\n");
+		return false;
+	}
+
+	return true;
+}
+
+static void init_xpfo(void)
+{
+	printk(KERN_INFO "XPFO enabled\n");
+	static_branch_enable(&xpfo_inited);
+}
+
+struct page_ext_operations page_xpfo_ops = {
+	.size = sizeof(struct xpfo),
+	.need = need_xpfo,
+	.init = init_xpfo,
+};
+
+static inline struct xpfo *lookup_xpfo(struct page *page)
+{
+	return (void *)lookup_page_ext(page) + page_xpfo_ops.offset;
+}
+
+void xpfo_alloc_pages(struct page *page, int order, gfp_t gfp)
+{
+	int i, flush_tlb = 0;
+	struct xpfo *xpfo;
+	unsigned long kaddr;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++)  {
+		xpfo = lookup_xpfo(page + i);
+
+		BUG_ON(test_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags));
+
+		/* Initialize the map lock and map counter */
+		if (unlikely(!xpfo->inited)) {
+			spin_lock_init(&xpfo->maplock);
+			atomic_set(&xpfo->mapcount, 0);
+			xpfo->inited = true;
+		}
+		BUG_ON(atomic_read(&xpfo->mapcount));
+
+		if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
+			/*
+			 * Tag the page as a user page and flush the TLB if it
+			 * was previously allocated to the kernel.
+			 */
+			if (!test_and_set_bit(XPFO_PAGE_USER, &xpfo->flags))
+				flush_tlb = 1;
+		} else {
+			/* Tag the page as a non-user (kernel) page */
+			clear_bit(XPFO_PAGE_USER, &xpfo->flags);
+		}
+	}
+
+	if (flush_tlb) {
+		kaddr = (unsigned long)page_address(page);
+		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
+				       PAGE_SIZE);
+	}
+}
+
+void xpfo_free_pages(struct page *page, int order)
+{
+	int i;
+	struct xpfo *xpfo;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	for (i = 0; i < (1 << order); i++) {
+		xpfo = lookup_xpfo(page + i);
+
+		if (unlikely(!xpfo->inited)) {
+			/*
+			 * The page was allocated before page_ext was
+			 * initialized, so it is a kernel page.
+			 */
+			continue;
+		}
+
+		/*
+		 * Map the page back into the kernel if it was previously
+		 * allocated to user space.
+		 */
+		if (test_and_clear_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags)) {
+			set_kpte(page_address(page + i), page + i,
+				 PAGE_KERNEL);
+		}
+	}
+}
+
+void xpfo_kmap(void *kaddr, struct page *page)
+{
+	struct xpfo *xpfo;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	xpfo = lookup_xpfo(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (unlikely(!xpfo->inited) || !test_bit(XPFO_PAGE_USER, &xpfo->flags))
+		return;
+
+	spin_lock_irqsave(&xpfo->maplock, flags);
+
+	/*
+	 * The page was previously allocated to user space, so map it back
+	 * into the kernel. No TLB flush required.
+	 */
+	if ((atomic_inc_return(&xpfo->mapcount) == 1) &&
+	    test_and_clear_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags))
+		set_kpte(kaddr, page, PAGE_KERNEL);
+
+	spin_unlock_irqrestore(&xpfo->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kmap);
+
+void xpfo_kunmap(void *kaddr, struct page *page)
+{
+	struct xpfo *xpfo;
+	unsigned long flags;
+
+	if (!static_branch_unlikely(&xpfo_inited))
+		return;
+
+	xpfo = lookup_xpfo(page);
+
+	/*
+	 * The page was allocated before page_ext was initialized (which means
+	 * it's a kernel page) or it's allocated to the kernel, so nothing to
+	 * do.
+	 */
+	if (unlikely(!xpfo->inited) || !test_bit(XPFO_PAGE_USER, &xpfo->flags))
+		return;
+
+	spin_lock_irqsave(&xpfo->maplock, flags);
+
+	/*
+	 * The page is to be allocated back to user space, so unmap it from the
+	 * kernel, flush the TLB and tag it as a user page.
+	 */
+	if (atomic_dec_return(&xpfo->mapcount) == 0) {
+		BUG_ON(test_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags));
+		set_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags);
+		set_kpte(kaddr, page, __pgprot(0));
+		__flush_tlb_one((unsigned long)kaddr);
+	}
+
+	spin_unlock_irqrestore(&xpfo->maplock, flags);
+}
+EXPORT_SYMBOL(xpfo_kunmap);
diff --git a/security/Kconfig b/security/Kconfig
index 93027fdf47d1..f9563c3f7bc1 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -6,6 +6,25 @@ menu "Security options"
 
 source security/keys/Kconfig
 
+config ARCH_SUPPORTS_XPFO
+	bool
+
+config XPFO
+	bool "Enable eXclusive Page Frame Ownership (XPFO)"
+	default n
+	depends on ARCH_SUPPORTS_XPFO
+	select PAGE_EXTENSION
+	help
+	  This option offers protection against 'ret2dir' kernel attacks.
+	  When enabled, every time a page frame is allocated to user space, it
+	  is unmapped from the direct mapped RAM region in kernel space
+	  (physmap). Similarly, when a page frame is freed/reclaimed, it is
+	  mapped back to physmap.
+
+	  There is a slight performance impact when this option is enabled.
+
+	  If in doubt, say "N".
+
 config SECURITY_DMESG_RESTRICT
 	bool "Restrict unprivileged access to the kernel syslog"
 	default n
-- 
2.11.0


* [RFC v4 2/3] lkdtm: Add tests for XPFO
From: Tycho Andersen @ 2017-06-07 21:16 UTC
  To: linux-mm
  Cc: Juerg Haefliger, kernel-hardening, Juerg Haefliger, Tycho Andersen

From: Juerg Haefliger <juerg.haefliger@hpe.com>
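
Add a test that writes a known value to a user page and then tries to
read it back through the kernel's linear mapping; with XPFO enabled, the
read should fault. It can be triggered like any other lkdtm test, e.g.
(assuming debugfs is mounted at /sys/kernel/debug):

	echo XPFO_READ_USER > /sys/kernel/debug/provoke-crash/DIRECT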

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
[tycho: minor fixups for 5-level paging and whitespace]
Signed-off-by: Tycho Andersen <tycho@docker.com>
---
 drivers/misc/Makefile     |   1 +
 drivers/misc/lkdtm.h      |   3 ++
 drivers/misc/lkdtm_core.c |   1 +
 drivers/misc/lkdtm_xpfo.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 110 insertions(+)

diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 81ef3e67acc9..e0b5065478be 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -61,6 +61,7 @@ lkdtm-$(CONFIG_LKDTM)		+= lkdtm_heap.o
 lkdtm-$(CONFIG_LKDTM)		+= lkdtm_perms.o
 lkdtm-$(CONFIG_LKDTM)		+= lkdtm_rodata_objcopy.o
 lkdtm-$(CONFIG_LKDTM)		+= lkdtm_usercopy.o
+lkdtm-$(CONFIG_LKDTM)		+= lkdtm_xpfo.o
 
 KCOV_INSTRUMENT_lkdtm_rodata.o	:= n
 
diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h
index 3b4976396ec4..ed9c3131bf41 100644
--- a/drivers/misc/lkdtm.h
+++ b/drivers/misc/lkdtm.h
@@ -64,4 +64,7 @@ void lkdtm_USERCOPY_STACK_FRAME_FROM(void);
 void lkdtm_USERCOPY_STACK_BEYOND(void);
 void lkdtm_USERCOPY_KERNEL(void);
 
+/* lkdtm_xpfo.c */
+void lkdtm_XPFO_READ_USER(void);
+
 #endif
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c
index 42d2b8e31e6b..4f3b60bb2d31 100644
--- a/drivers/misc/lkdtm_core.c
+++ b/drivers/misc/lkdtm_core.c
@@ -235,6 +235,7 @@ struct crashtype crashtypes[] = {
 	CRASHTYPE(USERCOPY_STACK_FRAME_FROM),
 	CRASHTYPE(USERCOPY_STACK_BEYOND),
 	CRASHTYPE(USERCOPY_KERNEL),
+	CRASHTYPE(XPFO_READ_USER),
 };
 
 
diff --git a/drivers/misc/lkdtm_xpfo.c b/drivers/misc/lkdtm_xpfo.c
new file mode 100644
index 000000000000..540a2539a88b
--- /dev/null
+++ b/drivers/misc/lkdtm_xpfo.c
@@ -0,0 +1,105 @@
+/*
+ * This is for all the tests related to XPFO (eXclusive Page Frame Ownership).
+ */
+
+#include "lkdtm.h"
+
+#include <linux/mman.h>
+#include <linux/uaccess.h>
+
+/* This is hacky... */
+#ifdef CONFIG_ARM64
+#define pud_large(pud) (pud_sect(pud))
+#define pmd_large(pmd) (pmd_sect(pmd))
+#define PUD_PAGE_MASK PUD_MASK
+#define PMD_PAGE_MASK PMD_MASK
+#endif
+
+static phys_addr_t user_virt_to_phys(unsigned long addr)
+{
+	phys_addr_t phys_addr;
+	unsigned long offset;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	p4d_t *p4d;
+
+	pgd = pgd_offset(current->mm, addr);
+	if (pgd_none(*pgd))
+		return 0;
+
+	p4d = p4d_offset(pgd, addr);
+	if (p4d_none(*p4d))
+		return 0;
+
+	pud = pud_offset(p4d, addr);
+	if (pud_none(*pud))
+		return 0;
+
+	if (pud_large(*pud) || !pud_present(*pud)) {
+		phys_addr = (unsigned long)pud_pfn(*pud) << PAGE_SHIFT;
+		offset = addr & ~PUD_PAGE_MASK;
+		goto out;
+	}
+
+	pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd))
+		return 0;
+
+	if (pmd_large(*pmd) || !pmd_present(*pmd)) {
+		phys_addr = (unsigned long)pmd_pfn(*pmd) << PAGE_SHIFT;
+		offset = addr & ~PMD_PAGE_MASK;
+		goto out;
+	}
+
+	pte = pte_offset_kernel(pmd, addr);
+	phys_addr = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
+	offset = addr & ~PAGE_MASK;
+
+out:
+	return (phys_addr_t)(phys_addr | offset);
+}
+
+/* Read from userspace via the kernel's linear map */
+void lkdtm_XPFO_READ_USER(void)
+{
+	unsigned long user_addr, user_data = 0xdeadbeef;
+	phys_addr_t phys_addr;
+	void *virt_addr;
+
+	user_addr = vm_mmap(NULL, 0, PAGE_SIZE,
+			    PROT_READ | PROT_WRITE | PROT_EXEC,
+			    MAP_ANONYMOUS | MAP_PRIVATE, 0);
+	if (user_addr >= TASK_SIZE) {
+		pr_warn("Failed to allocate user memory\n");
+		return;
+	}
+
+	if (copy_to_user((void __user *)user_addr, &user_data,
+			 sizeof(user_data))) {
+		pr_warn("copy_to_user failed\n");
+		goto free_user;
+	}
+
+	phys_addr = user_virt_to_phys(user_addr);
+	if (!phys_addr) {
+		pr_warn("Failed to get physical address of user memory\n");
+		goto free_user;
+	}
+
+	virt_addr = phys_to_virt(phys_addr);
+	if (phys_addr != virt_to_phys(virt_addr)) {
+		pr_warn("Physical address of user memory seems incorrect\n");
+		goto free_user;
+	}
+
+	pr_info("Attempting bad read from kernel address %p\n", virt_addr);
+	if (*(unsigned long *)virt_addr == user_data)
+		pr_info("Huh? Bad read succeeded?!\n");
+	else
+		pr_info("Huh? Bad read didn't fail but data is incorrect?!\n");
+
+ free_user:
+	vm_munmap(user_addr, PAGE_SIZE);
+}
-- 
2.11.0


* [RFC v4 3/3] xpfo: add support for hugepages
From: Tycho Andersen @ 2017-06-07 21:16 UTC
  To: linux-mm; +Cc: Juerg Haefliger, kernel-hardening, Tycho Andersen

Based on an earlier draft by Marco Benatto.

set_kpte() currently only handles 4K mappings, but the direct map may
contain 2M or 1G pages. Teach set_kpte() to preserve or split large
mappings (reusing try_preserve_large_page() and split_large_page() from
pageattr.c), and size the TLB flush to the level of the mapping.

Signed-off-by: Tycho Andersen <tycho@docker.com>
CC: Juerg Haefliger <juergh@gmail.com>
---
 arch/x86/include/asm/pgtable.h | 22 +++++++++++++++
 arch/x86/mm/pageattr.c         | 21 +++------------
 arch/x86/mm/xpfo.c             | 61 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/xpfo.h           |  1 +
 mm/xpfo.c                      |  8 ++----
 5 files changed, 88 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f5af95a0c6b8..58bb43d8b9c1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1185,6 +1185,28 @@ static inline u16 pte_flags_pkey(unsigned long pte_flags)
 #endif
 }
 
+/*
+ * The current flushing context - we pass it instead of 5 arguments:
+ */
+struct cpa_data {
+	unsigned long	*vaddr;
+	pgd_t		*pgd;
+	pgprot_t	mask_set;
+	pgprot_t	mask_clr;
+	unsigned long	numpages;
+	int		flags;
+	unsigned long	pfn;
+	unsigned	force_split : 1;
+	int		curpage;
+	struct page	**pages;
+};
+
+int
+try_preserve_large_page(pte_t *kpte, unsigned long address,
+			struct cpa_data *cpa);
+int split_large_page(struct cpa_data *cpa, pte_t *kpte,
+		     unsigned long address);
+
 #include <asm-generic/pgtable.h>
 #endif	/* __ASSEMBLY__ */
 
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1dcd2be4cce4..6d6a78e6e023 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -26,21 +26,6 @@
 #include <asm/pat.h>
 #include <asm/set_memory.h>
 
-/*
- * The current flushing context - we pass it instead of 5 arguments:
- */
-struct cpa_data {
-	unsigned long	*vaddr;
-	pgd_t		*pgd;
-	pgprot_t	mask_set;
-	pgprot_t	mask_clr;
-	unsigned long	numpages;
-	int		flags;
-	unsigned long	pfn;
-	unsigned	force_split : 1;
-	int		curpage;
-	struct page	**pages;
-};
 
 /*
  * Serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity mappings)
@@ -506,7 +491,7 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
 #endif
 }
 
-static int
+int
 try_preserve_large_page(pte_t *kpte, unsigned long address,
 			struct cpa_data *cpa)
 {
@@ -740,8 +725,8 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	return 0;
 }
 
-static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
-			    unsigned long address)
+int split_large_page(struct cpa_data *cpa, pte_t *kpte,
+		     unsigned long address)
 {
 	struct page *base;
 
diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c
index c24b06c9b4ab..818da3ebc077 100644
--- a/arch/x86/mm/xpfo.c
+++ b/arch/x86/mm/xpfo.c
@@ -13,11 +13,70 @@
 
 #include <linux/mm.h>
 
+#include <asm/tlbflush.h>
+
 /* Update a single kernel page table entry */
 inline void set_kpte(void *kaddr, struct page *page, pgprot_t prot)
 {
 	unsigned int level;
 	pte_t *pte = lookup_address((unsigned long)kaddr, &level);
 
-	set_pte_atomic(pte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
+
+	BUG_ON(!pte);
+
+	switch (level) {
+	case PG_LEVEL_4K:
+		set_pte_atomic(pte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
+		break;
+	case PG_LEVEL_2M:
+	case PG_LEVEL_1G: {
+		struct cpa_data cpa;
+		int do_split;
+
+		memset(&cpa, 0, sizeof(cpa));
+		cpa.vaddr = kaddr;
+		cpa.pages = &page;
+		cpa.mask_set = prot;
+		pgprot_val(cpa.mask_clr) = ~pgprot_val(prot);
+		cpa.numpages = 1;
+		cpa.flags = 0;
+		cpa.curpage = 0;
+		cpa.force_split = 0;
+
+		do_split = try_preserve_large_page(pte, (unsigned long)kaddr, &cpa);
+		if (do_split < 0)
+			BUG_ON(split_large_page(&cpa, pte, (unsigned long)kaddr));
+
+		break;
+	}
+	default:
+		BUG();
+	}
+
+}
+
+inline void xpfo_flush_kernel_page(struct page *page, int order)
+{
+	int level;
+	unsigned long size, kaddr;
+
+	kaddr = (unsigned long)page_address(page);
+	lookup_address(kaddr, &level);
+
+
+	switch (level) {
+	case PG_LEVEL_4K:
+		size = PAGE_SIZE;
+		break;
+	case PG_LEVEL_2M:
+		size = PMD_SIZE;
+		break;
+	case PG_LEVEL_1G:
+		size = PUD_SIZE;
+		break;
+	default:
+		BUG();
+	}
+
+	flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
 }
diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
index 031cbee22a41..a0f0101720f6 100644
--- a/include/linux/xpfo.h
+++ b/include/linux/xpfo.h
@@ -19,6 +19,7 @@
 extern struct page_ext_operations page_xpfo_ops;
 
 void set_kpte(void *kaddr, struct page *page, pgprot_t prot);
+void xpfo_flush_kernel_page(struct page *page, int order);
 
 void xpfo_kmap(void *kaddr, struct page *page);
 void xpfo_kunmap(void *kaddr, struct page *page);
diff --git a/mm/xpfo.c b/mm/xpfo.c
index 8384058136b1..895de28108da 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -78,7 +78,6 @@ void xpfo_alloc_pages(struct page *page, int order, gfp_t gfp)
 {
 	int i, flush_tlb = 0;
 	struct xpfo *xpfo;
-	unsigned long kaddr;
 
 	if (!static_branch_unlikely(&xpfo_inited))
 		return;
@@ -109,11 +108,8 @@ void xpfo_alloc_pages(struct page *page, int order, gfp_t gfp)
 		}
 	}
 
-	if (flush_tlb) {
-		kaddr = (unsigned long)page_address(page);
-		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
-				       PAGE_SIZE);
-	}
+	if (flush_tlb)
+		xpfo_flush_kernel_page(page, order);
 }
 
 void xpfo_free_pages(struct page *page, int order)
-- 
2.11.0


* Re: [RFC v4 3/3] xpfo: add support for hugepages
From: Laura Abbott @ 2017-06-10  0:23 UTC
  To: Tycho Andersen, linux-mm; +Cc: Juerg Haefliger, kernel-hardening

On 06/07/2017 02:16 PM, Tycho Andersen wrote:
> Based on an earlier draft by Marco Benatto.
> 
> Signed-off-by: Tycho Andersen <tycho@docker.com>
> CC: Juerg Haefliger <juergh@gmail.com>
> ---
>  arch/x86/include/asm/pgtable.h | 22 +++++++++++++++
>  arch/x86/mm/pageattr.c         | 21 +++------------
>  arch/x86/mm/xpfo.c             | 61 +++++++++++++++++++++++++++++++++++++++++-
>  include/linux/xpfo.h           |  1 +
>  mm/xpfo.c                      |  8 ++----
>  5 files changed, 88 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index f5af95a0c6b8..58bb43d8b9c1 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1185,6 +1185,28 @@ static inline u16 pte_flags_pkey(unsigned long pte_flags)
>  #endif
>  }
>  
> +/*
> + * The current flushing context - we pass it instead of 5 arguments:
> + */
> +struct cpa_data {
> +	unsigned long	*vaddr;
> +	pgd_t		*pgd;
> +	pgprot_t	mask_set;
> +	pgprot_t	mask_clr;
> +	unsigned long	numpages;
> +	int		flags;
> +	unsigned long	pfn;
> +	unsigned	force_split : 1;
> +	int		curpage;
> +	struct page	**pages;
> +};
> +
> +int
> +try_preserve_large_page(pte_t *kpte, unsigned long address,
> +			struct cpa_data *cpa);
> +int split_large_page(struct cpa_data *cpa, pte_t *kpte,
> +		     unsigned long address);
> +
>  #include <asm-generic/pgtable.h>
>  #endif	/* __ASSEMBLY__ */
>  
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 1dcd2be4cce4..6d6a78e6e023 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -26,21 +26,6 @@
>  #include <asm/pat.h>
>  #include <asm/set_memory.h>
>  
> -/*
> - * The current flushing context - we pass it instead of 5 arguments:
> - */
> -struct cpa_data {
> -	unsigned long	*vaddr;
> -	pgd_t		*pgd;
> -	pgprot_t	mask_set;
> -	pgprot_t	mask_clr;
> -	unsigned long	numpages;
> -	int		flags;
> -	unsigned long	pfn;
> -	unsigned	force_split : 1;
> -	int		curpage;
> -	struct page	**pages;
> -};
>  
>  /*
>   * Serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity mappings)
> @@ -506,7 +491,7 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
>  #endif
>  }
>  
> -static int
> +int
>  try_preserve_large_page(pte_t *kpte, unsigned long address,
>  			struct cpa_data *cpa)
>  {
> @@ -740,8 +725,8 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
>  	return 0;
>  }
>  
> -static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
> -			    unsigned long address)
> +int split_large_page(struct cpa_data *cpa, pte_t *kpte,
> +		     unsigned long address)
>  {
>  	struct page *base;
>  
> diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c
> index c24b06c9b4ab..818da3ebc077 100644
> --- a/arch/x86/mm/xpfo.c
> +++ b/arch/x86/mm/xpfo.c
> @@ -13,11 +13,70 @@
>  
>  #include <linux/mm.h>
>  
> +#include <asm/tlbflush.h>
> +
>  /* Update a single kernel page table entry */
>  inline void set_kpte(void *kaddr, struct page *page, pgprot_t prot)
>  {
>  	unsigned int level;
>  	pte_t *pte = lookup_address((unsigned long)kaddr, &level);
>  
> -	set_pte_atomic(pte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> +
> +	BUG_ON(!pte);
> +
> +	switch (level) {
> +	case PG_LEVEL_4K:
> +		set_pte_atomic(pte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> +		break;
> +	case PG_LEVEL_2M:
> +	case PG_LEVEL_1G: {
> +		struct cpa_data cpa;
> +		int do_split;
> +
> +		memset(&cpa, 0, sizeof(cpa));
> +		cpa.vaddr = kaddr;
> +		cpa.pages = &page;
> +		cpa.mask_set = prot;
> +		pgprot_val(cpa.mask_clr) = ~pgprot_val(prot);
> +		cpa.numpages = 1;
> +		cpa.flags = 0;
> +		cpa.curpage = 0;
> +		cpa.force_split = 0;
> +
> +		do_split = try_preserve_large_page(pte, (unsigned long)kaddr, &cpa);
> +		if (do_split < 0)

I can't reproduce the failure you describe in the cover letter, but are you sure
this check is correct? It looks like try_preserve_large_page can return 1 on
failure, in which case you still need to call split_large_page.
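
For context, this is how the only in-tree caller, __change_page_attr() in
arch/x86/mm/pageattr.c, interprets the return value (a paraphrased sketch
from roughly this kernel era, not the verbatim source): 0 means the large
page could be preserved as-is, anything positive means it still has to be
split, and anything negative is treated as a hard error.

	do_split = try_preserve_large_page(kpte, address, cpa);
	if (do_split <= 0)
		return do_split;	/* 0: preserved; negative: error */

	/* Positive return: the large page has to be split after all. */
	err = split_large_page(cpa, kpte, address);
	if (!err)
		goto repeat;		/* retry against the new 4K PTEs */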

> +			BUG_ON(split_large_page(&cpa, pte, (unsigned long)kaddr));
> +
> +		break;
> +	}
> +	default:
> +		BUG();
> +	}
> +
> +}
> +
> +inline void xpfo_flush_kernel_page(struct page *page, int order)
> +{
> +	int level;
> +	unsigned long size, kaddr;
> +
> +	kaddr = (unsigned long)page_address(page);
> +	lookup_address(kaddr, &level);
> +
> +
> +	switch (level) {
> +	case PG_LEVEL_4K:
> +		size = PAGE_SIZE;
> +		break;
> +	case PG_LEVEL_2M:
> +		size = PMD_SIZE;
> +		break;
> +	case PG_LEVEL_1G:
> +		size = PUD_SIZE;
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
>  }
> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
> index 031cbee22a41..a0f0101720f6 100644
> --- a/include/linux/xpfo.h
> +++ b/include/linux/xpfo.h
> @@ -19,6 +19,7 @@
>  extern struct page_ext_operations page_xpfo_ops;
>  
>  void set_kpte(void *kaddr, struct page *page, pgprot_t prot);
> +void xpfo_flush_kernel_page(struct page *page, int order);
>  
>  void xpfo_kmap(void *kaddr, struct page *page);
>  void xpfo_kunmap(void *kaddr, struct page *page);
> diff --git a/mm/xpfo.c b/mm/xpfo.c
> index 8384058136b1..895de28108da 100644
> --- a/mm/xpfo.c
> +++ b/mm/xpfo.c
> @@ -78,7 +78,6 @@ void xpfo_alloc_pages(struct page *page, int order, gfp_t gfp)
>  {
>  	int i, flush_tlb = 0;
>  	struct xpfo *xpfo;
> -	unsigned long kaddr;
>  
>  	if (!static_branch_unlikely(&xpfo_inited))
>  		return;
> @@ -109,11 +108,8 @@ void xpfo_alloc_pages(struct page *page, int order, gfp_t gfp)
>  		}
>  	}
>  
> -	if (flush_tlb) {
> -		kaddr = (unsigned long)page_address(page);
> -		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) *
> -				       PAGE_SIZE);
> -	}
> +	if (flush_tlb)
> +		xpfo_flush_kernel_page(page, order);
>  }
>  
>  void xpfo_free_pages(struct page *page, int order)
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC v4 3/3] xpfo: add support for hugepages
  2017-06-10  0:23     ` [kernel-hardening] " Laura Abbott
@ 2017-06-12 14:31       ` Tycho Andersen
  -1 siblings, 0 replies; 12+ messages in thread
From: Tycho Andersen @ 2017-06-12 14:31 UTC (permalink / raw)
  To: Laura Abbott; +Cc: linux-mm, Juerg Haefliger, kernel-hardening

Hi Laura,

Thanks for taking a look.

On Fri, Jun 09, 2017 at 05:23:06PM -0700, Laura Abbott wrote:
> > -	set_pte_atomic(pte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> > +
> > +	BUG_ON(!pte);
> > +
> > +	switch (level) {
> > +	case PG_LEVEL_4K:
> > +		set_pte_atomic(pte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
> > +		break;
> > +	case PG_LEVEL_2M:
> > +	case PG_LEVEL_1G: {
> > +		struct cpa_data cpa;
> > +		int do_split;
> > +
> > +		memset(&cpa, 0, sizeof(cpa));
> > +		cpa.vaddr = kaddr;
> > +		cpa.pages = &page;
> > +		cpa.mask_set = prot;
> > +		pgprot_val(cpa.mask_clr) = ~pgprot_val(prot);
> > +		cpa.numpages = 1;
> > +		cpa.flags = 0;
> > +		cpa.curpage = 0;
> > +		cpa.force_split = 0;
> > +
> > +		do_split = try_preserve_large_page(pte, (unsigned long)kaddr, &cpa);
> > +		if (do_split < 0)
> 
> I can't reproduce the failure you describe in the cover letter, but are you sure
> this check is correct?

The check only comes into play when actually splitting up a large page,
indicating that...

> It looks like try_preserve_large_page can return 1 on failure, in which case you
> still need to call split_large_page.

...yes, you're absolutely right. With this fixed, though, the kernel fails to
boot, stalling while unpacking the initramfs. So it seems something else is
wrong too.

Cheers,

Tycho

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-06-12 14:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-07 21:16 [RFC v4 0/3] Add support for eXclusive Page Frame Ownership Tycho Andersen
2017-06-07 21:16 ` [kernel-hardening] " Tycho Andersen
2017-06-07 21:16 ` [RFC v4 1/3] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO) Tycho Andersen
2017-06-07 21:16   ` [kernel-hardening] " Tycho Andersen
2017-06-07 21:16 ` [RFC v4 2/3] lkdtm: Add tests for XPFO Tycho Andersen
2017-06-07 21:16   ` [kernel-hardening] " Tycho Andersen
2017-06-07 21:16 ` [RFC v4 3/3] xpfo: add support for hugepages Tycho Andersen
2017-06-07 21:16   ` [kernel-hardening] " Tycho Andersen
2017-06-10  0:23   ` Laura Abbott
2017-06-10  0:23     ` [kernel-hardening] " Laura Abbott
2017-06-12 14:31     ` Tycho Andersen
2017-06-12 14:31       ` [kernel-hardening] " Tycho Andersen
