linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/3] page table check
@ 2021-11-16 22:00 Pasha Tatashin
  2021-11-16 22:00 ` [RFC 1/3] mm: ptep_clear() page table helper Pasha Tatashin
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Pasha Tatashin @ 2021-11-16 22:00 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, linux-doc, akpm,
	rientjes, pjt, weixugc, gthelen, mingo, corbet, will, rppt,
	keescook, tglx, peterz, masahiroy, samitolvanen, dave.hansen,
	x86, frederic, hpa, aneesh.kumar

From: Pasha Tatashin <tatashin@google.com>

Ensure that some memory corruptions are prevented by checking at the
time of insertion of entries into user page tables that there is no
illegal sharing.

We have recently found a problem [1] that existed in kernel since 4.14.
The problem was caused by broken page ref count and led to memory
leaking from one process into another. The problem was accidentally
detected by studying a dump of one process and noticing that one page
contains memory that should not belong to this process.

There are some other page->_refcount related problems that were recently
fixed: [2], [3] which potentially could also lead to illegal sharing.

In addition to hardening refcount [4] itself, this work is an attempt to
prevent this class of memory corruption issues.

It uses a simple state machine that is independent from regular MM logic
to check for illegal sharing at time pages are inserted and removed 
from page tables.

[1] https://lore.kernel.org/all/xr9335nxwc5y.fsf@gthelen2.svl.corp.google.com
[2] https://lore.kernel.org/all/1582661774-30925-2-git-send-email-akaher@vmware.com
[3] https://lore.kernel.org/all/20210622021423.154662-3-mike.kravetz@oracle.com
[4] https://lore.kernel.org/all/20211026173822.502506-1-pasha.tatashin@soleen.com

Pasha Tatashin (3):
  mm: ptep_clear() page table helper
  mm: page table check
  x86: mm: add x86_64 support for page table check

 Documentation/vm/arch_pgtable_helpers.rst |   6 +-
 Documentation/vm/page_table_check.rst     |  53 +++++
 MAINTAINERS                               |   9 +
 arch/Kconfig                              |   3 +
 arch/x86/Kconfig                          |   1 +
 arch/x86/include/asm/pgtable.h            |  27 ++-
 include/linux/page_table_check.h          | 147 ++++++++++++
 include/linux/pgtable.h                   |   8 +
 mm/Kconfig.debug                          |  24 ++
 mm/Makefile                               |   1 +
 mm/khugepaged.c                           |  12 +-
 mm/page_alloc.c                           |   4 +
 mm/page_ext.c                             |   4 +
 mm/page_table_check.c                     | 264 ++++++++++++++++++++++
 14 files changed, 549 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/vm/page_table_check.rst
 create mode 100644 include/linux/page_table_check.h
 create mode 100644 mm/page_table_check.c

-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC 1/3] mm: ptep_clear() page table helper
  2021-11-16 22:00 [RFC 0/3] page table check Pasha Tatashin
@ 2021-11-16 22:00 ` Pasha Tatashin
  2021-11-17  8:51   ` Anshuman Khandual
  2021-11-16 22:00 ` [RFC 2/3] mm: page table check Pasha Tatashin
  2021-11-16 22:00 ` [RFC 3/3] x86: mm: add x86_64 support for " Pasha Tatashin
  2 siblings, 1 reply; 8+ messages in thread
From: Pasha Tatashin @ 2021-11-16 22:00 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, linux-doc, akpm,
	rientjes, pjt, weixugc, gthelen, mingo, corbet, will, rppt,
	keescook, tglx, peterz, masahiroy, samitolvanen, dave.hansen,
	x86, frederic, hpa, aneesh.kumar

We have ptep_get_and_clear() and ptep_get_and_clear_full() helpers to
clear PTE from user page tables, but there is no variant for simple
clear of a present PTE from user page tables without using a low level
pte_clear() which can be either native or para-virtualised.

Add a new ptep_clear() that can be used in common code to clear PTEs
from page table. We will need this call later in order to add a hook
for page table check.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 Documentation/vm/arch_pgtable_helpers.rst |  6 ++++--
 include/linux/pgtable.h                   |  8 ++++++++
 mm/khugepaged.c                           | 12 ++----------
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/Documentation/vm/arch_pgtable_helpers.rst b/Documentation/vm/arch_pgtable_helpers.rst
index 552567d863b8..fbe06ec75370 100644
--- a/Documentation/vm/arch_pgtable_helpers.rst
+++ b/Documentation/vm/arch_pgtable_helpers.rst
@@ -66,9 +66,11 @@ PTE Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pte_mknotpresent          | Invalidates a mapped PTE                         |
 +---------------------------+--------------------------------------------------+
-| ptep_get_and_clear        | Clears a PTE                                     |
+| ptep_clear                | Clears a PTE                                     |
 +---------------------------+--------------------------------------------------+
-| ptep_get_and_clear_full   | Clears a PTE                                     |
+| ptep_get_and_clear        | Clears and returns PTE                           |
++---------------------------+--------------------------------------------------+
+| ptep_get_and_clear_full   | Clears and returns PTE (batched PTE unmap)       |
 +---------------------------+--------------------------------------------------+
 | ptep_test_and_clear_young | Clears young from a PTE                          |
 +---------------------------+--------------------------------------------------+
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e24d2c992b11..bc8713a76e03 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -258,6 +258,14 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
+#ifndef __HAVE_ARCH_PTEP_CLEAR
+static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep)
+{
+	pte_clear(mm, addr, ptep);
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5f02fda6f265..6ae659ef7e08 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -756,11 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 				 * ptl mostly unnecessary.
 				 */
 				spin_lock(ptl);
-				/*
-				 * paravirt calls inside pte_clear here are
-				 * superfluous.
-				 */
-				pte_clear(vma->vm_mm, address, _pte);
+				ptep_clear(vma->vm_mm, address, _pte);
 				spin_unlock(ptl);
 			}
 		} else {
@@ -774,11 +770,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 			 * inside page_remove_rmap().
 			 */
 			spin_lock(ptl);
-			/*
-			 * paravirt calls inside pte_clear here are
-			 * superfluous.
-			 */
-			pte_clear(vma->vm_mm, address, _pte);
+			ptep_clear(vma->vm_mm, address, _pte);
 			page_remove_rmap(src_page, false);
 			spin_unlock(ptl);
 			free_page_and_swap_cache(src_page);
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC 2/3] mm: page table check
  2021-11-16 22:00 [RFC 0/3] page table check Pasha Tatashin
  2021-11-16 22:00 ` [RFC 1/3] mm: ptep_clear() page table helper Pasha Tatashin
@ 2021-11-16 22:00 ` Pasha Tatashin
  2021-11-17  8:08   ` Jonathan Corbet
  2021-11-16 22:00 ` [RFC 3/3] x86: mm: add x86_64 support for " Pasha Tatashin
  2 siblings, 1 reply; 8+ messages in thread
From: Pasha Tatashin @ 2021-11-16 22:00 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, linux-doc, akpm,
	rientjes, pjt, weixugc, gthelen, mingo, corbet, will, rppt,
	keescook, tglx, peterz, masahiroy, samitolvanen, dave.hansen,
	x86, frederic, hpa, aneesh.kumar

Check user page table entries at the time they are added and removed.

Allows to synchronously catch memory corruption issues related to
double mapping.

When a pte for an anonymous page is added into page table, we verify
that this pte does not already point to a file backed page, and vice
versa if this is a file backed page that is being added we verify that
this page does not have an anonymous mapping

We also enforce that read-only sharing for anonymous pages is allowed
(i.e. cow after fork). All other sharing must be for file pages.

Page table check allows to protect and debug cases where "struct page"
metadata became corrupted for some reason. For example, when refcnt or
mapcount become invalid.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 Documentation/vm/page_table_check.rst |  53 ++++++
 MAINTAINERS                           |   9 +
 arch/Kconfig                          |   3 +
 include/linux/page_table_check.h      | 147 ++++++++++++++
 mm/Kconfig.debug                      |  24 +++
 mm/Makefile                           |   1 +
 mm/page_alloc.c                       |   4 +
 mm/page_ext.c                         |   4 +
 mm/page_table_check.c                 | 264 ++++++++++++++++++++++++++
 9 files changed, 509 insertions(+)
 create mode 100644 Documentation/vm/page_table_check.rst
 create mode 100644 include/linux/page_table_check.h
 create mode 100644 mm/page_table_check.c

diff --git a/Documentation/vm/page_table_check.rst b/Documentation/vm/page_table_check.rst
new file mode 100644
index 000000000000..41435a45869f
--- /dev/null
+++ b/Documentation/vm/page_table_check.rst
@@ -0,0 +1,53 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _page_table_check:
+
+================
+Page Table Check
+================
+
+Page table check allows to hardern the kernel by ensuring that some types of
+memory corruptions are prevented.
+
+Page table check performs extra verifications at the time when new pages become
+accessible from userspace by getting their page table entries (PTEs PMDs etc.)
+added into the table.
+
+In case of detected corruption, the kernel is crashed. There is a small
+performance and memory overhead associated with page table check. Thereofre, it
+is disabled by default but can be optionally enabled on systems where extra
+hardening outweighs the costs. Also, because page table check is synchronous, it
+can help with debugging double map memory corruption issues, by crashing kernel
+at the time wrong mapping occurs instead of later which is often the case with
+memory corruptions bugs.
+
+==============================
+Double mapping detection logic
+==============================
++-------------------+-------------------+-------------------+------------------+
+| Current Mapping   | New mapping       | Permissions       | Rule             |
++===================+===================+===================+==================+
+| Anonymous         | Anonymous         | Read              | Allow            |
++-------------------+-------------------+-------------------+------------------+
+| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
++-------------------+-------------------+-------------------+------------------+
+| Anonymous         | Named             | Any               | Prohibit         |
++-------------------+-------------------+-------------------+------------------+
+| Named             | Anonymous         | Any               | Prohibit         |
++-------------------+-------------------+-------------------+------------------+
+| Named             | Named             | Any               | Allow            |
++-------------------+-------------------+-------------------+------------------+
+
+=========================
+Enabling Page Table Check
+=========================
+
+Build kernel with:
+
+- PAGE_TABLE_CHECK=y
+Note, it can only be enabled on platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK
+is available.
+- Boot with 'page_table_check=on' kernel parameter.
+
+Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
+table support without extra kernel parameter.
diff --git a/MAINTAINERS b/MAINTAINERS
index 74158b271cb7..13dcc9cc10c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14338,6 +14338,15 @@ F:	include/net/page_pool.h
 F:	include/trace/events/page_pool.h
 F:	net/core/page_pool.c
 
+PAGE TABLE CHECK
+M:	Pasha Tatashin <pasha.tatashin@soleen.com>
+M:	Andrew Morton <akpm@linux-foundation.org>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	Documentation/vm/page_table_check.rst
+F:	include/linux/page_table_check.h
+F:	mm/page_table_check.c
+
 PANASONIC LAPTOP ACPI EXTRAS DRIVER
 M:	Kenneth Chan <kenneth.t.chan@gmail.com>
 L:	platform-driver-x86@vger.kernel.org
diff --git a/arch/Kconfig b/arch/Kconfig
index 26b8ed11639d..c5b03b3bd62d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1287,6 +1287,9 @@ config HAVE_ARCH_PFN_VALID
 config ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	bool
 
+config ARCH_SUPPORTS_PAGE_TABLE_CHECK
+	bool
+
 config ARCH_SPLIT_ARG64
 	bool
 	help
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
new file mode 100644
index 000000000000..38cace1da7b6
--- /dev/null
+++ b/include/linux/page_table_check.h
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (c) 2021, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+#ifndef __LINUX_PAGE_TABLE_CHECK_H
+#define __LINUX_PAGE_TABLE_CHECK_H
+
+#ifdef CONFIG_PAGE_TABLE_CHECK
+#include <linux/jump_label.h>
+
+extern struct static_key_true page_table_check_disabled;
+extern struct page_ext_operations page_table_check_ops;
+
+void __page_table_check_zero(struct page *page, unsigned int order);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+				  pte_t pte);
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+				  pmd_t pmd);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+				  pud_t pud);
+void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, pte_t pte);
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+				pmd_t *pmdp, pmd_t pmd);
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+				pud_t *pudp, pud_t pud);
+
+static inline void page_table_check_alloc(struct page *page, unsigned int order)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_zero(page, order);
+}
+
+static inline void page_table_check_free(struct page *page, unsigned int order)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_zero(page, order);
+}
+
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+					      unsigned long addr, pte_t pte)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_pte_clear(mm, addr, pte);
+}
+
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+					      unsigned long addr, pmd_t pmd)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_pmd_clear(mm, addr, pmd);
+}
+
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+					      unsigned long addr, pud_t pud)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_pud_clear(mm, addr, pud);
+}
+
+static inline void page_table_check_pte_set(struct mm_struct *mm,
+					    unsigned long addr, pte_t *ptep,
+					    pte_t pte)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_pte_set(mm, addr, ptep, pte);
+}
+
+static inline void page_table_check_pmd_set(struct mm_struct *mm,
+					    unsigned long addr, pmd_t *pmdp,
+					    pmd_t pmd)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_pmd_set(mm, addr, pmdp, pmd);
+}
+
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+					    unsigned long addr, pud_t *pudp,
+					    pud_t pud)
+{
+	if (static_branch_likely(&page_table_check_disabled))
+		return;
+
+	__page_table_check_pud_set(mm, addr, pudp, pud);
+}
+
+#else
+
+static inline void page_table_check_alloc(struct page *page, unsigned int order)
+{
+}
+
+static inline void page_table_check_free(struct page *page, unsigned int order)
+{
+}
+
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+					      unsigned long addr, pte_t pte)
+{
+}
+
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+					      unsigned long addr, pmd_t pmd)
+{
+}
+
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+					      unsigned long addr, pud_t pud)
+{
+}
+
+static inline void page_table_check_pte_set(struct mm_struct *mm,
+					    unsigned long addr, pte_t *ptep,
+					    pte_t pte)
+{
+}
+
+static inline void page_table_check_pmd_set(struct mm_struct *mm,
+					    unsigned long addr, pmd_t *pmdp,
+					    pmd_t pmd)
+{
+}
+
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+					    unsigned long addr, pud_t *pudp,
+					    pud_t pud)
+{
+}
+
+#endif /* CONFIG_PAGE_TABLE_CHECK */
+#endif /* __LINUX_PAGE_TABLE_CHECK_H */
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 1e73717802f8..e5724cd6946b 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -62,6 +62,30 @@ config PAGE_OWNER
 
 	  If unsure, say N.
 
+config PAGE_TABLE_CHECK
+	bool "Check for invalid mappings in user page tables"
+	depends on ARCH_SUPPORTS_PAGE_TABLE_CHECK
+	select PAGE_EXTENSION
+	help
+	  Check that anonymous page is not being mapped twice with read write
+	  permissions. Check that anonymous and file pages are not being
+	  erroneously shared. Since the checking is performed at the time
+	  entries are added and removed to user page tables, leaking, corruption
+	  and double mapping problems are detected synchronously.
+
+	  If unsure say "n".
+
+config PAGE_TABLE_CHECK_ENFORCED
+	bool "Enforce the page table checking by defauled"
+	depends on PAGE_TABLE_CHECK
+	help
+	  Always enable page table checking.  By default the page table checking
+	  is disabled, and can be optionally enabled via page_table_check=on
+	  kernel parameter. This config enforces that page table check is always
+	  enabled.
+
+	  If unsure say "n".
+
 config PAGE_POISONING
 	bool "Poison pages after freeing"
 	help
diff --git a/mm/Makefile b/mm/Makefile
index d6c0042e3aa0..5c5a3a480fa6 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -112,6 +112,7 @@ obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
 obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
+obj-$(CONFIG_PAGE_TABLE_CHECK) += page_table_check.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_SECRETMEM) += secretmem.o
 obj-$(CONFIG_CMA_SYSFS) += cma_sysfs.o
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fee18ada46a2..4165071e2958 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -63,6 +63,7 @@
 #include <linux/sched/rt.h>
 #include <linux/sched/mm.h>
 #include <linux/page_owner.h>
+#include <linux/page_table_check.h>
 #include <linux/kthread.h>
 #include <linux/memcontrol.h>
 #include <linux/ftrace.h>
@@ -1299,6 +1300,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 		if (memcg_kmem_enabled() && PageMemcgKmem(page))
 			__memcg_kmem_uncharge_page(page, order);
 		reset_page_owner(page, order);
+		page_table_check_free(page, order);
 		return false;
 	}
 
@@ -1338,6 +1340,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	page_cpupid_reset_last(page);
 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	reset_page_owner(page, order);
+	page_table_check_free(page, order);
 
 	if (!PageHighMem(page)) {
 		debug_check_no_locks_freed(page_address(page),
@@ -2418,6 +2421,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	}
 
 	set_page_owner(page, order, gfp_flags);
+	page_table_check_alloc(page, order);
 }
 
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 2a52fd9ed464..a4d2c86c26a9 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -8,6 +8,7 @@
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
 #include <linux/page_idle.h>
+#include <linux/page_table_check.h>
 
 /*
  * struct page extension
@@ -75,6 +76,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
 	&page_idle_ops,
 #endif
+#ifdef CONFIG_PAGE_TABLE_CHECK
+	&page_table_check_ops,
+#endif
 };
 
 unsigned long page_ext_size = sizeof(struct page_ext);
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
new file mode 100644
index 000000000000..5a63a2a57da2
--- /dev/null
+++ b/mm/page_table_check.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2021, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+#include <linux/mm.h>
+#include <linux/page_table_check.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt)	"page_table_check: " fmt
+
+struct page_table_check {
+	atomic_t anon_map_count;
+	atomic_t file_map_count;
+};
+
+static bool __page_table_check_enabled __initdata =
+				IS_ENABLED(CONFIG_PAGE_TABLE_CHECK_ENFORCED);
+
+DEFINE_STATIC_KEY_TRUE_RO(page_table_check_disabled);
+
+static int __init early_page_table_check_param(char *buf)
+{
+	if (!buf)
+		return -EINVAL;
+
+	if (strcmp(buf, "on") == 0)
+		__page_table_check_enabled = true;
+
+	return 0;
+}
+
+early_param("page_table_check", early_page_table_check_param);
+
+static bool __init need_page_table_check(void)
+{
+	if (!__page_table_check_enabled)
+		return false;
+
+	return true;
+}
+
+static void __init init_page_table_check(void)
+{
+	if (!__page_table_check_enabled)
+		return;
+	static_branch_disable(&page_table_check_disabled);
+}
+
+struct page_ext_operations page_table_check_ops = {
+	.size = sizeof(struct page_table_check),
+	.need = need_page_table_check,
+	.init = init_page_table_check,
+};
+
+static struct page_table_check *get_page_table_check(struct page_ext *page_ext)
+{
+	BUG_ON(!page_ext);
+	return ((void *)(page_ext) + page_table_check_ops.offset);
+}
+
+static inline bool pte_user_accessible_page(pte_t pte)
+{
+	return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
+}
+
+static inline bool pmd_user_accessible_page(pmd_t pmd)
+{
+	return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) &&
+		(pmd_val(pmd) & _PAGE_USER);
+}
+
+static inline bool pud_user_accessible_page(pud_t pud)
+{
+	return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) &&
+		(pud_val(pud) & _PAGE_USER);
+}
+
+/*
+ * An enty is removed from the page table, decrement the counters for that page
+ * verify that it is of correct type and counters do not become negative.
+ */
+static void page_table_check_clear(struct mm_struct *mm, unsigned long addr,
+				   unsigned long pfn, unsigned int pgcnt)
+{
+	struct page_ext *page_ext;
+	bool anon;
+	int i, count;
+
+	if (!pfn_valid(pfn))
+		return;
+
+	page_ext = lookup_page_ext(pfn_to_page(pfn));
+	anon = PageAnon(pfn_to_page(pfn));
+
+	for (i = 0; i < pgcnt; i++, pfn++) {
+		struct page_table_check *ptc = get_page_table_check(page_ext);
+
+		if (anon) {
+			BUG_ON(atomic_read(&ptc->file_map_count));
+			count = atomic_dec_return(&ptc->anon_map_count);
+		} else {
+			BUG_ON(atomic_read(&ptc->anon_map_count));
+			count = atomic_dec_return(&ptc->file_map_count);
+		}
+
+		BUG_ON(count < 0);
+		page_ext = page_ext_next(page_ext);
+	}
+}
+
+/*
+ * A new enty is added to the page table, increment the counters for that page
+ * verify that it is of correct type and is not being mapped with a different
+ * type to a different process.
+ */
+static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
+				 unsigned long pfn, unsigned long pgcnt,
+				 bool rw)
+{
+	struct page_ext *page_ext;
+	bool anon;
+	int i, count;
+
+	if (!pfn_valid(pfn))
+		return;
+
+	page_ext = lookup_page_ext(pfn_to_page(pfn));
+	anon = PageAnon(pfn_to_page(pfn));
+
+	for (i = 0; i < pgcnt; i++, pfn++) {
+		struct page_table_check *ptc = get_page_table_check(page_ext);
+
+		if (anon) {
+			BUG_ON(atomic_read(&ptc->file_map_count));
+			count = atomic_inc_return(&ptc->anon_map_count);
+			BUG_ON(count > 1 && rw);
+		} else {
+			BUG_ON(atomic_read(&ptc->anon_map_count));
+			count = atomic_inc_return(&ptc->file_map_count);
+		}
+		BUG_ON(count < 0);
+		page_ext = page_ext_next(page_ext);
+	}
+}
+
+/*
+ * page is on free list, or is being allocated, verify that counters are zeroes
+ * crash if they are not.
+ */
+void __page_table_check_zero(struct page *page, unsigned int order)
+{
+	struct page_ext *page_ext = lookup_page_ext(page);
+	int i;
+
+	BUG_ON(!page_ext);
+	for (i = 0; i < (1 << order); i++) {
+		struct page_table_check *ptc = get_page_table_check(page_ext);
+
+		BUG_ON(atomic_read(&ptc->anon_map_count));
+		BUG_ON(atomic_read(&ptc->file_map_count));
+		page_ext = page_ext_next(page_ext);
+	}
+}
+
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+				  pte_t pte)
+{
+	if (&init_mm == mm)
+		return;
+
+	if (pte_user_accessible_page(pte)) {
+		page_table_check_clear(mm, addr, pte_pfn(pte),
+				       PAGE_SIZE >> PAGE_SHIFT);
+	}
+}
+
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+				  pmd_t pmd)
+{
+	if (&init_mm == mm)
+		return;
+
+	if (pmd_user_accessible_page(pmd)) {
+		page_table_check_clear(mm, addr, pmd_pfn(pmd),
+				       PMD_PAGE_SIZE >> PAGE_SHIFT);
+	}
+}
+
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+				  pud_t pud)
+{
+	if (&init_mm == mm)
+		return;
+
+	if (pud_user_accessible_page(pud)) {
+		page_table_check_clear(mm, addr, pud_pfn(pud),
+				       PUD_PAGE_SIZE >> PAGE_SHIFT);
+	}
+}
+
+void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, pte_t pte)
+{
+	pte_t old_pte;
+
+	if (&init_mm == mm)
+		return;
+
+	old_pte = *ptep;
+	if (pte_user_accessible_page(old_pte)) {
+		page_table_check_clear(mm, addr, pte_pfn(old_pte),
+				       PAGE_SIZE >> PAGE_SHIFT);
+	}
+
+	if (pte_user_accessible_page(pte)) {
+		page_table_check_set(mm, addr, pte_pfn(pte),
+				     PAGE_SIZE >> PAGE_SHIFT,
+				     pte_write(pte));
+	}
+}
+
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+				pmd_t *pmdp, pmd_t pmd)
+{
+	pmd_t old_pmd;
+
+	if (&init_mm == mm)
+		return;
+
+	old_pmd = *pmdp;
+	if (pmd_user_accessible_page(old_pmd)) {
+		page_table_check_clear(mm, addr, pmd_pfn(old_pmd),
+				       PMD_PAGE_SIZE >> PAGE_SHIFT);
+	}
+
+	if (pmd_user_accessible_page(pmd)) {
+		page_table_check_set(mm, addr, pmd_pfn(pmd),
+				     PMD_PAGE_SIZE >> PAGE_SHIFT,
+				     pmd_write(pmd));
+	}
+}
+
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+				pud_t *pudp, pud_t pud)
+{
+	pud_t old_pud;
+
+	if (&init_mm == mm)
+		return;
+
+	old_pud = *pudp;
+	if (pud_user_accessible_page(old_pud)) {
+		page_table_check_clear(mm, addr, pud_pfn(old_pud),
+				       PUD_PAGE_SIZE >> PAGE_SHIFT);
+	}
+
+	if (pud_user_accessible_page(pud)) {
+		page_table_check_set(mm, addr, pud_pfn(pud),
+				     PUD_PAGE_SIZE >> PAGE_SHIFT,
+				     pud_write(pud));
+	}
+}
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC 3/3] x86: mm: add x86_64 support for page table check
  2021-11-16 22:00 [RFC 0/3] page table check Pasha Tatashin
  2021-11-16 22:00 ` [RFC 1/3] mm: ptep_clear() page table helper Pasha Tatashin
  2021-11-16 22:00 ` [RFC 2/3] mm: page table check Pasha Tatashin
@ 2021-11-16 22:00 ` Pasha Tatashin
  2 siblings, 0 replies; 8+ messages in thread
From: Pasha Tatashin @ 2021-11-16 22:00 UTC (permalink / raw)
  To: pasha.tatashin, linux-kernel, linux-mm, linux-doc, akpm,
	rientjes, pjt, weixugc, gthelen, mingo, corbet, will, rppt,
	keescook, tglx, peterz, masahiroy, samitolvanen, dave.hansen,
	x86, frederic, hpa, aneesh.kumar

Add page table check hooks into routines that modify user page tables.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 arch/x86/Kconfig               |  1 +
 arch/x86/include/asm/pgtable.h | 27 +++++++++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b1d4b481fcdd..9d28f2ac85ff 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -104,6 +104,7 @@ config X86
 	select ARCH_SUPPORTS_ACPI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
+	select ARCH_SUPPORTS_PAGE_TABLE_CHECK	if X86_64
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_LTO_CLANG
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..46f0112f0303 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -26,6 +26,7 @@
 #include <asm/pkru.h>
 #include <asm/fpu/api.h>
 #include <asm-generic/pgtable_uffd.h>
+#include <linux/page_table_check.h>
 
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 bool __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
@@ -1006,18 +1007,21 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
+	page_table_check_pte_set(mm, addr, ptep, pte);
 	set_pte(ptep, pte);
 }
 
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
+	page_table_check_pmd_set(mm, addr, pmdp, pmd);
 	set_pmd(pmdp, pmd);
 }
 
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
 			      pud_t *pudp, pud_t pud)
 {
+	page_table_check_pud_set(mm, addr, pudp, pud);
 	native_set_pud(pudp, pud);
 }
 
@@ -1048,6 +1052,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pte_t *ptep)
 {
 	pte_t pte = native_ptep_get_and_clear(ptep);
+	page_table_check_pte_clear(mm, addr, pte);
 	return pte;
 }
 
@@ -1063,12 +1068,21 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 		 * care about updates and native needs no locking
 		 */
 		pte = native_local_ptep_get_and_clear(ptep);
+		page_table_check_pte_clear(mm, addr, pte);
 	} else {
 		pte = ptep_get_and_clear(mm, addr, ptep);
 	}
 	return pte;
 }
 
+#define __HAVE_ARCH_PTEP_CLEAR
+static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep)
+{
+	page_table_check_pte_clear(mm, addr, *ptep);
+	pte_clear(mm, addr, ptep);
+}
+
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
@@ -1109,14 +1123,22 @@ static inline int pmd_write(pmd_t pmd)
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pmd_t *pmdp)
 {
-	return native_pmdp_get_and_clear(pmdp);
+	pmd_t pmd = native_pmdp_get_and_clear(pmdp);
+
+	page_table_check_pmd_clear(mm, addr, pmd);
+
+	return pmd;
 }
 
 #define __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR
 static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 					unsigned long addr, pud_t *pudp)
 {
-	return native_pudp_get_and_clear(pudp);
+	pud_t pud = native_pudp_get_and_clear(pudp);
+
+	page_table_check_pud_clear(mm, addr, pud);
+
+	return pud;
 }
 
 #define __HAVE_ARCH_PMDP_SET_WRPROTECT
@@ -1137,6 +1159,7 @@ static inline int pud_write(pud_t pud)
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
+	page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
 	if (IS_ENABLED(CONFIG_SMP)) {
 		return xchg(pmdp, pmd);
 	} else {
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC 2/3] mm: page table check
  2021-11-16 22:00 ` [RFC 2/3] mm: page table check Pasha Tatashin
@ 2021-11-17  8:08   ` Jonathan Corbet
  2021-11-17 16:47     ` Pasha Tatashin
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Corbet @ 2021-11-17  8:08 UTC (permalink / raw)
  To: Pasha Tatashin, pasha.tatashin, linux-kernel, linux-mm,
	linux-doc, akpm, rientjes, pjt, weixugc, gthelen, mingo, will,
	rppt, keescook, tglx, peterz, masahiroy, samitolvanen,
	dave.hansen, x86, frederic, hpa, aneesh.kumar

Pasha Tatashin <pasha.tatashin@soleen.com> writes:

> Check user page table entries at the time they are added and removed.
>
> Allows to synchronously catch memory corruption issues related to
> double mapping.
>
> When a pte for an anonymous page is added into page table, we verify
> that this pte does not already point to a file backed page, and vice
> versa if this is a file backed page that is being added we verify that
> this page does not have an anonymous mapping
>
> We also enforce that read-only sharing for anonymous pages is allowed
> (i.e. cow after fork). All other sharing must be for file pages.
>
> Page table check allows to protect and debug cases where "struct page"
> metadata became corrupted for some reason. For example, when refcnt or
> mapcount become invalid.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  Documentation/vm/page_table_check.rst |  53 ++++++

Thanks for documenting this feature!  When you add a new RST file,
though, you need to add it to the index.rst file as well so that it is
included in the docs build.

>  MAINTAINERS                           |   9 +
>  arch/Kconfig                          |   3 +
>  include/linux/page_table_check.h      | 147 ++++++++++++++
>  mm/Kconfig.debug                      |  24 +++
>  mm/Makefile                           |   1 +
>  mm/page_alloc.c                       |   4 +
>  mm/page_ext.c                         |   4 +
>  mm/page_table_check.c                 | 264 ++++++++++++++++++++++++++
>  9 files changed, 509 insertions(+)
>  create mode 100644 Documentation/vm/page_table_check.rst
>  create mode 100644 include/linux/page_table_check.h
>  create mode 100644 mm/page_table_check.c
>
> diff --git a/Documentation/vm/page_table_check.rst b/Documentation/vm/page_table_check.rst
> new file mode 100644
> index 000000000000..41435a45869f
> --- /dev/null
> +++ b/Documentation/vm/page_table_check.rst
> @@ -0,0 +1,53 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _page_table_check:

Do you need this label for anything?  As-is it's just added visual
clutter and could come out.

> +================
> +Page Table Check
> +================
> +
> +Page table check allows to hardern the kernel by ensuring that some types of
> +memory corruptions are prevented.
> +
> +Page table check performs extra verifications at the time when new pages become
> +accessible from userspace by getting their page table entries (PTEs PMDs etc.)
> +added into the table.
> +
> +In case of detected corruption, the kernel is crashed. There is a small
> +performance and memory overhead associated with page table check. Thereofre, it
> +is disabled by default but can be optionally enabled on systems where extra
> +hardening outweighs the costs. Also, because page table check is synchronous, it
> +can help with debugging double map memory corruption issues, by crashing kernel
> +at the time wrong mapping occurs instead of later which is often the case with
> +memory corruptions bugs.
> +
> +==============================
> +Double mapping detection logic
> +==============================

I'd use subsection markup (single "==========" line underneath) for the
subsections.

> ++-------------------+-------------------+-------------------+------------------+
> +| Current Mapping   | New mapping       | Permissions       | Rule             |
> ++===================+===================+===================+==================+
> +| Anonymous         | Anonymous         | Read              | Allow            |
> ++-------------------+-------------------+-------------------+------------------+
> +| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
> ++-------------------+-------------------+-------------------+------------------+
> +| Anonymous         | Named             | Any               | Prohibit         |
> ++-------------------+-------------------+-------------------+------------------+
> +| Named             | Anonymous         | Any               | Prohibit         |
> ++-------------------+-------------------+-------------------+------------------+
> +| Named             | Named             | Any               | Allow            |
> ++-------------------+-------------------+-------------------+------------------+
> +
> +=========================
> +Enabling Page Table Check
> +=========================
> +
> +Build kernel with:
> +
> +- PAGE_TABLE_CHECK=y
> +Note, it can only be enabled on platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK
> +is available.
> +- Boot with 'page_table_check=on' kernel parameter.
> +
> +Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
> +table support without extra kernel parameter.

Thanks,

jon

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC 1/3] mm: ptep_clear() page table helper
  2021-11-16 22:00 ` [RFC 1/3] mm: ptep_clear() page table helper Pasha Tatashin
@ 2021-11-17  8:51   ` Anshuman Khandual
  2021-11-17 16:43     ` Pasha Tatashin
  0 siblings, 1 reply; 8+ messages in thread
From: Anshuman Khandual @ 2021-11-17  8:51 UTC (permalink / raw)
  To: Pasha Tatashin, linux-kernel, linux-mm, linux-doc, akpm,
	rientjes, pjt, weixugc, gthelen, mingo, corbet, will, rppt,
	keescook, tglx, peterz, masahiroy, samitolvanen, dave.hansen,
	x86, frederic, hpa, aneesh.kumar



On 11/17/21 3:30 AM, Pasha Tatashin wrote:
> diff --git a/Documentation/vm/arch_pgtable_helpers.rst b/Documentation/vm/arch_pgtable_helpers.rst
> index 552567d863b8..fbe06ec75370 100644
> --- a/Documentation/vm/arch_pgtable_helpers.rst
> +++ b/Documentation/vm/arch_pgtable_helpers.rst
> @@ -66,9 +66,11 @@ PTE Page Table Helpers
>  +---------------------------+--------------------------------------------------+
>  | pte_mknotpresent          | Invalidates a mapped PTE                         |
>  +---------------------------+--------------------------------------------------+
> -| ptep_get_and_clear        | Clears a PTE                                     |
> +| ptep_clear                | Clears a PTE                                     |
>  +---------------------------+--------------------------------------------------+
> -| ptep_get_and_clear_full   | Clears a PTE                                     |
> +| ptep_get_and_clear        | Clears and returns PTE                           |
> ++---------------------------+--------------------------------------------------+
> +| ptep_get_and_clear_full   | Clears and returns PTE (batched PTE unmap)       |
>  +---------------------------+--------------------------------------------------+
>  | ptep_test_and_clear_young | Clears young from a PTE                          |
>  +---------------------------+--------------------------------------------------+

Just curious. This does not have a corresponding change in mm/debug_vm_pgtable.c ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC 1/3] mm: ptep_clear() page table helper
  2021-11-17  8:51   ` Anshuman Khandual
@ 2021-11-17 16:43     ` Pasha Tatashin
  0 siblings, 0 replies; 8+ messages in thread
From: Pasha Tatashin @ 2021-11-17 16:43 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: LKML, linux-mm, Linux Doc Mailing List, Andrew Morton,
	David Rientjes, Paul Turner, weixugc, Greg Thelen, Ingo Molnar,
	Jonathan Corbet, Will Deacon, Mike Rapoport, Kees Cook,
	Thomas Gleixner, Peter Zijlstra, masahiroy, Sami Tolvanen,
	Dave Hansen, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	frederic, H. Peter Anvin, Aneesh Kumar K.V

On Wed, Nov 17, 2021 at 3:52 AM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 11/17/21 3:30 AM, Pasha Tatashin wrote:
> > diff --git a/Documentation/vm/arch_pgtable_helpers.rst b/Documentation/vm/arch_pgtable_helpers.rst
> > index 552567d863b8..fbe06ec75370 100644
> > --- a/Documentation/vm/arch_pgtable_helpers.rst
> > +++ b/Documentation/vm/arch_pgtable_helpers.rst
> > @@ -66,9 +66,11 @@ PTE Page Table Helpers
> >  +---------------------------+--------------------------------------------------+
> >  | pte_mknotpresent          | Invalidates a mapped PTE                         |
> >  +---------------------------+--------------------------------------------------+
> > -| ptep_get_and_clear        | Clears a PTE                                     |
> > +| ptep_clear                | Clears a PTE                                     |
> >  +---------------------------+--------------------------------------------------+
> > -| ptep_get_and_clear_full   | Clears a PTE                                     |
> > +| ptep_get_and_clear        | Clears and returns PTE                           |
> > ++---------------------------+--------------------------------------------------+
> > +| ptep_get_and_clear_full   | Clears and returns PTE (batched PTE unmap)       |
> >  +---------------------------+--------------------------------------------------+
> >  | ptep_test_and_clear_young | Clears young from a PTE                          |
> >  +---------------------------+--------------------------------------------------+
>
> Just curious. This does not have a corresponding change in mm/debug_vm_pgtable.c ?

You are right, I need to replace it in mm/debug_vm_pgtable.c as well.
I will do it in the next version.

Thanks,
Pasha

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC 2/3] mm: page table check
  2021-11-17  8:08   ` Jonathan Corbet
@ 2021-11-17 16:47     ` Pasha Tatashin
  0 siblings, 0 replies; 8+ messages in thread
From: Pasha Tatashin @ 2021-11-17 16:47 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: LKML, linux-mm, Linux Doc Mailing List, Andrew Morton,
	David Rientjes, Paul Turner, weixugc, Greg Thelen, Ingo Molnar,
	Will Deacon, Mike Rapoport, Kees Cook, Thomas Gleixner,
	Peter Zijlstra, masahiroy, Sami Tolvanen, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	frederic, H. Peter Anvin, Aneesh Kumar K.V

> >  Documentation/vm/page_table_check.rst |  53 ++++++
>
> Thanks for documenting this feature!  When you add a new RST file,
> though, you need to add it to the index.rst file as well so that it is
> included in the docs build.

I will add the index.rst changes.

>
> >  MAINTAINERS                           |   9 +
> >  arch/Kconfig                          |   3 +
> >  include/linux/page_table_check.h      | 147 ++++++++++++++
> >  mm/Kconfig.debug                      |  24 +++
> >  mm/Makefile                           |   1 +
> >  mm/page_alloc.c                       |   4 +
> >  mm/page_ext.c                         |   4 +
> >  mm/page_table_check.c                 | 264 ++++++++++++++++++++++++++
> >  9 files changed, 509 insertions(+)
> >  create mode 100644 Documentation/vm/page_table_check.rst
> >  create mode 100644 include/linux/page_table_check.h
> >  create mode 100644 mm/page_table_check.c
> >
> > diff --git a/Documentation/vm/page_table_check.rst b/Documentation/vm/page_table_check.rst
> > new file mode 100644
> > index 000000000000..41435a45869f
> > --- /dev/null
> > +++ b/Documentation/vm/page_table_check.rst
> > @@ -0,0 +1,53 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +.. _page_table_check:
>
> Do you need this label for anything?  As-is it's just added visual
> clutter and could come out.

Sure, I will remove it

>
> > +================
> > +Page Table Check
> > +================
> > +
> > +Page table check allows to hardern the kernel by ensuring that some types of
> > +memory corruptions are prevented.
> > +
> > +Page table check performs extra verifications at the time when new pages become
> > +accessible from userspace by getting their page table entries (PTEs PMDs etc.)
> > +added into the table.
> > +
> > +In case of detected corruption, the kernel is crashed. There is a small
> > +performance and memory overhead associated with page table check. Thereofre, it
> > +is disabled by default but can be optionally enabled on systems where extra
> > +hardening outweighs the costs. Also, because page table check is synchronous, it
> > +can help with debugging double map memory corruption issues, by crashing kernel
> > +at the time wrong mapping occurs instead of later which is often the case with
> > +memory corruptions bugs.
> > +
> > +==============================
> > +Double mapping detection logic
> > +==============================
>
> I'd use subsection markup (single "==========" line underneath) for the
> subsections.

I will change to subsection.

Thanks,
Pasha

On Wed, Nov 17, 2021 at 3:08 AM Jonathan Corbet <corbet@lwn.net> wrote:
>
> Pasha Tatashin <pasha.tatashin@soleen.com> writes:
>
> > Check user page table entries at the time they are added and removed.
> >
> > Allows to synchronously catch memory corruption issues related to
> > double mapping.
> >
> > When a pte for an anonymous page is added into page table, we verify
> > that this pte does not already point to a file backed page, and vice
> > versa if this is a file backed page that is being added we verify that
> > this page does not have an anonymous mapping
> >
> > We also enforce that read-only sharing for anonymous pages is allowed
> > (i.e. cow after fork). All other sharing must be for file pages.
> >
> > Page table check allows to protect and debug cases where "struct page"
> > metadata became corrupted for some reason. For example, when refcnt or
> > mapcount become invalid.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  Documentation/vm/page_table_check.rst |  53 ++++++
>
> Thanks for documenting this feature!  When you add a new RST file,
> though, you need to add it to the index.rst file as well so that it is
> included in the docs build.
>
> >  MAINTAINERS                           |   9 +
> >  arch/Kconfig                          |   3 +
> >  include/linux/page_table_check.h      | 147 ++++++++++++++
> >  mm/Kconfig.debug                      |  24 +++
> >  mm/Makefile                           |   1 +
> >  mm/page_alloc.c                       |   4 +
> >  mm/page_ext.c                         |   4 +
> >  mm/page_table_check.c                 | 264 ++++++++++++++++++++++++++
> >  9 files changed, 509 insertions(+)
> >  create mode 100644 Documentation/vm/page_table_check.rst
> >  create mode 100644 include/linux/page_table_check.h
> >  create mode 100644 mm/page_table_check.c
> >
> > diff --git a/Documentation/vm/page_table_check.rst b/Documentation/vm/page_table_check.rst
> > new file mode 100644
> > index 000000000000..41435a45869f
> > --- /dev/null
> > +++ b/Documentation/vm/page_table_check.rst
> > @@ -0,0 +1,53 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +.. _page_table_check:
>
> Do you need this label for anything?  As-is it's just added visual
> clutter and could come out.
>
> > +================
> > +Page Table Check
> > +================
> > +
> > +Page table check allows to hardern the kernel by ensuring that some types of
> > +memory corruptions are prevented.
> > +
> > +Page table check performs extra verifications at the time when new pages become
> > +accessible from userspace by getting their page table entries (PTEs PMDs etc.)
> > +added into the table.
> > +
> > +In case of detected corruption, the kernel is crashed. There is a small
> > +performance and memory overhead associated with page table check. Thereofre, it
> > +is disabled by default but can be optionally enabled on systems where extra
> > +hardening outweighs the costs. Also, because page table check is synchronous, it
> > +can help with debugging double map memory corruption issues, by crashing kernel
> > +at the time wrong mapping occurs instead of later which is often the case with
> > +memory corruptions bugs.
> > +
> > +==============================
> > +Double mapping detection logic
> > +==============================
>
> I'd use subsection markup (single "==========" line underneath) for the
> subsections.
>
> > ++-------------------+-------------------+-------------------+------------------+
> > +| Current Mapping   | New mapping       | Permissions       | Rule             |
> > ++===================+===================+===================+==================+
> > +| Anonymous         | Anonymous         | Read              | Allow            |
> > ++-------------------+-------------------+-------------------+------------------+
> > +| Anonymous         | Anonymous         | Read / Write      | Prohibit         |
> > ++-------------------+-------------------+-------------------+------------------+
> > +| Anonymous         | Named             | Any               | Prohibit         |
> > ++-------------------+-------------------+-------------------+------------------+
> > +| Named             | Anonymous         | Any               | Prohibit         |
> > ++-------------------+-------------------+-------------------+------------------+
> > +| Named             | Named             | Any               | Allow            |
> > ++-------------------+-------------------+-------------------+------------------+
> > +
> > +=========================
> > +Enabling Page Table Check
> > +=========================
> > +
> > +Build kernel with:
> > +
> > +- PAGE_TABLE_CHECK=y
> > +Note, it can only be enabled on platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK
> > +is available.
> > +- Boot with 'page_table_check=on' kernel parameter.
> > +
> > +Optionally, build kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page
> > +table support without extra kernel parameter.
>
> Thanks,
>
> jon

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-11-17 16:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-16 22:00 [RFC 0/3] page table check Pasha Tatashin
2021-11-16 22:00 ` [RFC 1/3] mm: ptep_clear() page table helper Pasha Tatashin
2021-11-17  8:51   ` Anshuman Khandual
2021-11-17 16:43     ` Pasha Tatashin
2021-11-16 22:00 ` [RFC 2/3] mm: page table check Pasha Tatashin
2021-11-17  8:08   ` Jonathan Corbet
2021-11-17 16:47     ` Pasha Tatashin
2021-11-16 22:00 ` [RFC 3/3] x86: mm: add x86_64 support for " Pasha Tatashin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).