* [RFC PATCH 00/12] arm64: implement read-only page tables
@ 2022-01-26 17:29 Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 01/12] asm-generic/pgalloc: allow arch to override PMD alloc/free routines Ard Biesheuvel
                   ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

This RFC series implements support for mapping all user and kernel page
tables read-only in the linear map, and using a special fixmap slot to
make any modifications.

The purpose is to prevent page tables from being manipulated
inadvertently, which is becoming increasingly important on arm64, as
many new hardening features such as BTI and MTE are controlled via
attributes in the page tables.

This series is only half of the work that is underway to implement this
in terms of hypervisor services rather than fixmap pokes, as this will
allow the hypervisor to remove all write permissions from pages used as
page tables. This work is being done in the context of the pKVM project,
which defines a clear boundary between the hypervisor executing at EL2,
and the [untrusted] host running at EL1. In this context, managing the
host's page tables at HYP level should increase the robustness of the
entire system substantially.

This series is posted separately for discussion, as it introduces the
changes that are necessary to route all page table updates via a small
set of helpers, allowing us to choose between unprotected, fixmap or HYP
protection straightforwardly.

The pKVM specific changes will be posted as a followup series.
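
As a rough illustration, the shape of the dispatch this series ends up
with looks like the below (a condensed sketch of the set_pte() change in
patch 3, with the TLB maintenance elided; the HYP backend is not part of
this series):

    static inline void set_pte(pte_t *ptep, pte_t pte)
    {
            if (page_tables_are_ro())
                    /* protected: update via the fixmap (or, later, HYP) */
                    xchg_ro_pte(&init_mm, ptep, pte);
            else
                    /* unprotected: plain store via the writable linear alias */
                    WRITE_ONCE(*ptep, pte);
    }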

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>

Ard Biesheuvel (12):
  asm-generic/pgalloc: allow arch to override PMD alloc/free routines
  arm64: mm: add helpers to remap page tables read-only/read-write
  arm64: mm: use a fixmap slot for user page table modifications
  arm64: mm: remap PGD pages r/o in the linear region after allocation
  arm64: mm: remap PUD pages r/o in linear region
  arm64: mm: remap PMD pages r/o in linear region
  arm64: mm: remap PTE level user page tables r/o in the linear region
  arm64: mm: remap kernel PTE level page tables r/o in the linear region
  arm64: mm: remap kernel page tables read-only at end of init
  mm: add default definition of p4d_index()
  arm64: efi: use set_pte_at() not set_pte() in order to pass mm pointer
  arm64: hugetlb: use set_pte_at() not set_pte() to provide mm pointer

 arch/arm64/Kconfig               |  11 ++
 arch/arm64/include/asm/fixmap.h  |   1 +
 arch/arm64/include/asm/pgalloc.h |  49 ++++++++-
 arch/arm64/include/asm/pgtable.h |  82 +++++++++++---
 arch/arm64/include/asm/tlb.h     |   6 +
 arch/arm64/kernel/efi.c          |   2 +-
 arch/arm64/mm/Makefile           |   2 +
 arch/arm64/mm/fault.c            |   8 +-
 arch/arm64/mm/hugetlbpage.c      |   4 +-
 arch/arm64/mm/mmu.c              | 115 +++++++++++++++++++-
 arch/arm64/mm/pageattr.c         |  14 +++
 arch/arm64/mm/pgd.c              |  25 +++--
 arch/arm64/mm/ro_page_tables.c   | 100 +++++++++++++++++
 include/asm-generic/pgalloc.h    |  13 ++-
 include/linux/pgtable.h          |   8 ++
 15 files changed, 405 insertions(+), 35 deletions(-)
 create mode 100644 arch/arm64/mm/ro_page_tables.c


base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
-- 
2.30.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH 01/12] asm-generic/pgalloc: allow arch to override PMD alloc/free routines
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 02/12] arm64: mm: add helpers to remap page tables read-only/read-write Ard Biesheuvel
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Extend the existing CPP macro-based hooks that allow architectures to
specialize the code that allocates and frees pages to be used as page
tables.
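
For instance, an architecture can now take over PMD allocation while
still reusing the generic logic, along the lines of the sketch below
(this is essentially what the arm64 patches later in this series do,
where the arch specific part remaps the new page read-only):

    /* in the architecture's asm/pgalloc.h */
    #define __HAVE_ARCH_PMD_ALLOC_ONE
    #define __HAVE_ARCH_PMD_FREE
    #include <asm-generic/pgalloc.h>

    pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
    {
            pmd_t *pmd = __pmd_alloc_one(mm, addr); /* generic allocation */

            if (!pmd)
                    return NULL;
            /* arch specific handling of the new table page goes here */
            return pmd;
    }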

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/asm-generic/pgalloc.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 977bea16cf1b..65f31f615d99 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -34,6 +34,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 }
 #endif
 
+#ifndef __HAVE_ARCH_PTE_FREE_KERNEL
 /**
  * pte_free_kernel - free PTE-level kernel page table page
  * @mm: the mm_struct of the current context
@@ -43,6 +44,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
 	free_page((unsigned long)pte);
 }
+#endif
 
 /**
  * __pte_alloc_one - allocate a page for PTE-level user page table
@@ -91,6 +93,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
  * done with a reference count in struct page.
  */
 
+#ifndef __HAVE_ARCH_PTE_FREE
 /**
  * pte_free - free PTE-level user page table page
  * @mm: the mm_struct of the current context
@@ -101,11 +104,11 @@ static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
 	pgtable_pte_page_dtor(pte_page);
 	__free_page(pte_page);
 }
+#endif
 
 
 #if CONFIG_PGTABLE_LEVELS > 2
 
-#ifndef __HAVE_ARCH_PMD_ALLOC_ONE
 /**
  * pmd_alloc_one - allocate a page for PMD-level page table
  * @mm: the mm_struct of the current context
@@ -116,7 +119,7 @@ static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+static inline pmd_t *__pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 	struct page *page;
 	gfp_t gfp = GFP_PGTABLE_USER;
@@ -132,6 +135,12 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 	}
 	return (pmd_t *)page_address(page);
 }
+
+#ifndef __HAVE_ARCH_PMD_ALLOC_ONE
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	return __pmd_alloc_one(mm, addr);
+}
 #endif
 
 #ifndef __HAVE_ARCH_PMD_FREE
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 02/12] arm64: mm: add helpers to remap page tables read-only/read-write
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 01/12] asm-generic/pgalloc: allow arch to override PMD alloc/free routines Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 03/12] arm64: mm: use a fixmap slot for user page table modifications Ard Biesheuvel
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Add a couple of helpers to remap a single page read-only or read-write
via its linear address. This will be used for mappings of page table
pages in the linear region.

Note that set_memory_ro/set_memory_rw operate on addresses in the
vmalloc space only, so they cannot be used here.
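
The intended usage pattern is to bracket the lifetime of a page table
page, roughly as below (a sketch based on how the helpers are used in
subsequent patches):

    void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);

    set_pgtable_ro(ptr);            /* page is now live as a page table */

    /* ... all descriptor updates go via the fixmap alias ... */

    set_pgtable_rw(ptr);            /* page is being retired */
    free_page((unsigned long)ptr);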

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgtable.h |  3 +++
 arch/arm64/mm/pageattr.c         | 14 ++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c4ba047a82d2..8d3806c68687 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -34,6 +34,9 @@
 #include <linux/mm_types.h>
 #include <linux/sched.h>
 
+int set_pgtable_ro(void *addr);
+int set_pgtable_rw(void *addr);
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
 
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index a3bacd79507a..61f4aca08b95 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -153,6 +153,20 @@ int set_memory_valid(unsigned long addr, int numpages, int enable)
 					__pgprot(PTE_VALID));
 }
 
+int set_pgtable_ro(void *addr)
+{
+	return __change_memory_common((u64)addr, PAGE_SIZE,
+				      __pgprot(PTE_RDONLY),
+				      __pgprot(PTE_WRITE));
+}
+
+int set_pgtable_rw(void *addr)
+{
+	return __change_memory_common((u64)addr, PAGE_SIZE,
+				      __pgprot(PTE_WRITE),
+				      __pgprot(PTE_RDONLY));
+}
+
 int set_direct_map_invalid_noflush(struct page *page)
 {
 	struct page_change_data data = {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 03/12] arm64: mm: use a fixmap slot for user page table modifications
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 01/12] asm-generic/pgalloc: allow arch to override PMD alloc/free routines Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 02/12] arm64: mm: add helpers to remap page tables read-only/read-write Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-28 16:08   ` Steven Price
  2022-01-26 17:30 ` [RFC PATCH 04/12] arm64: mm: remap PGD pages r/o in the linear region after allocation Ard Biesheuvel
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

To prepare for user and kernel page tables being remapped read-only in
the linear region, define a new fixmap slot and use it to apply all page
table descriptor updates that target page tables other than swapper.

Fortunately for us, the fixmap descriptors themselves are always
manipulated via their kernel mapping in .bss, so there is no special
exception required to avoid circular logic here.
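
The core of the approach is the poke sequence below (a condensed sketch
of xchg_ro_pte() as added by this patch): the descriptor is updated via
a temporary writable alias while its linear mapping stays read-only.

    raw_spin_lock_irqsave(&patch_pte_lock, flags);
    p = (pte_t *)set_fixmap_offset(FIX_TEXT_POKE_PTE, __pa(ptep));
    pte_val(ret) = xchg_relaxed(&pte_val(*p), pte_val(pte));
    clear_fixmap(FIX_TEXT_POKE_PTE);
    raw_spin_unlock_irqrestore(&patch_pte_lock, flags);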

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig               |  11 +++
 arch/arm64/include/asm/fixmap.h  |   1 +
 arch/arm64/include/asm/pgalloc.h |  28 +++++-
 arch/arm64/include/asm/pgtable.h |  79 +++++++++++++---
 arch/arm64/mm/Makefile           |   2 +
 arch/arm64/mm/fault.c            |   8 +-
 arch/arm64/mm/ro_page_tables.c   | 100 ++++++++++++++++++++
 7 files changed, 209 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6978140edfa4..a3e98286b074 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1311,6 +1311,17 @@ config RODATA_FULL_DEFAULT_ENABLED
 	  This requires the linear region to be mapped down to pages,
 	  which may adversely affect performance in some cases.
 
+config ARM64_RO_PAGE_TABLES
+	bool "Remap page tables read-only in the kernel VA space"
+	select RODATA_FULL_DEFAULT_ENABLED
+	help
+	  Remap linear mappings of page table pages read-only as long as they
+	  are being used as such, and use a fixmap API to manipulate all page
+	  table descriptors, instead of manipulating them directly via their
+	  writable mappings in the direct map. This is intended as a debug
+	  and/or hardening feature, as it removes the ability for stray writes
+	  to be exploited to bypass permission restrictions.
+
 config ARM64_SW_TTBR0_PAN
 	bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
 	help
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 4335800201c9..71dfbe0452bb 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -50,6 +50,7 @@ enum fixed_addresses {
 
 	FIX_EARLYCON_MEM_BASE,
 	FIX_TEXT_POKE0,
+	FIX_TEXT_POKE_PTE,
 
 #ifdef CONFIG_ACPI_APEI_GHES
 	/* Used for GHES mapping from assorted contexts */
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 237224484d0f..d54ac9f8d6c7 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -30,7 +30,11 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
 	pudval_t pudval = PUD_TYPE_TABLE;
 
 	pudval |= (mm == &init_mm) ? PUD_TABLE_UXN : PUD_TABLE_PXN;
-	__pud_populate(pudp, __pa(pmdp), pudval);
+	if (page_tables_are_ro())
+		xchg_ro_pte(mm, (pte_t *)pudp,
+			    __pte(__phys_to_pud_val(__pa(pmdp) | pudval)));
+	else
+		__pud_populate(pudp, __pa(pmdp), pudval);
 }
 #else
 static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
@@ -51,7 +55,11 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
 	p4dval_t p4dval = P4D_TYPE_TABLE;
 
 	p4dval |= (mm == &init_mm) ? P4D_TABLE_UXN : P4D_TABLE_PXN;
-	__p4d_populate(p4dp, __pa(pudp), p4dval);
+	if (page_tables_are_ro())
+		xchg_ro_pte(mm, (pte_t *)p4dp,
+			    __pte(__phys_to_p4d_val(__pa(pudp) | p4dval)));
+	else
+		__p4d_populate(p4dp, __pa(pudp), p4dval);
 }
 #else
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
@@ -76,15 +84,27 @@ static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t ptep,
 static inline void
 pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
 {
+	pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
+
 	VM_BUG_ON(mm && mm != &init_mm);
-	__pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE | PMD_TABLE_UXN);
+	if (page_tables_are_ro())
+		xchg_ro_pte(mm, (pte_t *)pmdp,
+			    __pte(__phys_to_pmd_val(__pa(ptep) | pmdval)));
+	else
+		__pmd_populate(pmdp, __pa(ptep), pmdval);
 }
 
 static inline void
 pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t ptep)
 {
+	pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_PXN;
+
 	VM_BUG_ON(mm == &init_mm);
-	__pmd_populate(pmdp, page_to_phys(ptep), PMD_TYPE_TABLE | PMD_TABLE_PXN);
+	if (page_tables_are_ro())
+		xchg_ro_pte(mm, (pte_t *)pmdp,
+			    __pte(__phys_to_pmd_val(page_to_phys(ptep) | pmdval)));
+	else
+		__pmd_populate(pmdp, page_to_phys(ptep), pmdval);
 }
 
 #endif
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 8d3806c68687..a8daea6b4ac9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -30,6 +30,7 @@
 
 #include <asm/cmpxchg.h>
 #include <asm/fixmap.h>
+#include <linux/jump_label.h>
 #include <linux/mmdebug.h>
 #include <linux/mm_types.h>
 #include <linux/sched.h>
@@ -37,6 +38,17 @@
 int set_pgtable_ro(void *addr);
 int set_pgtable_rw(void *addr);
 
+DECLARE_STATIC_KEY_FALSE(ro_page_tables);
+
+static inline bool page_tables_are_ro(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_RO_PAGE_TABLES) &&
+	       static_branch_unlikely(&ro_page_tables);
+}
+
+pte_t xchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t pte);
+pte_t cmpxchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t old, pte_t new);
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
 
@@ -89,7 +101,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 	__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 #define pte_none(pte)		(!pte_val(pte))
-#define pte_clear(mm,addr,ptep)	set_pte(ptep, __pte(0))
+#define pte_clear(mm,addr,ptep)	set_pte_at(mm, addr, ptep, __pte(0))
 #define pte_page(pte)		(pfn_to_page(pte_pfn(pte)))
 
 /*
@@ -257,7 +269,10 @@ static inline pte_t pte_mkdevmap(pte_t pte)
 
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
-	WRITE_ONCE(*ptep, pte);
+	if (page_tables_are_ro())
+		xchg_ro_pte(&init_mm, ptep, pte);
+	else
+		WRITE_ONCE(*ptep, pte);
 
 	/*
 	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
@@ -343,7 +358,10 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	__check_racy_pte_update(mm, ptep, pte);
 
-	set_pte(ptep, pte);
+	if (page_tables_are_ro())
+		xchg_ro_pte(mm, ptep, pte);
+	else
+		set_pte(ptep, pte);
 }
 
 /*
@@ -579,7 +597,10 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 	}
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-	WRITE_ONCE(*pmdp, pmd);
+	if (page_tables_are_ro())
+		xchg_ro_pte(&init_mm, (pte_t *)pmdp, pmd_pte(pmd));
+	else
+		WRITE_ONCE(*pmdp, pmd);
 
 	if (pmd_valid(pmd)) {
 		dsb(ishst);
@@ -589,7 +610,10 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline void pmd_clear(pmd_t *pmdp)
 {
-	set_pmd(pmdp, __pmd(0));
+	if (page_tables_are_ro())
+		xchg_ro_pte(NULL, (pte_t *)pmdp, __pte(0));
+	else
+		set_pmd(pmdp, __pmd(0));
 }
 
 static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
@@ -640,7 +664,10 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
 	}
 #endif /* __PAGETABLE_PUD_FOLDED */
 
-	WRITE_ONCE(*pudp, pud);
+	if (page_tables_are_ro())
+		xchg_ro_pte(&init_mm, (pte_t *)pudp, pud_pte(pud));
+	else
+		WRITE_ONCE(*pudp, pud);
 
 	if (pud_valid(pud)) {
 		dsb(ishst);
@@ -650,7 +677,10 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
 
 static inline void pud_clear(pud_t *pudp)
 {
-	set_pud(pudp, __pud(0));
+	if (page_tables_are_ro())
+		xchg_ro_pte(NULL, (pte_t *)pudp, __pte(0));
+	else
+		set_pud(pudp, __pud(0));
 }
 
 static inline phys_addr_t pud_page_paddr(pud_t pud)
@@ -704,14 +734,20 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 		return;
 	}
 
-	WRITE_ONCE(*p4dp, p4d);
+	if (page_tables_are_ro())
+		xchg_ro_pte(&init_mm, (pte_t *)p4dp, p4d_pte(p4d));
+	else
+		WRITE_ONCE(*p4dp, p4d);
 	dsb(ishst);
 	isb();
 }
 
 static inline void p4d_clear(p4d_t *p4dp)
 {
-	set_p4d(p4dp, __p4d(0));
+	if (page_tables_are_ro())
+		xchg_ro_pte(NULL, (pte_t *)p4dp, __pte(0));
+	else
+		set_p4d(p4dp, __p4d(0));
 }
 
 static inline phys_addr_t p4d_page_paddr(p4d_t p4d)
@@ -806,7 +842,7 @@ static inline int pgd_devmap(pgd_t pgd)
  * Atomic pte/pmd modifications.
  */
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-static inline int __ptep_test_and_clear_young(pte_t *ptep)
+static inline int __ptep_test_and_clear_young(struct mm_struct *mm, pte_t *ptep)
 {
 	pte_t old_pte, pte;
 
@@ -814,8 +850,13 @@ static inline int __ptep_test_and_clear_young(pte_t *ptep)
 	do {
 		old_pte = pte;
 		pte = pte_mkold(pte);
-		pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
-					       pte_val(old_pte), pte_val(pte));
+
+		if (page_tables_are_ro())
+			pte = cmpxchg_ro_pte(mm, ptep, old_pte, pte);
+		else
+			pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
+						       pte_val(old_pte),
+						       pte_val(pte));
 	} while (pte_val(pte) != pte_val(old_pte));
 
 	return pte_young(pte);
@@ -825,7 +866,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long address,
 					    pte_t *ptep)
 {
-	return __ptep_test_and_clear_young(ptep);
+	return __ptep_test_and_clear_young(vma->vm_mm, ptep);
 }
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
@@ -863,6 +904,8 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address, pte_t *ptep)
 {
+	if (page_tables_are_ro())
+		return xchg_ro_pte(mm, ptep, __pte(0));
 	return __pte(xchg_relaxed(&pte_val(*ptep), 0));
 }
 
@@ -888,8 +931,12 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
 	do {
 		old_pte = pte;
 		pte = pte_wrprotect(pte);
-		pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
-					       pte_val(old_pte), pte_val(pte));
+		if (page_tables_are_ro())
+			pte = cmpxchg_ro_pte(mm, ptep, old_pte, pte);
+		else
+			pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
+						       pte_val(old_pte),
+						       pte_val(pte));
 	} while (pte_val(pte) != pte_val(old_pte));
 }
 
@@ -905,6 +952,8 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
+	if (page_tables_are_ro())
+		return pte_pmd(xchg_ro_pte(vma->vm_mm, (pte_t *)pmdp, pmd_pte(pmd)));
 	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index ff1e800ba7a1..7750cafd969a 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -14,3 +14,5 @@ KASAN_SANITIZE_physaddr.o	+= n
 
 obj-$(CONFIG_KASAN)		+= kasan_init.o
 KASAN_SANITIZE_kasan_init.o	:= n
+
+obj-$(CONFIG_ARM64_RO_PAGE_TABLES) += ro_page_tables.o
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 77341b160aca..5a5055c3e1c2 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -220,7 +220,13 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 		pteval ^= PTE_RDONLY;
 		pteval |= pte_val(entry);
 		pteval ^= PTE_RDONLY;
-		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
+		if (page_tables_are_ro())
+			pteval = pte_val(cmpxchg_ro_pte(vma->vm_mm, ptep,
+							__pte(old_pteval),
+							__pte(pteval)));
+		else
+			pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval,
+						 pteval);
 	} while (pteval != old_pteval);
 
 	/* Invalidate a stale read-only entry */
diff --git a/arch/arm64/mm/ro_page_tables.c b/arch/arm64/mm/ro_page_tables.c
new file mode 100644
index 000000000000..f497adfd774d
--- /dev/null
+++ b/arch/arm64/mm/ro_page_tables.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - Google Inc
+ * Author: Ard Biesheuvel <ardb@google.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/memory.h>
+#include <linux/mm.h>
+#include <linux/sizes.h>
+
+#include <asm/fixmap.h>
+#include <asm/kernel-pgtable.h>
+#include <asm/mmu_context.h>
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+#include <asm/sections.h>
+
+static DEFINE_RAW_SPINLOCK(patch_pte_lock);
+
+DEFINE_STATIC_KEY_FALSE(ro_page_tables);
+
+static bool __initdata ro_page_tables_enabled = true;
+
+static int __init parse_ro_page_tables(char *arg)
+{
+	return strtobool(arg, &ro_page_tables_enabled);
+}
+early_param("ro_page_tables", parse_ro_page_tables);
+
+static bool in_kernel_text_or_rodata(phys_addr_t pa)
+{
+	/*
+	 * This is a minimal check to ensure that the r/o page table patching
+	 * API is not being abused to make changes to the kernel text. This
+	 * should ideally cover module and BPF text/rodata as well, but that
+	 * is less straight-forward and hence more costly.
+	 */
+	return pa >= __pa_symbol(_stext) && pa < __pa_symbol(__init_begin);
+}
+
+pte_t xchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t pte)
+{
+	unsigned long flags;
+	u64 pte_pa;
+	pte_t ret;
+	pte_t *p;
+
+	/* can we use __pa() on ptep? */
+	if (!virt_addr_valid(ptep)) {
+		/* only linear aliases are remapped r/o anyway */
+		pte_val(ret) = xchg_relaxed(&pte_val(*ptep), pte_val(pte));
+		return ret;
+	}
+
+	pte_pa = __pa(ptep);
+	BUG_ON(in_kernel_text_or_rodata(pte_pa));
+
+	raw_spin_lock_irqsave(&patch_pte_lock, flags);
+	p = (pte_t *)set_fixmap_offset(FIX_TEXT_POKE_PTE, pte_pa);
+	pte_val(ret) = xchg_relaxed(&pte_val(*p), pte_val(pte));
+	clear_fixmap(FIX_TEXT_POKE_PTE);
+	raw_spin_unlock_irqrestore(&patch_pte_lock, flags);
+	return ret;
+}
+
+pte_t cmpxchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t old, pte_t new)
+{
+	unsigned long flags;
+	u64 pte_pa;
+	pte_t ret;
+	pte_t *p;
+
+	BUG_ON(!virt_addr_valid(ptep));
+
+	pte_pa = __pa(ptep);
+	BUG_ON(in_kernel_text_or_rodata(pte_pa));
+
+	raw_spin_lock_irqsave(&patch_pte_lock, flags);
+	p = (pte_t *)set_fixmap_offset(FIX_TEXT_POKE_PTE, pte_pa);
+	pte_val(ret) = cmpxchg_relaxed(&pte_val(*p), pte_val(old), pte_val(new));
+	clear_fixmap(FIX_TEXT_POKE_PTE);
+	raw_spin_unlock_irqrestore(&patch_pte_lock, flags);
+	return ret;
+}
+
+static int __init ro_page_tables_init(void)
+{
+	if (ro_page_tables_enabled) {
+		if (!rodata_full) {
+			pr_err("Failed to enable R/O page table protection, rodata=full is not enabled\n");
+		} else {
+			pr_err("Enabling R/O page table protection\n");
+			static_branch_enable(&ro_page_tables);
+		}
+	}
+	return 0;
+}
+early_initcall(ro_page_tables_init);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 04/12] arm64: mm: remap PGD pages r/o in the linear region after allocation
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 03/12] arm64: mm: use a fixmap slot for user page table modifications Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 05/12] arm64: mm: remap PUD pages r/o in linear region Ard Biesheuvel
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

As the first step in restricting write access to all page tables via the
linear mapping, remap the page at the root PGD level of a user space
page table hierarchy read-only after allocation, so that it can only be
manipulated using the dedicated fixmap based API.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/mmu.c |  7 ++++--
 arch/arm64/mm/pgd.c | 25 ++++++++++++++------
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index acfae9b41cc8..a52c3162beae 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -394,8 +394,11 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
 	void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
 	BUG_ON(!ptr);
 
-	/* Ensure the zeroed page is visible to the page table walker */
-	dsb(ishst);
+	if (page_tables_are_ro())
+		set_pgtable_ro(ptr);
+	else
+		/* Ensure the zeroed page is visible to the page table walker */
+		dsb(ishst);
 	return __pa(ptr);
 }
 
diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
index 4a64089e5771..637d6eceeada 100644
--- a/arch/arm64/mm/pgd.c
+++ b/arch/arm64/mm/pgd.c
@@ -9,8 +9,10 @@
 #include <linux/mm.h>
 #include <linux/gfp.h>
 #include <linux/highmem.h>
+#include <linux/set_memory.h>
 #include <linux/slab.h>
 
+#include <asm/mmu_context.h>
 #include <asm/pgalloc.h>
 #include <asm/page.h>
 #include <asm/tlbflush.h>
@@ -20,24 +22,33 @@ static struct kmem_cache *pgd_cache __ro_after_init;
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	gfp_t gfp = GFP_PGTABLE_USER;
+	pgd_t *pgd;
 
-	if (PGD_SIZE == PAGE_SIZE)
-		return (pgd_t *)__get_free_page(gfp);
-	else
+	if (PGD_SIZE < PAGE_SIZE && !page_tables_are_ro())
 		return kmem_cache_alloc(pgd_cache, gfp);
+
+	pgd = (pgd_t *)__get_free_page(gfp);
+	if (!pgd)
+		return NULL;
+	if (page_tables_are_ro())
+		set_pgtable_ro(pgd);
+	return pgd;
 }
 
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-	if (PGD_SIZE == PAGE_SIZE)
-		free_page((unsigned long)pgd);
-	else
+	if (PGD_SIZE < PAGE_SIZE && !page_tables_are_ro()) {
 		kmem_cache_free(pgd_cache, pgd);
+	} else {
+		if (page_tables_are_ro())
+			set_pgtable_rw(pgd);
+		free_page((unsigned long)pgd);
+	}
 }
 
 void __init pgtable_cache_init(void)
 {
-	if (PGD_SIZE == PAGE_SIZE)
+	if (PGD_SIZE == PAGE_SIZE || page_tables_are_ro())
 		return;
 
 #ifdef CONFIG_ARM64_PA_BITS_52
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 05/12] arm64: mm: remap PUD pages r/o in linear region
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 04/12] arm64: mm: remap PGD pages r/o in the linear region after allocation Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 06/12] arm64: mm: remap PMD " Ard Biesheuvel
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Implement the arch specific PUD alloc/free helpers by wrapping the
generic code, and remapping the page read-only on allocation and
read-write on free.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h |  5 +++++
 arch/arm64/include/asm/tlb.h     |  2 ++
 arch/arm64/mm/mmu.c              | 20 ++++++++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index d54ac9f8d6c7..737e9f32b199 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -14,6 +14,8 @@
 #include <asm/tlbflush.h>
 
 #define __HAVE_ARCH_PGD_FREE
+#define __HAVE_ARCH_PUD_ALLOC_ONE
+#define __HAVE_ARCH_PUD_FREE
 #include <asm-generic/pgalloc.h>
 
 #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
@@ -45,6 +47,9 @@ static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
 
 #if CONFIG_PGTABLE_LEVELS > 3
 
+pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr);
+void pud_free(struct mm_struct *mm, pud_t *pud);
+
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 {
 	set_p4d(p4dp, __p4d(__phys_to_p4d_val(pudp) | prot));
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..6557626752fc 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -94,6 +94,8 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
+	if (page_tables_are_ro())
+		set_pgtable_rw(pudp);
 	tlb_remove_table(tlb, virt_to_page(pudp));
 }
 #endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a52c3162beae..03d77c4c3570 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1645,3 +1645,23 @@ static int __init prevent_bootmem_remove_init(void)
 }
 early_initcall(prevent_bootmem_remove_init);
 #endif
+
+#ifndef __PAGETABLE_PUD_FOLDED
+pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	pud_t *pud = __pud_alloc_one(mm, addr);
+
+	if (!pud)
+		return NULL;
+	if (page_tables_are_ro())
+		set_pgtable_ro(pud);
+	return pud;
+}
+
+void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+	if (page_tables_are_ro())
+		set_pgtable_rw(pud);
+	free_page((u64)pud);
+}
+#endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 06/12] arm64: mm: remap PMD pages r/o in linear region
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 05/12] arm64: mm: remap PUD pages r/o in linear region Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 07/12] arm64: mm: remap PTE level user page tables r/o in the " Ard Biesheuvel
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

PMD modifications all go through the fixmap update routine, so there is
no longer a need to keep it mapped read/write in the linear region.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h |  5 +++++
 arch/arm64/include/asm/tlb.h     |  2 ++
 arch/arm64/mm/mmu.c              | 21 ++++++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 737e9f32b199..63f9ae9e96fe 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -16,12 +16,17 @@
 #define __HAVE_ARCH_PGD_FREE
 #define __HAVE_ARCH_PUD_ALLOC_ONE
 #define __HAVE_ARCH_PUD_FREE
+#define __HAVE_ARCH_PMD_ALLOC_ONE
+#define __HAVE_ARCH_PMD_FREE
 #include <asm-generic/pgalloc.h>
 
 #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
 
 #if CONFIG_PGTABLE_LEVELS > 2
 
+pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr);
+void pmd_free(struct mm_struct *mm, pmd_t *pmd);
+
 static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
 {
 	set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 6557626752fc..0f54fbb59bba 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -85,6 +85,8 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 {
 	struct page *page = virt_to_page(pmdp);
 
+	if (page_tables_are_ro())
+		set_pgtable_rw(pmdp);
 	pgtable_pmd_page_dtor(page);
 	tlb_remove_table(tlb, page);
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 03d77c4c3570..e55d91a5f1ed 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1665,3 +1665,24 @@ void pud_free(struct mm_struct *mm, pud_t *pud)
 	free_page((u64)pud);
 }
 #endif
+
+#ifndef __PAGETABLE_PMD_FOLDED
+pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	pmd_t *pmd = __pmd_alloc_one(mm, addr);
+
+	if (!pmd)
+		return NULL;
+	if (page_tables_are_ro())
+		set_pgtable_ro(pmd);
+	return pmd;
+}
+
+void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+{
+	if (page_tables_are_ro())
+		set_pgtable_rw(pmd);
+	pgtable_pmd_page_dtor(virt_to_page(pmd));
+	free_page((u64)pmd);
+}
+#endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 07/12] arm64: mm: remap PTE level user page tables r/o in the linear region
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 06/12] arm64: mm: remap PMD " Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 08/12] arm64: mm: remap kernel PTE level " Ard Biesheuvel
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Now that all PTE manipulations for user space tables go via the fixmap,
we can remap these tables read-only in the linear region so they cannot
be corrupted inadvertently.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h |  5 +++++
 arch/arm64/include/asm/tlb.h     |  2 ++
 arch/arm64/mm/mmu.c              | 23 ++++++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 63f9ae9e96fe..18a5bb0c9ee4 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -18,10 +18,15 @@
 #define __HAVE_ARCH_PUD_FREE
 #define __HAVE_ARCH_PMD_ALLOC_ONE
 #define __HAVE_ARCH_PMD_FREE
+#define __HAVE_ARCH_PTE_ALLOC_ONE
+#define __HAVE_ARCH_PTE_FREE
 #include <asm-generic/pgalloc.h>
 
 #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
 
+pgtable_t pte_alloc_one(struct mm_struct *mm);
+void pte_free(struct mm_struct *mm, struct page *pte_page);
+
 #if CONFIG_PGTABLE_LEVELS > 2
 
 pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 0f54fbb59bba..e69a44160cce 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -75,6 +75,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 				  unsigned long addr)
 {
+	if (page_tables_are_ro())
+		set_pgtable_rw(page_address(pte));
 	pgtable_pte_page_dtor(pte);
 	tlb_remove_table(tlb, pte);
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e55d91a5f1ed..949846654797 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1686,3 +1686,26 @@ void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 	free_page((u64)pmd);
 }
 #endif
+
+pgtable_t pte_alloc_one(struct mm_struct *mm)
+{
+	pgtable_t pgt = __pte_alloc_one(mm, GFP_PGTABLE_USER);
+
+	VM_BUG_ON(mm == &init_mm);
+
+	if (!pgt)
+		return NULL;
+	if (page_tables_are_ro())
+		set_pgtable_ro(page_address(pgt));
+	return pgt;
+}
+
+void pte_free(struct mm_struct *mm, struct page *pte_page)
+{
+	VM_BUG_ON(mm == &init_mm);
+
+	if (page_tables_are_ro())
+		set_pgtable_rw(page_address(pte_page));
+	pgtable_pte_page_dtor(pte_page);
+	__free_page(pte_page);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 08/12] arm64: mm: remap kernel PTE level page tables r/o in the linear region
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 07/12] arm64: mm: remap PTE level user page tables r/o in the " Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 09/12] arm64: mm: remap kernel page tables read-only at end of init Ard Biesheuvel
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Now that all kernel page table manipulations are routed through the
fixmap API if r/o page tables are enabled, we can remove write access
from the linear mapping of those pages.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h |  6 +++++
 arch/arm64/mm/mmu.c              | 24 +++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 18a5bb0c9ee4..073482634e74 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -20,6 +20,9 @@
 #define __HAVE_ARCH_PMD_FREE
 #define __HAVE_ARCH_PTE_ALLOC_ONE
 #define __HAVE_ARCH_PTE_FREE
+#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
+#define __HAVE_ARCH_PTE_FREE_KERNEL
+
 #include <asm-generic/pgalloc.h>
 
 #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
@@ -27,6 +30,9 @@
 pgtable_t pte_alloc_one(struct mm_struct *mm);
 void pte_free(struct mm_struct *mm, struct page *pte_page);
 
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
+void pte_free_kernel(struct mm_struct *mm, pte_t *pte);
+
 #if CONFIG_PGTABLE_LEVELS > 2
 
 pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 949846654797..971501535757 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1402,7 +1402,7 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
 	table = pte_offset_kernel(pmdp, addr);
 	pmd_clear(pmdp);
 	__flush_tlb_kernel_pgtable(addr);
-	pte_free_kernel(NULL, table);
+	pte_free_kernel(&init_mm, table);
 	return 1;
 }
 
@@ -1709,3 +1709,25 @@ void pte_free(struct mm_struct *mm, struct page *pte_page)
 	pgtable_pte_page_dtor(pte_page);
 	__free_page(pte_page);
 }
+
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
+{
+	pte_t *pte = __pte_alloc_one_kernel(mm);
+
+	VM_BUG_ON(mm != &init_mm);
+
+	if (!pte)
+		return NULL;
+	if (page_tables_are_ro())
+		set_pgtable_ro(pte);
+	return pte;
+}
+
+void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+	VM_BUG_ON(mm != &init_mm);
+
+	if (page_tables_are_ro())
+		set_pgtable_rw(pte);
+	free_page((u64)pte);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 09/12] arm64: mm: remap kernel page tables read-only at end of init
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 08/12] arm64: mm: remap kernel PTE level " Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 10/12] mm: add default definition of p4d_index() Ard Biesheuvel
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Now that all the handling is in place to deal with read-only page tables
at runtime, do a pass over the kernel page tables at boot to remap all
the page table pages read-only that were allocated early.
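
For reference, the starting level passed to the walk is derived as
4 - CONFIG_PGTABLE_LEVELS, i.e. (illustration only, not part of the
patch):

    CONFIG_PGTABLE_LEVELS == 4  ->  pgd_level == 0   (PGD -> PUD -> PMD -> PTE)
    CONFIG_PGTABLE_LEVELS == 3  ->  pgd_level == 1   (PUD level folded)
    CONFIG_PGTABLE_LEVELS == 2  ->  pgd_level == 2   (PUD and PMD folded)

The walk recurses while level < 2 and remaps every next level table page
it finds read-only via set_pgtable_ro().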

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/mmu.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 971501535757..b1212f6d48f2 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -559,8 +559,23 @@ static void __init map_mem(pgd_t *pgdp)
 	memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
 }
 
+static void mark_pgtables_ro(const pmd_t *pmdp, int level, int num_entries)
+{
+	while (num_entries--) {
+		if (pmd_valid(*pmdp) && pmd_table(*pmdp)) {
+			pmd_t *next = __va(__pmd_to_phys(*pmdp));
+
+			if (level < 2)
+				mark_pgtables_ro(next, level + 1, PTRS_PER_PMD);
+			set_pgtable_ro(next);
+		}
+		pmdp++;
+	}
+}
+
 void mark_rodata_ro(void)
 {
+	int pgd_level = 4 - CONFIG_PGTABLE_LEVELS;
 	unsigned long section_size;
 
 	/*
@@ -571,6 +586,11 @@ void mark_rodata_ro(void)
 	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
 			    section_size, PAGE_KERNEL_RO);
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+	mark_pgtables_ro((pmd_t *)&tramp_pg_dir, pgd_level, PTRS_PER_PGD);
+#endif
+	mark_pgtables_ro((pmd_t *)&swapper_pg_dir, pgd_level, PTRS_PER_PGD);
+
 	debug_checkwx();
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 10/12] mm: add default definition of p4d_index()
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 09/12] arm64: mm: remap kernel page tables read-only at end of init Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 11/12] arm64: efi: use set_pte_at() not set_pte() in order to pass mm pointer Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 12/12] arm64: hugetlb: use set_pte_at() not set_pte() to provide " Ard Biesheuvel
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Implement a default version of p4d_index() similar to how pud/pmd_index
are defined.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/pgtable.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index bc8713a76e03..e8aacf6ea207 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -79,6 +79,14 @@ static inline unsigned long pud_index(unsigned long address)
 #define pud_index pud_index
 #endif
 
+#ifndef p4d_index
+static inline unsigned long p4d_index(unsigned long address)
+{
+	return (address >> P4D_SHIFT) & (PTRS_PER_P4D - 1);
+}
+#define p4d_index p4d_index
+#endif
+
 #ifndef pgd_index
 /* Must be a compile-time constant, so implement it as a macro */
 #define pgd_index(a)  (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 11/12] arm64: efi: use set_pte_at() not set_pte() in order to pass mm pointer
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 10/12] mm: add default definition of p4d_index() Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  2022-01-26 17:30 ` [RFC PATCH 12/12] arm64: hugetlb: use set_pte_at() not set_pte() to provide " Ard Biesheuvel
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

The set_pte() helper does not carry the struct mm pointer, which makes
it difficult for the implementation to reason about the context in which
the set_pte() call is taking place. So switch to set_pte_at() instead.
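
For comparison, the two prototypes as used elsewhere in this series:

    void set_pte(pte_t *ptep, pte_t pte);
    void set_pte_at(struct mm_struct *mm, unsigned long addr,
                    pte_t *ptep, pte_t pte);

With the mm pointer available, the read-only page table code can pass it
on to xchg_ro_pte()/cmpxchg_ro_pte().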

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/efi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index e1be6c429810..e3e50adfae18 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -92,7 +92,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data)
 		pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
 	if (md->attribute & EFI_MEMORY_XP)
 		pte = set_pte_bit(pte, __pgprot(PTE_PXN));
-	set_pte(ptep, pte);
+	set_pte_at(&efi_mm, addr, ptep, pte);
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 12/12] arm64: hugetlb: use set_pte_at() not set_pte() to provide mm pointer
  2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2022-01-26 17:30 ` [RFC PATCH 11/12] arm64: efi: use set_pte_at() not set_pte() in order to pass mm pointer Ard Biesheuvel
@ 2022-01-26 17:30 ` Ard Biesheuvel
  11 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-01-26 17:30 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: kvmarm, linux-hardening, Ard Biesheuvel, Will Deacon,
	Marc Zyngier, Fuad Tabba, Quentin Perret, Mark Rutland,
	James Morse, Catalin Marinas

Switch to set_pte_at() so we can provide the mm pointer to the code that
performs the page table update.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index ffb9c229610a..099b28b00f4c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -252,8 +252,8 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	ncontig = num_contig_ptes(sz, &pgsize);
 
-	for (i = 0; i < ncontig; i++, ptep++)
-		set_pte(ptep, pte);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
+		set_pte_at(mm, addr, ptep, pte);
 }
 
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 03/12] arm64: mm: use a fixmap slot for user page table modifications
  2022-01-26 17:30 ` [RFC PATCH 03/12] arm64: mm: use a fixmap slot for user page table modifications Ard Biesheuvel
@ 2022-01-28 16:08   ` Steven Price
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Price @ 2022-01-28 16:08 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: kvmarm, linux-hardening, Will Deacon, Marc Zyngier, Fuad Tabba,
	Quentin Perret, Mark Rutland, James Morse, Catalin Marinas

Hi Ard,

Interesting series - I attempted[1] something similar a few years ago,
but only dealing with the page tables in the linear map.

At first glance the series looks like it should work, but this patch
caught my eye because there's only a single fixmap slot for page table
modifications. The upshot is that you need the patch_pte_lock to
serialise multiple CPUs' access to the page tables, which looks like
it would hurt badly as the number of CPUs grows.

Do you have any performance benchmarks? I've lost the ones I did when I
previously looked at this idea, but I obviously thought it was important
at the time.

Thanks,

Steve

[1]
https://lore.kernel.org/lkml/20200417152619.41680-1-steven.price@arm.com/

On 26/01/2022 17:30, Ard Biesheuvel wrote:
> To prepare for user and kernel page tables being remapped read-only in
> the linear region, define a new fixmap slot and use it to apply all page
> table descriptor updates that target page tables other than swapper.
> 
> Fortunately for us, the fixmap descriptors themselves are always
> manipulated via their kernel mapping in .bss, so there is no special
> exception required to avoid circular logic here.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/Kconfig               |  11 +++
>  arch/arm64/include/asm/fixmap.h  |   1 +
>  arch/arm64/include/asm/pgalloc.h |  28 +++++-
>  arch/arm64/include/asm/pgtable.h |  79 +++++++++++++---
>  arch/arm64/mm/Makefile           |   2 +
>  arch/arm64/mm/fault.c            |   8 +-
>  arch/arm64/mm/ro_page_tables.c   | 100 ++++++++++++++++++++
>  7 files changed, 209 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6978140edfa4..a3e98286b074 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1311,6 +1311,17 @@ config RODATA_FULL_DEFAULT_ENABLED
>  	  This requires the linear region to be mapped down to pages,
>  	  which may adversely affect performance in some cases.
>  
> +config ARM64_RO_PAGE_TABLES
> +	bool "Remap page tables read-only in the kernel VA space"
> +	select RODATA_FULL_DEFAULT_ENABLED
> +	help
> +	  Remap linear mappings of page table pages read-only as long as they
> +	  are being used as such, and use a fixmap API to manipulate all page
> +	  table descriptors, instead of manipulating them directly via their
> +	  writable mappings in the direct map. This is intended as a debug
> +	  and/or hardening feature, as it removes the ability for stray writes
> +	  to be exploited to bypass permission restrictions.
> +
>  config ARM64_SW_TTBR0_PAN
>  	bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
>  	help
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index 4335800201c9..71dfbe0452bb 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -50,6 +50,7 @@ enum fixed_addresses {
>  
>  	FIX_EARLYCON_MEM_BASE,
>  	FIX_TEXT_POKE0,
> +	FIX_TEXT_POKE_PTE,
>  
>  #ifdef CONFIG_ACPI_APEI_GHES
>  	/* Used for GHES mapping from assorted contexts */
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 237224484d0f..d54ac9f8d6c7 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -30,7 +30,11 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>  	pudval_t pudval = PUD_TYPE_TABLE;
>  
>  	pudval |= (mm == &init_mm) ? PUD_TABLE_UXN : PUD_TABLE_PXN;
> -	__pud_populate(pudp, __pa(pmdp), pudval);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(mm, (pte_t *)pudp,
> +			    __pte(__phys_to_pud_val(__pa(pmdp) | pudval)));
> +	else
> +		__pud_populate(pudp, __pa(pmdp), pudval);
>  }
>  #else
>  static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
> @@ -51,7 +55,11 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
>  	p4dval_t p4dval = P4D_TYPE_TABLE;
>  
>  	p4dval |= (mm == &init_mm) ? P4D_TABLE_UXN : P4D_TABLE_PXN;
> -	__p4d_populate(p4dp, __pa(pudp), p4dval);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(mm, (pte_t *)p4dp,
> +			    __pte(__phys_to_p4d_val(__pa(pudp) | p4dval)));
> +	else
> +		__p4d_populate(p4dp, __pa(pudp), p4dval);
>  }
>  #else
>  static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
> @@ -76,15 +84,27 @@ static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t ptep,
>  static inline void
>  pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
>  {
> +	pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
> +
>  	VM_BUG_ON(mm && mm != &init_mm);
> -	__pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE | PMD_TABLE_UXN);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(mm, (pte_t *)pmdp,
> +			    __pte(__phys_to_pmd_val(__pa(ptep) | pmdval)));
> +	else
> +		__pmd_populate(pmdp, __pa(ptep), pmdval);
>  }
>  
>  static inline void
>  pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t ptep)
>  {
> +	pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_PXN;
> +
>  	VM_BUG_ON(mm == &init_mm);
> -	__pmd_populate(pmdp, page_to_phys(ptep), PMD_TYPE_TABLE | PMD_TABLE_PXN);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(mm, (pte_t *)pmdp,
> +			    __pte(__phys_to_pmd_val(page_to_phys(ptep) | pmdval)));
> +	else
> +		__pmd_populate(pmdp, page_to_phys(ptep), pmdval);
>  }
>  
>  #endif
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 8d3806c68687..a8daea6b4ac9 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -30,6 +30,7 @@
>  
>  #include <asm/cmpxchg.h>
>  #include <asm/fixmap.h>
> +#include <linux/jump_label.h>
>  #include <linux/mmdebug.h>
>  #include <linux/mm_types.h>
>  #include <linux/sched.h>
> @@ -37,6 +38,17 @@
>  int set_pgtable_ro(void *addr);
>  int set_pgtable_rw(void *addr);
>  
> +DECLARE_STATIC_KEY_FALSE(ro_page_tables);
> +
> +static inline bool page_tables_are_ro(void)
> +{
> +	return IS_ENABLED(CONFIG_ARM64_RO_PAGE_TABLES) &&
> +	       static_branch_unlikely(&ro_page_tables);
> +}
> +
> +pte_t xchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t pte);
> +pte_t cmpxchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t old, pte_t new);
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
>  
> @@ -89,7 +101,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>  	__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
>  
>  #define pte_none(pte)		(!pte_val(pte))
> -#define pte_clear(mm,addr,ptep)	set_pte(ptep, __pte(0))
> +#define pte_clear(mm,addr,ptep)	set_pte_at(mm, addr, ptep, __pte(0))
>  #define pte_page(pte)		(pfn_to_page(pte_pfn(pte)))
>  
>  /*
> @@ -257,7 +269,10 @@ static inline pte_t pte_mkdevmap(pte_t pte)
>  
>  static inline void set_pte(pte_t *ptep, pte_t pte)
>  {
> -	WRITE_ONCE(*ptep, pte);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(&init_mm, ptep, pte);
> +	else
> +		WRITE_ONCE(*ptep, pte);
>  
>  	/*
>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
> @@ -343,7 +358,10 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  
>  	__check_racy_pte_update(mm, ptep, pte);
>  
> -	set_pte(ptep, pte);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(mm, ptep, pte);
> +	else
> +		set_pte(ptep, pte);
>  }
>  
>  /*
> @@ -579,7 +597,10 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>  	}
>  #endif /* __PAGETABLE_PMD_FOLDED */
>  
> -	WRITE_ONCE(*pmdp, pmd);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(&init_mm, (pte_t *)pmdp, pmd_pte(pmd));
> +	else
> +		WRITE_ONCE(*pmdp, pmd);
>  
>  	if (pmd_valid(pmd)) {
>  		dsb(ishst);
> @@ -589,7 +610,10 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>  
>  static inline void pmd_clear(pmd_t *pmdp)
>  {
> -	set_pmd(pmdp, __pmd(0));
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(NULL, (pte_t *)pmdp, __pte(0));
> +	else
> +		set_pmd(pmdp, __pmd(0));
>  }
>  
>  static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
> @@ -640,7 +664,10 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
>  	}
>  #endif /* __PAGETABLE_PUD_FOLDED */
>  
> -	WRITE_ONCE(*pudp, pud);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(&init_mm, (pte_t *)pudp, pud_pte(pud));
> +	else
> +		WRITE_ONCE(*pudp, pud);
>  
>  	if (pud_valid(pud)) {
>  		dsb(ishst);
> @@ -650,7 +677,10 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
>  
>  static inline void pud_clear(pud_t *pudp)
>  {
> -	set_pud(pudp, __pud(0));
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(NULL, (pte_t *)pudp, __pte(0));
> +	else
> +		set_pud(pudp, __pud(0));
>  }
>  
>  static inline phys_addr_t pud_page_paddr(pud_t pud)
> @@ -704,14 +734,20 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
>  		return;
>  	}
>  
> -	WRITE_ONCE(*p4dp, p4d);
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(&init_mm, (pte_t *)p4dp, p4d_pte(p4d));
> +	else
> +		WRITE_ONCE(*p4dp, p4d);
>  	dsb(ishst);
>  	isb();
>  }
>  
>  static inline void p4d_clear(p4d_t *p4dp)
>  {
> -	set_p4d(p4dp, __p4d(0));
> +	if (page_tables_are_ro())
> +		xchg_ro_pte(NULL, (pte_t *)p4dp, __pte(0));
> +	else
> +		set_p4d(p4dp, __p4d(0));
>  }
>  
>  static inline phys_addr_t p4d_page_paddr(p4d_t p4d)
> @@ -806,7 +842,7 @@ static inline int pgd_devmap(pgd_t pgd)
>   * Atomic pte/pmd modifications.
>   */
>  #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> -static inline int __ptep_test_and_clear_young(pte_t *ptep)
> +static inline int __ptep_test_and_clear_young(struct mm_struct *mm, pte_t *ptep)
>  {
>  	pte_t old_pte, pte;
>  
> @@ -814,8 +850,13 @@ static inline int __ptep_test_and_clear_young(pte_t *ptep)
>  	do {
>  		old_pte = pte;
>  		pte = pte_mkold(pte);
> -		pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
> -					       pte_val(old_pte), pte_val(pte));
> +
> +		if (page_tables_are_ro())
> +			pte = cmpxchg_ro_pte(mm, ptep, old_pte, pte);
> +		else
> +			pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
> +						       pte_val(old_pte),
> +						       pte_val(pte));
>  	} while (pte_val(pte) != pte_val(old_pte));
>  
>  	return pte_young(pte);
> @@ -825,7 +866,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
>  					    unsigned long address,
>  					    pte_t *ptep)
>  {
> -	return __ptep_test_and_clear_young(ptep);
> +	return __ptep_test_and_clear_young(vma->vm_mm, ptep);
>  }
>  
>  #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
> @@ -863,6 +904,8 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
>  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>  				       unsigned long address, pte_t *ptep)
>  {
> +	if (page_tables_are_ro())
> +		return xchg_ro_pte(mm, ptep, __pte(0));
>  	return __pte(xchg_relaxed(&pte_val(*ptep), 0));
>  }
>  
> @@ -888,8 +931,12 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
>  	do {
>  		old_pte = pte;
>  		pte = pte_wrprotect(pte);
> -		pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
> -					       pte_val(old_pte), pte_val(pte));
> +		if (page_tables_are_ro())
> +			pte = cmpxchg_ro_pte(mm, ptep, old_pte, pte);
> +		else
> +			pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
> +						       pte_val(old_pte),
> +						       pte_val(pte));
>  	} while (pte_val(pte) != pte_val(old_pte));
>  }
>  
> @@ -905,6 +952,8 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
>  static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>  		unsigned long address, pmd_t *pmdp, pmd_t pmd)
>  {
> +	if (page_tables_are_ro())
> +		return pte_pmd(xchg_ro_pte(vma->vm_mm, (pte_t *)pmdp, pmd_pte(pmd)));
>  	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
>  }
>  #endif
> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
> index ff1e800ba7a1..7750cafd969a 100644
> --- a/arch/arm64/mm/Makefile
> +++ b/arch/arm64/mm/Makefile
> @@ -14,3 +14,5 @@ KASAN_SANITIZE_physaddr.o	+= n
>  
>  obj-$(CONFIG_KASAN)		+= kasan_init.o
>  KASAN_SANITIZE_kasan_init.o	:= n
> +
> +obj-$(CONFIG_ARM64_RO_PAGE_TABLES) += ro_page_tables.o
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 77341b160aca..5a5055c3e1c2 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -220,7 +220,13 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
>  		pteval ^= PTE_RDONLY;
>  		pteval |= pte_val(entry);
>  		pteval ^= PTE_RDONLY;
> -		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
> +		if (page_tables_are_ro())
> +			pteval = pte_val(cmpxchg_ro_pte(vma->vm_mm, ptep,
> +							__pte(old_pteval),
> +							__pte(pteval)));
> +		else
> +			pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval,
> +						 pteval);
>  	} while (pteval != old_pteval);
>  
>  	/* Invalidate a stale read-only entry */
> diff --git a/arch/arm64/mm/ro_page_tables.c b/arch/arm64/mm/ro_page_tables.c
> new file mode 100644
> index 000000000000..f497adfd774d
> --- /dev/null
> +++ b/arch/arm64/mm/ro_page_tables.c
> @@ -0,0 +1,100 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2021 - Google Inc
> + * Author: Ard Biesheuvel <ardb@google.com>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/memory.h>
> +#include <linux/mm.h>
> +#include <linux/sizes.h>
> +
> +#include <asm/fixmap.h>
> +#include <asm/kernel-pgtable.h>
> +#include <asm/mmu_context.h>
> +#include <asm/pgalloc.h>
> +#include <asm/tlbflush.h>
> +#include <asm/sections.h>
> +
> +static DEFINE_RAW_SPINLOCK(patch_pte_lock);
> +
> +DEFINE_STATIC_KEY_FALSE(ro_page_tables);
> +
> +static bool __initdata ro_page_tables_enabled = true;
> +
> +static int __init parse_ro_page_tables(char *arg)
> +{
> +	return strtobool(arg, &ro_page_tables_enabled);
> +}
> +early_param("ro_page_tables", parse_ro_page_tables);
> +
> +static bool in_kernel_text_or_rodata(phys_addr_t pa)
> +{
> +	/*
> +	 * This is a minimal check to ensure that the r/o page table patching
> +	 * API is not being abused to make changes to the kernel text. This
> +	 * should ideally cover module and BPF text/rodata as well, but that
> +	 * is less straight-forward and hence more costly.
> +	 */
> +	return pa >= __pa_symbol(_stext) && pa < __pa_symbol(__init_begin);
> +}
> +
> +pte_t xchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t pte)
> +{
> +	unsigned long flags;
> +	u64 pte_pa;
> +	pte_t ret;
> +	pte_t *p;
> +
> +	/* can we use __pa() on ptep? */
> +	if (!virt_addr_valid(ptep)) {
> +		/* only linear aliases are remapped r/o anyway */
> +		pte_val(ret) = xchg_relaxed(&pte_val(*ptep), pte_val(pte));
> +		return ret;
> +	}
> +
> +	pte_pa = __pa(ptep);
> +	BUG_ON(in_kernel_text_or_rodata(pte_pa));
> +
> +	raw_spin_lock_irqsave(&patch_pte_lock, flags);
> +	p = (pte_t *)set_fixmap_offset(FIX_TEXT_POKE_PTE, pte_pa);
> +	pte_val(ret) = xchg_relaxed(&pte_val(*p), pte_val(pte));
> +	clear_fixmap(FIX_TEXT_POKE_PTE);
> +	raw_spin_unlock_irqrestore(&patch_pte_lock, flags);
> +	return ret;
> +}
> +
> +pte_t cmpxchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t old, pte_t new)
> +{
> +	unsigned long flags;
> +	u64 pte_pa;
> +	pte_t ret;
> +	pte_t *p;
> +
> +	BUG_ON(!virt_addr_valid(ptep));
> +
> +	pte_pa = __pa(ptep);
> +	BUG_ON(in_kernel_text_or_rodata(pte_pa));
> +
> +	raw_spin_lock_irqsave(&patch_pte_lock, flags);
> +	p = (pte_t *)set_fixmap_offset(FIX_TEXT_POKE_PTE, pte_pa);
> +	pte_val(ret) = cmpxchg_relaxed(&pte_val(*p), pte_val(old), pte_val(new));
> +	clear_fixmap(FIX_TEXT_POKE_PTE);
> +	raw_spin_unlock_irqrestore(&patch_pte_lock, flags);
> +	return ret;
> +}
> +
> +static int __init ro_page_tables_init(void)
> +{
> +	if (ro_page_tables_enabled) {
> +		if (!rodata_full) {
> +			pr_err("Failed to enable R/O page table protection, rodata=full is not enabled\n");
> +		} else {
> +			pr_err("Enabling R/O page table protection\n");
> +			static_branch_enable(&ro_page_tables);
> +		}
> +	}
> +	return 0;
> +}
> +early_initcall(ro_page_tables_init);
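
As a quick orientation aid (an illustrative, untested sketch, not part of
the patch itself): call sites in generic mm code are unaffected by this
change, because the existing accessors choose the write path at run time
via the ro_page_tables static key. The helper name example_map_one() below
is made up for illustration; everything it calls is an existing arm64/mm
API quoted in the hunks above.

#include <linux/mm.h>

/* Illustrative only: shows that the call site itself does not change. */
static void example_map_one(struct mm_struct *mm, unsigned long addr,
			    pte_t *ptep, phys_addr_t pa, pgprot_t prot)
{
	pte_t pte = pfn_pte(__phys_to_pfn(pa), prot);

	/*
	 * With CONFIG_ARM64_RO_PAGE_TABLES and the ro_page_tables static
	 * key enabled, this ends up in xchg_ro_pte(), which maps the page
	 * holding *ptep at FIX_TEXT_POKE_PTE and performs the store
	 * through that writable alias; otherwise it remains the plain
	 * WRITE_ONCE() path as before.
	 */
	set_pte_at(mm, addr, ptep, pte);
}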


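Boot-time control, as implied by parse_ro_page_tables() and
ro_page_tables_init() in the hunk above: the protection defaults to on but
only engages when rodata=full is in effect, and it can be switched off on
the kernel command line. The exact spellings below follow from the
early_param()/strtobool() handling and are an inference, not something
stated elsewhere in the series:

    rodata=full ro_page_tables=1    # explicit opt-in (already the default)
    ro_page_tables=0                # keep linear-map page tables writable
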
Thread overview: 14+ messages
2022-01-26 17:29 [RFC PATCH 00/12] arm64: implement read-only page tables Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 01/12] asm-generic/pgalloc: allow arch to override PMD alloc/free routines Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 02/12] arm64: mm: add helpers to remap page tables read-only/read-write Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 03/12] arm64: mm: use a fixmap slot for user page table modifications Ard Biesheuvel
2022-01-28 16:08   ` Steven Price
2022-01-26 17:30 ` [RFC PATCH 04/12] arm64: mm: remap PGD pages r/o in the linear region after allocation Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 05/12] arm64: mm: remap PUD pages r/o in linear region Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 06/12] arm64: mm: remap PMD " Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 07/12] arm64: mm: remap PTE level user page tables r/o in the " Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 08/12] arm64: mm: remap kernel PTE level " Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 09/12] arm64: mm: remap kernel page tables read-only at end of init Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 10/12] mm: add default definition of p4d_index() Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 11/12] arm64: efi: use set_pte_at() not set_pte() in order to pass mm pointer Ard Biesheuvel
2022-01-26 17:30 ` [RFC PATCH 12/12] arm64: hugetlb: use set_pte_at() not set_pte() to provide " Ard Biesheuvel
