* [PATCH v4 0/6] arm64: mmu: avoid W+X mappings and re-enable PTE_CONT for kernel
@ 2017-03-04 14:30 ` Ard Biesheuvel
  0 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-04 14:30 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, mark.rutland,
	catalin.marinas, will.deacon, labbott
  Cc: keescook, marc.zyngier, andre.przywara, Ard Biesheuvel, kvmarm

Having memory that is writable and executable at the same time is a
security hazard, so we avoid such mappings when we can. However, at
boot time, we keep .text mapped writable during the entire init phase,
and the init region itself is mapped RWX as well.

Let's improve the situation by:
- making the alternatives patching use the linear mapping
- splitting the init region into separate text and data regions

This removes all RWX mappings except the really early one created
in head.S (which we could perhaps fix in the future as well).

Changes since v3:
- use linear alias only when patching the core kernel, and not for modules
- add patch to reintroduce the use of PTE_CONT for kernel mappings, except
  for regions that are remapped read-only later on (i.e., .rodata and the
  linear alias of .text+.rodata)

Changes since v2:
- ensure that text mappings remain writable under rodata=off
- rename create_mapping_late() to update_mapping_prot()
- clarify commit log of #2
- add acks

Changes since v1:
- add patch to move TLB maintenance into create_mapping_late() and remove it
  from its callers (#2)
- use the true address, not the linear alias, when patching branch
  instructions, as spotted by Suzuki (#3)
- mark mark_linear_text_alias_ro() __init (#3)
- move the .rela section back into __initdata: as it turns out, leaving a hole
  between the segments results in a peculiar situation where other unrelated
  allocations end up right in the middle of the kernel Image, which is
  probably a bad idea (#5). See below for an example.
- add acks


Ard Biesheuvel (6):
  arm: kvm: move kvm_vgic_global_state out of .text section
  arm64: mmu: move TLB maintenance from callers to create_mapping_late()
  arm64: alternatives: apply boot time fixups via the linear mapping
  arm64: mmu: map .text as read-only from the outset
  arm64: mmu: apply strict permissions to .init.text and .init.data
  arm64: mm: set the contiguous bit for kernel mappings where
    appropriate

 arch/arm64/include/asm/mmu.h      |   1 +
 arch/arm64/include/asm/sections.h |   3 +-
 arch/arm64/kernel/alternative.c   |  11 +-
 arch/arm64/kernel/smp.c           |   1 +
 arch/arm64/kernel/vmlinux.lds.S   |  25 ++--
 arch/arm64/mm/mmu.c               | 139 ++++++++++++++------
 virt/kvm/arm/vgic/vgic.c          |   4 +-
 7 files changed, 129 insertions(+), 55 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 1/6] arm: kvm: move kvm_vgic_global_state out of .text section
  2017-03-04 14:30 ` Ard Biesheuvel
  (?)
@ 2017-03-04 14:30   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-04 14:30 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, mark.rutland,
	catalin.marinas, will.deacon, labbott
  Cc: keescook, marc.zyngier, andre.przywara, Ard Biesheuvel, kvmarm

The kvm_vgic_global_state struct contains a static key which is
written to by jump_label_init() at boot time. So, in preparation for
making .text regions truly (well, almost truly) read-only, mark
kvm_vgic_global_state __ro_after_init so that it moves to the .rodata
section instead.
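
(For reference, and not part of this patch: the __ro_after_init
annotation from <linux/cache.h> expands to roughly

    #define __ro_after_init \
        __attribute__((__section__(".data..ro_after_init")))

i.e. it places the object in a data section that is only made read-only
once init has completed, which is late enough for jump_label_init() to
have written the static key.)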

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 virt/kvm/arm/vgic/vgic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 654dfd40e449..7713d96e85b7 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -29,7 +29,9 @@
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
-struct vgic_global __section(.hyp.text) kvm_vgic_global_state = {.gicv3_cpuif = STATIC_KEY_FALSE_INIT,};
+struct vgic_global kvm_vgic_global_state __ro_after_init = {
+	.gicv3_cpuif = STATIC_KEY_FALSE_INIT,
+};
 
 /*
  * Locking order is always:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 2/6] arm64: mmu: move TLB maintenance from callers to create_mapping_late()
  2017-03-04 14:30 ` Ard Biesheuvel
  (?)
@ 2017-03-04 14:30   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-04 14:30 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, mark.rutland,
	catalin.marinas, will.deacon, labbott
  Cc: keescook, marc.zyngier, andre.przywara, Ard Biesheuvel, kvmarm

In preparation for refactoring the kernel mapping logic so that text
regions are never mapped writable, which would require adding explicit
TLB maintenance to new call sites of create_mapping_late() (which is
currently invoked twice from the same function), move the TLB
maintenance from the call site into create_mapping_late() itself, and
change it from a full TLB flush into a flush by VA, which is more
appropriate here.

Also, given that create_mapping_late() has evolved into a routine that
only updates protection bits on existing mappings, rename it to
update_mapping_prot().
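
(Concretely, instead of the callers issuing a global flush_tlb_all()
once they are done, update_mapping_prot() now ends with

    /* flush the TLBs after updating live kernel mappings */
    flush_tlb_kernel_range(virt, virt + size);

so only the virtual addresses that were actually updated get
invalidated.)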

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d28dbcf596b6..6cafd8723d1a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -319,17 +319,20 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			     pgd_pgtable_alloc, page_mappings_only);
 }
 
-static void create_mapping_late(phys_addr_t phys, unsigned long virt,
-				  phys_addr_t size, pgprot_t prot)
+static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
+				phys_addr_t size, pgprot_t prot)
 {
 	if (virt < VMALLOC_START) {
-		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
+		pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
 			&phys, virt);
 		return;
 	}
 
 	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
 			     NULL, debug_pagealloc_enabled());
+
+	/* flush the TLBs after updating live kernel mappings */
+	flush_tlb_kernel_range(virt, virt + size);
 }
 
 static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
@@ -402,19 +405,16 @@ void mark_rodata_ro(void)
 	unsigned long section_size;
 
 	section_size = (unsigned long)_etext - (unsigned long)_text;
-	create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
+	update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
 			    section_size, PAGE_KERNEL_ROX);
 	/*
 	 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 	 * to cover NOTES and EXCEPTION_TABLE.
 	 */
 	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
-	create_mapping_late(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
+	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
 			    section_size, PAGE_KERNEL_RO);
 
-	/* flush the TLBs after updating live kernel mappings */
-	flush_tlb_all();
-
 	debug_checkwx();
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 3/6] arm64: alternatives: apply boot time fixups via the linear mapping
  2017-03-04 14:30 ` Ard Biesheuvel
  (?)
@ 2017-03-04 14:30   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-04 14:30 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, mark.rutland,
	catalin.marinas, will.deacon, labbott
  Cc: keescook, marc.zyngier, andre.przywara, Ard Biesheuvel, kvmarm

One important rule of thumb when designing a secure software system is
that memory should never be writable and executable at the same time.
We mostly adhere to this rule in the kernel, except at boot time, when
regions may be mapped RWX until after we are done applying alternatives
or making other one-off changes.

For the alternative patching, we can improve the situation by applying
the fixups via the linear mapping, which is never mapped with executable
permissions. So map the linear alias of .text with RW- permissions
initially, and remove the write permissions as soon as alternative
patching has completed.
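
(As a rough sketch of the idea, using the variable names from the patch
below: the write goes through the linear alias, while the I-cache
maintenance still targets the executable address:

    u32 *origptr = ALT_ORIG_PTR(alt);        /* VA in the kernel .text mapping */
    u32 *updptr  = (u32 *)lm_alias(origptr); /* same page via the linear map (RW, NX) */

    *updptr = cpu_to_le32(insn);             /* patch via the writable alias */
    flush_icache_range((uintptr_t)origptr,
                       (uintptr_t)(origptr + 1));

so the executable mapping of .text never needs to be writable.)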

Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/mmu.h    |  1 +
 arch/arm64/kernel/alternative.c | 11 +++++-----
 arch/arm64/kernel/smp.c         |  1 +
 arch/arm64/mm/mmu.c             | 22 +++++++++++++++-----
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..5468c834b072 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
 			       pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
+extern void mark_linear_text_alias_ro(void);
 
 #endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 06d650f61da7..8840c109c5d6 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -105,11 +105,11 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr)
 	return insn;
 }
 
-static void __apply_alternatives(void *alt_region)
+static void __apply_alternatives(void *alt_region, bool use_linear_alias)
 {
 	struct alt_instr *alt;
 	struct alt_region *region = alt_region;
-	u32 *origptr, *replptr;
+	u32 *origptr, *replptr, *updptr;
 
 	for (alt = region->begin; alt < region->end; alt++) {
 		u32 insn;
@@ -124,11 +124,12 @@ static void __apply_alternatives(void *alt_region)
 
 		origptr = ALT_ORIG_PTR(alt);
 		replptr = ALT_REPL_PTR(alt);
+		updptr = use_linear_alias ? (u32 *)lm_alias(origptr) : origptr;
 		nr_inst = alt->alt_len / sizeof(insn);
 
 		for (i = 0; i < nr_inst; i++) {
 			insn = get_alt_insn(alt, origptr + i, replptr + i);
-			*(origptr + i) = cpu_to_le32(insn);
+			updptr[i] = cpu_to_le32(insn);
 		}
 
 		flush_icache_range((uintptr_t)origptr,
@@ -155,7 +156,7 @@ static int __apply_alternatives_multi_stop(void *unused)
 		isb();
 	} else {
 		BUG_ON(patched);
-		__apply_alternatives(&region);
+		__apply_alternatives(&region, true);
 		/* Barriers provided by the cache flushing */
 		WRITE_ONCE(patched, 1);
 	}
@@ -176,5 +177,5 @@ void apply_alternatives(void *start, size_t length)
 		.end	= start + length,
 	};
 
-	__apply_alternatives(&region);
+	__apply_alternatives(&region, false);
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 827d52d78b67..691831c0ba74 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -432,6 +432,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
 	setup_cpu_features();
 	hyp_mode_check();
 	apply_alternatives_all();
+	mark_linear_text_alias_ro();
 }
 
 void __init smp_prepare_boot_cpu(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6cafd8723d1a..df377fbe464e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -372,16 +372,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 				     debug_pagealloc_enabled());
 
 	/*
-	 * Map the linear alias of the [_text, __init_begin) interval as
-	 * read-only/non-executable. This makes the contents of the
-	 * region accessible to subsystems such as hibernate, but
-	 * protects it from inadvertent modification or execution.
+	 * Map the linear alias of the [_text, __init_begin) interval
+	 * as non-executable now, and remove the write permission in
+	 * mark_linear_text_alias_ro() below (which will be called after
+	 * alternative patching has completed). This makes the contents
+	 * of the region accessible to subsystems such as hibernate,
+	 * but protects it from inadvertent modification or execution.
 	 */
 	__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
-			     kernel_end - kernel_start, PAGE_KERNEL_RO,
+			     kernel_end - kernel_start, PAGE_KERNEL,
 			     early_pgtable_alloc, debug_pagealloc_enabled());
 }
 
+void __init mark_linear_text_alias_ro(void)
+{
+	/*
+	 * Remove the write permissions from the linear alias of .text/.rodata
+	 */
+	update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+			    (unsigned long)__init_begin - (unsigned long)_text,
+			    PAGE_KERNEL_RO);
+}
+
 static void __init map_mem(pgd_t *pgd)
 {
 	struct memblock_region *reg;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 4/6] arm64: mmu: map .text as read-only from the outset
  2017-03-04 14:30 ` Ard Biesheuvel
  (?)
@ 2017-03-04 14:30   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-04 14:30 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, mark.rutland,
	catalin.marinas, will.deacon, labbott
  Cc: keescook, marc.zyngier, andre.przywara, Ard Biesheuvel, kvmarm

Now that the alternatives patching code no longer relies on the primary
mapping of .text being writable, we can remove the code that drops the
write permission after init, and map .text read-only from the outset.

To preserve the existing behavior under rodata=off, which is relied
upon by external debuggers to manage software breakpoints (as pointed
out by Mark), add an early_param() check for rodata=, and use RWX
permissions if it is set to 'off'.
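
(On the kernel command line this behaves as, e.g.:

    rodata=off   ->  map .text with PAGE_KERNEL_EXEC, i.e. keep it RWX
    rodata=on    ->  map .text with PAGE_KERNEL_ROX (read-only, executable)

with rodata enabled being the default.)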

Reviewed-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index df377fbe464e..edd982f88714 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -416,9 +416,6 @@ void mark_rodata_ro(void)
 {
 	unsigned long section_size;
 
-	section_size = (unsigned long)_etext - (unsigned long)_text;
-	update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
-			    section_size, PAGE_KERNEL_ROX);
 	/*
 	 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 	 * to cover NOTES and EXCEPTION_TABLE.
@@ -451,6 +448,12 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
 	vm_area_add_early(vma);
 }
 
+static int __init parse_rodata(char *arg)
+{
+	return strtobool(arg, &rodata_enabled);
+}
+early_param("rodata", parse_rodata);
+
 /*
  * Create fine-grained mappings for the kernel.
  */
@@ -458,7 +461,9 @@ static void __init map_kernel(pgd_t *pgd)
 {
 	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, vmlinux_data;
 
-	map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
+	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+
+	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
 	map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
 	map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
 			   &vmlinux_init);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 5/6] arm64: mmu: apply strict permissions to .init.text and .init.data
  2017-03-04 14:30 ` Ard Biesheuvel
  (?)
@ 2017-03-04 14:30   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-04 14:30 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, mark.rutland,
	catalin.marinas, will.deacon, labbott
  Cc: keescook, marc.zyngier, andre.przywara, Ard Biesheuvel, kvmarm

To avoid having mappings that are writable and executable at the same
time, split the init region into a .init.text region that is mapped
read-only, and a .init.data region that is mapped non-executable.

This is possible now that the alternative patching occurs via the linear
mapping, and the linear alias of the init region is always mapped writable
(but never executable).

Since the alternatives descriptions themselves are read-only data, move
those into the .init.text region.
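
For clarity, the kernel segment mappings set up by map_kernel() after
this patch are roughly:

    [_text, _etext)                     text_prot (ROX, or RWX under rodata=off)
    [__start_rodata, __inittext_begin)  PAGE_KERNEL (remapped read-only later)
    [__inittext_begin, __inittext_end)  text_prot
    [__initdata_begin, __initdata_end)  PAGE_KERNEL
    [_data, _end)                       PAGE_KERNEL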

Reviewed-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/sections.h |  3 ++-
 arch/arm64/kernel/vmlinux.lds.S   | 25 +++++++++++++-------
 arch/arm64/mm/mmu.c               | 12 ++++++----
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 4e7e7067afdb..22582819b2e5 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -24,7 +24,8 @@ extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
 extern char __idmap_text_start[], __idmap_text_end[];
+extern char __initdata_begin[], __initdata_end[];
+extern char __inittext_begin[], __inittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
-
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa9e1bf..2c93d259046c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -143,12 +143,27 @@ SECTIONS
 
 	. = ALIGN(SEGMENT_ALIGN);
 	__init_begin = .;
+	__inittext_begin = .;
 
 	INIT_TEXT_SECTION(8)
 	.exit.text : {
 		ARM_EXIT_KEEP(EXIT_TEXT)
 	}
 
+	. = ALIGN(4);
+	.altinstructions : {
+		__alt_instructions = .;
+		*(.altinstructions)
+		__alt_instructions_end = .;
+	}
+	.altinstr_replacement : {
+		*(.altinstr_replacement)
+	}
+
+	. = ALIGN(PAGE_SIZE);
+	__inittext_end = .;
+	__initdata_begin = .;
+
 	.init.data : {
 		INIT_DATA
 		INIT_SETUP(16)
@@ -164,15 +179,6 @@ SECTIONS
 
 	PERCPU_SECTION(L1_CACHE_BYTES)
 
-	. = ALIGN(4);
-	.altinstructions : {
-		__alt_instructions = .;
-		*(.altinstructions)
-		__alt_instructions_end = .;
-	}
-	.altinstr_replacement : {
-		*(.altinstr_replacement)
-	}
 	.rela : ALIGN(8) {
 		*(.rela .rela*)
 	}
@@ -181,6 +187,7 @@ SECTIONS
 	__rela_size	= SIZEOF(.rela);
 
 	. = ALIGN(SEGMENT_ALIGN);
+	__initdata_end = .;
 	__init_end = .;
 
 	_data = .;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index edd982f88714..0612573ef869 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -459,14 +459,18 @@ early_param("rodata", parse_rodata);
  */
 static void __init map_kernel(pgd_t *pgd)
 {
-	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, vmlinux_data;
+	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+				vmlinux_initdata, vmlinux_data;
 
 	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
 
 	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
-	map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
-	map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
-			   &vmlinux_init);
+	map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
+			   &vmlinux_rodata);
+	map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
+			   &vmlinux_inittext);
+	map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
+			   &vmlinux_initdata);
 	map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
 
 	if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2017-03-04 14:30 ` Ard Biesheuvel
  (?)
@ 2017-03-04 14:30   ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-04 14:30 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, mark.rutland,
	catalin.marinas, will.deacon, labbott
  Cc: keescook, marc.zyngier, andre.przywara, Ard Biesheuvel, kvmarm

This is the third attempt at enabling the use of contiguous hints for
kernel mappings. The most recent attempt, commit 0bfc445dec9d, was reverted after
it turned out that updating permission attributes on live contiguous ranges
may result in TLB conflicts. So this time, the contiguous hint is not set
for .rodata or for the linear alias of .text/.rodata, both of which are
mapped read-write initially, and remapped read-only at a later stage.
(Note that the latter region could also be unmapped and remapped again
with updated permission attributes, given that the region, while live, is
only mapped for the convenience of the hibernation code, but that also
means the TLB footprint is negligible anyway, so why bother)

This enables the following contiguous range sizes for the virtual mapping
of the kernel image, and for the linear mapping:

          granule size |  cont PTE  |  cont PMD  |
          -------------+------------+------------+
               4 KB    |    64 KB   |   32 MB    |
              16 KB    |     2 MB   |    1 GB*   |
              64 KB    |     2 MB   |   16 GB*   |

* Only when built for 3 or more levels of translation. This is due to the
  fact that a 2 level configuration only consists of PGDs and PTEs, and the
  added complexity of dealing with folded PMDs is not justified considering
  that 16 GB contiguous ranges are likely to be ignored by the hardware (and
  16k/2 levels is a niche configuration)
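
Condensed, the decision made at each level below boils down to something
like the following (illustrative helper only, not part of the patch; the
real code open-codes this against CONT_PTE_MASK/CONT_PMD_MASK):

	static bool use_cont_range(unsigned long addr, unsigned long end,
				   phys_addr_t phys, unsigned long cont_size,
				   bool may_use_cont)
	{
		/* caller must allow it (not .rodata, not the linear alias) */
		if (!may_use_cont)
			return false;
		/* VA and PA must both be aligned to the contiguous range size */
		if ((addr | phys) & (cont_size - 1))
			return false;
		/* and at least one full contiguous range must remain */
		return end - addr >= cont_size;
	}

where cont_size stands for CONT_PTE_SIZE or CONT_PMD_SIZE depending on the
level.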

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 86 ++++++++++++++------
 1 file changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0612573ef869..d0ae2f1f44fc 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -109,8 +109,10 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, unsigned long pfn,
 				  pgprot_t prot,
-				  phys_addr_t (*pgtable_alloc)(void))
+				  phys_addr_t (*pgtable_alloc)(void),
+				  bool may_use_cont)
 {
+	pgprot_t __prot = prot;
 	pte_t *pte;
 
 	BUG_ON(pmd_sect(*pmd));
@@ -128,7 +130,19 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	do {
 		pte_t old_pte = *pte;
 
-		set_pte(pte, pfn_pte(pfn, prot));
+		/*
+		 * Set the contiguous bit for the subsequent group of PTEs if
+		 * its size and alignment are appropriate.
+		 */
+		if (may_use_cont &&
+		    ((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
+			if (end - addr >= CONT_PTE_SIZE)
+				__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+			else
+				__prot = prot;
+		}
+
+		set_pte(pte, pfn_pte(pfn, __prot));
 		pfn++;
 
 		/*
@@ -145,8 +159,10 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
 				  phys_addr_t (*pgtable_alloc)(void),
-				  bool page_mappings_only)
+				  bool page_mappings_only,
+				  bool may_use_cont)
 {
+	pgprot_t __prot = prot;
 	pmd_t *pmd;
 	unsigned long next;
 
@@ -173,7 +189,19 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 		/* try section mapping first */
 		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
 		      !page_mappings_only) {
-			pmd_set_huge(pmd, phys, prot);
+			/*
+			 * Set the contiguous bit for the subsequent group of
+			 * PMDs if its size and alignment are appropriate.
+			 */
+			if (may_use_cont &&
+			    ((addr | phys) & ~CONT_PMD_MASK) == 0) {
+				if (end - addr >= CONT_PMD_SIZE)
+					__prot = __pgprot(pgprot_val(prot) |
+							  PTE_CONT);
+				else
+					__prot = prot;
+			}
+			pmd_set_huge(pmd, phys, __prot);
 
 			/*
 			 * After the PMD entry has been populated once, we
@@ -183,7 +211,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 						      pmd_val(*pmd)));
 		} else {
 			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
-				       prot, pgtable_alloc);
+				       prot, pgtable_alloc, may_use_cont);
 
 			BUG_ON(pmd_val(old_pmd) != 0 &&
 			       pmd_val(old_pmd) != pmd_val(*pmd));
@@ -209,7 +237,8 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
 				  phys_addr_t (*pgtable_alloc)(void),
-				  bool page_mappings_only)
+				  bool page_mappings_only,
+				  bool may_use_cont)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -242,7 +271,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 						      pud_val(*pud)));
 		} else {
 			alloc_init_pmd(pud, addr, next, phys, prot,
-				       pgtable_alloc, page_mappings_only);
+				       pgtable_alloc, page_mappings_only,
+				       may_use_cont);
 
 			BUG_ON(pud_val(old_pud) != 0 &&
 			       pud_val(old_pud) != pud_val(*pud));
@@ -257,11 +287,14 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
 				 phys_addr_t (*pgtable_alloc)(void),
-				 bool page_mappings_only)
+				 bool page_mappings_only,
+				 bool may_use_cont)
 {
 	unsigned long addr, length, end, next;
 	pgd_t *pgd = pgd_offset_raw(pgdir, virt);
 
+	BUG_ON(page_mappings_only && may_use_cont); /* sanity check */
+
 	/*
 	 * If the virtual and physical address don't have the same offset
 	 * within a page, we cannot map the region as the caller expects.
@@ -277,7 +310,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	do {
 		next = pgd_addr_end(addr, end);
 		alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc,
-			       page_mappings_only);
+			       page_mappings_only, may_use_cont);
 		phys += next - addr;
 	} while (pgd++, addr = next, addr != end);
 }
@@ -306,7 +339,8 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
 			&phys, virt);
 		return;
 	}
-	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
+	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false,
+			     false);
 }
 
 void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
@@ -316,7 +350,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 	BUG_ON(mm == &init_mm);
 
 	__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
-			     pgd_pgtable_alloc, page_mappings_only);
+			     pgd_pgtable_alloc, page_mappings_only, false);
 }
 
 static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@@ -329,7 +363,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
 	}
 
 	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
-			     NULL, debug_pagealloc_enabled());
+			     NULL, debug_pagealloc_enabled(), false);
 
 	/* flush the TLBs after updating live kernel mappings */
 	flush_tlb_kernel_range(virt, virt + size);
@@ -350,7 +384,8 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
 				     end - start, PAGE_KERNEL,
 				     early_pgtable_alloc,
-				     debug_pagealloc_enabled());
+				     debug_pagealloc_enabled(),
+				     !debug_pagealloc_enabled());
 		return;
 	}
 
@@ -363,13 +398,15 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 				     __phys_to_virt(start),
 				     kernel_start - start, PAGE_KERNEL,
 				     early_pgtable_alloc,
-				     debug_pagealloc_enabled());
+				     debug_pagealloc_enabled(),
+				     !debug_pagealloc_enabled());
 	if (kernel_end < end)
 		__create_pgd_mapping(pgd, kernel_end,
 				     __phys_to_virt(kernel_end),
 				     end - kernel_end, PAGE_KERNEL,
 				     early_pgtable_alloc,
-				     debug_pagealloc_enabled());
+				     debug_pagealloc_enabled(),
+				     !debug_pagealloc_enabled());
 
 	/*
 	 * Map the linear alias of the [_text, __init_begin) interval
@@ -381,7 +418,8 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 	 */
 	__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
 			     kernel_end - kernel_start, PAGE_KERNEL,
-			     early_pgtable_alloc, debug_pagealloc_enabled());
+			     early_pgtable_alloc, debug_pagealloc_enabled(),
+			     false);
 }
 
 void __init mark_linear_text_alias_ro(void)
@@ -428,7 +466,8 @@ void mark_rodata_ro(void)
 }
 
 static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
-				      pgprot_t prot, struct vm_struct *vma)
+				      pgprot_t prot, struct vm_struct *vma,
+				      bool may_use_cont)
 {
 	phys_addr_t pa_start = __pa_symbol(va_start);
 	unsigned long size = va_end - va_start;
@@ -437,7 +476,8 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
 	BUG_ON(!PAGE_ALIGNED(size));
 
 	__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
-			     early_pgtable_alloc, debug_pagealloc_enabled());
+			     early_pgtable_alloc, debug_pagealloc_enabled(),
+			     !debug_pagealloc_enabled() && may_use_cont);
 
 	vma->addr	= va_start;
 	vma->phys_addr	= pa_start;
@@ -464,14 +504,14 @@ static void __init map_kernel(pgd_t *pgd)
 
 	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
 
-	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
+	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text, true);
 	map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
-			   &vmlinux_rodata);
+			   &vmlinux_rodata, false);
 	map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
-			   &vmlinux_inittext);
+			   &vmlinux_inittext, true);
 	map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
-			   &vmlinux_initdata);
-	map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
+			   &vmlinux_initdata, true);
+	map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data, true);
 
 	if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
 		/*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 4/6] arm64: mmu: map .text as read-only from the outset
  2017-03-04 14:30   ` Ard Biesheuvel
  (?)
@ 2017-03-07 14:10     ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2017-03-07 14:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: keescook, kernel-hardening, marc.zyngier, catalin.marinas,
	will.deacon, andre.przywara, kvmarm, linux-arm-kernel, labbott

On Sat, Mar 04, 2017 at 02:30:46PM +0000, Ard Biesheuvel wrote:
> Now that alternatives patching code no longer relies on the primary
> mapping of .text being writable, we can remove the code that removes
> the writable permissions post-init time, and map it read-only from
> the outset.
> 
> To preserve the existing behavior under rodata=off, which is relied
> upon by external debuggers to manage software breakpoints (as pointed
> out by Mark), add an early_param() check for rodata=, and use RWX
> permissions if it is set to 'off'.
> 
> Reviewed-by: Laura Abbott <labbott@redhat.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/mm/mmu.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index df377fbe464e..edd982f88714 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -416,9 +416,6 @@ void mark_rodata_ro(void)
>  {
>  	unsigned long section_size;
>  
> -	section_size = (unsigned long)_etext - (unsigned long)_text;
> -	update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
> -			    section_size, PAGE_KERNEL_ROX);
>  	/*
>  	 * mark .rodata as read only. Use __init_begin rather than __end_rodata
>  	 * to cover NOTES and EXCEPTION_TABLE.
> @@ -451,6 +448,12 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
>  	vm_area_add_early(vma);
>  }
>  
> +static int __init parse_rodata(char *arg)
> +{
> +	return strtobool(arg, &rodata_enabled);
> +}
> +early_param("rodata", parse_rodata);
> +
>  /*
>   * Create fine-grained mappings for the kernel.
>   */
> @@ -458,7 +461,9 @@ static void __init map_kernel(pgd_t *pgd)
>  {
>  	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, vmlinux_data;
>  
> -	map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
> +	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
> +

It might be worth having a comment as to why, e.g.

	/*
	 * External debuggers may need to write directly to the text
	 * mapping to install SW breakpoints. Allow this (only) when
	 * explicitly requested with rodata=off.
	 */
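
i.e. sitting right above the new text_prot assignment in map_kernel(), so
that the end result would look roughly like:

	/*
	 * External debuggers may need to write directly to the text
	 * mapping to install SW breakpoints. Allow this (only) when
	 * explicitly requested with rodata=off.
	 */
	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;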

... otherwise this looks fine. FWIW, either way:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> +	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
>  	map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
>  	map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
>  			   &vmlinux_init);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 5/6] arm64: mmu: apply strict permissions to .init.text and .init.data
  2017-03-04 14:30   ` Ard Biesheuvel
  (?)
@ 2017-03-07 14:21     ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2017-03-07 14:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: keescook, kernel-hardening, marc.zyngier, catalin.marinas,
	will.deacon, andre.przywara, kvmarm, linux-arm-kernel, labbott

On Sat, Mar 04, 2017 at 02:30:47PM +0000, Ard Biesheuvel wrote:
> diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
> index 4e7e7067afdb..22582819b2e5 100644
> --- a/arch/arm64/include/asm/sections.h
> +++ b/arch/arm64/include/asm/sections.h
> @@ -24,7 +24,8 @@ extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
>  extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
>  extern char __hyp_text_start[], __hyp_text_end[];
>  extern char __idmap_text_start[], __idmap_text_end[];
> +extern char __initdata_begin[], __initdata_end[];
> +extern char __inittext_begin[], __inittext_end[];
>  extern char __irqentry_text_start[], __irqentry_text_end[];
>  extern char __mmuoff_data_start[], __mmuoff_data_end[];
> -
>  #endif /* __ASM_SECTIONS_H */

Unintended whitespace change?

Please restore the line above the endif. 

> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index b8deffa9e1bf..2c93d259046c 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -143,12 +143,27 @@ SECTIONS
>  
>  	. = ALIGN(SEGMENT_ALIGN);
>  	__init_begin = .;
> +	__inittext_begin = .;
>  
>  	INIT_TEXT_SECTION(8)
>  	.exit.text : {
>  		ARM_EXIT_KEEP(EXIT_TEXT)
>  	}
>  
> +	. = ALIGN(4);
> +	.altinstructions : {
> +		__alt_instructions = .;
> +		*(.altinstructions)
> +		__alt_instructions_end = .;
> +	}
> +	.altinstr_replacement : {
> +		*(.altinstr_replacement)
> +	}
> +
> +	. = ALIGN(PAGE_SIZE);

Arguably this should be SEGMENT_ALIGN for consistency, but given this is
just .init* this should be fine.

> +	__inittext_end = .;
> +	__initdata_begin = .;
> +
>  	.init.data : {
>  		INIT_DATA
>  		INIT_SETUP(16)

[...]

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index edd982f88714..0612573ef869 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -459,14 +459,18 @@ early_param("rodata", parse_rodata);
>   */
>  static void __init map_kernel(pgd_t *pgd)
>  {
> -	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, vmlinux_data;
> +	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
> +				vmlinux_initdata, vmlinux_data;
>  
>  	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
>  
>  	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
> -	map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
> -	map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
> -			   &vmlinux_init);
> +	map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
> +			   &vmlinux_rodata);
> +	map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
> +			   &vmlinux_inittext);
> +	map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
> +			   &vmlinux_initdata);
>  	map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);


This all looks fine, given text_prot is used for the init text.

With the whitespace restored:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2017-03-04 14:30   ` Ard Biesheuvel
  (?)
@ 2017-03-07 16:46     ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2017-03-07 16:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: keescook, kernel-hardening, marc.zyngier, catalin.marinas,
	will.deacon, andre.przywara, kvmarm, linux-arm-kernel, labbott

Hi,

On Sat, Mar 04, 2017 at 02:30:48PM +0000, Ard Biesheuvel wrote:
> This is the third attempt at enabling the use of contiguous hints for
> kernel mappings. The most recent attempt 0bfc445dec9d was reverted after
> it turned out that updating permission attributes on live contiguous ranges
> may result in TLB conflicts. So this time, the contiguous hint is not set
> for .rodata or for the linear alias of .text/.rodata, both of which are
> mapped read-write initially, and remapped read-only at a later stage.
> (Note that the latter region could also be unmapped and remapped again
> with updated permission attributes, given that the region, while live, is
> only mapped for the convenience of the hibernation code, but that also
> means the TLB footprint is negligible anyway, so why bother)
> 
> This enables the following contiguous range sizes for the virtual mapping
> of the kernel image, and for the linear mapping:
> 
>           granule size |  cont PTE  |  cont PMD  |
>           -------------+------------+------------+
>                4 KB    |    64 KB   |   32 MB    |
>               16 KB    |     2 MB   |    1 GB*   |
>               64 KB    |     2 MB   |   16 GB*   |
> 
> * Only when built for 3 or more levels of translation. This is due to the
>   fact that a 2 level configuration only consists of PGDs and PTEs, and the
>   added complexity of dealing with folded PMDs is not justified considering
>   that 16 GB contiguous ranges are likely to be ignored by the hardware (and
>   16k/2 levels is a niche configuration)
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/mm/mmu.c | 86 ++++++++++++++------
>  1 file changed, 63 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 0612573ef869..d0ae2f1f44fc 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -109,8 +109,10 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  				  unsigned long end, unsigned long pfn,
>  				  pgprot_t prot,
> -				  phys_addr_t (*pgtable_alloc)(void))
> +				  phys_addr_t (*pgtable_alloc)(void),
> +				  bool may_use_cont)

Maybe we should invert this as single_pages_only, to keep the same
polarity as page_mappings_only?

That might make the calls to __create_pgd_mapping() in __map_memblock()
look a little nicer, for instance.
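
e.g. the linear-mapping call would then read, roughly:

	__create_pgd_mapping(pgd, start, __phys_to_virt(start),
			     end - start, PAGE_KERNEL,
			     early_pgtable_alloc,
			     debug_pagealloc_enabled(),   /* page_mappings_only */
			     debug_pagealloc_enabled());  /* single_pages_only */

without the negation.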

>  {
> +	pgprot_t __prot = prot;
>  	pte_t *pte;
>  
>  	BUG_ON(pmd_sect(*pmd));
> @@ -128,7 +130,19 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  	do {
>  		pte_t old_pte = *pte;
>  
> -		set_pte(pte, pfn_pte(pfn, prot));
> +		/*
> +		 * Set the contiguous bit for the subsequent group of PTEs if
> +		 * its size and alignment are appropriate.
> +		 */
> +		if (may_use_cont &&
> +		    ((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
> +			if (end - addr >= CONT_PTE_SIZE)
> +				__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> +			else
> +				__prot = prot;
> +		}
> +
> +		set_pte(pte, pfn_pte(pfn, __prot));
>  		pfn++;
>  
>  		/*

While it would require more code, I think it would be better to add a
function between alloc_init_pte() and alloc_init_pmd(), handling this in
the usual fashion. e.g.

---->8----
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0eef606..2c90925 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -74,6 +74,11 @@
 #define pte_user_exec(pte)     (!(pte_val(pte) & PTE_UXN))
 #define pte_cont(pte)          (!!(pte_val(pte) & PTE_CONT))
 
+#define pte_cont_addr_end(addr, end)                                           \
+({     unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK;    \
+       (__boundary - 1 < (end) - 1)? __boundary: (end);                        \
+})
+
 #ifdef CONFIG_ARM64_HW_AFDBM
 #define pte_hw_dirty(pte)      (pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
 #else
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2aec93ab..3cee826 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -142,6 +142,32 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
        pte_clear_fixmap();
 }
 
+static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
+                                 unsigned long end, phys_addr_t phys,
+                                 pgprot_t prot,
+                                 phys_addr_t (*pgtable_alloc)(void),
+                                 bool may_use_cont)
+{
+       const pgprot_t cont_prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+       unsigned long next;
+
+       do {
+               next = pte_cont_addr_end(addr, end);
+
+               /* try a contiguous mapping first */
+               if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
+                   may_use_cont) {
+                       alloc_init_pte(pmd, addr, next, phys, cont_prot,
+                                      pgtable_alloc);
+               } else {
+                       alloc_init_pte(pmd, addr, next, phys, prot,
+                                      pgtable_alloc);
+               }
+
+               phys += next - addr;
+       } while (addr = next, addr != end);
+}
+
 static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
                                  phys_addr_t phys, pgprot_t prot,
                                  phys_addr_t (*pgtable_alloc)(void),
---->8----

It's more code, but it follows the existing pattern, and I personally
find it easier to follow than changing prot within the loop.

Note that I've cheated and made alloc_init_pte() take a phys_addr_t
rather than a pfn, which I think we should do anyhow for consistency. I
have a patch for that, if you agree.

Another thing we *might* want to do here is pass a flags parameter rather
than separate page_mappings_only and may_use_cont. That might also make
things less painful in future if there are other things we need to pass
down.

That might also end up complicating matters, given the way we currently
use debug_pagealloc_enabled() in __create_pgd_mapping() callers, so I'm
also happy to leave that.
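
As a rough sketch of that flags idea (the flag names below are made up for
illustration, they are not taken from this series):

/* illustrative flag bits */
#define NO_BLOCK_MAPPINGS	BIT(0)	/* force page-granular mappings */
#define NO_CONT_MAPPINGS	BIT(1)	/* never set PTE_CONT */

static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
			   unsigned long end, phys_addr_t phys,
			   pgprot_t prot,
			   phys_addr_t (*pgtable_alloc)(void),
			   int flags);

/* e.g. the linear alias, instead of passing two booleans: */
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
		     kernel_end - kernel_start, PAGE_KERNEL,
		     early_pgtable_alloc,
		     NO_CONT_MAPPINGS |
		     (debug_pagealloc_enabled() ? NO_BLOCK_MAPPINGS : 0));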

[...]

> @@ -173,7 +189,19 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>  		/* try section mapping first */
>  		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
>  		      !page_mappings_only) {
> -			pmd_set_huge(pmd, phys, prot);
> +			/*
> +			 * Set the contiguous bit for the subsequent group of
> +			 * PMDs if its size and alignment are appropriate.
> +			 */
> +			if (may_use_cont &&
> +			    ((addr | phys) & ~CONT_PMD_MASK) == 0) {
> +				if (end - addr >= CONT_PMD_SIZE)
> +					__prot = __pgprot(pgprot_val(prot) |
> +							  PTE_CONT);
> +				else
> +					__prot = prot;
> +			}
> +			pmd_set_huge(pmd, phys, __prot);

As above, I think it would be better to have an alloc_init_cont_pmd()
wrapper that iterated in CONT_PMD_SIZE chunks, so as to keep the usual
pattern.
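
i.e. roughly the following (untested sketch mirroring the cont-PTE version
above; it assumes a pmd_cont_addr_end() helper analogous to
pte_cont_addr_end(), and with the CONT decision hoisted out like this,
alloc_init_pmd() itself would no longer need a may_use_cont argument):

static void alloc_init_cont_pmd(pud_t *pud, unsigned long addr,
				unsigned long end, phys_addr_t phys,
				pgprot_t prot,
				phys_addr_t (*pgtable_alloc)(void),
				bool page_mappings_only, bool may_use_cont)
{
	const pgprot_t cont_prot = __pgprot(pgprot_val(prot) | PTE_CONT);
	unsigned long next;

	do {
		next = pmd_cont_addr_end(addr, end);

		/* use the contiguous bit at PMD level if size/alignment allow */
		if (may_use_cont &&
		    ((addr | next | phys) & ~CONT_PMD_MASK) == 0)
			alloc_init_pmd(pud, addr, next, phys, cont_prot,
				       pgtable_alloc, page_mappings_only);
		else
			alloc_init_pmd(pud, addr, next, phys, prot,
				       pgtable_alloc, page_mappings_only);

		phys += next - addr;
	} while (addr = next, addr != end);
}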

> @@ -381,7 +418,8 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  	 */
>  	__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
>  			     kernel_end - kernel_start, PAGE_KERNEL,
> -			     early_pgtable_alloc, debug_pagealloc_enabled());
> +			     early_pgtable_alloc, debug_pagealloc_enabled(),
> +			     false);
>  }

> @@ -437,7 +476,8 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
>  	BUG_ON(!PAGE_ALIGNED(size));
>  
>  	__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
> -			     early_pgtable_alloc, debug_pagealloc_enabled());
> +			     early_pgtable_alloc, debug_pagealloc_enabled(),
> +			     !debug_pagealloc_enabled() && may_use_cont);
>  
>  	vma->addr	= va_start;
>  	vma->phys_addr	= pa_start;
> @@ -464,14 +504,14 @@ static void __init map_kernel(pgd_t *pgd)
>  
>  	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
>  
> -	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
> +	map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text, true);
>  	map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
> -			   &vmlinux_rodata);
> +			   &vmlinux_rodata, false);
>  	map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
> -			   &vmlinux_inittext);
> +			   &vmlinux_inittext, true);
>  	map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
> -			   &vmlinux_initdata);
> -	map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
> +			   &vmlinux_initdata, true);
> +	map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data, true);

It might be worth a comment somewhere as to why we have to special case
text, rodata, and the linear alias, but otherwise this looks fine.
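
Something along these lines would do, for example (suggested wording only,
based on the reasoning in the commit message):

	/*
	 * .rodata and the linear alias of .text/.rodata are mapped writable
	 * first and remapped read-only later, and changing the attributes of
	 * a live contiguous range can trigger TLB conflicts, so those regions
	 * must not be mapped with the contiguous bit.
	 */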

Thanks,
Mark.

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2017-03-07 16:46     ` Mark Rutland
@ 2017-03-08 10:57       ` Ard Biesheuvel
  -1 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-08 10:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, kernel-hardening, Marc Zyngier, Catalin Marinas,
	Will Deacon, Andre Przywara, kvmarm, linux-arm-kernel,
	Laura Abbott

On 7 March 2017 at 17:46, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Sat, Mar 04, 2017 at 02:30:48PM +0000, Ard Biesheuvel wrote:
>> This is the third attempt at enabling the use of contiguous hints for
>> kernel mappings. The most recent attempt 0bfc445dec9d was reverted after
>> it turned out that updating permission attributes on live contiguous ranges
>> may result in TLB conflicts. So this time, the contiguous hint is not set
>> for .rodata or for the linear alias of .text/.rodata, both of which are
>> mapped read-write initially, and remapped read-only at a later stage.
>> (Note that the latter region could also be unmapped and remapped again
>> with updated permission attributes, given that the region, while live, is
>> only mapped for the convenience of the hibernation code, but that also
>> means the TLB footprint is negligible anyway, so why bother)
>>
>> This enables the following contiguous range sizes for the virtual mapping
>> of the kernel image, and for the linear mapping:
>>
>>           granule size |  cont PTE  |  cont PMD  |
>>           -------------+------------+------------+
>>                4 KB    |    64 KB   |   32 MB    |
>>               16 KB    |     2 MB   |    1 GB*   |
>>               64 KB    |     2 MB   |   16 GB*   |
>>
>> * Only when built for 3 or more levels of translation. This is due to the
>>   fact that a 2 level configuration only consists of PGDs and PTEs, and the
>>   added complexity of dealing with folded PMDs is not justified considering
>>   that 16 GB contiguous ranges are likely to be ignored by the hardware (and
>>   16k/2 levels is a niche configuration)
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/mmu.c | 86 ++++++++++++++------
>>  1 file changed, 63 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 0612573ef869..d0ae2f1f44fc 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -109,8 +109,10 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
>>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>                                 unsigned long end, unsigned long pfn,
>>                                 pgprot_t prot,
>> -                               phys_addr_t (*pgtable_alloc)(void))
>> +                               phys_addr_t (*pgtable_alloc)(void),
>> +                               bool may_use_cont)
>
> Maybe we should invert this as single_pages_only, to keep the same
> polarity as page_mappings_only?
>
> That might make the calls to __create_pgd_mapping() in __map_memblock()
> look a little nicer, for instance.
>

Good point

>>  {
>> +     pgprot_t __prot = prot;
>>       pte_t *pte;
>>
>>       BUG_ON(pmd_sect(*pmd));
>> @@ -128,7 +130,19 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>       do {
>>               pte_t old_pte = *pte;
>>
>> -             set_pte(pte, pfn_pte(pfn, prot));
>> +             /*
>> +              * Set the contiguous bit for the subsequent group of PTEs if
>> +              * its size and alignment are appropriate.
>> +              */
>> +             if (may_use_cont &&
>> +                 ((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
>> +                     if (end - addr >= CONT_PTE_SIZE)
>> +                             __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>> +                     else
>> +                             __prot = prot;
>> +             }
>> +
>> +             set_pte(pte, pfn_pte(pfn, __prot));
>>               pfn++;
>>
>>               /*
>
> While it would require more code, I think it would be better to add a
> function between alloc_init_pte() and alloc_init_pmd(), handling this in
> the usual fashion. e.g.
>
> ---->8----
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0eef606..2c90925 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -74,6 +74,11 @@
>  #define pte_user_exec(pte)     (!(pte_val(pte) & PTE_UXN))
>  #define pte_cont(pte)          (!!(pte_val(pte) & PTE_CONT))
>
> +#define pte_cont_addr_end(addr, end)                                           \
> +({     unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK;    \
> +       (__boundary - 1 < (end) - 1)? __boundary: (end);                        \
> +})
> +
>  #ifdef CONFIG_ARM64_HW_AFDBM
>  #define pte_hw_dirty(pte)      (pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
>  #else
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 2aec93ab..3cee826 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -142,6 +142,32 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>         pte_clear_fixmap();
>  }
>
> +static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
> +                                 unsigned long end, phys_addr_t phys,
> +                                 pgprot_t prot,
> +                                 phys_addr_t (*pgtable_alloc)(void),
> +                                 bool may_use_cont)
> +{
> +       const pgprot_t cont_prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> +       unsigned long next;
> +
> +       do {
> +               next = pte_cont_addr_end(addr, end);
> +
> +               /* try a contiguous mapping first */
> +               if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> +                   may_use_cont) {
> +                       alloc_init_pte(pmd, addr, next, phys, cont_prot,
> +                                      pgtable_alloc);
> +               } else {
> +                       alloc_init_pte(pmd, addr, next, phys, prot,
> +                                      pgtable_alloc);
> +               }
> +
> +               phys += next - addr;
> +       } while (addr = next, addr != end);
> +}
> +
>  static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>                                   phys_addr_t phys, pgprot_t prot,
>                                   phys_addr_t (*pgtable_alloc)(void),
> ---->8----
>
> It's more code, but it follows the existing pattern, and I personally
> find it easier to follow than changing prot within the loop.
>
> Note that I've cheated and made alloc_init_pte() take a phys_addr_t
> rather than a pfn, which I think we should do anyhow for consistency. I
> have a patch for that, if you agree.
>

Yes, absolutely. I did not spot this before you pointed it out, but it
looks a bit sloppy.

> Another thing we *might* want to do here is pass a flags parameter rather
> than separate page_mappings_only and may_use_cont. That might also make
> things less painful in future if there are other things we need to pass
> down.
>

Yes, I already did that in a draft version, and then dropped it again.
I don't have a strong preference either way, so let me try that for
the next version.

> That might also end up complicating matters, given the way we currently
> use debug_pagealloc_enabled() in __create_pgd_mapping() callers, so I'm
> also happy to leave that.
>
> [...]
>
>> @@ -173,7 +189,19 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>>               /* try section mapping first */
>>               if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
>>                     !page_mappings_only) {
>> -                     pmd_set_huge(pmd, phys, prot);
>> +                     /*
>> +                      * Set the contiguous bit for the subsequent group of
>> +                      * PMDs if its size and alignment are appropriate.
>> +                      */
>> +                     if (may_use_cont &&
>> +                         ((addr | phys) & ~CONT_PMD_MASK) == 0) {
>> +                             if (end - addr >= CONT_PMD_SIZE)
>> +                                     __prot = __pgprot(pgprot_val(prot) |
>> +                                                       PTE_CONT);
>> +                             else
>> +                                     __prot = prot;
>> +                     }
>> +                     pmd_set_huge(pmd, phys, __prot);
>
> As above, I think it would be better to have an alloc_init_cont_pmd()
> wrapper that iterated in CONT_PMD_SIZE chunks, so as to keep the usual
> pattern.
>
>> @@ -381,7 +418,8 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>        */
>>       __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
>>                            kernel_end - kernel_start, PAGE_KERNEL,
>> -                          early_pgtable_alloc, debug_pagealloc_enabled());
>> +                          early_pgtable_alloc, debug_pagealloc_enabled(),
>> +                          false);
>>  }
>
>> @@ -437,7 +476,8 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
>>       BUG_ON(!PAGE_ALIGNED(size));
>>
>>       __create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
>> -                          early_pgtable_alloc, debug_pagealloc_enabled());
>> +                          early_pgtable_alloc, debug_pagealloc_enabled(),
>> +                          !debug_pagealloc_enabled() && may_use_cont);
>>
>>       vma->addr       = va_start;
>>       vma->phys_addr  = pa_start;
>> @@ -464,14 +504,14 @@ static void __init map_kernel(pgd_t *pgd)
>>
>>       pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
>>
>> -     map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
>> +     map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text, true);
>>       map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
>> -                        &vmlinux_rodata);
>> +                        &vmlinux_rodata, false);
>>       map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
>> -                        &vmlinux_inittext);
>> +                        &vmlinux_inittext, true);
>>       map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
>> -                        &vmlinux_initdata);
>> -     map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
>> +                        &vmlinux_initdata, true);
>> +     map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data, true);
>
> It might be worth a comment somewhere as to why we have to special case
> text, rodata, and the linear alias, but otherwise this looks fine.
>

Ack

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate
@ 2017-03-08 10:57       ` Ard Biesheuvel
  0 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-08 10:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 March 2017 at 17:46, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Sat, Mar 04, 2017 at 02:30:48PM +0000, Ard Biesheuvel wrote:
>> This is the third attempt at enabling the use of contiguous hints for
>> kernel mappings. The most recent attempt 0bfc445dec9d was reverted after
>> it turned out that updating permission attributes on live contiguous ranges
>> may result in TLB conflicts. So this time, the contiguous hint is not set
>> for .rodata or for the linear alias of .text/.rodata, both of which are
>> mapped read-write initially, and remapped read-only at a later stage.
>> (Note that the latter region could also be unmapped and remapped again
>> with updated permission attributes, given that the region, while live, is
>> only mapped for the convenience of the hibernation code, but that also
>> means the TLB footprint is negligible anyway, so why bother)
>>
>> This enables the following contiguous range sizes for the virtual mapping
>> of the kernel image, and for the linear mapping:
>>
>>           granule size |  cont PTE  |  cont PMD  |
>>           -------------+------------+------------+
>>                4 KB    |    64 KB   |   32 MB    |
>>               16 KB    |     2 MB   |    1 GB*   |
>>               64 KB    |     2 MB   |   16 GB*   |
>>
>> * Only when built for 3 or more levels of translation. This is due to the
>>   fact that a 2 level configuration only consists of PGDs and PTEs, and the
>>   added complexity of dealing with folded PMDs is not justified considering
>>   that 16 GB contiguous ranges are likely to be ignored by the hardware (and
>>   16k/2 levels is a niche configuration)
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/mmu.c | 86 ++++++++++++++------
>>  1 file changed, 63 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 0612573ef869..d0ae2f1f44fc 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -109,8 +109,10 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
>>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>                                 unsigned long end, unsigned long pfn,
>>                                 pgprot_t prot,
>> -                               phys_addr_t (*pgtable_alloc)(void))
>> +                               phys_addr_t (*pgtable_alloc)(void),
>> +                               bool may_use_cont)
>
> Maybe we should invert this as single_pages_only, to keep the same
> polarity as page_mappings_only?
>
> That might make the calls to __create_pgd_mapping() in __map_memblock()
> look a little nicer, for instance.
>

Good point

>>  {
>> +     pgprot_t __prot = prot;
>>       pte_t *pte;
>>
>>       BUG_ON(pmd_sect(*pmd));
>> @@ -128,7 +130,19 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>       do {
>>               pte_t old_pte = *pte;
>>
>> -             set_pte(pte, pfn_pte(pfn, prot));
>> +             /*
>> +              * Set the contiguous bit for the subsequent group of PTEs if
>> +              * its size and alignment are appropriate.
>> +              */
>> +             if (may_use_cont &&
>> +                 ((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
>> +                     if (end - addr >= CONT_PTE_SIZE)
>> +                             __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>> +                     else
>> +                             __prot = prot;
>> +             }
>> +
>> +             set_pte(pte, pfn_pte(pfn, __prot));
>>               pfn++;
>>
>>               /*
>
> While it would require more code, I think it would be better to add a
> function between alloc_init_pte() and alloc_init_pmd(), handling this in
> the usual fashion. e.g.
>
> ---->8----
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0eef606..2c90925 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -74,6 +74,11 @@
>  #define pte_user_exec(pte)     (!(pte_val(pte) & PTE_UXN))
>  #define pte_cont(pte)          (!!(pte_val(pte) & PTE_CONT))
>
> +#define pte_cont_addr_end(addr, end)                                           \
> +({     unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK;    \
> +       (__boundary - 1 < (end) - 1)? __boundary: (end);                        \
> +})
> +
>  #ifdef CONFIG_ARM64_HW_AFDBM
>  #define pte_hw_dirty(pte)      (pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
>  #else
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 2aec93ab..3cee826 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -142,6 +142,32 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>         pte_clear_fixmap();
>  }
>
> +static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
> +                                 unsigned long end, phys_addr_t phys,
> +                                 pgprot_t prot,
> +                                 phys_addr_t (*pgtable_alloc)(void),
> +                                 bool may_use_cont)
> +{
> +       const pgprot_t cont_prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> +       unsigned long next;
> +
> +       do {
> +               next = pte_cont_addr_end(addr, end);
> +
> +               /* try a contiguous mapping first */
> +               if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> +                   may_use_cont) {
> +                       alloc_init_pte(pmd, addr, next, phys, cont_prot,
> +                                      pgtable_alloc);
> +               } else {
> +                       alloc_init_pte(pmd, addr, next, phys, prot,
> +                                      pgtable_alloc);
> +               }
> +
> +               phys += next - addr;
> +       } while (addr = next, addr != end);
> +}
> +
>  static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>                                   phys_addr_t phys, pgprot_t prot,
>                                   phys_addr_t (*pgtable_alloc)(void),
> ---->8----
>
> It's more code, but it follows the existing pattern, and I personally
> find it easier to follow than changing prot within the loop.
>
> Note that I've cheated and made alloc_init_pte() take a phys_addr_t
> rather than a pfn, which I think we should do anyhow for consistency. I
> have a patch for that, if you agree.
>

Yes, absolutely. I did not spot this before you pointed it out, but it
looks a bit sloppy.

> Another think we *might* want to do here is pass a flags parameter that
> than separate page_mappings_only and mask_use_cont. That might also make
> things less painful in future if there are other things we need to pass
> down.
>

Yes, I already did that in a draft version, and then dropped it again.
I don't have a strong preference either way, so let me try that for
the next version

> That might also end up complicating matters, given the way we currently
> use debug_pagealloc_enabled() in __create_pgd_mapping() callers, so I'm
> also happy to leave that.
>
> [...]
>
>> @@ -173,7 +189,19 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>>               /* try section mapping first */
>>               if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
>>                     !page_mappings_only) {
>> -                     pmd_set_huge(pmd, phys, prot);
>> +                     /*
>> +                      * Set the contiguous bit for the subsequent group of
>> +                      * PMDs if its size and alignment are appropriate.
>> +                      */
>> +                     if (may_use_cont &&
>> +                         ((addr | phys) & ~CONT_PMD_MASK) == 0) {
>> +                             if (end - addr >= CONT_PMD_SIZE)
>> +                                     __prot = __pgprot(pgprot_val(prot) |
>> +                                                       PTE_CONT);
>> +                             else
>> +                                     __prot = prot;
>> +                     }
>> +                     pmd_set_huge(pmd, phys, __prot);
>
> As above, I think it would be better to have a alloc_init_cont_pmd()
> wrapper that iterated in CONT_PMD_SIZE chunks, so as to keep the usual
> pattern.
>
>> @@ -381,7 +418,8 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>        */
>>       __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
>>                            kernel_end - kernel_start, PAGE_KERNEL,
>> -                          early_pgtable_alloc, debug_pagealloc_enabled());
>> +                          early_pgtable_alloc, debug_pagealloc_enabled(),
>> +                          false);
>>  }
>
>> @@ -437,7 +476,8 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
>>       BUG_ON(!PAGE_ALIGNED(size));
>>
>>       __create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
>> -                          early_pgtable_alloc, debug_pagealloc_enabled());
>> +                          early_pgtable_alloc, debug_pagealloc_enabled(),
>> +                          !debug_pagealloc_enabled() && may_use_cont);
>>
>>       vma->addr       = va_start;
>>       vma->phys_addr  = pa_start;
>> @@ -464,14 +504,14 @@ static void __init map_kernel(pgd_t *pgd)
>>
>>       pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
>>
>> -     map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
>> +     map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text, true);
>>       map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
>> -                        &vmlinux_rodata);
>> +                        &vmlinux_rodata, false);
>>       map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
>> -                        &vmlinux_inittext);
>> +                        &vmlinux_inittext, true);
>>       map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
>> -                        &vmlinux_initdata);
>> -     map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
>> +                        &vmlinux_initdata, true);
>> +     map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data, true);
>
> It might be worth a comment somewhere as to why we have to special case
> text, rodata, and the linear alias, but otherwise this looks fine.
>

Ack

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [kernel-hardening] Re: [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate
@ 2017-03-08 10:57       ` Ard Biesheuvel
  0 siblings, 0 replies; 36+ messages in thread
From: Ard Biesheuvel @ 2017-03-08 10:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Catalin Marinas, Will Deacon,
	Laura Abbott, kvmarm, Marc Zyngier, Kees Cook, Andre Przywara,
	James Morse, Suzuki K. Poulose

On 7 March 2017 at 17:46, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Sat, Mar 04, 2017 at 02:30:48PM +0000, Ard Biesheuvel wrote:
>> This is the third attempt at enabling the use of contiguous hints for
>> kernel mappings. The most recent attempt 0bfc445dec9d was reverted after
>> it turned out that updating permission attributes on live contiguous ranges
>> may result in TLB conflicts. So this time, the contiguous hint is not set
>> for .rodata or for the linear alias of .text/.rodata, both of which are
>> mapped read-write initially, and remapped read-only at a later stage.
>> (Note that the latter region could also be unmapped and remapped again
>> with updated permission attributes, given that the region, while live, is
>> only mapped for the convenience of the hibernation code, but that also
>> means the TLB footprint is negligible anyway, so why bother)
>>
>> This enables the following contiguous range sizes for the virtual mapping
>> of the kernel image, and for the linear mapping:
>>
>>           granule size |  cont PTE  |  cont PMD  |
>>           -------------+------------+------------+
>>                4 KB    |    64 KB   |   32 MB    |
>>               16 KB    |     2 MB   |    1 GB*   |
>>               64 KB    |     2 MB   |   16 GB*   |
>>
>> * Only when built for 3 or more levels of translation. This is due to the
>>   fact that a 2 level configuration only consists of PGDs and PTEs, and the
>>   added complexity of dealing with folded PMDs is not justified considering
>>   that 16 GB contiguous ranges are likely to be ignored by the hardware (and
>>   16k/2 levels is a niche configuration)
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/mmu.c | 86 ++++++++++++++------
>>  1 file changed, 63 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 0612573ef869..d0ae2f1f44fc 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -109,8 +109,10 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
>>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>                                 unsigned long end, unsigned long pfn,
>>                                 pgprot_t prot,
>> -                               phys_addr_t (*pgtable_alloc)(void))
>> +                               phys_addr_t (*pgtable_alloc)(void),
>> +                               bool may_use_cont)
>
> Maybe we should invert this as single_pages_only, to keep the same
> polarity as page_mappings_only?
>
> That might make the calls to __create_pgd_mapping() in __map_memblock()
> look a little nicer, for instance.
>

Good point

>>  {
>> +     pgprot_t __prot = prot;
>>       pte_t *pte;
>>
>>       BUG_ON(pmd_sect(*pmd));
>> @@ -128,7 +130,19 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>       do {
>>               pte_t old_pte = *pte;
>>
>> -             set_pte(pte, pfn_pte(pfn, prot));
>> +             /*
>> +              * Set the contiguous bit for the subsequent group of PTEs if
>> +              * its size and alignment are appropriate.
>> +              */
>> +             if (may_use_cont &&
>> +                 ((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
>> +                     if (end - addr >= CONT_PTE_SIZE)
>> +                             __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>> +                     else
>> +                             __prot = prot;
>> +             }
>> +
>> +             set_pte(pte, pfn_pte(pfn, __prot));
>>               pfn++;
>>
>>               /*
>
> While it would require more code, I think it would be better to add a
> function between alloc_init_pte() and alloc_init_pmd(), handling this in
> the usual fashion. e.g.
>
> ---->8----
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0eef606..2c90925 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -74,6 +74,11 @@
>  #define pte_user_exec(pte)     (!(pte_val(pte) & PTE_UXN))
>  #define pte_cont(pte)          (!!(pte_val(pte) & PTE_CONT))
>
> +#define pte_cont_addr_end(addr, end)                                           \
> +({     unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK;    \
> +       (__boundary - 1 < (end) - 1)? __boundary: (end);                        \
> +})
> +
>  #ifdef CONFIG_ARM64_HW_AFDBM
>  #define pte_hw_dirty(pte)      (pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
>  #else
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 2aec93ab..3cee826 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -142,6 +142,32 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>         pte_clear_fixmap();
>  }
>
> +static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
> +                                 unsigned long end, phys_addr_t phys,
> +                                 pgprot_t prot,
> +                                 phys_addr_t (*pgtable_alloc)(void),
> +                                 bool may_use_cont)
> +{
> +       const pgprot_t cont_prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> +       unsigned long next;
> +
> +       do {
> +               next = pte_cont_addr_end(addr, end);
> +
> +               /* try a contiguous mapping first */
> +               if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> +                   may_use_cont) {
> +                       alloc_init_pte(pmd, addr, next, phys, cont_prot,
> +                                      pgtable_alloc);
> +               } else {
> +                       alloc_init_pte(pmd, addr, next, phys, prot,
> +                                      pgtable_alloc);
> +               }
> +
> +               phys += next - addr;
> +       } while (addr = next, addr != end);
> +}
> +
>  static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>                                   phys_addr_t phys, pgprot_t prot,
>                                   phys_addr_t (*pgtable_alloc)(void),
> ---->8----
>
> It's more code, but it follows the existing pattern, and I personally
> find it easier to follow than changing prot within the loop.
>
> Note that I've cheated and made alloc_init_pte() take a phys_addr_t
> rather than a pfn, which I think we should do anyhow for consistency. I
> have a patch for that, if you agree.
>

Yes, absolutely. I did not spot this before you pointed it out, but it
looks a bit sloppy.

> Another think we *might* want to do here is pass a flags parameter that
> than separate page_mappings_only and mask_use_cont. That might also make
> things less painful in future if there are other things we need to pass
> down.
>

Yes, I already did that in a draft version, and then dropped it again.
I don't have a strong preference either way, so let me try that for
the next version

> That might also end up complicating matters, given the way we currently
> use debug_pagealloc_enabled() in __create_pgd_mapping() callers, so I'm
> also happy to leave that.
>
> [...]
>
>> @@ -173,7 +189,19 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>>               /* try section mapping first */
>>               if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
>>                     !page_mappings_only) {
>> -                     pmd_set_huge(pmd, phys, prot);
>> +                     /*
>> +                      * Set the contiguous bit for the subsequent group of
>> +                      * PMDs if its size and alignment are appropriate.
>> +                      */
>> +                     if (may_use_cont &&
>> +                         ((addr | phys) & ~CONT_PMD_MASK) == 0) {
>> +                             if (end - addr >= CONT_PMD_SIZE)
>> +                                     __prot = __pgprot(pgprot_val(prot) |
>> +                                                       PTE_CONT);
>> +                             else
>> +                                     __prot = prot;
>> +                     }
>> +                     pmd_set_huge(pmd, phys, __prot);
>
> As above, I think it would be better to have a alloc_init_cont_pmd()
> wrapper that iterated in CONT_PMD_SIZE chunks, so as to keep the usual
> pattern.
>
>> @@ -381,7 +418,8 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>        */
>>       __create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
>>                            kernel_end - kernel_start, PAGE_KERNEL,
>> -                          early_pgtable_alloc, debug_pagealloc_enabled());
>> +                          early_pgtable_alloc, debug_pagealloc_enabled(),
>> +                          false);
>>  }
>
>> @@ -437,7 +476,8 @@ static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end,
>>       BUG_ON(!PAGE_ALIGNED(size));
>>
>>       __create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
>> -                          early_pgtable_alloc, debug_pagealloc_enabled());
>> +                          early_pgtable_alloc, debug_pagealloc_enabled(),
>> +                          !debug_pagealloc_enabled() && may_use_cont);
>>
>>       vma->addr       = va_start;
>>       vma->phys_addr  = pa_start;
>> @@ -464,14 +504,14 @@ static void __init map_kernel(pgd_t *pgd)
>>
>>       pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
>>
>> -     map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
>> +     map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text, true);
>>       map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
>> -                        &vmlinux_rodata);
>> +                        &vmlinux_rodata, false);
>>       map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
>> -                        &vmlinux_inittext);
>> +                        &vmlinux_inittext, true);
>>       map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
>> -                        &vmlinux_initdata);
>> -     map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
>> +                        &vmlinux_initdata, true);
>> +     map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data, true);
>
> It might be worth a comment somewhere as to why we have to special case
> text, rodata, and the linear alias, but otherwise this looks fine.
>

Ack
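
Something along these lines, perhaps (my paraphrase of the rationale,
not final wording):

	/*
	 * Only use the contiguous bit for mappings whose permissions will
	 * never change after boot: .rodata and the linear alias of the
	 * kernel text are remapped read-only later, and changing the
	 * attributes of entries that carry PTE_CONT would mean splitting
	 * the contiguous range, which requires break-before-make. .text,
	 * in contrast, is mapped with its final read-only permissions from
	 * the outset, so it can keep the contiguous bit.
	 */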

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate
  2017-03-08 10:57       ` Ard Biesheuvel
@ 2017-03-08 11:22         ` Mark Rutland
  -1 siblings, 0 replies; 36+ messages in thread
From: Mark Rutland @ 2017-03-08 11:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, kernel-hardening, Marc Zyngier, Catalin Marinas,
	Will Deacon, Andre Przywara, kvmarm, linux-arm-kernel,
	Laura Abbott

On Wed, Mar 08, 2017 at 11:57:22AM +0100, Ard Biesheuvel wrote:
> On 7 March 2017 at 17:46, Mark Rutland <mark.rutland@arm.com> wrote:
> > Note that I've cheated and made alloc_init_pte() take a phys_addr_t
> > rather than a pfn, which I think we should do anyhow for consistency. I
> > have a patch for that, if you agree.
> 
> Yes, absolutely. I did not spot this before you pointed it out, but it
> looks a bit sloppy.

Patch below, based on patch 5 of this series.
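
For the avoidance of doubt, the conversion does not change the resulting
PTEs: a pfn is just the physical address shifted right by PAGE_SHIFT, so
stepping phys by PAGE_SIZE per iteration advances the pfn by exactly one,
the same as the old pfn++.

	/* the two forms write the same PTE, given phys == __pfn_to_phys(pfn) */
	set_pte(pte, pfn_pte(pfn, prot));			/* old: pfn++ per loop */
	set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));	/* new: phys += PAGE_SIZE per loop */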

Thanks,
Mark.

---->8----
From 31052898c92711c871ff68aabec01b8b2c415ec1 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Tue, 7 Mar 2017 15:30:13 +0000
Subject: [PATCH] arm64: mmu: unify alloc_init_p??() prototypes

Currently alloc_init_pte() accepts the physical address as a pfn, rather
than a phys_addr_t as all the other alloc_init_p??() functions do. This
also makes the structure of alloc_init_pte() unnecessarily different to
the other functions.

This patch updates alloc_init_pte() to take the physical address as a
phys_addr_t, following the same pattern as the other alloc_init_p??()
functions.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0612573..2aec93ab 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -107,7 +107,7 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 }
 
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
-				  unsigned long end, unsigned long pfn,
+				  unsigned long end, phys_addr_t phys,
 				  pgprot_t prot,
 				  phys_addr_t (*pgtable_alloc)(void))
 {
@@ -128,8 +128,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	do {
 		pte_t old_pte = *pte;
 
-		set_pte(pte, pfn_pte(pfn, prot));
-		pfn++;
+		set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
 
 		/*
 		 * After the PTE entry has been populated once, we
@@ -137,6 +136,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 		 */
 		BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
 
+		phys += PAGE_SIZE;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	pte_clear_fixmap();
@@ -182,7 +182,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 			BUG_ON(!pgattr_change_is_safe(pmd_val(old_pmd),
 						      pmd_val(*pmd)));
 		} else {
-			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
+			alloc_init_pte(pmd, addr, next, phys,
 				       prot, pgtable_alloc);
 
 			BUG_ON(pmd_val(old_pmd) != 0 &&
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-03-08 11:22 UTC | newest]

Thread overview: 36+ messages
2017-03-04 14:30 [PATCH v4 0/6] arm64: mmu: avoid W+X mappings and re-enable PTE_CONT for kernel Ard Biesheuvel
2017-03-04 14:30 ` [PATCH v4 1/6] arm: kvm: move kvm_vgic_global_state out of .text section Ard Biesheuvel
2017-03-04 14:30 ` [PATCH v4 2/6] arm64: mmu: move TLB maintenance from callers to create_mapping_late() Ard Biesheuvel
2017-03-04 14:30 ` [PATCH v4 3/6] arm64: alternatives: apply boot time fixups via the linear mapping Ard Biesheuvel
2017-03-04 14:30 ` [PATCH v4 4/6] arm64: mmu: map .text as read-only from the outset Ard Biesheuvel
2017-03-07 14:10   ` Mark Rutland
2017-03-04 14:30 ` [PATCH v4 5/6] arm64: mmu: apply strict permissions to .init.text and .init.data Ard Biesheuvel
2017-03-07 14:21   ` Mark Rutland
2017-03-04 14:30 ` [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate Ard Biesheuvel
2017-03-07 16:46   ` Mark Rutland
2017-03-08 10:57     ` Ard Biesheuvel
2017-03-08 11:22       ` Mark Rutland
