* [PATCHv5 0/7] Better page protections for arm64
@ 2014-11-18  0:54 Laura Abbott
  2014-11-18  0:54 ` [PATCHv5 1/7] arm64: Treat handle_arch_irq as a function pointer Laura Abbott
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

This is v5 of the series to add stricter page protections for arm64.
The goal is to have kernel text be RO/X and everything else be RW/NX.
I finally got my hands on a Juno board, so I was able to do more
testing with both 4K and 64K pages, although I still haven't tested
with EFI. This is based on 3.18-rc5.
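
As a quick (and deliberately destructive) sanity check, a throwaway
module along the lines of the sketch below can be used: with
CONFIG_DEBUG_RODATA enabled on top of this series, the write should
oops with a permission fault instead of silently corrupting text.
The module is only illustrative and not part of the series:

#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/sections.h>	/* _stext */

static int __init wx_check_init(void)
{
	/*
	 * Poke the first byte of kernel text. With the text mapped
	 * read-only this should fault; on an unprotected kernel it
	 * silently corrupts text.
	 */
	pr_info("writing to kernel text at %p\n", _stext);
	*(volatile char *)_stext = 0;
	return 0;
}
module_init(wx_check_init);

MODULE_LICENSE("GPL");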

Thanks,
Laura

Laura Abbott (7):
  arm64: Treat handle_arch_irq as a function pointer
  arm64: Switch to adrp for loading the stub vectors
  arm64: Move cpu_resume into the text section
  arm64: Move some head.text functions to executable section
  arm64: Factor out fixmap initialization from ioremap
  arm64: use fixmap for text patching when text is RO
  arm64: add better page protections to arm64

 arch/arm64/Kconfig.debug            |  23 ++
 arch/arm64/include/asm/cacheflush.h |   4 +
 arch/arm64/include/asm/fixmap.h     |   8 +-
 arch/arm64/include/asm/insn.h       |   2 +
 arch/arm64/include/asm/irq.h        |   1 -
 arch/arm64/kernel/entry.S           |   6 +-
 arch/arm64/kernel/head.S            | 409 +++++++++++++++++-----------------
 arch/arm64/kernel/insn.c            |  72 +++++-
 arch/arm64/kernel/irq.c             |   2 +
 arch/arm64/kernel/jump_label.c      |   2 +-
 arch/arm64/kernel/setup.c           |   1 +
 arch/arm64/kernel/sleep.S           |  29 +--
 arch/arm64/kernel/suspend.c         |   4 +-
 arch/arm64/kernel/vmlinux.lds.S     |  21 ++
 arch/arm64/mm/init.c                |   1 +
 arch/arm64/mm/ioremap.c             |  93 +-------
 arch/arm64/mm/mm.h                  |   2 +
 arch/arm64/mm/mmu.c                 | 429 ++++++++++++++++++++++++++++++++----
 18 files changed, 743 insertions(+), 366 deletions(-)

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


* [PATCHv5 1/7] arm64: Treat handle_arch_irq as a function pointer
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
@ 2014-11-18  0:54 ` Laura Abbott
  2014-11-18  0:55 ` [PATCHv5 2/7] arm64: Switch to adrp for loading the stub vectors Laura Abbott
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:54 UTC (permalink / raw)
  To: linux-arm-kernel

handle_arch_irq isn't actually text, it's just a function pointer.
It doesn't need to be stored in the text section, and doing so
causes problems if we ever want to make the kernel text read-only.
Declare handle_arch_irq as a proper function pointer stored in
the data section.
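
For context, the pointer is only ever written through set_handle_irq();
a rough sketch of how an interrupt controller driver hooks itself in is
below. The driver and handler names are made up, only the
set_handle_irq() call reflects the real API:

#include <linux/init.h>
#include <linux/ptrace.h>
#include <asm/irq.h>		/* set_handle_irq() */

static void example_irqchip_handle_irq(struct pt_regs *regs)
{
	/* ack the controller and dispatch the interrupt here */
}

static int __init example_irqchip_init(void)
{
	/*
	 * Install the root handler. The irq_handler macro in entry.S
	 * now loads this pointer from .data via adrp/ldr and branches
	 * to it, so nothing needs to live in the text section.
	 */
	set_handle_irq(example_irqchip_handle_irq);
	return 0;
}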

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
 arch/arm64/include/asm/irq.h | 1 -
 arch/arm64/kernel/entry.S    | 6 ++----
 arch/arm64/kernel/irq.c      | 2 ++
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index e1f7ecd..1eebf5b 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -3,7 +3,6 @@
 
 #include <asm-generic/irq.h>
 
-extern void (*handle_arch_irq)(struct pt_regs *);
 extern void migrate_irqs(void);
 extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
 
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 726b910..ee27f39 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -168,7 +168,8 @@ tsk	.req	x28		// current thread_info
  * Interrupt handling.
  */
 	.macro	irq_handler
-	ldr	x1, handle_arch_irq
+	adrp	x1, handle_arch_irq
+	ldr	x1, [x1, #:lo12:handle_arch_irq]
 	mov	x0, sp
 	blr	x1
 	.endm
@@ -695,6 +696,3 @@ ENTRY(sys_rt_sigreturn_wrapper)
 	mov	x0, sp
 	b	sys_rt_sigreturn
 ENDPROC(sys_rt_sigreturn_wrapper)
-
-ENTRY(handle_arch_irq)
-	.quad	0
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 071a6ec..240b75c 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -40,6 +40,8 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	return 0;
 }
 
+void (*handle_arch_irq)(struct pt_regs *) = NULL;
+
 void __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
 {
 	if (handle_arch_irq)
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


* [PATCHv5 2/7] arm64: Switch to adrp for loading the stub vectors
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
  2014-11-18  0:54 ` [PATCHv5 1/7] arm64: Treat handle_arch_irq as a function pointer Laura Abbott
@ 2014-11-18  0:55 ` Laura Abbott
  2014-11-18  0:55 ` [PATCHv5 3/7] arm64: Move cpu_resume into the text section Laura Abbott
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:55 UTC (permalink / raw)
  To: linux-arm-kernel

The hyp stub vectors are currently loaded using adr. This
instruction has a +/- 1MB range from the current PC. If
the alignment of sections is changed, the address may end up more
than 1MB away, resulting in relocation errors. Switch to using
adrp for getting the address to ensure we aren't affected by the
location of __hyp_stub_vectors.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
 arch/arm64/kernel/head.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 0a6e4f9..10f5cc0 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -331,7 +331,8 @@ CPU_LE(	movk	x0, #0x30d0, lsl #16	)	// Clear EE and E0E on LE systems
 	msr	vttbr_el2, xzr
 
 	/* Hypervisor stub */
-	adr	x0, __hyp_stub_vectors
+	adrp	x0, __hyp_stub_vectors
+	add	x0, x0, #:lo12:__hyp_stub_vectors
 	msr	vbar_el2, x0
 
 	/* spsr */
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


* [PATCHv5 3/7] arm64: Move cpu_resume into the text section
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
  2014-11-18  0:54 ` [PATCHv5 1/7] arm64: Treat handle_arch_irq as a function pointer Laura Abbott
  2014-11-18  0:55 ` [PATCHv5 2/7] arm64: Switch to adrp for loading the stub vectors Laura Abbott
@ 2014-11-18  0:55 ` Laura Abbott
  2014-11-18 10:35   ` Lorenzo Pieralisi
  2014-11-18 10:49   ` Mark Rutland
  2014-11-18  0:55 ` [PATCHv5 4/7] arm64: Move some head.text functions to executable section Laura Abbott
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:55 UTC (permalink / raw)
  To: linux-arm-kernel

The function cpu_resume currently lives in the .data section.
There's no reason for it to be there since we can use relative
instructions without a problem. Move a few cpu_resume data
structures out of the assembly file so the .data annotation
can be dropped completely and cpu_resume ends up in the read
only text section.

Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
v5: Dropped the .data annotation completely per suggestion of
Lorenzo. Dropped mpidr_hash_ptr per suggestion of Ard.
Moved sleep_save_sp and sleep_idmap_phys from assembly per
suggestion of Mark.

Please let me know if your Tested-by/Reviewed-by/Acked-by
still stands.
---
 arch/arm64/kernel/sleep.S   | 29 ++++++-----------------------
 arch/arm64/kernel/suspend.c |  4 ++--
 2 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index a564b44..3f836b2 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -147,14 +147,12 @@ cpu_resume_after_mmu:
 	ret
 ENDPROC(cpu_resume_after_mmu)
 
-	.data
 ENTRY(cpu_resume)
 	bl	el2_setup		// if in EL2 drop to EL1 cleanly
 #ifdef CONFIG_SMP
 	mrs	x1, mpidr_el1
-	adr	x4, mpidr_hash_ptr
-	ldr	x5, [x4]
-	add	x8, x4, x5		// x8 = struct mpidr_hash phys address
+	adrp	x8, mpidr_hash
+	add x8, x8, #:lo12:mpidr_hash // x8 = struct mpidr_hash phys address
         /* retrieve mpidr_hash members to compute the hash */
 	ldr	x2, [x8, #MPIDR_HASH_MASK]
 	ldp	w3, w4, [x8, #MPIDR_HASH_SHIFTS]
@@ -164,14 +162,15 @@ ENTRY(cpu_resume)
 #else
 	mov	x7, xzr
 #endif
-	adr	x0, sleep_save_sp
+	adrp	x0, sleep_save_sp
+	add	x0, x0, #:lo12:sleep_save_sp
 	ldr	x0, [x0, #SLEEP_SAVE_SP_PHYS]
 	ldr	x0, [x0, x7, lsl #3]
 	/* load sp from context */
 	ldr	x2, [x0, #CPU_CTX_SP]
-	adr	x1, sleep_idmap_phys
+	adrp	x1, sleep_idmap_phys
 	/* load physical address of identity map page table in x1 */
-	ldr	x1, [x1]
+	ldr	x1, [x1, #:lo12:sleep_idmap_phys]
 	mov	sp, x2
 	/*
 	 * cpu_do_resume expects x0 to contain context physical address
@@ -182,24 +181,8 @@ ENTRY(cpu_resume)
 ENDPROC(cpu_resume)
 
 	.align 3
-mpidr_hash_ptr:
 	/*
 	 * offset of mpidr_hash symbol from current location
 	 * used to obtain run-time mpidr_hash address with MMU off
          */
 	.quad	mpidr_hash - .
-/*
- * physical address of identity mapped page tables
- */
-	.type	sleep_idmap_phys, #object
-ENTRY(sleep_idmap_phys)
-	.quad	0
-/*
- * struct sleep_save_sp {
- *	phys_addr_t *save_ptr_stash;
- *	phys_addr_t save_ptr_stash_phys;
- * };
- */
-	.type	sleep_save_sp, #object
-ENTRY(sleep_save_sp)
-	.space	SLEEP_SAVE_SP_SZ	// struct sleep_save_sp
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 13ad4db..3771b72 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -126,8 +126,8 @@ int __cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 	return ret;
 }
 
-extern struct sleep_save_sp sleep_save_sp;
-extern phys_addr_t sleep_idmap_phys;
+struct sleep_save_sp sleep_save_sp;
+phys_addr_t sleep_idmap_phys;
 
 static int __init cpu_suspend_init(void)
 {
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


* [PATCHv5 4/7] arm64: Move some head.text functions to executable section
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
                   ` (2 preceding siblings ...)
  2014-11-18  0:55 ` [PATCHv5 3/7] arm64: Move cpu_resume into the text section Laura Abbott
@ 2014-11-18  0:55 ` Laura Abbott
  2014-11-18 11:41   ` Mark Rutland
  2014-11-18  0:55 ` [PATCHv5 5/7] arm64: Factor out fixmap initialization from ioremap Laura Abbott
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:55 UTC (permalink / raw)
  To: linux-arm-kernel

The head.text section is intended to be run at early bootup,
before any of the regular kernel mappings have been set up.
Parts of head.text may be freed back into the buddy allocator
due to TEXT_OFFSET, so for security this memory must not remain
executable. The suspend/resume/hotplug code paths, however, do
require some of these head.S functions to run, which means they
need to be executable. Support these conflicting requirements by
moving the few head.text functions that need to be executable to
the text section, which has the appropriate page table
permissions.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
v5: Went back to the v3 version which moved everything around instead
of adding annotations. Dropped the code duplication for head.text.
---
 arch/arm64/kernel/head.S        | 406 +++++++++++++++++++++-------------------
 arch/arm64/kernel/vmlinux.lds.S |   1 +
 2 files changed, 210 insertions(+), 197 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 10f5cc0..a41560b 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -238,7 +238,13 @@ ENTRY(stext)
 	mov	x0, x22
 	bl	lookup_processor_type
 	mov	x23, x0				// x23=current cpu_table
-	cbz	x23, __error_p			// invalid processor (x23=0)?
+	/*
+	 *	__error_p may end up out of range for cbz if text areas
+	 *	are aligned up to section sizes
+	 */
+	cbnz	x23, 1f				// invalid processor (x23=0)?
+	b	__error_p
+1:
 	bl	__vet_fdt
 	bl	__create_page_tables		// x25=TTBR0, x26=TTBR1
 	/*
@@ -250,13 +256,213 @@ ENTRY(stext)
 	 */
 	ldr	x27, __switch_data		// address to jump to after
 						// MMU has been enabled
-	adr	lr, __enable_mmu		// return (PIC) address
+	adrp	lr, __enable_mmu		// return (PIC) address
+	add	lr, lr, #:lo12:__enable_mmu
 	ldr	x12, [x23, #CPU_INFO_SETUP]
 	add	x12, x12, x28			// __virt_to_phys
 	br	x12				// initialise processor
 ENDPROC(stext)
 
 /*
+ * Determine validity of the x21 FDT pointer.
+ * The dtb must be 8-byte aligned and live in the first 512M of memory.
+ */
+__vet_fdt:
+	tst	x21, #0x7
+	b.ne	1f
+	cmp	x21, x24
+	b.lt	1f
+	mov	x0, #(1 << 29)
+	add	x0, x0, x24
+	cmp	x21, x0
+	b.ge	1f
+	ret
+1:
+	mov	x21, #0
+	ret
+ENDPROC(__vet_fdt)
+/*
+ * Macro to create a table entry to the next page.
+ *
+ *	tbl:	page table address
+ *	virt:	virtual address
+ *	shift:	#imm page table shift
+ *	ptrs:	#imm pointers per table page
+ *
+ * Preserves:	virt
+ * Corrupts:	tmp1, tmp2
+ * Returns:	tbl -> next level table page address
+ */
+	.macro	create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
+	lsr	\tmp1, \virt, #\shift
+	and	\tmp1, \tmp1, #\ptrs - 1	// table index
+	add	\tmp2, \tbl, #PAGE_SIZE
+	orr	\tmp2, \tmp2, #PMD_TYPE_TABLE	// address of next table and entry type
+	str	\tmp2, [\tbl, \tmp1, lsl #3]
+	add	\tbl, \tbl, #PAGE_SIZE		// next level table page
+	.endm
+
+/*
+ * Macro to populate the PGD (and possibily PUD) for the corresponding
+ * block entry in the next level (tbl) for the given virtual address.
+ *
+ * Preserves:	tbl, next, virt
+ * Corrupts:	tmp1, tmp2
+ */
+	.macro	create_pgd_entry, tbl, virt, tmp1, tmp2
+	create_table_entry \tbl, \virt, PGDIR_SHIFT, PTRS_PER_PGD, \tmp1, \tmp2
+#if SWAPPER_PGTABLE_LEVELS == 3
+	create_table_entry \tbl, \virt, TABLE_SHIFT, PTRS_PER_PTE, \tmp1, \tmp2
+#endif
+	.endm
+
+/*
+ * Macro to populate block entries in the page table for the start..end
+ * virtual range (inclusive).
+ *
+ * Preserves:	tbl, flags
+ * Corrupts:	phys, start, end, pstate
+ */
+	.macro	create_block_map, tbl, flags, phys, start, end
+	lsr	\phys, \phys, #BLOCK_SHIFT
+	lsr	\start, \start, #BLOCK_SHIFT
+	and	\start, \start, #PTRS_PER_PTE - 1	// table index
+	orr	\phys, \flags, \phys, lsl #BLOCK_SHIFT	// table entry
+	lsr	\end, \end, #BLOCK_SHIFT
+	and	\end, \end, #PTRS_PER_PTE - 1		// table end index
+9999:	str	\phys, [\tbl, \start, lsl #3]		// store the entry
+	add	\start, \start, #1			// next entry
+	add	\phys, \phys, #BLOCK_SIZE		// next block
+	cmp	\start, \end
+	b.ls	9999b
+	.endm
+
+/*
+ * Setup the initial page tables. We only setup the barest amount which is
+ * required to get the kernel running. The following sections are required:
+ *   - identity mapping to enable the MMU (low address, TTBR0)
+ *   - first few MB of the kernel linear mapping to jump to once the MMU has
+ *     been enabled, including the FDT blob (TTBR1)
+ *   - pgd entry for fixed mappings (TTBR1)
+ */
+__create_page_tables:
+	pgtbl	x25, x26, x28			// idmap_pg_dir and swapper_pg_dir addresses
+	mov	x27, lr
+
+	/*
+	 * Invalidate the idmap and swapper page tables to avoid potential
+	 * dirty cache lines being evicted.
+	 */
+	mov	x0, x25
+	add	x1, x26, #SWAPPER_DIR_SIZE
+	bl	__inval_cache_range
+
+	/*
+	 * Clear the idmap and swapper page tables.
+	 */
+	mov	x0, x25
+	add	x6, x26, #SWAPPER_DIR_SIZE
+1:	stp	xzr, xzr, [x0], #16
+	stp	xzr, xzr, [x0], #16
+	stp	xzr, xzr, [x0], #16
+	stp	xzr, xzr, [x0], #16
+	cmp	x0, x6
+	b.lo	1b
+
+	ldr	x7, =MM_MMUFLAGS
+
+	/*
+	 * Create the identity mapping.
+	 */
+	mov	x0, x25				// idmap_pg_dir
+	ldr	x3, =KERNEL_START
+	add	x3, x3, x28			// __pa(KERNEL_START)
+	create_pgd_entry x0, x3, x5, x6
+	ldr	x6, =KERNEL_END
+	mov	x5, x3				// __pa(KERNEL_START)
+	add	x6, x6, x28			// __pa(KERNEL_END)
+	create_block_map x0, x7, x3, x5, x6
+
+	/*
+	 * Map the kernel image (starting with PHYS_OFFSET).
+	 */
+	mov	x0, x26				// swapper_pg_dir
+	mov	x5, #PAGE_OFFSET
+	create_pgd_entry x0, x5, x3, x6
+	ldr	x6, =KERNEL_END
+	mov	x3, x24				// phys offset
+	create_block_map x0, x7, x3, x5, x6
+
+	/*
+	 * Map the FDT blob (maximum 2MB; must be within 512MB of
+	 * PHYS_OFFSET).
+	 */
+	mov	x3, x21				// FDT phys address
+	and	x3, x3, #~((1 << 21) - 1)	// 2MB aligned
+	mov	x6, #PAGE_OFFSET
+	sub	x5, x3, x24			// subtract PHYS_OFFSET
+	tst	x5, #~((1 << 29) - 1)		// within 512MB?
+	csel	x21, xzr, x21, ne		// zero the FDT pointer
+	b.ne	1f
+	add	x5, x5, x6			// __va(FDT blob)
+	add	x6, x5, #1 << 21		// 2MB for the FDT blob
+	sub	x6, x6, #1			// inclusive range
+	create_block_map x0, x7, x3, x5, x6
+1:
+	/*
+	 * Since the page tables have been populated with non-cacheable
+	 * accesses (MMU disabled), invalidate the idmap and swapper page
+	 * tables again to remove any speculatively loaded cache lines.
+	 */
+	mov	x0, x25
+	add	x1, x26, #SWAPPER_DIR_SIZE
+	bl	__inval_cache_range
+
+	mov	lr, x27
+	ret
+ENDPROC(__create_page_tables)
+	.ltorg
+
+	.align	3
+	.type	__switch_data, %object
+__switch_data:
+	.quad	__mmap_switched
+	.quad	__bss_start			// x6
+	.quad	__bss_stop			// x7
+	.quad	processor_id			// x4
+	.quad	__fdt_pointer			// x5
+	.quad	memstart_addr			// x6
+	.quad	init_thread_union + THREAD_START_SP // sp
+
+/*
+ * The following fragment of code is executed with the MMU on in MMU mode, and
+ * uses absolute addresses; this is not position independent.
+ */
+__mmap_switched:
+	adr	x3, __switch_data + 8
+
+	ldp	x6, x7, [x3], #16
+1:	cmp	x6, x7
+	b.hs	2f
+	str	xzr, [x6], #8			// Clear BSS
+	b	1b
+2:
+	ldp	x4, x5, [x3], #16
+	ldr	x6, [x3], #8
+	ldr	x16, [x3]
+	mov	sp, x16
+	str	x22, [x4]			// Save processor ID
+	str	x21, [x5]			// Save FDT pointer
+	str	x24, [x6]			// Save PHYS_OFFSET
+	mov	x29, #0
+	b	start_kernel
+ENDPROC(__mmap_switched)
+
+/*
+ * end 'true' head section, begin head section that can be read only
+ */
+	.section ".latehead.text","ax"
+/*
  * If we're fortunate enough to boot@EL2, ensure that the world is
  * sane before dropping to EL1.
  *
@@ -493,183 +699,6 @@ ENDPROC(__calc_phys_offset)
 	.quad	PAGE_OFFSET
 
 /*
- * Macro to create a table entry to the next page.
- *
- *	tbl:	page table address
- *	virt:	virtual address
- *	shift:	#imm page table shift
- *	ptrs:	#imm pointers per table page
- *
- * Preserves:	virt
- * Corrupts:	tmp1, tmp2
- * Returns:	tbl -> next level table page address
- */
-	.macro	create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
-	lsr	\tmp1, \virt, #\shift
-	and	\tmp1, \tmp1, #\ptrs - 1	// table index
-	add	\tmp2, \tbl, #PAGE_SIZE
-	orr	\tmp2, \tmp2, #PMD_TYPE_TABLE	// address of next table and entry type
-	str	\tmp2, [\tbl, \tmp1, lsl #3]
-	add	\tbl, \tbl, #PAGE_SIZE		// next level table page
-	.endm
-
-/*
- * Macro to populate the PGD (and possibily PUD) for the corresponding
- * block entry in the next level (tbl) for the given virtual address.
- *
- * Preserves:	tbl, next, virt
- * Corrupts:	tmp1, tmp2
- */
-	.macro	create_pgd_entry, tbl, virt, tmp1, tmp2
-	create_table_entry \tbl, \virt, PGDIR_SHIFT, PTRS_PER_PGD, \tmp1, \tmp2
-#if SWAPPER_PGTABLE_LEVELS == 3
-	create_table_entry \tbl, \virt, TABLE_SHIFT, PTRS_PER_PTE, \tmp1, \tmp2
-#endif
-	.endm
-
-/*
- * Macro to populate block entries in the page table for the start..end
- * virtual range (inclusive).
- *
- * Preserves:	tbl, flags
- * Corrupts:	phys, start, end, pstate
- */
-	.macro	create_block_map, tbl, flags, phys, start, end
-	lsr	\phys, \phys, #BLOCK_SHIFT
-	lsr	\start, \start, #BLOCK_SHIFT
-	and	\start, \start, #PTRS_PER_PTE - 1	// table index
-	orr	\phys, \flags, \phys, lsl #BLOCK_SHIFT	// table entry
-	lsr	\end, \end, #BLOCK_SHIFT
-	and	\end, \end, #PTRS_PER_PTE - 1		// table end index
-9999:	str	\phys, [\tbl, \start, lsl #3]		// store the entry
-	add	\start, \start, #1			// next entry
-	add	\phys, \phys, #BLOCK_SIZE		// next block
-	cmp	\start, \end
-	b.ls	9999b
-	.endm
-
-/*
- * Setup the initial page tables. We only setup the barest amount which is
- * required to get the kernel running. The following sections are required:
- *   - identity mapping to enable the MMU (low address, TTBR0)
- *   - first few MB of the kernel linear mapping to jump to once the MMU has
- *     been enabled, including the FDT blob (TTBR1)
- *   - pgd entry for fixed mappings (TTBR1)
- */
-__create_page_tables:
-	pgtbl	x25, x26, x28			// idmap_pg_dir and swapper_pg_dir addresses
-	mov	x27, lr
-
-	/*
-	 * Invalidate the idmap and swapper page tables to avoid potential
-	 * dirty cache lines being evicted.
-	 */
-	mov	x0, x25
-	add	x1, x26, #SWAPPER_DIR_SIZE
-	bl	__inval_cache_range
-
-	/*
-	 * Clear the idmap and swapper page tables.
-	 */
-	mov	x0, x25
-	add	x6, x26, #SWAPPER_DIR_SIZE
-1:	stp	xzr, xzr, [x0], #16
-	stp	xzr, xzr, [x0], #16
-	stp	xzr, xzr, [x0], #16
-	stp	xzr, xzr, [x0], #16
-	cmp	x0, x6
-	b.lo	1b
-
-	ldr	x7, =MM_MMUFLAGS
-
-	/*
-	 * Create the identity mapping.
-	 */
-	mov	x0, x25				// idmap_pg_dir
-	ldr	x3, =KERNEL_START
-	add	x3, x3, x28			// __pa(KERNEL_START)
-	create_pgd_entry x0, x3, x5, x6
-	ldr	x6, =KERNEL_END
-	mov	x5, x3				// __pa(KERNEL_START)
-	add	x6, x6, x28			// __pa(KERNEL_END)
-	create_block_map x0, x7, x3, x5, x6
-
-	/*
-	 * Map the kernel image (starting with PHYS_OFFSET).
-	 */
-	mov	x0, x26				// swapper_pg_dir
-	mov	x5, #PAGE_OFFSET
-	create_pgd_entry x0, x5, x3, x6
-	ldr	x6, =KERNEL_END
-	mov	x3, x24				// phys offset
-	create_block_map x0, x7, x3, x5, x6
-
-	/*
-	 * Map the FDT blob (maximum 2MB; must be within 512MB of
-	 * PHYS_OFFSET).
-	 */
-	mov	x3, x21				// FDT phys address
-	and	x3, x3, #~((1 << 21) - 1)	// 2MB aligned
-	mov	x6, #PAGE_OFFSET
-	sub	x5, x3, x24			// subtract PHYS_OFFSET
-	tst	x5, #~((1 << 29) - 1)		// within 512MB?
-	csel	x21, xzr, x21, ne		// zero the FDT pointer
-	b.ne	1f
-	add	x5, x5, x6			// __va(FDT blob)
-	add	x6, x5, #1 << 21		// 2MB for the FDT blob
-	sub	x6, x6, #1			// inclusive range
-	create_block_map x0, x7, x3, x5, x6
-1:
-	/*
-	 * Since the page tables have been populated with non-cacheable
-	 * accesses (MMU disabled), invalidate the idmap and swapper page
-	 * tables again to remove any speculatively loaded cache lines.
-	 */
-	mov	x0, x25
-	add	x1, x26, #SWAPPER_DIR_SIZE
-	bl	__inval_cache_range
-
-	mov	lr, x27
-	ret
-ENDPROC(__create_page_tables)
-	.ltorg
-
-	.align	3
-	.type	__switch_data, %object
-__switch_data:
-	.quad	__mmap_switched
-	.quad	__bss_start			// x6
-	.quad	__bss_stop			// x7
-	.quad	processor_id			// x4
-	.quad	__fdt_pointer			// x5
-	.quad	memstart_addr			// x6
-	.quad	init_thread_union + THREAD_START_SP // sp
-
-/*
- * The following fragment of code is executed with the MMU on in MMU mode, and
- * uses absolute addresses; this is not position independent.
- */
-__mmap_switched:
-	adr	x3, __switch_data + 8
-
-	ldp	x6, x7, [x3], #16
-1:	cmp	x6, x7
-	b.hs	2f
-	str	xzr, [x6], #8			// Clear BSS
-	b	1b
-2:
-	ldp	x4, x5, [x3], #16
-	ldr	x6, [x3], #8
-	ldr	x16, [x3]
-	mov	sp, x16
-	str	x22, [x4]			// Save processor ID
-	str	x21, [x5]			// Save FDT pointer
-	str	x24, [x6]			// Save PHYS_OFFSET
-	mov	x29, #0
-	b	start_kernel
-ENDPROC(__mmap_switched)
-
-/*
  * Exception handling. Something went wrong and we can't proceed. We ought to
  * tell the user, but since we don't have any guarantee that we're even
  * running on the right architecture, we do virtually nothing.
@@ -717,21 +746,4 @@ __lookup_processor_type_data:
 	.quad	cpu_table
 	.size	__lookup_processor_type_data, . - __lookup_processor_type_data
 
-/*
- * Determine validity of the x21 FDT pointer.
- * The dtb must be 8-byte aligned and live in the first 512M of memory.
- */
-__vet_fdt:
-	tst	x21, #0x7
-	b.ne	1f
-	cmp	x21, x24
-	b.lt	1f
-	mov	x0, #(1 << 29)
-	add	x0, x0, x24
-	cmp	x21, x0
-	b.ge	1f
-	ret
-1:
-	mov	x21, #0
-	ret
-ENDPROC(__vet_fdt)
+
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index edf8715..2d7e20a 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -56,6 +56,7 @@ SECTIONS
 	}
 	.text : {			/* Real text segment		*/
 		_stext = .;		/* Text and read-only data	*/
+			*(.latehead.text)
 			__exception_text_start = .;
 			*(.exception.text)
 			__exception_text_end = .;
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


* [PATCHv5 5/7] arm64: Factor out fixmap initialization from ioremap
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
                   ` (3 preceding siblings ...)
  2014-11-18  0:55 ` [PATCHv5 4/7] arm64: Move some head.text functions to executable section Laura Abbott
@ 2014-11-18  0:55 ` Laura Abbott
  2014-11-18  0:55 ` [PATCHv5 6/7] arm64: use fixmap for text patching when text is RO Laura Abbott
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:55 UTC (permalink / raw)
  To: linux-arm-kernel

The fixmap API was originally added to arm64 for
early_ioremap purposes. It can be used for other purposes too,
so move the initialization from ioremap to somewhere more
generic. This makes it obvious where the fixmap is being set
up and allows for a cleaner implementation of __set_fixmap.
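
For reference, with __set_fixmap() implemented for real, the generic
helpers from <asm-generic/fixmap.h> become usable for any fixed-slot
mapping, not just early ioremap. A minimal sketch, assuming a
hypothetical FIX_EXAMPLE slot has been added to enum fixed_addresses
(the next patch adds FIX_TEXT_POKE0 in exactly this way):

#include <linux/mm.h>
#include <asm/fixmap.h>

/* Map one physical page at the (hypothetical) FIX_EXAMPLE slot. */
static void *example_map(phys_addr_t phys)
{
	set_fixmap(FIX_EXAMPLE, phys & PAGE_MASK);
	return (void *)(fix_to_virt(FIX_EXAMPLE) + (phys & ~PAGE_MASK));
}

static void example_unmap(void)
{
	/* Clears the pte and flushes the TLB for that address. */
	clear_fixmap(FIX_EXAMPLE);
}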

Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
 arch/arm64/include/asm/fixmap.h |  7 +--
 arch/arm64/kernel/setup.c       |  1 +
 arch/arm64/mm/ioremap.c         | 93 ++--------------------------------------
 arch/arm64/mm/mmu.c             | 94 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 102 insertions(+), 93 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 5f7bfe6..db26a2f2 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -56,10 +56,11 @@ enum fixed_addresses {
 
 #define FIXMAP_PAGE_IO     __pgprot(PROT_DEVICE_nGnRE)
 
-extern void __early_set_fixmap(enum fixed_addresses idx,
-			       phys_addr_t phys, pgprot_t flags);
+void __init early_fixmap_init(void);
 
-#define __set_fixmap __early_set_fixmap
+#define __early_set_fixmap __set_fixmap
+
+extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
 
 #include <asm-generic/fixmap.h>
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 2437196..fb457cb 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -376,6 +376,7 @@ void __init setup_arch(char **cmdline_p)
 
 	*cmdline_p = boot_command_line;
 
+	early_fixmap_init();
 	early_ioremap_init();
 
 	parse_early_param();
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index 4a07630..cbb99c8 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -103,97 +103,10 @@ void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
 }
 EXPORT_SYMBOL(ioremap_cache);
 
-static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-#if CONFIG_ARM64_PGTABLE_LEVELS > 2
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-#endif
-#if CONFIG_ARM64_PGTABLE_LEVELS > 3
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
-#endif
-
-static inline pud_t * __init early_ioremap_pud(unsigned long addr)
-{
-	pgd_t *pgd;
-
-	pgd = pgd_offset_k(addr);
-	BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
-
-	return pud_offset(pgd, addr);
-}
-
-static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
-{
-	pud_t *pud = early_ioremap_pud(addr);
-
-	BUG_ON(pud_none(*pud) || pud_bad(*pud));
-
-	return pmd_offset(pud, addr);
-}
-
-static inline pte_t * __init early_ioremap_pte(unsigned long addr)
-{
-	pmd_t *pmd = early_ioremap_pmd(addr);
-
-	BUG_ON(pmd_none(*pmd) || pmd_bad(*pmd));
-
-	return pte_offset_kernel(pmd, addr);
-}
-
+/*
+ * Must be called after early_fixmap_init
+ */
 void __init early_ioremap_init(void)
 {
-	pgd_t *pgd;
-	pud_t *pud;
-	pmd_t *pmd;
-	unsigned long addr = fix_to_virt(FIX_BTMAP_BEGIN);
-
-	pgd = pgd_offset_k(addr);
-	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = pud_offset(pgd, addr);
-	pud_populate(&init_mm, pud, bm_pmd);
-	pmd = pmd_offset(pud, addr);
-	pmd_populate_kernel(&init_mm, pmd, bm_pte);
-
-	/*
-	 * The boot-ioremap range spans multiple pmds, for which
-	 * we are not prepared:
-	 */
-	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
-		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
-
-	if (pmd != early_ioremap_pmd(fix_to_virt(FIX_BTMAP_END))) {
-		WARN_ON(1);
-		pr_warn("pmd %p != %p\n",
-			pmd, early_ioremap_pmd(fix_to_virt(FIX_BTMAP_END)));
-		pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
-			fix_to_virt(FIX_BTMAP_BEGIN));
-		pr_warn("fix_to_virt(FIX_BTMAP_END):   %08lx\n",
-			fix_to_virt(FIX_BTMAP_END));
-
-		pr_warn("FIX_BTMAP_END:       %d\n", FIX_BTMAP_END);
-		pr_warn("FIX_BTMAP_BEGIN:     %d\n",
-			FIX_BTMAP_BEGIN);
-	}
-
 	early_ioremap_setup();
 }
-
-void __init __early_set_fixmap(enum fixed_addresses idx,
-			       phys_addr_t phys, pgprot_t flags)
-{
-	unsigned long addr = __fix_to_virt(idx);
-	pte_t *pte;
-
-	if (idx >= __end_of_fixed_addresses) {
-		BUG();
-		return;
-	}
-
-	pte = early_ioremap_pte(addr);
-
-	if (pgprot_val(flags))
-		set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, flags));
-	else {
-		pte_clear(&init_mm, addr, pte);
-		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
-	}
-}
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f4f8b50..6032f3e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -28,6 +28,7 @@
 #include <linux/io.h>
 
 #include <asm/cputype.h>
+#include <asm/fixmap.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/sizes.h>
@@ -463,3 +464,96 @@ void vmemmap_free(unsigned long start, unsigned long end)
 {
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
+
+static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
+#if CONFIG_ARM64_PGTABLE_LEVELS > 2
+static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
+#endif
+#if CONFIG_ARM64_PGTABLE_LEVELS > 3
+static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
+#endif
+
+static inline pud_t * fixmap_pud(unsigned long addr)
+{
+	pgd_t *pgd = pgd_offset_k(addr);
+
+	BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
+
+	return pud_offset(pgd, addr);
+}
+
+static inline pmd_t * fixmap_pmd(unsigned long addr)
+{
+	pud_t *pud = fixmap_pud(addr);
+
+	BUG_ON(pud_none(*pud) || pud_bad(*pud));
+
+	return pmd_offset(pud, addr);
+}
+
+static inline pte_t * fixmap_pte(unsigned long addr)
+{
+	pmd_t *pmd = fixmap_pmd(addr);
+
+	BUG_ON(pmd_none(*pmd) || pmd_bad(*pmd));
+
+	return pte_offset_kernel(pmd, addr);
+}
+
+void __init early_fixmap_init(void)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	unsigned long addr = FIXADDR_START;
+
+	pgd = pgd_offset_k(addr);
+	pgd_populate(&init_mm, pgd, bm_pud);
+	pud = pud_offset(pgd, addr);
+	pud_populate(&init_mm, pud, bm_pmd);
+	pmd = pmd_offset(pud, addr);
+	pmd_populate_kernel(&init_mm, pmd, bm_pte);
+
+	/*
+	 * The boot-ioremap range spans multiple pmds, for which
+	 * we are not prepared:
+	 */
+	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
+		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
+
+	if ((pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
+	     || pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
+		WARN_ON(1);
+		pr_warn("pmd %p != %p, %p\n",
+			pmd, fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)),
+			fixmap_pmd(fix_to_virt(FIX_BTMAP_END)));
+		pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
+			fix_to_virt(FIX_BTMAP_BEGIN));
+		pr_warn("fix_to_virt(FIX_BTMAP_END):   %08lx\n",
+			fix_to_virt(FIX_BTMAP_END));
+
+		pr_warn("FIX_BTMAP_END:       %d\n", FIX_BTMAP_END);
+		pr_warn("FIX_BTMAP_BEGIN:     %d\n", FIX_BTMAP_BEGIN);
+	}
+}
+
+void __set_fixmap(enum fixed_addresses idx,
+			       phys_addr_t phys, pgprot_t flags)
+{
+	unsigned long addr = __fix_to_virt(idx);
+	pte_t *pte;
+
+	if (idx >= __end_of_fixed_addresses) {
+		BUG();
+		return;
+	}
+
+	pte = fixmap_pte(addr);
+
+	if (pgprot_val(flags)) {
+		set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, flags));
+	} else {
+		pte_clear(&init_mm, addr, pte);
+		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
+	}
+}
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


* [PATCHv5 6/7] arm64: use fixmap for text patching when text is RO
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
                   ` (4 preceding siblings ...)
  2014-11-18  0:55 ` [PATCHv5 5/7] arm64: Factor out fixmap initialization from ioremap Laura Abbott
@ 2014-11-18  0:55 ` Laura Abbott
  2014-11-18  0:55 ` [PATCHv5 7/7] arm64: add better page protections to arm64 Laura Abbott
  2014-11-19 22:33 ` [PATCHv5 0/7] Better page protections for arm64 Kees Cook
  7 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:55 UTC (permalink / raw)
  To: linux-arm-kernel

When kernel text is marked as read-only, it cannot be modified directly.
Use a fixmap to modify the text instead, in a similar manner to
x86 and arm.
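
As a usage note (not part of the patch itself), callers keep going
through the insn API; with this change the write is transparently
redirected through the FIX_TEXT_POKE0 alias when the text mapping is
read-only. A small sketch, where the target address is just a
placeholder:

#include <linux/types.h>
#include <asm/insn.h>

/* Replace the instruction at a kernel text address with a NOP. */
static int example_patch_nop(void *addr)
{
	u32 nop = aarch64_insn_gen_nop();

	/*
	 * aarch64_insn_patch_text_nosync() validates alignment, writes
	 * the instruction (via the fixmap alias if the text is RO) and
	 * flushes the icache for the patched word.
	 */
	return aarch64_insn_patch_text_nosync(addr, nop);
}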

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
 arch/arm64/include/asm/fixmap.h |  1 +
 arch/arm64/include/asm/insn.h   |  2 ++
 arch/arm64/kernel/insn.c        | 72 +++++++++++++++++++++++++++++++++++++++--
 arch/arm64/kernel/jump_label.c  |  2 +-
 4 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index db26a2f2..2cd4b0d 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -48,6 +48,7 @@ enum fixed_addresses {
 
 	FIX_BTMAP_END = __end_of_permanent_fixed_addresses,
 	FIX_BTMAP_BEGIN = FIX_BTMAP_END + TOTAL_FIX_BTMAPS - 1,
+	FIX_TEXT_POKE0,
 	__end_of_fixed_addresses
 };
 
diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 56a9e63..f66853b 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -282,6 +282,7 @@ bool aarch64_insn_is_nop(u32 insn);
 
 int aarch64_insn_read(void *addr, u32 *insnp);
 int aarch64_insn_write(void *addr, u32 insn);
+int aarch64_insn_write_early(void *addr, u32 insn);
 enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
 u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
 				  u32 insn, u64 imm);
@@ -352,6 +353,7 @@ u32 aarch64_insn_gen_logical_shifted_reg(enum aarch64_insn_register dst,
 bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
 
 int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
+int __aarch64_insn_patch_text_nosync(void *addr, u32 insn, bool early);
 int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 8cd27fe..b2cad38 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -19,12 +19,15 @@
 #include <linux/bitops.h>
 #include <linux/compiler.h>
 #include <linux/kernel.h>
+#include <linux/mm.h>
 #include <linux/smp.h>
+#include <linux/spinlock.h>
 #include <linux/stop_machine.h>
 #include <linux/uaccess.h>
 
 #include <asm/cacheflush.h>
 #include <asm/debug-monitors.h>
+#include <asm/fixmap.h>
 #include <asm/insn.h>
 
 #define AARCH64_INSN_SF_BIT	BIT(31)
@@ -72,6 +75,36 @@ bool __kprobes aarch64_insn_is_nop(u32 insn)
 	}
 }
 
+static DEFINE_SPINLOCK(patch_lock);
+
+static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags)
+{
+	unsigned long uintaddr = (uintptr_t) addr;
+	bool module = !core_kernel_text(uintaddr);
+	struct page *page;
+
+	if (module && IS_ENABLED(CONFIG_DEBUG_SET_MODULE_RONX))
+		page = vmalloc_to_page(addr);
+	else if (!module && IS_ENABLED(CONFIG_DEBUG_RODATA))
+		page = virt_to_page(addr);
+	else
+		return addr;
+
+	if (flags)
+		spin_lock_irqsave(&patch_lock, *flags);
+
+	set_fixmap(fixmap, page_to_phys(page));
+
+	return (void *) (__fix_to_virt(fixmap) + (uintaddr & ~PAGE_MASK));
+}
+
+static void __kprobes patch_unmap(int fixmap, unsigned long *flags)
+{
+	clear_fixmap(fixmap);
+
+	if (flags)
+		spin_unlock_irqrestore(&patch_lock, *flags);
+}
 /*
  * In ARMv8-A, A64 instructions have a fixed length of 32 bits and are always
  * little-endian.
@@ -88,10 +121,34 @@ int __kprobes aarch64_insn_read(void *addr, u32 *insnp)
 	return ret;
 }
 
+static int __kprobes __aarch64_insn_write(void *addr, u32 insn, bool patch)
+{
+	void *waddr = addr;
+	unsigned long flags;
+	int ret;
+
+	if (patch)
+		waddr = patch_map(addr, FIX_TEXT_POKE0, &flags);
+
+	ret = probe_kernel_write(waddr, &insn, AARCH64_INSN_SIZE);
+
+	if (waddr != addr)
+		patch_unmap(FIX_TEXT_POKE0, &flags);
+
+	return ret;
+}
+
 int __kprobes aarch64_insn_write(void *addr, u32 insn)
 {
 	insn = cpu_to_le32(insn);
-	return probe_kernel_write(addr, &insn, AARCH64_INSN_SIZE);
+	return __aarch64_insn_write(addr, insn, true);
+}
+
+int __kprobes aarch64_insn_write_early(void *addr, u32 insn)
+{
+	insn = cpu_to_le32(insn);
+	return __aarch64_insn_write(addr, insn, false);
+
 }
 
 static bool __kprobes __aarch64_insn_hotpatch_safe(u32 insn)
@@ -124,7 +181,7 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
 	       __aarch64_insn_hotpatch_safe(new_insn);
 }
 
-int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
+int __kprobes __aarch64_insn_patch_text_nosync(void *addr, u32 insn, bool early)
 {
 	u32 *tp = addr;
 	int ret;
@@ -133,7 +190,11 @@ int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
 	if ((uintptr_t)tp & 0x3)
 		return -EINVAL;
 
-	ret = aarch64_insn_write(tp, insn);
+	if (early)
+		ret = aarch64_insn_write_early(tp, insn);
+	else
+		ret = aarch64_insn_write(tp, insn);
+
 	if (ret == 0)
 		flush_icache_range((uintptr_t)tp,
 				   (uintptr_t)tp + AARCH64_INSN_SIZE);
@@ -141,6 +202,11 @@ int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
 	return ret;
 }
 
+int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
+{
+	return __aarch64_insn_patch_text_nosync(addr, insn, false);
+}
+
 struct aarch64_insn_patch {
 	void		**text_addrs;
 	u32		*new_insns;
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
index 263a166..9ac30bb 100644
--- a/arch/arm64/kernel/jump_label.c
+++ b/arch/arm64/kernel/jump_label.c
@@ -38,7 +38,7 @@ static void __arch_jump_label_transform(struct jump_entry *entry,
 	}
 
 	if (is_static)
-		aarch64_insn_patch_text_nosync(addr, insn);
+		__aarch64_insn_patch_text_nosync(addr, insn, true);
 	else
 		aarch64_insn_patch_text(&addr, &insn, 1);
 }
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
                   ` (5 preceding siblings ...)
  2014-11-18  0:55 ` [PATCHv5 6/7] arm64: use fixmap for text patching when text is RO Laura Abbott
@ 2014-11-18  0:55 ` Laura Abbott
  2014-11-19 16:31   ` Mark Rutland
  2014-11-20 12:04   ` Steve Capper
  2014-11-19 22:33 ` [PATCHv5 0/7] Better page protections for arm64 Kees Cook
  7 siblings, 2 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18  0:55 UTC (permalink / raw)
  To: linux-arm-kernel

Add page protections for arm64 similar to those in arm.
This is for security reasons to prevent certain classes
of exploits. The current method:

- Map all memory as either RWX or RW. We round to the nearest
  section to avoid creating page tables before everything is mapped
  (see the rounding sketch below)
- Once everything is mapped, if either end of the RWX region should
  not be X, we split the PMD and remap as necessary
- When initmem is to be freed, we change its permissions back to
  RW (using stop_machine if necessary to flush the TLB)
- If CONFIG_DEBUG_RODATA is set, the read-only sections are actually
  marked read-only.
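
To make the rounding in the first step concrete, here is a worked
sketch of the arithmetic with made-up addresses (4K pages, so
SECTION_SIZE is 2MB); the addresses are hypothetical, only the
round_down()/round_up() usage mirrors the patch:

#include <linux/kernel.h>	/* round_down() / round_up() */

#define EXAMPLE_SECTION_SIZE	0x200000UL	/* 2MB sections, 4K pages */

static void example_rwx_bounds(void)
{
	/* Hypothetical physical addresses, not from a real boot: */
	unsigned long pa_stext    = 0x40081000UL;	/* __pa(_stext)     */
	unsigned long pa_init_end = 0x40a45000UL;	/* __pa(__init_end) */

	unsigned long kernel_x_start = round_down(pa_stext, EXAMPLE_SECTION_SIZE);
	unsigned long kernel_x_end   = round_up(pa_init_end, EXAMPLE_SECTION_SIZE);

	/*
	 * kernel_x_start = 0x40000000, kernel_x_end = 0x40c00000.
	 * __map_memblock() maps this whole range RWX using section
	 * mappings; once everything is mapped, fixup_executable()
	 * remaps the sub-section leftovers at each end
	 * (0x40000000-0x40081000 and 0x40a45000-0x40c00000) as
	 * non-executable page mappings.
	 */
}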

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
v5:
 -Dropped map_io from the alloc_init_* functions
 -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
 -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
  that the section mapping is actually kept and we don't leak any RWX regions
 -Dropped some old ioremap code that somehow snuck in from the last rebase
 -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
  called early
---
 arch/arm64/Kconfig.debug            |  23 +++
 arch/arm64/include/asm/cacheflush.h |   4 +
 arch/arm64/kernel/vmlinux.lds.S     |  20 +++
 arch/arm64/mm/init.c                |   1 +
 arch/arm64/mm/mm.h                  |   2 +
 arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
 6 files changed, 298 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
index 0a12933..867fe6f1 100644
--- a/arch/arm64/Kconfig.debug
+++ b/arch/arm64/Kconfig.debug
@@ -54,4 +54,27 @@ config DEBUG_SET_MODULE_RONX
           against certain classes of kernel exploits.
           If in doubt, say "N".
 
+config DEBUG_RODATA
+	bool "Make kernel text and rodata read-only"
+	help
+	  If this is set, kernel text and rodata will be made read-only. This
+	  is to help catch accidental or malicious attempts to change the
+	  kernel's executable code. Additionally splits rodata from kernel
+	  text so it can be made explicitly non-executable.
+
+	  If in doubt, say Y
+
+config DEBUG_ALIGN_RODATA
+	depends on DEBUG_RODATA && !ARM64_64K_PAGES
+	bool "Align linker sections up to SECTION_SIZE"
+	help
+	  If this option is enabled, sections that may potentially be marked as
+	  read only or non-executable will be aligned up to the section size of
+	  the kernel. This prevents sections from being split into pages and
+	  avoids a potential TLB penalty. The downside is an increase in
+	  alignment and potentially wasted space. Turn on this option if
+	  performance is more important than memory pressure.
+
+	  If in doubt, say N
+
 endmenu
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 689b637..0014207 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -152,4 +152,8 @@ int set_memory_ro(unsigned long addr, int numpages);
 int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_x(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
+
+#ifdef CONFIG_DEBUG_RODATA
+void mark_rodata_ro(void);
+#endif
 #endif
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 2d7e20a..405820d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -8,6 +8,7 @@
 #include <asm/thread_info.h>
 #include <asm/memory.h>
 #include <asm/page.h>
+#include <asm/pgtable.h>
 
 #include "image.h"
 
@@ -54,6 +55,9 @@ SECTIONS
 		_text = .;
 		HEAD_TEXT
 	}
+#ifdef CONFIG_DEBUG_ALIGN_RODATA
+	. = ALIGN(1<<SECTION_SHIFT);
+#endif
 	.text : {			/* Real text segment		*/
 		_stext = .;		/* Text and read-only data	*/
 			*(.latehead.text)
@@ -71,19 +75,35 @@ SECTIONS
 		*(.got)			/* Global offset table		*/
 	}
 
+#ifdef CONFIG_DEBUG_ALIGN_RODATA
+	. = ALIGN(1<<SECTION_SHIFT);
+#endif
 	RO_DATA(PAGE_SIZE)
 	EXCEPTION_TABLE(8)
 	NOTES
+#ifdef CONFIG_DEBUG_ALIGN_RODATA
+	. = ALIGN(1<<SECTION_SHIFT);
+#endif
 	_etext = .;			/* End of text and rodata section */
 
+#ifdef CONFIG_DEBUG_ALIGN_RODATA
+	. = ALIGN(1<<SECTION_SHIFT);
+#else
 	. = ALIGN(PAGE_SIZE);
+#endif
 	__init_begin = .;
 
 	INIT_TEXT_SECTION(8)
 	.exit.text : {
 		ARM_EXIT_KEEP(EXIT_TEXT)
 	}
+
+#ifdef CONFIG_DEBUG_ALIGN_RODATA
+	. = ALIGN(1<<SECTION_SHIFT);
+	__init_data_begin = .;
+#else
 	. = ALIGN(16);
+#endif
 	.init.data : {
 		INIT_DATA
 		INIT_SETUP(16)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 494297c..61f44c7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -324,6 +324,7 @@ void __init mem_init(void)
 
 void free_initmem(void)
 {
+	fixup_init();
 	free_initmem_default(0);
 }
 
diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
index d519f4f..82347d8 100644
--- a/arch/arm64/mm/mm.h
+++ b/arch/arm64/mm/mm.h
@@ -1,2 +1,4 @@
 extern void __init bootmem_init(void);
 extern void __init arm64_swiotlb_init(void);
+
+void fixup_init(void);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6032f3e..dafde8e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -26,6 +26,7 @@
 #include <linux/memblock.h>
 #include <linux/fs.h>
 #include <linux/io.h>
+#include <linux/stop_machine.h>
 
 #include <asm/cputype.h>
 #include <asm/fixmap.h>
@@ -137,17 +138,55 @@ static void __init *early_alloc(unsigned long sz)
 	return ptr;
 }
 
-static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
+/*
+ * remap a PMD into pages
+ */
+static noinline void __ref split_pmd(pmd_t *pmd, bool early)
+{
+	pte_t *pte, *start_pte;
+	unsigned long pfn;
+	int i = 0;
+
+	if (early)
+		start_pte = pte = early_alloc(PTRS_PER_PTE*sizeof(pte_t));
+	else
+		start_pte = pte = (pte_t *)__get_free_page(PGALLOC_GFP);
+
+	BUG_ON(!pte);
+
+	pfn = pmd_pfn(*pmd);
+
+	do {
+		/*
+		 * Need to have the least restrictive permissions available;
+		 * permissions will be fixed up later
+		 */
+		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
+		pfn++;
+	} while (pte++, i++, i < PTRS_PER_PTE);
+
+
+	__pmd_populate(pmd, __pa(start_pte), PMD_TYPE_TABLE);
+	flush_tlb_all();
+}
+
+static void __ref alloc_init_pte(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, unsigned long pfn,
-				  pgprot_t prot)
+				  pgprot_t prot, bool early)
 {
 	pte_t *pte;
 
 	if (pmd_none(*pmd)) {
-		pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
+		if (early)
+			pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
+		else
+			pte = (pte_t *)__get_free_page(PGALLOC_GFP);
+		BUG_ON(!pte);
 		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
 	}
-	BUG_ON(pmd_bad(*pmd));
+
+	if (pmd_bad(*pmd))
+		split_pmd(pmd, early);
 
 	pte = pte_offset_kernel(pmd, addr);
 	do {
@@ -156,29 +195,44 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }
 
-static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
+void __ref split_pud(pud_t *old_pud, pmd_t *pmd)
+{
+	unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
+	pgprot_t prot = __pgprot(pud_val(*old_pud) ^ pud_pfn(*old_pud));
+	int i = 0;
+
+	do {
+		set_pmd(pmd, __pmd(addr | prot));
+		addr += PMD_SIZE;
+	} while (pmd++, i++, i < PTRS_PER_PMD);
+}
+
+static void __ref alloc_init_pmd(pud_t *pud, unsigned long addr,
 				  unsigned long end, phys_addr_t phys,
-				  int map_io)
+				  pgprot_t sect_prot, pgprot_t pte_prot,
+				  bool early)
 {
 	pmd_t *pmd;
 	unsigned long next;
-	pmdval_t prot_sect;
-	pgprot_t prot_pte;
-
-	if (map_io) {
-		prot_sect = PROT_SECT_DEVICE_nGnRE;
-		prot_pte = __pgprot(PROT_DEVICE_nGnRE);
-	} else {
-		prot_sect = PROT_SECT_NORMAL_EXEC;
-		prot_pte = PAGE_KERNEL_EXEC;
-	}
 
 	/*
 	 * Check for initial section mappings in the pgd/pud and remove them.
 	 */
 	if (pud_none(*pud) || pud_bad(*pud)) {
-		pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
+		if (early)
+			pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
+		else
+			pmd = pmd_alloc_one(&init_mm, addr);
+		BUG_ON(!pmd);
+		if (pud_sect(*pud)) {
+			/*
+			 * need to keep the existing 1G of mappings present
+			 * while the pud is being split into pmds
+			 */
+			split_pud(pud, pmd);
+		}
 		pud_populate(&init_mm, pud, pmd);
+		flush_tlb_all();
 	}
 
 	pmd = pmd_offset(pud, addr);
@@ -186,8 +240,8 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
 		next = pmd_addr_end(addr, end);
 		/* try section mapping first */
 		if (((addr | next | phys) & ~SECTION_MASK) == 0) {
-			pmd_t old_pmd =*pmd;
-			set_pmd(pmd, __pmd(phys | prot_sect));
+			pmd_t old_pmd = *pmd;
+			set_pmd(pmd, __pmd(phys | sect_prot));
 			/*
 			 * Check for previous table entries created during
 			 * boot (__create_page_tables) and flush them.
@@ -196,15 +250,39 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
 				flush_tlb_all();
 		} else {
 			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
-				       prot_pte);
+					pte_prot, early);
 		}
 		phys += next - addr;
 	} while (pmd++, addr = next, addr != end);
 }
 
-static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
-				  unsigned long end, phys_addr_t phys,
-				  int map_io)
+static inline bool use_1G_block(unsigned long addr, unsigned long next,
+			unsigned long phys, pgprot_t sect_prot,
+			pgprot_t pte_prot, bool early)
+{
+	if (!early)
+		return false;
+
+	if (PAGE_SHIFT != 12)
+		return false;
+
+	if (((addr | next | phys) & ~PUD_MASK) != 0)
+		return false;
+
+	/*
+	 * The assumption here is that if the memory is anything other
+	 * than normal we should not be using a block type
+	 */
+	return ((sect_prot & PMD_ATTRINDX_MASK) ==
+				PMD_ATTRINDX(MT_NORMAL)) &&
+			((pte_prot & PTE_ATTRINDX_MASK) ==
+				PTE_ATTRINDX(MT_NORMAL));
+}
+
+static void __ref alloc_init_pud(pgd_t *pgd, unsigned long addr,
+				  unsigned long end, unsigned long phys,
+				  pgprot_t sect_prot, pgprot_t pte_prot,
+				  bool early)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -222,10 +300,9 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
 		/*
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
-		if (!map_io && (PAGE_SHIFT == 12) &&
-		    ((addr | next | phys) & ~PUD_MASK) == 0) {
+		if (use_1G_block(addr, next, phys, sect_prot, pte_prot, early)) {
 			pud_t old_pud = *pud;
-			set_pud(pud, __pud(phys | PROT_SECT_NORMAL_EXEC));
+			set_pud(pud, __pud(phys | sect_prot));
 
 			/*
 			 * If we have an old value for a pud, it will
@@ -240,7 +317,8 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
 				flush_tlb_all();
 			}
 		} else {
-			alloc_init_pmd(pud, addr, next, phys, map_io);
+			alloc_init_pmd(pud, addr, next, phys, sect_prot,
+					pte_prot, early);
 		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
@@ -250,9 +328,11 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
  * Create the page directory entries and any necessary page tables for the
  * mapping specified by 'md'.
  */
-static void __init __create_mapping(pgd_t *pgd, phys_addr_t phys,
-				    unsigned long virt, phys_addr_t size,
-				    int map_io)
+static void __ref __create_mapping(pgd_t *pgd, phys_addr_t phys,
+				  unsigned long virt,
+				  phys_addr_t size,
+				  pgprot_t sect_prot, pgprot_t pte_prot,
+				  bool early)
 {
 	unsigned long addr, length, end, next;
 
@@ -262,31 +342,102 @@ static void __init __create_mapping(pgd_t *pgd, phys_addr_t phys,
 	end = addr + length;
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_pud(pgd, addr, next, phys, map_io);
+		alloc_init_pud(pgd, addr, next, phys, sect_prot, pte_prot,
+				early);
 		phys += next - addr;
 	} while (pgd++, addr = next, addr != end);
 }
 
-static void __init create_mapping(phys_addr_t phys, unsigned long virt,
-				  phys_addr_t size)
+void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
+{
+	pgprot_t sect_prot = PROT_SECT_NORMAL_EXEC;
+	pgprot_t pte_prot = PAGE_KERNEL_EXEC;
+
+	if ((addr >> PGDIR_SHIFT) >= ARRAY_SIZE(idmap_pg_dir)) {
+		pr_warn("BUG: not creating id mapping for %pa\n", &addr);
+		return;
+	}
+
+	if (map_io) {
+		sect_prot = PROT_SECT_DEVICE_nGnRE;
+		pte_prot = __pgprot(PROT_DEVICE_nGnRE);
+	}
+
+	__create_mapping(&idmap_pg_dir[pgd_index(addr)],
+			 addr, addr, size, sect_prot, pte_prot, true);
+}
+
+static void __ref create_mapping(phys_addr_t phys, unsigned long virt,
+				  phys_addr_t size,
+				  pgprot_t sect_prot, pgprot_t pte_prot)
 {
 	if (virt < VMALLOC_START) {
 		pr_warn("BUG: not creating mapping for %pa@0x%016lx - outside kernel range\n",
 			&phys, virt);
 		return;
 	}
-	__create_mapping(pgd_offset_k(virt & PAGE_MASK), phys, virt, size, 0);
+
+	return __create_mapping(pgd_offset_k(virt & PAGE_MASK), phys, virt,
+				size, sect_prot, pte_prot, true);
 }
 
-void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
+static void __ref create_mapping_late(phys_addr_t phys, unsigned long virt,
+				  phys_addr_t size,
+				  pgprot_t sect_prot, pgprot_t pte_prot)
 {
-	if ((addr >> PGDIR_SHIFT) >= ARRAY_SIZE(idmap_pg_dir)) {
-		pr_warn("BUG: not creating id mapping for %pa\n", &addr);
+	if (virt < VMALLOC_START) {
+		pr_warn("BUG: not creating mapping for %pa@0x%016lx - outside kernel range\n",
+			&phys, virt);
 		return;
 	}
-	__create_mapping(&idmap_pg_dir[pgd_index(addr)],
-			 addr, addr, size, map_io);
+
+	return __create_mapping(pgd_offset_k(virt & PAGE_MASK), phys, virt,
+				size, sect_prot, pte_prot, false);
+}
+
+#ifdef CONFIG_DEBUG_RODATA
+static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
+{
+	/*
+	 * Set up the executable regions using the existing section mappings
+	 * for now. This will get more fine grained later once all memory
+	 * is mapped
+	 */
+	unsigned long kernel_x_start = round_down(__pa(_stext), SECTION_SIZE);
+	unsigned long kernel_x_end = round_up(__pa(__init_end), SECTION_SIZE);
+
+	if (end < kernel_x_start) {
+		create_mapping(start, __phys_to_virt(start),
+			end - start, PROT_SECT_NORMAL, PAGE_KERNEL);
+	} else if (start >= kernel_x_end) {
+		create_mapping(start, __phys_to_virt(start),
+			end - start, PROT_SECT_NORMAL, PAGE_KERNEL);
+	} else {
+		if (start < kernel_x_start)
+			create_mapping(start, __phys_to_virt(start),
+				kernel_x_start - start,
+				PROT_SECT_NORMAL,
+				PAGE_KERNEL);
+		create_mapping(kernel_x_start,
+				__phys_to_virt(kernel_x_start),
+				kernel_x_end - kernel_x_start,
+				PROT_SECT_NORMAL_EXEC, PAGE_KERNEL_EXEC);
+		if (kernel_x_end < end)
+			create_mapping(kernel_x_end,
+				__phys_to_virt(kernel_x_end),
+				end - kernel_x_end,
+				PROT_SECT_NORMAL,
+				PAGE_KERNEL);
+	}
+
 }
+#else
+static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
+{
+	create_mapping(start, __phys_to_virt(start), end - start,
+			PROT_SECT_NORMAL_EXEC, PAGE_KERNEL_EXEC);
+}
+#endif
 
 static void __init map_mem(void)
 {
@@ -332,14 +483,69 @@ static void __init map_mem(void)
 			memblock_set_current_limit(limit);
 		}
 #endif
-
-		create_mapping(start, __phys_to_virt(start), end - start);
+		__map_memblock(start, end);
 	}
 
 	/* Limit no longer required. */
 	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
 }
 
+void __init fixup_executable(void)
+{
+#ifdef CONFIG_DEBUG_RODATA
+	/* now that we are actually fully mapped, make the start/end more fine grained */
+	if (!IS_ALIGNED((unsigned long)_stext, SECTION_SIZE)) {
+		unsigned long aligned_start = round_down(__pa(_stext),
+							SECTION_SIZE);
+
+		create_mapping(aligned_start, __phys_to_virt(aligned_start),
+				__pa(_stext) - aligned_start,
+				PROT_SECT_NORMAL,
+				PAGE_KERNEL);
+	}
+
+	if (!IS_ALIGNED((unsigned long)__init_end, SECTION_SIZE)) {
+		unsigned long aligned_end = round_up(__pa(__init_end),
+							SECTION_SIZE);
+		create_mapping(__pa(__init_end), (unsigned long)__init_end,
+				aligned_end - __pa(__init_end),
+				PROT_SECT_NORMAL,
+				PAGE_KERNEL);
+	}
+#endif
+}
+
+#ifdef CONFIG_DEBUG_RODATA
+void mark_rodata_ro(void)
+{
+	create_mapping_late(__pa(_stext), (unsigned long)_stext,
+				(unsigned long)_etext - (unsigned long)_stext,
+				PROT_SECT_NORMAL_EXEC | PMD_SECT_RDONLY,
+				PAGE_KERNEL_EXEC | PTE_RDONLY);
+
+}
+#endif
+
+static int __flush_mappings(void *unused)
+{
+	flush_tlb_kernel_range((unsigned long)__init_begin,
+				(unsigned long)__init_end);
+	return 0;
+}
+
+void __ref fixup_init(void)
+{
+	phys_addr_t start = __pa(__init_begin);
+	phys_addr_t end = __pa(__init_end);
+
+	create_mapping_late(start, (unsigned long)__init_begin,
+			end - start,
+			PROT_SECT_NORMAL,
+			PAGE_KERNEL);
+	if (!IS_ALIGNED(start, SECTION_SIZE) || !IS_ALIGNED(end, SECTION_SIZE))
+		stop_machine(__flush_mappings, NULL, NULL);
+}
+
 /*
  * paging_init() sets up the page tables, initialises the zone memory
  * maps and sets up the zero page.
@@ -349,6 +555,7 @@ void __init paging_init(void)
 	void *zero_page;
 
 	map_mem();
+	fixup_executable();
 
 	/*
 	 * Finally flush the caches and tlb to ensure that we're in a
-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv5 3/7] arm64: Move cpu_resume into the text section
  2014-11-18  0:55 ` [PATCHv5 3/7] arm64: Move cpu_resume into the text section Laura Abbott
@ 2014-11-18 10:35   ` Lorenzo Pieralisi
  2014-11-18 10:49   ` Mark Rutland
  1 sibling, 0 replies; 24+ messages in thread
From: Lorenzo Pieralisi @ 2014-11-18 10:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 18, 2014 at 12:55:01AM +0000, Laura Abbott wrote:
> The function cpu_resume currently lives in the .data section.
> There's no reason for it to be there since we can use relative
> instructions without a problem. Move a few cpu_resume data
> structures out of the assembly file so the .data annotation
> can be dropped completely and cpu_resume ends up in the read
> only text section.
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>

That's nice, thank you:

Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

> ---
> v5: Dropped the .data annotation completely per suggestion of
> Lorenzo. Dropped mpidr_hash_ptr per suggestion of Ard.
> Moved sleep_save_sp and sleep_idmap_phys from assembly per
> suggestion of Mark.
> 
> Please let me know if your Tested-by/Reviewed-by/Acked-by
> still stands.
> ---
>  arch/arm64/kernel/sleep.S   | 29 ++++++-----------------------
>  arch/arm64/kernel/suspend.c |  4 ++--
>  2 files changed, 8 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
> index a564b44..3f836b2 100644
> --- a/arch/arm64/kernel/sleep.S
> +++ b/arch/arm64/kernel/sleep.S
> @@ -147,14 +147,12 @@ cpu_resume_after_mmu:
>  	ret
>  ENDPROC(cpu_resume_after_mmu)
>  
> -	.data
>  ENTRY(cpu_resume)
>  	bl	el2_setup		// if in EL2 drop to EL1 cleanly
>  #ifdef CONFIG_SMP
>  	mrs	x1, mpidr_el1
> -	adr	x4, mpidr_hash_ptr
> -	ldr	x5, [x4]
> -	add	x8, x4, x5		// x8 = struct mpidr_hash phys address
> +	adrp	x8, mpidr_hash
> +	add x8, x8, #:lo12:mpidr_hash // x8 = struct mpidr_hash phys address
>          /* retrieve mpidr_hash members to compute the hash */
>  	ldr	x2, [x8, #MPIDR_HASH_MASK]
>  	ldp	w3, w4, [x8, #MPIDR_HASH_SHIFTS]
> @@ -164,14 +162,15 @@ ENTRY(cpu_resume)
>  #else
>  	mov	x7, xzr
>  #endif
> -	adr	x0, sleep_save_sp
> +	adrp	x0, sleep_save_sp
> +	add	x0, x0, #:lo12:sleep_save_sp
>  	ldr	x0, [x0, #SLEEP_SAVE_SP_PHYS]
>  	ldr	x0, [x0, x7, lsl #3]
>  	/* load sp from context */
>  	ldr	x2, [x0, #CPU_CTX_SP]
> -	adr	x1, sleep_idmap_phys
> +	adrp	x1, sleep_idmap_phys
>  	/* load physical address of identity map page table in x1 */
> -	ldr	x1, [x1]
> +	ldr	x1, [x1, #:lo12:sleep_idmap_phys]
>  	mov	sp, x2
>  	/*
>  	 * cpu_do_resume expects x0 to contain context physical address
> @@ -182,24 +181,8 @@ ENTRY(cpu_resume)
>  ENDPROC(cpu_resume)
>  
>  	.align 3
> -mpidr_hash_ptr:
>  	/*
>  	 * offset of mpidr_hash symbol from current location
>  	 * used to obtain run-time mpidr_hash address with MMU off
>           */
>  	.quad	mpidr_hash - .
> -/*
> - * physical address of identity mapped page tables
> - */
> -	.type	sleep_idmap_phys, #object
> -ENTRY(sleep_idmap_phys)
> -	.quad	0
> -/*
> - * struct sleep_save_sp {
> - *	phys_addr_t *save_ptr_stash;
> - *	phys_addr_t save_ptr_stash_phys;
> - * };
> - */
> -	.type	sleep_save_sp, #object
> -ENTRY(sleep_save_sp)
> -	.space	SLEEP_SAVE_SP_SZ	// struct sleep_save_sp
> diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
> index 13ad4db..3771b72 100644
> --- a/arch/arm64/kernel/suspend.c
> +++ b/arch/arm64/kernel/suspend.c
> @@ -126,8 +126,8 @@ int __cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
>  	return ret;
>  }
>  
> -extern struct sleep_save_sp sleep_save_sp;
> -extern phys_addr_t sleep_idmap_phys;
> +struct sleep_save_sp sleep_save_sp;
> +phys_addr_t sleep_idmap_phys;
>  
>  static int __init cpu_suspend_init(void)
>  {
> -- 
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 3/7] arm64: Move cpu_resume into the text section
  2014-11-18  0:55 ` [PATCHv5 3/7] arm64: Move cpu_resume into the text section Laura Abbott
  2014-11-18 10:35   ` Lorenzo Pieralisi
@ 2014-11-18 10:49   ` Mark Rutland
  2014-11-18 21:20     ` Laura Abbott
  1 sibling, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2014-11-18 10:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Laura,

On Tue, Nov 18, 2014 at 12:55:01AM +0000, Laura Abbott wrote:
> The function cpu_resume currently lives in the .data section.
> There's no reason for it to be there since we can use relative
> instructions without a problem. Move a few cpu_resume data
> structures out of the assembly file so the .data annotation
> can be dropped completely and cpu_resume ends up in the read
> only text section.
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
> ---
> v5: Dropped the .data annotation completely per suggestion of
> Lorenzo. Dropped mpidr_hash_ptr per suggestion of Ard.
> Moved sleep_save_sp and sleep_idmap_phys from assembly per
> suggestion of Mark.
> 
> Please let me know if your Tested-by/Reviewed-by/Acked-by
> still stands.
> ---
>  arch/arm64/kernel/sleep.S   | 29 ++++++-----------------------
>  arch/arm64/kernel/suspend.c |  4 ++--
>  2 files changed, 8 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
> index a564b44..3f836b2 100644
> --- a/arch/arm64/kernel/sleep.S
> +++ b/arch/arm64/kernel/sleep.S
> @@ -147,14 +147,12 @@ cpu_resume_after_mmu:
>  	ret
>  ENDPROC(cpu_resume_after_mmu)
>  
> -	.data
>  ENTRY(cpu_resume)
>  	bl	el2_setup		// if in EL2 drop to EL1 cleanly
>  #ifdef CONFIG_SMP
>  	mrs	x1, mpidr_el1
> -	adr	x4, mpidr_hash_ptr
> -	ldr	x5, [x4]
> -	add	x8, x4, x5		// x8 = struct mpidr_hash phys address
> +	adrp	x8, mpidr_hash
> +	add x8, x8, #:lo12:mpidr_hash // x8 = struct mpidr_hash phys address
>          /* retrieve mpidr_hash members to compute the hash */
>  	ldr	x2, [x8, #MPIDR_HASH_MASK]
>  	ldp	w3, w4, [x8, #MPIDR_HASH_SHIFTS]
> @@ -164,14 +162,15 @@ ENTRY(cpu_resume)
>  #else
>  	mov	x7, xzr
>  #endif
> -	adr	x0, sleep_save_sp
> +	adrp	x0, sleep_save_sp
> +	add	x0, x0, #:lo12:sleep_save_sp
>  	ldr	x0, [x0, #SLEEP_SAVE_SP_PHYS]
>  	ldr	x0, [x0, x7, lsl #3]
>  	/* load sp from context */
>  	ldr	x2, [x0, #CPU_CTX_SP]
> -	adr	x1, sleep_idmap_phys
> +	adrp	x1, sleep_idmap_phys
>  	/* load physical address of identity map page table in x1 */
> -	ldr	x1, [x1]
> +	ldr	x1, [x1, #:lo12:sleep_idmap_phys]
>  	mov	sp, x2
>  	/*
>  	 * cpu_do_resume expects x0 to contain context physical address
> @@ -182,24 +181,8 @@ ENTRY(cpu_resume)
>  ENDPROC(cpu_resume)
>  
>  	.align 3
> -mpidr_hash_ptr:
>  	/*
>  	 * offset of mpidr_hash symbol from current location
>  	 * used to obtain run-time mpidr_hash address with MMU off
>           */
>  	.quad	mpidr_hash - .

The .align, comment and the .quad should go too, now the value is
unused.

Other than that this looks fine, and cpuidle still works on my Juno. My
Reviewed-by still stands.

Thanks,
Mark.

> -/*
> - * physical address of identity mapped page tables
> - */
> -	.type	sleep_idmap_phys, #object
> -ENTRY(sleep_idmap_phys)
> -	.quad	0
> -/*
> - * struct sleep_save_sp {
> - *	phys_addr_t *save_ptr_stash;
> - *	phys_addr_t save_ptr_stash_phys;
> - * };
> - */
> -	.type	sleep_save_sp, #object
> -ENTRY(sleep_save_sp)
> -	.space	SLEEP_SAVE_SP_SZ	// struct sleep_save_sp
> diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
> index 13ad4db..3771b72 100644
> --- a/arch/arm64/kernel/suspend.c
> +++ b/arch/arm64/kernel/suspend.c
> @@ -126,8 +126,8 @@ int __cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
>  	return ret;
>  }
>  
> -extern struct sleep_save_sp sleep_save_sp;
> -extern phys_addr_t sleep_idmap_phys;
> +struct sleep_save_sp sleep_save_sp;
> +phys_addr_t sleep_idmap_phys;
>  
>  static int __init cpu_suspend_init(void)
>  {
> -- 
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 4/7] arm64: Move some head.text functions to executable section
  2014-11-18  0:55 ` [PATCHv5 4/7] arm64: Move some head.text functions to executable section Laura Abbott
@ 2014-11-18 11:41   ` Mark Rutland
  2014-11-18 21:27     ` Laura Abbott
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2014-11-18 11:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Laura,

On Tue, Nov 18, 2014 at 12:55:02AM +0000, Laura Abbott wrote:
> The head.text section is intended to be run at early bootup
> before any of the regular kernel mappings have been setup.
> Parts of head.text may be freed back into the buddy allocator
> due to TEXT_OFFSET so for security requirements this memory
> must not be executable. The suspend/resume/hotplug code path
> requires some of these head.S functions to run however which
> means they need to be executable. Support these conflicting
> requirements by moving the few head.text functions that need
> to be executable to the text section which has the appropriate
> page table permissions.
> 
> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
> ---
> v5: Went back to the v3 version which moved everything around instead
> of adding annotations. Dropped the code duplication for head.text.
> ---
>  arch/arm64/kernel/head.S        | 406 +++++++++++++++++++++-------------------
>  arch/arm64/kernel/vmlinux.lds.S |   1 +
>  2 files changed, 210 insertions(+), 197 deletions(-)

[...]

> +/*
> + * end 'true' head section, begin head section that can be read only
> + */
> +       .section ".latehead.text","ax"

Is there any reason we can't place the remainder into .text directly?

Everything after the section change is used repeatedly for hotplug and
idle, so the "latehead" distinction is a little misleading.

Given adrp and b/bl, we can jump at least +/-128MB without problems, so
we should be ok even if this code gets emitted late in the kernel image.
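(As a quick sanity check of those ranges: b/bl carry a signed 26-bit
instruction offset and adrp a signed 21-bit 4K-page offset, so

	b/bl reach : +/- 2^25 instructions * 4 bytes = +/- 128 MB
	adrp reach : +/- 2^20 pages * 4 KB           = +/- 4 GB

either of which easily spans the kernel image.)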

Other than that, this looks fine to me.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 3/7] arm64: Move cpu_resume into the text section
  2014-11-18 10:49   ` Mark Rutland
@ 2014-11-18 21:20     ` Laura Abbott
  0 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18 21:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/18/2014 2:49 AM, Mark Rutland wrote:
> Hi Laura,
>
> On Tue, Nov 18, 2014 at 12:55:01AM +0000, Laura Abbott wrote:
>> The function cpu_resume currently lives in the .data section.
>> There's no reason for it to be there since we can use relative
>> instructions without a problem. Move a few cpu_resume data
>> structures out of the assembly file so the .data annotation
>> can be dropped completely and cpu_resume ends up in the read
>> only text section.
>>
>> Reviewed-by: Kees Cook <keescook@chromium.org>
>> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>> Tested-by: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
>> ---
>> v5: Dropped the .data annotation completely per suggestion of
>> Lorenzo. Dropped mpidr_hash_ptr per suggestion of Ard.
>> Moved sleep_save_sp and sleep_idmap_phys from assembly per
>> suggestion of Mark.
>>
>> Please let me know if your Tested-by/Reviewed-by/Acked-by
>> still stands.
>> ---
>>   arch/arm64/kernel/sleep.S   | 29 ++++++-----------------------
>>   arch/arm64/kernel/suspend.c |  4 ++--
>>   2 files changed, 8 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
>> index a564b44..3f836b2 100644
>> --- a/arch/arm64/kernel/sleep.S
>> +++ b/arch/arm64/kernel/sleep.S
>> @@ -147,14 +147,12 @@ cpu_resume_after_mmu:
>>   	ret
>>   ENDPROC(cpu_resume_after_mmu)
>>
>> -	.data
>>   ENTRY(cpu_resume)
>>   	bl	el2_setup		// if in EL2 drop to EL1 cleanly
>>   #ifdef CONFIG_SMP
>>   	mrs	x1, mpidr_el1
>> -	adr	x4, mpidr_hash_ptr
>> -	ldr	x5, [x4]
>> -	add	x8, x4, x5		// x8 = struct mpidr_hash phys address
>> +	adrp	x8, mpidr_hash
>> +	add x8, x8, #:lo12:mpidr_hash // x8 = struct mpidr_hash phys address
>>           /* retrieve mpidr_hash members to compute the hash */
>>   	ldr	x2, [x8, #MPIDR_HASH_MASK]
>>   	ldp	w3, w4, [x8, #MPIDR_HASH_SHIFTS]
>> @@ -164,14 +162,15 @@ ENTRY(cpu_resume)
>>   #else
>>   	mov	x7, xzr
>>   #endif
>> -	adr	x0, sleep_save_sp
>> +	adrp	x0, sleep_save_sp
>> +	add	x0, x0, #:lo12:sleep_save_sp
>>   	ldr	x0, [x0, #SLEEP_SAVE_SP_PHYS]
>>   	ldr	x0, [x0, x7, lsl #3]
>>   	/* load sp from context */
>>   	ldr	x2, [x0, #CPU_CTX_SP]
>> -	adr	x1, sleep_idmap_phys
>> +	adrp	x1, sleep_idmap_phys
>>   	/* load physical address of identity map page table in x1 */
>> -	ldr	x1, [x1]
>> +	ldr	x1, [x1, #:lo12:sleep_idmap_phys]
>>   	mov	sp, x2
>>   	/*
>>   	 * cpu_do_resume expects x0 to contain context physical address
>> @@ -182,24 +181,8 @@ ENTRY(cpu_resume)
>>   ENDPROC(cpu_resume)
>>
>>   	.align 3
>> -mpidr_hash_ptr:
>>   	/*
>>   	 * offset of mpidr_hash symbol from current location
>>   	 * used to obtain run-time mpidr_hash address with MMU off
>>            */
>>   	.quad	mpidr_hash - .
>
> The .align, comment and the .quad should go too, now the value is
> unused.
>

Ah yes, I misread what was happening here and thought this was the
declaration for mpidr_hash. It can be removed just fine.

Thanks,
Laura



-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 4/7] arm64: Move some head.text functions to executable section
  2014-11-18 11:41   ` Mark Rutland
@ 2014-11-18 21:27     ` Laura Abbott
  0 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-18 21:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/18/2014 3:41 AM, Mark Rutland wrote:
> Hi Laura,
>
> On Tue, Nov 18, 2014 at 12:55:02AM +0000, Laura Abbott wrote:
>> The head.text section is intended to be run at early bootup
>> before any of the regular kernel mappings have been setup.
>> Parts of head.text may be freed back into the buddy allocator
>> due to TEXT_OFFSET so for security requirements this memory
>> must not be executable. The suspend/resume/hotplug code path
>> requires some of these head.S functions to run however which
>> means they need to be executable. Support these conflicting
>> requirements by moving the few head.text functions that need
>> to be executable to the text section which has the appropriate
>> page table permissions.
>>
>> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
>> ---
>> v5: Went back to the v3 version which moved everything around instead
>> of adding annotations. Dropped the code duplication for head.text.
>> ---
>>   arch/arm64/kernel/head.S        | 406 +++++++++++++++++++++-------------------
>>   arch/arm64/kernel/vmlinux.lds.S |   1 +
>>   2 files changed, 210 insertions(+), 197 deletions(-)
>
> [...]
>
>> +/*
>> + * end 'true' head section, begin head section that can be read only
>> + */
>> +       .section ".latehead.text","ax"
>
> Is there any reason we can't place the remainder into .text directly?
>
> Everything after the section change is used repeatedly for hotplug and
> idle, so the "latehead" distinction is a little misleading.
>
> Given adrp and b/bl, we can jump at least +/-128MB without problems, so
> we should be ok even if this code gets emitted late in the kernel image.
>
> Other than that, this looks fine to me.
>

My thought was to keep what was in head.S distinct from the rest of the
text, but it works fine to just move it into .text. I'll fix it.

Thanks,
Laura


-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-18  0:55 ` [PATCHv5 7/7] arm64: add better page protections to arm64 Laura Abbott
@ 2014-11-19 16:31   ` Mark Rutland
  2014-11-19 17:38     ` Ard Biesheuvel
  2014-11-21  1:08     ` Laura Abbott
  2014-11-20 12:04   ` Steve Capper
  1 sibling, 2 replies; 24+ messages in thread
From: Mark Rutland @ 2014-11-19 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Laura,

On Tue, Nov 18, 2014 at 12:55:05AM +0000, Laura Abbott wrote:
> Add page protections for arm64 similar to those in arm.
> This is for security reasons to prevent certain classes
> of exploits. The current method:
>
> - Map all memory as either RWX or RW. We round to the nearest
>   section to avoid creating page tables before everything is mapped
> - Once everything is mapped, if either end of the RWX section should
>   not be X, we split the PMD and remap as necessary
> - When initmem is to be freed, we change the permissions back to
>   RW (using stop machine if necessary to flush the TLB)
> - If CONFIG_DEBUG_RODATA is set, the read only sections are set
>   read only.
>
> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
> ---
> v5:
>  -Dropped map_io from the alloc_init_* functions
>  -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
>  -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
>   that the section mapping is actually kept and we don't leak any RWX regions
>  -Dropped some old ioremap code that somehow snuck in from the last rebase
>  -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
>   called early
> ---
>  arch/arm64/Kconfig.debug            |  23 +++
>  arch/arm64/include/asm/cacheflush.h |   4 +
>  arch/arm64/kernel/vmlinux.lds.S     |  20 +++
>  arch/arm64/mm/init.c                |   1 +
>  arch/arm64/mm/mm.h                  |   2 +
>  arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
>  6 files changed, 298 insertions(+), 41 deletions(-)

Unfortunately this is blowing up on my Juno atop of v3.18-rc5, because
EFI runtime services seem to get mapped as non-executable, and at boot
we tried to read the RTC via runtime services:

Bad mode in Synchronous Abort handler detected, code 0x8600000d
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc5+ #23
task: ffffffc9768b8000 ti: ffffffc9768c0000 task.ti: ffffffc9768c0000
PC is at 0xffffffc97ffaa3e4
LR is at virt_efi_get_time+0x60/0xa0
pc : [<ffffffc97ffaa3e4>] lr : [<ffffffc00044a6c8>] pstate: 800000c5
sp : ffffffc9768c39e0
x29: ffffffc9768c39e0 x28: ffffffc000723c90
x27: ffffffc0007b5678 x26: 0000000000000001
x25: ffffffc000584320 x24: ffffffc0006aaeb0
x23: ffffffc9768c3a60 x22: 0000000000000040
x21: ffffffc97ffaa3e4 x20: ffffffc0007b5a48
x19: ffffffc0007b5a40 x18: 0000000000000007
x17: 000000000000000e x16: 0000000000000001
x15: 0000000000000007 x14: 0ffffffffffffffe
x13: ffffffbe22feb720 x12: 0000000000000030
x11: 0000000000000003 x10: 0000000000000068
x9 : 0000000000000000 x8 : ffffffc97f981c00
x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 000000097f6c8000
x3 : 0000000000000000 x2 : 000000097f6c8000
x1 : ffffffc9768c3a50 x0 : ffffffc9768c3a60

Ard, is this issue solved by any current series, or do we need to do
anything additional to ensure runtime services are mapped appropriately
for DEBUG_RODATA kernels?

> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
> index 0a12933..867fe6f1 100644
> --- a/arch/arm64/Kconfig.debug
> +++ b/arch/arm64/Kconfig.debug
> @@ -54,4 +54,27 @@ config DEBUG_SET_MODULE_RONX
>            against certain classes of kernel exploits.
>            If in doubt, say "N".
>
> +config DEBUG_RODATA
> +       bool "Make kernel text and rodata read-only"
> +       help
> +         If this is set, kernel text and rodata will be made read-only. This
> +         is to help catch accidental or malicious attempts to change the
> +         kernel's executable code. Additionally splits rodata from kernel
> +         text so it can be made explicitly non-executable.
> +
> +          If in doubt, say Y
> +
> +config DEBUG_ALIGN_RODATA
> +       depends on DEBUG_RODATA && !ARM64_64K_PAGES
> +       bool "Align linker sections up to SECTION_SIZE"
> +       help
> +         If this option is enabled, sections that may potentially be marked as
> +         read only or non-executable will be aligned up to the section size of
> +         the kernel. This prevents sections from being split into pages and
> +         avoids a potential TLB penalty. The downside is an increase in
> +         alignment and potentially wasted space. Turn on this option if
> +         performance is more important than memory pressure.
> +
> +         If in doubt, say N
> +
>  endmenu
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 689b637..0014207 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -152,4 +152,8 @@ int set_memory_ro(unsigned long addr, int numpages);
>  int set_memory_rw(unsigned long addr, int numpages);
>  int set_memory_x(unsigned long addr, int numpages);
>  int set_memory_nx(unsigned long addr, int numpages);
> +
> +#ifdef CONFIG_DEBUG_RODATA
> +void mark_rodata_ro(void);
> +#endif
>  #endif
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 2d7e20a..405820d 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -8,6 +8,7 @@
>  #include <asm/thread_info.h>
>  #include <asm/memory.h>
>  #include <asm/page.h>
> +#include <asm/pgtable.h>
>
>  #include "image.h"
>
> @@ -54,6 +55,9 @@ SECTIONS
>                 _text = .;
>                 HEAD_TEXT
>         }
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +       . = ALIGN(1<<SECTION_SHIFT);
> +#endif

Could we have macros defined once above, something like:

#ifdef CONFIG_DEBUG_ALIGN_RODATA
#define ALIGN_DEBUG_RO                  . = ALIGN(1<<SECTION_SHIFT);
#define ALIGN_DEBUG_RO_MIN(min)         ALIGN_DEBUG_RO
#else
#define ALIGN_DEBUG_RO
#define ALIGN_DEBUG_RO_MIN(min)         . = ALIGN(min)
#endif

Then we can avoid all the ifdefs below.
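
Just to make the shape concrete, the hunks below would then collapse to
something like this (a sketch only, modulo where the trailing semicolons
end up):

	ALIGN_DEBUG_RO
	RO_DATA(PAGE_SIZE)
	EXCEPTION_TABLE(8)
	NOTES
	ALIGN_DEBUG_RO
	_etext = .;			/* End of text and rodata section */

	ALIGN_DEBUG_RO_MIN(PAGE_SIZE)
	__init_begin = .;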

>         .text : {                       /* Real text segment            */
>                 _stext = .;             /* Text and read-only data      */
>                         *(.latehead.text)
> @@ -71,19 +75,35 @@ SECTIONS
>                 *(.got)                 /* Global offset table          */
>         }
>
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +       . = ALIGN(1<<SECTION_SHIFT);
> +#endif
>         RO_DATA(PAGE_SIZE)
>         EXCEPTION_TABLE(8)
>         NOTES
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +       . = ALIGN(1<<SECTION_SHIFT);
> +#endif
>         _etext = .;                     /* End of text and rodata section */
>
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +       . = ALIGN(1<<SECTION_SHIFT);
> +#else
>         . = ALIGN(PAGE_SIZE);
> +#endif
>         __init_begin = .;
>
>         INIT_TEXT_SECTION(8)
>         .exit.text : {
>                 ARM_EXIT_KEEP(EXIT_TEXT)
>         }
> +
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +       . = ALIGN(1<<SECTION_SHIFT);
> +       __init_data_begin = .;

Is this used anywhere? We seem to have left __init_end after this, and
haven't introduced an __init_data_end.

> +#else
>         . = ALIGN(16);
> +#endif
>         .init.data : {
>                 INIT_DATA
>                 INIT_SETUP(16)
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 494297c..61f44c7 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -324,6 +324,7 @@ void __init mem_init(void)
>
>  void free_initmem(void)
>  {
> +       fixup_init();
>         free_initmem_default(0);
>  }
>
> diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
> index d519f4f..82347d8 100644
> --- a/arch/arm64/mm/mm.h
> +++ b/arch/arm64/mm/mm.h
> @@ -1,2 +1,4 @@
>  extern void __init bootmem_init(void);
>  extern void __init arm64_swiotlb_init(void);
> +
> +void fixup_init(void);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 6032f3e..dafde8e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -26,6 +26,7 @@
>  #include <linux/memblock.h>
>  #include <linux/fs.h>
>  #include <linux/io.h>
> +#include <linux/stop_machine.h>
>
>  #include <asm/cputype.h>
>  #include <asm/fixmap.h>
> @@ -137,17 +138,55 @@ static void __init *early_alloc(unsigned long sz)
>         return ptr;
>  }
>
> -static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
> +/*
> + * remap a PMD into pages
> + */
> +static noinline void __ref split_pmd(pmd_t *pmd, bool early)
> +{
> +       pte_t *pte, *start_pte;
> +       unsigned long pfn;
> +       int i = 0;
> +
> +       if (early)
> +               start_pte = pte = early_alloc(PTRS_PER_PTE*sizeof(pte_t));
> +       else
> +               start_pte = pte = (pte_t *)__get_free_page(PGALLOC_GFP);
> +
> +       BUG_ON(!pte);
> +
> +       pfn = pmd_pfn(*pmd);
> +
> +       do {
> +               /*
> +                * Need to have the least restrictive permissions available
> +                * permissions will be fixed up later
> +                */
> +               set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
> +               pfn++;
> +       } while (pte++, i++, i < PTRS_PER_PTE);
> +
> +
> +       __pmd_populate(pmd, __pa(start_pte), PMD_TYPE_TABLE);
> +       flush_tlb_all();
> +}
> +
> +static void __ref alloc_init_pte(pmd_t *pmd, unsigned long addr,
>                                   unsigned long end, unsigned long pfn,
> -                                 pgprot_t prot)
> +                                 pgprot_t prot, bool early)
>  {
>         pte_t *pte;
>
>         if (pmd_none(*pmd)) {
> -               pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
> +               if (early)
> +                       pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
> +               else
> +                       pte = (pte_t *)__get_free_page(PGALLOC_GFP);
> +               BUG_ON(!pte);
>                 __pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
>         }
> -       BUG_ON(pmd_bad(*pmd));
> +
> +       if (pmd_bad(*pmd))
> +               split_pmd(pmd, early);
>
>         pte = pte_offset_kernel(pmd, addr);
>         do {
> @@ -156,29 +195,44 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>         } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
>
> -static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
> +void __ref split_pud(pud_t *old_pud, pmd_t *pmd)
> +{
> +       unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
> +       pgprot_t prot = __pgprot(pud_val(*old_pud) ^ pud_pfn(*old_pud));
> +       int i = 0;
> +
> +       do {
> +               set_pmd(pmd, __pmd(addr | prot));
> +               addr += PMD_SIZE;
> +       } while (pmd++, i++, i < PTRS_PER_PMD);
> +}

As far as I can see only the functions which call early_alloc
(split_pmd, alloc_init_pte, alloc_init_pmd) need the __ref annotation,
so we can drop it here and elsewhere.

It would be nice if we could avoid the mismatch entirely. Perhaps we
could pass an allocator function down instead of the early parameter?
Something like:

static void *late_alloc(unsigned long sz)
{
	BUG_ON(PAGE_SIZE < sz);
	return (void *)__get_free_page(PGALLOC_GFP);
}

which we could pass down from create_mapping_late in place of
early_alloc for create_id_mapping and create_mapping.
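
A minimal sketch of that shape, with illustrative names rather than
anything taken from the patch itself:

	static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
				   unsigned long end, unsigned long pfn,
				   pgprot_t prot,
				   void *(*alloc)(unsigned long sz))
	{
		pte_t *pte;

		if (pmd_none(*pmd)) {
			/* caller decides where page table memory comes from */
			pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
			BUG_ON(!pte);
			__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
		}
		/* rest as in the patch: split_pmd() if needed, then fill the ptes */
	}

	/* boot-time paths keep passing early_alloc ... */
	alloc_init_pte(pmd, addr, end, pfn, prot, early_alloc);
	/* ... while create_mapping_late would hand down late_alloc */
	alloc_init_pte(pmd, addr, end, pfn, prot, late_alloc);

That should also remove the need for the __ref annotations, since the
only references to early_alloc would then live in __init code.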

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-19 16:31   ` Mark Rutland
@ 2014-11-19 17:38     ` Ard Biesheuvel
  2014-11-19 18:06       ` Ard Biesheuvel
  2014-11-19 18:46       ` Mark Rutland
  2014-11-21  1:08     ` Laura Abbott
  1 sibling, 2 replies; 24+ messages in thread
From: Ard Biesheuvel @ 2014-11-19 17:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 November 2014 17:31, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Laura,
>
> On Tue, Nov 18, 2014 at 12:55:05AM +0000, Laura Abbott wrote:
>> Add page protections for arm64 similar to those in arm.
>> This is for security reasons to prevent certain classes
>> of exploits. The current method:
>>
>> - Map all memory as either RWX or RW. We round to the nearest
>>   section to avoid creating page tables before everything is mapped
>> - Once everything is mapped, if either end of the RWX section should
>>   not be X, we split the PMD and remap as necessary
>> - When initmem is to be freed, we change the permissions back to
>>   RW (using stop machine if necessary to flush the TLB)
>> - If CONFIG_DEBUG_RODATA is set, the read only sections are set
>>   read only.
>>
>> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
>> ---
>> v5:
>>  -Dropped map_io from the alloc_init_* functions
>>  -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
>>  -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
>>   that the section mapping is actually kept and we don't leak any RWX regions
>>  -Dropped some old ioremap code that somehow snuck in from the last rebase
>>  -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
>>   called early
>> ---
>>  arch/arm64/Kconfig.debug            |  23 +++
>>  arch/arm64/include/asm/cacheflush.h |   4 +
>>  arch/arm64/kernel/vmlinux.lds.S     |  20 +++
>>  arch/arm64/mm/init.c                |   1 +
>>  arch/arm64/mm/mm.h                  |   2 +
>>  arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
>>  6 files changed, 298 insertions(+), 41 deletions(-)
>
> Unfortunately this is blowing up on my Juno atop of v3.18-rc5, because
> EFI runtime services seem to get mapped as non-executable, and at boot
> we tried to read the RTC via runtime services:
>
> Bad mode in Synchronous Abort handler detected, code 0x8600000d
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc5+ #23
> task: ffffffc9768b8000 ti: ffffffc9768c0000 task.ti: ffffffc9768c0000
> PC is at 0xffffffc97ffaa3e4
> LR is at virt_efi_get_time+0x60/0xa0
> pc : [<ffffffc97ffaa3e4>] lr : [<ffffffc00044a6c8>] pstate: 800000c5
> sp : ffffffc9768c39e0
> x29: ffffffc9768c39e0 x28: ffffffc000723c90
> x27: ffffffc0007b5678 x26: 0000000000000001
> x25: ffffffc000584320 x24: ffffffc0006aaeb0
> x23: ffffffc9768c3a60 x22: 0000000000000040
> x21: ffffffc97ffaa3e4 x20: ffffffc0007b5a48
> x19: ffffffc0007b5a40 x18: 0000000000000007
> x17: 000000000000000e x16: 0000000000000001
> x15: 0000000000000007 x14: 0ffffffffffffffe
> x13: ffffffbe22feb720 x12: 0000000000000030
> x11: 0000000000000003 x10: 0000000000000068
> x9 : 0000000000000000 x8 : ffffffc97f981c00
> x7 : 0000000000000000 x6 : 000000000000003f
> x5 : 0000000000000040 x4 : 000000097f6c8000
> x3 : 0000000000000000 x2 : 000000097f6c8000
> x1 : ffffffc9768c3a50 x0 : ffffffc9768c3a60
>
> Ard, is this issue solved by any current series, or do we need to do
> anything additional to ensure runtime services are mapped appropriately
> for DEBUG_RODATA kernels?
>

This problem will just vanish with my uefi virtmap series, because
then UEFI manages its own page tables.
I use PXN for everything except EFI_RUNTIME_SERVICES_CODE (which
[sadly] contains read-write data as well, due to how the PE/COFF
loader in UEFI is usually implemented, i.e., it allocates one single
region to hold the entire static image, including .data and .bss).
Once these changes land, the UEFI page tables will only be active
during actual Runtime Services calls, so the attack service should be
reduced substantially.

For now, I guess adding a set_memory_x() in remap_region() for
EFI_RUNTIME_SERVICES_CODE should suffice?
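i.e. something along these lines, with "md" and "va" standing in for
whatever the memory descriptor and new virtual address are called at
that point in remap_region():

	if (md->type == EFI_RUNTIME_SERVICES_CODE)
		set_memory_x((unsigned long)va,
			     PAGE_ALIGN(md->num_pages << EFI_PAGE_SHIFT)
				     >> PAGE_SHIFT);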

-- 
Ard.


>> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
>> index 0a12933..867fe6f1 100644
>> --- a/arch/arm64/Kconfig.debug
>> +++ b/arch/arm64/Kconfig.debug
>> @@ -54,4 +54,27 @@ config DEBUG_SET_MODULE_RONX
>>            against certain classes of kernel exploits.
>>            If in doubt, say "N".
>>
>> +config DEBUG_RODATA
>> +       bool "Make kernel text and rodata read-only"
>> +       help
>> +         If this is set, kernel text and rodata will be made read-only. This
>> +         is to help catch accidental or malicious attempts to change the
>> +         kernel's executable code. Additionally splits rodata from kernel
>> +         text so it can be made explicitly non-executable.
>> +
>> +          If in doubt, say Y
>> +
>> +config DEBUG_ALIGN_RODATA
>> +       depends on DEBUG_RODATA && !ARM64_64K_PAGES
>> +       bool "Align linker sections up to SECTION_SIZE"
>> +       help
>> +         If this option is enabled, sections that may potentially be marked as
>> +         read only or non-executable will be aligned up to the section size of
>> +         the kernel. This prevents sections from being split into pages and
>> +         avoids a potential TLB penalty. The downside is an increase in
>> +         alignment and potentially wasted space. Turn on this option if
>> +         performance is more important than memory pressure.
>> +
>> +         If in doubt, say N
>> +
>>  endmenu
>> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
>> index 689b637..0014207 100644
>> --- a/arch/arm64/include/asm/cacheflush.h
>> +++ b/arch/arm64/include/asm/cacheflush.h
>> @@ -152,4 +152,8 @@ int set_memory_ro(unsigned long addr, int numpages);
>>  int set_memory_rw(unsigned long addr, int numpages);
>>  int set_memory_x(unsigned long addr, int numpages);
>>  int set_memory_nx(unsigned long addr, int numpages);
>> +
>> +#ifdef CONFIG_DEBUG_RODATA
>> +void mark_rodata_ro(void);
>> +#endif
>>  #endif
>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> index 2d7e20a..405820d 100644
>> --- a/arch/arm64/kernel/vmlinux.lds.S
>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>> @@ -8,6 +8,7 @@
>>  #include <asm/thread_info.h>
>>  #include <asm/memory.h>
>>  #include <asm/page.h>
>> +#include <asm/pgtable.h>
>>
>>  #include "image.h"
>>
>> @@ -54,6 +55,9 @@ SECTIONS
>>                 _text = .;
>>                 HEAD_TEXT
>>         }
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +#endif
>
> Could we have macros defined once above, something like:
>
> #ifdef CONFIG_DEBUG_ALIGN_RODATA
> #define ALIGN_DEBUG_RO                  . = ALIGN(1<<SECTION_SHIFT);
> #define ALIGN_DEBUG_RO_MIN(min)         ALIGN_DEBUG_RO
> #else
> #define ALIGN_DEBUG_RO
> #define ALIGN_DEBUG_RO_MIN(min)         . = ALIGN(min)
> #endif
>
> Then we can avoid all the ifdefs below.
>
>>         .text : {                       /* Real text segment            */
>>                 _stext = .;             /* Text and read-only data      */
>>                         *(.latehead.text)
>> @@ -71,19 +75,35 @@ SECTIONS
>>                 *(.got)                 /* Global offset table          */
>>         }
>>
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +#endif
>>         RO_DATA(PAGE_SIZE)
>>         EXCEPTION_TABLE(8)
>>         NOTES
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +#endif
>>         _etext = .;                     /* End of text and rodata section */
>>
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +#else
>>         . = ALIGN(PAGE_SIZE);
>> +#endif
>>         __init_begin = .;
>>
>>         INIT_TEXT_SECTION(8)
>>         .exit.text : {
>>                 ARM_EXIT_KEEP(EXIT_TEXT)
>>         }
>> +
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +       __init_data_begin = .;
>
> Is this used anywhere? We seem to have left __init_end after this, and
> haven't introduced an __init_data_end.
>
>> +#else
>>         . = ALIGN(16);
>> +#endif
>>         .init.data : {
>>                 INIT_DATA
>>                 INIT_SETUP(16)
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 494297c..61f44c7 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -324,6 +324,7 @@ void __init mem_init(void)
>>
>>  void free_initmem(void)
>>  {
>> +       fixup_init();
>>         free_initmem_default(0);
>>  }
>>
>> diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
>> index d519f4f..82347d8 100644
>> --- a/arch/arm64/mm/mm.h
>> +++ b/arch/arm64/mm/mm.h
>> @@ -1,2 +1,4 @@
>>  extern void __init bootmem_init(void);
>>  extern void __init arm64_swiotlb_init(void);
>> +
>> +void fixup_init(void);
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 6032f3e..dafde8e 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/memblock.h>
>>  #include <linux/fs.h>
>>  #include <linux/io.h>
>> +#include <linux/stop_machine.h>
>>
>>  #include <asm/cputype.h>
>>  #include <asm/fixmap.h>
>> @@ -137,17 +138,55 @@ static void __init *early_alloc(unsigned long sz)
>>         return ptr;
>>  }
>>
>> -static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>> +/*
>> + * remap a PMD into pages
>> + */
>> +static noinline void __ref split_pmd(pmd_t *pmd, bool early)
>> +{
>> +       pte_t *pte, *start_pte;
>> +       unsigned long pfn;
>> +       int i = 0;
>> +
>> +       if (early)
>> +               start_pte = pte = early_alloc(PTRS_PER_PTE*sizeof(pte_t));
>> +       else
>> +               start_pte = pte = (pte_t *)__get_free_page(PGALLOC_GFP);
>> +
>> +       BUG_ON(!pte);
>> +
>> +       pfn = pmd_pfn(*pmd);
>> +
>> +       do {
>> +               /*
>> +                * Need to have the least restrictive permissions available
>> +                * permissions will be fixed up later
>> +                */
>> +               set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
>> +               pfn++;
>> +       } while (pte++, i++, i < PTRS_PER_PTE);
>> +
>> +
>> +       __pmd_populate(pmd, __pa(start_pte), PMD_TYPE_TABLE);
>> +       flush_tlb_all();
>> +}
>> +
>> +static void __ref alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>                                   unsigned long end, unsigned long pfn,
>> -                                 pgprot_t prot)
>> +                                 pgprot_t prot, bool early)
>>  {
>>         pte_t *pte;
>>
>>         if (pmd_none(*pmd)) {
>> -               pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
>> +               if (early)
>> +                       pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
>> +               else
>> +                       pte = (pte_t *)__get_free_page(PGALLOC_GFP);
>> +               BUG_ON(!pte);
>>                 __pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
>>         }
>> -       BUG_ON(pmd_bad(*pmd));
>> +
>> +       if (pmd_bad(*pmd))
>> +               split_pmd(pmd, early);
>>
>>         pte = pte_offset_kernel(pmd, addr);
>>         do {
>> @@ -156,29 +195,44 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>         } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>>
>> -static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>> +void __ref split_pud(pud_t *old_pud, pmd_t *pmd)
>> +{
>> +       unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
>> +       pgprot_t prot = __pgprot(pud_val(*old_pud) ^ pud_pfn(*old_pud));
>> +       int i = 0;
>> +
>> +       do {
>> +               set_pmd(pmd, __pmd(addr | prot));
>> +               addr += PMD_SIZE;
>> +       } while (pmd++, i++, i < PTRS_PER_PMD);
>> +}
>
> As far as I can see only the functions which call early_alloc
> (split_pmd, alloc_init_pte, alloc_init_pmd) need the __ref annotation,
> so we can drop it here and elsewhere.
>
> It would be nice if we could avoid the mismatch entirely. Perhaps we
> could pass an allocator function down instead of the early parameter?
> Something like:
>
> static void *late_alloc(unsigned long sz)
> {
>         BUG_ON(PAGE_SIZE < sz);
>         return (void *)__get_free_page(PGALLOC_GFP);
> }
>
> which we could pass down from create_mapping_late in place of
> early_alloc for create_id_mapping and create_mapping.
>
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-19 17:38     ` Ard Biesheuvel
@ 2014-11-19 18:06       ` Ard Biesheuvel
  2014-11-19 18:46       ` Mark Rutland
  1 sibling, 0 replies; 24+ messages in thread
From: Ard Biesheuvel @ 2014-11-19 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 November 2014 18:38, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 19 November 2014 17:31, Mark Rutland <mark.rutland@arm.com> wrote:
>> Hi Laura,
>>
>> On Tue, Nov 18, 2014 at 12:55:05AM +0000, Laura Abbott wrote:
>>> Add page protections for arm64 similar to those in arm.
>>> This is for security reasons to prevent certain classes
>>> of exploits. The current method:
>>>
>>> - Map all memory as either RWX or RW. We round to the nearest
>>>   section to avoid creating page tables before everything is mapped
>>> - Once everything is mapped, if either end of the RWX section should
>>>   not be X, we split the PMD and remap as necessary
>>> - When initmem is to be freed, we change the permissions back to
>>>   RW (using stop machine if necessary to flush the TLB)
>>> - If CONFIG_DEBUG_RODATA is set, the read only sections are set
>>>   read only.
>>>
>>> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
>>> ---
>>> v5:
>>>  -Dropped map_io from the alloc_init_* functions
>>>  -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
>>>  -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
>>>   that the section mapping is actually kept and we don't leak any RWX regions
>>>  -Dropped some old ioremap code that somehow snuck in from the last rebase
>>>  -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
>>>   called early
>>> ---
>>>  arch/arm64/Kconfig.debug            |  23 +++
>>>  arch/arm64/include/asm/cacheflush.h |   4 +
>>>  arch/arm64/kernel/vmlinux.lds.S     |  20 +++
>>>  arch/arm64/mm/init.c                |   1 +
>>>  arch/arm64/mm/mm.h                  |   2 +
>>>  arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
>>>  6 files changed, 298 insertions(+), 41 deletions(-)
>>
>> Unfortunately this is blowing up on my Juno atop of v3.18-rc5, because
>> EFI runtime services seem to get mapped as non-executable, and at boot
>> we tried to read the RTC via runtime services:
>>
>> Bad mode in Synchronous Abort handler detected, code 0x8600000d
>> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc5+ #23
>> task: ffffffc9768b8000 ti: ffffffc9768c0000 task.ti: ffffffc9768c0000
>> PC is at 0xffffffc97ffaa3e4
>> LR is at virt_efi_get_time+0x60/0xa0
>> pc : [<ffffffc97ffaa3e4>] lr : [<ffffffc00044a6c8>] pstate: 800000c5
>> sp : ffffffc9768c39e0
>> x29: ffffffc9768c39e0 x28: ffffffc000723c90
>> x27: ffffffc0007b5678 x26: 0000000000000001
>> x25: ffffffc000584320 x24: ffffffc0006aaeb0
>> x23: ffffffc9768c3a60 x22: 0000000000000040
>> x21: ffffffc97ffaa3e4 x20: ffffffc0007b5a48
>> x19: ffffffc0007b5a40 x18: 0000000000000007
>> x17: 000000000000000e x16: 0000000000000001
>> x15: 0000000000000007 x14: 0ffffffffffffffe
>> x13: ffffffbe22feb720 x12: 0000000000000030
>> x11: 0000000000000003 x10: 0000000000000068
>> x9 : 0000000000000000 x8 : ffffffc97f981c00
>> x7 : 0000000000000000 x6 : 000000000000003f
>> x5 : 0000000000000040 x4 : 000000097f6c8000
>> x3 : 0000000000000000 x2 : 000000097f6c8000
>> x1 : ffffffc9768c3a50 x0 : ffffffc9768c3a60
>>
>> Ard, is this issue solved by any current series, or do we need to do
>> anything additional to ensure runtime services are mapped appropriately
>> for DEBUG_RODATA kernels?
>>
>
> This problem will just vanish with my uefi virtmap series, because
> then UEFI manages its own page tables.
> I use PXN for everything except EFI_RUNTIME_SERVICES_CODE (which
> [sadly] contains read-write data as well, due to how the PE/COFF
> loader in UEFI is usually implemented, i.e., it allocates one single
> region to hold the entire static image, including .data and .bss).
> Once these changes land, the UEFI page tables will only be active
> during actual Runtime Services calls, so the attack service should be
> reduced substantially.
>

*surface*

> For now, I guess adding a set_memory_x() in remap_region() for
> EFI_RUNTIME_SERVICES_CODE should suffice?
>
> --
> Ard.
>
>
>>> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
>>> index 0a12933..867fe6f1 100644
>>> --- a/arch/arm64/Kconfig.debug
>>> +++ b/arch/arm64/Kconfig.debug
>>> @@ -54,4 +54,27 @@ config DEBUG_SET_MODULE_RONX
>>>            against certain classes of kernel exploits.
>>>            If in doubt, say "N".
>>>
>>> +config DEBUG_RODATA
>>> +       bool "Make kernel text and rodata read-only"
>>> +       help
>>> +         If this is set, kernel text and rodata will be made read-only. This
>>> +         is to help catch accidental or malicious attempts to change the
>>> +         kernel's executable code. Additionally splits rodata from kernel
>>> +         text so it can be made explicitly non-executable.
>>> +
>>> +          If in doubt, say Y
>>> +
>>> +config DEBUG_ALIGN_RODATA
>>> +       depends on DEBUG_RODATA && !ARM64_64K_PAGES
>>> +       bool "Align linker sections up to SECTION_SIZE"
>>> +       help
>>> +         If this option is enabled, sections that may potentially be marked as
>>> +         read only or non-executable will be aligned up to the section size of
>>> +         the kernel. This prevents sections from being split into pages and
>>> +         avoids a potential TLB penalty. The downside is an increase in
>>> +         alignment and potentially wasted space. Turn on this option if
>>> +         performance is more important than memory pressure.
>>> +
>>> +         If in doubt, say N
>>> +
>>>  endmenu
>>> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
>>> index 689b637..0014207 100644
>>> --- a/arch/arm64/include/asm/cacheflush.h
>>> +++ b/arch/arm64/include/asm/cacheflush.h
>>> @@ -152,4 +152,8 @@ int set_memory_ro(unsigned long addr, int numpages);
>>>  int set_memory_rw(unsigned long addr, int numpages);
>>>  int set_memory_x(unsigned long addr, int numpages);
>>>  int set_memory_nx(unsigned long addr, int numpages);
>>> +
>>> +#ifdef CONFIG_DEBUG_RODATA
>>> +void mark_rodata_ro(void);
>>> +#endif
>>>  #endif
>>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>>> index 2d7e20a..405820d 100644
>>> --- a/arch/arm64/kernel/vmlinux.lds.S
>>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>>> @@ -8,6 +8,7 @@
>>>  #include <asm/thread_info.h>
>>>  #include <asm/memory.h>
>>>  #include <asm/page.h>
>>> +#include <asm/pgtable.h>
>>>
>>>  #include "image.h"
>>>
>>> @@ -54,6 +55,9 @@ SECTIONS
>>>                 _text = .;
>>>                 HEAD_TEXT
>>>         }
>>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>>> +       . = ALIGN(1<<SECTION_SHIFT);
>>> +#endif
>>
>> Could we have macros defined once above, something like:
>>
>> #ifdef CONFIG_DEBUG_ALIGN_RODATA
>> #define ALIGN_DEBUG_RO                  . = ALIGN(1<<SECTION_SHIFT);
>> #define ALIGN_DEBUG_RO_MIN(min)         ALIGN_DEBUG_RO
>> #else
>> #define ALIGN_DEBUG_RO
>> #define ALIGN_DEBUG_RO_MIN(min)         . = ALIGN(min)
>> #endif
>>
>> Then we can avoid all the ifdefs below.
>>
>>>         .text : {                       /* Real text segment            */
>>>                 _stext = .;             /* Text and read-only data      */
>>>                         *(.latehead.text)
>>> @@ -71,19 +75,35 @@ SECTIONS
>>>                 *(.got)                 /* Global offset table          */
>>>         }
>>>
>>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>>> +       . = ALIGN(1<<SECTION_SHIFT);
>>> +#endif
>>>         RO_DATA(PAGE_SIZE)
>>>         EXCEPTION_TABLE(8)
>>>         NOTES
>>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>>> +       . = ALIGN(1<<SECTION_SHIFT);
>>> +#endif
>>>         _etext = .;                     /* End of text and rodata section */
>>>
>>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>>> +       . = ALIGN(1<<SECTION_SHIFT);
>>> +#else
>>>         . = ALIGN(PAGE_SIZE);
>>> +#endif
>>>         __init_begin = .;
>>>
>>>         INIT_TEXT_SECTION(8)
>>>         .exit.text : {
>>>                 ARM_EXIT_KEEP(EXIT_TEXT)
>>>         }
>>> +
>>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>>> +       . = ALIGN(1<<SECTION_SHIFT);
>>> +       __init_data_begin = .;
>>
>> Is this used anywhere? We seem to have left __init_end after this, and
>> haven't introduced an __init_data_end.
>>
>>> +#else
>>>         . = ALIGN(16);
>>> +#endif
>>>         .init.data : {
>>>                 INIT_DATA
>>>                 INIT_SETUP(16)
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 494297c..61f44c7 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -324,6 +324,7 @@ void __init mem_init(void)
>>>
>>>  void free_initmem(void)
>>>  {
>>> +       fixup_init();
>>>         free_initmem_default(0);
>>>  }
>>>
>>> diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
>>> index d519f4f..82347d8 100644
>>> --- a/arch/arm64/mm/mm.h
>>> +++ b/arch/arm64/mm/mm.h
>>> @@ -1,2 +1,4 @@
>>>  extern void __init bootmem_init(void);
>>>  extern void __init arm64_swiotlb_init(void);
>>> +
>>> +void fixup_init(void);
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 6032f3e..dafde8e 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -26,6 +26,7 @@
>>>  #include <linux/memblock.h>
>>>  #include <linux/fs.h>
>>>  #include <linux/io.h>
>>> +#include <linux/stop_machine.h>
>>>
>>>  #include <asm/cputype.h>
>>>  #include <asm/fixmap.h>
>>> @@ -137,17 +138,55 @@ static void __init *early_alloc(unsigned long sz)
>>>         return ptr;
>>>  }
>>>
>>> -static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>> +/*
>>> + * remap a PMD into pages
>>> + */
>>> +static noinline void __ref split_pmd(pmd_t *pmd, bool early)
>>> +{
>>> +       pte_t *pte, *start_pte;
>>> +       unsigned long pfn;
>>> +       int i = 0;
>>> +
>>> +       if (early)
>>> +               start_pte = pte = early_alloc(PTRS_PER_PTE*sizeof(pte_t));
>>> +       else
>>> +               start_pte = pte = (pte_t *)__get_free_page(PGALLOC_GFP);
>>> +
>>> +       BUG_ON(!pte);
>>> +
>>> +       pfn = pmd_pfn(*pmd);
>>> +
>>> +       do {
>>> +               /*
>>> +                * Need to have the least restrictive permissions available
>>> +                * permissions will be fixed up later
>>> +                */
>>> +               set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
>>> +               pfn++;
>>> +       } while (pte++, i++, i < PTRS_PER_PTE);
>>> +
>>> +
>>> +       __pmd_populate(pmd, __pa(start_pte), PMD_TYPE_TABLE);
>>> +       flush_tlb_all();
>>> +}
>>> +
>>> +static void __ref alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>>                                   unsigned long end, unsigned long pfn,
>>> -                                 pgprot_t prot)
>>> +                                 pgprot_t prot, bool early)
>>>  {
>>>         pte_t *pte;
>>>
>>>         if (pmd_none(*pmd)) {
>>> -               pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
>>> +               if (early)
>>> +                       pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
>>> +               else
>>> +                       pte = (pte_t *)__get_free_page(PGALLOC_GFP);
>>> +               BUG_ON(!pte);
>>>                 __pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
>>>         }
>>> -       BUG_ON(pmd_bad(*pmd));
>>> +
>>> +       if (pmd_bad(*pmd))
>>> +               split_pmd(pmd, early);
>>>
>>>         pte = pte_offset_kernel(pmd, addr);
>>>         do {
>>> @@ -156,29 +195,44 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>>>         } while (pte++, addr += PAGE_SIZE, addr != end);
>>>  }
>>>
>>> -static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>>> +void __ref split_pud(pud_t *old_pud, pmd_t *pmd)
>>> +{
>>> +       unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
>>> +       pgprot_t prot = __pgprot(pud_val(*old_pud) ^ pud_pfn(*old_pud));
>>> +       int i = 0;
>>> +
>>> +       do {
>>> +               set_pmd(pmd, __pmd(addr | prot));
>>> +               addr += PMD_SIZE;
>>> +       } while (pmd++, i++, i < PTRS_PER_PMD);
>>> +}
>>
>> As far as I can see only the functions which call early_alloc
>> (split_pmd, alloc_init_pte, alloc_init_pmd) need the __ref annotation,
>> so we can drop it here and elsewhere.
>>
>> It would be nice if we could avoid the mismatch entirely. Perhaps we
>> could pass an allocator function down instead of the early parameter?
>> Something like:
>>
>> static void *late_alloc(unsigned long sz)
>> {
>>         BUG_ON(PAGE_SIZE < sz);
>>         return (void *)__get_free_page(PGALLOC_GFP);
>> }
>>
>> which we could pass down from create_mapping_late in place of
>> early_alloc for create_id_mapping and create_mapping.
>>
>> Thanks,
>> Mark.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-19 17:38     ` Ard Biesheuvel
  2014-11-19 18:06       ` Ard Biesheuvel
@ 2014-11-19 18:46       ` Mark Rutland
  2014-11-19 18:56         ` Ard Biesheuvel
  1 sibling, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2014-11-19 18:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 19, 2014 at 05:38:07PM +0000, Ard Biesheuvel wrote:
> On 19 November 2014 17:31, Mark Rutland <mark.rutland@arm.com> wrote:
> > Hi Laura,
> >
> > On Tue, Nov 18, 2014 at 12:55:05AM +0000, Laura Abbott wrote:
> >> Add page protections for arm64 similar to those in arm.
> >> This is for security reasons to prevent certain classes
> >> of exploits. The current method:
> >>
> >> - Map all memory as either RWX or RW. We round to the nearest
> >>   section to avoid creating page tables before everything is mapped
> >> - Once everything is mapped, if either end of the RWX section should
> >>   not be X, we split the PMD and remap as necessary
> >> - When initmem is to be freed, we change the permissions back to
> >>   RW (using stop machine if necessary to flush the TLB)
> >> - If CONFIG_DEBUG_RODATA is set, the read only sections are set
> >>   read only.
> >>
> >> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
> >> ---
> >> v5:
> >>  -Dropped map_io from the alloc_init_* functions
> >>  -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
> >>  -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
> >>   that the section mapping is actually kept and we don't leak any RWX regions
> >>  -Dropped some old ioremap code that somehow snuck in from the last rebase
> >>  -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
> >>   called early
> >> ---
> >>  arch/arm64/Kconfig.debug            |  23 +++
> >>  arch/arm64/include/asm/cacheflush.h |   4 +
> >>  arch/arm64/kernel/vmlinux.lds.S     |  20 +++
> >>  arch/arm64/mm/init.c                |   1 +
> >>  arch/arm64/mm/mm.h                  |   2 +
> >>  arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
> >>  6 files changed, 298 insertions(+), 41 deletions(-)
> >
> > Unfortunately this is blowing up on my Juno atop of v3.18-rc5, because
> > EFI runtime services seem to get mapped as non-executable, and at boot
> > we tried to read the RTC via runtime services:
> >
> > Bad mode in Synchronous Abort handler detected, code 0x8600000d
> > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc5+ #23
> > task: ffffffc9768b8000 ti: ffffffc9768c0000 task.ti: ffffffc9768c0000
> > PC is at 0xffffffc97ffaa3e4
> > LR is at virt_efi_get_time+0x60/0xa0
> > pc : [<ffffffc97ffaa3e4>] lr : [<ffffffc00044a6c8>] pstate: 800000c5
> > sp : ffffffc9768c39e0
> > x29: ffffffc9768c39e0 x28: ffffffc000723c90
> > x27: ffffffc0007b5678 x26: 0000000000000001
> > x25: ffffffc000584320 x24: ffffffc0006aaeb0
> > x23: ffffffc9768c3a60 x22: 0000000000000040
> > x21: ffffffc97ffaa3e4 x20: ffffffc0007b5a48
> > x19: ffffffc0007b5a40 x18: 0000000000000007
> > x17: 000000000000000e x16: 0000000000000001
> > x15: 0000000000000007 x14: 0ffffffffffffffe
> > x13: ffffffbe22feb720 x12: 0000000000000030
> > x11: 0000000000000003 x10: 0000000000000068
> > x9 : 0000000000000000 x8 : ffffffc97f981c00
> > x7 : 0000000000000000 x6 : 000000000000003f
> > x5 : 0000000000000040 x4 : 000000097f6c8000
> > x3 : 0000000000000000 x2 : 000000097f6c8000
> > x1 : ffffffc9768c3a50 x0 : ffffffc9768c3a60
> >
> > Ard, is this issue solved by any current series, or do we need to do
> > anything additional to ensure runtime services are mapped appropriately
> > for DEBUG_RODATA kernels?
> >
> 
> This problem will just vanish with my uefi virtmap series, because
> then UEFI manages its own page tables.
> I use PXN for everything except EFI_RUNTIME_SERVICES_CODE (which
> [sadly] contains read-write data as well, due to how the PE/COFF
> loader in UEFI is usually implemented, i.e., it allocates one single
> region to hold the entire static image, including .data and .bss).
> Once these changes land, the UEFI page tables will only be active
> during actual Runtime Services calls, so the attack surface should be
> reduced substantially.
> 
> For now, I guess adding a set_memory_x() in remap_region() for
> EFI_RUNTIME_SERVICES_CODE should suffice?

Do we map that as module memory? I see change_memory_common (backing
set_memory_x) will fail if:

	(!is_module_address(start) || !is_module_address(end - 1)

If it is, I guess that will work for now.

Mark.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-19 18:46       ` Mark Rutland
@ 2014-11-19 18:56         ` Ard Biesheuvel
  2014-11-19 19:20           ` Laura Abbott
  0 siblings, 1 reply; 24+ messages in thread
From: Ard Biesheuvel @ 2014-11-19 18:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 November 2014 19:46, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Nov 19, 2014 at 05:38:07PM +0000, Ard Biesheuvel wrote:
>> On 19 November 2014 17:31, Mark Rutland <mark.rutland@arm.com> wrote:
>> > Hi Laura,
>> >
>> > On Tue, Nov 18, 2014 at 12:55:05AM +0000, Laura Abbott wrote:
>> >> Add page protections for arm64 similar to those in arm.
>> >> This is for security reasons to prevent certain classes
>> >> of exploits. The current method:
>> >>
>> >> - Map all memory as either RWX or RW. We round to the nearest
>> >>   section to avoid creating page tables before everything is mapped
>> >> - Once everything is mapped, if either end of the RWX section should
>> >>   not be X, we split the PMD and remap as necessary
>> >> - When initmem is to be freed, we change the permissions back to
>> >>   RW (using stop machine if necessary to flush the TLB)
>> >> - If CONFIG_DEBUG_RODATA is set, the read only sections are set
>> >>   read only.
>> >>
>> >> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
>> >> ---
>> >> v5:
>> >>  -Dropped map_io from the alloc_init_* functions
>> >>  -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
>> >>  -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
>> >>   that the section mapping is actually kept and we don't leak any RWX regions
>> >>  -Dropped some old ioremap code that somehow snuck in from the last rebase
>> >>  -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
>> >>   called early
>> >> ---
>> >>  arch/arm64/Kconfig.debug            |  23 +++
>> >>  arch/arm64/include/asm/cacheflush.h |   4 +
>> >>  arch/arm64/kernel/vmlinux.lds.S     |  20 +++
>> >>  arch/arm64/mm/init.c                |   1 +
>> >>  arch/arm64/mm/mm.h                  |   2 +
>> >>  arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
>> >>  6 files changed, 298 insertions(+), 41 deletions(-)
>> >
>> > Unfortunately this is blowing up on my Juno atop of v3.18-rc5, because
>> > EFI runtime services seem to get mapped as non-executable, and at boot
>> > we tried to read the RTC via runtime services:
>> >
>> > Bad mode in Synchronous Abort handler detected, code 0x8600000d
>> > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc5+ #23
>> > task: ffffffc9768b8000 ti: ffffffc9768c0000 task.ti: ffffffc9768c0000
>> > PC is at 0xffffffc97ffaa3e4
>> > LR is at virt_efi_get_time+0x60/0xa0
>> > pc : [<ffffffc97ffaa3e4>] lr : [<ffffffc00044a6c8>] pstate: 800000c5
>> > sp : ffffffc9768c39e0
>> > x29: ffffffc9768c39e0 x28: ffffffc000723c90
>> > x27: ffffffc0007b5678 x26: 0000000000000001
>> > x25: ffffffc000584320 x24: ffffffc0006aaeb0
>> > x23: ffffffc9768c3a60 x22: 0000000000000040
>> > x21: ffffffc97ffaa3e4 x20: ffffffc0007b5a48
>> > x19: ffffffc0007b5a40 x18: 0000000000000007
>> > x17: 000000000000000e x16: 0000000000000001
>> > x15: 0000000000000007 x14: 0ffffffffffffffe
>> > x13: ffffffbe22feb720 x12: 0000000000000030
>> > x11: 0000000000000003 x10: 0000000000000068
>> > x9 : 0000000000000000 x8 : ffffffc97f981c00
>> > x7 : 0000000000000000 x6 : 000000000000003f
>> > x5 : 0000000000000040 x4 : 000000097f6c8000
>> > x3 : 0000000000000000 x2 : 000000097f6c8000
>> > x1 : ffffffc9768c3a50 x0 : ffffffc9768c3a60
>> >
>> > Ard, is this issue solved by any current series, or do we need to do
>> > anything additional to ensure runtime services are mapped appropriately
>> > for DEBUG_RODATA kernels?
>> >
>>
>> This problem will just vanish with my uefi virtmap series, because
>> then UEFI manages its own page tables.
>> I use PXN for everything except EFI_RUNTIME_SERVICES_CODE (which
>> [sadly] contains read-write data as well, due to how the PE/COFF
>> loader in UEFI is usually implemented, i.e., it allocates one single
>> region to hold the entire static image, including .data and .bss).
>> Once these changes land, the UEFI page tables will only be active
>> during actual Runtime Services calls, so the attack surface should be
>> reduced substantially.
>>
>> For now, I guess adding a set_memory_x() in remap_region() for
>> EFI_RUNTIME_SERVICES_CODE should suffice?
>
> Do we map that as module memory? I see change_memory_common (backing
> set_memory_x) will fail if:
>
>         (!is_module_address(start) || !is_module_address(end - 1)
>
> If it is, I guess that will work for now.
>

Oh right, no we don't map it as module memory as far as I know.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-19 18:56         ` Ard Biesheuvel
@ 2014-11-19 19:20           ` Laura Abbott
  0 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-19 19:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/19/2014 10:56 AM, Ard Biesheuvel wrote:
> On 19 November 2014 19:46, Mark Rutland <mark.rutland@arm.com> wrote:
>> On Wed, Nov 19, 2014 at 05:38:07PM +0000, Ard Biesheuvel wrote:
>>> On 19 November 2014 17:31, Mark Rutland <mark.rutland@arm.com> wrote:
>>>> Hi Laura,
>>>>
>>>> On Tue, Nov 18, 2014 at 12:55:05AM +0000, Laura Abbott wrote:
>>>>> Add page protections for arm64 similar to those in arm.
>>>>> This is for security reasons to prevent certain classes
>>>>> of exploits. The current method:
>>>>>
>>>>> - Map all memory as either RWX or RW. We round to the nearest
>>>>>    section to avoid creating page tables before everything is mapped
>>>>> - Once everything is mapped, if either end of the RWX section should
>>>>>    not be X, we split the PMD and remap as necessary
>>>>> - When initmem is to be freed, we change the permissions back to
>>>>>    RW (using stop machine if necessary to flush the TLB)
>>>>> - If CONFIG_DEBUG_RODATA is set, the read only sections are set
>>>>>    read only.
>>>>>
>>>>> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
>>>>> ---
>>>>> v5:
>>>>>   -Dropped map_io from the alloc_init_* functions
>>>>>   -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
>>>>>   -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
>>>>>    that the section mapping is actually kept and we don't leak any RWX regions
>>>>>   -Dropped some old ioremap code that somehow snuck in from the last rebase
>>>>>   -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
>>>>>    called early
>>>>> ---
>>>>>   arch/arm64/Kconfig.debug            |  23 +++
>>>>>   arch/arm64/include/asm/cacheflush.h |   4 +
>>>>>   arch/arm64/kernel/vmlinux.lds.S     |  20 +++
>>>>>   arch/arm64/mm/init.c                |   1 +
>>>>>   arch/arm64/mm/mm.h                  |   2 +
>>>>>   arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
>>>>>   6 files changed, 298 insertions(+), 41 deletions(-)
>>>>
>>>> Unfortunately this is blowing up on my Juno atop of v3.18-rc5, because
>>>> EFI runtime services seem to get mapped as non-executable, and at boot
>>>> we tried to read the RTC via runtime services:
>>>>
>>>> Bad mode in Synchronous Abort handler detected, code 0x8600000d
>>>> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc5+ #23
>>>> task: ffffffc9768b8000 ti: ffffffc9768c0000 task.ti: ffffffc9768c0000
>>>> PC is at 0xffffffc97ffaa3e4
>>>> LR is at virt_efi_get_time+0x60/0xa0
>>>> pc : [<ffffffc97ffaa3e4>] lr : [<ffffffc00044a6c8>] pstate: 800000c5
>>>> sp : ffffffc9768c39e0
>>>> x29: ffffffc9768c39e0 x28: ffffffc000723c90
>>>> x27: ffffffc0007b5678 x26: 0000000000000001
>>>> x25: ffffffc000584320 x24: ffffffc0006aaeb0
>>>> x23: ffffffc9768c3a60 x22: 0000000000000040
>>>> x21: ffffffc97ffaa3e4 x20: ffffffc0007b5a48
>>>> x19: ffffffc0007b5a40 x18: 0000000000000007
>>>> x17: 000000000000000e x16: 0000000000000001
>>>> x15: 0000000000000007 x14: 0ffffffffffffffe
>>>> x13: ffffffbe22feb720 x12: 0000000000000030
>>>> x11: 0000000000000003 x10: 0000000000000068
>>>> x9 : 0000000000000000 x8 : ffffffc97f981c00
>>>> x7 : 0000000000000000 x6 : 000000000000003f
>>>> x5 : 0000000000000040 x4 : 000000097f6c8000
>>>> x3 : 0000000000000000 x2 : 000000097f6c8000
>>>> x1 : ffffffc9768c3a50 x0 : ffffffc9768c3a60
>>>>
>>>> Ard, is this issue solved by any current series, or do we need to do
>>>> anything additional to ensure runtime services are mapped appropriately
>>>> for DEBUG_RODATA kernels?
>>>>
>>>
>>> This problem will just vanish with my uefi virtmap series, because
>>> then UEFI manages its own page tables.
>>> I use PXN for everything except EFI_RUNTIME_SERVICES_CODE (which
>>> [sadly] contains read-write data as well, due to how the PE/COFF
>>> loader in UEFI is usually implemented, i.e., it allocates one single
>>> region to hold the entire static image, including .data and .bss).
>>> Once these changes land, the UEFI page tables will only be active
>>> during actual Runtime Services calls, so the attack surface should be
>>> reduced substantially.
>>>
>>> For now, I guess adding a set_memory_x() in remap_region() for
>>> EFI_RUNTIME_SERVICES_CODE should suffice?
>>
>> Do we map that as module memory? I see change_memory_common (backing
>> set_memory_x) will fail if:
>>
>>          (!is_module_address(start) || !is_module_address(end - 1)
>>
>> If it is, I guess that will work for now.
>>
>
> Oh right, no we don't map it as module memory as far as I know.
>

Even if we remove that check, calling set_memory_x currently won't
work unless the underlying memory is mapped with pages. Modules
are mapped with pages but the rest of memory is not guaranteed to
have pages.
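
(For illustration only: a minimal sketch of the constraint described
above, using the usual page table accessors; this is not the actual
change_memory_common() path. A pte-level attribute change has nothing
to modify while the region is still covered by a block mapping.)

static bool mapped_with_pages(unsigned long addr)
{
	pgd_t *pgd = pgd_offset_k(addr);
	pud_t *pud;
	pmd_t *pmd;

	if (pgd_none(*pgd))
		return false;
	pud = pud_offset(pgd, addr);
	if (pud_none(*pud) || pud_sect(*pud))	/* 1G block: no pmd table */
		return false;
	pmd = pmd_offset(pud, addr);
	if (pmd_none(*pmd) || pmd_sect(*pmd))	/* section block: no pte table */
		return false;

	return true;	/* page-mapped: pte_offset_kernel()/set_pte() apply */
}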

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 0/7] Better page protections for arm64
  2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
                   ` (6 preceding siblings ...)
  2014-11-18  0:55 ` [PATCHv5 7/7] arm64: add better page protections to arm64 Laura Abbott
@ 2014-11-19 22:33 ` Kees Cook
  2014-11-19 22:37   ` Laura Abbott
  7 siblings, 1 reply; 24+ messages in thread
From: Kees Cook @ 2014-11-19 22:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 17, 2014 at 4:54 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
> Hi,
>
> This is v5 of the series to add stricter page protections for arm64.
> The goal is to have text be RO/NX and everything else be RW/NX.
> I finally got my hands on a Juno board so I was able to do more
> testing with both 4K and 64K pages although I still haven't tested
> with EFI. This is based off of 3.18-rc5.
>
> Thanks,
> Laura
>
> Laura Abbott (7):
>   arm64: Treat handle_arch_irq as a function pointer
>   arm64: Switch to adrp for loading the stub vectors
>   arm64: Move cpu_resume into the text section
>   arm64: Move some head.text functions to executable section
>   arm64: Factor out fixmap initialiation from ioremap
>   arm64: use fixmap for text patching when text is RO
>   arm64: add better page protections to arm64
>
>  arch/arm64/Kconfig.debug            |  23 ++
>  arch/arm64/include/asm/cacheflush.h |   4 +
>  arch/arm64/include/asm/fixmap.h     |   8 +-
>  arch/arm64/include/asm/insn.h       |   2 +
>  arch/arm64/include/asm/irq.h        |   1 -
>  arch/arm64/kernel/entry.S           |   6 +-
>  arch/arm64/kernel/head.S            | 409 +++++++++++++++++-----------------
>  arch/arm64/kernel/insn.c            |  72 +++++-
>  arch/arm64/kernel/irq.c             |   2 +
>  arch/arm64/kernel/jump_label.c      |   2 +-
>  arch/arm64/kernel/setup.c           |   1 +
>  arch/arm64/kernel/sleep.S           |  29 +--
>  arch/arm64/kernel/suspend.c         |   4 +-
>  arch/arm64/kernel/vmlinux.lds.S     |  21 ++
>  arch/arm64/mm/init.c                |   1 +
>  arch/arm64/mm/ioremap.c             |  93 +-------
>  arch/arm64/mm/mm.h                  |   2 +
>  arch/arm64/mm/mmu.c                 | 429 ++++++++++++++++++++++++++++++++----
>  18 files changed, 743 insertions(+), 366 deletions(-)

Thanks for working on this series! I've tested this on my aarch64
hardware, and it worked nicely. :) Consider the whole series as:

Tested-by: Kees Cook <keescook@chromium.org>

Has anyone looked at getting an arm64 version of CONFIG_ARM_PTDUMP
built? It'd be really nice to be able to check page table layout at a
glance.

In the meantime, with this patch series, the "WRITE_RO" and
"WRITE_KERN" tests from lkdtm correctly Oops the kernel.

Thanks!

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 0/7] Better page protections for arm64
  2014-11-19 22:33 ` [PATCHv5 0/7] Better page protections for arm64 Kees Cook
@ 2014-11-19 22:37   ` Laura Abbott
  0 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-19 22:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/19/2014 2:33 PM, Kees Cook wrote:
> On Mon, Nov 17, 2014 at 4:54 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
>> Hi,
>>
>> This is v5 of the series to add stricter page protections for arm64.
>> The goal is to have text be RO/NX and everything else be RW/NX.
>> I finally got my hands on a Juno board so I was able to do more
>> testing with both 4K and 64K pages although I still haven't tested
>> with EFI. This is based off of 3.18-rc5.
>>
>> Thanks,
>> Laura
>>
>> Laura Abbott (7):
>>    arm64: Treat handle_arch_irq as a function pointer
>>    arm64: Switch to adrp for loading the stub vectors
>>    arm64: Move cpu_resume into the text section
>>    arm64: Move some head.text functions to executable section
>>    arm64: Factor out fixmap initialiation from ioremap
>>    arm64: use fixmap for text patching when text is RO
>>    arm64: add better page protections to arm64
>>
>>   arch/arm64/Kconfig.debug            |  23 ++
>>   arch/arm64/include/asm/cacheflush.h |   4 +
>>   arch/arm64/include/asm/fixmap.h     |   8 +-
>>   arch/arm64/include/asm/insn.h       |   2 +
>>   arch/arm64/include/asm/irq.h        |   1 -
>>   arch/arm64/kernel/entry.S           |   6 +-
>>   arch/arm64/kernel/head.S            | 409 +++++++++++++++++-----------------
>>   arch/arm64/kernel/insn.c            |  72 +++++-
>>   arch/arm64/kernel/irq.c             |   2 +
>>   arch/arm64/kernel/jump_label.c      |   2 +-
>>   arch/arm64/kernel/setup.c           |   1 +
>>   arch/arm64/kernel/sleep.S           |  29 +--
>>   arch/arm64/kernel/suspend.c         |   4 +-
>>   arch/arm64/kernel/vmlinux.lds.S     |  21 ++
>>   arch/arm64/mm/init.c                |   1 +
>>   arch/arm64/mm/ioremap.c             |  93 +-------
>>   arch/arm64/mm/mm.h                  |   2 +
>>   arch/arm64/mm/mmu.c                 | 429 ++++++++++++++++++++++++++++++++----
>>   18 files changed, 743 insertions(+), 366 deletions(-)
>
> Thanks for working on this series! I've tested this on my aarch64
> hardware, and it worked nicely. :) Consider the whole series as:
>
> Tested-by: Kees Cook <keescook@chromium.org>
>
> Has anyone looked at getting an arm64 version of CONFIG_ARM_PTDUMP
> built? It'd be really nice to be able to check page table layout at a
> glance.
>

Yep, I have a version of that

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/303418.html

Testing appreciated as always :)

> In the meantime, with this patch series, the "WRITE_RO" and
> "WRITE_KERN" tests from lkdtm correctly Oops the kernel.
>
> Thanks!
>
> -Kees
>

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-18  0:55 ` [PATCHv5 7/7] arm64: add better page protections to arm64 Laura Abbott
  2014-11-19 16:31   ` Mark Rutland
@ 2014-11-20 12:04   ` Steve Capper
  2014-11-21  1:02     ` Laura Abbott
  1 sibling, 1 reply; 24+ messages in thread
From: Steve Capper @ 2014-11-20 12:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 17, 2014 at 04:55:05PM -0800, Laura Abbott wrote:
> Add page protections for arm64 similar to those in arm.
> This is for security reasons to prevent certain classes
> of exploits. The current method:
> 
> - Map all memory as either RWX or RW. We round to the nearest
>   section to avoid creating page tables before everything is mapped
> - Once everything is mapped, if either end of the RWX section should
>   not be X, we split the PMD and remap as necessary
> - When initmem is to be freed, we change the permissions back to
>   RW (using stop machine if necessary to flush the TLB)
> - If CONFIG_DEBUG_RODATA is set, the read only sections are set
>   read only.
> 
> Signed-off-by: Laura Abbott <lauraa@codeaurora.org>

Hi Laura,
Some comments inline.

> ---
> v5:
>  -Dropped map_io from the alloc_init_* functions
>  -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
>  -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
>   that the section mapping is actually kept and we don't leak any RWX regions
>  -Dropped some old ioremap code that somehow snuck in from the last rebase
>  -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
>   called early
> ---
>  arch/arm64/Kconfig.debug            |  23 +++
>  arch/arm64/include/asm/cacheflush.h |   4 +
>  arch/arm64/kernel/vmlinux.lds.S     |  20 +++
>  arch/arm64/mm/init.c                |   1 +
>  arch/arm64/mm/mm.h                  |   2 +
>  arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
>  6 files changed, 298 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
> index 0a12933..867fe6f1 100644
> --- a/arch/arm64/Kconfig.debug
> +++ b/arch/arm64/Kconfig.debug
> @@ -54,4 +54,27 @@ config DEBUG_SET_MODULE_RONX
>            against certain classes of kernel exploits.
>            If in doubt, say "N".
>  
> +config DEBUG_RODATA
> +	bool "Make kernel text and rodata read-only"
> +	help
> +	  If this is set, kernel text and rodata will be made read-only. This
> +	  is to help catch accidental or malicious attempts to change the
> +	  kernel's executable code. Additionally splits rodata from kernel
> +	  text so it can be made explicitly non-executable.
> +
> +          If in doubt, say Y
> +
> +config DEBUG_ALIGN_RODATA
> +	depends on DEBUG_RODATA && !ARM64_64K_PAGES
> +	bool "Align linker sections up to SECTION_SIZE"
> +	help
> +	  If this option is enabled, sections that may potentially be marked as
> +	  read only or non-executable will be aligned up to the section size of
> +	  the kernel. This prevents sections from being split into pages and
> +	  avoids a potential TLB penalty. The downside is an increase in
> +	  alignment and potentially wasted space. Turn on this option if
> +	  performance is more important than memory pressure.
> +
> +	  If in doubt, say N
> +
>  endmenu
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 689b637..0014207 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -152,4 +152,8 @@ int set_memory_ro(unsigned long addr, int numpages);
>  int set_memory_rw(unsigned long addr, int numpages);
>  int set_memory_x(unsigned long addr, int numpages);
>  int set_memory_nx(unsigned long addr, int numpages);
> +
> +#ifdef CONFIG_DEBUG_RODATA
> +void mark_rodata_ro(void);
> +#endif
>  #endif
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 2d7e20a..405820d 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -8,6 +8,7 @@
>  #include <asm/thread_info.h>
>  #include <asm/memory.h>
>  #include <asm/page.h>
> +#include <asm/pgtable.h>
>  
>  #include "image.h"
>  
> @@ -54,6 +55,9 @@ SECTIONS
>  		_text = .;
>  		HEAD_TEXT
>  	}
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +	. = ALIGN(1<<SECTION_SHIFT);
> +#endif

For 64K pages, SECTION_SHIFT will be 29; will that large an alignment
cause any issues?

>  	.text : {			/* Real text segment		*/
>  		_stext = .;		/* Text and read-only data	*/
>  			*(.latehead.text)
> @@ -71,19 +75,35 @@ SECTIONS
>  		*(.got)			/* Global offset table		*/
>  	}
>  
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +	. = ALIGN(1<<SECTION_SHIFT);
> +#endif
>  	RO_DATA(PAGE_SIZE)
>  	EXCEPTION_TABLE(8)
>  	NOTES
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +	. = ALIGN(1<<SECTION_SHIFT);
> +#endif
>  	_etext = .;			/* End of text and rodata section */
>  
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +	. = ALIGN(1<<SECTION_SHIFT);
> +#else
>  	. = ALIGN(PAGE_SIZE);
> +#endif
>  	__init_begin = .;
>  
>  	INIT_TEXT_SECTION(8)
>  	.exit.text : {
>  		ARM_EXIT_KEEP(EXIT_TEXT)
>  	}
> +
> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
> +	. = ALIGN(1<<SECTION_SHIFT);
> +	__init_data_begin = .;
> +#else
>  	. = ALIGN(16);
> +#endif
>  	.init.data : {
>  		INIT_DATA
>  		INIT_SETUP(16)
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 494297c..61f44c7 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -324,6 +324,7 @@ void __init mem_init(void)
>  
>  void free_initmem(void)
>  {
> +	fixup_init();
>  	free_initmem_default(0);
>  }
>  
> diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
> index d519f4f..82347d8 100644
> --- a/arch/arm64/mm/mm.h
> +++ b/arch/arm64/mm/mm.h
> @@ -1,2 +1,4 @@
>  extern void __init bootmem_init(void);
>  extern void __init arm64_swiotlb_init(void);
> +
> +void fixup_init(void);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 6032f3e..dafde8e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -26,6 +26,7 @@
>  #include <linux/memblock.h>
>  #include <linux/fs.h>
>  #include <linux/io.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/cputype.h>
>  #include <asm/fixmap.h>
> @@ -137,17 +138,55 @@ static void __init *early_alloc(unsigned long sz)
>  	return ptr;
>  }
>  
> -static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
> +/*
> + * remap a PMD into pages
> + */
> +static noinline void __ref split_pmd(pmd_t *pmd, bool early)
> +{
> +	pte_t *pte, *start_pte;
> +	unsigned long pfn;
> +	int i = 0;
> +
> +	if (early)
> +		start_pte = pte = early_alloc(PTRS_PER_PTE*sizeof(pte_t));
> +	else
> +		start_pte = pte = (pte_t *)__get_free_page(PGALLOC_GFP);
> +
> +	BUG_ON(!pte);
> +
> +	pfn = pmd_pfn(*pmd);
> +
> +	do {
> +		/*
> +		 * Need to have the least restrictive permissions available
> +		 * permissions will be fixed up later
> +		 */
> +		set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
> +		pfn++;
> +	} while (pte++, i++, i < PTRS_PER_PTE);
> +
> +
> +	__pmd_populate(pmd, __pa(start_pte), PMD_TYPE_TABLE);
> +	flush_tlb_all();
> +}
> +
> +static void __ref alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  				  unsigned long end, unsigned long pfn,
> -				  pgprot_t prot)
> +				  pgprot_t prot, bool early)
>  {
>  	pte_t *pte;
>  
>  	if (pmd_none(*pmd)) {
> -		pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
> +		if (early)
> +			pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
> +		else
> +			pte = (pte_t *)__get_free_page(PGALLOC_GFP);
> +		BUG_ON(!pte);
>  		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
>  	}
> -	BUG_ON(pmd_bad(*pmd));
> +
> +	if (pmd_bad(*pmd))
> +		split_pmd(pmd, early);
>  
>  	pte = pte_offset_kernel(pmd, addr);
>  	do {
> @@ -156,29 +195,44 @@ static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  }
>  
> -static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
> +void __ref split_pud(pud_t *old_pud, pmd_t *pmd)
> +{
> +	unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
> +	pgprot_t prot = __pgprot(pud_val(*old_pud) ^ pud_pfn(*old_pud));

I don't think this is quite right, the page frame number is the PA
shifted right, so that can't be XOR'ed directly with the pud. The
following should work though:
	pgprot_t prot = __pgprot(pud_val(*old_pud) ^ addr);

> +	int i = 0;
> +
> +	do {
> +		set_pmd(pmd, __pmd(addr | prot));
> +		addr += PMD_SIZE;
> +	} while (pmd++, i++, i < PTRS_PER_PMD);
> +}
> +
> +static void __ref alloc_init_pmd(pud_t *pud, unsigned long addr,
>  				  unsigned long end, phys_addr_t phys,
> -				  int map_io)
> +				  pgprot_t sect_prot, pgprot_t pte_prot,
> +				  bool early)
>  {
>  	pmd_t *pmd;
>  	unsigned long next;
> -	pmdval_t prot_sect;
> -	pgprot_t prot_pte;
> -
> -	if (map_io) {
> -		prot_sect = PROT_SECT_DEVICE_nGnRE;
> -		prot_pte = __pgprot(PROT_DEVICE_nGnRE);
> -	} else {
> -		prot_sect = PROT_SECT_NORMAL_EXEC;
> -		prot_pte = PAGE_KERNEL_EXEC;
> -	}
>  
>  	/*
>  	 * Check for initial section mappings in the pgd/pud and remove them.
>  	 */
>  	if (pud_none(*pud) || pud_bad(*pud)) {
> -		pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
> +		if (early)
> +			pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
> +		else
> +			pmd = pmd_alloc_one(&init_mm, addr);
> +		BUG_ON(!pmd);
> +		if (pud_sect(*pud)) {
> +			/*
> +			 * need to have the 1G of mappings continue to be
> +			 * present
> +			 */
> +			split_pud(pud, pmd);
> +		}
>  		pud_populate(&init_mm, pud, pmd);
> +		flush_tlb_all();
>  	}
>  
>  	pmd = pmd_offset(pud, addr);
> @@ -186,8 +240,8 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>  		next = pmd_addr_end(addr, end);
>  		/* try section mapping first */
>  		if (((addr | next | phys) & ~SECTION_MASK) == 0) {
> -			pmd_t old_pmd =*pmd;
> -			set_pmd(pmd, __pmd(phys | prot_sect));
> +			pmd_t old_pmd = *pmd;
> +			set_pmd(pmd, __pmd(phys | sect_prot));
>  			/*
>  			 * Check for previous table entries created during
>  			 * boot (__create_page_tables) and flush them.
> @@ -196,15 +250,39 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>  				flush_tlb_all();
>  		} else {
>  			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
> -				       prot_pte);
> +					pte_prot, early);
>  		}
>  		phys += next - addr;
>  	} while (pmd++, addr = next, addr != end);
>  }
>  
> -static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
> -				  unsigned long end, phys_addr_t phys,
> -				  int map_io)
> +static inline bool use_1G_block(unsigned long addr, unsigned long next,
> +			unsigned long phys, pgprot_t sect_prot,
> +			pgprot_t pte_prot, bool early)
> +{
> +	if (!early)
> +		return false;
> +
> +	if (PAGE_SHIFT != 12)
> +		return false;
> +
> +	if (((addr | next | phys) & ~PUD_MASK) != 0)
> +		return false;
> +
> +	/*
> +	 * The assumption here is that if the memory is anything other
> +	 * than normal we should not be using a block type
> +	 */
> +	return ((sect_prot & PMD_ATTRINDX_MASK) ==
> +				PMD_ATTRINDX(MT_NORMAL)) &&
> +			((pte_prot & PTE_ATTRINDX_MASK) ==
> +				PTE_ATTRINDX(MT_NORMAL));

Do we need this check for memory type?
A block mapping for device memory is perfectly valid (just unlikely!).

> +}
> +
> +static void __ref alloc_init_pud(pgd_t *pgd, unsigned long addr,
> +				  unsigned long end, unsigned long phys,
> +				  pgprot_t sect_prot, pgprot_t pte_prot,
> +				  bool early)
>  {
>  	pud_t *pud;
>  	unsigned long next;
> @@ -222,10 +300,9 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
>  		/*
>  		 * For 4K granule only, attempt to put down a 1GB block
>  		 */
> -		if (!map_io && (PAGE_SHIFT == 12) &&
> -		    ((addr | next | phys) & ~PUD_MASK) == 0) {
> +		if (use_1G_block(addr, next, phys, sect_prot, pte_prot, early)) {
>  			pud_t old_pud = *pud;
> -			set_pud(pud, __pud(phys | PROT_SECT_NORMAL_EXEC));
> +			set_pud(pud, __pud(phys | sect_prot));
>  
>  			/*
>  			 * If we have an old value for a pud, it will
> @@ -240,7 +317,8 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
>  				flush_tlb_all();
>  			}
>  		} else {
> -			alloc_init_pmd(pud, addr, next, phys, map_io);
> +			alloc_init_pmd(pud, addr, next, phys, sect_prot,
> +					pte_prot, early);
>  		}
>  		phys += next - addr;
>  	} while (pud++, addr = next, addr != end);
> @@ -250,9 +328,11 @@ static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
>   * Create the page directory entries and any necessary page tables for the
>   * mapping specified by 'md'.
>   */
> -static void __init __create_mapping(pgd_t *pgd, phys_addr_t phys,
> -				    unsigned long virt, phys_addr_t size,
> -				    int map_io)
> +static void __ref __create_mapping(pgd_t *pgd, phys_addr_t phys,
> +				  unsigned long virt,
> +				  phys_addr_t size,
> +				  pgprot_t sect_prot, pgprot_t pte_prot,
> +				  bool early)
>  {
>  	unsigned long addr, length, end, next;
>  
> @@ -262,31 +342,102 @@ static void __init __create_mapping(pgd_t *pgd, phys_addr_t phys,
>  	end = addr + length;
>  	do {
>  		next = pgd_addr_end(addr, end);
> -		alloc_init_pud(pgd, addr, next, phys, map_io);
> +		alloc_init_pud(pgd, addr, next, phys, sect_prot, pte_prot,
> +				early);
>  		phys += next - addr;
>  	} while (pgd++, addr = next, addr != end);
>  }
>  
> -static void __init create_mapping(phys_addr_t phys, unsigned long virt,
> -				  phys_addr_t size)
> +void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
> +{
> +	pgprot_t sect_prot = PROT_SECT_NORMAL_EXEC;
> +	pgprot_t pte_prot = PAGE_KERNEL_EXEC;
> +
> +	if ((addr >> PGDIR_SHIFT) >= ARRAY_SIZE(idmap_pg_dir)) {
> +		pr_warn("BUG: not creating id mapping for %pa\n", &addr);
> +		return;
> +	}
> +
> +	if (map_io) {
> +		sect_prot = PROT_SECT_DEVICE_nGnRE;
> +		pte_prot = __pgprot(PROT_DEVICE_nGnRE);
> +	}
> +
> +	__create_mapping(&idmap_pg_dir[pgd_index(addr)],
> +			 addr, addr, size, sect_prot, pte_prot, true);
> +}
> +
> +static void __ref create_mapping(phys_addr_t phys, unsigned long virt,
> +				  phys_addr_t size,
> +				  pgprot_t sect_prot, pgprot_t pte_prot)
>  {
>  	if (virt < VMALLOC_START) {
>  		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
>  			&phys, virt);
>  		return;
>  	}
> -	__create_mapping(pgd_offset_k(virt & PAGE_MASK), phys, virt, size, 0);
> +
> +	return __create_mapping(pgd_offset_k(virt & PAGE_MASK), phys, virt,
> +				size, sect_prot, pte_prot, true);
>  }
>  
> -void __init create_id_mapping(phys_addr_t addr, phys_addr_t size, int map_io)
> +static void __ref create_mapping_late(phys_addr_t phys, unsigned long virt,
> +				  phys_addr_t size,
> +				  pgprot_t sect_prot, pgprot_t pte_prot)
>  {
> -	if ((addr >> PGDIR_SHIFT) >= ARRAY_SIZE(idmap_pg_dir)) {
> -		pr_warn("BUG: not creating id mapping for %pa\n", &addr);
> +	if (virt < VMALLOC_START) {
> +		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
> +			&phys, virt);
>  		return;
>  	}
> -	__create_mapping(&idmap_pg_dir[pgd_index(addr)],
> -			 addr, addr, size, map_io);
> +
> +	return __create_mapping(pgd_offset_k(virt & PAGE_MASK), phys, virt,
> +				size, sect_prot, pte_prot, false);
> +}
> +
> +#ifdef CONFIG_DEBUG_RODATA
> +static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
> +{
> +	/*
> +	 * Set up the executable regions using the existing section mappings
> +	 * for now. This will get more fine grained later once all memory
> +	 * is mapped
> +	 */
> +	unsigned long kernel_x_start = round_down(__pa(_stext), SECTION_SIZE);
> +	unsigned long kernel_x_end = round_up(__pa(__init_end), SECTION_SIZE);
> +
> +	if (end < kernel_x_start) {
> +		create_mapping(start, __phys_to_virt(start),
> +			end - start, PROT_SECT_NORMAL, PAGE_KERNEL);
> +	} else if (start >= kernel_x_end) {
> +		create_mapping(start, __phys_to_virt(start),
> +			end - start, PROT_SECT_NORMAL, PAGE_KERNEL);
> +	} else {
> +		if (start < kernel_x_start)
> +			create_mapping(start, __phys_to_virt(start),
> +				kernel_x_start - start,
> +				PROT_SECT_NORMAL,
> +				PAGE_KERNEL);
> +		create_mapping(kernel_x_start,
> +				__phys_to_virt(kernel_x_start),
> +				kernel_x_end - kernel_x_start,
> +				PROT_SECT_NORMAL_EXEC, PAGE_KERNEL_EXEC);
> +		if (kernel_x_end < end)
> +			create_mapping(kernel_x_end,
> +				__phys_to_virt(kernel_x_end),
> +				end - kernel_x_end,
> +				PROT_SECT_NORMAL,
> +				PAGE_KERNEL);
> +	}
> +
>  }
> +#else
> +static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
> +{
> +	create_mapping(start, __phys_to_virt(start), end - start,
> +			PROT_SECT_NORMAL_EXEC, PAGE_KERNEL_EXEC);
> +}
> +#endif
>  
>  static void __init map_mem(void)
>  {
> @@ -332,14 +483,69 @@ static void __init map_mem(void)
>  			memblock_set_current_limit(limit);
>  		}
>  #endif
> -
> -		create_mapping(start, __phys_to_virt(start), end - start);
> +		__map_memblock(start, end);
>  	}
>  
>  	/* Limit no longer required. */
>  	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
>  }
>  
> +void __init fixup_executable(void)
> +{
> +#ifdef CONFIG_DEBUG_RODATA
> +	/* now that we are actually fully mapped, make the start/end more fine grained */
> +	if (!IS_ALIGNED((unsigned long)_stext, SECTION_SIZE)) {
> +		unsigned long aligned_start = round_down(__pa(_stext),
> +							SECTION_SIZE);
> +
> +		create_mapping(aligned_start, __phys_to_virt(aligned_start),
> +				__pa(_stext) - aligned_start,
> +				PROT_SECT_NORMAL,
> +				PAGE_KERNEL);
> +	}
> +
> +	if (!IS_ALIGNED((unsigned long)__init_end, SECTION_SIZE)) {
> +		unsigned long aligned_end = round_up(__pa(__init_end),
> +							SECTION_SIZE);
> +		create_mapping(__pa(__init_end), (unsigned long)__init_end,
> +				aligned_end - __pa(__init_end),
> +				PROT_SECT_NORMAL,
> +				PAGE_KERNEL);
> +	}
> +#endif
> +}
> +
> +#ifdef CONFIG_DEBUG_RODATA
> +void mark_rodata_ro(void)
> +{
> +	create_mapping_late(__pa(_stext), (unsigned long)_stext,
> +				(unsigned long)_etext - (unsigned long)_stext,
> +				PROT_SECT_NORMAL_EXEC | PMD_SECT_RDONLY,
> +				PAGE_KERNEL_EXEC | PTE_RDONLY);
> +
> +}
> +#endif
> +
> +static int __flush_mappings(void *unused)
> +{
> +	flush_tlb_kernel_range((unsigned long)__init_begin,
> +				(unsigned long)__init_end);
> +	return 0;
> +}
> +
> +void __ref fixup_init(void)
> +{
> +	phys_addr_t start = __pa(__init_begin);
> +	phys_addr_t end = __pa(__init_end);
> +
> +	create_mapping_late(start, (unsigned long)__init_begin,
> +			end - start,
> +			PROT_SECT_NORMAL,
> +			PAGE_KERNEL);
> +	if (!IS_ALIGNED(start, SECTION_SIZE) || !IS_ALIGNED(end, SECTION_SIZE))
> +		stop_machine(__flush_mappings, NULL, NULL);
> +}
> +
>  /*
>   * paging_init() sets up the page tables, initialises the zone memory
>   * maps and sets up the zero page.
> @@ -349,6 +555,7 @@ void __init paging_init(void)
>  	void *zero_page;
>  
>  	map_mem();
> +	fixup_executable();
>  
>  	/*
>  	 * Finally flush the caches and tlb to ensure that we're in a
> -- 
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
> 


Cheers,
-- 
Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-20 12:04   ` Steve Capper
@ 2014-11-21  1:02     ` Laura Abbott
  0 siblings, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-21  1:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 11/20/2014 4:04 AM, Steve Capper wrote:
>
> Hi Laura,
> Some comments inline.
>
>> ---
>> v5:
>>   -Dropped map_io from the alloc_init_* functions
>>   -Fixed a bug in split_pud and cleaned it up per suggestions from Steve
>>   -Aligned _etext up to SECTION_SIZE if DEBUG_ALIGN_RODATA is enabled to ensure
>>    that the section mapping is actually kept and we don't leak any RWX regions
>>   -Dropped some old ioremap code that somehow snuck in from the last rebase
>>   -Fixed create_id_mapping to use the early mapping since efi_idmap_init is
>>    called early
>> ---
>>   arch/arm64/Kconfig.debug            |  23 +++
>>   arch/arm64/include/asm/cacheflush.h |   4 +
>>   arch/arm64/kernel/vmlinux.lds.S     |  20 +++
>>   arch/arm64/mm/init.c                |   1 +
>>   arch/arm64/mm/mm.h                  |   2 +
>>   arch/arm64/mm/mmu.c                 | 289 +++++++++++++++++++++++++++++++-----
>>   6 files changed, 298 insertions(+), 41 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
>> index 0a12933..867fe6f1 100644
>> --- a/arch/arm64/Kconfig.debug
>> +++ b/arch/arm64/Kconfig.debug
>> @@ -54,4 +54,27 @@ config DEBUG_SET_MODULE_RONX
>>             against certain classes of kernel exploits.
>>             If in doubt, say "N".
>>
>> +config DEBUG_RODATA
>> +	bool "Make kernel text and rodata read-only"
>> +	help
>> +	  If this is set, kernel text and rodata will be made read-only. This
>> +	  is to help catch accidental or malicious attempts to change the
>> +	  kernel's executable code. Additionally splits rodata from kernel
>> +	  text so it can be made explicitly non-executable.
>> +
>> +          If in doubt, say Y
>> +
>> +config DEBUG_ALIGN_RODATA
>> +	depends on DEBUG_RODATA && !ARM64_64K_PAGES
>> +	bool "Align linker sections up to SECTION_SIZE"
>> +	help
>> +	  If this option is enabled, sections that may potentially be marked as
>> +	  read only or non-executable will be aligned up to the section size of
>> +	  the kernel. This prevents sections from being split into pages and
>> +	  avoids a potential TLB penalty. The downside is an increase in
>> +	  alignment and potentially wasted space. Turn on this option if
>> +	  performance is more important than memory pressure.
>> +
>> +	  If in doubt, say N
>> +
>>   endmenu
>> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
>> index 689b637..0014207 100644
>> --- a/arch/arm64/include/asm/cacheflush.h
>> +++ b/arch/arm64/include/asm/cacheflush.h
>> @@ -152,4 +152,8 @@ int set_memory_ro(unsigned long addr, int numpages);
>>   int set_memory_rw(unsigned long addr, int numpages);
>>   int set_memory_x(unsigned long addr, int numpages);
>>   int set_memory_nx(unsigned long addr, int numpages);
>> +
>> +#ifdef CONFIG_DEBUG_RODATA
>> +void mark_rodata_ro(void);
>> +#endif
>>   #endif
>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> index 2d7e20a..405820d 100644
>> --- a/arch/arm64/kernel/vmlinux.lds.S
>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>> @@ -8,6 +8,7 @@
>>   #include <asm/thread_info.h>
>>   #include <asm/memory.h>
>>   #include <asm/page.h>
>> +#include <asm/pgtable.h>
>>
>>   #include "image.h"
>>
>> @@ -54,6 +55,9 @@ SECTIONS
>>   		_text = .;
>>   		HEAD_TEXT
>>   	}
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +	. = ALIGN(1<<SECTION_SHIFT);
>> +#endif
>
> For 64K pages, SECTION_SHIFT will be 29; will that large an alignment
> cause any issues?
>

This option currently depends on !ARM64_64K_PAGES. It's really only
beneficial for 4K pages.
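
(For the numbers behind that, an illustrative comparison; these are not
kernel macros, just the values SECTION_SHIFT takes for each granule.)

#define EXAMPLE_SECTION_SIZE_4K		(1UL << 21)	/*   2MB sections */
#define EXAMPLE_SECTION_SIZE_64K	(1UL << 29)	/* 512MB sections */

Padding every boundary up to 512MB would waste far too much space,
which is why the option is limited to 4K pages.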

>>   	.text : {			/* Real text segment		*/
...
>> -static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>> +void __ref split_pud(pud_t *old_pud, pmd_t *pmd)
>> +{
>> +	unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
>> +	pgprot_t prot = __pgprot(pud_val(*old_pud) ^ pud_pfn(*old_pud));
>
> I don't think this is quite right, the page frame number is the PA
> shifted right, so that can't be XOR'ed directly with the pud. The
> following should work though:
> 	pgprot_t prot = __pgprot(pud_val(*old_pud) ^ addr);
>

Right, I'll fix it.
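
(For context, a sketch of split_pud with that one-line correction
applied; the actual respin may differ, and pgprot_val() is used here
just to keep the OR type-safe.)

void split_pud(pud_t *old_pud, pmd_t *pmd)
{
	unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
	pgprot_t prot = __pgprot(pud_val(*old_pud) ^ addr);
	int i = 0;

	do {
		set_pmd(pmd, __pmd(addr | pgprot_val(prot)));
		addr += PMD_SIZE;
	} while (pmd++, i++, i < PTRS_PER_PMD);
}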

>> +	int i = 0;
>> +
>> +	do {
>> +		set_pmd(pmd, __pmd(addr | prot));
>> +		addr += PMD_SIZE;
>> +	} while (pmd++, i++, i < PTRS_PER_PMD);
>> +}
>> +
>> +static void __ref alloc_init_pmd(pud_t *pud, unsigned long addr,
>>   				  unsigned long end, phys_addr_t phys,
>> -				  int map_io)
>> +				  pgprot_t sect_prot, pgprot_t pte_prot,
>> +				  bool early)
>>   {
>>   	pmd_t *pmd;
>>   	unsigned long next;
>> -	pmdval_t prot_sect;
>> -	pgprot_t prot_pte;
>> -
>> -	if (map_io) {
>> -		prot_sect = PROT_SECT_DEVICE_nGnRE;
>> -		prot_pte = __pgprot(PROT_DEVICE_nGnRE);
>> -	} else {
>> -		prot_sect = PROT_SECT_NORMAL_EXEC;
>> -		prot_pte = PAGE_KERNEL_EXEC;
>> -	}
>>
>>   	/*
>>   	 * Check for initial section mappings in the pgd/pud and remove them.
>>   	 */
>>   	if (pud_none(*pud) || pud_bad(*pud)) {
>> -		pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
>> +		if (early)
>> +			pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
>> +		else
>> +			pmd = pmd_alloc_one(&init_mm, addr);
>> +		BUG_ON(!pmd);
>> +		if (pud_sect(*pud)) {
>> +			/*
>> +			 * need to have the 1G of mappings continue to be
>> +			 * present
>> +			 */
>> +			split_pud(pud, pmd);
>> +		}
>>   		pud_populate(&init_mm, pud, pmd);
>> +		flush_tlb_all();
>>   	}
>>
>>   	pmd = pmd_offset(pud, addr);
>> @@ -186,8 +240,8 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>>   		next = pmd_addr_end(addr, end);
>>   		/* try section mapping first */
>>   		if (((addr | next | phys) & ~SECTION_MASK) == 0) {
>> -			pmd_t old_pmd =*pmd;
>> -			set_pmd(pmd, __pmd(phys | prot_sect));
>> +			pmd_t old_pmd = *pmd;
>> +			set_pmd(pmd, __pmd(phys | sect_prot));
>>   			/*
>>   			 * Check for previous table entries created during
>>   			 * boot (__create_page_tables) and flush them.
>> @@ -196,15 +250,39 @@ static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>>   				flush_tlb_all();
>>   		} else {
>>   			alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
>> -				       prot_pte);
>> +					pte_prot, early);
>>   		}
>>   		phys += next - addr;
>>   	} while (pmd++, addr = next, addr != end);
>>   }
>>
>> -static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
>> -				  unsigned long end, phys_addr_t phys,
>> -				  int map_io)
>> +static inline bool use_1G_block(unsigned long addr, unsigned long next,
>> +			unsigned long phys, pgprot_t sect_prot,
>> +			pgprot_t pte_prot, bool early)
>> +{
>> +	if (!early)
>> +		return false;
>> +
>> +	if (PAGE_SHIFT != 12)
>> +		return false;
>> +
>> +	if (((addr | next | phys) & ~PUD_MASK) != 0)
>> +		return false;
>> +
>> +	/*
>> +	 * The assumption here is that if the memory is anything other
>> +	 * than normal we should not be using a block type
>> +	 */
>> +	return ((sect_prot & PMD_ATTRINDX_MASK) ==
>> +				PMD_ATTRINDX(MT_NORMAL)) &&
>> +			((pte_prot & PTE_ATTRINDX_MASK) ==
>> +				PTE_ATTRINDX(MT_NORMAL));
>
> Do we need this check for memory type?
> A block mapping for device memory is perfectly valid (just unlikely!).
>

I was trying to match the existing behavior of !map_io. I assumed that
check was there for a reason.

Thanks,
Laura


-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv5 7/7] arm64: add better page protections to arm64
  2014-11-19 16:31   ` Mark Rutland
  2014-11-19 17:38     ` Ard Biesheuvel
@ 2014-11-21  1:08     ` Laura Abbott
  1 sibling, 0 replies; 24+ messages in thread
From: Laura Abbott @ 2014-11-21  1:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/19/2014 8:31 AM, Mark Rutland wrote:
...
>
> Could we have macros defined once above, something like:
>
> #ifdef CONFIG_DEBUG_ALIGN_RODATA
> #define ALIGN_DEBUG_RO                  . = ALIGN(1<<SECTION_SHIFT);
> #define ALIGN_DEBUG_RO_MIN(min)         ALIGN_DEBUG_RO
> #else
> #define ALIGN_DEBUG_RO
> #define ALIGN_DEBUG_RO_MIN(min)         . = ALIGN(min)
> #endif
>
> Then we can avoid all the ifdefs below.
>

Yes, that looks cleaner.

>>          .text : {                       /* Real text segment            */
>>                  _stext = .;             /* Text and read-only data      */
>>                          *(.latehead.text)
>> @@ -71,19 +75,35 @@ SECTIONS
>>                  *(.got)                 /* Global offset table          */
>>          }
>>
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +#endif
>>          RO_DATA(PAGE_SIZE)
>>          EXCEPTION_TABLE(8)
>>          NOTES
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +#endif
>>          _etext = .;                     /* End of text and rodata section */
>>
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +#else
>>          . = ALIGN(PAGE_SIZE);
>> +#endif
>>          __init_begin = .;
>>
>>          INIT_TEXT_SECTION(8)
>>          .exit.text : {
>>                  ARM_EXIT_KEEP(EXIT_TEXT)
>>          }
>> +
>> +#ifdef CONFIG_DEBUG_ALIGN_RODATA
>> +       . = ALIGN(1<<SECTION_SHIFT);
>> +       __init_data_begin = .;
>
> Is this used anywhere? We seem to have left __init_end after this, and
> haven't introduced an __init_data_end.
>

I think this was a hold over from an older version of the series which
set things up differently. I'll drop it.

>> +#else
>>          . = ALIGN(16);
>> +#endif
>>          .init.data : {
...
>> -static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
>> +void __ref split_pud(pud_t *old_pud, pmd_t *pmd)
>> +{
>> +       unsigned long addr = pud_pfn(*old_pud) << PAGE_SHIFT;
>> +       pgprot_t prot = __pgprot(pud_val(*old_pud) ^ pud_pfn(*old_pud));
>> +       int i = 0;
>> +
>> +       do {
>> +               set_pmd(pmd, __pmd(addr | prot));
>> +               addr += PMD_SIZE;
>> +       } while (pmd++, i++, i < PTRS_PER_PMD);
>> +}
>
> As far as I can see only the functions which call early_alloc
> (split_pmd, alloc_init_pte, alloc_init_pmd) need the __ref annotation,
> so we can drop it here and elsewhere.
>
> It would be nice if we could avoid the mismatch entirely. Perhaps we
> could pass an allocator function down instead of the early parameter?
> Something like:
>
> static void *late_alloc(unsigned long sz)
> {
> 	BUG_ON(PAGE_SIZE < sz);
> 	return (void *)__get_free_page(PGALLOC_GFP);
> }
>
> which we could pass down from create_mapping_late in place of
> early_alloc for create_id_mapping and create_mapping.
>

Yes, the __ref annotations are ugly. I'll see if I can clean them
up.
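
(A rough sketch of how the allocator callback could look, reusing the
late_alloc() from Mark's mail above; the names here are placeholders
rather than whatever the respin ends up using.)

typedef void *(*pgtable_alloc_fn)(unsigned long sz);

static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
			   unsigned long end, unsigned long pfn,
			   pgprot_t prot, pgtable_alloc_fn alloc)
{
	pte_t *pte;

	if (pmd_none(*pmd)) {
		pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
		BUG_ON(!pte);
		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
	}

	if (pmd_bad(*pmd))
		split_pmd(pmd, alloc);	/* split_pmd() takes the same callback */

	pte = pte_offset_kernel(pmd, addr);
	do {
		set_pte(pte, pfn_pte(pfn, prot));
		pfn++;
	} while (pte++, addr += PAGE_SIZE, addr != end);
}

create_mapping() would keep passing early_alloc and create_mapping_late()
would pass late_alloc, so the section mismatch (and the __ref
annotations) goes away.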

> Thanks,
> Mark.
>

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-11-21  1:08 UTC | newest]

Thread overview: 24+ messages
2014-11-18  0:54 [PATCHv5 0/7] Better page protections for arm64 Laura Abbott
2014-11-18  0:54 ` [PATCHv5 1/7] arm64: Treat handle_arch_irq as a function pointer Laura Abbott
2014-11-18  0:55 ` [PATCHv5 2/7] arm64: Switch to adrp for loading the stub vectors Laura Abbott
2014-11-18  0:55 ` [PATCHv5 3/7] arm64: Move cpu_resume into the text section Laura Abbott
2014-11-18 10:35   ` Lorenzo Pieralisi
2014-11-18 10:49   ` Mark Rutland
2014-11-18 21:20     ` Laura Abbott
2014-11-18  0:55 ` [PATCHv5 4/7] arm64: Move some head.text functions to executable section Laura Abbott
2014-11-18 11:41   ` Mark Rutland
2014-11-18 21:27     ` Laura Abbott
2014-11-18  0:55 ` [PATCHv5 5/7] arm64: Factor out fixmap initialiation from ioremap Laura Abbott
2014-11-18  0:55 ` [PATCHv5 6/7] arm64: use fixmap for text patching when text is RO Laura Abbott
2014-11-18  0:55 ` [PATCHv5 7/7] arm64: add better page protections to arm64 Laura Abbott
2014-11-19 16:31   ` Mark Rutland
2014-11-19 17:38     ` Ard Biesheuvel
2014-11-19 18:06       ` Ard Biesheuvel
2014-11-19 18:46       ` Mark Rutland
2014-11-19 18:56         ` Ard Biesheuvel
2014-11-19 19:20           ` Laura Abbott
2014-11-21  1:08     ` Laura Abbott
2014-11-20 12:04   ` Steve Capper
2014-11-21  1:02     ` Laura Abbott
2014-11-19 22:33 ` [PATCHv5 0/7] Better page protections for arm64 Kees Cook
2014-11-19 22:37   ` Laura Abbott
