All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/8] arm: support CONFIG_RODATA
@ 2014-09-03 21:57 ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Will Deacon, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, Mark Salter, Liu hua, Nikolay Borisov,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

This is a series of patches to support CONFIG_RODATA on ARM, so that
the kernel text is RO, and non-text sections default to NX. To support
on-the-fly kernel text patching (via ftrace, kprobes, etc), fixmap
support has been finalized based on several versions of various patches
that are floating around on the mailing list. This series attempts to
include the least intrusive version, so that others can build on it for
future fixmap work.

The series has been heavily tested, and appears to be working correctly:

With CONFIG_ARM_PTDUMP, expected page table permissions are seen in
/sys/kernel/debug/kernel_page_tables.

Using CONFIG_LKDTM, the kernel now correctly detects bad accesses for
for the following lkdtm tests via /sys/kernel/debug/provoke-crash/DIRECT:
        EXEC_DATA
        WRITE_RO
        WRITE_KERN

ftrace works:
        CONFIG_FTRACE_STARTUP_TEST passes
        Enabling tracing works:
                echo function > /sys/kernel/debug/tracing/current_tracer

kprobes works:
        CONFIG_ARM_KPROBES_TEST passes

kexec works:
        kexec will load and start a new kernel

Built with and without CONFIG_HIGHMEM, CONFIG_HIGHMEM_DEBUG, and
CONFIG_NR_CPUS=32.

Thanks to everyone who has been testing this series and working on its
various pieces!

Unless there are other concerns, I'd like to send a pull request to
rmk soon.

Thanks!

-Kees

v5:
- clean up #includes in mmu.c (will.deacon)
- optimize get_cr() test (will.deacon)
- created free_tcmmem to avoid excessive #ifdefs (will.deacon)
- explicitly test irqs_disabled in set_fixmap (will.deacon)

v4:
- expanded fixmap to 3MB to support 32 CPUs (robh)
- corrected pmd-finding via vaddr instead of FIXMAP_START (robh)
- switched structure size test to BUILD_BUG_ON (sboyd)
- added locking annotations to keep sparse happy (sboyd)
- adding missing "static" declarations noticed by sparse
- reorganized fixmap portion of patches

v3:
- more cleanups in switch to generic fixmap (lauraa, robh)
- fixed kexec merge hunk glitch (will.deacon)
- added tested-by tags where appropriate from v2 testing

v2:
- fix typo in kexec merge (buildbot)
- flip index order for highmem pte access (lauraa)
- added kgdb updates (dianders)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 0/8] arm: support CONFIG_RODATA
@ 2014-09-03 21:57 ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

This is a series of patches to support CONFIG_RODATA on ARM, so that
the kernel text is RO, and non-text sections default to NX. To support
on-the-fly kernel text patching (via ftrace, kprobes, etc), fixmap
support has been finalized based on several versions of various patches
that are floating around on the mailing list. This series attempts to
include the least intrusive version, so that others can build on it for
future fixmap work.

The series has been heavily tested, and appears to be working correctly:

With CONFIG_ARM_PTDUMP, expected page table permissions are seen in
/sys/kernel/debug/kernel_page_tables.

Using CONFIG_LKDTM, the kernel now correctly detects bad accesses for
for the following lkdtm tests via /sys/kernel/debug/provoke-crash/DIRECT:
        EXEC_DATA
        WRITE_RO
        WRITE_KERN

ftrace works:
        CONFIG_FTRACE_STARTUP_TEST passes
        Enabling tracing works:
                echo function > /sys/kernel/debug/tracing/current_tracer

kprobes works:
        CONFIG_ARM_KPROBES_TEST passes

kexec works:
        kexec will load and start a new kernel

Built with and without CONFIG_HIGHMEM, CONFIG_HIGHMEM_DEBUG, and
CONFIG_NR_CPUS=32.

Thanks to everyone who has been testing this series and working on its
various pieces!

Unless there are other concerns, I'd like to send a pull request to
rmk soon.

Thanks!

-Kees

v5:
- clean up #includes in mmu.c (will.deacon)
- optimize get_cr() test (will.deacon)
- created free_tcmmem to avoid excessive #ifdefs (will.deacon)
- explicitly test irqs_disabled in set_fixmap (will.deacon)

v4:
- expanded fixmap to 3MB to support 32 CPUs (robh)
- corrected pmd-finding via vaddr instead of FIXMAP_START (robh)
- switched structure size test to BUILD_BUG_ON (sboyd)
- added locking annotations to keep sparse happy (sboyd)
- adding missing "static" declarations noticed by sparse
- reorganized fixmap portion of patches

v3:
- more cleanups in switch to generic fixmap (lauraa, robh)
- fixed kexec merge hunk glitch (will.deacon)
- added tested-by tags where appropriate from v2 testing

v2:
- fix typo in kexec merge (buildbot)
- flip index order for highmem pte access (lauraa)
- added kgdb updates (dianders)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 1/8] arm: use generic fixmap.h
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Mark Salter, Laura Abbott, Rob Herring, Will Deacon,
	Rabin Vincent, Leif Lindholm, Liu hua, Nikolay Borisov,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

From: Mark Salter <msalter@redhat.com>

ARM is different from other architectures in that fixmap pages are indexed
with a positive offset from FIXADDR_START.  Other architectures index with
a negative offset from FIXADDR_TOP.  In order to use the generic fixmap.h
definitions, this patch redefines FIXADDR_TOP to be inclusive of the
useable range.  That is, FIXADDR_TOP is the virtual address of the topmost
fixed page.  The newly defined FIXADDR_END is the first virtual address
past the fixed mappings.

Signed-off-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
[kees: update for a05e54c103b0 ("ARM: 8031/2: change fixmap ...")]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Rob Herring <robh@kernel.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/fixmap.h | 27 +++++++++------------------
 arch/arm/mm/init.c            |  2 +-
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index 74124b0d0d79..a7add6f9315d 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -2,27 +2,18 @@
 #define _ASM_FIXMAP_H
 
 #define FIXADDR_START		0xffc00000UL
-#define FIXADDR_TOP		0xffe00000UL
-#define FIXADDR_SIZE		(FIXADDR_TOP - FIXADDR_START)
+#define FIXADDR_END		0xffe00000UL
+#define FIXADDR_TOP		(FIXADDR_END - PAGE_SIZE)
 
-#define FIX_KMAP_NR_PTES	(FIXADDR_SIZE >> PAGE_SHIFT)
+#include <asm/kmap_types.h>
 
-#define __fix_to_virt(x)	(FIXADDR_START + ((x) << PAGE_SHIFT))
-#define __virt_to_fix(x)	(((x) - FIXADDR_START) >> PAGE_SHIFT)
+enum fixed_addresses {
+	FIX_KMAP_BEGIN,
+	FIX_KMAP_END = FIX_KMAP_BEGIN + (KM_TYPE_NR * NR_CPUS) - 1,
 
-extern void __this_fixmap_does_not_exist(void);
+	__end_of_fixed_addresses
+};
 
-static inline unsigned long fix_to_virt(const unsigned int idx)
-{
-	if (idx >= FIX_KMAP_NR_PTES)
-		__this_fixmap_does_not_exist();
-	return __fix_to_virt(idx);
-}
-
-static inline unsigned int virt_to_fix(const unsigned long vaddr)
-{
-	BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
-	return __virt_to_fix(vaddr);
-}
+#include <asm-generic/fixmap.h>
 
 #endif
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 659c75d808dc..ad82c05bfc3a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -570,7 +570,7 @@ void __init mem_init(void)
 			MLK(DTCM_OFFSET, (unsigned long) dtcm_end),
 			MLK(ITCM_OFFSET, (unsigned long) itcm_end),
 #endif
-			MLK(FIXADDR_START, FIXADDR_TOP),
+			MLK(FIXADDR_START, FIXADDR_END),
 			MLM(VMALLOC_START, VMALLOC_END),
 			MLM(PAGE_OFFSET, (unsigned long)high_memory),
 #ifdef CONFIG_HIGHMEM
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 1/8] arm: use generic fixmap.h
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Mark Salter <msalter@redhat.com>

ARM is different from other architectures in that fixmap pages are indexed
with a positive offset from FIXADDR_START.  Other architectures index with
a negative offset from FIXADDR_TOP.  In order to use the generic fixmap.h
definitions, this patch redefines FIXADDR_TOP to be inclusive of the
useable range.  That is, FIXADDR_TOP is the virtual address of the topmost
fixed page.  The newly defined FIXADDR_END is the first virtual address
past the fixed mappings.

Signed-off-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
[kees: update for a05e54c103b0 ("ARM: 8031/2: change fixmap ...")]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Rob Herring <robh@kernel.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/fixmap.h | 27 +++++++++------------------
 arch/arm/mm/init.c            |  2 +-
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index 74124b0d0d79..a7add6f9315d 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -2,27 +2,18 @@
 #define _ASM_FIXMAP_H
 
 #define FIXADDR_START		0xffc00000UL
-#define FIXADDR_TOP		0xffe00000UL
-#define FIXADDR_SIZE		(FIXADDR_TOP - FIXADDR_START)
+#define FIXADDR_END		0xffe00000UL
+#define FIXADDR_TOP		(FIXADDR_END - PAGE_SIZE)
 
-#define FIX_KMAP_NR_PTES	(FIXADDR_SIZE >> PAGE_SHIFT)
+#include <asm/kmap_types.h>
 
-#define __fix_to_virt(x)	(FIXADDR_START + ((x) << PAGE_SHIFT))
-#define __virt_to_fix(x)	(((x) - FIXADDR_START) >> PAGE_SHIFT)
+enum fixed_addresses {
+	FIX_KMAP_BEGIN,
+	FIX_KMAP_END = FIX_KMAP_BEGIN + (KM_TYPE_NR * NR_CPUS) - 1,
 
-extern void __this_fixmap_does_not_exist(void);
+	__end_of_fixed_addresses
+};
 
-static inline unsigned long fix_to_virt(const unsigned int idx)
-{
-	if (idx >= FIX_KMAP_NR_PTES)
-		__this_fixmap_does_not_exist();
-	return __fix_to_virt(idx);
-}
-
-static inline unsigned int virt_to_fix(const unsigned long vaddr)
-{
-	BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
-	return __virt_to_fix(vaddr);
-}
+#include <asm-generic/fixmap.h>
 
 #endif
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 659c75d808dc..ad82c05bfc3a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -570,7 +570,7 @@ void __init mem_init(void)
 			MLK(DTCM_OFFSET, (unsigned long) dtcm_end),
 			MLK(ITCM_OFFSET, (unsigned long) itcm_end),
 #endif
-			MLK(FIXADDR_START, FIXADDR_TOP),
+			MLK(FIXADDR_START, FIXADDR_END),
 			MLM(VMALLOC_START, VMALLOC_END),
 			MLM(PAGE_OFFSET, (unsigned long)high_memory),
 #ifdef CONFIG_HIGHMEM
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 2/8] ARM: expand fixmap region to 3MB
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Rob Herring, Will Deacon, Rabin Vincent, Laura Abbott,
	Leif Lindholm, Mark Salter, Liu hua, Nikolay Borisov,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

From: Rob Herring <robh@kernel.org>

With commit a05e54c103b0 ("ARM: 8031/2: change fixmap mapping region to
support 32 CPUs"), the fixmap region was expanded to 2MB, but it
precluded any other uses of the fixmap region. In order to support other
uses the fixmap region needs to be expanded beyond 2MB. Fortunately, the
adjacent 1MB range 0xffe00000-0xfff00000 is availabe.

Remove fixmap_page_table ptr and lookup the page table via the virtual
address so that the fixmap region can span more that one pmd. The 2nd
pmd is already created since it is shared with the vector page.

Signed-off-by: Rob Herring <robh@kernel.org>
[kees: fixed CONFIG_DEBUG_HIGHMEM get_fixmap() calls]
[kees: moved pte allocation outside of CONFIG_HIGHMEM]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 Documentation/arm/memory.txt  |  2 +-
 arch/arm/include/asm/fixmap.h |  2 +-
 arch/arm/mm/highmem.c         | 15 ++++++++-------
 arch/arm/mm/mmu.c             |  6 +++---
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
index 38dc06d0a791..4178ebda6e66 100644
--- a/Documentation/arm/memory.txt
+++ b/Documentation/arm/memory.txt
@@ -41,7 +41,7 @@ fffe8000	fffeffff	DTCM mapping area for platforms with
 fffe0000	fffe7fff	ITCM mapping area for platforms with
 				ITCM mounted inside the CPU.
 
-ffc00000	ffdfffff	Fixmap mapping region.  Addresses provided
+ffc00000	ffefffff	Fixmap mapping region.  Addresses provided
 				by fix_to_virt() will be located here.
 
 fee00000	feffffff	Mapping of PCI I/O space. This is a static
diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index a7add6f9315d..d984ca69ce19 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -2,7 +2,7 @@
 #define _ASM_FIXMAP_H
 
 #define FIXADDR_START		0xffc00000UL
-#define FIXADDR_END		0xffe00000UL
+#define FIXADDR_END		0xfff00000UL
 #define FIXADDR_TOP		(FIXADDR_END - PAGE_SIZE)
 
 #include <asm/kmap_types.h>
diff --git a/arch/arm/mm/highmem.c b/arch/arm/mm/highmem.c
index 45aeaaca9052..81061987ac45 100644
--- a/arch/arm/mm/highmem.c
+++ b/arch/arm/mm/highmem.c
@@ -18,19 +18,20 @@
 #include <asm/tlbflush.h>
 #include "mm.h"
 
-pte_t *fixmap_page_table;
-
 static inline void set_fixmap_pte(int idx, pte_t pte)
 {
 	unsigned long vaddr = __fix_to_virt(idx);
-	set_pte_ext(fixmap_page_table + idx, pte, 0);
+	pte_t *ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
+
+	set_pte_ext(ptep, pte, 0);
 	local_flush_tlb_kernel_page(vaddr);
 }
 
 static inline pte_t get_fixmap_pte(unsigned long vaddr)
 {
-	unsigned long idx = __virt_to_fix(vaddr);
-	return *(fixmap_page_table + idx);
+	pte_t *ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
+
+	return *ptep;
 }
 
 void *kmap(struct page *page)
@@ -84,7 +85,7 @@ void *kmap_atomic(struct page *page)
 	 * With debugging enabled, kunmap_atomic forces that entry to 0.
 	 * Make sure it was indeed properly unmapped.
 	 */
-	BUG_ON(!pte_none(*(fixmap_page_table + idx)));
+	BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
 #endif
 	/*
 	 * When debugging is off, kunmap_atomic leaves the previous mapping
@@ -134,7 +135,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
 	idx = type + KM_TYPE_NR * smp_processor_id();
 	vaddr = __fix_to_virt(idx);
 #ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(fixmap_page_table + idx)));
+	BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
 #endif
 	set_fixmap_pte(idx, pfn_pte(pfn, kmap_prot));
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 8348ed6b2efe..7fa0966cd15f 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1326,10 +1326,10 @@ static void __init kmap_init(void)
 #ifdef CONFIG_HIGHMEM
 	pkmap_page_table = early_pte_alloc(pmd_off_k(PKMAP_BASE),
 		PKMAP_BASE, _PAGE_KERNEL_TABLE);
-
-	fixmap_page_table = early_pte_alloc(pmd_off_k(FIXADDR_START),
-		FIXADDR_START, _PAGE_KERNEL_TABLE);
 #endif
+
+	early_pte_alloc(pmd_off_k(FIXADDR_START), FIXADDR_START,
+			_PAGE_KERNEL_TABLE);
 }
 
 static void __init map_lowmem(void)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 2/8] ARM: expand fixmap region to 3MB
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rob Herring <robh@kernel.org>

With commit a05e54c103b0 ("ARM: 8031/2: change fixmap mapping region to
support 32 CPUs"), the fixmap region was expanded to 2MB, but it
precluded any other uses of the fixmap region. In order to support other
uses the fixmap region needs to be expanded beyond 2MB. Fortunately, the
adjacent 1MB range 0xffe00000-0xfff00000 is availabe.

Remove fixmap_page_table ptr and lookup the page table via the virtual
address so that the fixmap region can span more that one pmd. The 2nd
pmd is already created since it is shared with the vector page.

Signed-off-by: Rob Herring <robh@kernel.org>
[kees: fixed CONFIG_DEBUG_HIGHMEM get_fixmap() calls]
[kees: moved pte allocation outside of CONFIG_HIGHMEM]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 Documentation/arm/memory.txt  |  2 +-
 arch/arm/include/asm/fixmap.h |  2 +-
 arch/arm/mm/highmem.c         | 15 ++++++++-------
 arch/arm/mm/mmu.c             |  6 +++---
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
index 38dc06d0a791..4178ebda6e66 100644
--- a/Documentation/arm/memory.txt
+++ b/Documentation/arm/memory.txt
@@ -41,7 +41,7 @@ fffe8000	fffeffff	DTCM mapping area for platforms with
 fffe0000	fffe7fff	ITCM mapping area for platforms with
 				ITCM mounted inside the CPU.
 
-ffc00000	ffdfffff	Fixmap mapping region.  Addresses provided
+ffc00000	ffefffff	Fixmap mapping region.  Addresses provided
 				by fix_to_virt() will be located here.
 
 fee00000	feffffff	Mapping of PCI I/O space. This is a static
diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index a7add6f9315d..d984ca69ce19 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -2,7 +2,7 @@
 #define _ASM_FIXMAP_H
 
 #define FIXADDR_START		0xffc00000UL
-#define FIXADDR_END		0xffe00000UL
+#define FIXADDR_END		0xfff00000UL
 #define FIXADDR_TOP		(FIXADDR_END - PAGE_SIZE)
 
 #include <asm/kmap_types.h>
diff --git a/arch/arm/mm/highmem.c b/arch/arm/mm/highmem.c
index 45aeaaca9052..81061987ac45 100644
--- a/arch/arm/mm/highmem.c
+++ b/arch/arm/mm/highmem.c
@@ -18,19 +18,20 @@
 #include <asm/tlbflush.h>
 #include "mm.h"
 
-pte_t *fixmap_page_table;
-
 static inline void set_fixmap_pte(int idx, pte_t pte)
 {
 	unsigned long vaddr = __fix_to_virt(idx);
-	set_pte_ext(fixmap_page_table + idx, pte, 0);
+	pte_t *ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
+
+	set_pte_ext(ptep, pte, 0);
 	local_flush_tlb_kernel_page(vaddr);
 }
 
 static inline pte_t get_fixmap_pte(unsigned long vaddr)
 {
-	unsigned long idx = __virt_to_fix(vaddr);
-	return *(fixmap_page_table + idx);
+	pte_t *ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
+
+	return *ptep;
 }
 
 void *kmap(struct page *page)
@@ -84,7 +85,7 @@ void *kmap_atomic(struct page *page)
 	 * With debugging enabled, kunmap_atomic forces that entry to 0.
 	 * Make sure it was indeed properly unmapped.
 	 */
-	BUG_ON(!pte_none(*(fixmap_page_table + idx)));
+	BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
 #endif
 	/*
 	 * When debugging is off, kunmap_atomic leaves the previous mapping
@@ -134,7 +135,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
 	idx = type + KM_TYPE_NR * smp_processor_id();
 	vaddr = __fix_to_virt(idx);
 #ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(fixmap_page_table + idx)));
+	BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
 #endif
 	set_fixmap_pte(idx, pfn_pte(pfn, kmap_prot));
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 8348ed6b2efe..7fa0966cd15f 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1326,10 +1326,10 @@ static void __init kmap_init(void)
 #ifdef CONFIG_HIGHMEM
 	pkmap_page_table = early_pte_alloc(pmd_off_k(PKMAP_BASE),
 		PKMAP_BASE, _PAGE_KERNEL_TABLE);
-
-	fixmap_page_table = early_pte_alloc(pmd_off_k(FIXADDR_START),
-		FIXADDR_START, _PAGE_KERNEL_TABLE);
 #endif
+
+	early_pte_alloc(pmd_off_k(FIXADDR_START), FIXADDR_START,
+			_PAGE_KERNEL_TABLE);
 }
 
 static void __init map_lowmem(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Rabin Vincent, Will Deacon, Laura Abbott, Rob Herring,
	Leif Lindholm, Mark Salter, Liu hua, Nikolay Borisov,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
Also makes sure that the fixmap allocation fits into the expected range.

Based on patch by Rabin Vincent.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Rabin Vincent <rabin@rab.in>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/fixmap.h |  2 ++
 arch/arm/mm/mmu.c             | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index d984ca69ce19..714606f70425 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -14,6 +14,8 @@ enum fixed_addresses {
 	__end_of_fixed_addresses
 };
 
+void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
+
 #include <asm-generic/fixmap.h>
 
 #endif
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 7fa0966cd15f..fa3a667852cb 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -22,6 +22,7 @@
 #include <asm/cputype.h>
 #include <asm/sections.h>
 #include <asm/cachetype.h>
+#include <asm/fixmap.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/smp_plat.h>
@@ -392,6 +393,30 @@ SET_MEMORY_FN(rw, pte_set_rw)
 SET_MEMORY_FN(x, pte_set_x)
 SET_MEMORY_FN(nx, pte_set_nx)
 
+void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
+{
+	unsigned long vaddr = __fix_to_virt(idx);
+	pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
+
+	/* Make sure fixmap region does not exceed available allocation. */
+	BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
+		     FIXADDR_END);
+	BUG_ON(idx >= __end_of_fixed_addresses);
+
+	if (pgprot_val(prot))
+		set_pte_at(NULL, vaddr, pte,
+			pfn_pte(phys >> PAGE_SHIFT, prot));
+	else
+		pte_clear(NULL, vaddr, pte);
+
+	/*
+	 * Given the potential a15 tlbi errata, we can only do tlb flushes
+	 * with interrupts disabled. Callers must have taken care of this.
+	 */
+	WARN_ON_ONCE(!irqs_disabled());
+	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
+}
+
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
Also makes sure that the fixmap allocation fits into the expected range.

Based on patch by Rabin Vincent.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Rabin Vincent <rabin@rab.in>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/fixmap.h |  2 ++
 arch/arm/mm/mmu.c             | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index d984ca69ce19..714606f70425 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -14,6 +14,8 @@ enum fixed_addresses {
 	__end_of_fixed_addresses
 };
 
+void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
+
 #include <asm-generic/fixmap.h>
 
 #endif
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 7fa0966cd15f..fa3a667852cb 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -22,6 +22,7 @@
 #include <asm/cputype.h>
 #include <asm/sections.h>
 #include <asm/cachetype.h>
+#include <asm/fixmap.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/smp_plat.h>
@@ -392,6 +393,30 @@ SET_MEMORY_FN(rw, pte_set_rw)
 SET_MEMORY_FN(x, pte_set_x)
 SET_MEMORY_FN(nx, pte_set_nx)
 
+void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
+{
+	unsigned long vaddr = __fix_to_virt(idx);
+	pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
+
+	/* Make sure fixmap region does not exceed available allocation. */
+	BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
+		     FIXADDR_END);
+	BUG_ON(idx >= __end_of_fixed_addresses);
+
+	if (pgprot_val(prot))
+		set_pte_at(NULL, vaddr, pte,
+			pfn_pte(phys >> PAGE_SHIFT, prot));
+	else
+		pte_clear(NULL, vaddr, pte);
+
+	/*
+	 * Given the potential a15 tlbi errata, we can only do tlb flushes
+	 * with interrupts disabled. Callers must have taken care of this.
+	 */
+	WARN_ON_ONCE(!irqs_disabled());
+	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
+}
+
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 4/8] arm: use fixmap for text patching when text is RO
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Rabin Vincent, Will Deacon, Laura Abbott, Rob Herring,
	Leif Lindholm, Mark Salter, Liu hua, Nikolay Borisov,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

From: Rabin Vincent <rabin@rab.in>

Use fixmaps for text patching when the kernel text is read-only,
inspired by x86.  This makes jump labels and kprobes work with the
currently available CONFIG_DEBUG_SET_MODULE_RONX and the upcoming
CONFIG_DEBUG_RODATA options.

Signed-off-by: Rabin Vincent <rabin@rab.in>
[kees: fixed up for merge with "arm: use generic fixmap.h"]
[kees: added parse acquire/release annotations to pass C=1 builds]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/fixmap.h |  4 +++
 arch/arm/kernel/jump_label.c  |  2 +-
 arch/arm/kernel/patch.c       | 79 +++++++++++++++++++++++++++++++++++++++----
 arch/arm/kernel/patch.h       | 12 ++++++-
 4 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index 714606f70425..0415eae1df27 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -11,6 +11,10 @@ enum fixed_addresses {
 	FIX_KMAP_BEGIN,
 	FIX_KMAP_END = FIX_KMAP_BEGIN + (KM_TYPE_NR * NR_CPUS) - 1,
 
+	/* Support writing RO kernel text via kprobes, jump labels, etc. */
+	FIX_TEXT_POKE0,
+	FIX_TEXT_POKE1,
+
 	__end_of_fixed_addresses
 };
 
diff --git a/arch/arm/kernel/jump_label.c b/arch/arm/kernel/jump_label.c
index 4ce4f789446d..afeeb9ea6f43 100644
--- a/arch/arm/kernel/jump_label.c
+++ b/arch/arm/kernel/jump_label.c
@@ -19,7 +19,7 @@ static void __arch_jump_label_transform(struct jump_entry *entry,
 		insn = arm_gen_nop();
 
 	if (is_static)
-		__patch_text(addr, insn);
+		__patch_text_early(addr, insn);
 	else
 		patch_text(addr, insn);
 }
diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index 07314af47733..a1dce690446a 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -1,8 +1,11 @@
 #include <linux/kernel.h>
+#include <linux/spinlock.h>
 #include <linux/kprobes.h>
+#include <linux/mm.h>
 #include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
+#include <asm/fixmap.h>
 #include <asm/smp_plat.h>
 #include <asm/opcodes.h>
 
@@ -13,21 +16,77 @@ struct patch {
 	unsigned int insn;
 };
 
-void __kprobes __patch_text(void *addr, unsigned int insn)
+static DEFINE_SPINLOCK(patch_lock);
+
+static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags)
+	__acquires(&patch_lock)
+{
+	unsigned int uintaddr = (uintptr_t) addr;
+	bool module = !core_kernel_text(uintaddr);
+	struct page *page;
+
+	if (module && IS_ENABLED(CONFIG_DEBUG_SET_MODULE_RONX))
+		page = vmalloc_to_page(addr);
+	else if (!module && IS_ENABLED(CONFIG_DEBUG_RODATA))
+		page = virt_to_page(addr);
+	else
+		return addr;
+
+	if (flags)
+		spin_lock_irqsave(&patch_lock, *flags);
+	else
+		__acquire(&patch_lock);
+
+	set_fixmap(fixmap, page_to_phys(page));
+
+	return (void *) (__fix_to_virt(fixmap) + (uintaddr & ~PAGE_MASK));
+}
+
+static void __kprobes patch_unmap(int fixmap, unsigned long *flags)
+	__releases(&patch_lock)
+{
+	clear_fixmap(fixmap);
+
+	if (flags)
+		spin_unlock_irqrestore(&patch_lock, *flags);
+	else
+		__release(&patch_lock);
+}
+
+void __kprobes __patch_text_real(void *addr, unsigned int insn, bool remap)
 {
 	bool thumb2 = IS_ENABLED(CONFIG_THUMB2_KERNEL);
+	unsigned int uintaddr = (uintptr_t) addr;
+	bool twopage = false;
+	unsigned long flags;
+	void *waddr = addr;
 	int size;
 
+	if (remap)
+		waddr = patch_map(addr, FIX_TEXT_POKE0, &flags);
+	else
+		__acquire(&patch_lock);
+
 	if (thumb2 && __opcode_is_thumb16(insn)) {
-		*(u16 *)addr = __opcode_to_mem_thumb16(insn);
+		*(u16 *)waddr = __opcode_to_mem_thumb16(insn);
 		size = sizeof(u16);
-	} else if (thumb2 && ((uintptr_t)addr & 2)) {
+	} else if (thumb2 && (uintaddr & 2)) {
 		u16 first = __opcode_thumb32_first(insn);
 		u16 second = __opcode_thumb32_second(insn);
-		u16 *addrh = addr;
+		u16 *addrh0 = waddr;
+		u16 *addrh1 = waddr + 2;
+
+		twopage = (uintaddr & ~PAGE_MASK) == PAGE_SIZE - 2;
+		if (twopage && remap)
+			addrh1 = patch_map(addr + 2, FIX_TEXT_POKE1, NULL);
 
-		addrh[0] = __opcode_to_mem_thumb16(first);
-		addrh[1] = __opcode_to_mem_thumb16(second);
+		*addrh0 = __opcode_to_mem_thumb16(first);
+		*addrh1 = __opcode_to_mem_thumb16(second);
+
+		if (twopage && addrh1 != addr + 2) {
+			flush_kernel_vmap_range(addrh1, 2);
+			patch_unmap(FIX_TEXT_POKE1, NULL);
+		}
 
 		size = sizeof(u32);
 	} else {
@@ -36,10 +95,16 @@ void __kprobes __patch_text(void *addr, unsigned int insn)
 		else
 			insn = __opcode_to_mem_arm(insn);
 
-		*(u32 *)addr = insn;
+		*(u32 *)waddr = insn;
 		size = sizeof(u32);
 	}
 
+	if (waddr != addr) {
+		flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);
+		patch_unmap(FIX_TEXT_POKE0, &flags);
+	} else
+		__release(&patch_lock);
+
 	flush_icache_range((uintptr_t)(addr),
 			   (uintptr_t)(addr) + size);
 }
diff --git a/arch/arm/kernel/patch.h b/arch/arm/kernel/patch.h
index b4731f2dac38..77e054c2f6cd 100644
--- a/arch/arm/kernel/patch.h
+++ b/arch/arm/kernel/patch.h
@@ -2,6 +2,16 @@
 #define _ARM_KERNEL_PATCH_H
 
 void patch_text(void *addr, unsigned int insn);
-void __patch_text(void *addr, unsigned int insn);
+void __patch_text_real(void *addr, unsigned int insn, bool remap);
+
+static inline void __patch_text(void *addr, unsigned int insn)
+{
+	__patch_text_real(addr, insn, true);
+}
+
+static inline void __patch_text_early(void *addr, unsigned int insn)
+{
+	__patch_text_real(addr, insn, false);
+}
 
 #endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 4/8] arm: use fixmap for text patching when text is RO
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Rabin Vincent <rabin@rab.in>

Use fixmaps for text patching when the kernel text is read-only,
inspired by x86.  This makes jump labels and kprobes work with the
currently available CONFIG_DEBUG_SET_MODULE_RONX and the upcoming
CONFIG_DEBUG_RODATA options.

Signed-off-by: Rabin Vincent <rabin@rab.in>
[kees: fixed up for merge with "arm: use generic fixmap.h"]
[kees: added parse acquire/release annotations to pass C=1 builds]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/fixmap.h |  4 +++
 arch/arm/kernel/jump_label.c  |  2 +-
 arch/arm/kernel/patch.c       | 79 +++++++++++++++++++++++++++++++++++++++----
 arch/arm/kernel/patch.h       | 12 ++++++-
 4 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index 714606f70425..0415eae1df27 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -11,6 +11,10 @@ enum fixed_addresses {
 	FIX_KMAP_BEGIN,
 	FIX_KMAP_END = FIX_KMAP_BEGIN + (KM_TYPE_NR * NR_CPUS) - 1,
 
+	/* Support writing RO kernel text via kprobes, jump labels, etc. */
+	FIX_TEXT_POKE0,
+	FIX_TEXT_POKE1,
+
 	__end_of_fixed_addresses
 };
 
diff --git a/arch/arm/kernel/jump_label.c b/arch/arm/kernel/jump_label.c
index 4ce4f789446d..afeeb9ea6f43 100644
--- a/arch/arm/kernel/jump_label.c
+++ b/arch/arm/kernel/jump_label.c
@@ -19,7 +19,7 @@ static void __arch_jump_label_transform(struct jump_entry *entry,
 		insn = arm_gen_nop();
 
 	if (is_static)
-		__patch_text(addr, insn);
+		__patch_text_early(addr, insn);
 	else
 		patch_text(addr, insn);
 }
diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index 07314af47733..a1dce690446a 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -1,8 +1,11 @@
 #include <linux/kernel.h>
+#include <linux/spinlock.h>
 #include <linux/kprobes.h>
+#include <linux/mm.h>
 #include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
+#include <asm/fixmap.h>
 #include <asm/smp_plat.h>
 #include <asm/opcodes.h>
 
@@ -13,21 +16,77 @@ struct patch {
 	unsigned int insn;
 };
 
-void __kprobes __patch_text(void *addr, unsigned int insn)
+static DEFINE_SPINLOCK(patch_lock);
+
+static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags)
+	__acquires(&patch_lock)
+{
+	unsigned int uintaddr = (uintptr_t) addr;
+	bool module = !core_kernel_text(uintaddr);
+	struct page *page;
+
+	if (module && IS_ENABLED(CONFIG_DEBUG_SET_MODULE_RONX))
+		page = vmalloc_to_page(addr);
+	else if (!module && IS_ENABLED(CONFIG_DEBUG_RODATA))
+		page = virt_to_page(addr);
+	else
+		return addr;
+
+	if (flags)
+		spin_lock_irqsave(&patch_lock, *flags);
+	else
+		__acquire(&patch_lock);
+
+	set_fixmap(fixmap, page_to_phys(page));
+
+	return (void *) (__fix_to_virt(fixmap) + (uintaddr & ~PAGE_MASK));
+}
+
+static void __kprobes patch_unmap(int fixmap, unsigned long *flags)
+	__releases(&patch_lock)
+{
+	clear_fixmap(fixmap);
+
+	if (flags)
+		spin_unlock_irqrestore(&patch_lock, *flags);
+	else
+		__release(&patch_lock);
+}
+
+void __kprobes __patch_text_real(void *addr, unsigned int insn, bool remap)
 {
 	bool thumb2 = IS_ENABLED(CONFIG_THUMB2_KERNEL);
+	unsigned int uintaddr = (uintptr_t) addr;
+	bool twopage = false;
+	unsigned long flags;
+	void *waddr = addr;
 	int size;
 
+	if (remap)
+		waddr = patch_map(addr, FIX_TEXT_POKE0, &flags);
+	else
+		__acquire(&patch_lock);
+
 	if (thumb2 && __opcode_is_thumb16(insn)) {
-		*(u16 *)addr = __opcode_to_mem_thumb16(insn);
+		*(u16 *)waddr = __opcode_to_mem_thumb16(insn);
 		size = sizeof(u16);
-	} else if (thumb2 && ((uintptr_t)addr & 2)) {
+	} else if (thumb2 && (uintaddr & 2)) {
 		u16 first = __opcode_thumb32_first(insn);
 		u16 second = __opcode_thumb32_second(insn);
-		u16 *addrh = addr;
+		u16 *addrh0 = waddr;
+		u16 *addrh1 = waddr + 2;
+
+		twopage = (uintaddr & ~PAGE_MASK) == PAGE_SIZE - 2;
+		if (twopage && remap)
+			addrh1 = patch_map(addr + 2, FIX_TEXT_POKE1, NULL);
 
-		addrh[0] = __opcode_to_mem_thumb16(first);
-		addrh[1] = __opcode_to_mem_thumb16(second);
+		*addrh0 = __opcode_to_mem_thumb16(first);
+		*addrh1 = __opcode_to_mem_thumb16(second);
+
+		if (twopage && addrh1 != addr + 2) {
+			flush_kernel_vmap_range(addrh1, 2);
+			patch_unmap(FIX_TEXT_POKE1, NULL);
+		}
 
 		size = sizeof(u32);
 	} else {
@@ -36,10 +95,16 @@ void __kprobes __patch_text(void *addr, unsigned int insn)
 		else
 			insn = __opcode_to_mem_arm(insn);
 
-		*(u32 *)addr = insn;
+		*(u32 *)waddr = insn;
 		size = sizeof(u32);
 	}
 
+	if (waddr != addr) {
+		flush_kernel_vmap_range(waddr, twopage ? size / 2 : size);
+		patch_unmap(FIX_TEXT_POKE0, &flags);
+	} else
+		__release(&patch_lock);
+
 	flush_icache_range((uintptr_t)(addr),
 			   (uintptr_t)(addr) + size);
 }
diff --git a/arch/arm/kernel/patch.h b/arch/arm/kernel/patch.h
index b4731f2dac38..77e054c2f6cd 100644
--- a/arch/arm/kernel/patch.h
+++ b/arch/arm/kernel/patch.h
@@ -2,6 +2,16 @@
 #define _ARM_KERNEL_PATCH_H
 
 void patch_text(void *addr, unsigned int insn);
-void __patch_text(void *addr, unsigned int insn);
+void __patch_text_real(void *addr, unsigned int insn, bool remap);
+
+static inline void __patch_text(void *addr, unsigned int insn)
+{
+	__patch_text_real(addr, insn, true);
+}
+
+static inline void __patch_text_early(void *addr, unsigned int insn)
+{
+	__patch_text_real(addr, insn, false);
+}
 
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 5/8] ARM: kexec: Make .text R/W in machine_kexec
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Nikolay Borisov, Will Deacon, Rabin Vincent,
	Laura Abbott, Rob Herring, Leif Lindholm, Mark Salter, Liu hua,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

From: Nikolay Borisov <Nikolay.Borisov@arm.com>

With the introduction of Kees Cook's patch to make the kernel .text
read-only the existing method by which kexec works got broken since it
directly pokes some values in the template code, which resides in the
.text section.

The current patch changes the way those values are inserted so that poking
.text section occurs only in machine_kexec (e.g when we are about to nuke
the old kernel and are beyond the point of return). This allows to use
set_kernel_text_rw() to directly patch the values in the .text section.

I had already sent a patch which achieved this but it was significantly
more complicated, so this is a cleaner/straight-forward approach.

Signed-off-by: Nikolay Borisov <Nikolay.Borisov@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
[kees: collapsed kexec_boot_atags (will.daecon)]
[kees: for bisectability, moved set_kernel_text_rw() to RODATA patch]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/kernel/machine_kexec.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 8cf0996aa1a8..8f75250cbe30 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -29,6 +29,7 @@ extern unsigned long kexec_boot_atags;
 
 static atomic_t waiting_for_crash_ipi;
 
+static unsigned long dt_mem;
 /*
  * Provide a dummy crash_notes definition while crash dump arrives to arm.
  * This prevents breakage of crash_notes attribute in kernel/ksysfs.c.
@@ -64,7 +65,7 @@ int machine_kexec_prepare(struct kimage *image)
 			return err;
 
 		if (be32_to_cpu(header) == OF_DT_HEADER)
-			kexec_boot_atags = current_segment->mem;
+			dt_mem = current_segment->mem;
 	}
 	return 0;
 }
@@ -166,9 +167,8 @@ void machine_kexec(struct kimage *image)
 	kexec_start_address = image->start;
 	kexec_indirection_page = page_list;
 	kexec_mach_type = machine_arch_type;
-	if (!kexec_boot_atags)
-		kexec_boot_atags = image->start - KEXEC_ARM_ZIMAGE_OFFSET + KEXEC_ARM_ATAGS_OFFSET;
-
+	kexec_boot_atags = dt_mem ?: image->start - KEXEC_ARM_ZIMAGE_OFFSET
+				     + KEXEC_ARM_ATAGS_OFFSET;
 
 	/* copy our kernel relocation code to the control code page */
 	reboot_entry = fncpy(reboot_code_buffer,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 5/8] ARM: kexec: Make .text R/W in machine_kexec
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Nikolay Borisov <Nikolay.Borisov@arm.com>

With the introduction of Kees Cook's patch to make the kernel .text
read-only the existing method by which kexec works got broken since it
directly pokes some values in the template code, which resides in the
.text section.

The current patch changes the way those values are inserted so that poking
.text section occurs only in machine_kexec (e.g when we are about to nuke
the old kernel and are beyond the point of return). This allows to use
set_kernel_text_rw() to directly patch the values in the .text section.

I had already sent a patch which achieved this but it was significantly
more complicated, so this is a cleaner/straight-forward approach.

Signed-off-by: Nikolay Borisov <Nikolay.Borisov@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
[kees: collapsed kexec_boot_atags (will.daecon)]
[kees: for bisectability, moved set_kernel_text_rw() to RODATA patch]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/kernel/machine_kexec.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 8cf0996aa1a8..8f75250cbe30 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -29,6 +29,7 @@ extern unsigned long kexec_boot_atags;
 
 static atomic_t waiting_for_crash_ipi;
 
+static unsigned long dt_mem;
 /*
  * Provide a dummy crash_notes definition while crash dump arrives to arm.
  * This prevents breakage of crash_notes attribute in kernel/ksysfs.c.
@@ -64,7 +65,7 @@ int machine_kexec_prepare(struct kimage *image)
 			return err;
 
 		if (be32_to_cpu(header) == OF_DT_HEADER)
-			kexec_boot_atags = current_segment->mem;
+			dt_mem = current_segment->mem;
 	}
 	return 0;
 }
@@ -166,9 +167,8 @@ void machine_kexec(struct kimage *image)
 	kexec_start_address = image->start;
 	kexec_indirection_page = page_list;
 	kexec_mach_type = machine_arch_type;
-	if (!kexec_boot_atags)
-		kexec_boot_atags = image->start - KEXEC_ARM_ZIMAGE_OFFSET + KEXEC_ARM_ATAGS_OFFSET;
-
+	kexec_boot_atags = dt_mem ?: image->start - KEXEC_ARM_ZIMAGE_OFFSET
+				     + KEXEC_ARM_ATAGS_OFFSET;
 
 	/* copy our kernel relocation code to the control code page */
 	reboot_entry = fncpy(reboot_code_buffer,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 6/8] arm: kgdb: Handle read-only text / modules
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Doug Anderson, Will Deacon, Rabin Vincent,
	Laura Abbott, Rob Herring, Leif Lindholm, Mark Salter, Liu hua,
	Nikolay Borisov, Nicolas Pitre, Doug Anderson, Jason Wessel,
	Catalin Marinas, Russell King - ARM Linux, linux-arm-kernel

From: Doug Anderson <dianders@chromium.org>

Handle the case where someone has set the text segment of the kernel
as read-only by using the newly introduced "patch" mechanism.

Signed-off-by: Doug Anderson <dianders@chromium.org>
[kees: switched structure size check to BUILD_BUG_ON (sboyd)]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/kernel/Makefile |  2 +-
 arch/arm/kernel/kgdb.c   | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 38ddd9f83d0e..70b730766330 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -67,7 +67,7 @@ test-kprobes-objs		+= kprobes-test-arm.o
 endif
 obj-$(CONFIG_OABI_COMPAT)	+= sys_oabi-compat.o
 obj-$(CONFIG_ARM_THUMBEE)	+= thumbee.o
-obj-$(CONFIG_KGDB)		+= kgdb.o
+obj-$(CONFIG_KGDB)		+= kgdb.o patch.o
 obj-$(CONFIG_ARM_UNWIND)	+= unwind.o
 obj-$(CONFIG_HAVE_TCM)		+= tcm.o
 obj-$(CONFIG_OF)		+= devtree.o
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index a74b53c1b7df..07db2f8a1b45 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -12,8 +12,12 @@
 #include <linux/irq.h>
 #include <linux/kdebug.h>
 #include <linux/kgdb.h>
+#include <linux/uaccess.h>
+
 #include <asm/traps.h>
 
+#include "patch.h"
+
 struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
 {
 	{ "r0", 4, offsetof(struct pt_regs, ARM_r0)},
@@ -244,6 +248,31 @@ void kgdb_arch_exit(void)
 	unregister_die_notifier(&kgdb_notifier);
 }
 
+int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
+{
+	int err;
+
+	/* patch_text() only supports int-sized breakpoints */
+	BUILD_BUG_ON(sizeof(int) != BREAK_INSTR_SIZE);
+
+	err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+				BREAK_INSTR_SIZE);
+	if (err)
+		return err;
+
+	patch_text((void *)bpt->bpt_addr,
+		   *(unsigned int *)arch_kgdb_ops.gdb_bpt_instr);
+
+	return err;
+}
+
+int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
+{
+	patch_text((void *)bpt->bpt_addr, *(unsigned int *)bpt->saved_instr);
+
+	return 0;
+}
+
 /*
  * Register our undef instruction hooks with ARM undef core.
  * We regsiter a hook specifically looking for the KGB break inst
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 6/8] arm: kgdb: Handle read-only text / modules
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

From: Doug Anderson <dianders@chromium.org>

Handle the case where someone has set the text segment of the kernel
as read-only by using the newly introduced "patch" mechanism.

Signed-off-by: Doug Anderson <dianders@chromium.org>
[kees: switched structure size check to BUILD_BUG_ON (sboyd)]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/kernel/Makefile |  2 +-
 arch/arm/kernel/kgdb.c   | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 38ddd9f83d0e..70b730766330 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -67,7 +67,7 @@ test-kprobes-objs		+= kprobes-test-arm.o
 endif
 obj-$(CONFIG_OABI_COMPAT)	+= sys_oabi-compat.o
 obj-$(CONFIG_ARM_THUMBEE)	+= thumbee.o
-obj-$(CONFIG_KGDB)		+= kgdb.o
+obj-$(CONFIG_KGDB)		+= kgdb.o patch.o
 obj-$(CONFIG_ARM_UNWIND)	+= unwind.o
 obj-$(CONFIG_HAVE_TCM)		+= tcm.o
 obj-$(CONFIG_OF)		+= devtree.o
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index a74b53c1b7df..07db2f8a1b45 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -12,8 +12,12 @@
 #include <linux/irq.h>
 #include <linux/kdebug.h>
 #include <linux/kgdb.h>
+#include <linux/uaccess.h>
+
 #include <asm/traps.h>
 
+#include "patch.h"
+
 struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
 {
 	{ "r0", 4, offsetof(struct pt_regs, ARM_r0)},
@@ -244,6 +248,31 @@ void kgdb_arch_exit(void)
 	unregister_die_notifier(&kgdb_notifier);
 }
 
+int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
+{
+	int err;
+
+	/* patch_text() only supports int-sized breakpoints */
+	BUILD_BUG_ON(sizeof(int) != BREAK_INSTR_SIZE);
+
+	err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
+				BREAK_INSTR_SIZE);
+	if (err)
+		return err;
+
+	patch_text((void *)bpt->bpt_addr,
+		   *(unsigned int *)arch_kgdb_ops.gdb_bpt_instr);
+
+	return err;
+}
+
+int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
+{
+	patch_text((void *)bpt->bpt_addr, *(unsigned int *)bpt->saved_instr);
+
+	return 0;
+}
+
 /*
  * Register our undef instruction hooks with ARM undef core.
  * We regsiter a hook specifically looking for the KGB break inst
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 7/8] ARM: mm: allow non-text sections to be non-executable
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Will Deacon, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, Mark Salter, Liu hua, Nikolay Borisov,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

Adds CONFIG_ARM_KERNMEM_PERMS to separate the kernel memory regions
into section-sized areas that can have different permisions. Performs
the NX permission changes during free_initmem, so that init memory can be
reclaimed.

This uses section size instead of PMD size to reduce memory lost to
padding on non-LPAE systems.

Based on work by Brad Spengler, Larry Bassel, and Laura Abbott.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/kernel/vmlinux.lds.S |  17 +++++++
 arch/arm/mm/Kconfig           |   9 ++++
 arch/arm/mm/init.c            | 101 +++++++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/mmu.c             |  13 +++++-
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 6f57cb94367f..a3d07ca2bbb4 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -8,6 +8,9 @@
 #include <asm/thread_info.h>
 #include <asm/memory.h>
 #include <asm/page.h>
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+#include <asm/pgtable.h>
+#endif
 	
 #define PROC_INFO							\
 	. = ALIGN(4);							\
@@ -90,6 +93,11 @@ SECTIONS
 		_text = .;
 		HEAD_TEXT
 	}
+
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+	. = ALIGN(1<<SECTION_SHIFT);
+#endif
+
 	.text : {			/* Real text segment		*/
 		_stext = .;		/* Text and read-only data	*/
 			__exception_text_start = .;
@@ -145,7 +153,11 @@ SECTIONS
 	_etext = .;			/* End of text and rodata section */
 
 #ifndef CONFIG_XIP_KERNEL
+# ifdef CONFIG_ARM_KERNMEM_PERMS
+	. = ALIGN(1<<SECTION_SHIFT);
+# else
 	. = ALIGN(PAGE_SIZE);
+# endif
 	__init_begin = .;
 #endif
 	/*
@@ -220,7 +232,12 @@ SECTIONS
 	. = PAGE_OFFSET + TEXT_OFFSET;
 #else
 	__init_end = .;
+
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+	. = ALIGN(1<<SECTION_SHIFT);
+#else
 	. = ALIGN(THREAD_SIZE);
+#endif
 	__data_loc = .;
 #endif
 
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index ae69809a9e47..7a0756df91a2 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -1008,3 +1008,12 @@ config ARCH_SUPPORTS_BIG_ENDIAN
 	help
 	  This option specifies the architecture can support big endian
 	  operation.
+
+config ARM_KERNMEM_PERMS
+	bool "Restrict kernel memory permissions"
+	help
+	  If this is set, kernel memory other than kernel text (and rodata)
+	  will be made non-executable. The tradeoff is that each region is
+	  padded to section-size (1MiB) boundaries (because their permissions
+	  are different and splitting the 1M pages into 4K ones causes TLB
+	  performance problems), wasting memory.
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index ad82c05bfc3a..e6bfe76b2f59 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -29,6 +29,7 @@
 #include <asm/prom.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
+#include <asm/system_info.h>
 #include <asm/tlb.h>
 #include <asm/fixmap.h>
 
@@ -615,7 +616,99 @@ void __init mem_init(void)
 	}
 }
 
-void free_initmem(void)
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+struct section_perm {
+	unsigned long start;
+	unsigned long end;
+	pmdval_t mask;
+	pmdval_t prot;
+};
+
+struct section_perm nx_perms[] = {
+	/* Make pages tables, etc before _stext RW (set NX). */
+	{
+		.start	= PAGE_OFFSET,
+		.end	= (unsigned long)_stext,
+		.mask	= ~PMD_SECT_XN,
+		.prot	= PMD_SECT_XN,
+	},
+	/* Make init RW (set NX). */
+	{
+		.start	= (unsigned long)__init_begin,
+		.end	= (unsigned long)_sdata,
+		.mask	= ~PMD_SECT_XN,
+		.prot	= PMD_SECT_XN,
+	},
+};
+
+/*
+ * Updates section permissions only for the current mm (sections are
+ * copied into each mm). During startup, this is the init_mm. Is only
+ * safe to be called with preemption disabled, as under stop_machine().
+ */
+static inline void section_update(unsigned long addr, pmdval_t mask,
+				  pmdval_t prot)
+{
+	struct mm_struct *mm;
+	pmd_t *pmd;
+
+	mm = current->active_mm;
+	pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
+
+#ifdef CONFIG_ARM_LPAE
+	pmd[0] = __pmd((pmd_val(pmd[0]) & mask) | prot);
+#else
+	if (addr & SECTION_SIZE)
+		pmd[1] = __pmd((pmd_val(pmd[1]) & mask) | prot);
+	else
+		pmd[0] = __pmd((pmd_val(pmd[0]) & mask) | prot);
+#endif
+	flush_pmd_entry(pmd);
+	local_flush_tlb_kernel_range(addr, addr + SECTION_SIZE);
+}
+
+/* Make sure extended page tables are in use. */
+static inline bool arch_has_strict_perms(void)
+{
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return false;
+
+	return !!(get_cr() & CR_XP);
+}
+
+#define set_section_perms(perms, field)	{				\
+	size_t i;							\
+	unsigned long addr;						\
+									\
+	if (!arch_has_strict_perms())					\
+		return;							\
+									\
+	for (i = 0; i < ARRAY_SIZE(perms); i++) {			\
+		if (!IS_ALIGNED(perms[i].start, SECTION_SIZE) ||	\
+		    !IS_ALIGNED(perms[i].end, SECTION_SIZE)) {		\
+			pr_err("BUG: section %lx-%lx not aligned to %lx\n", \
+				perms[i].start, perms[i].end,		\
+				SECTION_SIZE);				\
+			continue;					\
+		}							\
+									\
+		for (addr = perms[i].start;				\
+		     addr < perms[i].end;				\
+		     addr += SECTION_SIZE)				\
+			section_update(addr, perms[i].mask,		\
+				       perms[i].field);			\
+	}								\
+}
+
+static inline void fix_kernmem_perms(void)
+{
+	set_section_perms(nx_perms, prot);
+}
+#else
+static inline void fix_kernmem_perms(void) { }
+#endif /* CONFIG_ARM_KERNMEM_PERMS */
+
+void free_tcmmem(void)
 {
 #ifdef CONFIG_HAVE_TCM
 	extern char __tcm_start, __tcm_end;
@@ -623,6 +716,12 @@ void free_initmem(void)
 	poison_init_mem(&__tcm_start, &__tcm_end - &__tcm_start);
 	free_reserved_area(&__tcm_start, &__tcm_end, -1, "TCM link");
 #endif
+}
+
+void free_initmem(void)
+{
+	fix_kernmem_perms();
+	free_tcmmem();
 
 	poison_init_mem(__init_begin, __init_end - __init_begin);
 	if (!machine_is_integrator() && !machine_is_cintegrator())
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index fa3a667852cb..c5e7e342ff69 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1374,13 +1374,24 @@ static void __init map_lowmem(void)
 		if (start >= end)
 			break;
 
-		if (end < kernel_x_start || start >= kernel_x_end) {
+		if (end < kernel_x_start) {
 			map.pfn = __phys_to_pfn(start);
 			map.virtual = __phys_to_virt(start);
 			map.length = end - start;
 			map.type = MT_MEMORY_RWX;
 
 			create_mapping(&map);
+		} else if (start >= kernel_x_end) {
+			map.pfn = __phys_to_pfn(start);
+			map.virtual = __phys_to_virt(start);
+			map.length = end - start;
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+			map.type = MT_MEMORY_RW;
+#else
+			map.type = MT_MEMORY_RWX;
+#endif
+
+			create_mapping(&map);
 		} else {
 			/* This better cover the entire kernel */
 			if (start < kernel_x_start) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 7/8] ARM: mm: allow non-text sections to be non-executable
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

Adds CONFIG_ARM_KERNMEM_PERMS to separate the kernel memory regions
into section-sized areas that can have different permisions. Performs
the NX permission changes during free_initmem, so that init memory can be
reclaimed.

This uses section size instead of PMD size to reduce memory lost to
padding on non-LPAE systems.

Based on work by Brad Spengler, Larry Bassel, and Laura Abbott.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/kernel/vmlinux.lds.S |  17 +++++++
 arch/arm/mm/Kconfig           |   9 ++++
 arch/arm/mm/init.c            | 101 +++++++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/mmu.c             |  13 +++++-
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 6f57cb94367f..a3d07ca2bbb4 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -8,6 +8,9 @@
 #include <asm/thread_info.h>
 #include <asm/memory.h>
 #include <asm/page.h>
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+#include <asm/pgtable.h>
+#endif
 	
 #define PROC_INFO							\
 	. = ALIGN(4);							\
@@ -90,6 +93,11 @@ SECTIONS
 		_text = .;
 		HEAD_TEXT
 	}
+
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+	. = ALIGN(1<<SECTION_SHIFT);
+#endif
+
 	.text : {			/* Real text segment		*/
 		_stext = .;		/* Text and read-only data	*/
 			__exception_text_start = .;
@@ -145,7 +153,11 @@ SECTIONS
 	_etext = .;			/* End of text and rodata section */
 
 #ifndef CONFIG_XIP_KERNEL
+# ifdef CONFIG_ARM_KERNMEM_PERMS
+	. = ALIGN(1<<SECTION_SHIFT);
+# else
 	. = ALIGN(PAGE_SIZE);
+# endif
 	__init_begin = .;
 #endif
 	/*
@@ -220,7 +232,12 @@ SECTIONS
 	. = PAGE_OFFSET + TEXT_OFFSET;
 #else
 	__init_end = .;
+
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+	. = ALIGN(1<<SECTION_SHIFT);
+#else
 	. = ALIGN(THREAD_SIZE);
+#endif
 	__data_loc = .;
 #endif
 
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index ae69809a9e47..7a0756df91a2 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -1008,3 +1008,12 @@ config ARCH_SUPPORTS_BIG_ENDIAN
 	help
 	  This option specifies the architecture can support big endian
 	  operation.
+
+config ARM_KERNMEM_PERMS
+	bool "Restrict kernel memory permissions"
+	help
+	  If this is set, kernel memory other than kernel text (and rodata)
+	  will be made non-executable. The tradeoff is that each region is
+	  padded to section-size (1MiB) boundaries (because their permissions
+	  are different and splitting the 1M pages into 4K ones causes TLB
+	  performance problems), wasting memory.
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index ad82c05bfc3a..e6bfe76b2f59 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -29,6 +29,7 @@
 #include <asm/prom.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
+#include <asm/system_info.h>
 #include <asm/tlb.h>
 #include <asm/fixmap.h>
 
@@ -615,7 +616,99 @@ void __init mem_init(void)
 	}
 }
 
-void free_initmem(void)
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+struct section_perm {
+	unsigned long start;
+	unsigned long end;
+	pmdval_t mask;
+	pmdval_t prot;
+};
+
+struct section_perm nx_perms[] = {
+	/* Make pages tables, etc before _stext RW (set NX). */
+	{
+		.start	= PAGE_OFFSET,
+		.end	= (unsigned long)_stext,
+		.mask	= ~PMD_SECT_XN,
+		.prot	= PMD_SECT_XN,
+	},
+	/* Make init RW (set NX). */
+	{
+		.start	= (unsigned long)__init_begin,
+		.end	= (unsigned long)_sdata,
+		.mask	= ~PMD_SECT_XN,
+		.prot	= PMD_SECT_XN,
+	},
+};
+
+/*
+ * Updates section permissions only for the current mm (sections are
+ * copied into each mm). During startup, this is the init_mm. Is only
+ * safe to be called with preemption disabled, as under stop_machine().
+ */
+static inline void section_update(unsigned long addr, pmdval_t mask,
+				  pmdval_t prot)
+{
+	struct mm_struct *mm;
+	pmd_t *pmd;
+
+	mm = current->active_mm;
+	pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
+
+#ifdef CONFIG_ARM_LPAE
+	pmd[0] = __pmd((pmd_val(pmd[0]) & mask) | prot);
+#else
+	if (addr & SECTION_SIZE)
+		pmd[1] = __pmd((pmd_val(pmd[1]) & mask) | prot);
+	else
+		pmd[0] = __pmd((pmd_val(pmd[0]) & mask) | prot);
+#endif
+	flush_pmd_entry(pmd);
+	local_flush_tlb_kernel_range(addr, addr + SECTION_SIZE);
+}
+
+/* Make sure extended page tables are in use. */
+static inline bool arch_has_strict_perms(void)
+{
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return false;
+
+	return !!(get_cr() & CR_XP);
+}
+
+#define set_section_perms(perms, field)	{				\
+	size_t i;							\
+	unsigned long addr;						\
+									\
+	if (!arch_has_strict_perms())					\
+		return;							\
+									\
+	for (i = 0; i < ARRAY_SIZE(perms); i++) {			\
+		if (!IS_ALIGNED(perms[i].start, SECTION_SIZE) ||	\
+		    !IS_ALIGNED(perms[i].end, SECTION_SIZE)) {		\
+			pr_err("BUG: section %lx-%lx not aligned to %lx\n", \
+				perms[i].start, perms[i].end,		\
+				SECTION_SIZE);				\
+			continue;					\
+		}							\
+									\
+		for (addr = perms[i].start;				\
+		     addr < perms[i].end;				\
+		     addr += SECTION_SIZE)				\
+			section_update(addr, perms[i].mask,		\
+				       perms[i].field);			\
+	}								\
+}
+
+static inline void fix_kernmem_perms(void)
+{
+	set_section_perms(nx_perms, prot);
+}
+#else
+static inline void fix_kernmem_perms(void) { }
+#endif /* CONFIG_ARM_KERNMEM_PERMS */
+
+void free_tcmmem(void)
 {
 #ifdef CONFIG_HAVE_TCM
 	extern char __tcm_start, __tcm_end;
@@ -623,6 +716,12 @@ void free_initmem(void)
 	poison_init_mem(&__tcm_start, &__tcm_end - &__tcm_start);
 	free_reserved_area(&__tcm_start, &__tcm_end, -1, "TCM link");
 #endif
+}
+
+void free_initmem(void)
+{
+	fix_kernmem_perms();
+	free_tcmmem();
 
 	poison_init_mem(__init_begin, __init_end - __init_begin);
 	if (!machine_is_integrator() && !machine_is_cintegrator())
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index fa3a667852cb..c5e7e342ff69 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1374,13 +1374,24 @@ static void __init map_lowmem(void)
 		if (start >= end)
 			break;
 
-		if (end < kernel_x_start || start >= kernel_x_end) {
+		if (end < kernel_x_start) {
 			map.pfn = __phys_to_pfn(start);
 			map.virtual = __phys_to_virt(start);
 			map.length = end - start;
 			map.type = MT_MEMORY_RWX;
 
 			create_mapping(&map);
+		} else if (start >= kernel_x_end) {
+			map.pfn = __phys_to_pfn(start);
+			map.virtual = __phys_to_virt(start);
+			map.length = end - start;
+#ifdef CONFIG_ARM_KERNMEM_PERMS
+			map.type = MT_MEMORY_RW;
+#else
+			map.type = MT_MEMORY_RWX;
+#endif
+
+			create_mapping(&map);
 		} else {
 			/* This better cover the entire kernel */
 			if (start < kernel_x_start) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 8/8] ARM: mm: allow text and rodata sections to be read-only
  2014-09-03 21:57 ` Kees Cook
@ 2014-09-03 21:57   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Will Deacon, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, Mark Salter, Liu hua, Nikolay Borisov,
	Nicolas Pitre, Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

This introduces CONFIG_DEBUG_RODATA, making kernel text and rodata
read-only. Additionally, this splits rodata from text so that rodata can
also be NX, which may lead to wasted memory when aligning to SECTION_SIZE.
The read-only areas are made writable during ftrace updates and kexec.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/cacheflush.h | 10 ++++++++
 arch/arm/kernel/ftrace.c          | 19 ++++++++++++++++
 arch/arm/kernel/machine_kexec.c   |  1 +
 arch/arm/kernel/vmlinux.lds.S     |  3 +++
 arch/arm/mm/Kconfig               | 12 ++++++++++
 arch/arm/mm/init.c                | 48 ++++++++++++++++++++++++++++++++++++++-
 6 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 79ecb4f34ffb..9108292edcb5 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -486,6 +486,16 @@ int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_x(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
 
+#ifdef CONFIG_DEBUG_RODATA
+void mark_rodata_ro(void);
+void set_kernel_text_rw(void);
+void set_kernel_text_ro(void);
+#else
+static inline void set_kernel_text_rw(void) { }
+static inline void set_kernel_text_ro(void) { }
+#endif
+
 void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
 			     void *kaddr, unsigned long len);
+
 #endif
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index af9a8a927a4e..b8c75e45a950 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -15,6 +15,7 @@
 #include <linux/ftrace.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
+#include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
 #include <asm/opcodes.h>
@@ -35,6 +36,22 @@
 
 #define	OLD_NOP		0xe1a00000	/* mov r0, r0 */
 
+static int __ftrace_modify_code(void *data)
+{
+	int *command = data;
+
+	set_kernel_text_rw();
+	ftrace_modify_all_code(*command);
+	set_kernel_text_ro();
+
+	return 0;
+}
+
+void arch_ftrace_update_code(int command)
+{
+	stop_machine(__ftrace_modify_code, &command, NULL);
+}
+
 static unsigned long ftrace_nop_replace(struct dyn_ftrace *rec)
 {
 	return rec->arch.old_mcount ? OLD_NOP : NOP;
@@ -73,6 +90,8 @@ int ftrace_arch_code_modify_prepare(void)
 int ftrace_arch_code_modify_post_process(void)
 {
 	set_all_modules_text_ro();
+	/* Make sure any TLB misses during machine stop are cleared. */
+	flush_tlb_all();
 	return 0;
 }
 
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 8f75250cbe30..4423a565ef6f 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -164,6 +164,7 @@ void machine_kexec(struct kimage *image)
 	reboot_code_buffer = page_address(image->control_code_page);
 
 	/* Prepare parameters for reboot_code_buffer*/
+	set_kernel_text_rw();
 	kexec_start_address = image->start;
 	kexec_indirection_page = page_list;
 	kexec_mach_type = machine_arch_type;
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index a3d07ca2bbb4..542e58919bd9 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -120,6 +120,9 @@ SECTIONS
 			ARM_CPU_KEEP(PROC_INFO)
 	}
 
+#ifdef CONFIG_DEBUG_RODATA
+	. = ALIGN(1<<SECTION_SHIFT);
+#endif
 	RO_DATA(PAGE_SIZE)
 
 	. = ALIGN(4);
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 7a0756df91a2..c9cd9c5bf1e1 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -1017,3 +1017,15 @@ config ARM_KERNMEM_PERMS
 	  padded to section-size (1MiB) boundaries (because their permissions
 	  are different and splitting the 1M pages into 4K ones causes TLB
 	  performance problems), wasting memory.
+
+config DEBUG_RODATA
+	bool "Make kernel text and rodata read-only"
+	depends on ARM_KERNMEM_PERMS
+	default y
+	help
+	  If this is set, kernel text and rodata will be made read-only. This
+	  is to help catch accidental or malicious attempts to change the
+	  kernel's executable code. Additionally splits rodata from kernel
+	  text so it can be made explicitly non-executable. This creates
+	  another section-size padded region, so it can waste more memory
+	  space while gaining the read-only protections.
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e6bfe76b2f59..dc2db779cdf4 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -622,9 +622,10 @@ struct section_perm {
 	unsigned long end;
 	pmdval_t mask;
 	pmdval_t prot;
+	pmdval_t clear;
 };
 
-struct section_perm nx_perms[] = {
+static struct section_perm nx_perms[] = {
 	/* Make pages tables, etc before _stext RW (set NX). */
 	{
 		.start	= PAGE_OFFSET,
@@ -639,8 +640,35 @@ struct section_perm nx_perms[] = {
 		.mask	= ~PMD_SECT_XN,
 		.prot	= PMD_SECT_XN,
 	},
+#ifdef CONFIG_DEBUG_RODATA
+	/* Make rodata NX (set RO in ro_perms below). */
+	{
+		.start  = (unsigned long)__start_rodata,
+		.end    = (unsigned long)__init_begin,
+		.mask   = ~PMD_SECT_XN,
+		.prot   = PMD_SECT_XN,
+	},
+#endif
 };
 
+#ifdef CONFIG_DEBUG_RODATA
+static struct section_perm ro_perms[] = {
+	/* Make kernel code and rodata RX (set RO). */
+	{
+		.start  = (unsigned long)_stext,
+		.end    = (unsigned long)__init_begin,
+#ifdef CONFIG_ARM_LPAE
+		.mask   = ~PMD_SECT_RDONLY,
+		.prot   = PMD_SECT_RDONLY,
+#else
+		.mask   = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE),
+		.prot   = PMD_SECT_APX | PMD_SECT_AP_WRITE,
+		.clear  = PMD_SECT_AP_WRITE,
+#endif
+	},
+};
+#endif
+
 /*
  * Updates section permissions only for the current mm (sections are
  * copied into each mm). During startup, this is the init_mm. Is only
@@ -704,6 +732,24 @@ static inline void fix_kernmem_perms(void)
 {
 	set_section_perms(nx_perms, prot);
 }
+
+#ifdef CONFIG_DEBUG_RODATA
+void mark_rodata_ro(void)
+{
+	set_section_perms(ro_perms, prot);
+}
+
+void set_kernel_text_rw(void)
+{
+	set_section_perms(ro_perms, clear);
+}
+
+void set_kernel_text_ro(void)
+{
+	set_section_perms(ro_perms, prot);
+}
+#endif /* CONFIG_DEBUG_RODATA */
+
 #else
 static inline void fix_kernmem_perms(void) { }
 #endif /* CONFIG_ARM_KERNMEM_PERMS */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 8/8] ARM: mm: allow text and rodata sections to be read-only
@ 2014-09-03 21:57   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-03 21:57 UTC (permalink / raw)
  To: linux-arm-kernel

This introduces CONFIG_DEBUG_RODATA, making kernel text and rodata
read-only. Additionally, this splits rodata from text so that rodata can
also be NX, which may lead to wasted memory when aligning to SECTION_SIZE.
The read-only areas are made writable during ftrace updates and kexec.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/cacheflush.h | 10 ++++++++
 arch/arm/kernel/ftrace.c          | 19 ++++++++++++++++
 arch/arm/kernel/machine_kexec.c   |  1 +
 arch/arm/kernel/vmlinux.lds.S     |  3 +++
 arch/arm/mm/Kconfig               | 12 ++++++++++
 arch/arm/mm/init.c                | 48 ++++++++++++++++++++++++++++++++++++++-
 6 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 79ecb4f34ffb..9108292edcb5 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -486,6 +486,16 @@ int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_x(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
 
+#ifdef CONFIG_DEBUG_RODATA
+void mark_rodata_ro(void);
+void set_kernel_text_rw(void);
+void set_kernel_text_ro(void);
+#else
+static inline void set_kernel_text_rw(void) { }
+static inline void set_kernel_text_ro(void) { }
+#endif
+
 void flush_uprobe_xol_access(struct page *page, unsigned long uaddr,
 			     void *kaddr, unsigned long len);
+
 #endif
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c
index af9a8a927a4e..b8c75e45a950 100644
--- a/arch/arm/kernel/ftrace.c
+++ b/arch/arm/kernel/ftrace.c
@@ -15,6 +15,7 @@
 #include <linux/ftrace.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
+#include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
 #include <asm/opcodes.h>
@@ -35,6 +36,22 @@
 
 #define	OLD_NOP		0xe1a00000	/* mov r0, r0 */
 
+static int __ftrace_modify_code(void *data)
+{
+	int *command = data;
+
+	set_kernel_text_rw();
+	ftrace_modify_all_code(*command);
+	set_kernel_text_ro();
+
+	return 0;
+}
+
+void arch_ftrace_update_code(int command)
+{
+	stop_machine(__ftrace_modify_code, &command, NULL);
+}
+
 static unsigned long ftrace_nop_replace(struct dyn_ftrace *rec)
 {
 	return rec->arch.old_mcount ? OLD_NOP : NOP;
@@ -73,6 +90,8 @@ int ftrace_arch_code_modify_prepare(void)
 int ftrace_arch_code_modify_post_process(void)
 {
 	set_all_modules_text_ro();
+	/* Make sure any TLB misses during machine stop are cleared. */
+	flush_tlb_all();
 	return 0;
 }
 
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 8f75250cbe30..4423a565ef6f 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -164,6 +164,7 @@ void machine_kexec(struct kimage *image)
 	reboot_code_buffer = page_address(image->control_code_page);
 
 	/* Prepare parameters for reboot_code_buffer*/
+	set_kernel_text_rw();
 	kexec_start_address = image->start;
 	kexec_indirection_page = page_list;
 	kexec_mach_type = machine_arch_type;
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index a3d07ca2bbb4..542e58919bd9 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -120,6 +120,9 @@ SECTIONS
 			ARM_CPU_KEEP(PROC_INFO)
 	}
 
+#ifdef CONFIG_DEBUG_RODATA
+	. = ALIGN(1<<SECTION_SHIFT);
+#endif
 	RO_DATA(PAGE_SIZE)
 
 	. = ALIGN(4);
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 7a0756df91a2..c9cd9c5bf1e1 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -1017,3 +1017,15 @@ config ARM_KERNMEM_PERMS
 	  padded to section-size (1MiB) boundaries (because their permissions
 	  are different and splitting the 1M pages into 4K ones causes TLB
 	  performance problems), wasting memory.
+
+config DEBUG_RODATA
+	bool "Make kernel text and rodata read-only"
+	depends on ARM_KERNMEM_PERMS
+	default y
+	help
+	  If this is set, kernel text and rodata will be made read-only. This
+	  is to help catch accidental or malicious attempts to change the
+	  kernel's executable code. Additionally splits rodata from kernel
+	  text so it can be made explicitly non-executable. This creates
+	  another section-size padded region, so it can waste more memory
+	  space while gaining the read-only protections.
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index e6bfe76b2f59..dc2db779cdf4 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -622,9 +622,10 @@ struct section_perm {
 	unsigned long end;
 	pmdval_t mask;
 	pmdval_t prot;
+	pmdval_t clear;
 };
 
-struct section_perm nx_perms[] = {
+static struct section_perm nx_perms[] = {
 	/* Make pages tables, etc before _stext RW (set NX). */
 	{
 		.start	= PAGE_OFFSET,
@@ -639,8 +640,35 @@ struct section_perm nx_perms[] = {
 		.mask	= ~PMD_SECT_XN,
 		.prot	= PMD_SECT_XN,
 	},
+#ifdef CONFIG_DEBUG_RODATA
+	/* Make rodata NX (set RO in ro_perms below). */
+	{
+		.start  = (unsigned long)__start_rodata,
+		.end    = (unsigned long)__init_begin,
+		.mask   = ~PMD_SECT_XN,
+		.prot   = PMD_SECT_XN,
+	},
+#endif
 };
 
+#ifdef CONFIG_DEBUG_RODATA
+static struct section_perm ro_perms[] = {
+	/* Make kernel code and rodata RX (set RO). */
+	{
+		.start  = (unsigned long)_stext,
+		.end    = (unsigned long)__init_begin,
+#ifdef CONFIG_ARM_LPAE
+		.mask   = ~PMD_SECT_RDONLY,
+		.prot   = PMD_SECT_RDONLY,
+#else
+		.mask   = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE),
+		.prot   = PMD_SECT_APX | PMD_SECT_AP_WRITE,
+		.clear  = PMD_SECT_AP_WRITE,
+#endif
+	},
+};
+#endif
+
 /*
  * Updates section permissions only for the current mm (sections are
  * copied into each mm). During startup, this is the init_mm. Is only
@@ -704,6 +732,24 @@ static inline void fix_kernmem_perms(void)
 {
 	set_section_perms(nx_perms, prot);
 }
+
+#ifdef CONFIG_DEBUG_RODATA
+void mark_rodata_ro(void)
+{
+	set_section_perms(ro_perms, prot);
+}
+
+void set_kernel_text_rw(void)
+{
+	set_section_perms(ro_perms, clear);
+}
+
+void set_kernel_text_ro(void)
+{
+	set_section_perms(ro_perms, prot);
+}
+#endif /* CONFIG_DEBUG_RODATA */
+
 #else
 static inline void fix_kernmem_perms(void) { }
 #endif /* CONFIG_ARM_KERNMEM_PERMS */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-03 21:57   ` Kees Cook
@ 2014-09-04 17:03     ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-04 17:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

Hi Kees,

On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> Also makes sure that the fixmap allocation fits into the expected range.
> 
> Based on patch by Rabin Vincent.

[...]

> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> +{
> +	unsigned long vaddr = __fix_to_virt(idx);
> +	pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> +
> +	/* Make sure fixmap region does not exceed available allocation. */
> +	BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> +		     FIXADDR_END);
> +	BUG_ON(idx >= __end_of_fixed_addresses);
> +
> +	if (pgprot_val(prot))
> +		set_pte_at(NULL, vaddr, pte,
> +			pfn_pte(phys >> PAGE_SHIFT, prot));
> +	else
> +		pte_clear(NULL, vaddr, pte);
> +
> +	/*
> +	 * Given the potential a15 tlbi errata, we can only do tlb flushes
> +	 * with interrupts disabled. Callers must have taken care of this.
> +	 */
> +	WARN_ON_ONCE(!irqs_disabled());
> +	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);

Aha, this explains why we were confusing each other! The issue is that
interrupts must be *enabled*, so this code does the exact opposite of
what we need.

I think this got lost in a sea of double negatives during the last round
of review.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-04 17:03     ` Will Deacon
  0 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-04 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Kees,

On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> Also makes sure that the fixmap allocation fits into the expected range.
> 
> Based on patch by Rabin Vincent.

[...]

> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> +{
> +	unsigned long vaddr = __fix_to_virt(idx);
> +	pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> +
> +	/* Make sure fixmap region does not exceed available allocation. */
> +	BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> +		     FIXADDR_END);
> +	BUG_ON(idx >= __end_of_fixed_addresses);
> +
> +	if (pgprot_val(prot))
> +		set_pte_at(NULL, vaddr, pte,
> +			pfn_pte(phys >> PAGE_SHIFT, prot));
> +	else
> +		pte_clear(NULL, vaddr, pte);
> +
> +	/*
> +	 * Given the potential a15 tlbi errata, we can only do tlb flushes
> +	 * with interrupts disabled. Callers must have taken care of this.
> +	 */
> +	WARN_ON_ONCE(!irqs_disabled());
> +	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);

Aha, this explains why we were confusing each other! The issue is that
interrupts must be *enabled*, so this code does the exact opposite of
what we need.

I think this got lost in a sea of double negatives during the last round
of review.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-04 17:03     ` Will Deacon
@ 2014-09-04 17:23       ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-04 17:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Kees,
>
> On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
>> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
>> Also makes sure that the fixmap allocation fits into the expected range.
>>
>> Based on patch by Rabin Vincent.
>
> [...]
>
>> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
>> +{
>> +     unsigned long vaddr = __fix_to_virt(idx);
>> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
>> +
>> +     /* Make sure fixmap region does not exceed available allocation. */
>> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
>> +                  FIXADDR_END);
>> +     BUG_ON(idx >= __end_of_fixed_addresses);
>> +
>> +     if (pgprot_val(prot))
>> +             set_pte_at(NULL, vaddr, pte,
>> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
>> +     else
>> +             pte_clear(NULL, vaddr, pte);
>> +
>> +     /*
>> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
>> +      * with interrupts disabled. Callers must have taken care of this.
>> +      */
>> +     WARN_ON_ONCE(!irqs_disabled());
>> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
>
> Aha, this explains why we were confusing each other! The issue is that
> interrupts must be *enabled*, so this code does the exact opposite of
> what we need.
>
> I think this got lost in a sea of double negatives during the last round
> of review.

Ah! If this is the case, perhaps we can get away with
local_flush_tlb_kernel_range() then?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-04 17:23       ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-04 17:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Kees,
>
> On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
>> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
>> Also makes sure that the fixmap allocation fits into the expected range.
>>
>> Based on patch by Rabin Vincent.
>
> [...]
>
>> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
>> +{
>> +     unsigned long vaddr = __fix_to_virt(idx);
>> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
>> +
>> +     /* Make sure fixmap region does not exceed available allocation. */
>> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
>> +                  FIXADDR_END);
>> +     BUG_ON(idx >= __end_of_fixed_addresses);
>> +
>> +     if (pgprot_val(prot))
>> +             set_pte_at(NULL, vaddr, pte,
>> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
>> +     else
>> +             pte_clear(NULL, vaddr, pte);
>> +
>> +     /*
>> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
>> +      * with interrupts disabled. Callers must have taken care of this.
>> +      */
>> +     WARN_ON_ONCE(!irqs_disabled());
>> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
>
> Aha, this explains why we were confusing each other! The issue is that
> interrupts must be *enabled*, so this code does the exact opposite of
> what we need.
>
> I think this got lost in a sea of double negatives during the last round
> of review.

Ah! If this is the case, perhaps we can get away with
local_flush_tlb_kernel_range() then?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-04 17:23       ` Kees Cook
@ 2014-09-04 17:27         ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-04 17:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Kees,
> >
> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> >> Also makes sure that the fixmap allocation fits into the expected range.
> >>
> >> Based on patch by Rabin Vincent.
> >
> > [...]
> >
> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> >> +{
> >> +     unsigned long vaddr = __fix_to_virt(idx);
> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> >> +
> >> +     /* Make sure fixmap region does not exceed available allocation. */
> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> >> +                  FIXADDR_END);
> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
> >> +
> >> +     if (pgprot_val(prot))
> >> +             set_pte_at(NULL, vaddr, pte,
> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
> >> +     else
> >> +             pte_clear(NULL, vaddr, pte);
> >> +
> >> +     /*
> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
> >> +      * with interrupts disabled. Callers must have taken care of this.
> >> +      */
> >> +     WARN_ON_ONCE(!irqs_disabled());
> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> >
> > Aha, this explains why we were confusing each other! The issue is that
> > interrupts must be *enabled*, so this code does the exact opposite of
> > what we need.
> >
> > I think this got lost in a sea of double negatives during the last round
> > of review.
> 
> Ah! If this is the case, perhaps we can get away with
> local_flush_tlb_kernel_range() then?

That's a bit tricky, since you need to ensure that preemption is disabled
until the mapping is put back like it was.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-04 17:27         ` Will Deacon
  0 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-04 17:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> > Hi Kees,
> >
> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> >> Also makes sure that the fixmap allocation fits into the expected range.
> >>
> >> Based on patch by Rabin Vincent.
> >
> > [...]
> >
> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> >> +{
> >> +     unsigned long vaddr = __fix_to_virt(idx);
> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> >> +
> >> +     /* Make sure fixmap region does not exceed available allocation. */
> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> >> +                  FIXADDR_END);
> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
> >> +
> >> +     if (pgprot_val(prot))
> >> +             set_pte_at(NULL, vaddr, pte,
> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
> >> +     else
> >> +             pte_clear(NULL, vaddr, pte);
> >> +
> >> +     /*
> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
> >> +      * with interrupts disabled. Callers must have taken care of this.
> >> +      */
> >> +     WARN_ON_ONCE(!irqs_disabled());
> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> >
> > Aha, this explains why we were confusing each other! The issue is that
> > interrupts must be *enabled*, so this code does the exact opposite of
> > what we need.
> >
> > I think this got lost in a sea of double negatives during the last round
> > of review.
> 
> Ah! If this is the case, perhaps we can get away with
> local_flush_tlb_kernel_range() then?

That's a bit tricky, since you need to ensure that preemption is disabled
until the mapping is put back like it was.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-04 17:27         ` Will Deacon
@ 2014-09-05 19:41           ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-05 19:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Thu, Sep 4, 2014 at 10:27 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > Hi Kees,
>> >
>> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
>> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
>> >> Also makes sure that the fixmap allocation fits into the expected range.
>> >>
>> >> Based on patch by Rabin Vincent.
>> >
>> > [...]
>> >
>> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
>> >> +{
>> >> +     unsigned long vaddr = __fix_to_virt(idx);
>> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
>> >> +
>> >> +     /* Make sure fixmap region does not exceed available allocation. */
>> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
>> >> +                  FIXADDR_END);
>> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
>> >> +
>> >> +     if (pgprot_val(prot))
>> >> +             set_pte_at(NULL, vaddr, pte,
>> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
>> >> +     else
>> >> +             pte_clear(NULL, vaddr, pte);
>> >> +
>> >> +     /*
>> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
>> >> +      * with interrupts disabled. Callers must have taken care of this.
>> >> +      */
>> >> +     WARN_ON_ONCE(!irqs_disabled());
>> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
>> >
>> > Aha, this explains why we were confusing each other! The issue is that
>> > interrupts must be *enabled*, so this code does the exact opposite of
>> > what we need.
>> >
>> > I think this got lost in a sea of double negatives during the last round
>> > of review.
>>
>> Ah! If this is the case, perhaps we can get away with
>> local_flush_tlb_kernel_range() then?
>
> That's a bit tricky, since you need to ensure that preemption is disabled
> until the mapping is put back like it was.

Even with CONFIG_ARM_ERRATA_798181 enabled, I don't see a problem here
using flush_tlb_kernel_range(). When doing the ftrace work, this was
trivial to trip over, so I think we're in a good state here.

I've tested this on real hardware now too, and nothing falls over.
I'll get rid of the comment and the WARN_ON_ONCE, but AFAICT, this is
safe.

Do you have a suggestion about what needs fixing?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-05 19:41           ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-05 19:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 4, 2014 at 10:27 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > Hi Kees,
>> >
>> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
>> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
>> >> Also makes sure that the fixmap allocation fits into the expected range.
>> >>
>> >> Based on patch by Rabin Vincent.
>> >
>> > [...]
>> >
>> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
>> >> +{
>> >> +     unsigned long vaddr = __fix_to_virt(idx);
>> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
>> >> +
>> >> +     /* Make sure fixmap region does not exceed available allocation. */
>> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
>> >> +                  FIXADDR_END);
>> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
>> >> +
>> >> +     if (pgprot_val(prot))
>> >> +             set_pte_at(NULL, vaddr, pte,
>> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
>> >> +     else
>> >> +             pte_clear(NULL, vaddr, pte);
>> >> +
>> >> +     /*
>> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
>> >> +      * with interrupts disabled. Callers must have taken care of this.
>> >> +      */
>> >> +     WARN_ON_ONCE(!irqs_disabled());
>> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
>> >
>> > Aha, this explains why we were confusing each other! The issue is that
>> > interrupts must be *enabled*, so this code does the exact opposite of
>> > what we need.
>> >
>> > I think this got lost in a sea of double negatives during the last round
>> > of review.
>>
>> Ah! If this is the case, perhaps we can get away with
>> local_flush_tlb_kernel_range() then?
>
> That's a bit tricky, since you need to ensure that preemption is disabled
> until the mapping is put back like it was.

Even with CONFIG_ARM_ERRATA_798181 enabled, I don't see a problem here
using flush_tlb_kernel_range(). When doing the ftrace work, this was
trivial to trip over, so I think we're in a good state here.

I've tested this on real hardware now too, and nothing falls over.
I'll get rid of the comment and the WARN_ON_ONCE, but AFAICT, this is
safe.

Do you have a suggestion about what needs fixing?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-05 19:41           ` Kees Cook
@ 2014-09-08 10:39             ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-08 10:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Fri, Sep 05, 2014 at 08:41:08PM +0100, Kees Cook wrote:
> On Thu, Sep 4, 2014 at 10:27 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> >> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> >> > Hi Kees,
> >> >
> >> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> >> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> >> >> Also makes sure that the fixmap allocation fits into the expected range.
> >> >>
> >> >> Based on patch by Rabin Vincent.
> >> >
> >> > [...]
> >> >
> >> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> >> >> +{
> >> >> +     unsigned long vaddr = __fix_to_virt(idx);
> >> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> >> >> +
> >> >> +     /* Make sure fixmap region does not exceed available allocation. */
> >> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> >> >> +                  FIXADDR_END);
> >> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
> >> >> +
> >> >> +     if (pgprot_val(prot))
> >> >> +             set_pte_at(NULL, vaddr, pte,
> >> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
> >> >> +     else
> >> >> +             pte_clear(NULL, vaddr, pte);
> >> >> +
> >> >> +     /*
> >> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
> >> >> +      * with interrupts disabled. Callers must have taken care of this.
> >> >> +      */
> >> >> +     WARN_ON_ONCE(!irqs_disabled());
> >> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> >> >
> >> > Aha, this explains why we were confusing each other! The issue is that
> >> > interrupts must be *enabled*, so this code does the exact opposite of
> >> > what we need.
> >> >
> >> > I think this got lost in a sea of double negatives during the last round
> >> > of review.
> >>
> >> Ah! If this is the case, perhaps we can get away with
> >> local_flush_tlb_kernel_range() then?
> >
> > That's a bit tricky, since you need to ensure that preemption is disabled
> > until the mapping is put back like it was.
> 
> Even with CONFIG_ARM_ERRATA_798181 enabled, I don't see a problem here
> using flush_tlb_kernel_range(). When doing the ftrace work, this was
> trivial to trip over, so I think we're in a good state here.
> 
> I've tested this on real hardware now too, and nothing falls over.
> I'll get rid of the comment and the WARN_ON_ONCE, but AFAICT, this is
> safe.

But was that hardware actually affected by the erratum? There is a runtime
check which will avoid the cross-call if not.

> Do you have a suggestion about what needs fixing?

The easiest thing to do would be ensuring that __set_fixmap is always called
with interrupts enabled. If you can guarantee that, then you need to rework
the locking so that interrupts aren't disabled. I admit that I've lost a
fair amount of the context here, but that's the fundamental issue.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-08 10:39             ` Will Deacon
  0 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-08 10:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 05, 2014 at 08:41:08PM +0100, Kees Cook wrote:
> On Thu, Sep 4, 2014 at 10:27 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> >> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> >> > Hi Kees,
> >> >
> >> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> >> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> >> >> Also makes sure that the fixmap allocation fits into the expected range.
> >> >>
> >> >> Based on patch by Rabin Vincent.
> >> >
> >> > [...]
> >> >
> >> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> >> >> +{
> >> >> +     unsigned long vaddr = __fix_to_virt(idx);
> >> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> >> >> +
> >> >> +     /* Make sure fixmap region does not exceed available allocation. */
> >> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> >> >> +                  FIXADDR_END);
> >> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
> >> >> +
> >> >> +     if (pgprot_val(prot))
> >> >> +             set_pte_at(NULL, vaddr, pte,
> >> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
> >> >> +     else
> >> >> +             pte_clear(NULL, vaddr, pte);
> >> >> +
> >> >> +     /*
> >> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
> >> >> +      * with interrupts disabled. Callers must have taken care of this.
> >> >> +      */
> >> >> +     WARN_ON_ONCE(!irqs_disabled());
> >> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> >> >
> >> > Aha, this explains why we were confusing each other! The issue is that
> >> > interrupts must be *enabled*, so this code does the exact opposite of
> >> > what we need.
> >> >
> >> > I think this got lost in a sea of double negatives during the last round
> >> > of review.
> >>
> >> Ah! If this is the case, perhaps we can get away with
> >> local_flush_tlb_kernel_range() then?
> >
> > That's a bit tricky, since you need to ensure that preemption is disabled
> > until the mapping is put back like it was.
> 
> Even with CONFIG_ARM_ERRATA_798181 enabled, I don't see a problem here
> using flush_tlb_kernel_range(). When doing the ftrace work, this was
> trivial to trip over, so I think we're in a good state here.
> 
> I've tested this on real hardware now too, and nothing falls over.
> I'll get rid of the comment and the WARN_ON_ONCE, but AFAICT, this is
> safe.

But was that hardware actually affected by the erratum? There is a runtime
check which will avoid the cross-call if not.

> Do you have a suggestion about what needs fixing?

The easiest thing to do would be ensuring that __set_fixmap is always called
with interrupts enabled. If you can guarantee that, then you need to rework
the locking so that interrupts aren't disabled. I admit that I've lost a
fair amount of the context here, but that's the fundamental issue.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-08 10:39             ` Will Deacon
@ 2014-09-08 18:38               ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-08 18:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Mon, Sep 8, 2014 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Sep 05, 2014 at 08:41:08PM +0100, Kees Cook wrote:
>> On Thu, Sep 4, 2014 at 10:27 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> >> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
>> >> > Hi Kees,
>> >> >
>> >> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
>> >> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
>> >> >> Also makes sure that the fixmap allocation fits into the expected range.
>> >> >>
>> >> >> Based on patch by Rabin Vincent.
>> >> >
>> >> > [...]
>> >> >
>> >> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
>> >> >> +{
>> >> >> +     unsigned long vaddr = __fix_to_virt(idx);
>> >> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
>> >> >> +
>> >> >> +     /* Make sure fixmap region does not exceed available allocation. */
>> >> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
>> >> >> +                  FIXADDR_END);
>> >> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
>> >> >> +
>> >> >> +     if (pgprot_val(prot))
>> >> >> +             set_pte_at(NULL, vaddr, pte,
>> >> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
>> >> >> +     else
>> >> >> +             pte_clear(NULL, vaddr, pte);
>> >> >> +
>> >> >> +     /*
>> >> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
>> >> >> +      * with interrupts disabled. Callers must have taken care of this.
>> >> >> +      */
>> >> >> +     WARN_ON_ONCE(!irqs_disabled());
>> >> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
>> >> >
>> >> > Aha, this explains why we were confusing each other! The issue is that
>> >> > interrupts must be *enabled*, so this code does the exact opposite of
>> >> > what we need.
>> >> >
>> >> > I think this got lost in a sea of double negatives during the last round
>> >> > of review.
>> >>
>> >> Ah! If this is the case, perhaps we can get away with
>> >> local_flush_tlb_kernel_range() then?
>> >
>> > That's a bit tricky, since you need to ensure that preemption is disabled
>> > until the mapping is put back like it was.
>>
>> Even with CONFIG_ARM_ERRATA_798181 enabled, I don't see a problem here
>> using flush_tlb_kernel_range(). When doing the ftrace work, this was
>> trivial to trip over, so I think we're in a good state here.
>>
>> I've tested this on real hardware now too, and nothing falls over.
>> I'll get rid of the comment and the WARN_ON_ONCE, but AFAICT, this is
>> safe.
>
> But was that hardware actually affected by the erratum? There is a runtime
> check which will avoid the cross-call if not.
>
>> Do you have a suggestion about what needs fixing?
>
> The easiest thing to do would be ensuring that __set_fixmap is always called
> with interrupts enabled. If you can guarantee that, then you need to rework
> the locking so that interrupts aren't disabled. I admit that I've lost a
> fair amount of the context here, but that's the fundamental issue.

In retesting, it hit the problem. Ugh. I will investigate what can be done.

Thanks,

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-08 18:38               ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-08 18:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 8, 2014 at 3:39 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Sep 05, 2014 at 08:41:08PM +0100, Kees Cook wrote:
>> On Thu, Sep 4, 2014 at 10:27 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> >> On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
>> >> > Hi Kees,
>> >> >
>> >> > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
>> >> >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
>> >> >> Also makes sure that the fixmap allocation fits into the expected range.
>> >> >>
>> >> >> Based on patch by Rabin Vincent.
>> >> >
>> >> > [...]
>> >> >
>> >> >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
>> >> >> +{
>> >> >> +     unsigned long vaddr = __fix_to_virt(idx);
>> >> >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
>> >> >> +
>> >> >> +     /* Make sure fixmap region does not exceed available allocation. */
>> >> >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
>> >> >> +                  FIXADDR_END);
>> >> >> +     BUG_ON(idx >= __end_of_fixed_addresses);
>> >> >> +
>> >> >> +     if (pgprot_val(prot))
>> >> >> +             set_pte_at(NULL, vaddr, pte,
>> >> >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
>> >> >> +     else
>> >> >> +             pte_clear(NULL, vaddr, pte);
>> >> >> +
>> >> >> +     /*
>> >> >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
>> >> >> +      * with interrupts disabled. Callers must have taken care of this.
>> >> >> +      */
>> >> >> +     WARN_ON_ONCE(!irqs_disabled());
>> >> >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
>> >> >
>> >> > Aha, this explains why we were confusing each other! The issue is that
>> >> > interrupts must be *enabled*, so this code does the exact opposite of
>> >> > what we need.
>> >> >
>> >> > I think this got lost in a sea of double negatives during the last round
>> >> > of review.
>> >>
>> >> Ah! If this is the case, perhaps we can get away with
>> >> local_flush_tlb_kernel_range() then?
>> >
>> > That's a bit tricky, since you need to ensure that preemption is disabled
>> > until the mapping is put back like it was.
>>
>> Even with CONFIG_ARM_ERRATA_798181 enabled, I don't see a problem here
>> using flush_tlb_kernel_range(). When doing the ftrace work, this was
>> trivial to trip over, so I think we're in a good state here.
>>
>> I've tested this on real hardware now too, and nothing falls over.
>> I'll get rid of the comment and the WARN_ON_ONCE, but AFAICT, this is
>> safe.
>
> But was that hardware actually affected by the erratum? There is a runtime
> check which will avoid the cross-call if not.
>
>> Do you have a suggestion about what needs fixing?
>
> The easiest thing to do would be ensuring that __set_fixmap is always called
> with interrupts enabled. If you can guarantee that, then you need to rework
> the locking so that interrupts aren't disabled. I admit that I've lost a
> fair amount of the context here, but that's the fundamental issue.

In retesting, it hit the problem. Ugh. I will investigate what can be done.

Thanks,

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-04 17:27         ` Will Deacon
@ 2014-09-08 19:16           ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-08 19:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, Rabin Vincent, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
> On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> > On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> > > Hi Kees,
> > >
> > > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> > >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> > >> Also makes sure that the fixmap allocation fits into the expected range.
> > >>
> > >> Based on patch by Rabin Vincent.
> > >
> > > [...]
> > >
> > >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> > >> +{
> > >> +     unsigned long vaddr = __fix_to_virt(idx);
> > >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> > >> +
> > >> +     /* Make sure fixmap region does not exceed available allocation. */
> > >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> > >> +                  FIXADDR_END);
> > >> +     BUG_ON(idx >= __end_of_fixed_addresses);
> > >> +
> > >> +     if (pgprot_val(prot))
> > >> +             set_pte_at(NULL, vaddr, pte,
> > >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
> > >> +     else
> > >> +             pte_clear(NULL, vaddr, pte);
> > >> +
> > >> +     /*
> > >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
> > >> +      * with interrupts disabled. Callers must have taken care of this.
> > >> +      */
> > >> +     WARN_ON_ONCE(!irqs_disabled());
> > >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> > >
> > > Aha, this explains why we were confusing each other! The issue is that
> > > interrupts must be *enabled*, so this code does the exact opposite of
> > > what we need.
> > >
> > > I think this got lost in a sea of double negatives during the last round
> > > of review.
> > 
> > Ah! If this is the case, perhaps we can get away with
> > local_flush_tlb_kernel_range() then?
> 
> That's a bit tricky, since you need to ensure that preemption is disabled
> until the mapping is put back like it was.

Okay, under both real hardware with the errata, and under QEMU, things seem
to work with this change to the series. What do you think?

Thanks!

-Kees

diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index b050ed7..8558d6b 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -37,6 +37,7 @@ void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags)
 	else
 		__acquire(&patch_lock);
 
+	preempt_disable();
 	set_fixmap(fixmap, page_to_phys(page));
 
 	return (void *) (__fix_to_virt(fixmap) + (uintaddr & ~PAGE_MASK));
@@ -46,6 +47,7 @@ void __kprobes patch_unmap(int fixmap, unsigned long *flags)
 	__releases(&patch_lock)
 {
 	clear_fixmap(fixmap);
+	preempt_enable();
 
 	if (flags)
 		spin_unlock_irqrestore(&patch_lock, *flags);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index fab8583..efaf74c 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -368,7 +368,7 @@ void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
 			pfn_pte(phys >> PAGE_SHIFT, prot));
 	else
 		pte_clear(NULL, vaddr, pte);
-	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
+	local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
 }
 
 /*

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-08 19:16           ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-08 19:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
> On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> > On Thu, Sep 4, 2014 at 10:03 AM, Will Deacon <will.deacon@arm.com> wrote:
> > > Hi Kees,
> > >
> > > On Wed, Sep 03, 2014 at 10:57:04PM +0100, Kees Cook wrote:
> > >> This is used from set_fixmap() and clear_fixmap() via asm-generic/fixmap.h.
> > >> Also makes sure that the fixmap allocation fits into the expected range.
> > >>
> > >> Based on patch by Rabin Vincent.
> > >
> > > [...]
> > >
> > >> +void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
> > >> +{
> > >> +     unsigned long vaddr = __fix_to_virt(idx);
> > >> +     pte_t *pte = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
> > >> +
> > >> +     /* Make sure fixmap region does not exceed available allocation. */
> > >> +     BUILD_BUG_ON(FIXADDR_START + (__end_of_fixed_addresses * PAGE_SIZE) >
> > >> +                  FIXADDR_END);
> > >> +     BUG_ON(idx >= __end_of_fixed_addresses);
> > >> +
> > >> +     if (pgprot_val(prot))
> > >> +             set_pte_at(NULL, vaddr, pte,
> > >> +                     pfn_pte(phys >> PAGE_SHIFT, prot));
> > >> +     else
> > >> +             pte_clear(NULL, vaddr, pte);
> > >> +
> > >> +     /*
> > >> +      * Given the potential a15 tlbi errata, we can only do tlb flushes
> > >> +      * with interrupts disabled. Callers must have taken care of this.
> > >> +      */
> > >> +     WARN_ON_ONCE(!irqs_disabled());
> > >> +     flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> > >
> > > Aha, this explains why we were confusing each other! The issue is that
> > > interrupts must be *enabled*, so this code does the exact opposite of
> > > what we need.
> > >
> > > I think this got lost in a sea of double negatives during the last round
> > > of review.
> > 
> > Ah! If this is the case, perhaps we can get away with
> > local_flush_tlb_kernel_range() then?
> 
> That's a bit tricky, since you need to ensure that preemption is disabled
> until the mapping is put back like it was.

Okay, under both real hardware with the errata, and under QEMU, things seem
to work with this change to the series. What do you think?

Thanks!

-Kees

diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index b050ed7..8558d6b 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -37,6 +37,7 @@ void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags)
 	else
 		__acquire(&patch_lock);
 
+	preempt_disable();
 	set_fixmap(fixmap, page_to_phys(page));
 
 	return (void *) (__fix_to_virt(fixmap) + (uintaddr & ~PAGE_MASK));
@@ -46,6 +47,7 @@ void __kprobes patch_unmap(int fixmap, unsigned long *flags)
 	__releases(&patch_lock)
 {
 	clear_fixmap(fixmap);
+	preempt_enable();
 
 	if (flags)
 		spin_unlock_irqrestore(&patch_lock, *flags);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index fab8583..efaf74c 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -368,7 +368,7 @@ void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
 			pfn_pte(phys >> PAGE_SHIFT, prot));
 	else
 		pte_clear(NULL, vaddr, pte);
-	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
+	local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
 }
 
 /*

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-08 19:16           ` Kees Cook
@ 2014-09-08 21:55             ` Rabin Vincent
  -1 siblings, 0 replies; 50+ messages in thread
From: Rabin Vincent @ 2014-09-08 21:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Will Deacon, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> > > Ah! If this is the case, perhaps we can get away with
> > > local_flush_tlb_kernel_range() then?
> > 
> > That's a bit tricky, since you need to ensure that preemption is disabled
> > until the mapping is put back like it was.
> 
> Okay, under both real hardware with the errata, and under QEMU, things seem
> to work with this change to the series. What do you think?

Preemption is already disabled until the mapping is put back in this
patch.c code because interrupts are disabled from before the time
set_fixmap() is called until after clear_fixmap() is called.

I'd guess that Will meant other (future) callers of set_fixmap() would
have to ensure similar behaviour with set_fixmap() / clear_fixmap().

Unless I'm missing something set/clear_fixmap() seem to be quite arch
specific and only really used on x86, so we could ensure that future
users on ARM perform the correct tlb flush:  the first user on ARM with
a non-atomic context (or you) could implement a set_fixmap() which does
the global flush and have this patch.c (and any other atomic context
callers) call __set_fixmap() directly.

The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
course be needed in that case, and IIRC that was what my original patch
had (via set_top_pte()).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-08 21:55             ` Rabin Vincent
  0 siblings, 0 replies; 50+ messages in thread
From: Rabin Vincent @ 2014-09-08 21:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> > > Ah! If this is the case, perhaps we can get away with
> > > local_flush_tlb_kernel_range() then?
> > 
> > That's a bit tricky, since you need to ensure that preemption is disabled
> > until the mapping is put back like it was.
> 
> Okay, under both real hardware with the errata, and under QEMU, things seem
> to work with this change to the series. What do you think?

Preemption is already disabled until the mapping is put back in this
patch.c code because interrupts are disabled from before the time
set_fixmap() is called until after clear_fixmap() is called.

I'd guess that Will meant other (future) callers of set_fixmap() would
have to ensure similar behaviour with set_fixmap() / clear_fixmap().

Unless I'm missing something set/clear_fixmap() seem to be quite arch
specific and only really used on x86, so we could ensure that future
users on ARM perform the correct tlb flush:  the first user on ARM with
a non-atomic context (or you) could implement a set_fixmap() which does
the global flush and have this patch.c (and any other atomic context
callers) call __set_fixmap() directly.

The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
course be needed in that case, and IIRC that was what my original patch
had (via set_top_pte()).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-08 21:55             ` Rabin Vincent
@ 2014-09-08 22:40               ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-08 22:40 UTC (permalink / raw)
  To: Rabin Vincent
  Cc: Will Deacon, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Mon, Sep 8, 2014 at 2:55 PM, Rabin Vincent <rabin@rab.in> wrote:
> On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
>> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
>> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> > > Ah! If this is the case, perhaps we can get away with
>> > > local_flush_tlb_kernel_range() then?
>> >
>> > That's a bit tricky, since you need to ensure that preemption is disabled
>> > until the mapping is put back like it was.
>>
>> Okay, under both real hardware with the errata, and under QEMU, things seem
>> to work with this change to the series. What do you think?
>
> Preemption is already disabled until the mapping is put back in this
> patch.c code because interrupts are disabled from before the time
> set_fixmap() is called until after clear_fixmap() is called.

Should I drop the preempt_disable/enable(), and just add a comment to
set_fixmap()?

> I'd guess that Will meant other (future) callers of set_fixmap() would
> have to ensure similar behaviour with set_fixmap() / clear_fixmap().
>
> Unless I'm missing something set/clear_fixmap() seem to be quite arch
> specific and only really used on x86, so we could ensure that future
> users on ARM perform the correct tlb flush:  the first user on ARM with
> a non-atomic context (or you) could implement a set_fixmap() which does
> the global flush and have this patch.c (and any other atomic context
> callers) call __set_fixmap() directly.
>
> The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
> course be needed in that case, and IIRC that was what my original patch
> had (via set_top_pte()).

Ah, so it was, yes! Will, which version of this logic would you prefer?

Thanks!

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-08 22:40               ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-08 22:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 8, 2014 at 2:55 PM, Rabin Vincent <rabin@rab.in> wrote:
> On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
>> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
>> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> > > Ah! If this is the case, perhaps we can get away with
>> > > local_flush_tlb_kernel_range() then?
>> >
>> > That's a bit tricky, since you need to ensure that preemption is disabled
>> > until the mapping is put back like it was.
>>
>> Okay, under both real hardware with the errata, and under QEMU, things seem
>> to work with this change to the series. What do you think?
>
> Preemption is already disabled until the mapping is put back in this
> patch.c code because interrupts are disabled from before the time
> set_fixmap() is called until after clear_fixmap() is called.

Should I drop the preempt_disable/enable(), and just add a comment to
set_fixmap()?

> I'd guess that Will meant other (future) callers of set_fixmap() would
> have to ensure similar behaviour with set_fixmap() / clear_fixmap().
>
> Unless I'm missing something set/clear_fixmap() seem to be quite arch
> specific and only really used on x86, so we could ensure that future
> users on ARM perform the correct tlb flush:  the first user on ARM with
> a non-atomic context (or you) could implement a set_fixmap() which does
> the global flush and have this patch.c (and any other atomic context
> callers) call __set_fixmap() directly.
>
> The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
> course be needed in that case, and IIRC that was what my original patch
> had (via set_top_pte()).

Ah, so it was, yes! Will, which version of this logic would you prefer?

Thanks!

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-08 22:40               ` Kees Cook
@ 2014-09-09 12:38                 ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-09 12:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rabin Vincent, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
> On Mon, Sep 8, 2014 at 2:55 PM, Rabin Vincent <rabin@rab.in> wrote:
> > On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
> >> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
> >> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> >> > > Ah! If this is the case, perhaps we can get away with
> >> > > local_flush_tlb_kernel_range() then?
> >> >
> >> > That's a bit tricky, since you need to ensure that preemption is disabled
> >> > until the mapping is put back like it was.
> >>
> >> Okay, under both real hardware with the errata, and under QEMU, things seem
> >> to work with this change to the series. What do you think?
> >
> > Preemption is already disabled until the mapping is put back in this
> > patch.c code because interrupts are disabled from before the time
> > set_fixmap() is called until after clear_fixmap() is called.
> 
> Should I drop the preempt_disable/enable(), and just add a comment to
> set_fixmap()?
> 
> > I'd guess that Will meant other (future) callers of set_fixmap() would
> > have to ensure similar behaviour with set_fixmap() / clear_fixmap().
> >
> > Unless I'm missing something set/clear_fixmap() seem to be quite arch
> > specific and only really used on x86, so we could ensure that future
> > users on ARM perform the correct tlb flush:  the first user on ARM with
> > a non-atomic context (or you) could implement a set_fixmap() which does
> > the global flush and have this patch.c (and any other atomic context
> > callers) call __set_fixmap() directly.
> >
> > The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
> > course be needed in that case, and IIRC that was what my original patch
> > had (via set_top_pte()).
> 
> Ah, so it was, yes! Will, which version of this logic would you prefer?

I still don't think we're solving the general problem here -- we're actually
just making the ftrace case work. It is perfectly possible for another CPU
to undergo a TLB miss and refill whilst the page table is being modified by
the CPU with preemption disabled. In this case, a local tlb flush won't
invalidate that entry on the other core, and we have no way of knowing when
the original permissions are actually observed across the system.

So I think we need to figure out a way to invalidate the TLB properly. What
do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
mips, ...)? They must have exactly the same problem.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-09 12:38                 ` Will Deacon
  0 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-09 12:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
> On Mon, Sep 8, 2014 at 2:55 PM, Rabin Vincent <rabin@rab.in> wrote:
> > On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
> >> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
> >> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
> >> > > Ah! If this is the case, perhaps we can get away with
> >> > > local_flush_tlb_kernel_range() then?
> >> >
> >> > That's a bit tricky, since you need to ensure that preemption is disabled
> >> > until the mapping is put back like it was.
> >>
> >> Okay, under both real hardware with the errata, and under QEMU, things seem
> >> to work with this change to the series. What do you think?
> >
> > Preemption is already disabled until the mapping is put back in this
> > patch.c code because interrupts are disabled from before the time
> > set_fixmap() is called until after clear_fixmap() is called.
> 
> Should I drop the preempt_disable/enable(), and just add a comment to
> set_fixmap()?
> 
> > I'd guess that Will meant other (future) callers of set_fixmap() would
> > have to ensure similar behaviour with set_fixmap() / clear_fixmap().
> >
> > Unless I'm missing something set/clear_fixmap() seem to be quite arch
> > specific and only really used on x86, so we could ensure that future
> > users on ARM perform the correct tlb flush:  the first user on ARM with
> > a non-atomic context (or you) could implement a set_fixmap() which does
> > the global flush and have this patch.c (and any other atomic context
> > callers) call __set_fixmap() directly.
> >
> > The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
> > course be needed in that case, and IIRC that was what my original patch
> > had (via set_top_pte()).
> 
> Ah, so it was, yes! Will, which version of this logic would you prefer?

I still don't think we're solving the general problem here -- we're actually
just making the ftrace case work. It is perfectly possible for another CPU
to undergo a TLB miss and refill whilst the page table is being modified by
the CPU with preemption disabled. In this case, a local tlb flush won't
invalidate that entry on the other core, and we have no way of knowing when
the original permissions are actually observed across the system.

So I think we need to figure out a way to invalidate the TLB properly. What
do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
mips, ...)? They must have exactly the same problem.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-09 12:38                 ` Will Deacon
@ 2014-09-09 14:33                   ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-09 14:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rabin Vincent, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
>> On Mon, Sep 8, 2014 at 2:55 PM, Rabin Vincent <rabin@rab.in> wrote:
>> > On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
>> >> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
>> >> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> >> > > Ah! If this is the case, perhaps we can get away with
>> >> > > local_flush_tlb_kernel_range() then?
>> >> >
>> >> > That's a bit tricky, since you need to ensure that preemption is disabled
>> >> > until the mapping is put back like it was.
>> >>
>> >> Okay, under both real hardware with the errata, and under QEMU, things seem
>> >> to work with this change to the series. What do you think?
>> >
>> > Preemption is already disabled until the mapping is put back in this
>> > patch.c code because interrupts are disabled from before the time
>> > set_fixmap() is called until after clear_fixmap() is called.
>>
>> Should I drop the preempt_disable/enable(), and just add a comment to
>> set_fixmap()?
>>
>> > I'd guess that Will meant other (future) callers of set_fixmap() would
>> > have to ensure similar behaviour with set_fixmap() / clear_fixmap().
>> >
>> > Unless I'm missing something set/clear_fixmap() seem to be quite arch
>> > specific and only really used on x86, so we could ensure that future
>> > users on ARM perform the correct tlb flush:  the first user on ARM with
>> > a non-atomic context (or you) could implement a set_fixmap() which does
>> > the global flush and have this patch.c (and any other atomic context
>> > callers) call __set_fixmap() directly.
>> >
>> > The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
>> > course be needed in that case, and IIRC that was what my original patch
>> > had (via set_top_pte()).
>>
>> Ah, so it was, yes! Will, which version of this logic would you prefer?
>
> I still don't think we're solving the general problem here -- we're actually
> just making the ftrace case work. It is perfectly possible for another CPU
> to undergo a TLB miss and refill whilst the page table is being modified by
> the CPU with preemption disabled. In this case, a local tlb flush won't
> invalidate that entry on the other core, and we have no way of knowing when
> the original permissions are actually observed across the system.

The fixmap is used by anything doing patching _except_ ftrace,
actually. It's used by jump labels, kprobes, and kgdb. This code is
the general case. Access to set_fixmap is done via the kernel patching
interface: patch_text().

Right now, the patch_text interface checks cache_ops_need_broadcast(),
and conditionally runs under stop_machine(). We could make this
unconditional, and we'll avoid any problem with TLB misses on another
CPU.

> So I think we need to figure out a way to invalidate the TLB properly. What
> do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
> mips, ...)? They must have exactly the same problem.

I don't think this should be done at the set_fixmap level, as it is
more a primitive. I think making sure patch_text() always works would
be best. What do you think of using an unconditional stop_machine()
instead?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-09 14:33                   ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-09 14:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
>> On Mon, Sep 8, 2014 at 2:55 PM, Rabin Vincent <rabin@rab.in> wrote:
>> > On Mon, Sep 08, 2014 at 12:16:34PM -0700, Kees Cook wrote:
>> >> On Thu, Sep 04, 2014 at 06:27:48PM +0100, Will Deacon wrote:
>> >> > On Thu, Sep 04, 2014 at 06:23:42PM +0100, Kees Cook wrote:
>> >> > > Ah! If this is the case, perhaps we can get away with
>> >> > > local_flush_tlb_kernel_range() then?
>> >> >
>> >> > That's a bit tricky, since you need to ensure that preemption is disabled
>> >> > until the mapping is put back like it was.
>> >>
>> >> Okay, under both real hardware with the errata, and under QEMU, things seem
>> >> to work with this change to the series. What do you think?
>> >
>> > Preemption is already disabled until the mapping is put back in this
>> > patch.c code because interrupts are disabled from before the time
>> > set_fixmap() is called until after clear_fixmap() is called.
>>
>> Should I drop the preempt_disable/enable(), and just add a comment to
>> set_fixmap()?
>>
>> > I'd guess that Will meant other (future) callers of set_fixmap() would
>> > have to ensure similar behaviour with set_fixmap() / clear_fixmap().
>> >
>> > Unless I'm missing something set/clear_fixmap() seem to be quite arch
>> > specific and only really used on x86, so we could ensure that future
>> > users on ARM perform the correct tlb flush:  the first user on ARM with
>> > a non-atomic context (or you) could implement a set_fixmap() which does
>> > the global flush and have this patch.c (and any other atomic context
>> > callers) call __set_fixmap() directly.
>> >
>> > The change to local_flush_tlb_kernel_range() in __set_fixmap() would of
>> > course be needed in that case, and IIRC that was what my original patch
>> > had (via set_top_pte()).
>>
>> Ah, so it was, yes! Will, which version of this logic would you prefer?
>
> I still don't think we're solving the general problem here -- we're actually
> just making the ftrace case work. It is perfectly possible for another CPU
> to undergo a TLB miss and refill whilst the page table is being modified by
> the CPU with preemption disabled. In this case, a local tlb flush won't
> invalidate that entry on the other core, and we have no way of knowing when
> the original permissions are actually observed across the system.

The fixmap is used by anything doing patching _except_ ftrace,
actually. It's used by jump labels, kprobes, and kgdb. This code is
the general case. Access to set_fixmap is done via the kernel patching
interface: patch_text().

Right now, the patch_text interface checks cache_ops_need_broadcast(),
and conditionally runs under stop_machine(). We could make this
unconditional, and we'll avoid any problem with TLB misses on another
CPU.

> So I think we need to figure out a way to invalidate the TLB properly. What
> do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
> mips, ...)? They must have exactly the same problem.

I don't think this should be done at the set_fixmap level, as it is
more a primitive. I think making sure patch_text() always works would
be best. What do you think of using an unconditional stop_machine()
instead?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-09 14:33                   ` Kees Cook
@ 2014-09-10 17:51                     ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-10 17:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rabin Vincent, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

Hi Kees,

On Tue, Sep 09, 2014 at 03:33:11PM +0100, Kees Cook wrote:
> On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
> >> Ah, so it was, yes! Will, which version of this logic would you prefer?
> >
> > I still don't think we're solving the general problem here -- we're actually
> > just making the ftrace case work. It is perfectly possible for another CPU
> > to undergo a TLB miss and refill whilst the page table is being modified by
> > the CPU with preemption disabled. In this case, a local tlb flush won't
> > invalidate that entry on the other core, and we have no way of knowing when
> > the original permissions are actually observed across the system.
> 
> The fixmap is used by anything doing patching _except_ ftrace,
> actually. It's used by jump labels, kprobes, and kgdb. This code is
> the general case. Access to set_fixmap is done via the kernel patching
> interface: patch_text().
> 
> Right now, the patch_text interface checks cache_ops_need_broadcast(),
> and conditionally runs under stop_machine(). We could make this
> unconditional, and we'll avoid any problem with TLB misses on another
> CPU.

Yes, it we always use stop_machine, that solves the TLB broadcast problem
and we could do that if CONFIG_ARM_ERRATA_798181 is set.

> > So I think we need to figure out a way to invalidate the TLB properly. What
> > do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
> > mips, ...)? They must have exactly the same problem.
> 
> I don't think this should be done at the set_fixmap level, as it is
> more a primitive. I think making sure patch_text() always works would
> be best. What do you think of using an unconditional stop_machine()
> instead?

Why not move the TLB invalidation into patch_text, then we can do
stop_machine if IS_ENABLED(CONFIG_ARM_ERRATA_798181) ||
tlb_ops_need_broadcast()?

Then that just leaves ftrace.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-10 17:51                     ` Will Deacon
  0 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-10 17:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Kees,

On Tue, Sep 09, 2014 at 03:33:11PM +0100, Kees Cook wrote:
> On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
> >> Ah, so it was, yes! Will, which version of this logic would you prefer?
> >
> > I still don't think we're solving the general problem here -- we're actually
> > just making the ftrace case work. It is perfectly possible for another CPU
> > to undergo a TLB miss and refill whilst the page table is being modified by
> > the CPU with preemption disabled. In this case, a local tlb flush won't
> > invalidate that entry on the other core, and we have no way of knowing when
> > the original permissions are actually observed across the system.
> 
> The fixmap is used by anything doing patching _except_ ftrace,
> actually. It's used by jump labels, kprobes, and kgdb. This code is
> the general case. Access to set_fixmap is done via the kernel patching
> interface: patch_text().
> 
> Right now, the patch_text interface checks cache_ops_need_broadcast(),
> and conditionally runs under stop_machine(). We could make this
> unconditional, and we'll avoid any problem with TLB misses on another
> CPU.

Yes, it we always use stop_machine, that solves the TLB broadcast problem
and we could do that if CONFIG_ARM_ERRATA_798181 is set.

> > So I think we need to figure out a way to invalidate the TLB properly. What
> > do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
> > mips, ...)? They must have exactly the same problem.
> 
> I don't think this should be done at the set_fixmap level, as it is
> more a primitive. I think making sure patch_text() always works would
> be best. What do you think of using an unconditional stop_machine()
> instead?

Why not move the TLB invalidation into patch_text, then we can do
stop_machine if IS_ENABLED(CONFIG_ARM_ERRATA_798181) ||
tlb_ops_need_broadcast()?

Then that just leaves ftrace.

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-10 17:51                     ` Will Deacon
@ 2014-09-11 15:27                       ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-11 15:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rabin Vincent, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Wed, Sep 10, 2014 at 10:51 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Kees,
>
> On Tue, Sep 09, 2014 at 03:33:11PM +0100, Kees Cook wrote:
>> On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
>> >> Ah, so it was, yes! Will, which version of this logic would you prefer?
>> >
>> > I still don't think we're solving the general problem here -- we're actually
>> > just making the ftrace case work. It is perfectly possible for another CPU
>> > to undergo a TLB miss and refill whilst the page table is being modified by
>> > the CPU with preemption disabled. In this case, a local tlb flush won't
>> > invalidate that entry on the other core, and we have no way of knowing when
>> > the original permissions are actually observed across the system.
>>
>> The fixmap is used by anything doing patching _except_ ftrace,
>> actually. It's used by jump labels, kprobes, and kgdb. This code is
>> the general case. Access to set_fixmap is done via the kernel patching
>> interface: patch_text().
>>
>> Right now, the patch_text interface checks cache_ops_need_broadcast(),
>> and conditionally runs under stop_machine(). We could make this
>> unconditional, and we'll avoid any problem with TLB misses on another
>> CPU.
>
> Yes, it we always use stop_machine, that solves the TLB broadcast problem
> and we could do that if CONFIG_ARM_ERRATA_798181 is set.

Okay, sounds good.

>
>> > So I think we need to figure out a way to invalidate the TLB properly. What
>> > do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
>> > mips, ...)? They must have exactly the same problem.
>>
>> I don't think this should be done at the set_fixmap level, as it is
>> more a primitive. I think making sure patch_text() always works would
>> be best. What do you think of using an unconditional stop_machine()
>> instead?
>
> Why not move the TLB invalidation into patch_text, then we can do
> stop_machine if IS_ENABLED(CONFIG_ARM_ERRATA_798181) ||
> tlb_ops_need_broadcast()?

The (local) TLB flush needs to happen for patch_text to do its work,
so I'd rather it stay in set_fixmap, otherwise the flush calls will
have to follow each call of set_fixmap.

Is there a reason tlb_ops_need_broadcast() doesn't check
IS_ENABLED(CONFIG_ARM_ERRATA_798181) itself?

> Then that just leaves ftrace.

ftrace is already covered by stop_machine. Is there something I missing there?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-11 15:27                       ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-11 15:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 10, 2014 at 10:51 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Kees,
>
> On Tue, Sep 09, 2014 at 03:33:11PM +0100, Kees Cook wrote:
>> On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
>> >> Ah, so it was, yes! Will, which version of this logic would you prefer?
>> >
>> > I still don't think we're solving the general problem here -- we're actually
>> > just making the ftrace case work. It is perfectly possible for another CPU
>> > to undergo a TLB miss and refill whilst the page table is being modified by
>> > the CPU with preemption disabled. In this case, a local tlb flush won't
>> > invalidate that entry on the other core, and we have no way of knowing when
>> > the original permissions are actually observed across the system.
>>
>> The fixmap is used by anything doing patching _except_ ftrace,
>> actually. It's used by jump labels, kprobes, and kgdb. This code is
>> the general case. Access to set_fixmap is done via the kernel patching
>> interface: patch_text().
>>
>> Right now, the patch_text interface checks cache_ops_need_broadcast(),
>> and conditionally runs under stop_machine(). We could make this
>> unconditional, and we'll avoid any problem with TLB misses on another
>> CPU.
>
> Yes, it we always use stop_machine, that solves the TLB broadcast problem
> and we could do that if CONFIG_ARM_ERRATA_798181 is set.

Okay, sounds good.

>
>> > So I think we need to figure out a way to invalidate the TLB properly. What
>> > do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
>> > mips, ...)? They must have exactly the same problem.
>>
>> I don't think this should be done at the set_fixmap level, as it is
>> more a primitive. I think making sure patch_text() always works would
>> be best. What do you think of using an unconditional stop_machine()
>> instead?
>
> Why not move the TLB invalidation into patch_text, then we can do
> stop_machine if IS_ENABLED(CONFIG_ARM_ERRATA_798181) ||
> tlb_ops_need_broadcast()?

The (local) TLB flush needs to happen for patch_text to do its work,
so I'd rather it stay in set_fixmap, otherwise the flush calls will
have to follow each call of set_fixmap.

Is there a reason tlb_ops_need_broadcast() doesn't check
IS_ENABLED(CONFIG_ARM_ERRATA_798181) itself?

> Then that just leaves ftrace.

ftrace is already covered by stop_machine. Is there something I missing there?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-11 15:27                       ` Kees Cook
@ 2014-09-11 16:05                         ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-11 16:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rabin Vincent, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Thu, Sep 11, 2014 at 8:27 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Sep 10, 2014 at 10:51 AM, Will Deacon <will.deacon@arm.com> wrote:
>> Hi Kees,
>>
>> On Tue, Sep 09, 2014 at 03:33:11PM +0100, Kees Cook wrote:
>>> On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
>>> > On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
>>> >> Ah, so it was, yes! Will, which version of this logic would you prefer?
>>> >
>>> > I still don't think we're solving the general problem here -- we're actually
>>> > just making the ftrace case work. It is perfectly possible for another CPU
>>> > to undergo a TLB miss and refill whilst the page table is being modified by
>>> > the CPU with preemption disabled. In this case, a local tlb flush won't
>>> > invalidate that entry on the other core, and we have no way of knowing when
>>> > the original permissions are actually observed across the system.
>>>
>>> The fixmap is used by anything doing patching _except_ ftrace,
>>> actually. It's used by jump labels, kprobes, and kgdb. This code is
>>> the general case. Access to set_fixmap is done via the kernel patching
>>> interface: patch_text().
>>>
>>> Right now, the patch_text interface checks cache_ops_need_broadcast(),
>>> and conditionally runs under stop_machine(). We could make this
>>> unconditional, and we'll avoid any problem with TLB misses on another
>>> CPU.
>>
>> Yes, it we always use stop_machine, that solves the TLB broadcast problem
>> and we could do that if CONFIG_ARM_ERRATA_798181 is set.
>
> Okay, sounds good.
>
>>
>>> > So I think we need to figure out a way to invalidate the TLB properly. What
>>> > do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
>>> > mips, ...)? They must have exactly the same problem.
>>>
>>> I don't think this should be done at the set_fixmap level, as it is
>>> more a primitive. I think making sure patch_text() always works would
>>> be best. What do you think of using an unconditional stop_machine()
>>> instead?
>>
>> Why not move the TLB invalidation into patch_text, then we can do
>> stop_machine if IS_ENABLED(CONFIG_ARM_ERRATA_798181) ||
>> tlb_ops_need_broadcast()?
>
> The (local) TLB flush needs to happen for patch_text to do its work,
> so I'd rather it stay in set_fixmap, otherwise the flush calls will
> have to follow each call of set_fixmap.
>
> Is there a reason tlb_ops_need_broadcast() doesn't check
> IS_ENABLED(CONFIG_ARM_ERRATA_798181) itself?

Actually, this doesn't make sense. If we're using
local_flush_tlb_kernel_range() in set_fixmap, we must always run under
stop_machine. The needs-broadcast case is solved by using
local_flush_tlb_kernel_range(), and the TLB-miss-on-other-CPU case is
solved by using stop_machine(). This is how the ftrace case work,
though not via fixmap.

Since we need to flush the TLB on each fixmap change during
patch_text(), if we want to make the local_flush_tlb_... optionally
use flush_tlb_... to avoid calling stop_machine in the
does't-need-broadcast case, then we'd be checking in multiple places,
making this code overly complex for this rare operation. Is there a
good reason to complicate this code to avoid stop_machine()?

I think we should just do this:

diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index 07314af47733..5038960e3c55 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -60,16 +125,5 @@ void __kprobes patch_text(void *addr, unsigned int insn)
                .insn = insn,
        };

-       if (cache_ops_need_broadcast()) {
-               stop_machine(patch_text_stop_machine, &patch, cpu_online_mask);
-       } else {
-               bool straddles_word = IS_ENABLED(CONFIG_THUMB2_KERNEL)
-                                     && __opcode_is_thumb32(insn)
-                                     && ((uintptr_t)addr & 2);
-
-               if (straddles_word)
-                       stop_machine(patch_text_stop_machine, &patch, NULL);
-               else
-                       __patch_text(addr, insn);
-       }
+       stop_machine(patch_text_stop_machine, &patch, NULL);
 }


-Kees

>> Then that just leaves ftrace.
>
> ftrace is already covered by stop_machine. Is there something I missing there?
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-11 16:05                         ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-11 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 11, 2014 at 8:27 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Sep 10, 2014 at 10:51 AM, Will Deacon <will.deacon@arm.com> wrote:
>> Hi Kees,
>>
>> On Tue, Sep 09, 2014 at 03:33:11PM +0100, Kees Cook wrote:
>>> On Tue, Sep 9, 2014 at 5:38 AM, Will Deacon <will.deacon@arm.com> wrote:
>>> > On Mon, Sep 08, 2014 at 11:40:43PM +0100, Kees Cook wrote:
>>> >> Ah, so it was, yes! Will, which version of this logic would you prefer?
>>> >
>>> > I still don't think we're solving the general problem here -- we're actually
>>> > just making the ftrace case work. It is perfectly possible for another CPU
>>> > to undergo a TLB miss and refill whilst the page table is being modified by
>>> > the CPU with preemption disabled. In this case, a local tlb flush won't
>>> > invalidate that entry on the other core, and we have no way of knowing when
>>> > the original permissions are actually observed across the system.
>>>
>>> The fixmap is used by anything doing patching _except_ ftrace,
>>> actually. It's used by jump labels, kprobes, and kgdb. This code is
>>> the general case. Access to set_fixmap is done via the kernel patching
>>> interface: patch_text().
>>>
>>> Right now, the patch_text interface checks cache_ops_need_broadcast(),
>>> and conditionally runs under stop_machine(). We could make this
>>> unconditional, and we'll avoid any problem with TLB misses on another
>>> CPU.
>>
>> Yes, it we always use stop_machine, that solves the TLB broadcast problem
>> and we could do that if CONFIG_ARM_ERRATA_798181 is set.
>
> Okay, sounds good.
>
>>
>>> > So I think we need to figure out a way to invalidate the TLB properly. What
>>> > do architectures that use IPIs for TLB broadcasting do (x86, some powerpc,
>>> > mips, ...)? They must have exactly the same problem.
>>>
>>> I don't think this should be done at the set_fixmap level, as it is
>>> more a primitive. I think making sure patch_text() always works would
>>> be best. What do you think of using an unconditional stop_machine()
>>> instead?
>>
>> Why not move the TLB invalidation into patch_text, then we can do
>> stop_machine if IS_ENABLED(CONFIG_ARM_ERRATA_798181) ||
>> tlb_ops_need_broadcast()?
>
> The (local) TLB flush needs to happen for patch_text to do its work,
> so I'd rather it stay in set_fixmap, otherwise the flush calls will
> have to follow each call of set_fixmap.
>
> Is there a reason tlb_ops_need_broadcast() doesn't check
> IS_ENABLED(CONFIG_ARM_ERRATA_798181) itself?

Actually, this doesn't make sense. If we're using
local_flush_tlb_kernel_range() in set_fixmap, we must always run under
stop_machine. The needs-broadcast case is solved by using
local_flush_tlb_kernel_range(), and the TLB-miss-on-other-CPU case is
solved by using stop_machine(). This is how the ftrace case work,
though not via fixmap.

Since we need to flush the TLB on each fixmap change during
patch_text(), if we want to make the local_flush_tlb_... optionally
use flush_tlb_... to avoid calling stop_machine in the
does't-need-broadcast case, then we'd be checking in multiple places,
making this code overly complex for this rare operation. Is there a
good reason to complicate this code to avoid stop_machine()?

I think we should just do this:

diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
index 07314af47733..5038960e3c55 100644
--- a/arch/arm/kernel/patch.c
+++ b/arch/arm/kernel/patch.c
@@ -60,16 +125,5 @@ void __kprobes patch_text(void *addr, unsigned int insn)
                .insn = insn,
        };

-       if (cache_ops_need_broadcast()) {
-               stop_machine(patch_text_stop_machine, &patch, cpu_online_mask);
-       } else {
-               bool straddles_word = IS_ENABLED(CONFIG_THUMB2_KERNEL)
-                                     && __opcode_is_thumb32(insn)
-                                     && ((uintptr_t)addr & 2);
-
-               if (straddles_word)
-                       stop_machine(patch_text_stop_machine, &patch, NULL);
-               else
-                       __patch_text(addr, insn);
-       }
+       stop_machine(patch_text_stop_machine, &patch, NULL);
 }


-Kees

>> Then that just leaves ftrace.
>
> ftrace is already covered by stop_machine. Is there something I missing there?
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-11 16:05                         ` Kees Cook
@ 2014-09-11 16:16                           ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-11 16:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rabin Vincent, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Thu, Sep 11, 2014 at 05:05:14PM +0100, Kees Cook wrote:
> Actually, this doesn't make sense. If we're using
> local_flush_tlb_kernel_range() in set_fixmap, we must always run under
> stop_machine. The needs-broadcast case is solved by using
> local_flush_tlb_kernel_range(), and the TLB-miss-on-other-CPU case is
> solved by using stop_machine(). This is how the ftrace case work,
> though not via fixmap.
> 
> Since we need to flush the TLB on each fixmap change during
> patch_text(), if we want to make the local_flush_tlb_... optionally
> use flush_tlb_... to avoid calling stop_machine in the
> does't-need-broadcast case, then we'd be checking in multiple places,
> making this code overly complex for this rare operation. Is there a
> good reason to complicate this code to avoid stop_machine()?
> 
> I think we should just do this:
> 
> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
> index 07314af47733..5038960e3c55 100644
> --- a/arch/arm/kernel/patch.c
> +++ b/arch/arm/kernel/patch.c
> @@ -60,16 +125,5 @@ void __kprobes patch_text(void *addr, unsigned int insn)
>                 .insn = insn,
>         };
> 
> -       if (cache_ops_need_broadcast()) {
> -               stop_machine(patch_text_stop_machine, &patch, cpu_online_mask);
> -       } else {
> -               bool straddles_word = IS_ENABLED(CONFIG_THUMB2_KERNEL)
> -                                     && __opcode_is_thumb32(insn)
> -                                     && ((uintptr_t)addr & 2);
> -
> -               if (straddles_word)
> -                       stop_machine(patch_text_stop_machine, &patch, NULL);
> -               else
> -                       __patch_text(addr, insn);
> -       }
> +       stop_machine(patch_text_stop_machine, &patch, NULL);

The reason not to use stop machine is that it's very expensive and the
architecture does provide some guarantees that mean it's not always
required...

... however, as I pointed out in a kprobes thread the other day, the code
you remove above is actually broken, so I wouldn't be against replacing it
all with stop_machine.

Is there any way to you can WARN_ON(!called_under_stop_machine) in
set_fixmap before doing the local TLBI?

Will

P.S. If you plan on doing an equivalent series for arm64, you should do it
before we have any broken CPUs, then you can use TLBI without worry ;)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-11 16:16                           ` Will Deacon
  0 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2014-09-11 16:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 11, 2014 at 05:05:14PM +0100, Kees Cook wrote:
> Actually, this doesn't make sense. If we're using
> local_flush_tlb_kernel_range() in set_fixmap, we must always run under
> stop_machine. The needs-broadcast case is solved by using
> local_flush_tlb_kernel_range(), and the TLB-miss-on-other-CPU case is
> solved by using stop_machine(). This is how the ftrace case work,
> though not via fixmap.
> 
> Since we need to flush the TLB on each fixmap change during
> patch_text(), if we want to make the local_flush_tlb_... optionally
> use flush_tlb_... to avoid calling stop_machine in the
> does't-need-broadcast case, then we'd be checking in multiple places,
> making this code overly complex for this rare operation. Is there a
> good reason to complicate this code to avoid stop_machine()?
> 
> I think we should just do this:
> 
> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
> index 07314af47733..5038960e3c55 100644
> --- a/arch/arm/kernel/patch.c
> +++ b/arch/arm/kernel/patch.c
> @@ -60,16 +125,5 @@ void __kprobes patch_text(void *addr, unsigned int insn)
>                 .insn = insn,
>         };
> 
> -       if (cache_ops_need_broadcast()) {
> -               stop_machine(patch_text_stop_machine, &patch, cpu_online_mask);
> -       } else {
> -               bool straddles_word = IS_ENABLED(CONFIG_THUMB2_KERNEL)
> -                                     && __opcode_is_thumb32(insn)
> -                                     && ((uintptr_t)addr & 2);
> -
> -               if (straddles_word)
> -                       stop_machine(patch_text_stop_machine, &patch, NULL);
> -               else
> -                       __patch_text(addr, insn);
> -       }
> +       stop_machine(patch_text_stop_machine, &patch, NULL);

The reason not to use stop machine is that it's very expensive and the
architecture does provide some guarantees that mean it's not always
required...

... however, as I pointed out in a kprobes thread the other day, the code
you remove above is actually broken, so I wouldn't be against replacing it
all with stop_machine.

Is there any way to you can WARN_ON(!called_under_stop_machine) in
set_fixmap before doing the local TLBI?

Will

P.S. If you plan on doing an equivalent series for arm64, you should do it
before we have any broken CPUs, then you can use TLBI without worry ;)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
  2014-09-11 16:16                           ` Will Deacon
@ 2014-09-11 16:27                             ` Kees Cook
  -1 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-11 16:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rabin Vincent, linux-kernel, Laura Abbott, Rob Herring,
	Leif Lindholm, msalter, Liu hua, Nikolay Borisov, Nicolas Pitre,
	Doug Anderson, Jason Wessel, Catalin Marinas,
	Russell King - ARM Linux, linux-arm-kernel

On Thu, Sep 11, 2014 at 9:16 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Sep 11, 2014 at 05:05:14PM +0100, Kees Cook wrote:
>> Actually, this doesn't make sense. If we're using
>> local_flush_tlb_kernel_range() in set_fixmap, we must always run under
>> stop_machine. The needs-broadcast case is solved by using
>> local_flush_tlb_kernel_range(), and the TLB-miss-on-other-CPU case is
>> solved by using stop_machine(). This is how the ftrace case work,
>> though not via fixmap.
>>
>> Since we need to flush the TLB on each fixmap change during
>> patch_text(), if we want to make the local_flush_tlb_... optionally
>> use flush_tlb_... to avoid calling stop_machine in the
>> does't-need-broadcast case, then we'd be checking in multiple places,
>> making this code overly complex for this rare operation. Is there a
>> good reason to complicate this code to avoid stop_machine()?
>>
>> I think we should just do this:
>>
>> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
>> index 07314af47733..5038960e3c55 100644
>> --- a/arch/arm/kernel/patch.c
>> +++ b/arch/arm/kernel/patch.c
>> @@ -60,16 +125,5 @@ void __kprobes patch_text(void *addr, unsigned int insn)
>>                 .insn = insn,
>>         };
>>
>> -       if (cache_ops_need_broadcast()) {
>> -               stop_machine(patch_text_stop_machine, &patch, cpu_online_mask);
>> -       } else {
>> -               bool straddles_word = IS_ENABLED(CONFIG_THUMB2_KERNEL)
>> -                                     && __opcode_is_thumb32(insn)
>> -                                     && ((uintptr_t)addr & 2);
>> -
>> -               if (straddles_word)
>> -                       stop_machine(patch_text_stop_machine, &patch, NULL);
>> -               else
>> -                       __patch_text(addr, insn);
>> -       }
>> +       stop_machine(patch_text_stop_machine, &patch, NULL);
>
> The reason not to use stop machine is that it's very expensive and the
> architecture does provide some guarantees that mean it's not always
> required...
>
> ... however, as I pointed out in a kprobes thread the other day, the code
> you remove above is actually broken, so I wouldn't be against replacing it
> all with stop_machine.

Okay, sounds good.

> Is there any way to you can WARN_ON(!called_under_stop_machine) in
> set_fixmap before doing the local TLBI?

I'm not sure what the right tests for that would be. As I understand
it, this can be called during early boot too, but I'm not sure how to
distinguish either being early or being under stop machine. Rabin or
Rob, I think you've worked on this from the perspective of early boot,
any ideas?

In the meantime, I could add a comment on set_fixmap...

>
> Will
>
> P.S. If you plan on doing an equivalent series for arm64, you should do it
> before we have any broken CPUs, then you can use TLBI without worry ;)

Heh. Hopefully I can start testing arm64 soon. I think Laura sent a
series for CONFIG_DEBUG_RODATA already, so I'm hoping by the time I
get 64-bit hardware, this will already be done. ;)

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 3/8] arm: fixmap: implement __set_fixmap()
@ 2014-09-11 16:27                             ` Kees Cook
  0 siblings, 0 replies; 50+ messages in thread
From: Kees Cook @ 2014-09-11 16:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 11, 2014 at 9:16 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Sep 11, 2014 at 05:05:14PM +0100, Kees Cook wrote:
>> Actually, this doesn't make sense. If we're using
>> local_flush_tlb_kernel_range() in set_fixmap, we must always run under
>> stop_machine. The needs-broadcast case is solved by using
>> local_flush_tlb_kernel_range(), and the TLB-miss-on-other-CPU case is
>> solved by using stop_machine(). This is how the ftrace case work,
>> though not via fixmap.
>>
>> Since we need to flush the TLB on each fixmap change during
>> patch_text(), if we want to make the local_flush_tlb_... optionally
>> use flush_tlb_... to avoid calling stop_machine in the
>> does't-need-broadcast case, then we'd be checking in multiple places,
>> making this code overly complex for this rare operation. Is there a
>> good reason to complicate this code to avoid stop_machine()?
>>
>> I think we should just do this:
>>
>> diff --git a/arch/arm/kernel/patch.c b/arch/arm/kernel/patch.c
>> index 07314af47733..5038960e3c55 100644
>> --- a/arch/arm/kernel/patch.c
>> +++ b/arch/arm/kernel/patch.c
>> @@ -60,16 +125,5 @@ void __kprobes patch_text(void *addr, unsigned int insn)
>>                 .insn = insn,
>>         };
>>
>> -       if (cache_ops_need_broadcast()) {
>> -               stop_machine(patch_text_stop_machine, &patch, cpu_online_mask);
>> -       } else {
>> -               bool straddles_word = IS_ENABLED(CONFIG_THUMB2_KERNEL)
>> -                                     && __opcode_is_thumb32(insn)
>> -                                     && ((uintptr_t)addr & 2);
>> -
>> -               if (straddles_word)
>> -                       stop_machine(patch_text_stop_machine, &patch, NULL);
>> -               else
>> -                       __patch_text(addr, insn);
>> -       }
>> +       stop_machine(patch_text_stop_machine, &patch, NULL);
>
> The reason not to use stop machine is that it's very expensive and the
> architecture does provide some guarantees that mean it's not always
> required...
>
> ... however, as I pointed out in a kprobes thread the other day, the code
> you remove above is actually broken, so I wouldn't be against replacing it
> all with stop_machine.

Okay, sounds good.

> Is there any way to you can WARN_ON(!called_under_stop_machine) in
> set_fixmap before doing the local TLBI?

I'm not sure what the right tests for that would be. As I understand
it, this can be called during early boot too, but I'm not sure how to
distinguish either being early or being under stop machine. Rabin or
Rob, I think you've worked on this from the perspective of early boot,
any ideas?

In the meantime, I could add a comment on set_fixmap...

>
> Will
>
> P.S. If you plan on doing an equivalent series for arm64, you should do it
> before we have any broken CPUs, then you can use TLBI without worry ;)

Heh. Hopefully I can start testing arm64 soon. I think Laura sent a
series for CONFIG_DEBUG_RODATA already, so I'm hoping by the time I
get 64-bit hardware, this will already be done. ;)

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2014-09-11 16:27 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-03 21:57 [PATCH v5 0/8] arm: support CONFIG_RODATA Kees Cook
2014-09-03 21:57 ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 1/8] arm: use generic fixmap.h Kees Cook
2014-09-03 21:57   ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 2/8] ARM: expand fixmap region to 3MB Kees Cook
2014-09-03 21:57   ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 3/8] arm: fixmap: implement __set_fixmap() Kees Cook
2014-09-03 21:57   ` Kees Cook
2014-09-04 17:03   ` Will Deacon
2014-09-04 17:03     ` Will Deacon
2014-09-04 17:23     ` Kees Cook
2014-09-04 17:23       ` Kees Cook
2014-09-04 17:27       ` Will Deacon
2014-09-04 17:27         ` Will Deacon
2014-09-05 19:41         ` Kees Cook
2014-09-05 19:41           ` Kees Cook
2014-09-08 10:39           ` Will Deacon
2014-09-08 10:39             ` Will Deacon
2014-09-08 18:38             ` Kees Cook
2014-09-08 18:38               ` Kees Cook
2014-09-08 19:16         ` Kees Cook
2014-09-08 19:16           ` Kees Cook
2014-09-08 21:55           ` Rabin Vincent
2014-09-08 21:55             ` Rabin Vincent
2014-09-08 22:40             ` Kees Cook
2014-09-08 22:40               ` Kees Cook
2014-09-09 12:38               ` Will Deacon
2014-09-09 12:38                 ` Will Deacon
2014-09-09 14:33                 ` Kees Cook
2014-09-09 14:33                   ` Kees Cook
2014-09-10 17:51                   ` Will Deacon
2014-09-10 17:51                     ` Will Deacon
2014-09-11 15:27                     ` Kees Cook
2014-09-11 15:27                       ` Kees Cook
2014-09-11 16:05                       ` Kees Cook
2014-09-11 16:05                         ` Kees Cook
2014-09-11 16:16                         ` Will Deacon
2014-09-11 16:16                           ` Will Deacon
2014-09-11 16:27                           ` Kees Cook
2014-09-11 16:27                             ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 4/8] arm: use fixmap for text patching when text is RO Kees Cook
2014-09-03 21:57   ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 5/8] ARM: kexec: Make .text R/W in machine_kexec Kees Cook
2014-09-03 21:57   ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 6/8] arm: kgdb: Handle read-only text / modules Kees Cook
2014-09-03 21:57   ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 7/8] ARM: mm: allow non-text sections to be non-executable Kees Cook
2014-09-03 21:57   ` Kees Cook
2014-09-03 21:57 ` [PATCH v5 8/8] ARM: mm: allow text and rodata sections to be read-only Kees Cook
2014-09-03 21:57   ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.