* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
@ 2016-02-01 10:54 Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 1/8] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
                   ` (8 more replies)
  0 siblings, 9 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

At the request of Catalin, this series has been split off from my series
'arm64: implement support for KASLR v4' [1]. This sub-series deals with
moving the kernel out of the linear mapping into the vmalloc area. This
is a prerequisite for independent physical and virtual randomization of
the kernel image. On top of that, since these changes allow the linear
mapping to start at an arbitrary offset above PAGE_OFFSET, they are an
improvement in their own right: we can now choose __pa(PAGE_OFFSET) such
that RAM can be mapped using large block sizes.

For instance, on my Seattle A0 box, the kernel is loaded 16 MB into the
lowest GB of RAM, which means __pa(PAGE_OFFSET) is not 1 GB aligned, and
the entire 16 GB of RAM is mapped using 2 MB blocks. (Similarly, for
64 KB granule kernels, the entire 16 GB of RAM is mapped using pages,
since __pa(PAGE_OFFSET) is not 512 MB aligned.) With these changes,
__pa(PAGE_OFFSET) will always be chosen such that it is aligned to a
quantity that allows efficient mapping.
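
As a rough sketch of how the new base is chosen (this mirrors what patch #8
ends up doing; ARM64_MEMSTART_ALIGN is 1 GB for the 4 KB and 16 KB granules
and 512 MB for the 64 KB granule):

  /* pick the base of the linear region so that RAM can be block mapped */
  memstart_addr = round_down(memblock_start_of_DRAM(), ARM64_MEMSTART_ALIGN);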

Note that of the entire KASLR series, this sub-series is the most likely to
cause problems, and hence requires the most careful review and testing. This
is due to the fact that, with these changes, the invariant __va(__pa(x)) == x
no longer holds, and any code that is based on that assumption needs to be
updated.
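
A simplified sketch of the asymmetry (based on the __virt_to_phys() and
__phys_to_virt() definitions in patches #3 and #8; _text stands for any
symbol that lives in the kernel image):

  phys_addr_t pa = __pa(_text);   /* image address, translated via the image offset */
  void *lin      = __va(pa);      /* always yields the linear alias of 'pa' */
  /* lin != _text once the image has been moved into the vmalloc area (patch #7) */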

Changes since v4:
- added Marc's ack to patch #6
- round the kasan zero shadow region around the kernel image to swapper block
  size (#7)
- ensure that we don't clip the kernel image when clipping RAM to the linear
  region size (#8)

Patch #1 allows the low mark of memblocks discovered from the FDT to be
overridden by the architecture.

Patch #2 enables the huge-vmap generic feature for arm64. This should be an
improvement in itself, but the significance for this series is that it allows
unmap_kernel_range() to be called on the [__init_begin, __init_end) region,
which may be partially mapped using block mappings.

Patch #3 introduces KIMAGE_VADDR as a separate, preparatory step towards
decoupling the kernel placement from PAGE_OFFSET.

Patch #4 implements some translation table accessors that operate on statically
allocated translation tables before the linear mapping is up.

Patch #5 decouples the fixmap initialization from the linear mapping, by using
the accessors implemented by patch #4.

Patch #6 removes assumptions made by KVM regarding the placement of the kernel
image inside the linear mapping.

Patch #7 moves the kernel image from the base of the linear mapping to the base
of the vmalloc area. The modules area, which sits right below the kernel image,
is moved along and is put right before the start of the vmalloc area.

Patch #8 decouples PHYS_OFFSET from PAGE_OFFSET, which allows the linear mapping
to cover all discovered memory, regardless of where the kernel image is located
in it. This effectively allows the kernel to be loaded at any physical address
(provided that the correct alignment is used).

[1] http://thread.gmane.org/gmane.linux.kernel/2135931

Ard Biesheuvel (8):
  of/fdt: make memblock minimum physical address arch configurable
  arm64: add support for ioremap() block mappings
  arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
  arm64: pgtable: implement static [pte|pmd|pud]_offset variants
  arm64: decouple early fixmap init from linear mapping
  arm64: kvm: deal with kernel symbols outside of linear mapping
  arm64: move kernel image to base of vmalloc area
  arm64: allow kernel Image to be loaded anywhere in physical memory

 Documentation/arm64/booting.txt                      |  20 ++-
 Documentation/features/vm/huge-vmap/arch-support.txt |   2 +-
 arch/arm/include/asm/kvm_asm.h                       |   2 +
 arch/arm/kvm/arm.c                                   |   8 +-
 arch/arm64/Kconfig                                   |   1 +
 arch/arm64/include/asm/boot.h                        |   6 +
 arch/arm64/include/asm/kasan.h                       |   2 +-
 arch/arm64/include/asm/kernel-pgtable.h              |  12 ++
 arch/arm64/include/asm/kvm_asm.h                     |   2 +
 arch/arm64/include/asm/kvm_host.h                    |   8 +-
 arch/arm64/include/asm/memory.h                      |  44 ++++--
 arch/arm64/include/asm/pgtable.h                     |  23 ++-
 arch/arm64/kernel/head.S                             |   8 +-
 arch/arm64/kernel/image.h                            |  13 +-
 arch/arm64/kernel/vmlinux.lds.S                      |   4 +-
 arch/arm64/kvm/hyp.S                                 |   6 +-
 arch/arm64/mm/dump.c                                 |  12 +-
 arch/arm64/mm/init.c                                 | 123 ++++++++++++++--
 arch/arm64/mm/kasan_init.c                           |  31 +++-
 arch/arm64/mm/mmu.c                                  | 155 +++++++++++++++-----
 drivers/of/fdt.c                                     |   5 +-
 21 files changed, 378 insertions(+), 109 deletions(-)

-- 
2.5.0


* [PATCH v5sub1 1/8] of/fdt: make memblock minimum physical address arch configurable
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 2/8] arm64: add support for ioremap() block mappings Ard Biesheuvel
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

By default, early_init_dt_add_memory_arch() ignores memory below
the base of the kernel image, since it won't be addressable via the
linear mapping. However, this is no longer appropriate once we
decouple the kernel text mapping from the linear mapping, so archs
may want to drop the low limit entirely. Allow the minimum to be
overridden by setting MIN_MEMBLOCK_ADDR.
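
For example (this is what arm64 ends up doing in patch #8 of this series, in
its asm/memory.h), an arch that maps the kernel outside the linear region can
simply accept all memory at this stage:

  /* arch override: allow memblocks to start at physical address zero */
  #define MIN_MEMBLOCK_ADDR	0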

Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/of/fdt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 655f79db7899..1f98156f8996 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -976,13 +976,16 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 }
 
 #ifdef CONFIG_HAVE_MEMBLOCK
+#ifndef MIN_MEMBLOCK_ADDR
+#define MIN_MEMBLOCK_ADDR	__pa(PAGE_OFFSET)
+#endif
 #ifndef MAX_MEMBLOCK_ADDR
 #define MAX_MEMBLOCK_ADDR	((phys_addr_t)~0)
 #endif
 
 void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
 {
-	const u64 phys_offset = __pa(PAGE_OFFSET);
+	const u64 phys_offset = MIN_MEMBLOCK_ADDR;
 
 	if (!PAGE_ALIGNED(base)) {
 		if (size < PAGE_SIZE - (base & ~PAGE_MASK)) {
-- 
2.5.0


* [PATCH v5sub1 2/8] arm64: add support for ioremap() block mappings
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 1/8] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 14:10   ` Mark Rutland
  2016-02-01 10:54 ` [PATCH v5sub1 3/8] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region Ard Biesheuvel
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

This wires up the existing generic huge-vmap feature, which allows
ioremap() to use PMD or PUD sized block mappings. It also adds support
to the unmap path for dealing with block mappings, which will allow us
to unmap the __init region using unmap_kernel_range() in a subsequent
patch.
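
As a usage sketch (the device base address below is made up): a 2 MB mapping
of a 2 MB aligned physical region can now be covered by a single PMD block
entry instead of 512 individual page mappings:

  /* hypothetical device window: 2 MB aligned base, 2 MB in size */
  void __iomem *regs = ioremap(0x40000000UL, SZ_2M);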

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/features/vm/huge-vmap/arch-support.txt |  2 +-
 arch/arm64/Kconfig                                   |  1 +
 arch/arm64/include/asm/memory.h                      |  6 +++
 arch/arm64/mm/mmu.c                                  | 41 ++++++++++++++++++++
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/Documentation/features/vm/huge-vmap/arch-support.txt b/Documentation/features/vm/huge-vmap/arch-support.txt
index af6816bccb43..df1d1f3c9af2 100644
--- a/Documentation/features/vm/huge-vmap/arch-support.txt
+++ b/Documentation/features/vm/huge-vmap/arch-support.txt
@@ -9,7 +9,7 @@
     |       alpha: | TODO |
     |         arc: | TODO |
     |         arm: | TODO |
-    |       arm64: | TODO |
+    |       arm64: |  ok  |
     |       avr32: | TODO |
     |    blackfin: | TODO |
     |         c6x: | TODO |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc62289a63e..cd767fa3037a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -49,6 +49,7 @@ config ARM64
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_BITREVERSE
+	select HAVE_ARCH_HUGE_VMAP
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_KASAN if SPARSEMEM_VMEMMAP && !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
 	select HAVE_ARCH_KGDB
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 853953cd1f08..c65aad7b13dc 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -100,6 +100,12 @@
 #define MT_S2_NORMAL		0xf
 #define MT_S2_DEVICE_nGnRE	0x1
 
+#ifdef CONFIG_ARM64_4K_PAGES
+#define IOREMAP_MAX_ORDER	(PUD_SHIFT)
+#else
+#define IOREMAP_MAX_ORDER	(PMD_SHIFT)
+#endif
+
 #ifndef __ASSEMBLY__
 
 extern phys_addr_t		memstart_addr;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7711554a94f4..73383019f212 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -714,3 +714,44 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
 
 	return dt_virt;
 }
+
+int __init arch_ioremap_pud_supported(void)
+{
+	/* only 4k granule supports level 1 block mappings */
+	return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
+}
+
+int __init arch_ioremap_pmd_supported(void)
+{
+	return 1;
+}
+
+int pud_set_huge(pud_t *pud, phys_addr_t phys, pgprot_t prot)
+{
+	BUG_ON(phys & ~PUD_MASK);
+	set_pud(pud, __pud(phys | PUD_TYPE_SECT | pgprot_val(mk_sect_prot(prot))));
+	return 1;
+}
+
+int pmd_set_huge(pmd_t *pmd, phys_addr_t phys, pgprot_t prot)
+{
+	BUG_ON(phys & ~PMD_MASK);
+	set_pmd(pmd, __pmd(phys | PMD_TYPE_SECT | pgprot_val(mk_sect_prot(prot))));
+	return 1;
+}
+
+int pud_clear_huge(pud_t *pud)
+{
+	if (!pud_sect(*pud))
+		return 0;
+	pud_clear(pud);
+	return 1;
+}
+
+int pmd_clear_huge(pmd_t *pmd)
+{
+	if (!pmd_sect(*pmd))
+		return 0;
+	pmd_clear(pmd);
+	return 1;
+}
-- 
2.5.0


* [PATCH v5sub1 3/8] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 1/8] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 2/8] arm64: add support for ioremap() block mappings Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 4/8] arm64: pgtable: implement static [pte|pmd|pud]_offset variants Ard Biesheuvel
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

This introduces the preprocessor symbol KIMAGE_VADDR which will serve as
the symbolic virtual base of the kernel region, i.e., the kernel's virtual
offset will be KIMAGE_VADDR + TEXT_OFFSET. For now, we define it as being
equal to PAGE_OFFSET, but in the future, it will be moved below it once
we move the kernel virtual mapping out of the linear mapping.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/memory.h | 10 ++++++++--
 arch/arm64/kernel/head.S        |  2 +-
 arch/arm64/kernel/vmlinux.lds.S |  4 ++--
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index c65aad7b13dc..aebc739f5a11 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -51,7 +51,8 @@
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
 #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
-#define MODULES_END		(PAGE_OFFSET)
+#define KIMAGE_VADDR		(PAGE_OFFSET)
+#define MODULES_END		(KIMAGE_VADDR)
 #define MODULES_VADDR		(MODULES_END - SZ_64M)
 #define PCI_IO_END		(MODULES_VADDR - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
@@ -75,8 +76,13 @@
  * private definitions which should NOT be used outside memory.h
  * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
  */
-#define __virt_to_phys(x)	(((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
+#define __virt_to_phys(x) ({						\
+	phys_addr_t __x = (phys_addr_t)(x);				\
+	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
+			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
+
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
+#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
 
 /*
  * Convert a page to/from a physical address
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 53b9f9f128c2..04d38a058b19 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -389,7 +389,7 @@ __create_page_tables:
 	 * Map the kernel image (starting with PHYS_OFFSET).
 	 */
 	mov	x0, x26				// swapper_pg_dir
-	mov	x5, #PAGE_OFFSET
+	ldr	x5, =KIMAGE_VADDR
 	create_pgd_entry x0, x5, x3, x6
 	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
 	mov	x3, x24				// phys offset
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b78a3c772294..282e3e64a17e 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -89,7 +89,7 @@ SECTIONS
 		*(.discard.*)
 	}
 
-	. = PAGE_OFFSET + TEXT_OFFSET;
+	. = KIMAGE_VADDR + TEXT_OFFSET;
 
 	.head.text : {
 		_text = .;
@@ -186,4 +186,4 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
  */
-ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
+ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned")
-- 
2.5.0


* [PATCH v5sub1 4/8] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2016-02-01 10:54 ` [PATCH v5sub1 3/8] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 5/8] arm64: decouple early fixmap init from linear mapping Ard Biesheuvel
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

The page table accessors pte_offset(), pud_offset() and pmd_offset()
rely on __va translations, so they can only be used after the linear
mapping has been installed. For the early fixmap and kasan init routines,
whose page tables are allocated statically in the kernel image, these
functions will return bogus values. So implement pte_offset_kimg(),
pmd_offset_kimg() and pud_offset_kimg(), which can be used instead
before any page tables have been allocated dynamically.
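
A short sketch of the intended use (this mirrors how the early fixmap code in
patch #5 and the kasan init code in patch #7 walk their statically allocated
tables):

  /* only valid for translation tables that are part of the kernel image */
  pud_t *pud = pud_offset_kimg(pgd, addr);  /* uses the image offset, not __va() */
  pmd_t *pmd = pmd_offset_kimg(pud, addr);
  pte_t *pte = pte_offset_kimg(pmd, addr);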

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/pgtable.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4229f75fd145..87355408d448 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -445,6 +445,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 
 #define pmd_page(pmd)		pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
 
+/* use ONLY for statically allocated translation tables */
+#define pte_offset_kimg(dir,addr)	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -488,6 +491,9 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
 
 #define pud_page(pud)		pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
 
+/* use ONLY for statically allocated translation tables */
+#define pmd_offset_kimg(dir,addr)	((pmd_t *)__phys_to_kimg(pmd_offset_phys((dir), (addr))))
+
 #else
 
 #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
@@ -497,6 +503,8 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
 #define pmd_set_fixmap_offset(pudp, addr)	((pmd_t *)pudp)
 #define pmd_clear_fixmap()
 
+#define pmd_offset_kimg(dir,addr)	((pmd_t *)dir)
+
 #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -535,6 +543,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 
 #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
 
+/* use ONLY for statically allocated translation tables */
+#define pud_offset_kimg(dir,addr)	((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
+
 #else
 
 #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
@@ -544,6 +555,8 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 #define pud_set_fixmap_offset(pgdp, addr)	((pud_t *)pgdp)
 #define pud_clear_fixmap()
 
+#define pud_offset_kimg(dir,addr)	((pud_t *)dir)
+
 #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
 
 #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd_val(pgd))
-- 
2.5.0


* [PATCH v5sub1 5/8] arm64: decouple early fixmap init from linear mapping
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2016-02-01 10:54 ` [PATCH v5sub1 4/8] arm64: pgtable: implement static [pte|pmd|pud]_offset variants Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 6/8] arm64: kvm: deal with kernel symbols outside of " Ard Biesheuvel
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

Since the early fixmap page tables are populated using pages that are
part of the static footprint of the kernel, they are covered by the
initial kernel mapping, and we can refer to them without using __va/__pa
translations, which are tied to the linear mapping.

Since the fixmap page tables are disjoint from the kernel mapping up
to the top level pgd entry, we can refer to bm_pte[] directly, and there
is no need to walk the page tables and perform __pa()/__va() translations
at each step.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 73383019f212..b84915723ea0 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -583,7 +583,7 @@ static inline pud_t * fixmap_pud(unsigned long addr)
 
 	BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
 
-	return pud_offset(pgd, addr);
+	return pud_offset_kimg(pgd, addr);
 }
 
 static inline pmd_t * fixmap_pmd(unsigned long addr)
@@ -592,16 +592,12 @@ static inline pmd_t * fixmap_pmd(unsigned long addr)
 
 	BUG_ON(pud_none(*pud) || pud_bad(*pud));
 
-	return pmd_offset(pud, addr);
+	return pmd_offset_kimg(pud, addr);
 }
 
 static inline pte_t * fixmap_pte(unsigned long addr)
 {
-	pmd_t *pmd = fixmap_pmd(addr);
-
-	BUG_ON(pmd_none(*pmd) || pmd_bad(*pmd));
-
-	return pte_offset_kernel(pmd, addr);
+	return &bm_pte[pte_index(addr)];
 }
 
 void __init early_fixmap_init(void)
@@ -613,14 +609,14 @@ void __init early_fixmap_init(void)
 
 	pgd = pgd_offset_k(addr);
 	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = pud_offset(pgd, addr);
+	pud = fixmap_pud(addr);
 	pud_populate(&init_mm, pud, bm_pmd);
-	pmd = pmd_offset(pud, addr);
+	pmd = fixmap_pmd(addr);
 	pmd_populate_kernel(&init_mm, pmd, bm_pte);
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which
-	 * we are not preparted:
+	 * we are not prepared:
 	 */
 	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
-- 
2.5.0


* [PATCH v5sub1 6/8] arm64: kvm: deal with kernel symbols outside of linear mapping
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2016-02-01 10:54 ` [PATCH v5sub1 5/8] arm64: decouple early fixmap init from linear mapping Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 10:54 ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

KVM on arm64 uses a fixed offset between the linear mapping at EL1 and
the HYP mapping at EL2. Before we can move the kernel virtual mapping
out of the linear mapping, we have to make sure that references to kernel
symbols that are accessed via the HYP mapping are translated to their
linear equivalent.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/kvm_asm.h    | 2 ++
 arch/arm/kvm/arm.c                | 8 +++++---
 arch/arm64/include/asm/kvm_asm.h  | 2 ++
 arch/arm64/include/asm/kvm_host.h | 8 +++++---
 arch/arm64/kvm/hyp.S              | 6 +++---
 5 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 194c91b610ff..c35c349da069 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -79,6 +79,8 @@
 #define rr_lo_hi(a1, a2) a1, a2
 #endif
 
+#define kvm_ksym_ref(kva)	(kva)
+
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dda1959f0dde..975da6cfbf59 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -982,7 +982,7 @@ static void cpu_init_hyp_mode(void *dummy)
 	pgd_ptr = kvm_mmu_get_httbr();
 	stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
 	hyp_stack_ptr = stack_page + PAGE_SIZE;
-	vector_ptr = (unsigned long)__kvm_hyp_vector;
+	vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
 
 	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 
@@ -1074,13 +1074,15 @@ static int init_hyp_mode(void)
 	/*
 	 * Map the Hyp-code called directly from the host
 	 */
-	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
+	err = create_hyp_mappings(kvm_ksym_ref(__kvm_hyp_code_start),
+				  kvm_ksym_ref(__kvm_hyp_code_end));
 	if (err) {
 		kvm_err("Cannot map world-switch code\n");
 		goto out_free_mappings;
 	}
 
-	err = create_hyp_mappings(__start_rodata, __end_rodata);
+	err = create_hyp_mappings(kvm_ksym_ref(__start_rodata),
+				  kvm_ksym_ref(__end_rodata));
 	if (err) {
 		kvm_err("Cannot map rodata section\n");
 		goto out_free_mappings;
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 52b777b7d407..f5aee6e764e6 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -26,6 +26,8 @@
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
 #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
 
+#define kvm_ksym_ref(sym)		((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
+
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 689d4c95e12f..e3d67ff8798b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -307,7 +307,7 @@ static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 
-u64 kvm_call_hyp(void *hypfn, ...);
+u64 __kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
@@ -328,8 +328,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 	 * Call initialization code, and switch to the full blown
 	 * HYP code.
 	 */
-	kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
-		     hyp_stack_ptr, vector_ptr);
+	__kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
+		       hyp_stack_ptr, vector_ptr);
 }
 
 static inline void kvm_arch_hardware_disable(void) {}
@@ -343,4 +343,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
+#define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 0ccdcbbef3c2..870578f84b1c 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -20,7 +20,7 @@
 #include <asm/assembler.h>
 
 /*
- * u64 kvm_call_hyp(void *hypfn, ...);
+ * u64 __kvm_call_hyp(void *hypfn, ...);
  *
  * This is not really a variadic function in the classic C-way and care must
  * be taken when calling this to ensure parameters are passed in registers
@@ -37,7 +37,7 @@
  * used to implement __hyp_get_vectors in the same way as in
  * arch/arm64/kernel/hyp_stub.S.
  */
-ENTRY(kvm_call_hyp)
+ENTRY(__kvm_call_hyp)
 	hvc	#0
 	ret
-ENDPROC(kvm_call_hyp)
+ENDPROC(__kvm_call_hyp)
-- 
2.5.0


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2016-02-01 10:54 ` [PATCH v5sub1 6/8] arm64: kvm: deal with kernel symbols outside of " Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 12:24   ` Catalin Marinas
                     ` (3 more replies)
  2016-02-01 10:54 ` [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
  2016-02-12 19:45 ` [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Matthias Brugger
  8 siblings, 4 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

This moves the module area to right before the vmalloc area, and
moves the kernel image to the base of the vmalloc area. This is
an intermediate step towards implementing KASLR, which allows the
kernel image to be located anywhere in the vmalloc area.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kasan.h   |  2 +-
 arch/arm64/include/asm/memory.h  | 21 +++--
 arch/arm64/include/asm/pgtable.h | 10 +-
 arch/arm64/mm/dump.c             | 12 +--
 arch/arm64/mm/init.c             | 23 ++---
 arch/arm64/mm/kasan_init.c       | 31 ++++++-
 arch/arm64/mm/mmu.c              | 97 +++++++++++++-------
 7 files changed, 129 insertions(+), 67 deletions(-)

diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
index de0d21211c34..71ad0f93eb71 100644
--- a/arch/arm64/include/asm/kasan.h
+++ b/arch/arm64/include/asm/kasan.h
@@ -14,7 +14,7 @@
  * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
  */
 #define KASAN_SHADOW_START      (VA_START)
-#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
+#define KASAN_SHADOW_END        (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
 
 /*
  * This value is used to map an address to the corresponding shadow
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index aebc739f5a11..4388651d1f0d 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -45,16 +45,15 @@
  * VA_START - the first kernel virtual address.
  * TASK_SIZE - the maximum size of a user space task.
  * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area.
- * The module space lives between the addresses given by TASK_SIZE
- * and PAGE_OFFSET - it must be within 128MB of the kernel text.
  */
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
 #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
-#define KIMAGE_VADDR		(PAGE_OFFSET)
-#define MODULES_END		(KIMAGE_VADDR)
-#define MODULES_VADDR		(MODULES_END - SZ_64M)
-#define PCI_IO_END		(MODULES_VADDR - SZ_2M)
+#define KIMAGE_VADDR		(MODULES_END)
+#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
+#define MODULES_VADDR		(VA_START + KASAN_SHADOW_SIZE)
+#define MODULES_VSIZE		(SZ_64M)
+#define PCI_IO_END		(PAGE_OFFSET - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
 #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
 #define TASK_SIZE_64		(UL(1) << VA_BITS)
@@ -72,6 +71,16 @@
 #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 4))
 
 /*
+ * The size of the KASAN shadow region. This should be 1/8th of the
+ * size of the entire kernel virtual address space.
+ */
+#ifdef CONFIG_KASAN
+#define KASAN_SHADOW_SIZE	(UL(1) << (VA_BITS - 3))
+#else
+#define KASAN_SHADOW_SIZE	(0)
+#endif
+
+/*
  * Physical vs virtual RAM address space conversion.  These are
  * private definitions which should NOT be used outside memory.h
  * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 87355408d448..a440f5a85d08 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -36,19 +36,13 @@
  *
  * VMEMAP_SIZE: allows the whole VA space to be covered by a struct page array
  *	(rounded up to PUD_SIZE).
- * VMALLOC_START: beginning of the kernel VA space
+ * VMALLOC_START: beginning of the kernel vmalloc space
  * VMALLOC_END: extends to the available space below vmmemmap, PCI I/O space,
  *	fixed mappings and modules
  */
 #define VMEMMAP_SIZE		ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
 
-#ifndef CONFIG_KASAN
-#define VMALLOC_START		(VA_START)
-#else
-#include <asm/kasan.h>
-#define VMALLOC_START		(KASAN_SHADOW_END + SZ_64K)
-#endif
-
+#define VMALLOC_START		(MODULES_END)
 #define VMALLOC_END		(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
 
 #define vmemmap			((struct page *)(VMALLOC_END + SZ_64K))
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 0adbebbc2803..e83ffb00560c 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -35,7 +35,9 @@ struct addr_marker {
 };
 
 enum address_markers_idx {
-	VMALLOC_START_NR = 0,
+	MODULES_START_NR = 0,
+	MODULES_END_NR,
+	VMALLOC_START_NR,
 	VMALLOC_END_NR,
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 	VMEMMAP_START_NR,
@@ -45,12 +47,12 @@ enum address_markers_idx {
 	FIXADDR_END_NR,
 	PCI_START_NR,
 	PCI_END_NR,
-	MODULES_START_NR,
-	MODULES_END_NR,
 	KERNEL_SPACE_NR,
 };
 
 static struct addr_marker address_markers[] = {
+	{ MODULES_VADDR,	"Modules start" },
+	{ MODULES_END,		"Modules end" },
 	{ VMALLOC_START,	"vmalloc() Area" },
 	{ VMALLOC_END,		"vmalloc() End" },
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
@@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
 	{ FIXADDR_TOP,		"Fixmap end" },
 	{ PCI_IO_START,		"PCI I/O start" },
 	{ PCI_IO_END,		"PCI I/O end" },
-	{ MODULES_VADDR,	"Modules start" },
-	{ MODULES_END,		"Modules end" },
-	{ PAGE_OFFSET,		"Kernel Mapping" },
+	{ PAGE_OFFSET,		"Linear Mapping" },
 	{ -1,			NULL },
 };
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f3b061e67bfe..1d627cd8121c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
 #include <linux/swiotlb.h>
 
 #include <asm/fixmap.h>
+#include <asm/kasan.h>
 #include <asm/memory.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -302,22 +303,26 @@ void __init mem_init(void)
 #ifdef CONFIG_KASAN
 		  "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
 #endif
+		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
 		  "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
+		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
+		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
+		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 		  "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
 		  "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
 #endif
 		  "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
 		  "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
-		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
-		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
-		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
-		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
-		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
+		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
 #ifdef CONFIG_KASAN
 		  MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
 #endif
+		  MLM(MODULES_VADDR, MODULES_END),
 		  MLG(VMALLOC_START, VMALLOC_END),
+		  MLK_ROUNDUP(__init_begin, __init_end),
+		  MLK_ROUNDUP(_text, _etext),
+		  MLK_ROUNDUP(_sdata, _edata),
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 		  MLG((unsigned long)vmemmap,
 		      (unsigned long)vmemmap + VMEMMAP_SIZE),
@@ -326,11 +331,7 @@ void __init mem_init(void)
 #endif
 		  MLK(FIXADDR_START, FIXADDR_TOP),
 		  MLM(PCI_IO_START, PCI_IO_END),
-		  MLM(MODULES_VADDR, MODULES_END),
-		  MLM(PAGE_OFFSET, (unsigned long)high_memory),
-		  MLK_ROUNDUP(__init_begin, __init_end),
-		  MLK_ROUNDUP(_text, _etext),
-		  MLK_ROUNDUP(_sdata, _edata));
+		  MLM(PAGE_OFFSET, (unsigned long)high_memory));
 
 #undef MLK
 #undef MLM
@@ -358,8 +359,8 @@ void __init mem_init(void)
 
 void free_initmem(void)
 {
-	fixup_init();
 	free_initmem_default(0);
+	fixup_init();
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index cc569a38bc76..66c246871d2e 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -17,9 +17,11 @@
 #include <linux/start_kernel.h>
 
 #include <asm/mmu_context.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/page.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
+#include <asm/sections.h>
 #include <asm/tlbflush.h>
 
 static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
@@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
 	if (pmd_none(*pmd))
 		pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
 
-	pte = pte_offset_kernel(pmd, addr);
+	pte = pte_offset_kimg(pmd, addr);
 	do {
 		next = addr + PAGE_SIZE;
 		set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
@@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
 	if (pud_none(*pud))
 		pud_populate(&init_mm, pud, kasan_zero_pmd);
 
-	pmd = pmd_offset(pud, addr);
+	pmd = pmd_offset_kimg(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
 		kasan_early_pte_populate(pmd, addr, next);
@@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
 	if (pgd_none(*pgd))
 		pgd_populate(&init_mm, pgd, kasan_zero_pud);
 
-	pud = pud_offset(pgd, addr);
+	pud = pud_offset_kimg(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
 		kasan_early_pmd_populate(pud, addr, next);
@@ -126,9 +128,13 @@ static void __init clear_pgds(unsigned long start,
 
 void __init kasan_init(void)
 {
+	u64 kimg_shadow_start, kimg_shadow_end;
 	struct memblock_region *reg;
 	int i;
 
+	kimg_shadow_start = (u64)kasan_mem_to_shadow(_text);
+	kimg_shadow_end = (u64)kasan_mem_to_shadow(_end);
+
 	/*
 	 * We are going to perform proper setup of shadow memory.
 	 * At first we should unmap early shadow (clear_pgds() call bellow).
@@ -142,8 +148,25 @@ void __init kasan_init(void)
 
 	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
 
+	vmemmap_populate(kimg_shadow_start, kimg_shadow_end, NUMA_NO_NODE);
+
+	/*
+	 * vmemmap_populate() has populated the shadow region that covers the
+	 * kernel image with SWAPPER_BLOCK_SIZE mappings, so we have to round
+	 * the start and end addresses to SWAPPER_BLOCK_SIZE as well, to prevent
+	 * kasan_populate_zero_shadow() from replacing the PMD block mappings
+	 * with PMD table mappings at the edges of the shadow region for the
+	 * kernel image.
+	 */
+	if (ARM64_SWAPPER_USES_SECTION_MAPS) {
+		kimg_shadow_start = round_down(kimg_shadow_start,
+					       SWAPPER_BLOCK_SIZE);
+		kimg_shadow_end = round_up(kimg_shadow_end, SWAPPER_BLOCK_SIZE);
+	}
 	kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
-			kasan_mem_to_shadow((void *)MODULES_VADDR));
+				   (void *)kimg_shadow_start);
+	kasan_populate_zero_shadow((void *)kimg_shadow_end,
+				   kasan_mem_to_shadow((void *)PAGE_OFFSET));
 
 	for_each_memblock(memory, reg) {
 		void *start = (void *)__phys_to_virt(reg->base);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b84915723ea0..4c4b15932963 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
 EXPORT_SYMBOL(empty_zero_page);
 
+static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
+static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
+static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
+
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
@@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 {
 
 	unsigned long kernel_start = __pa(_stext);
-	unsigned long kernel_end = __pa(_end);
+	unsigned long kernel_end = __pa(_etext);
 
 	/*
-	 * The kernel itself is mapped at page granularity. Map all other
-	 * memory, making sure we don't overwrite the existing kernel mappings.
+	 * Take care not to create a writable alias for the
+	 * read-only text and rodata sections of the kernel image.
 	 */
 
-	/* No overlap with the kernel. */
+	/* No overlap with the kernel text */
 	if (end < kernel_start || start >= kernel_end) {
 		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
 				     end - start, PAGE_KERNEL,
@@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 	}
 
 	/*
-	 * This block overlaps the kernel mapping. Map the portion(s) which
+	 * This block overlaps the kernel text mapping. Map the portion(s) which
 	 * don't overlap.
 	 */
 	if (start < kernel_start)
@@ -398,25 +402,28 @@ static void __init map_mem(pgd_t *pgd)
 	}
 }
 
-#ifdef CONFIG_DEBUG_RODATA
 void mark_rodata_ro(void)
 {
+	if (!IS_ENABLED(CONFIG_DEBUG_RODATA))
+		return;
+
 	create_mapping_late(__pa(_stext), (unsigned long)_stext,
 				(unsigned long)_etext - (unsigned long)_stext,
 				PAGE_KERNEL_ROX);
-
 }
-#endif
 
 void fixup_init(void)
 {
-	create_mapping_late(__pa(__init_begin), (unsigned long)__init_begin,
-			(unsigned long)__init_end - (unsigned long)__init_begin,
-			PAGE_KERNEL);
+	/*
+	 * Unmap the __init region but leave the VM area in place. This
+	 * prevents the region from being reused for kernel modules, which
+	 * is not supported by kallsyms.
+	 */
+	unmap_kernel_range((u64)__init_begin, (u64)(__init_end - __init_begin));
 }
 
 static void __init map_kernel_chunk(pgd_t *pgd, void *va_start, void *va_end,
-				    pgprot_t prot)
+				    pgprot_t prot, struct vm_struct *vma)
 {
 	phys_addr_t pa_start = __pa(va_start);
 	unsigned long size = va_end - va_start;
@@ -426,6 +433,14 @@ static void __init map_kernel_chunk(pgd_t *pgd, void *va_start, void *va_end,
 
 	__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
 			     early_pgtable_alloc);
+
+	vma->addr	= va_start;
+	vma->phys_addr	= pa_start;
+	vma->size	= size;
+	vma->flags	= VM_MAP;
+	vma->caller	= map_kernel_chunk;
+
+	vm_area_add_early(vma);
 }
 
 /*
@@ -433,17 +448,35 @@ static void __init map_kernel_chunk(pgd_t *pgd, void *va_start, void *va_end,
  */
 static void __init map_kernel(pgd_t *pgd)
 {
+	static struct vm_struct vmlinux_text, vmlinux_init, vmlinux_data;
 
-	map_kernel_chunk(pgd, _stext, _etext, PAGE_KERNEL_EXEC);
-	map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
-	map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
+	map_kernel_chunk(pgd, _stext, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
+	map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
+			 &vmlinux_init);
+	map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
 
-	/*
-	 * The fixmap falls in a separate pgd to the kernel, and doesn't live
-	 * in the carveout for the swapper_pg_dir. We can simply re-use the
-	 * existing dir for the fixmap.
-	 */
-	set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
+	if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
+		/*
+		 * The fixmap falls in a separate pgd to the kernel, and doesn't
+		 * live in the carveout for the swapper_pg_dir. We can simply
+		 * re-use the existing dir for the fixmap.
+		 */
+		set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
+			*pgd_offset_k(FIXADDR_START));
+	} else if (CONFIG_PGTABLE_LEVELS > 3) {
+		/*
+		 * The fixmap shares its top level pgd entry with the kernel
+		 * mapping. This can really only occur when we are running
+		 * with 16k/4 levels, so we can simply reuse the pud level
+		 * entry instead.
+		 */
+		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
+		set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
+			__pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
+		pud_clear_fixmap();
+	} else {
+		BUG();
+	}
 
 	kasan_copy_shadow(pgd);
 }
@@ -569,14 +602,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
 
-static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-#if CONFIG_PGTABLE_LEVELS > 2
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-#endif
-#if CONFIG_PGTABLE_LEVELS > 3
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
-#endif
-
 static inline pud_t * fixmap_pud(unsigned long addr)
 {
 	pgd_t *pgd = pgd_offset_k(addr);
@@ -608,8 +633,18 @@ void __init early_fixmap_init(void)
 	unsigned long addr = FIXADDR_START;
 
 	pgd = pgd_offset_k(addr);
-	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = fixmap_pud(addr);
+	if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
+		/*
+		 * We only end up here if the kernel mapping and the fixmap
+		 * share the top level pgd entry, which should only happen on
+		 * 16k/4 levels configurations.
+		 */
+		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
+		pud = pud_offset_kimg(pgd, addr);
+	} else {
+		pgd_populate(&init_mm, pgd, bm_pud);
+		pud = fixmap_pud(addr);
+	}
 	pud_populate(&init_mm, pud, bm_pmd);
 	pmd = fixmap_pmd(addr);
 	pmd_populate_kernel(&init_mm, pmd, bm_pte);
-- 
2.5.0


* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2016-02-01 10:54 ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
@ 2016-02-01 10:54 ` Ard Biesheuvel
  2016-02-01 14:50   ` Mark Rutland
  2016-02-01 15:06   ` Catalin Marinas
  2016-02-12 19:45 ` [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Matthias Brugger
  8 siblings, 2 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 10:54 UTC (permalink / raw)
  To: linux-arm-kernel

This relaxes the kernel Image placement requirements, so that it
may be placed at any 2 MB aligned offset in physical memory.

This is accomplished by ignoring PHYS_OFFSET when installing
memblocks, and accounting for the apparent virtual offset of
the kernel Image. As a result, virtual address references
below PAGE_OFFSET are correctly mapped onto physical references
into the kernel Image regardless of where it sits in memory.

Note that limiting memory using mem= is no longer unambiguous after this
change, considering that the kernel may sit at the top of physical memory,
and clipping from the bottom rather than the top would discard any 32-bit
DMA addressable memory first. To deal with this, the handling of mem= is
reimplemented to clip from the top down, while taking special care not to
clip memory that covers the kernel image.

Since mem= should not be considered a production feature, a panic notifier
handler is installed that dumps the memory limit at panic time if one was
set.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt         |  20 ++--
 arch/arm64/include/asm/boot.h           |   6 ++
 arch/arm64/include/asm/kernel-pgtable.h |  12 +++
 arch/arm64/include/asm/kvm_asm.h        |   2 +-
 arch/arm64/include/asm/memory.h         |  15 +--
 arch/arm64/kernel/head.S                |   6 +-
 arch/arm64/kernel/image.h               |  13 ++-
 arch/arm64/mm/init.c                    | 100 +++++++++++++++++++-
 arch/arm64/mm/mmu.c                     |   3 +
 9 files changed, 155 insertions(+), 22 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 701d39d3171a..56d6d8b796db 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -109,7 +109,13 @@ Header notes:
 			1 - 4K
 			2 - 16K
 			3 - 64K
-  Bits 3-63:	Reserved.
+  Bit 3:	Kernel physical placement
+			0 - 2MB aligned base should be as close as possible
+			    to the base of DRAM, since memory below it is not
+			    accessible via the linear mapping
+			1 - 2MB aligned base may be anywhere in physical
+			    memory
+  Bits 4-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
@@ -117,14 +123,14 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address near the start of usable system RAM and called there. Memory
-below that base address is currently unusable by Linux, and therefore it
-is strongly recommended that this location is the start of system RAM.
-The region between the 2 MB aligned base address and the start of the
-image has no special significance to the kernel, and may be used for
-other purposes.
+address anywhere in usable system RAM and called there. The region
+between the 2 MB aligned base address and the start of the image has no
+special significance to the kernel, and may be used for other purposes.
 At least image_size bytes from the start of the image must be free for
 use by the kernel.
+NOTE: versions prior to v4.6 cannot make use of memory below the
+physical offset of the Image so it is recommended that the Image be
+placed as close as possible to the start of system RAM.
 
 Any memory described to the kernel (even that below the start of the
 image) which is not marked as reserved from the kernel (e.g., with a
diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
index 81151b67b26b..ebf2481889c3 100644
--- a/arch/arm64/include/asm/boot.h
+++ b/arch/arm64/include/asm/boot.h
@@ -11,4 +11,10 @@
 #define MIN_FDT_ALIGN		8
 #define MAX_FDT_SIZE		SZ_2M
 
+/*
+ * arm64 requires the kernel image to placed
+ * TEXT_OFFSET bytes beyond a 2 MB aligned base
+ */
+#define MIN_KIMG_ALIGN		SZ_2M
+
 #endif
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index a459714ee29e..5c6375d8528b 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -79,5 +79,17 @@
 #define SWAPPER_MM_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
 #endif
 
+/*
+ * To make optimal use of block mappings when laying out the linear
+ * mapping, round down the base of physical memory to a size that can
+ * be mapped efficiently, i.e., either PUD_SIZE (4k granule) or PMD_SIZE
+ * (64k granule), or a multiple that can be mapped using contiguous bits
+ * in the page tables: 32 * PMD_SIZE (16k granule)
+ */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define ARM64_MEMSTART_ALIGN	SZ_512M
+#else
+#define ARM64_MEMSTART_ALIGN	SZ_1G
+#endif
 
 #endif	/* __ASM_KERNEL_PGTABLE_H */
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index f5aee6e764e6..054ac25e7c2e 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -26,7 +26,7 @@
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
 #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
 
-#define kvm_ksym_ref(sym)		((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
+#define kvm_ksym_ref(sym)		phys_to_virt((u64)&sym - kimage_voffset)
 
 #ifndef __ASSEMBLY__
 struct kvm;
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 4388651d1f0d..61005e7dd6cb 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -88,10 +88,10 @@
 #define __virt_to_phys(x) ({						\
 	phys_addr_t __x = (phys_addr_t)(x);				\
 	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
-			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
+			     (__x - kimage_voffset); })
 
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
-#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
+#define __phys_to_kimg(x)	((unsigned long)((x) + kimage_voffset))
 
 /*
  * Convert a page to/from a physical address
@@ -127,13 +127,14 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+/* the offset between the kernel virtual and physical mappings */
+extern u64			kimage_voffset;
+
 /*
- * The maximum physical address that the linear direct mapping
- * of system RAM can cover. (PAGE_OFFSET can be interpreted as
- * a 2's complement signed quantity and negated to derive the
- * maximum size of the linear mapping.)
+ * Allow all memory at the discovery stage. We will clip it later.
  */
-#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
+#define MIN_MEMBLOCK_ADDR	0
+#define MAX_MEMBLOCK_ADDR	U64_MAX
 
 /*
  * PFNs are used to describe any physical page; this means
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 04d38a058b19..05b98289093e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -428,7 +428,11 @@ __mmap_switched:
 	and	x4, x4, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x4			// Save thread_info
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
-	str_l	x24, memstart_addr, x6		// Save PHYS_OFFSET
+
+	ldr	x4, =KIMAGE_VADDR		// Save the offset between
+	sub	x4, x4, x24			// the kernel virtual and
+	str_l	x4, kimage_voffset, x5		// physical mappings
+
 	mov	x29, #0
 #ifdef CONFIG_KASAN
 	bl	kasan_early_init
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index 999633bd7294..c9c62cab25a4 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -42,15 +42,18 @@
 #endif
 
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define __HEAD_FLAG_BE	1
+#define __HEAD_FLAG_BE		1
 #else
-#define __HEAD_FLAG_BE	0
+#define __HEAD_FLAG_BE		0
 #endif
 
-#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+#define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
 
-#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
-			 (__HEAD_FLAG_PAGE_SIZE << 1))
+#define __HEAD_FLAG_PHYS_BASE	1
+
+#define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
+				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
+				 (__HEAD_FLAG_PHYS_BASE << 3))
 
 /*
  * These will output as part of the Image header, which should be little-endian
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1d627cd8121c..e8e853a1024c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -35,8 +35,10 @@
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 
+#include <asm/boot.h>
 #include <asm/fixmap.h>
 #include <asm/kasan.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -158,9 +160,80 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+/*
+ * clip_mem_range() - remove memblock memory between @min and @max until
+ *                    we meet the limit in 'memory_limit'.
+ */
+static void __init clip_mem_range(u64 min, u64 max)
+{
+	u64 mem_size, to_remove;
+	int i;
+
+again:
+	mem_size = memblock_phys_mem_size();
+	if (mem_size <= memory_limit || max <= min)
+		return;
+
+	to_remove = mem_size - memory_limit;
+
+	for (i = memblock.memory.cnt - 1; i >= 0; i--) {
+		struct memblock_region *r = memblock.memory.regions + i;
+		u64 start = max(min, r->base);
+		u64 end = min(max, r->base + r->size);
+
+		if (start >= max || end <= min)
+			continue;
+
+		if (end > min) {
+			u64 size = min(to_remove, end - max(start, min));
+
+			memblock_remove(end - size, size);
+		} else {
+			memblock_remove(start, min(max - start, to_remove));
+		}
+		goto again;
+	}
+}
+
 void __init arm64_memblock_init(void)
 {
-	memblock_enforce_memory_limit(memory_limit);
+	const s64 linear_region_size = -(s64)PAGE_OFFSET;
+
+	/*
+	 * Select a suitable value for the base of physical memory.
+	 */
+	memstart_addr = round_down(memblock_start_of_DRAM(),
+				   ARM64_MEMSTART_ALIGN);
+
+	/*
+	 * Remove the memory that we will not be able to cover with the
+	 * linear mapping. Take care not to clip the kernel which may be
+	 * high in memory.
+	 */
+	memblock_remove(max(memstart_addr + linear_region_size, __pa(_end)),
+			ULLONG_MAX);
+	if (memblock_end_of_DRAM() > linear_region_size)
+		memblock_remove(0, memblock_end_of_DRAM() - linear_region_size);
+
+	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
+		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
+		u64 kend = PAGE_ALIGN(__pa(_end));
+		u64 const sz_4g = 0x100000000UL;
+
+		/*
+		 * Clip memory in order of preference:
+		 * - above the kernel and above 4 GB
+		 * - between 4 GB and the start of the kernel (if the kernel
+		 *   is loaded high in memory)
+		 * - between the kernel and 4 GB (if the kernel is loaded
+		 *   low in memory)
+		 * - below 4 GB
+		 */
+		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
+		clip_mem_range(sz_4g, kbase);
+		clip_mem_range(kend, sz_4g);
+		clip_mem_range(0, min(kbase, sz_4g));
+	}
 
 	/*
 	 * Register the kernel text, kernel data, initrd, and initial
@@ -381,3 +454,28 @@ static int __init keepinitrd_setup(char *__unused)
 
 __setup("keepinitrd", keepinitrd_setup);
 #endif
+
+/*
+ * Dump out memory limit information on panic.
+ */
+static int dump_mem_limit(struct notifier_block *self, unsigned long v, void *p)
+{
+	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
+		pr_emerg("Memory Limit: %llu MB\n", memory_limit >> 20);
+	} else {
+		pr_emerg("Memory Limit: none\n");
+	}
+	return 0;
+}
+
+static struct notifier_block mem_limit_notifier = {
+	.notifier_call = dump_mem_limit,
+};
+
+static int __init register_mem_limit_dumper(void)
+{
+	atomic_notifier_chain_register(&panic_notifier_list,
+				       &mem_limit_notifier);
+	return 0;
+}
+__initcall(register_mem_limit_dumper);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 4c4b15932963..8dda38378959 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -46,6 +46,9 @@
 
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 
+u64 kimage_voffset __read_mostly;
+EXPORT_SYMBOL(kimage_voffset);
+
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
-- 
2.5.0


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-01 10:54 ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
@ 2016-02-01 12:24   ` Catalin Marinas
  2016-02-01 12:27     ` Ard Biesheuvel
  2016-02-01 14:32   ` Mark Rutland
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-01 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>  EXPORT_SYMBOL(empty_zero_page);
>  
> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;

I applied a fixup locally to keep the compiler quiet:

--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -57,8 +57,8 @@ unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_
 EXPORT_SYMBOL(empty_zero_page);
 
 static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
+static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
+static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
 
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-01 12:24   ` Catalin Marinas
@ 2016-02-01 12:27     ` Ard Biesheuvel
  2016-02-01 13:41       ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 12:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2016 at 13:24, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>>  EXPORT_SYMBOL(empty_zero_page);
>>
>> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>
> I applied a fixup locally to keep the compiler quiet:
>
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -57,8 +57,8 @@ unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_
>  EXPORT_SYMBOL(empty_zero_page);
>
>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>

Ah yes, I dropped a memblock_free() against bm_pud in
early_fixmap_init(), since it occurred before the actual reservation,
which means bm_pud may never be referenced. For bm_pmd, the annotation
should not be required afaict.

If you prefer, I can keep the original code here:

#if CONFIG_PGTABLE_LEVELS > 2
static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
#endif
#if CONFIG_PGTABLE_LEVELS > 3
static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
#endif

-- 
Ard.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-01 12:27     ` Ard Biesheuvel
@ 2016-02-01 13:41       ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2016-02-01 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 01:27:59PM +0100, Ard Biesheuvel wrote:
> On 1 February 2016 at 13:24, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
> >>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
> >>  EXPORT_SYMBOL(empty_zero_page);
> >>
> >> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> >> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> >> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> >
> > I applied a fixup locally to keep the compiler quiet:
> >
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -57,8 +57,8 @@ unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_
> >  EXPORT_SYMBOL(empty_zero_page);
> >
> >  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> > -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> > +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
> > +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
> 
> Ah yes, I dropped a memblock_free() against bm_pud in
> early_fixmap_init(), since it occurred before the actual reservation,
> so bm_pud may never be referenced. For bm_pmd, it should not be
> required afaict.
> 
> If you prefer, I can keep the original code here:
> 
> #if CONFIG_PGTABLE_LEVELS > 2
> static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> #endif
> #if CONFIG_PGTABLE_LEVELS > 3
> static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> #endif

Looking at the CodingStyle doc, __maybe_unused is preferred, so I'll
just keep the fixup.

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 2/8] arm64: add support for ioremap() block mappings
  2016-02-01 10:54 ` [PATCH v5sub1 2/8] arm64: add support for ioremap() block mappings Ard Biesheuvel
@ 2016-02-01 14:10   ` Mark Rutland
  2016-02-01 14:56     ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2016-02-01 14:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 11:54:47AM +0100, Ard Biesheuvel wrote:
> This wires up the existing generic huge-vmap feature, which allows
> ioremap() to use PMD or PUD sized block mappings. It also adds support
> to the unmap path for dealing with block mappings, which will allow us
> to unmap the __init region using unmap_kernel_range() in a subsequent
> patch.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

I was a little worried about the unmap granularity potentially not
matching the granularity we used when creating the mappings, but seeing
how the p?d_clear_huge helpers are called by unmap_kernel_range, I
think this is fine.
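
(For context, this is the walk I mean -- vunmap_pmd_range() in
mm/vmalloc.c tries pmd_clear_huge() first and only descends to the pte
level when the entry is not a huge mapping, and vunmap_pud_range() does
the same one level up:)

	pmd = pmd_offset(pud, addr);
	do {
		next = pmd_addr_end(addr, end);
		if (pmd_clear_huge(pmd))
			continue;
		if (pmd_none_or_clear_bad(pmd))
			continue;
		vunmap_pte_range(pmd, addr, next);
	} while (pmd++, addr = next, addr != end);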

Thanks,
Mark

> ---
>  Documentation/features/vm/huge-vmap/arch-support.txt |  2 +-
>  arch/arm64/Kconfig                                   |  1 +
>  arch/arm64/include/asm/memory.h                      |  6 +++
>  arch/arm64/mm/mmu.c                                  | 41 ++++++++++++++++++++
>  4 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/features/vm/huge-vmap/arch-support.txt b/Documentation/features/vm/huge-vmap/arch-support.txt
> index af6816bccb43..df1d1f3c9af2 100644
> --- a/Documentation/features/vm/huge-vmap/arch-support.txt
> +++ b/Documentation/features/vm/huge-vmap/arch-support.txt
> @@ -9,7 +9,7 @@
>      |       alpha: | TODO |
>      |         arc: | TODO |
>      |         arm: | TODO |
> -    |       arm64: | TODO |
> +    |       arm64: |  ok  |
>      |       avr32: | TODO |
>      |    blackfin: | TODO |
>      |         c6x: | TODO |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 8cc62289a63e..cd767fa3037a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -49,6 +49,7 @@ config ARM64
>  	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_BITREVERSE
> +	select HAVE_ARCH_HUGE_VMAP
>  	select HAVE_ARCH_JUMP_LABEL
>  	select HAVE_ARCH_KASAN if SPARSEMEM_VMEMMAP && !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
>  	select HAVE_ARCH_KGDB
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 853953cd1f08..c65aad7b13dc 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -100,6 +100,12 @@
>  #define MT_S2_NORMAL		0xf
>  #define MT_S2_DEVICE_nGnRE	0x1
>  
> +#ifdef CONFIG_ARM64_4K_PAGES
> +#define IOREMAP_MAX_ORDER	(PUD_SHIFT)
> +#else
> +#define IOREMAP_MAX_ORDER	(PMD_SHIFT)
> +#endif
> +
>  #ifndef __ASSEMBLY__
>  
>  extern phys_addr_t		memstart_addr;
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7711554a94f4..73383019f212 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -714,3 +714,44 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
>  
>  	return dt_virt;
>  }
> +
> +int __init arch_ioremap_pud_supported(void)
> +{
> +	/* only 4k granule supports level 1 block mappings */
> +	return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
> +}
> +
> +int __init arch_ioremap_pmd_supported(void)
> +{
> +	return 1;
> +}
> +
> +int pud_set_huge(pud_t *pud, phys_addr_t phys, pgprot_t prot)
> +{
> +	BUG_ON(phys & ~PUD_MASK);
> +	set_pud(pud, __pud(phys | PUD_TYPE_SECT | pgprot_val(mk_sect_prot(prot))));
> +	return 1;
> +}
> +
> +int pmd_set_huge(pmd_t *pmd, phys_addr_t phys, pgprot_t prot)
> +{
> +	BUG_ON(phys & ~PMD_MASK);
> +	set_pmd(pmd, __pmd(phys | PMD_TYPE_SECT | pgprot_val(mk_sect_prot(prot))));
> +	return 1;
> +}
> +
> +int pud_clear_huge(pud_t *pud)
> +{
> +	if (!pud_sect(*pud))
> +		return 0;
> +	pud_clear(pud);
> +	return 1;
> +}
> +
> +int pmd_clear_huge(pmd_t *pmd)
> +{
> +	if (!pmd_sect(*pmd))
> +		return 0;
> +	pmd_clear(pmd);
> +	return 1;
> +}
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-01 10:54 ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
  2016-02-01 12:24   ` Catalin Marinas
@ 2016-02-01 14:32   ` Mark Rutland
  2016-02-12 14:58   ` Catalin Marinas
  2016-02-12 17:47   ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area James Morse
  3 siblings, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-01 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
> This moves the module area to right before the vmalloc area, and
> moves the kernel image to the base of the vmalloc area. This is
> an intermediate step towards implementing KASLR, which allows the
> kernel image to be located anywhere in the vmalloc area.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

With the fix for the issue Catalin spotted:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/include/asm/kasan.h   |  2 +-
>  arch/arm64/include/asm/memory.h  | 21 +++--
>  arch/arm64/include/asm/pgtable.h | 10 +-
>  arch/arm64/mm/dump.c             | 12 +--
>  arch/arm64/mm/init.c             | 23 ++---
>  arch/arm64/mm/kasan_init.c       | 31 ++++++-
>  arch/arm64/mm/mmu.c              | 97 +++++++++++++-------
>  7 files changed, 129 insertions(+), 67 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
> index de0d21211c34..71ad0f93eb71 100644
> --- a/arch/arm64/include/asm/kasan.h
> +++ b/arch/arm64/include/asm/kasan.h
> @@ -14,7 +14,7 @@
>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>   */
>  #define KASAN_SHADOW_START      (VA_START)
> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
> +#define KASAN_SHADOW_END        (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
>  
>  /*
>   * This value is used to map an address to the corresponding shadow
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index aebc739f5a11..4388651d1f0d 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -45,16 +45,15 @@
>   * VA_START - the first kernel virtual address.
>   * TASK_SIZE - the maximum size of a user space task.
>   * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area.
> - * The module space lives between the addresses given by TASK_SIZE
> - * and PAGE_OFFSET - it must be within 128MB of the kernel text.
>   */
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
>  #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
>  #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
> -#define KIMAGE_VADDR		(PAGE_OFFSET)
> -#define MODULES_END		(KIMAGE_VADDR)
> -#define MODULES_VADDR		(MODULES_END - SZ_64M)
> -#define PCI_IO_END		(MODULES_VADDR - SZ_2M)
> +#define KIMAGE_VADDR		(MODULES_END)
> +#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
> +#define MODULES_VADDR		(VA_START + KASAN_SHADOW_SIZE)
> +#define MODULES_VSIZE		(SZ_64M)
> +#define PCI_IO_END		(PAGE_OFFSET - SZ_2M)
>  #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
>  #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
>  #define TASK_SIZE_64		(UL(1) << VA_BITS)
> @@ -72,6 +71,16 @@
>  #define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 4))
>  
>  /*
> + * The size of the KASAN shadow region. This should be 1/8th of the
> + * size of the entire kernel virtual address space.
> + */
> +#ifdef CONFIG_KASAN
> +#define KASAN_SHADOW_SIZE	(UL(1) << (VA_BITS - 3))
> +#else
> +#define KASAN_SHADOW_SIZE	(0)
> +#endif
> +
> +/*
>   * Physical vs virtual RAM address space conversion.  These are
>   * private definitions which should NOT be used outside memory.h
>   * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 87355408d448..a440f5a85d08 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -36,19 +36,13 @@
>   *
>   * VMEMAP_SIZE: allows the whole VA space to be covered by a struct page array
>   *	(rounded up to PUD_SIZE).
> - * VMALLOC_START: beginning of the kernel VA space
> + * VMALLOC_START: beginning of the kernel vmalloc space
>   * VMALLOC_END: extends to the available space below vmmemmap, PCI I/O space,
>   *	fixed mappings and modules
>   */
>  #define VMEMMAP_SIZE		ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
>  
> -#ifndef CONFIG_KASAN
> -#define VMALLOC_START		(VA_START)
> -#else
> -#include <asm/kasan.h>
> -#define VMALLOC_START		(KASAN_SHADOW_END + SZ_64K)
> -#endif
> -
> +#define VMALLOC_START		(MODULES_END)
>  #define VMALLOC_END		(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
>  
>  #define vmemmap			((struct page *)(VMALLOC_END + SZ_64K))
> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
> index 0adbebbc2803..e83ffb00560c 100644
> --- a/arch/arm64/mm/dump.c
> +++ b/arch/arm64/mm/dump.c
> @@ -35,7 +35,9 @@ struct addr_marker {
>  };
>  
>  enum address_markers_idx {
> -	VMALLOC_START_NR = 0,
> +	MODULES_START_NR = 0,
> +	MODULES_END_NR,
> +	VMALLOC_START_NR,
>  	VMALLOC_END_NR,
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  	VMEMMAP_START_NR,
> @@ -45,12 +47,12 @@ enum address_markers_idx {
>  	FIXADDR_END_NR,
>  	PCI_START_NR,
>  	PCI_END_NR,
> -	MODULES_START_NR,
> -	MODULES_END_NR,
>  	KERNEL_SPACE_NR,
>  };
>  
>  static struct addr_marker address_markers[] = {
> +	{ MODULES_VADDR,	"Modules start" },
> +	{ MODULES_END,		"Modules end" },
>  	{ VMALLOC_START,	"vmalloc() Area" },
>  	{ VMALLOC_END,		"vmalloc() End" },
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> @@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
>  	{ FIXADDR_TOP,		"Fixmap end" },
>  	{ PCI_IO_START,		"PCI I/O start" },
>  	{ PCI_IO_END,		"PCI I/O end" },
> -	{ MODULES_VADDR,	"Modules start" },
> -	{ MODULES_END,		"Modules end" },
> -	{ PAGE_OFFSET,		"Kernel Mapping" },
> +	{ PAGE_OFFSET,		"Linear Mapping" },
>  	{ -1,			NULL },
>  };
>  
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index f3b061e67bfe..1d627cd8121c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -36,6 +36,7 @@
>  #include <linux/swiotlb.h>
>  
>  #include <asm/fixmap.h>
> +#include <asm/kasan.h>
>  #include <asm/memory.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
> @@ -302,22 +303,26 @@ void __init mem_init(void)
>  #ifdef CONFIG_KASAN
>  		  "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>  #endif
> +		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>  		  "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
> +		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> +		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> +		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  		  "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
>  		  "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
>  #endif
>  		  "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
>  		  "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
> -		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
> -		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
> -		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> -		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> -		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
> +		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
>  #ifdef CONFIG_KASAN
>  		  MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
>  #endif
> +		  MLM(MODULES_VADDR, MODULES_END),
>  		  MLG(VMALLOC_START, VMALLOC_END),
> +		  MLK_ROUNDUP(__init_begin, __init_end),
> +		  MLK_ROUNDUP(_text, _etext),
> +		  MLK_ROUNDUP(_sdata, _edata),
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  		  MLG((unsigned long)vmemmap,
>  		      (unsigned long)vmemmap + VMEMMAP_SIZE),
> @@ -326,11 +331,7 @@ void __init mem_init(void)
>  #endif
>  		  MLK(FIXADDR_START, FIXADDR_TOP),
>  		  MLM(PCI_IO_START, PCI_IO_END),
> -		  MLM(MODULES_VADDR, MODULES_END),
> -		  MLM(PAGE_OFFSET, (unsigned long)high_memory),
> -		  MLK_ROUNDUP(__init_begin, __init_end),
> -		  MLK_ROUNDUP(_text, _etext),
> -		  MLK_ROUNDUP(_sdata, _edata));
> +		  MLM(PAGE_OFFSET, (unsigned long)high_memory));
>  
>  #undef MLK
>  #undef MLM
> @@ -358,8 +359,8 @@ void __init mem_init(void)
>  
>  void free_initmem(void)
>  {
> -	fixup_init();
>  	free_initmem_default(0);
> +	fixup_init();
>  }
>  
>  #ifdef CONFIG_BLK_DEV_INITRD
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index cc569a38bc76..66c246871d2e 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -17,9 +17,11 @@
>  #include <linux/start_kernel.h>
>  
>  #include <asm/mmu_context.h>
> +#include <asm/kernel-pgtable.h>
>  #include <asm/page.h>
>  #include <asm/pgalloc.h>
>  #include <asm/pgtable.h>
> +#include <asm/sections.h>
>  #include <asm/tlbflush.h>
>  
>  static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
> @@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
>  	if (pmd_none(*pmd))
>  		pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
>  
> -	pte = pte_offset_kernel(pmd, addr);
> +	pte = pte_offset_kimg(pmd, addr);
>  	do {
>  		next = addr + PAGE_SIZE;
>  		set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
> @@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
>  	if (pud_none(*pud))
>  		pud_populate(&init_mm, pud, kasan_zero_pmd);
>  
> -	pmd = pmd_offset(pud, addr);
> +	pmd = pmd_offset_kimg(pud, addr);
>  	do {
>  		next = pmd_addr_end(addr, end);
>  		kasan_early_pte_populate(pmd, addr, next);
> @@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
>  	if (pgd_none(*pgd))
>  		pgd_populate(&init_mm, pgd, kasan_zero_pud);
>  
> -	pud = pud_offset(pgd, addr);
> +	pud = pud_offset_kimg(pgd, addr);
>  	do {
>  		next = pud_addr_end(addr, end);
>  		kasan_early_pmd_populate(pud, addr, next);
> @@ -126,9 +128,13 @@ static void __init clear_pgds(unsigned long start,
>  
>  void __init kasan_init(void)
>  {
> +	u64 kimg_shadow_start, kimg_shadow_end;
>  	struct memblock_region *reg;
>  	int i;
>  
> +	kimg_shadow_start = (u64)kasan_mem_to_shadow(_text);
> +	kimg_shadow_end = (u64)kasan_mem_to_shadow(_end);
> +
>  	/*
>  	 * We are going to perform proper setup of shadow memory.
>  	 * At first we should unmap early shadow (clear_pgds() call bellow).
> @@ -142,8 +148,25 @@ void __init kasan_init(void)
>  
>  	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>  
> +	vmemmap_populate(kimg_shadow_start, kimg_shadow_end, NUMA_NO_NODE);
> +
> +	/*
> +	 * vmemmap_populate() has populated the shadow region that covers the
> +	 * kernel image with SWAPPER_BLOCK_SIZE mappings, so we have to round
> +	 * the start and end addresses to SWAPPER_BLOCK_SIZE as well, to prevent
> +	 * kasan_populate_zero_shadow() from replacing the PMD block mappings
> +	 * with PMD table mappings at the edges of the shadow region for the
> +	 * kernel image.
> +	 */
> +	if (ARM64_SWAPPER_USES_SECTION_MAPS) {
> +		kimg_shadow_start = round_down(kimg_shadow_start,
> +					       SWAPPER_BLOCK_SIZE);
> +		kimg_shadow_end = round_up(kimg_shadow_end, SWAPPER_BLOCK_SIZE);
> +	}
>  	kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
> -			kasan_mem_to_shadow((void *)MODULES_VADDR));
> +				   (void *)kimg_shadow_start);
> +	kasan_populate_zero_shadow((void *)kimg_shadow_end,
> +				   kasan_mem_to_shadow((void *)PAGE_OFFSET));
>  
>  	for_each_memblock(memory, reg) {
>  		void *start = (void *)__phys_to_virt(reg->base);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index b84915723ea0..4c4b15932963 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>  EXPORT_SYMBOL(empty_zero_page);
>  
> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> +
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  			      unsigned long size, pgprot_t vma_prot)
>  {
> @@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  {
>  
>  	unsigned long kernel_start = __pa(_stext);
> -	unsigned long kernel_end = __pa(_end);
> +	unsigned long kernel_end = __pa(_etext);
>  
>  	/*
> -	 * The kernel itself is mapped at page granularity. Map all other
> -	 * memory, making sure we don't overwrite the existing kernel mappings.
> +	 * Take care not to create a writable alias for the
> +	 * read-only text and rodata sections of the kernel image.
>  	 */
>  
> -	/* No overlap with the kernel. */
> +	/* No overlap with the kernel text */
>  	if (end < kernel_start || start >= kernel_end) {
>  		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
>  				     end - start, PAGE_KERNEL,
> @@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  	}
>  
>  	/*
> -	 * This block overlaps the kernel mapping. Map the portion(s) which
> +	 * This block overlaps the kernel text mapping. Map the portion(s) which
>  	 * don't overlap.
>  	 */
>  	if (start < kernel_start)
> @@ -398,25 +402,28 @@ static void __init map_mem(pgd_t *pgd)
>  	}
>  }
>  
> -#ifdef CONFIG_DEBUG_RODATA
>  void mark_rodata_ro(void)
>  {
> +	if (!IS_ENABLED(CONFIG_DEBUG_RODATA))
> +		return;
> +
>  	create_mapping_late(__pa(_stext), (unsigned long)_stext,
>  				(unsigned long)_etext - (unsigned long)_stext,
>  				PAGE_KERNEL_ROX);
> -
>  }
> -#endif
>  
>  void fixup_init(void)
>  {
> -	create_mapping_late(__pa(__init_begin), (unsigned long)__init_begin,
> -			(unsigned long)__init_end - (unsigned long)__init_begin,
> -			PAGE_KERNEL);
> +	/*
> +	 * Unmap the __init region but leave the VM area in place. This
> +	 * prevents the region from being reused for kernel modules, which
> +	 * is not supported by kallsyms.
> +	 */
> +	unmap_kernel_range((u64)__init_begin, (u64)(__init_end - __init_begin));
>  }
>  
>  static void __init map_kernel_chunk(pgd_t *pgd, void *va_start, void *va_end,
> -				    pgprot_t prot)
> +				    pgprot_t prot, struct vm_struct *vma)
>  {
>  	phys_addr_t pa_start = __pa(va_start);
>  	unsigned long size = va_end - va_start;
> @@ -426,6 +433,14 @@ static void __init map_kernel_chunk(pgd_t *pgd, void *va_start, void *va_end,
>  
>  	__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
>  			     early_pgtable_alloc);
> +
> +	vma->addr	= va_start;
> +	vma->phys_addr	= pa_start;
> +	vma->size	= size;
> +	vma->flags	= VM_MAP;
> +	vma->caller	= map_kernel_chunk;
> +
> +	vm_area_add_early(vma);
>  }
>  
>  /*
> @@ -433,17 +448,35 @@ static void __init map_kernel_chunk(pgd_t *pgd, void *va_start, void *va_end,
>   */
>  static void __init map_kernel(pgd_t *pgd)
>  {
> +	static struct vm_struct vmlinux_text, vmlinux_init, vmlinux_data;
>  
> -	map_kernel_chunk(pgd, _stext, _etext, PAGE_KERNEL_EXEC);
> -	map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
> -	map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
> +	map_kernel_chunk(pgd, _stext, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
> +	map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
> +			 &vmlinux_init);
> +	map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
>  
> -	/*
> -	 * The fixmap falls in a separate pgd to the kernel, and doesn't live
> -	 * in the carveout for the swapper_pg_dir. We can simply re-use the
> -	 * existing dir for the fixmap.
> -	 */
> -	set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
> +	if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
> +		/*
> +		 * The fixmap falls in a separate pgd to the kernel, and doesn't
> +		 * live in the carveout for the swapper_pg_dir. We can simply
> +		 * re-use the existing dir for the fixmap.
> +		 */
> +		set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
> +			*pgd_offset_k(FIXADDR_START));
> +	} else if (CONFIG_PGTABLE_LEVELS > 3) {
> +		/*
> +		 * The fixmap shares its top level pgd entry with the kernel
> +		 * mapping. This can really only occur when we are running
> +		 * with 16k/4 levels, so we can simply reuse the pud level
> +		 * entry instead.
> +		 */
> +		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> +		set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
> +			__pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
> +		pud_clear_fixmap();
> +	} else {
> +		BUG();
> +	}
>  
>  	kasan_copy_shadow(pgd);
>  }
> @@ -569,14 +602,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
>  }
>  #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
>  
> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> -#if CONFIG_PGTABLE_LEVELS > 2
> -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> -#endif
> -#if CONFIG_PGTABLE_LEVELS > 3
> -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> -#endif
> -
>  static inline pud_t * fixmap_pud(unsigned long addr)
>  {
>  	pgd_t *pgd = pgd_offset_k(addr);
> @@ -608,8 +633,18 @@ void __init early_fixmap_init(void)
>  	unsigned long addr = FIXADDR_START;
>  
>  	pgd = pgd_offset_k(addr);
> -	pgd_populate(&init_mm, pgd, bm_pud);
> -	pud = fixmap_pud(addr);
> +	if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
> +		/*
> +		 * We only end up here if the kernel mapping and the fixmap
> +		 * share the top level pgd entry, which should only happen on
> +		 * 16k/4 levels configurations.
> +		 */
> +		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> +		pud = pud_offset_kimg(pgd, addr);
> +	} else {
> +		pgd_populate(&init_mm, pgd, bm_pud);
> +		pud = fixmap_pud(addr);
> +	}
>  	pud_populate(&init_mm, pud, bm_pmd);
>  	pmd = fixmap_pmd(addr);
>  	pmd_populate_kernel(&init_mm, pmd, bm_pte);
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 10:54 ` [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
@ 2016-02-01 14:50   ` Mark Rutland
  2016-02-01 16:28     ` Fu Wei
  2016-02-01 15:06   ` Catalin Marinas
  1 sibling, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2016-02-01 14:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
> This relaxes the kernel Image placement requirements, so that it
> may be placed at any 2 MB aligned offset in physical memory.
> 
> This is accomplished by ignoring PHYS_OFFSET when installing
> memblocks, and accounting for the apparent virtual offset of
> the kernel Image. As a result, virtual address references
> below PAGE_OFFSET are correctly mapped onto physical references
> into the kernel Image regardless of where it sits in memory.
> 
> Note that limiting memory using mem= is not unambiguous anymore after
> this change, considering that the kernel may be at the top of physical
> memory, and clipping from the bottom rather than the top will discard
> any 32-bit DMA addressable memory first. To deal with this, the handling
> of mem= is reimplemented to clip top down, but take special care not to
> clip memory that covers the kernel image.
> 
> Since mem= should not be considered a production feature, a panic notifier
> handler is installed that dumps the memory limit at panic time if one was
> set.

Good idea!

It would be great if we could follow up with a sizes.h update for SZ_4G,
though that's only a nice-to-have, and in no way should block this.
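
For reference, a minimal sketch of what that follow-up might look like
(hypothetical and untested; it may want an _AC() wrapper or similar if
sizes.h ever needs to be usable from assembly):

	#define SZ_4G			0x100000000ULL

That would also let the local sz_4g constant in arm64_memblock_init()
go away.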

Other than that, this looks good. Thanks for putting this together!

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

For the Documentation/arm64 parts we'll need to ask Fu Wei to update the
zh_CN/ translation to match.

Mark.

> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  Documentation/arm64/booting.txt         |  20 ++--
>  arch/arm64/include/asm/boot.h           |   6 ++
>  arch/arm64/include/asm/kernel-pgtable.h |  12 +++
>  arch/arm64/include/asm/kvm_asm.h        |   2 +-
>  arch/arm64/include/asm/memory.h         |  15 +--
>  arch/arm64/kernel/head.S                |   6 +-
>  arch/arm64/kernel/image.h               |  13 ++-
>  arch/arm64/mm/init.c                    | 100 +++++++++++++++++++-
>  arch/arm64/mm/mmu.c                     |   3 +
>  9 files changed, 155 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
> index 701d39d3171a..56d6d8b796db 100644
> --- a/Documentation/arm64/booting.txt
> +++ b/Documentation/arm64/booting.txt
> @@ -109,7 +109,13 @@ Header notes:
>  			1 - 4K
>  			2 - 16K
>  			3 - 64K
> -  Bits 3-63:	Reserved.
> +  Bit 3:	Kernel physical placement
> +			0 - 2MB aligned base should be as close as possible
> +			    to the base of DRAM, since memory below it is not
> +			    accessible via the linear mapping
> +			1 - 2MB aligned base may be anywhere in physical
> +			    memory
> +  Bits 4-63:	Reserved.
>  
>  - When image_size is zero, a bootloader should attempt to keep as much
>    memory as possible free for use by the kernel immediately after the
> @@ -117,14 +123,14 @@ Header notes:
>    depending on selected features, and is effectively unbound.
>  
>  The Image must be placed text_offset bytes from a 2MB aligned base
> -address near the start of usable system RAM and called there. Memory
> -below that base address is currently unusable by Linux, and therefore it
> -is strongly recommended that this location is the start of system RAM.
> -The region between the 2 MB aligned base address and the start of the
> -image has no special significance to the kernel, and may be used for
> -other purposes.
> +address anywhere in usable system RAM and called there. The region
> +between the 2 MB aligned base address and the start of the image has no
> +special significance to the kernel, and may be used for other purposes.
>  At least image_size bytes from the start of the image must be free for
>  use by the kernel.
> +NOTE: versions prior to v4.6 cannot make use of memory below the
> +physical offset of the Image so it is recommended that the Image be
> +placed as close as possible to the start of system RAM.
>  
>  Any memory described to the kernel (even that below the start of the
>  image) which is not marked as reserved from the kernel (e.g., with a
> diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
> index 81151b67b26b..ebf2481889c3 100644
> --- a/arch/arm64/include/asm/boot.h
> +++ b/arch/arm64/include/asm/boot.h
> @@ -11,4 +11,10 @@
>  #define MIN_FDT_ALIGN		8
>  #define MAX_FDT_SIZE		SZ_2M
>  
> +/*
> + * arm64 requires the kernel image to be placed
> + * TEXT_OFFSET bytes beyond a 2 MB aligned base
> + */
> +#define MIN_KIMG_ALIGN		SZ_2M
> +
>  #endif
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index a459714ee29e..5c6375d8528b 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -79,5 +79,17 @@
>  #define SWAPPER_MM_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
>  #endif
>  
> +/*
> + * To make optimal use of block mappings when laying out the linear
> + * mapping, round down the base of physical memory to a size that can
> + * be mapped efficiently, i.e., either PUD_SIZE (4k granule) or PMD_SIZE
> + * (64k granule), or a multiple that can be mapped using contiguous bits
> + * in the page tables: 32 * PMD_SIZE (16k granule)
> + */
> +#ifdef CONFIG_ARM64_64K_PAGES
> +#define ARM64_MEMSTART_ALIGN	SZ_512M
> +#else
> +#define ARM64_MEMSTART_ALIGN	SZ_1G
> +#endif
>  
>  #endif	/* __ASM_KERNEL_PGTABLE_H */
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index f5aee6e764e6..054ac25e7c2e 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -26,7 +26,7 @@
>  #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
>  #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
>  
> -#define kvm_ksym_ref(sym)		((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
> +#define kvm_ksym_ref(sym)		phys_to_virt((u64)&sym - kimage_voffset)
>  
>  #ifndef __ASSEMBLY__
>  struct kvm;
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 4388651d1f0d..61005e7dd6cb 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -88,10 +88,10 @@
>  #define __virt_to_phys(x) ({						\
>  	phys_addr_t __x = (phys_addr_t)(x);				\
>  	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
> -			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
> +			     (__x - kimage_voffset); })
>  
>  #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
> -#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
> +#define __phys_to_kimg(x)	((unsigned long)((x) + kimage_voffset))
>  
>  /*
>   * Convert a page to/from a physical address
> @@ -127,13 +127,14 @@ extern phys_addr_t		memstart_addr;
>  /* PHYS_OFFSET - the physical address of the start of memory. */
>  #define PHYS_OFFSET		({ memstart_addr; })
>  
> +/* the offset between the kernel virtual and physical mappings */
> +extern u64			kimage_voffset;
> +
>  /*
> - * The maximum physical address that the linear direct mapping
> - * of system RAM can cover. (PAGE_OFFSET can be interpreted as
> - * a 2's complement signed quantity and negated to derive the
> - * maximum size of the linear mapping.)
> + * Allow all memory at the discovery stage. We will clip it later.
>   */
> -#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
> +#define MIN_MEMBLOCK_ADDR	0
> +#define MAX_MEMBLOCK_ADDR	U64_MAX
>  
>  /*
>   * PFNs are used to describe any physical page; this means
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 04d38a058b19..05b98289093e 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -428,7 +428,11 @@ __mmap_switched:
>  	and	x4, x4, #~(THREAD_SIZE - 1)
>  	msr	sp_el0, x4			// Save thread_info
>  	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
> -	str_l	x24, memstart_addr, x6		// Save PHYS_OFFSET
> +
> +	ldr	x4, =KIMAGE_VADDR		// Save the offset between
> +	sub	x4, x4, x24			// the kernel virtual and
> +	str_l	x4, kimage_voffset, x5		// physical mappings
> +
>  	mov	x29, #0
>  #ifdef CONFIG_KASAN
>  	bl	kasan_early_init
> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
> index 999633bd7294..c9c62cab25a4 100644
> --- a/arch/arm64/kernel/image.h
> +++ b/arch/arm64/kernel/image.h
> @@ -42,15 +42,18 @@
>  #endif
>  
>  #ifdef CONFIG_CPU_BIG_ENDIAN
> -#define __HEAD_FLAG_BE	1
> +#define __HEAD_FLAG_BE		1
>  #else
> -#define __HEAD_FLAG_BE	0
> +#define __HEAD_FLAG_BE		0
>  #endif
>  
> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
> +#define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
>  
> -#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
> -			 (__HEAD_FLAG_PAGE_SIZE << 1))
> +#define __HEAD_FLAG_PHYS_BASE	1
> +
> +#define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
> +				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
> +				 (__HEAD_FLAG_PHYS_BASE << 3))
>  
>  /*
>   * These will output as part of the Image header, which should be little-endian
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 1d627cd8121c..e8e853a1024c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -35,8 +35,10 @@
>  #include <linux/efi.h>
>  #include <linux/swiotlb.h>
>  
> +#include <asm/boot.h>
>  #include <asm/fixmap.h>
>  #include <asm/kasan.h>
> +#include <asm/kernel-pgtable.h>
>  #include <asm/memory.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
> @@ -158,9 +160,80 @@ static int __init early_mem(char *p)
>  }
>  early_param("mem", early_mem);
>  
> +/*
> + * clip_mem_range() - remove memblock memory between @min and @max until
> + *                    we meet the limit in 'memory_limit'.
> + */
> +static void __init clip_mem_range(u64 min, u64 max)
> +{
> +	u64 mem_size, to_remove;
> +	int i;
> +
> +again:
> +	mem_size = memblock_phys_mem_size();
> +	if (mem_size <= memory_limit || max <= min)
> +		return;
> +
> +	to_remove = mem_size - memory_limit;
> +
> +	for (i = memblock.memory.cnt - 1; i >= 0; i--) {
> +		struct memblock_region *r = memblock.memory.regions + i;
> +		u64 start = max(min, r->base);
> +		u64 end = min(max, r->base + r->size);
> +
> +		if (start >= max || end <= min)
> +			continue;
> +
> +		if (end > min) {
> +			u64 size = min(to_remove, end - max(start, min));
> +
> +			memblock_remove(end - size, size);
> +		} else {
> +			memblock_remove(start, min(max - start, to_remove));
> +		}
> +		goto again;
> +	}
> +}
> +
>  void __init arm64_memblock_init(void)
>  {
> -	memblock_enforce_memory_limit(memory_limit);
> +	const s64 linear_region_size = -(s64)PAGE_OFFSET;
> +
> +	/*
> +	 * Select a suitable value for the base of physical memory.
> +	 */
> +	memstart_addr = round_down(memblock_start_of_DRAM(),
> +				   ARM64_MEMSTART_ALIGN);
> +
> +	/*
> +	 * Remove the memory that we will not be able to cover with the
> +	 * linear mapping. Take care not to clip the kernel which may be
> +	 * high in memory.
> +	 */
> +	memblock_remove(max(memstart_addr + linear_region_size, __pa(_end)),
> +			ULLONG_MAX);
> +	if (memblock_end_of_DRAM() > linear_region_size)
> +		memblock_remove(0, memblock_end_of_DRAM() - linear_region_size);
> +
> +	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
> +		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
> +		u64 kend = PAGE_ALIGN(__pa(_end));
> +		u64 const sz_4g = 0x100000000UL;
> +
> +		/*
> +		 * Clip memory in order of preference:
> +		 * - above the kernel and above 4 GB
> +		 * - between 4 GB and the start of the kernel (if the kernel
> +		 *   is loaded high in memory)
> +		 * - between the kernel and 4 GB (if the kernel is loaded
> +		 *   low in memory)
> +		 * - below 4 GB
> +		 */
> +		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
> +		clip_mem_range(sz_4g, kbase);
> +		clip_mem_range(kend, sz_4g);
> +		clip_mem_range(0, min(kbase, sz_4g));
> +	}
>  
>  	/*
>  	 * Register the kernel text, kernel data, initrd, and initial
> @@ -381,3 +454,28 @@ static int __init keepinitrd_setup(char *__unused)
>  
>  __setup("keepinitrd", keepinitrd_setup);
>  #endif
> +
> +/*
> + * Dump out memory limit information on panic.
> + */
> +static int dump_mem_limit(struct notifier_block *self, unsigned long v, void *p)
> +{
> +	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
> +		pr_emerg("Memory Limit: %llu MB\n", memory_limit >> 20);
> +	} else {
> +		pr_emerg("Memory Limit: none\n");
> +	}
> +	return 0;
> +}
> +
> +static struct notifier_block mem_limit_notifier = {
> +	.notifier_call = dump_mem_limit,
> +};
> +
> +static int __init register_mem_limit_dumper(void)
> +{
> +	atomic_notifier_chain_register(&panic_notifier_list,
> +				       &mem_limit_notifier);
> +	return 0;
> +}
> +__initcall(register_mem_limit_dumper);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 4c4b15932963..8dda38378959 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -46,6 +46,9 @@
>  
>  u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>  
> +u64 kimage_voffset __read_mostly;
> +EXPORT_SYMBOL(kimage_voffset);
> +
>  /*
>   * Empty_zero_page is a special page that is used for zero-initialized data
>   * and COW.
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 2/8] arm64: add support for ioremap() block mappings
  2016-02-01 14:10   ` Mark Rutland
@ 2016-02-01 14:56     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2016-02-01 14:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 02:10:04PM +0000, Mark Rutland wrote:
> On Mon, Feb 01, 2016 at 11:54:47AM +0100, Ard Biesheuvel wrote:
> > This wires up the existing generic huge-vmap feature, which allows
> > ioremap() to use PMD or PUD sized block mappings. It also adds support
> > to the unmap path for dealing with block mappings, which will allow us
> > to unmap the __init region using unmap_kernel_range() in a subsequent
> > patch.
> > 
> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> 
> I was a little bit worried about this potentially not matching the
> granularity we used when creating mappings, but seeing how
> p?d_clear_huge are called by unmap_kernel_range, I think this is fine.

I tried the warnings below and they didn't trigger. Anyway, if we ever
unmapped more than intended, I guess we would have quickly triggered a
kernel fault.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index fb42a5bffe47..40362d62d1e1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -77,8 +77,10 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (pmd_clear_huge(pmd))
+		if (pmd_clear_huge(pmd)) {
+			WARN_ON(next < addr + PMD_SIZE);
 			continue;
+		}
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next);
@@ -93,8 +95,10 @@ static void vunmap_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end)
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (pud_clear_huge(pud))
+		if (pud_clear_huge(pud)) {
+			WARN_ON(next < addr + PUD_SIZE);
 			continue;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		vunmap_pmd_range(pud, addr, next);

-- 
Catalin

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 10:54 ` [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
  2016-02-01 14:50   ` Mark Rutland
@ 2016-02-01 15:06   ` Catalin Marinas
  2016-02-01 15:13     ` Ard Biesheuvel
  1 sibling, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-01 15:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
> Note that limiting memory using mem= is not unambiguous anymore after
> this change, considering that the kernel may be at the top of physical
> memory, and clipping from the bottom rather than the top will discard
> any 32-bit DMA addressable memory first. To deal with this, the handling
> of mem= is reimplemented to clip top down, but take special care not to
> clip memory that covers the kernel image.

I may have forgotten the reason - why do we need to avoid clipping the
memory that covers the kernel image? It's already mapped in the vmalloc
area, so we wouldn't need it in the linear map as well.

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 15:06   ` Catalin Marinas
@ 2016-02-01 15:13     ` Ard Biesheuvel
  2016-02-01 16:31       ` Ard Biesheuvel
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 15:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2016 at 16:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
>> Note that limiting memory using mem= is not unambiguous anymore after
>> this change, considering that the kernel may be at the top of physical
>> memory, and clipping from the bottom rather than the top will discard
>> any 32-bit DMA addressable memory first. To deal with this, the handling
>> of mem= is reimplemented to clip top down, but take special care not to
>> clip memory that covers the kernel image.
>
> I may have forgotten the reason - why do we need to avoid clipping the
> memory that covers the kernel image? It's already mapped in the vmalloc
> area, so we wouldn't need it in the linear map as well.
>

Good question. Originally, I needed it for swapper_pg_dir, whose
pud/pmd/pte levels were accessed via __va() translations of the values
found in the higher-level table entries, but after Mark's patches, only
the top-level pgd of swapper_pg_dir is still used. Similarly, for
idmap_pg_dir, we don't change any mappings at runtime, so the same
should apply there, I think.

I will try dropping this, and see what happens.
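
(To spell out the dependency: the old code relied on table walks along
the lines of the sketch below, where the virtual address of the
next-level table is derived via __va() from the physical address stored
in the entry above it, which is only valid if that memory is covered by
the linear mapping. Rough illustration only, not the actual accessors:)

static pmd_t *walk_to_pmd(pud_t *pud, unsigned long addr)
{
	/* physical address of the next-level table, from the pud entry */
	phys_addr_t pmd_phys = pud_val(*pud) & PHYS_MASK & PAGE_MASK;

	/* __va() only makes sense if pmd_phys lies in the linear map */
	return (pmd_t *)__va(pmd_phys) + pmd_index(addr);
}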

-- 
Ard.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 14:50   ` Mark Rutland
@ 2016-02-01 16:28     ` Fu Wei
  2016-02-16  8:55       ` Fu Wei
  0 siblings, 1 reply; 78+ messages in thread
From: Fu Wei @ 2016-02-01 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark

On 02/01/2016 10:50 PM, Mark Rutland wrote:
> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
>> This relaxes the kernel Image placement requirements, so that it
>> may be placed at any 2 MB aligned offset in physical memory.
>>
>> This is accomplished by ignoring PHYS_OFFSET when installing
>> memblocks, and accounting for the apparent virtual offset of
>> the kernel Image. As a result, virtual address references
>> below PAGE_OFFSET are correctly mapped onto physical references
>> into the kernel Image regardless of where it sits in memory.
>>
>> Note that limiting memory using mem= is not unambiguous anymore after
>> this change, considering that the kernel may be at the top of physical
>> memory, and clipping from the bottom rather than the top will discard
>> any 32-bit DMA addressable memory first. To deal with this, the handling
>> of mem= is reimplemented to clip top down, but take special care not to
>> clip memory that covers the kernel image.
>>
>> Since mem= should not be considered a production feature, a panic notifier
>> handler is installed that dumps the memory limit at panic time if one was
>> set.
>
> Good idea!
>
> It would be great if we could follow up with a sizes.h update for SZ_4G,
> though that's only a nice-to-have, and in no way should block this.
>
> Other than that, this looks good. Thanks for putting this together!
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>
> For the Documentation/arm64 parts we'll need to ask Fu Wei to update the
> zh_CN/ translation to match.

Many thanks for the info.
Yes, I will work on it.

>
> Mark.
>
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>   Documentation/arm64/booting.txt         |  20 ++--
>>   arch/arm64/include/asm/boot.h           |   6 ++
>>   arch/arm64/include/asm/kernel-pgtable.h |  12 +++
>>   arch/arm64/include/asm/kvm_asm.h        |   2 +-
>>   arch/arm64/include/asm/memory.h         |  15 +--
>>   arch/arm64/kernel/head.S                |   6 +-
>>   arch/arm64/kernel/image.h               |  13 ++-
>>   arch/arm64/mm/init.c                    | 100 +++++++++++++++++++-
>>   arch/arm64/mm/mmu.c                     |   3 +
>>   9 files changed, 155 insertions(+), 22 deletions(-)
>>
>> diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
>> index 701d39d3171a..56d6d8b796db 100644
>> --- a/Documentation/arm64/booting.txt
>> +++ b/Documentation/arm64/booting.txt
>> @@ -109,7 +109,13 @@ Header notes:
>>   			1 - 4K
>>   			2 - 16K
>>   			3 - 64K
>> -  Bits 3-63:	Reserved.
>> +  Bit 3:	Kernel physical placement
>> +			0 - 2MB aligned base should be as close as possible
>> +			    to the base of DRAM, since memory below it is not
>> +			    accessible via the linear mapping
>> +			1 - 2MB aligned base may be anywhere in physical
>> +			    memory
>> +  Bits 4-63:	Reserved.
>>
>>   - When image_size is zero, a bootloader should attempt to keep as much
>>     memory as possible free for use by the kernel immediately after the
>> @@ -117,14 +123,14 @@ Header notes:
>>     depending on selected features, and is effectively unbound.
>>
>>   The Image must be placed text_offset bytes from a 2MB aligned base
>> -address near the start of usable system RAM and called there. Memory
>> -below that base address is currently unusable by Linux, and therefore it
>> -is strongly recommended that this location is the start of system RAM.
>> -The region between the 2 MB aligned base address and the start of the
>> -image has no special significance to the kernel, and may be used for
>> -other purposes.
>> +address anywhere in usable system RAM and called there. The region
>> +between the 2 MB aligned base address and the start of the image has no
>> +special significance to the kernel, and may be used for other purposes.
>>   At least image_size bytes from the start of the image must be free for
>>   use by the kernel.
>> +NOTE: versions prior to v4.6 cannot make use of memory below the
>> +physical offset of the Image so it is recommended that the Image be
>> +placed as close as possible to the start of system RAM.
>>
>>   Any memory described to the kernel (even that below the start of the
>>   image) which is not marked as reserved from the kernel (e.g., with a
>> diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
>> index 81151b67b26b..ebf2481889c3 100644
>> --- a/arch/arm64/include/asm/boot.h
>> +++ b/arch/arm64/include/asm/boot.h
>> @@ -11,4 +11,10 @@
>>   #define MIN_FDT_ALIGN		8
>>   #define MAX_FDT_SIZE		SZ_2M
>>
>> +/*
>> + * arm64 requires the kernel image to placed
>> + * TEXT_OFFSET bytes beyond a 2 MB aligned base
>> + */
>> +#define MIN_KIMG_ALIGN		SZ_2M
>> +
>>   #endif
>> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
>> index a459714ee29e..5c6375d8528b 100644
>> --- a/arch/arm64/include/asm/kernel-pgtable.h
>> +++ b/arch/arm64/include/asm/kernel-pgtable.h
>> @@ -79,5 +79,17 @@
>>   #define SWAPPER_MM_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
>>   #endif
>>
>> +/*
>> + * To make optimal use of block mappings when laying out the linear
>> + * mapping, round down the base of physical memory to a size that can
>> + * be mapped efficiently, i.e., either PUD_SIZE (4k granule) or PMD_SIZE
>> + * (64k granule), or a multiple that can be mapped using contiguous bits
>> + * in the page tables: 32 * PMD_SIZE (16k granule)
>> + */
>> +#ifdef CONFIG_ARM64_64K_PAGES
>> +#define ARM64_MEMSTART_ALIGN	SZ_512M
>> +#else
>> +#define ARM64_MEMSTART_ALIGN	SZ_1G
>> +#endif
>>
>>   #endif	/* __ASM_KERNEL_PGTABLE_H */
>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>> index f5aee6e764e6..054ac25e7c2e 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -26,7 +26,7 @@
>>   #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
>>   #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
>>
>> -#define kvm_ksym_ref(sym)		((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
>> +#define kvm_ksym_ref(sym)		phys_to_virt((u64)&sym - kimage_voffset)
>>
>>   #ifndef __ASSEMBLY__
>>   struct kvm;
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index 4388651d1f0d..61005e7dd6cb 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -88,10 +88,10 @@
>>   #define __virt_to_phys(x) ({						\
>>   	phys_addr_t __x = (phys_addr_t)(x);				\
>>   	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
>> -			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
>> +			     (__x - kimage_voffset); })
>>
>>   #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
>> -#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
>> +#define __phys_to_kimg(x)	((unsigned long)((x) + kimage_voffset))
>>
>>   /*
>>    * Convert a page to/from a physical address
>> @@ -127,13 +127,14 @@ extern phys_addr_t		memstart_addr;
>>   /* PHYS_OFFSET - the physical address of the start of memory. */
>>   #define PHYS_OFFSET		({ memstart_addr; })
>>
>> +/* the offset between the kernel virtual and physical mappings */
>> +extern u64			kimage_voffset;
>> +
>>   /*
>> - * The maximum physical address that the linear direct mapping
>> - * of system RAM can cover. (PAGE_OFFSET can be interpreted as
>> - * a 2's complement signed quantity and negated to derive the
>> - * maximum size of the linear mapping.)
>> + * Allow all memory at the discovery stage. We will clip it later.
>>    */
>> -#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
>> +#define MIN_MEMBLOCK_ADDR	0
>> +#define MAX_MEMBLOCK_ADDR	U64_MAX
>>
>>   /*
>>    * PFNs are used to describe any physical page; this means
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 04d38a058b19..05b98289093e 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -428,7 +428,11 @@ __mmap_switched:
>>   	and	x4, x4, #~(THREAD_SIZE - 1)
>>   	msr	sp_el0, x4			// Save thread_info
>>   	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
>> -	str_l	x24, memstart_addr, x6		// Save PHYS_OFFSET
>> +
>> +	ldr	x4, =KIMAGE_VADDR		// Save the offset between
>> +	sub	x4, x4, x24			// the kernel virtual and
>> +	str_l	x4, kimage_voffset, x5		// physical mappings
>> +
>>   	mov	x29, #0
>>   #ifdef CONFIG_KASAN
>>   	bl	kasan_early_init
>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>> index 999633bd7294..c9c62cab25a4 100644
>> --- a/arch/arm64/kernel/image.h
>> +++ b/arch/arm64/kernel/image.h
>> @@ -42,15 +42,18 @@
>>   #endif
>>
>>   #ifdef CONFIG_CPU_BIG_ENDIAN
>> -#define __HEAD_FLAG_BE	1
>> +#define __HEAD_FLAG_BE		1
>>   #else
>> -#define __HEAD_FLAG_BE	0
>> +#define __HEAD_FLAG_BE		0
>>   #endif
>>
>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> +#define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
>>
>> -#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
>> -			 (__HEAD_FLAG_PAGE_SIZE << 1))
>> +#define __HEAD_FLAG_PHYS_BASE	1
>> +
>> +#define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
>> +				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
>> +				 (__HEAD_FLAG_PHYS_BASE << 3))
>>
>>   /*
>>    * These will output as part of the Image header, which should be little-endian
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 1d627cd8121c..e8e853a1024c 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -35,8 +35,10 @@
>>   #include <linux/efi.h>
>>   #include <linux/swiotlb.h>
>>
>> +#include <asm/boot.h>
>>   #include <asm/fixmap.h>
>>   #include <asm/kasan.h>
>> +#include <asm/kernel-pgtable.h>
>>   #include <asm/memory.h>
>>   #include <asm/sections.h>
>>   #include <asm/setup.h>
>> @@ -158,9 +160,80 @@ static int __init early_mem(char *p)
>>   }
>>   early_param("mem", early_mem);
>>
>> +/*
>> + * clip_mem_range() - remove memblock memory between @min and @max until
>> + *                    we meet the limit in 'memory_limit'.
>> + */
>> +static void __init clip_mem_range(u64 min, u64 max)
>> +{
>> +	u64 mem_size, to_remove;
>> +	int i;
>> +
>> +again:
>> +	mem_size = memblock_phys_mem_size();
>> +	if (mem_size <= memory_limit || max <= min)
>> +		return;
>> +
>> +	to_remove = mem_size - memory_limit;
>> +
>> +	for (i = memblock.memory.cnt - 1; i >= 0; i--) {
>> +		struct memblock_region *r = memblock.memory.regions + i;
>> +		u64 start = max(min, r->base);
>> +		u64 end = min(max, r->base + r->size);
>> +
>> +		if (start >= max || end <= min)
>> +			continue;
>> +
>> +		if (end > min) {
>> +			u64 size = min(to_remove, end - max(start, min));
>> +
>> +			memblock_remove(end - size, size);
>> +		} else {
>> +			memblock_remove(start, min(max - start, to_remove));
>> +		}
>> +		goto again;
>> +	}
>> +}
>> +
>>   void __init arm64_memblock_init(void)
>>   {
>> -	memblock_enforce_memory_limit(memory_limit);
>> +	const s64 linear_region_size = -(s64)PAGE_OFFSET;
>> +
>> +	/*
>> +	 * Select a suitable value for the base of physical memory.
>> +	 */
>> +	memstart_addr = round_down(memblock_start_of_DRAM(),
>> +				   ARM64_MEMSTART_ALIGN);
>> +
>> +	/*
>> +	 * Remove the memory that we will not be able to cover with the
>> +	 * linear mapping. Take care not to clip the kernel which may be
>> +	 * high in memory.
>> +	 */
>> +	memblock_remove(max(memstart_addr + linear_region_size, __pa(_end)),
>> +			ULLONG_MAX);
>> +	if (memblock_end_of_DRAM() > linear_region_size)
>> +		memblock_remove(0, memblock_end_of_DRAM() - linear_region_size);
>> +
>> +	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
>> +		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
>> +		u64 kend = PAGE_ALIGN(__pa(_end));
>> +		u64 const sz_4g = 0x100000000UL;
>> +
>> +		/*
>> +		 * Clip memory in order of preference:
>> +		 * - above the kernel and above 4 GB
>> +		 * - between 4 GB and the start of the kernel (if the kernel
>> +		 *   is loaded high in memory)
>> +		 * - between the kernel and 4 GB (if the kernel is loaded
>> +		 *   low in memory)
>> +		 * - below 4 GB
>> +		 */
>> +		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
>> +		clip_mem_range(sz_4g, kbase);
>> +		clip_mem_range(kend, sz_4g);
>> +		clip_mem_range(0, min(kbase, sz_4g));
>> +	}
>>
>>   	/*
>>   	 * Register the kernel text, kernel data, initrd, and initial
>> @@ -381,3 +454,28 @@ static int __init keepinitrd_setup(char *__unused)
>>
>>   __setup("keepinitrd", keepinitrd_setup);
>>   #endif
>> +
>> +/*
>> + * Dump out memory limit information on panic.
>> + */
>> +static int dump_mem_limit(struct notifier_block *self, unsigned long v, void *p)
>> +{
>> +	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
>> +		pr_emerg("Memory Limit: %llu MB\n", memory_limit >> 20);
>> +	} else {
>> +		pr_emerg("Memory Limit: none\n");
>> +	}
>> +	return 0;
>> +}
>> +
>> +static struct notifier_block mem_limit_notifier = {
>> +	.notifier_call = dump_mem_limit,
>> +};
>> +
>> +static int __init register_mem_limit_dumper(void)
>> +{
>> +	atomic_notifier_chain_register(&panic_notifier_list,
>> +				       &mem_limit_notifier);
>> +	return 0;
>> +}
>> +__initcall(register_mem_limit_dumper);
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 4c4b15932963..8dda38378959 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -46,6 +46,9 @@
>>
>>   u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>
>> +u64 kimage_voffset __read_mostly;
>> +EXPORT_SYMBOL(kimage_voffset);
>> +
>>   /*
>>    * Empty_zero_page is a special page that is used for zero-initialized data
>>    * and COW.
>> --
>> 2.5.0
>>
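As background to the kimage_voffset plumbing in the hunks above, here is a
minimal sketch (not part of the posted patch; helper names invented for
illustration, macros are the usual arm64 ones from asm/memory.h) of the two
PA-to-VA translations that coexist once the kernel image leaves the linear
mapping:

#include <linux/types.h>
#include <asm/memory.h>

static inline void *pa_to_linear_va(phys_addr_t pa)
{
	/* linear mapping: what __va()/phys_to_virt() compute */
	return (void *)__phys_to_virt(pa);
}

static inline void *pa_to_kimage_va(phys_addr_t pa)
{
	/* kernel image mapping: uses the offset computed in head.S above */
	return (void *)(unsigned long)(pa + kimage_voffset);
}

For kernel-image addresses these two results differ, which is why
__va(__pa(x)) == x no longer holds there, as discussed further down in the
thread.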


* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 15:13     ` Ard Biesheuvel
@ 2016-02-01 16:31       ` Ard Biesheuvel
  2016-02-01 17:31         ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2016 at 16:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 1 February 2016 at 16:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
>>> Note that limiting memory using mem= is not unambiguous anymore after
>>> this change, considering that the kernel may be at the top of physical
>>> memory, and clipping from the bottom rather than the top will discard
>>> any 32-bit DMA addressable memory first. To deal with this, the handling
>>> of mem= is reimplemented to clip top down, but take special care not to
>>> clip memory that covers the kernel image.
>>
>> I may have forgotten the reason - why do we need to avoid clipping the
>> memory that covers the kernel image? It's already mapped in the vmalloc
>> area, so we wouldn't need it in the linear map as well.
>>
>
> Good question. Originally, I needed it for swapper_pg_dir, whose
> pud/pmd/pte levels were accessed via __va() translations of the values
> found in the higher-up table entries, but after Mark's patches, only
> the top level pgd of swapper_pg_dir is still used. Similarly, for
> idmap_pg_dir, we don't change any mappings at runtime so the same
> applies there I think.
>
> I will try dropping this, and see what happens.
>

I have given this a spin, and this chokes on
a) the fact that not all of the translation tables are accessible via
the linear mapping: the fixmap, due to its vicinity to PCI i/o and
other populated regions, will share its pud/pmd level tables with
other users, like ioremap, which traverses the translation tables in
the ordinary way, i.e., it expects that __va() applied on the phys
address in the table entry returns something that is mapped
b) free_initmem() now calls __free_pages() on a region that we never
mapped or registered as available.

So it may be feasible with some hackery, but I wonder if it is worth
it to complicate the common case for implementing mem= more
efficiently.

-- 
Ard.
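As background to point (a) above, a condensed sketch of the walker pattern
that breaks (the helper name is invented and this is illustrative only, not
a quote of pte_offset_kernel(); the macros are the standard arm64 ones):

#include <asm/memory.h>
#include <asm/pgtable.h>

static pte_t *walk_to_pte(pmd_t *pmdp, unsigned long addr)
{
	/* physical address of the next-level (pte) table, from the pmd entry */
	phys_addr_t pte_phys = pmd_val(*pmdp) & PHYS_MASK & PAGE_MASK;

	/*
	 * __va() only yields a usable pointer if that table page lies inside
	 * the linear mapping; a fixmap pud/pmd shared with ioremap breaks
	 * this assumption when the RAM covering the table was clipped away.
	 */
	return (pte_t *)__va(pte_phys) + pte_index(addr);
}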


* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 16:31       ` Ard Biesheuvel
@ 2016-02-01 17:31         ` Catalin Marinas
  2016-02-01 17:57           ` Ard Biesheuvel
  0 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-01 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 05:31:11PM +0100, Ard Biesheuvel wrote:
> On 1 February 2016 at 16:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 1 February 2016 at 16:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
> >>> Note that limiting memory using mem= is not unambiguous anymore after
> >>> this change, considering that the kernel may be at the top of physical
> >>> memory, and clipping from the bottom rather than the top will discard
> >>> any 32-bit DMA addressable memory first. To deal with this, the handling
> >>> of mem= is reimplemented to clip top down, but take special care not to
> >>> clip memory that covers the kernel image.
> >>
> >> I may have forgotten the reason - why do we need to avoid clipping the
> >> memory that covers the kernel image? It's already mapped in the vmalloc
> >> area, so we wouldn't need it in the linear map as well.
> >
> > Good question. Originally, I needed it for swapper_pg_dir, whose
> > pud/pmd/pte levels were accessed via __va() translations of the values
> > found in the higher-up table entries, but after Mark's patches, only
> > the top level pgd of swapper_pg_dir is still used. Similarly, for
> > idmap_pg_dir, we don't change any mappings at runtime so the same
> > applies there I think.
> >
> > I will try dropping this, and see what happens.
> 
> I have given this a spin, and this chokes on
> a) the fact that not all of the translation tables are accessible via
> the linear mapping: the fixmap, due to its vicinity to PCI i/o and
> other populated regions, will share its pud/pmd level tables with
> other users, like ioremap, which traverses the translation tables in
> the ordinary way, i.e., it expects that __va() applied on the phys
> address in the table entry returns something that is mapped

Ah, __va(__pa(x)) is not an identity function and I don't think it's
worth fixing it (the __pa() case is much simpler). But it also means
that we won't be able to remove the kernel image alias in the linear
mapping. It shouldn't be a problem for KASLR as long as we randomise
both kernel image PA and VA.

> b) free_initmem() now calls __free_pages() on a region that we never
> mapped or registered as available.
> 
> So it may be feasible with some hackery, but I wonder if it is worth
> it to complicate the common case for implementing mem= more
> efficiently.

I don't care about efficiency, I was hoping to avoid the additional
arm64-specific memory clipping but it seems that it could easily get
more complicated. So let's leave it as it is.

Consider this sub-series merged (I'll push it to -next around -rc3).

Thanks.

-- 
Catalin


* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 17:31         ` Catalin Marinas
@ 2016-02-01 17:57           ` Ard Biesheuvel
  2016-02-01 18:02             ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 17:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 1 February 2016 at 18:31, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Feb 01, 2016 at 05:31:11PM +0100, Ard Biesheuvel wrote:
>> On 1 February 2016 at 16:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 1 February 2016 at 16:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
>> >>> Note that limiting memory using mem= is not unambiguous anymore after
>> >>> this change, considering that the kernel may be at the top of physical
>> >>> memory, and clipping from the bottom rather than the top will discard
>> >>> any 32-bit DMA addressable memory first. To deal with this, the handling
>> >>> of mem= is reimplemented to clip top down, but take special care not to
>> >>> clip memory that covers the kernel image.
>> >>
>> >> I may have forgotten the reason - why do we need to avoid clipping the
>> >> memory that covers the kernel image? It's already mapped in the vmalloc
>> >> area, so we wouldn't need it in the linear map as well.
>> >
>> > Good question. Originally, I needed it for swapper_pg_dir, whose
>> > pud/pmd/pte levels were accessed via __va() translations of the values
>> > found in the higher-up table entries, but after Mark's patches, only
>> > the top level pgd of swapper_pg_dir is still used. Similarly, for
>> > idmap_pg_dir, we don't change any mappings at runtime so the same
>> > applies there I think.
>> >
>> > I will try dropping this, and see what happens.
>>
>> I have given this a spin, and this chokes on
>> a) the fact that not all of the translation tables are accessible via
>> the linear mapping: the fixmap, due to its vicinity to PCI i/o and
>> other populated regions, will share its pud/pmd level tables with
>> other users, like ioremap, which traverses the translation tables in
>> the ordinary way, i.e., it expects that __va() applied on the phys
>> address in the table entry returns something that is mapped
>
> Ah, __va(__pa(x)) is not an identity function and I don't think it's
> worth fixing it (the __pa() case is much simpler). But it also means
> that we won't be able to remove the kernel image alias in the linear
> mapping. It shouldn't be a problem for KASLR as long as we randomise
> both kernel image PA and VA.
>

indeed.

>> b) free_initmem() now calls __free_pages() on a region that we never
>> mapped or registered as available.
>>
>> So it may be feasible with some hackery, but I wonder if it is worth
>> it to complicate the common case for implementing mem= more
>> efficiently.
>
> I don't care about efficiency, I was hoping to avoid the additional
> arm64-specific memory clipping but it seems that it could easily get
> more complicated. So let's leave it as it is.
>

Alternatively, we could simply apply the memory limit as before, and
add back the [__init_begin, _end] interval right afterwards using
memblock_add()

> Consider this sub-series merged (I'll push it to -next around -rc3).
>
> Thanks.
>
> --
> Catalin


* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 17:57           ` Ard Biesheuvel
@ 2016-02-01 18:02             ` Catalin Marinas
  2016-02-01 18:30               ` [PATCH] arm64: move back to generic memblock_enforce_memory_limit() Ard Biesheuvel
  0 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-01 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 06:57:05PM +0100, Ard Biesheuvel wrote:
> On 1 February 2016 at 18:31, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> >> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
> >> >>> Note that limiting memory using mem= is not unambiguous anymore after
> >> >>> this change, considering that the kernel may be at the top of physical
> >> >>> memory, and clipping from the bottom rather than the top will discard
> >> >>> any 32-bit DMA addressable memory first. To deal with this, the handling
> >> >>> of mem= is reimplemented to clip top down, but take special care not to
> >> >>> clip memory that covers the kernel image.
> >> >>
> >> >> I may have forgotten the reason - why do we need to avoid clipping the
> >> >> memory that covers the kernel image? It's already mapped in the vmalloc
> >> >> area, so we wouldn't need it in the linear map as well.
[...]
> > I don't care about efficiency, I was hoping to avoid the additional
> > arm64-specific memory clipping but it seems that it could easily get
> > more complicated. So let's leave it as it is.
> 
> Alternatively, we could simply apply the memory limit as before, and
> add back the [__init_begin, _end] interval right afterwards using
> memblock_add()

If the code ends up simpler, yes, I'm fine with it.

-- 
Catalin


* [PATCH] arm64: move back to generic memblock_enforce_memory_limit()
  2016-02-01 18:02             ` Catalin Marinas
@ 2016-02-01 18:30               ` Ard Biesheuvel
  2016-02-02 10:19                 ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-01 18:30 UTC (permalink / raw)
  To: linux-arm-kernel

Rather than implementing our own elaborate logic to clip the memory ranges
in order of preference, use the generic memblock_enforce_memory_limit() as
before, and explicitly add back that part of the kernel that needs to be
accessible via the linear mapping as well as via the kernel mapping.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/init.c | 59 +++++++---------------------------------------------
 1 file changed, 7 insertions(+), 52 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e8e853a1024c..361c91209031 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -160,41 +160,6 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
-/*
- * clip_mem_range() - remove memblock memory between @min and @max until
- *                    we meet the limit in 'memory_limit'.
- */
-static void __init clip_mem_range(u64 min, u64 max)
-{
-	u64 mem_size, to_remove;
-	int i;
-
-again:
-	mem_size = memblock_phys_mem_size();
-	if (mem_size <= memory_limit || max <= min)
-		return;
-
-	to_remove = mem_size - memory_limit;
-
-	for (i = memblock.memory.cnt - 1; i >= 0; i--) {
-		struct memblock_region *r = memblock.memory.regions + i;
-		u64 start = max(min, r->base);
-		u64 end = min(max, r->base + r->size);
-
-		if (start >= max || end <= min)
-			continue;
-
-		if (end > min) {
-			u64 size = min(to_remove, end - max(start, min));
-
-			memblock_remove(end - size, size);
-		} else {
-			memblock_remove(start, min(max - start, to_remove));
-		}
-		goto again;
-	}
-}
-
 void __init arm64_memblock_init(void)
 {
 	const s64 linear_region_size = -(s64)PAGE_OFFSET;
@@ -215,24 +180,14 @@ void __init arm64_memblock_init(void)
 	if (memblock_end_of_DRAM() > linear_region_size)
 		memblock_remove(0, memblock_end_of_DRAM() - linear_region_size);
 
+	/*
+	 * Apply the memory limit if it was set. Since the kernel may be loaded
+	 * high up in memory, add back the kernel region that must be accessible
+	 * via the linear mapping.
+	 */
 	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
-		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
-		u64 kend = PAGE_ALIGN(__pa(_end));
-		u64 const sz_4g = 0x100000000UL;
-
-		/*
-		 * Clip memory in order of preference:
-		 * - above the kernel and above 4 GB
-		 * - between 4 GB and the start of the kernel (if the kernel
-		 *   is loaded high in memory)
-		 * - between the kernel and 4 GB (if the kernel is loaded
-		 *   low in memory)
-		 * - below 4 GB
-		 */
-		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
-		clip_mem_range(sz_4g, kbase);
-		clip_mem_range(kend, sz_4g);
-		clip_mem_range(0, min(kbase, sz_4g));
+		memblock_enforce_memory_limit(memory_limit);
+		memblock_add(__pa(__init_begin), (u64)(_end - __init_begin));
 	}
 
 	/*
-- 
2.5.0


* [PATCH] arm64: move back to generic memblock_enforce_memory_limit()
  2016-02-01 18:30               ` [PATCH] arm64: move back to generic memblock_enforce_memory_limit() Ard Biesheuvel
@ 2016-02-02 10:19                 ` Catalin Marinas
  2016-02-02 10:28                   ` Ard Biesheuvel
  0 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-02 10:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 01, 2016 at 07:30:17PM +0100, Ard Biesheuvel wrote:
>  void __init arm64_memblock_init(void)
>  {
>  	const s64 linear_region_size = -(s64)PAGE_OFFSET;
> @@ -215,24 +180,14 @@ void __init arm64_memblock_init(void)
>  	if (memblock_end_of_DRAM() > linear_region_size)
>  		memblock_remove(0, memblock_end_of_DRAM() - linear_region_size);
>  
> +	/*
> +	 * Apply the memory limit if it was set. Since the kernel may be loaded
> +	 * high up in memory, add back the kernel region that must be accessible
> +	 * via the linear mapping.
> +	 */
>  	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
> -		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
> -		u64 kend = PAGE_ALIGN(__pa(_end));
> -		u64 const sz_4g = 0x100000000UL;
> -
> -		/*
> -		 * Clip memory in order of preference:
> -		 * - above the kernel and above 4 GB
> -		 * - between 4 GB and the start of the kernel (if the kernel
> -		 *   is loaded high in memory)
> -		 * - between the kernel and 4 GB (if the kernel is loaded
> -		 *   low in memory)
> -		 * - below 4 GB
> -		 */
> -		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
> -		clip_mem_range(sz_4g, kbase);
> -		clip_mem_range(kend, sz_4g);
> -		clip_mem_range(0, min(kbase, sz_4g));
> +		memblock_enforce_memory_limit(memory_limit);
> +		memblock_add(__pa(__init_begin), (u64)(_end - __init_begin));

Thanks, it looks much simpler now. However, loading the kernel 1GB
higher with mem=1G fails somewhere during the KVM hyp initialisation. It
works if I change the last line below to:

	memblock_add(__pa(_text), (u64)(_end - _text));

I can fold the change in.

-- 
Catalin
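For reference, a sketch of the hunk with that replacement line folded in
(assembled from the patch and the line quoted above, not a separate posting;
the hyp init failure suggests the whole [_text, _end) range still needs to
be covered by the linear mapping):

	/*
	 * Apply the memory limit if it was set. Since the kernel may be loaded
	 * high up in memory, add back the kernel region that must be accessible
	 * via the linear mapping.
	 */
	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
		memblock_enforce_memory_limit(memory_limit);
		memblock_add(__pa(_text), (u64)(_end - _text));
	}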


* [PATCH] arm64: move back to generic memblock_enforce_memory_limit()
  2016-02-02 10:19                 ` Catalin Marinas
@ 2016-02-02 10:28                   ` Ard Biesheuvel
  2016-02-02 10:44                     ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-02 10:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 2 February 2016 at 11:19, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Mon, Feb 01, 2016 at 07:30:17PM +0100, Ard Biesheuvel wrote:
>>  void __init arm64_memblock_init(void)
>>  {
>>       const s64 linear_region_size = -(s64)PAGE_OFFSET;
>> @@ -215,24 +180,14 @@ void __init arm64_memblock_init(void)
>>       if (memblock_end_of_DRAM() > linear_region_size)
>>               memblock_remove(0, memblock_end_of_DRAM() - linear_region_size);
>>
>> +     /*
>> +      * Apply the memory limit if it was set. Since the kernel may be loaded
>> +      * high up in memory, add back the kernel region that must be accessible
>> +      * via the linear mapping.
>> +      */
>>       if (memory_limit != (phys_addr_t)ULLONG_MAX) {
>> -             u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
>> -             u64 kend = PAGE_ALIGN(__pa(_end));
>> -             u64 const sz_4g = 0x100000000UL;
>> -
>> -             /*
>> -              * Clip memory in order of preference:
>> -              * - above the kernel and above 4 GB
>> -              * - between 4 GB and the start of the kernel (if the kernel
>> -              *   is loaded high in memory)
>> -              * - between the kernel and 4 GB (if the kernel is loaded
>> -              *   low in memory)
>> -              * - below 4 GB
>> -              */
>> -             clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
>> -             clip_mem_range(sz_4g, kbase);
>> -             clip_mem_range(kend, sz_4g);
>> -             clip_mem_range(0, min(kbase, sz_4g));
>> +             memblock_enforce_memory_limit(memory_limit);
>> +             memblock_add(__pa(__init_begin), (u64)(_end - __init_begin));
>
> Thanks, it looks much simpler now. However, loading the kernel 1GB
> higher with mem=1G fails somewhere during the KVM hyp initialisation. It
> works if I change the last line below to:
>
>         memblock_add(__pa(_text), (u64)(_end - _text));
>

OK, that should work as well.

I suppose the fact that mem= loses some of its accuracy is not an
issue? If you need it to be exact, you should simply not load your
kernel outside your mem= range ...

> I can fold the change in.
>

OK
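To make the accuracy point concrete (numbers illustrative): with mem=1G and
an Image of roughly 20 MB loaded entirely above the 1 GB mark,
memblock_enforce_memory_limit() trims memblock down to 1 GB and the
add-back then re-registers the ~20 MB covered by [_text, _end), so memblock
ends up tracking slightly more than the requested limit. Most of that extra
is reserved for the kernel image itself, so the limit is approximate rather
than badly off.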


* [PATCH] arm64: move back to generic memblock_enforce_memory_limit()
  2016-02-02 10:28                   ` Ard Biesheuvel
@ 2016-02-02 10:44                     ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2016-02-02 10:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 02, 2016 at 11:28:41AM +0100, Ard Biesheuvel wrote:
> On 2 February 2016 at 11:19, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Feb 01, 2016 at 07:30:17PM +0100, Ard Biesheuvel wrote:
> >>  void __init arm64_memblock_init(void)
> >>  {
> >>       const s64 linear_region_size = -(s64)PAGE_OFFSET;
> >> @@ -215,24 +180,14 @@ void __init arm64_memblock_init(void)
> >>       if (memblock_end_of_DRAM() > linear_region_size)
> >>               memblock_remove(0, memblock_end_of_DRAM() - linear_region_size);
> >>
> >> +     /*
> >> +      * Apply the memory limit if it was set. Since the kernel may be loaded
> >> +      * high up in memory, add back the kernel region that must be accessible
> >> +      * via the linear mapping.
> >> +      */
> >>       if (memory_limit != (phys_addr_t)ULLONG_MAX) {
> >> -             u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
> >> -             u64 kend = PAGE_ALIGN(__pa(_end));
> >> -             u64 const sz_4g = 0x100000000UL;
> >> -
> >> -             /*
> >> -              * Clip memory in order of preference:
> >> -              * - above the kernel and above 4 GB
> >> -              * - between 4 GB and the start of the kernel (if the kernel
> >> -              *   is loaded high in memory)
> >> -              * - between the kernel and 4 GB (if the kernel is loaded
> >> -              *   low in memory)
> >> -              * - below 4 GB
> >> -              */
> >> -             clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
> >> -             clip_mem_range(sz_4g, kbase);
> >> -             clip_mem_range(kend, sz_4g);
> >> -             clip_mem_range(0, min(kbase, sz_4g));
> >> +             memblock_enforce_memory_limit(memory_limit);
> >> +             memblock_add(__pa(__init_begin), (u64)(_end - __init_begin));
> >
> > Thanks, it looks much simpler now. However, loading the kernel 1GB
> > higher with mem=1G fails somewhere during the KVM hyp initialisation. It
> > works if I change the last line below to:
> >
> >         memblock_add(__pa(_text), (u64)(_end - _text));
> 
> OK, that should work as well.
> 
> I suppose the fact that mem= loses some of its accuracy is not an
> issue? If you need it to be exact, you should simply not load your
> kernel outside your mem= range ...

I'm not worried about accuracy. We could avoid freeing the init mem if
the kernel is outside the memory_limit range but I don't really think
it's worth it.

-- 
Catalin
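A sketch of the alternative being dismissed here, for illustration only
(free_initmem() lives in arch/arm64/mm/init.c, which already pulls in the
memblock and sections headers; whether this check is sufficient is exactly
the "not worth it" question):

void free_initmem(void)
{
	/*
	 * Only hand the init region back to the page allocator if it is
	 * actually covered by memblock, i.e. if the kernel image ended up
	 * inside (or was added back to) the limited memory range.
	 */
	if (memblock_is_memory(__pa(__init_begin)))
		free_initmem_default(0);
}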


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-01 10:54 ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
  2016-02-01 12:24   ` Catalin Marinas
  2016-02-01 14:32   ` Mark Rutland
@ 2016-02-12 14:58   ` Catalin Marinas
  2016-02-12 15:02     ` Ard Biesheuvel
  2016-02-12 17:47   ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area James Morse
  3 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-12 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
> This moves the module area to right before the vmalloc area, and
> moves the kernel image to the base of the vmalloc area. This is
> an intermediate step towards implementing KASLR, which allows the
> kernel image to be located anywhere in the vmalloc area.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

This patch is causing lots of KASAN warnings on Juno (interestingly, it
doesn't seem to trigger on Seattle, though we only tried for-next/core).
I pushed the branch that I'm currently using here:

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap


A typical error (though its place varies based on the config options,
kernel layout):

BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8
Read of size 8 by task swapper/2/0
page:ffffffbde6d895c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
page dumped because: kasan: bad access detected
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.5.0-rc1+ #130
Hardware name: Juno (DT)
Call trace:
[<ffffff900408b590>] dump_backtrace+0x0/0x258
[<ffffff900408b7fc>] show_stack+0x14/0x20
[<ffffff900448789c>] dump_stack+0xac/0x100
[<ffffff9004224f3c>] kasan_report_error+0x544/0x570
[<ffffff9004225328>] kasan_report+0x40/0x48
[<ffffff9004223c58>] __asan_load8+0x60/0x78
[<ffffff90041596f0>] clockevents_program_event+0x28/0x1b0
[<ffffff900415c63c>] tick_program_event+0x74/0xb8
[<ffffff9004148944>] __remove_hrtimer+0xcc/0x100
[<ffffff9004148f0c>] hrtimer_start_range_ns+0x3f4/0x538
[<ffffff900415d450>] __tick_nohz_idle_enter+0x558/0x590
[<ffffff900415d74c>] tick_nohz_idle_enter+0x44/0x78
[<ffffff900411fcc8>] cpu_startup_entry+0x48/0x2c0
[<ffffff9004091f58>] secondary_start_kernel+0x208/0x278
[<0000000080082aac>] 0x80082aac
Memory state around the buggy address:
 ffffffc936257b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc936257c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
>ffffffc936257c80: f1 f1 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00
                                              ^
 ffffffc936257d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc936257d80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1


And some additional info from the kernel boot:

Processing EFI memory map:
  0x000008000000-0x00000bffffff [Memory Mapped I/O  |RUN|  |  |  |  |  |   |  |  |  |UC]
  0x00001c170000-0x00001c170fff [Memory Mapped I/O  |RUN|  |  |  |  |  |   |  |  |  |UC]
  0x000080000000-0x00008000ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x000080010000-0x00008007ffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x000080080000-0x00008149ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0000814a0000-0x00009fdfffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x00009fe00000-0x00009fe0ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x00009fe10000-0x0000dfffffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0000e00f0000-0x0000febd5fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0000febd6000-0x0000febd9fff [ACPI Reclaim Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]*
  0x0000febda000-0x0000febdafff [ACPI Memory NVS    |   |  |  |  |  |  |   |WB|WT|WC|UC]*
  0x0000febdb000-0x0000febdcfff [ACPI Reclaim Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]*
  0x0000febdd000-0x0000feffffff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x000880000000-0x0009f8794fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009f8795000-0x0009f8796fff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009f8797000-0x0009f9bb4fff [Loader Code        |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009f9bb5000-0x0009faf6efff [Boot Code          |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009faf6f000-0x0009fafa9fff [Runtime Data       |RUN|  |  |  |  |  |   |WB|WT|WC|UC]*
  0x0009fafaa000-0x0009ff2b1fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009ff2b2000-0x0009ffb70fff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009ffb71000-0x0009ffb89fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009ffb8a000-0x0009ffb8dfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009ffb8e000-0x0009ffb8efff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009ffb8f000-0x0009ffdddfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009ffdde000-0x0009ffe76fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009ffe77000-0x0009fff6dfff [Boot Code          |   |  |  |  |  |  |   |WB|WT|WC|UC]
  0x0009fff6e000-0x0009fffaefff [Runtime Code       |RUN|  |  |  |  |  |   |WB|WT|WC|UC]*
  0x0009fffaf000-0x0009ffffefff [Runtime Data       |RUN|  |  |  |  |  |   |WB|WT|WC|UC]*
  0x0009fffff000-0x0009ffffffff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]


Memory: 7068520K/8371264K available (10424K kernel code, 3464K rwdata, 5284K rodata, 1016K init, 380K bss, 1286360K reserved, 16384K cma-reserved)
Virtual kernel memory layout:
    kasan   : 0xffffff8000000000 - 0xffffff9000000000   (    64 GB)
    modules : 0xffffff9000000000 - 0xffffff9004000000   (    64 MB)
    vmalloc : 0xffffff9004000000 - 0xffffffbdbfff0000   (   182 GB)
      .init : 0xffffff9004fd9000 - 0xffffff90050d7000   (  1016 KB)
      .text : 0xffffff9004080000 - 0xffffff9004fd9000   ( 15716 KB)
      .data : 0xffffff90050d7000 - 0xffffff9005439200   (  3465 KB)
    vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
              0xffffffbdc2000000 - 0xffffffbde8000000   (   608 MB actual)
    fixed   : 0xffffffbffe7fd000 - 0xffffffbffec00000   (  4108 KB)
    PCI I/O : 0xffffffbffee00000 - 0xffffffbfffe00000   (    16 MB)
    memory  : 0xffffffc000000000 - 0xffffffc980000000   ( 38912 MB)

-- 
Catalin


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 14:58   ` Catalin Marinas
@ 2016-02-12 15:02     ` Ard Biesheuvel
  2016-02-12 15:10       ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-12 15:02 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 February 2016 at 15:58, Catalin Marinas <catalin.marinas@arm.com> wrote:
> Hi Ard,
>
> On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
>> This moves the module area to right before the vmalloc area, and
>> moves the kernel image to the base of the vmalloc area. This is
>> an intermediate step towards implementing KASLR, which allows the
>> kernel image to be located anywhere in the vmalloc area.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> This patch is causing lots of KASAN warnings on Juno (interestingly, it
> doesn't seem to trigger on Seattle, though we only tried for-next/core).
> I pushed the branch that I'm currently using here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap
>
>
> A typical error (though its place varies based on the config options,
> kernel layout):
>
> BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8

Can you confirm that these are stack accesses? I was having similar
errors before, and I ended up creating the kasan zero page patch
because it turned out the kasan shadow page in question was aliased
and the stack writes were occurring elsewhere.


> Read of size 8 by task swapper/2/0
> page:ffffffbde6d895c0 count:0 mapcount:0 mapping:          (null) index:0x0
> flags: 0x4000000000000000()
> page dumped because: kasan: bad access detected
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.5.0-rc1+ #130
> Hardware name: Juno (DT)
> Call trace:
> [<ffffff900408b590>] dump_backtrace+0x0/0x258
> [<ffffff900408b7fc>] show_stack+0x14/0x20
> [<ffffff900448789c>] dump_stack+0xac/0x100
> [<ffffff9004224f3c>] kasan_report_error+0x544/0x570
> [<ffffff9004225328>] kasan_report+0x40/0x48
> [<ffffff9004223c58>] __asan_load8+0x60/0x78
> [<ffffff90041596f0>] clockevents_program_event+0x28/0x1b0
> [<ffffff900415c63c>] tick_program_event+0x74/0xb8
> [<ffffff9004148944>] __remove_hrtimer+0xcc/0x100
> [<ffffff9004148f0c>] hrtimer_start_range_ns+0x3f4/0x538
> [<ffffff900415d450>] __tick_nohz_idle_enter+0x558/0x590
> [<ffffff900415d74c>] tick_nohz_idle_enter+0x44/0x78
> [<ffffff900411fcc8>] cpu_startup_entry+0x48/0x2c0
> [<ffffff9004091f58>] secondary_start_kernel+0x208/0x278
> [<0000000080082aac>] 0x80082aac
> Memory state around the buggy address:
>  ffffffc936257b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  ffffffc936257c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
>>ffffffc936257c80: f1 f1 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00
>                                               ^
>  ffffffc936257d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  ffffffc936257d80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
>
>
> And some additional info from the kernel boot:
>
> Processing EFI memory map:
>   0x000008000000-0x00000bffffff [Memory Mapped I/O  |RUN|  |  |  |  |  |   |  |  |  |UC]
>   0x00001c170000-0x00001c170fff [Memory Mapped I/O  |RUN|  |  |  |  |  |   |  |  |  |UC]
>   0x000080000000-0x00008000ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x000080010000-0x00008007ffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x000080080000-0x00008149ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0000814a0000-0x00009fdfffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x00009fe00000-0x00009fe0ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x00009fe10000-0x0000dfffffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0000e00f0000-0x0000febd5fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0000febd6000-0x0000febd9fff [ACPI Reclaim Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]*
>   0x0000febda000-0x0000febdafff [ACPI Memory NVS    |   |  |  |  |  |  |   |WB|WT|WC|UC]*
>   0x0000febdb000-0x0000febdcfff [ACPI Reclaim Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]*
>   0x0000febdd000-0x0000feffffff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x000880000000-0x0009f8794fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009f8795000-0x0009f8796fff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009f8797000-0x0009f9bb4fff [Loader Code        |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009f9bb5000-0x0009faf6efff [Boot Code          |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009faf6f000-0x0009fafa9fff [Runtime Data       |RUN|  |  |  |  |  |   |WB|WT|WC|UC]*
>   0x0009fafaa000-0x0009ff2b1fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009ff2b2000-0x0009ffb70fff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009ffb71000-0x0009ffb89fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009ffb8a000-0x0009ffb8dfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009ffb8e000-0x0009ffb8efff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009ffb8f000-0x0009ffdddfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009ffdde000-0x0009ffe76fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009ffe77000-0x0009fff6dfff [Boot Code          |   |  |  |  |  |  |   |WB|WT|WC|UC]
>   0x0009fff6e000-0x0009fffaefff [Runtime Code       |RUN|  |  |  |  |  |   |WB|WT|WC|UC]*
>   0x0009fffaf000-0x0009ffffefff [Runtime Data       |RUN|  |  |  |  |  |   |WB|WT|WC|UC]*
>   0x0009fffff000-0x0009ffffffff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
>
>
> Memory: 7068520K/8371264K available (10424K kernel code, 3464K rwdata, 5284K rodata, 1016K init, 380K bss, 1286360K reserved, 16384K cma-reserved)
> Virtual kernel memory layout:
>     kasan   : 0xffffff8000000000 - 0xffffff9000000000   (    64 GB)
>     modules : 0xffffff9000000000 - 0xffffff9004000000   (    64 MB)
>     vmalloc : 0xffffff9004000000 - 0xffffffbdbfff0000   (   182 GB)
>       .init : 0xffffff9004fd9000 - 0xffffff90050d7000   (  1016 KB)
>       .text : 0xffffff9004080000 - 0xffffff9004fd9000   ( 15716 KB)
>       .data : 0xffffff90050d7000 - 0xffffff9005439200   (  3465 KB)
>     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
>               0xffffffbdc2000000 - 0xffffffbde8000000   (   608 MB actual)
>     fixed   : 0xffffffbffe7fd000 - 0xffffffbffec00000   (  4108 KB)
>     PCI I/O : 0xffffffbffee00000 - 0xffffffbfffe00000   (    16 MB)
>     memory  : 0xffffffc000000000 - 0xffffffc980000000   ( 38912 MB)
>
> --
> Catalin


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 15:02     ` Ard Biesheuvel
@ 2016-02-12 15:10       ` Catalin Marinas
  2016-02-12 15:17         ` Ard Biesheuvel
  0 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-12 15:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 12, 2016 at 04:02:58PM +0100, Ard Biesheuvel wrote:
> On 12 February 2016 at 15:58, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Hi Ard,
> >
> > On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
> >> This moves the module area to right before the vmalloc area, and
> >> moves the kernel image to the base of the vmalloc area. This is
> >> an intermediate step towards implementing KASLR, which allows the
> >> kernel image to be located anywhere in the vmalloc area.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >
> > This patch is causing lots of KASAN warnings on Juno (interestingly, it
> > doesn't seem to trigger on Seattle, though we only tried for-next/core).
> > I pushed the branch that I'm currently using here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap
> >
> >
> > A typical error (though its place varies based on the config options,
> > kernel layout):
> >
> > BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8
> 
> Can you confirm that these are stack accesses? I was having similar
> errors before, and I ended up creating the kasan zero page patch
> because it turned out the kasan shadow page in question was aliased
> and the stack writes were occurring elsewhere.

It's possible, we are looking into this. Is there any other patch I miss on
the above branch?

BTW, disabling CPU_IDLE, I get other errors:

WARNING: at /work/Linux/linux-2.6-aarch64/mm/vmalloc.c:135
Modules linked in:

CPU: 2 PID: 973 Comm: systemd-modules Tainted: G        W       4.5.0-rc1+ #131
Hardware name: Juno (DT)
task: ffffffc93448e200 ti: ffffffc9346ac000 task.ti: ffffffc9346ac000
PC is at vmap_page_range_noflush+0x240/0x2e8
LR is at vmap_page_range_noflush+0x16c/0x2e8
pc : [<ffffff90041fef78>] lr : [<ffffff90041feea4>] pstate: 20000145
sp : ffffffc9346af9b0
x29: ffffffc9346af9b0 x28: ffffff90050da000
x27: ffffffc001438008 x26: ffffffbde6d16440
x25: 0000004240000000 x24: ffffffc97ff3a000
x23: 0000000000000041 x22: ffffffc078e9e600
x21: ffffff8200002000 x20: ffffff8200001000
x19: 0000000000000000 x18: 00000000f3294c2f
x17: 00000000f7dc90fb x16: 0000000087b402ce
x15: ffffffffffffffff x14: ffffff0000000000
x13: ffffffffffffffff x12: 0000000000000028
x11: 0101010101010101 x10: 00000001801a001a
x9 : 0000000000000000 x8 : ffffff89268b2400
x7 : 0000000000000000 x6 : 000000000000003f
x5 : 0000000000000040 x4 : 0000000000000000
x3 : 0000000000000000 x2 : 1ffffff800287001
x1 : dfffff9000000000 x0 : 00e8000081439713

---[ end trace 8a78d7ad8d08d2a9 ]---
Call trace:
Exception stack(0xffffffc9346af790 to 0xffffffc9346af8b0)
f780:                                   0000000000000000 ffffff8200001000
f7a0: ffffffc9346af9b0 ffffff90041fef78 0000000020000145 000000000000003d
f7c0: 0000004240000000 ffffffc078e9f600 0000000041b58ab3 ffffff9004f0c370
f7e0: ffffff9004082608 ffffffc078e9f620 00000000024080c2 0000000000400000
f800: ffffffc9346afe70 ffffffc9346afe50 ffffffc9346af9b0 ffffffc93448e200
f820: ffffffc9346af830 ffffff900408b1b0 ffffffc9346af900 ffffff900408b228
f840: ffffffc9346ac000 ffffffc9346af9b0 ffffffc9346ac000 ffffffc078e9e600
f860: 0000000041b58ab3 ffffff9004f0c8a8 ffffff900408b080 000000010010000e
f880: ffffffc9346af9b0 0000000000000000 00e8000081439713 dfffff9000000000
f8a0: 1ffffff800287001 0000000000000000
[<ffffff90041fef78>] vmap_page_range_noflush+0x240/0x2e8
[<ffffff90041ff078>] map_vm_area+0x58/0x88
[<ffffff9004200400>] __vmalloc_node_range+0x2b8/0x350
[<ffffff9004224394>] kasan_module_alloc+0x64/0xb8
[<ffffff90040943f4>] module_alloc+0x5c/0xa0
[<ffffff9004169460>] load_module+0x1798/0x3098
[<ffffff900416b020>] SyS_finit_module+0xf8/0x108
[<ffffff9004085d30>] el0_svc_naked+0x24/0x28
vmalloc: allocation failure, allocated 4096 of 4096 bytes

-- 
Catalin


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 15:10       ` Catalin Marinas
@ 2016-02-12 15:17         ` Ard Biesheuvel
  2016-02-12 15:26           ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-12 15:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 February 2016 at 16:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Feb 12, 2016 at 04:02:58PM +0100, Ard Biesheuvel wrote:
>> On 12 February 2016 at 15:58, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > Hi Ard,
>> >
>> > On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
>> >> This moves the module area to right before the vmalloc area, and
>> >> moves the kernel image to the base of the vmalloc area. This is
>> >> an intermediate step towards implementing KASLR, which allows the
>> >> kernel image to be located anywhere in the vmalloc area.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >
>> > This patch is causing lots of KASAN warnings on Juno (interestingly, it
>> > doesn't seem to trigger on Seattle, though we only tried for-next/core).
>> > I pushed the branch that I'm currently using here:
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap
>> >
>> >
>> > A typical error (though its place varies based on the config options,
>> > kernel layout):
>> >
>> > BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8
>>
>> Can you confirm that these are stack accesses? I was having similar
>> errors before, and I ended up creating the kasan zero page patch
>> because it turned out the kasan shadow page in question was aliased
>> and the stack writes were occurring elsewhere.
>
> It's possible, we are looking into this. Is there any other patch I miss on
> the above branch?
>

I don't think so but I will check

> BTW, disabling CPU_IDLE, I get other errors:
>
> WARNING: at /work/Linux/linux-2.6-aarch64/mm/vmalloc.c:135
> Modules linked in:
>

Since this occurs in kasan_module_alloc(), I think this may be a
symptom of the same underlying issue, where the kernel VA space and
the projection onto the Kasan shadow area are somehow out of sync.

I will try to reproduce with the branch above.
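For background, the VA-to-shadow projection in question is the generic
KASAN mapping; a sketch of what kasan_mem_to_shadow() in the generic
headers computes, included here for reference rather than taken from this
thread:

#include <asm/kasan.h>

static inline void *mem_to_shadow(const void *addr)
{
	/* one shadow byte tracks 8 bytes of kernel VA, at a fixed offset */
	return (void *)((unsigned long)addr >> 3) + KASAN_SHADOW_OFFSET;
}

Because the projection is purely arithmetic, any kernel VA range that is
not backed by its own shadow pages (for instance one still covered by the
shared zero shadow page) aliases whatever else maps that shadow page, which
is the effect described earlier in the thread.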



> CPU: 2 PID: 973 Comm: systemd-modules Tainted: G        W       4.5.0-rc1+ #131
> Hardware name: Juno (DT)
> task: ffffffc93448e200 ti: ffffffc9346ac000 task.ti: ffffffc9346ac000
> PC is at vmap_page_range_noflush+0x240/0x2e8
> LR is at vmap_page_range_noflush+0x16c/0x2e8
> pc : [<ffffff90041fef78>] lr : [<ffffff90041feea4>] pstate: 20000145
> sp : ffffffc9346af9b0
> x29: ffffffc9346af9b0 x28: ffffff90050da000
> x27: ffffffc001438008 x26: ffffffbde6d16440
> x25: 0000004240000000 x24: ffffffc97ff3a000
> x23: 0000000000000041 x22: ffffffc078e9e600
> x21: ffffff8200002000 x20: ffffff8200001000
> x19: 0000000000000000 x18: 00000000f3294c2f
> x17: 00000000f7dc90fb x16: 0000000087b402ce
> x15: ffffffffffffffff x14: ffffff0000000000
> x13: ffffffffffffffff x12: 0000000000000028
> x11: 0101010101010101 x10: 00000001801a001a
> x9 : 0000000000000000 x8 : ffffff89268b2400
> x7 : 0000000000000000 x6 : 000000000000003f
> x5 : 0000000000000040 x4 : 0000000000000000
> x3 : 0000000000000000 x2 : 1ffffff800287001
> x1 : dfffff9000000000 x0 : 00e8000081439713
>
> ---[ end trace 8a78d7ad8d08d2a9 ]---
> Call trace:
> Exception stack(0xffffffc9346af790 to 0xffffffc9346af8b0)
> f780:                                   0000000000000000 ffffff8200001000
> f7a0: ffffffc9346af9b0 ffffff90041fef78 0000000020000145 000000000000003d
> f7c0: 0000004240000000 ffffffc078e9f600 0000000041b58ab3 ffffff9004f0c370
> f7e0: ffffff9004082608 ffffffc078e9f620 00000000024080c2 0000000000400000
> f800: ffffffc9346afe70 ffffffc9346afe50 ffffffc9346af9b0 ffffffc93448e200
> f820: ffffffc9346af830 ffffff900408b1b0 ffffffc9346af900 ffffff900408b228
> f840: ffffffc9346ac000 ffffffc9346af9b0 ffffffc9346ac000 ffffffc078e9e600
> f860: 0000000041b58ab3 ffffff9004f0c8a8 ffffff900408b080 000000010010000e
> f880: ffffffc9346af9b0 0000000000000000 00e8000081439713 dfffff9000000000
> f8a0: 1ffffff800287001 0000000000000000
> [<ffffff90041fef78>] vmap_page_range_noflush+0x240/0x2e8
> [<ffffff90041ff078>] map_vm_area+0x58/0x88
> [<ffffff9004200400>] __vmalloc_node_range+0x2b8/0x350
> [<ffffff9004224394>] kasan_module_alloc+0x64/0xb8
> [<ffffff90040943f4>] module_alloc+0x5c/0xa0
> [<ffffff9004169460>] load_module+0x1798/0x3098
> [<ffffff900416b020>] SyS_finit_module+0xf8/0x108
> [<ffffff9004085d30>] el0_svc_naked+0x24/0x28
> vmalloc: allocation failure, allocated 4096 of 4096 bytes
>
> --
> Catalin


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 15:17         ` Ard Biesheuvel
@ 2016-02-12 15:26           ` Catalin Marinas
  2016-02-12 15:38             ` Sudeep Holla
  0 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-12 15:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 12, 2016 at 04:17:09PM +0100, Ard Biesheuvel wrote:
> On 12 February 2016 at 16:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Feb 12, 2016 at 04:02:58PM +0100, Ard Biesheuvel wrote:
> >> On 12 February 2016 at 15:58, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> > On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
> >> >> This moves the module area to right before the vmalloc area, and
> >> >> moves the kernel image to the base of the vmalloc area. This is
> >> >> an intermediate step towards implementing KASLR, which allows the
> >> >> kernel image to be located anywhere in the vmalloc area.
> >> >>
> >> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> >
> >> > This patch is causing lots of KASAN warnings on Juno (interestingly, it
> >> > doesn't seem to trigger on Seattle, though we only tried for-next/core).
> >> > I pushed the branch that I'm currently using here:
> >> >
> >> > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap
> >> >
> >> >
> >> > A typical error (though its place varies based on the config options,
> >> > kernel layout):
> >> >
> >> > BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8
> >>
> >> Can you confirm that these are stack accesses? I was having similar
> >> errors before, and I ended up creating the kasan zero page patch
> >> because it turned out the kasan shadow page in question was aliased
> >> and the stack writes were occurring elsewhere.
> >
> > It's possible, we are looking into this. Is there any other patch I miss on
> > the above branch?
> 
> I don't think so but I will check

Commit 7b1af9795773 ("arm64: kasan: ensure that the KASAN zero page is
mapped read-only") was merged in -rc2 while the branch above is based on
-rc1. Anyway, I merged it into -rc2 and the errors are similar.

-- 
Catalin


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 15:26           ` Catalin Marinas
@ 2016-02-12 15:38             ` Sudeep Holla
  2016-02-12 16:06               ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Sudeep Holla @ 2016-02-12 15:38 UTC (permalink / raw)
  To: linux-arm-kernel


On 12/02/16 15:26, Catalin Marinas wrote:
> On Fri, Feb 12, 2016 at 04:17:09PM +0100, Ard Biesheuvel wrote:
>> On 12 February 2016 at 16:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Fri, Feb 12, 2016 at 04:02:58PM +0100, Ard Biesheuvel wrote:
>>>> On 12 February 2016 at 15:58, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>> On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
>>>>>> This moves the module area to right before the vmalloc area, and
>>>>>> moves the kernel image to the base of the vmalloc area. This is
>>>>>> an intermediate step towards implementing KASLR, which allows the
>>>>>> kernel image to be located anywhere in the vmalloc area.
>>>>>>
>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>>
>>>>> This patch is causing lots of KASAN warnings on Juno (interestingly, it
>>>>> doesn't seem to trigger on Seattle, though we only tried for-next/core).
>>>>> I pushed the branch that I'm currently using here:
>>>>>
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap
>>>>>
>>>>>
>>>>> A typical error (though its place varies based on the config options,
>>>>> kernel layout):
>>>>>
>>>>> BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8
>>>>
>>>> Can you confirm that these are stack accesses? I was having similar
>>>> errors before, and I ended up creating the kasan zero page patch
>>>> because it turned out the kasan shadow page in question was aliased
>>>> and the stack writes were occurring elsewhere.
>>>
>>> It's possible, we are looking into this. Is there any other patch I miss on
>>> the above branch?
>>
>> I don't think so but I will check
>
> Commit 7b1af9795773 ("arm64: kasan: ensure that the KASAN zero page is
> mapped read-only") was merged in -rc2 while the branch above is based on
> -rc1. Anyway, I merged it into -rc2 and the errors are similar.
>

Sorry to add more confusion, but I observed similar KASAN warning
with latest mainline(v4.5-rc3+, commit c05235d50f68) with below diff.

Regards,
Sudeep

--->8

diff --git i/arch/arm64/Kconfig w/arch/arm64/Kconfig
index 8cc62289a63e..fdd1d75f5bad 100644
--- i/arch/arm64/Kconfig
+++ w/arch/arm64/Kconfig
@@ -9,6 +9,7 @@ config ARM64
         select ARCH_HAS_GCOV_PROFILE_ALL
         select ARCH_HAS_SG_CHAIN
         select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
+       select ARCH_HAS_UBSAN_SANITIZE_ALL
         select ARCH_USE_CMPXCHG_LOCKREF
         select ARCH_SUPPORTS_ATOMIC_RMW
         select ARCH_WANT_OPTIONAL_GPIOLIB
diff --git i/arch/arm64/configs/defconfig w/arch/arm64/configs/defconfig
index 86581f793e39..0006b0204b97 100644
--- i/arch/arm64/configs/defconfig
+++ w/arch/arm64/configs/defconfig
@@ -240,11 +240,14 @@ CONFIG_DEBUG_INFO=y
  CONFIG_DEBUG_FS=y
  CONFIG_MAGIC_SYSRQ=y
  CONFIG_DEBUG_KERNEL=y
+CONFIG_KASAN=y
+CONFIG_TEST_KASAN=m
  CONFIG_LOCKUP_DETECTOR=y
  # CONFIG_SCHED_DEBUG is not set
  # CONFIG_DEBUG_PREEMPT is not set
  # CONFIG_FTRACE is not set
  CONFIG_MEMTEST=y
+CONFIG_UBSAN=y
  CONFIG_SECURITY=y
  CONFIG_CRYPTO_ECHAINIV=y
  CONFIG_CRYPTO_ANSI_CPRNG=y


* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 15:38             ` Sudeep Holla
@ 2016-02-12 16:06               ` Catalin Marinas
  2016-02-12 16:44                 ` Ard Biesheuvel
  2016-02-15 14:28                 ` Andrey Ryabinin
  0 siblings, 2 replies; 78+ messages in thread
From: Catalin Marinas @ 2016-02-12 16:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 12, 2016 at 03:38:46PM +0000, Sudeep Holla wrote:
> 
> On 12/02/16 15:26, Catalin Marinas wrote:
> >On Fri, Feb 12, 2016 at 04:17:09PM +0100, Ard Biesheuvel wrote:
> >>On 12 February 2016 at 16:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>>On Fri, Feb 12, 2016 at 04:02:58PM +0100, Ard Biesheuvel wrote:
> >>>>On 12 February 2016 at 15:58, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>>>>On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
> >>>>>>This moves the module area to right before the vmalloc area, and
> >>>>>>moves the kernel image to the base of the vmalloc area. This is
> >>>>>>an intermediate step towards implementing KASLR, which allows the
> >>>>>>kernel image to be located anywhere in the vmalloc area.
> >>>>>>
> >>>>>>Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >>>>>
> >>>>>This patch is causing lots of KASAN warnings on Juno (interestingly, it
> >>>>>doesn't seem to trigger on Seattle, though we only tried for-next/core).
> >>>>>I pushed the branch that I'm currently using here:
> >>>>>
> >>>>>git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap
> >>>>>
> >>>>>
> >>>>>A typical error (though its place varies based on the config options,
> >>>>>kernel layout):
> >>>>>
> >>>>>BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8
> >>>>
> >>>>Can you confirm that these are stack accesses? I was having similar
> >>>>errors before, and I ended up creating the kasan zero page patch
> >>>>because it turned out the kasan shadow page in question was aliased
> >>>>and the stack writes were occurring elsewhere.
> >>>
> >>>It's possible, we are looking into this. Is there any other patch I miss on
> >>>the above branch?
> >>
> >>I don't think so but I will check
> >
> >Commit 7b1af9795773 ("arm64: kasan: ensure that the KASAN zero page is
> >mapped read-only") was merged in -rc2 while the branch above is based on
> >-rc1. Anyway, I merged it into -rc2 and the errors are similar.
> >
> 
> Sorry to add more confusion, but I observed similar KASAN warning
> with latest mainline(v4.5-rc3+, commit c05235d50f68) with below diff.

I can reproduce this with UBSAN enabled (log below for the record).

So far, we have:

KASAN+for-next/kernmap goes wrong
KASAN+UBSAN goes wrong

Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
have to trim for-next/core down until we figure out where the problem
is.


BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
Read of size 4 by task swapper/3/0
page:ffffffbde6d996c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
page dumped because: kasan: bad access detected
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc3+ #134
Hardware name: Juno (DT)
Call trace:
[<ffffffc00008f8f0>] dump_backtrace+0x0/0x358
[<ffffffc00008fc5c>] show_stack+0x14/0x20
[<ffffffc00069d0a8>] dump_stack+0x108/0x150
[<ffffffc0003077f8>] kasan_report_error+0x690/0x970
[<ffffffc0003082c0>] kasan_report+0x60/0xc0
[<ffffffc00030634c>] __asan_load4+0x64/0x80
[<ffffffc00015f714>] find_busiest_group+0x164/0x16a0
[<ffffffc000160ea0>] load_balance+0x250/0x1450
[<ffffffc0001630c0>] pick_next_task_fair+0x5d0/0xb40
[<ffffffc000f08090>] __schedule+0x460/0xbc8
[<ffffffc000f08870>] schedule+0x78/0x208
[<ffffffc000f092d4>] schedule_preempt_disabled+0x3c/0xd8
[<ffffffc000172208>] cpu_startup_entry+0x160/0x4c8
[<ffffffc0000985b8>] secondary_start_kernel+0x280/0x428
[<0000000080082e2c>] 0x80082e2c
Memory state around the buggy address:
 ffffffc93665bb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
>ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
                      ^
 ffffffc93665bd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc93665bd80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 16:06               ` Catalin Marinas
@ 2016-02-12 16:44                 ` Ard Biesheuvel
  2016-02-15 14:28                 ` Andrey Ryabinin
  1 sibling, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-12 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 February 2016 at 17:06, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Feb 12, 2016 at 03:38:46PM +0000, Sudeep Holla wrote:
>>
>> On 12/02/16 15:26, Catalin Marinas wrote:
>> >On Fri, Feb 12, 2016 at 04:17:09PM +0100, Ard Biesheuvel wrote:
>> >>On 12 February 2016 at 16:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >>>On Fri, Feb 12, 2016 at 04:02:58PM +0100, Ard Biesheuvel wrote:
>> >>>>On 12 February 2016 at 15:58, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >>>>>On Mon, Feb 01, 2016 at 11:54:52AM +0100, Ard Biesheuvel wrote:
>> >>>>>>This moves the module area to right before the vmalloc area, and
>> >>>>>>moves the kernel image to the base of the vmalloc area. This is
>> >>>>>>an intermediate step towards implementing KASLR, which allows the
>> >>>>>>kernel image to be located anywhere in the vmalloc area.
>> >>>>>>
>> >>>>>>Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >>>>>
>> >>>>>This patch is causing lots of KASAN warnings on Juno (interestingly, it
>> >>>>>doesn't seem to trigger on Seattle, though we only tried for-next/core).
>> >>>>>I pushed the branch that I'm currently using here:
>> >>>>>
>> >>>>>git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/kernmap
>> >>>>>
>> >>>>>
>> >>>>>A typical error (though its place varies based on the config options,
>> >>>>>kernel layout):
>> >>>>>
>> >>>>>BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x28/0x1b0 at addr ffffffc936257cc8
>> >>>>
>> >>>>Can you confirm that these are stack accesses? I was having similar
>> >>>>errors before, and I ended up creating the kasan zero page patch
>> >>>>because it turned out the kasan shadow page in question was aliased
>> >>>>and the stack writes were occurring elsewhere.
>> >>>
>> >>>It's possible, we are looking into this. Is there any other patch I miss on
>> >>>the above branch?
>> >>
>> >>I don't think so but I will check
>> >
>> >Commit 7b1af9795773 ("arm64: kasan: ensure that the KASAN zero page is
>> >mapped read-only") was merged in -rc2 while the branch above is based on
>> >-rc1. Anyway, I merged it into -rc2 and the errors are similar.
>> >
>>
>> Sorry to add more confusion, but I observed similar KASAN warning
>> with latest mainline(v4.5-rc3+, commit c05235d50f68) with below diff.
>
> I can reproduce this with UBSAN enabled (log below for the record).
>
> So far, we have:
>
> KASAN+for-next/kernmap goes wrong
> KASAN+UBSAN goes wrong
>
> Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
> have to trim for-next/core down until we figure out where the problem
> is.
>

I haven't managed to reproduce this yet on QEMU, Seattle or FVP, but I
did notice something that may or may not be related:
without my changes the memory map shows this:

    kasan   : 0xffffff8000000000 - 0xffffff9000000000   (    64 GB)
    vmalloc : 0xffffff9000010000 - 0xffffffbdbfff0000   (   182 GB)

i.e., there is a 64 KB guard region between the shadow region and the
vmalloc region. I am not sure what it is for, but I realize now that I
accidentally removed it in my patch:

    kasan   : 0xffffff8000000000 - 0xffffff9000000000   (    64 GB)
    modules : 0xffffff9000000000 - 0xffffff9004000000   (    64 MB)
    vmalloc : 0xffffff9004000000 - 0xffffffbdbfff0000   (   182 GB)
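
For illustration only, a minimal sketch of how such a guard could be
reinstated in the memory.h layout used by this series. The
MODULES_GUARD_SIZE name is made up for this sketch and does not come
from any actual patch; VA_START, KASAN_SHADOW_SIZE, SZ_64K and SZ_64M
are the existing definitions:

/*
 * Sketch: keep a 64 KB hole between the end of the KASAN shadow and
 * the start of the modules area, mirroring the guard that used to sit
 * between the shadow region and the vmalloc area.
 */
#define MODULES_GUARD_SIZE	SZ_64K
#define MODULES_VADDR		(VA_START + KASAN_SHADOW_SIZE + MODULES_GUARD_SIZE)
#define MODULES_VSIZE		SZ_64M
#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)

With something like that, the modules area (and the vmalloc area above
it, starting at MODULES_END) would again sit one guard region above the
shadow end instead of directly adjacent to it.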



>
> [...]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-01 10:54 ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
                     ` (2 preceding siblings ...)
  2016-02-12 14:58   ` Catalin Marinas
@ 2016-02-12 17:47   ` James Morse
  2016-02-12 18:01     ` Ard Biesheuvel
  3 siblings, 1 reply; 78+ messages in thread
From: James Morse @ 2016-02-12 17:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On 01/02/16 10:54, Ard Biesheuvel wrote:
> This moves the module area to right before the vmalloc area, and
> moves the kernel image to the base of the vmalloc area. This is
> an intermediate step towards implementing KASLR, which allows the
> kernel image to be located anywhere in the vmalloc area.

I've rebased hibernate onto for-next/core, and this patch leads to the hibernate
core code falling down a kernel-shaped hole in the linear map.

The hibernate code assumes that for zones returned by for_each_populated_zone(),
if pfn_valid() says a page is present, then it is okay to access the page via
page_address(pfn_to_page(pfn)). But for pfns that correspond to the kernel text,
this is still returning an address in the linear map, which isn't mapped...

I'm not sure what the correct fix is here.
Should this sort of walk be valid?


From include/linux/mm.h:
> static __always_inline void *lowmem_page_address(const struct page *page)
> {
>	return __va(PFN_PHYS(page_to_pfn(page)));
> }
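
To make the pattern concrete, the sketch below (illustration only, not
the actual hibernate code; the function name and copy buffer are made
up) shows the kind of walk that now dereferences an unmapped linear-map
address for kernel-text pfns:

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/string.h>

static char buf[PAGE_SIZE];

static void walk_saveable_pages(void)
{
	struct zone *zone;
	unsigned long pfn;

	for_each_populated_zone(zone) {
		for (pfn = zone->zone_start_pfn; pfn < zone_end_pfn(zone); pfn++) {
			if (!pfn_valid(pfn))
				continue;
			/*
			 * page_address() still yields a linear-map VA;
			 * for pfns covering the kernel text that VA is
			 * no longer mapped, so this access faults.
			 */
			memcpy(buf, page_address(pfn_to_page(pfn)), PAGE_SIZE);
		}
	}
}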


Suggestions welcome!


Thanks,

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 17:47   ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area James Morse
@ 2016-02-12 18:01     ` Ard Biesheuvel
  0 siblings, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-12 18:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 February 2016 at 18:47, James Morse <james.morse@arm.com> wrote:
> Hi Ard,
>
> On 01/02/16 10:54, Ard Biesheuvel wrote:
>> This moves the module area to right before the vmalloc area, and
>> moves the kernel image to the base of the vmalloc area. This is
>> an intermediate step towards implementing KASLR, which allows the
>> kernel image to be located anywhere in the vmalloc area.
>
> I've rebased hibernate onto for-next/core, and this patch leads to the hibernate
> core code falling down a kernel shaped hole in the linear map.
>
> The hibernate code assumes that for zones returned by for_each_populated_zone(),
> if pfn_valid() says a page is present, then it is okay to access the page via
> page_address(pfn_to_page(pfn)). But for pfns that correspond to the kernel text,
> this is still returning an address in the linear map, which isn't mapped...
>
> I'm not sure what the correct fix is here.
> Should this sort of walk be valid?
>

I think the correct fix would be to mark the [_stext, _etext] interval
as NOMAP. That will also simplify the mapping routine where I now
check manually whether a memblock intersects that interval. And it
should make this particular piece of code behave.
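
As a rough sketch of that idea (not the eventual patch; it assumes
memblock_mark_nomap() is available, the mark_kernel_text_nomap() name
is made up, and the hibernate side is glossed over entirely):

#include <linux/init.h>
#include <linux/memblock.h>
#include <asm/sections.h>

/*
 * Sketch: omit the kernel text from the linear mapping, so pfn walkers
 * see those pages as NOMAP instead of dereferencing a bogus linear-map
 * address. Would need to be called from early arm64 memory init,
 * before the linear mapping is created.
 */
static void __init mark_kernel_text_nomap(void)
{
	memblock_mark_nomap(__pa(_stext), _etext - _stext);
}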

However, you would still need to preserve the contents of the
interval, since the generic hibernate routines will not do that
anymore after this change.

I will experiment with this on Monday, and report back.

Thanks,
Ard.


>
> From include/linux/mm.h:
>> static __always_inline void *lowmem_page_address(const struct page *page)
>> {
>>       return __va(PFN_PHYS(page_to_pfn(page)));
>> }
>
>
> Suggestions welcome!
>
>
> Thanks,
>
> James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2016-02-01 10:54 ` [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
@ 2016-02-12 19:45 ` Matthias Brugger
  2016-02-12 19:47   ` Ard Biesheuvel
  8 siblings, 1 reply; 78+ messages in thread
From: Matthias Brugger @ 2016-02-12 19:45 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On 01/02/16 11:54, Ard Biesheuvel wrote:
> At the request of Catalin, this series has been split off from my series
> 'arm64: implement support for KASLR v4' [1]. This sub-series deals with
> moving the kernel out of the linear mapping into the vmalloc area. This
> is a prerequisite for independent physical and virtual randomization of
> the kernel image. On top of that, considering that these changes allow
> the linear mapping to start at an arbitrary offset above PAGE_OFFSET, it
> should be an improvement in itself due to the fact that we can now choose
> PAGE_OFFSET such that RAM can be mapped using large block sizes.
>
> For instance, on my Seattle A0 box, the kernel is loaded 16 MB into the
> lowest GB of RAM, which means __pa(PAGE_OFFSET) is not 1 GB aligned, and
> the entire 16 GB of RAM will be mapping using 2 MB blocks. (Similarly,
> for 64 KB granule kernels, the entire 16 GB of RAM will be mapped using
> pages since __pa(PAGE_OFFSET) is not 512 MB aligned). With these changes
>   __pa(PAGE_OFFSET) will always be chosen such that it is aligned to a
> quantity that allows efficient mapping.
>
> Note that of the entire KASLR series, this sub-series is the most likely to
> cause problems, and hence requires the most careful review and testing. This
> is due to the fact that, with these changes, the invariant __va(__pa(x)) == x
> no longer holds, and any code that is based on that assumption needs to be
> updated.
>
> Changes since v4:
> - added Marc's ack to patch #6
> - round the kasan zero shadow region around the kernel image to swapper block
>    size (#7)
> - ensure that we don't clip the kernel image when clipping RAM to the linear
>    region size (#8)
>
> Patch #1 allows the low mark of memblocks discovered from the FDT to be
> overridden by the architecture.
>
> Patch #2 enables the huge-vmap generic feature for arm64. This should be an
> improvement in itself, but the significance for this series is that it allows
> unmap_kernel_range() to be called on the [__init_begin, __init_end) region,
> which may be partially mapped using block mappings.
>
> Patch #3 introduces KIMAGE_VADDR as a separate, preparatory step towards
> decoupling the kernel placement from PAGE_OFFSET
>
> Patch #4 implements some translation table accessors that operate on statically
> allocate translation tables before the linear mapping is up.
>
> Patch #5 decouples the fixmap initialization from the linear mapping, by using
> the accessors implemented by patch #4
>
> Patch #6 removes assumptions made my KVM regarding the placement of the kernel
> image inside the linear mapping.
>
> Patch #7 moves the kernel image from the base of the linear mapping to the base
> of the vmalloc area. The modules area, which sits right below the kernel image,
> is moved along and is put right before the start of the vmalloc area.
>
> Patch #8 decouples PHYS_OFFSET from PAGE_OFFSET, which allows the linear mapping
> to cover all discovered memory, regardless of where the kernel image is located
> in it. This effectively allows the kernel to be loaded at any physical address
> (provided that the correct alignment is used)
>
> [1] http://thread.gmane.org/gmane.linux.kernel/2135931
>
> Ard Biesheuvel (8):
>    of/fdt: make memblock minimum physical address arch configurable
>    arm64: add support for ioremap() block mappings
>    arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
>    arm64: pgtable: implement static [pte|pmd|pud]_offset variants
>    arm64: decouple early fixmap init from linear mapping
>    arm64: kvm: deal with kernel symbols outside of linear mapping
>    arm64: move kernel image to base of vmalloc area
>    arm64: allow kernel Image to be loaded anywhere in physical memory
>

I bisected linux-next (20160212) with the following error on booting 
with an initramfs:
  Failed to execute /init (error -8)
  request_module: runaway loop modprobe binfmt-464c
  Starting init: /sbin/init exists but couldn't execute it (error -8)
  request_module: runaway loop modprobe binfmt-464c
  Starting init: /bin/sh exists but couldn't execute it (error -8)
  Kernel panic - not syncing: No working init found.  Try passing init= 
option to kernel. See Linux Documentation/init..

I tracked the error down to patch 7 of this series, though I realized that 
patch 7 by itself does not compile; from patch 8 onwards I observe the error.

I use defconfig with an initramfs.cpio created with buildroot.
I tested this on my mt8173 eval board, but I suppose this can be 
reproduced easily on other machines as well.

Regards,
Matthias

>   Documentation/arm64/booting.txt                      |  20 ++-
>   Documentation/features/vm/huge-vmap/arch-support.txt |   2 +-
>   arch/arm/include/asm/kvm_asm.h                       |   2 +
>   arch/arm/kvm/arm.c                                   |   8 +-
>   arch/arm64/Kconfig                                   |   1 +
>   arch/arm64/include/asm/boot.h                        |   6 +
>   arch/arm64/include/asm/kasan.h                       |   2 +-
>   arch/arm64/include/asm/kernel-pgtable.h              |  12 ++
>   arch/arm64/include/asm/kvm_asm.h                     |   2 +
>   arch/arm64/include/asm/kvm_host.h                    |   8 +-
>   arch/arm64/include/asm/memory.h                      |  44 ++++--
>   arch/arm64/include/asm/pgtable.h                     |  23 ++-
>   arch/arm64/kernel/head.S                             |   8 +-
>   arch/arm64/kernel/image.h                            |  13 +-
>   arch/arm64/kernel/vmlinux.lds.S                      |   4 +-
>   arch/arm64/kvm/hyp.S                                 |   6 +-
>   arch/arm64/mm/dump.c                                 |  12 +-
>   arch/arm64/mm/init.c                                 | 123 ++++++++++++++--
>   arch/arm64/mm/kasan_init.c                           |  31 +++-
>   arch/arm64/mm/mmu.c                                  | 155 +++++++++++++++-----
>   drivers/of/fdt.c                                     |   5 +-
>   21 files changed, 378 insertions(+), 109 deletions(-)
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-12 19:45 ` [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Matthias Brugger
@ 2016-02-12 19:47   ` Ard Biesheuvel
  2016-02-12 20:10     ` Matthias Brugger
  0 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-12 19:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 February 2016 at 20:45, Matthias Brugger <matthias.bgg@gmail.com> wrote:
> Hi Ard,
>
>
> On 01/02/16 11:54, Ard Biesheuvel wrote:
>> [...]
>
> I bisected linux-next (20160212) with the following error on booting with an
> initramfs:
>  Failed to execute /init (error -8)
>  request_module: runaway loop modprobe binfmt-464c
>  Starting init: /sbin/init exists but couldn't execute it (error -8)
>  request_module: runaway loop modprobe binfmt-464c
>  Starting init: /bin/sh exists but couldn't execute it (error -8)
>  Kernel panic - not syncing: No working init found.  Try passing init=
> option to kernel. See Linux Documentation/init..
>
> I tracked down the error to patch 7 of this series. But I realized that
> patch 7 does not compile, but from patch 8 onwards I observe the error.
>
> I use defconfig with an initramfs.cpio created with buildroot.
> I tested this on my mt8173 eval board, but I suppose this can be reproduced
> easily on other machines as well.
>

Thanks for the report. Does this help at all?

http://thread.gmane.org/gmane.linux.ports.arm.kernel/477645

> Regards,
> Matthias
>
>
>> [...]
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-12 19:47   ` Ard Biesheuvel
@ 2016-02-12 20:10     ` Matthias Brugger
  2016-02-12 20:37       ` Ard Biesheuvel
  2016-02-13 14:28       ` Ard Biesheuvel
  0 siblings, 2 replies; 78+ messages in thread
From: Matthias Brugger @ 2016-02-12 20:10 UTC (permalink / raw)
  To: linux-arm-kernel



On 12/02/16 20:47, Ard Biesheuvel wrote:
> On 12 February 2016 at 20:45, Matthias Brugger <matthias.bgg@gmail.com> wrote:
>> Hi Ard,
>>
>>
>> On 01/02/16 11:54, Ard Biesheuvel wrote:
>>> [...]
>>
>> I bisected linux-next (20160212) with the following error on booting with an
>> initramfs:
>>   Failed to execute /init (error -8)
>>   request_module: runaway loop modprobe binfmt-464c
>>   Starting init: /sbin/init exists but couldn't execute it (error -8)
>>   request_module: runaway loop modprobe binfmt-464c
>>   Starting init: /bin/sh exists but couldn't execute it (error -8)
>>   Kernel panic - not syncing: No working init found.  Try passing init=
>> option to kernel. See Linux Documentation/init..
>>
>> I tracked down the error to patch 7 of this series. But I realized that
>> patch 7 does not compile, but from patch 8 onwards I observe the error.
>>
>> I use defconfig with an initramfs.cpio created with buildroot.
>> I tested this on my mt8173 eval board, but I suppose this can be reproduced
>> easily on other machines as well.
>>
>
> Thanks for the report. Does this help at all?
>
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/477645
>

I applied them on top of linux-next and this fixed the problem.

Thanks,
Matthias

>> Regards,
>> Matthias
>>
>>
>>> [...]
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-12 20:10     ` Matthias Brugger
@ 2016-02-12 20:37       ` Ard Biesheuvel
  2016-02-13 14:28       ` Ard Biesheuvel
  1 sibling, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-12 20:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 February 2016 at 21:10, Matthias Brugger <matthias.bgg@gmail.com> wrote:
>
>
> On 12/02/16 20:47, Ard Biesheuvel wrote:
>>
>> On 12 February 2016 at 20:45, Matthias Brugger <matthias.bgg@gmail.com>
>> wrote:
>>>
>>> Hi Ard,
>>>
>>>
>>> On 01/02/16 11:54, Ard Biesheuvel wrote:
>>>> [...]
>>>
>>> I bisected linux-next (20160212) with the following error on booting with
>>> an
>>> initramfs:
>>>   Failed to execute /init (error -8)
>>>   request_module: runaway loop modprobe binfmt-464c
>>>   Starting init: /sbin/init exists but couldn't execute it (error -8)
>>>   request_module: runaway loop modprobe binfmt-464c
>>>   Starting init: /bin/sh exists but couldn't execute it (error -8)
>>>   Kernel panic - not syncing: No working init found.  Try passing init=
>>> option to kernel. See Linux Documentation/init..
>>>
>>> I tracked down the error to patch 7 of this series. But I realized that
>>> patch 7 does not compile, but from patch 8 onwards I observe the error.
>>>
>>> I use defconfig with an initramfs.cpio created with buildroot.
>>> I tested this on my mt8173 eval board, but I suppose this can be
>>> reproduced
>>> easily on other machines as well.
>>>
>>
>> Thanks for the report. Does this help at all?
>>
>> http://thread.gmane.org/gmane.linux.ports.arm.kernel/477645
>>
>
> I applied them on top of linux-next and this fixed the problem.
>

Great! Thanks for testing

>>> Regards,
>>> Matthias
>>>
>>>
>>>> [...]
>>>
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-12 20:10     ` Matthias Brugger
  2016-02-12 20:37       ` Ard Biesheuvel
@ 2016-02-13 14:28       ` Ard Biesheuvel
  2016-02-15 13:29         ` Matthias Brugger
  1 sibling, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-13 14:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 February 2016 at 21:10, Matthias Brugger <matthias.bgg@gmail.com> wrote:
>
>
> On 12/02/16 20:47, Ard Biesheuvel wrote:
>>
>> On 12 February 2016 at 20:45, Matthias Brugger <matthias.bgg@gmail.com>
>> wrote:
>>>
>>> Hi Ard,
>>>
>>>
>>> On 01/02/16 11:54, Ard Biesheuvel wrote:
>>>> [...]
>>>
>>> I bisected linux-next (20160212) with the following error on booting with
>>> an
>>> initramfs:
>>>   Failed to execute /init (error -8)
>>>   request_module: runaway loop modprobe binfmt-464c
>>>   Starting init: /sbin/init exists but couldn't execute it (error -8)
>>>   request_module: runaway loop modprobe binfmt-464c
>>>   Starting init: /bin/sh exists but couldn't execute it (error -8)
>>>   Kernel panic - not syncing: No working init found.  Try passing init=
>>> option to kernel. See Linux Documentation/init..
>>>
>>> I tracked down the error to patch 7 of this series. But I realized that
>>> patch 7 does not compile, but from patch 8 onwards I observe the error.
>>>

As far as this failure is concerned, I managed to reproduce an error
with patch #7 applied but not #8, involving out-of-range KVM symbols
at link time. This is reproducible with GCC 4.8 but not GCC 4.9 or
later. Is this what you were seeing as well? Or is there another
problem?


>>> I use defconfig with an initramfs.cpio created with buildroot.
>>> I tested this on my mt8173 eval board, but I suppose this can be
>>> reproduced
>>> easily on other machines as well.
>>>
>>
>> Thanks for the report. Does this help at all?
>>
>> http://thread.gmane.org/gmane.linux.ports.arm.kernel/477645
>>
>
> I applied them on top of linux-next and this fixed the problem.
>
> Thanks,
> Matthias
>
>
>>> Regards,
>>> Matthias
>>>
>>>
>>>>    Documentation/arm64/booting.txt                      |  20 ++-
>>>>    Documentation/features/vm/huge-vmap/arch-support.txt |   2 +-
>>>>    arch/arm/include/asm/kvm_asm.h                       |   2 +
>>>>    arch/arm/kvm/arm.c                                   |   8 +-
>>>>    arch/arm64/Kconfig                                   |   1 +
>>>>    arch/arm64/include/asm/boot.h                        |   6 +
>>>>    arch/arm64/include/asm/kasan.h                       |   2 +-
>>>>    arch/arm64/include/asm/kernel-pgtable.h              |  12 ++
>>>>    arch/arm64/include/asm/kvm_asm.h                     |   2 +
>>>>    arch/arm64/include/asm/kvm_host.h                    |   8 +-
>>>>    arch/arm64/include/asm/memory.h                      |  44 ++++--
>>>>    arch/arm64/include/asm/pgtable.h                     |  23 ++-
>>>>    arch/arm64/kernel/head.S                             |   8 +-
>>>>    arch/arm64/kernel/image.h                            |  13 +-
>>>>    arch/arm64/kernel/vmlinux.lds.S                      |   4 +-
>>>>    arch/arm64/kvm/hyp.S                                 |   6 +-
>>>>    arch/arm64/mm/dump.c                                 |  12 +-
>>>>    arch/arm64/mm/init.c                                 | 123
>>>> ++++++++++++++--
>>>>    arch/arm64/mm/kasan_init.c                           |  31 +++-
>>>>    arch/arm64/mm/mmu.c                                  | 155
>>>> +++++++++++++++-----
>>>>    drivers/of/fdt.c                                     |   5 +-
>>>>    21 files changed, 378 insertions(+), 109 deletions(-)
>>>>
>>>
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-13 14:28       ` Ard Biesheuvel
@ 2016-02-15 13:29         ` Matthias Brugger
  2016-02-15 13:40           ` Will Deacon
  2016-02-15 14:58           ` Ard Biesheuvel
  0 siblings, 2 replies; 78+ messages in thread
From: Matthias Brugger @ 2016-02-15 13:29 UTC (permalink / raw)
  To: linux-arm-kernel



On 13/02/16 15:28, Ard Biesheuvel wrote:
> On 12 February 2016 at 21:10, Matthias Brugger <matthias.bgg@gmail.com> wrote:
>>
>>
>> On 12/02/16 20:47, Ard Biesheuvel wrote:
>>>
>>> On 12 February 2016 at 20:45, Matthias Brugger <matthias.bgg@gmail.com>
>>> wrote:
>>>>
>>>> Hi Ard,
>>>>
>>>>
>>>> On 01/02/16 11:54, Ard Biesheuvel wrote:
>>>>> [...]
>>>>
>>>> I bisected linux-next (20160212) with the following error on booting with
>>>> an
>>>> initramfs:
>>>>    Failed to execute /init (error -8)
>>>>    request_module: runaway loop modprobe binfmt-464c
>>>>    Starting init: /sbin/init exists but couldn't execute it (error -8)
>>>>    request_module: runaway loop modprobe binfmt-464c
>>>>    Starting init: /bin/sh exists but couldn't execute it (error -8)
>>>>    Kernel panic - not syncing: No working init found.  Try passing init=
>>>> option to kernel. See Linux Documentation/init..
>>>>
>>>> I tracked down the error to patch 7 of this series. But I realized that
>>>> patch 7 does not compile, but from patch 8 onwards I observe the error.
>>>>
>
> As far as this failure is concerned, I managed to reproduce an error
> with patch #7 and not #8 applied, involving out of range kvm symbols
> at link time. This is reproducible with GCC 4.8 but not GCC 4.9 or
> later. Is this what you were seeing as well? Or is there another
> problem?
>

I realized that I used "aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 
4.9.3 20141031 (prerelease)", which gave me errors like:

arch/arm64/kvm/built-in.o: In function `__cpu_init_hyp_mode':
/home/mbrugger/src/linux-next/./arch/arm64/include/asm/kvm_host.h:331:(.text+0x73b4): 
relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol 
`__kvm_hyp_vector' defined in .hyp.text section in arch/arm64/kvm/built-in.o

So it was time to update my toolchain to "aarch64-linux-gnu-gcc (Linaro 
GCC 5.1-2015.08) 5.1.1 20150608" which fixed the problem for me.

Thanks,
Matthias

>
>>>> I use defconfig with an initramfs.cpio created with buildroot.
>>>> I tested this on my mt8173 eval board, but I suppose this can be
>>>> reproduced
>>>> easily on other machines as well.
>>>>
>>>
>>> Thanks for the report. Does this help at all?
>>>
>>> http://thread.gmane.org/gmane.linux.ports.arm.kernel/477645
>>>
>>
>> I applied them on top of linux-next and this fixed the problem.
>>
>> Thanks,
>> Matthias
>>
>>
>>>> Regards,
>>>> Matthias
>>>>
>>>>
>>>>> [...]
>>>>
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-15 13:29         ` Matthias Brugger
@ 2016-02-15 13:40           ` Will Deacon
  2016-02-15 14:58           ` Ard Biesheuvel
  1 sibling, 0 replies; 78+ messages in thread
From: Will Deacon @ 2016-02-15 13:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 15, 2016 at 02:29:51PM +0100, Matthias Brugger wrote:
> >On 12 February 2016 at 21:10, Matthias Brugger <matthias.bgg@gmail.com> wrote:
> >>On 12/02/16 20:47, Ard Biesheuvel wrote:
> >>>On 12 February 2016 at 20:45, Matthias Brugger <matthias.bgg@gmail.com>
> >>>>I bisected linux-next (20160212) with the following error on booting with
> >>>>an
> >>>>initramfs:
> >>>>   Failed to execute /init (error -8)
> >>>>   request_module: runaway loop modprobe binfmt-464c
> >>>>   Starting init: /sbin/init exists but couldn't execute it (error -8)
> >>>>   request_module: runaway loop modprobe binfmt-464c
> >>>>   Starting init: /bin/sh exists but couldn't execute it (error -8)
> >>>>   Kernel panic - not syncing: No working init found.  Try passing init=
> >>>>option to kernel. See Linux Documentation/init..
> >>>>
> >>>>I tracked down the error to patch 7 of this series. But I realized that
> >>>>patch 7 does not compile, but from patch 8 onwards I observe the error.
> >>>>
> >
> >As far as this failure is concerned, I managed to reproduce an error
> >with patch #7 and not #8 applied, involving out of range kvm symbols
> >at link time. This is reproducible with GCC 4.8 but not GCC 4.9 or
> >later. Is this what you were seeing as well? Or is there another
> >problem?
> >
> 
> I realized that I used " aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3
> 20141031 (prerelease)"  which gave me errors like:
> 
> arch/arm64/kvm/built-in.o: In function `__cpu_init_hyp_mode':
> /home/mbrugger/src/linux-next/./arch/arm64/include/asm/kvm_host.h:331:(.text+0x73b4):
> relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol
> `__kvm_hyp_vector' defined in .hyp.text section in arch/arm64/kvm/built-in.o
> 
> So it was time to update my toolchain to "aarch64-linux-gnu-gcc (Linaro GCC
> 5.1-2015.08) 5.1.1 20150608" which fixed the problem for me.

FWIW, I've also been seeing a bunch of reloc errors with allmodconfig,
for-next/core and GCC 4.9. loops_per_jiffy was one of the problematic
symbols, iirc, but upgrading to GCC 6 (devel) fixed the issue.

Will

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-12 16:06               ` Catalin Marinas
  2016-02-12 16:44                 ` Ard Biesheuvel
@ 2016-02-15 14:28                 ` Andrey Ryabinin
  2016-02-15 14:35                   ` Mark Rutland
  2016-02-15 18:59                   ` Catalin Marinas
  1 sibling, 2 replies; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-15 14:28 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/12/2016 07:06 PM, Catalin Marinas wrote:
> On Fri, Feb 12, 2016 at 03:38:46PM +0000, Sudeep Holla wrote:
>>
>> On 12/02/16 15:26, Catalin Marinas wrote:
>>> [...]
>>
>> Sorry to add more confusion, but I observed similar KASAN warning
>> with latest mainline(v4.5-rc3+, commit c05235d50f68) with below diff.
> 
> I can reproduce this with UBSAN enabled (log below for the record).
> 
> So far, we have:
> 
> KASAN+for-next/kernmap goes wrong
> KASAN+UBSAN goes wrong
> 
> Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
> have to trim for-next/core down until we figure out where the problem
> is.
> 
> 
> BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c


Can it be related to the TLB conflicts, which are supposed to be fixed by the "arm64: kasan: avoid TLB conflicts" patch
from the "arm64: mm: rework page table creation" series?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-15 14:28                 ` Andrey Ryabinin
@ 2016-02-15 14:35                   ` Mark Rutland
  2016-02-15 18:59                   ` Catalin Marinas
  1 sibling, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-15 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 15, 2016 at 05:28:02PM +0300, Andrey Ryabinin wrote:
> 
> 
> On 02/12/2016 07:06 PM, Catalin Marinas wrote:
> > So far, we have:
> > 
> > KASAN+for-next/kernmap goes wrong
> > KASAN+UBSAN goes wrong
> > 
> > Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
> > have to trim for-next/core down until we figure out where the problem
> > is.
> > 
> > BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
> 
> Can it be related to TLB conflicts, which supposed to be fixed in "arm64: kasan: avoid TLB conflicts" patch
> from "arm64: mm: rework page table creation" series ?

Currently I don't believe this is a TLB issue. We've been seeing issues
even with for-next/core, which includes that patch. It's also incredibly
reliable to trigger.

It seems that issues are more likely the larger the kernel image, so my
suspicion is that at some boundary condition we create the page tables
for the shadow region incorrectly. I'm only able to trigger this on a
particular machine, so the physical memory layout may also matter.

I'm currently looking into that.

Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 0/8] arm64: split linear and kernel mappings
  2016-02-15 13:29         ` Matthias Brugger
  2016-02-15 13:40           ` Will Deacon
@ 2016-02-15 14:58           ` Ard Biesheuvel
  1 sibling, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-15 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 15 February 2016 at 14:29, Matthias Brugger <matthias.bgg@gmail.com> wrote:
>
>
> On 13/02/16 15:28, Ard Biesheuvel wrote:
>>
>> On 12 February 2016 at 21:10, Matthias Brugger <matthias.bgg@gmail.com>
>> wrote:
>>>
>>>
>>>
>>> On 12/02/16 20:47, Ard Biesheuvel wrote:
>>>>
>>>>
>>>> On 12 February 2016 at 20:45, Matthias Brugger <matthias.bgg@gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Ard,
>>>>>
>>>>>
>>>>> On 01/02/16 11:54, Ard Biesheuvel wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>
>>>>> I bisected linux-next (20160212) with the following error on booting
>>>>> with
>>>>> an
>>>>> initramfs:
>>>>>    Failed to execute /init (error -8)
>>>>>    request_module: runaway loop modprobe binfmt-464c
>>>>>    Starting init: /sbin/init exists but couldn't execute it (error -8)
>>>>>    request_module: runaway loop modprobe binfmt-464c
>>>>>    Starting init: /bin/sh exists but couldn't execute it (error -8)
>>>>>    Kernel panic - not syncing: No working init found.  Try passing
>>>>> init=
>>>>> option to kernel. See Linux Documentation/init..
>>>>>
>>>>> I tracked down the error to patch 7 of this series. But I realized that
>>>>> patch 7 does not compile, but from patch 8 onwards I observe the error.
>>>>>
>>
>> As far as this failure is concerned, I managed to reproduce an error
>> with patch #7 and not #8 applied, involving out of range kvm symbols
>> at link time. This is reproducible with GCC 4.8 but not GCC 4.9 or
>> later. Is this what you were seeing as well? Or is there another
>> problem?
>>
>
> I realized that I used " aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3
> 20141031 (prerelease)"  which gave me errors like:
>
> arch/arm64/kvm/built-in.o: In function `__cpu_init_hyp_mode':
> /home/mbrugger/src/linux-next/./arch/arm64/include/asm/kvm_host.h:331:(.text+0x73b4):
> relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol
> `__kvm_hyp_vector' defined in .hyp.text section in arch/arm64/kvm/built-in.o
>
> So it was time to update my toolchain to "aarch64-linux-gnu-gcc (Linaro GCC
> 5.1-2015.08) 5.1.1 20150608" which fixed the problem for me.
>

OK, so it's the same issue. I thought 4.9 worked OK before, but I
checked and that is indeed not the case.
This is a transient issue, since the subsequent patch replaces the
offending expressions, but to maintain bisectability, I will add a
workaround nonetheless.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-15 14:28                 ` Andrey Ryabinin
  2016-02-15 14:35                   ` Mark Rutland
@ 2016-02-15 18:59                   ` Catalin Marinas
  2016-02-16 12:59                     ` Andrey Ryabinin
  1 sibling, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-15 18:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 15, 2016 at 05:28:02PM +0300, Andrey Ryabinin wrote:
> On 02/12/2016 07:06 PM, Catalin Marinas wrote:
> > So far, we have:
> > 
> > KASAN+for-next/kernmap goes wrong
> > KASAN+UBSAN goes wrong
> > 
> > Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
> > have to trim for-next/core down until we figure out where the problem
> > is.
> > 
> > BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
> 
> Can it be related to TLB conflicts, which supposed to be fixed in
> "arm64: kasan: avoid TLB conflicts" patch from "arm64: mm: rework page
> table creation" series ?

I can very easily reproduce this with a vanilla 4.5-rc1 kernel by
enabling inline instrumentation (maybe Mark's theory is true w.r.t.
image size).

Some information; maybe you can shed some light on this. It seems to
happen only for secondary CPUs, on the swapper stack (I think allocated
via fork_idle()). The generated code looks sane to me, so KASAN should
not complain, but maybe there is some uninitialised shadow, hence the
error.

The report:

-------------->8---------------
BUG: KASAN: stack-out-of-bounds in clockevents_program_event+0x354/0x368 at addr ffffffc93651bca8
Read of size 8 by task swapper/1/0
page:ffffffbde6d946c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
page dumped because: kasan: bad access detected
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G    B           4.5.0-rc1+ #163
Hardware name: Juno (DT)
Call trace:
[<ffffffc00008f130>] dump_backtrace+0x0/0x358
[<ffffffc00008f49c>] show_stack+0x14/0x20
[<ffffffc000785dc0>] dump_stack+0xf8/0x188
[<ffffffc000343c0c>] kasan_report_error+0x524/0x550
[<ffffffc000343d50>] __asan_report_load8_noabort+0x40/0x48
[<ffffffc0001f2bc4>] clockevents_program_event+0x354/0x368
[<ffffffc0001f73d4>] tick_program_event+0xac/0x108
[<ffffffc0001d85c8>] hrtimer_start_range_ns+0x8a0/0xb20
[<ffffffc0001f8ba8>] __tick_nohz_idle_enter+0x970/0xca8
[<ffffffc0001f9368>] tick_nohz_idle_enter+0x60/0x98
[<ffffffc0001933ec>] cpu_startup_entry+0x14c/0x448
[<ffffffc000098654>] secondary_start_kernel+0x264/0x2e0
[<0000000080082ecc>] 0x80082ecc
Memory state around the buggy address:
 ffffffc93651bb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc93651bc00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
>ffffffc93651bc80: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00
                                  ^
 ffffffc93651bd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffffc93651bd80: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4
-------------->8---------------

I put some printks in clockevents_program_event() and the SP is
0xffffffc93651bc70, so it matches the above.

Disassembling the code:

-------------->8---------------
ffffffc0001f2870 <clockevents_program_event>:
ffffffc0001f2870:       a9bc7bfd        stp     x29, x30, [sp,#-64]!
ffffffc0001f2874:       d2dff204        mov     x4, #0xff9000000000             // #280993940373504
ffffffc0001f2878:       910003fd        mov     x29, sp
ffffffc0001f287c:       910103a3        add     x3, x29, #0x40
ffffffc0001f2880:       f2fbffe4        movk    x4, #0xdfff, lsl #48
ffffffc0001f2884:       a90153f3        stp     x19, x20, [sp,#16]
ffffffc0001f2888:       a9025bf5        stp     x21, x22, [sp,#32]
ffffffc0001f288c:       f81f8c61        str     x1, [x3,#-8]!
ffffffc0001f2890:       aa0003f3        mov     x19, x0
ffffffc0001f2894:       53001c55        uxtb    w21, w2
ffffffc0001f2898:       d343fc60        lsr     x0, x3, #3
ffffffc0001f289c:       38e46800        ldrsb   w0, [x0,x4]
ffffffc0001f28a0:       350018e0        cbnz    w0, ffffffc0001f2bbc <clockevents_program_event+0x34c>

[...]

ffffffc0001f2bbc:       aa0303e0        mov     x0, x3
ffffffc0001f2bc0:       94054454        bl      ffffffc000343d10 <__asan_report_load8_noabort>
-------------->8---------------

To me, line ffffffc0001f288c looks like a normal store to a stack
variable, and the stack boundaries look fine. The line at ffffffc0001f289c
reads the shadow byte and finds it non-zero, hence the report. But I don't
see what's wrong with this function, other than the KASAN shadow being
corrupt.
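
For reference, that check sequence is roughly the following in C (a
sketch, not the exact code GCC emits). The constant built up in x4 is
the KASAN shadow offset for this configuration, and the lsr #3 is the
8-bytes-of-memory-per-shadow-byte scaling:

/* sketch of the inline check for an 8-byte access (illustration only) */
extern void __asan_report_load8_noabort(unsigned long addr);

static inline void kasan_check_load8(const void *addr)
{
	signed char shadow = *(signed char *)
		(((unsigned long)addr >> 3) + 0xdfffff9000000000UL);

	if (shadow)	/* must be 0 for a fully addressable 8-byte granule */
		__asan_report_load8_noabort((unsigned long)addr);
}

So the cbnz fires purely because the shadow byte for that stack address
is non-zero, which again points at the shadow contents rather than the
access itself.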

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-02-01 16:28     ` Fu Wei
@ 2016-02-16  8:55       ` Fu Wei
  0 siblings, 0 replies; 78+ messages in thread
From: Fu Wei @ 2016-02-16  8:55 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/02/2016 12:28 AM, Fu Wei wrote:
> Hi Mark
>
> On 02/01/2016 10:50 PM, Mark Rutland wrote:
>> On Mon, Feb 01, 2016 at 11:54:53AM +0100, Ard Biesheuvel wrote:
>>> This relaxes the kernel Image placement requirements, so that it
>>> may be placed at any 2 MB aligned offset in physical memory.
>>>
>>> This is accomplished by ignoring PHYS_OFFSET when installing
>>> memblocks, and accounting for the apparent virtual offset of
>>> the kernel Image. As a result, virtual address references
>>> below PAGE_OFFSET are correctly mapped onto physical references
>>> into the kernel Image regardless of where it sits in memory.
>>>
>>> Note that limiting memory using mem= is not unambiguous anymore after
>>> this change, considering that the kernel may be at the top of physical
>>> memory, and clipping from the bottom rather than the top will discard
>>> any 32-bit DMA addressable memory first. To deal with this, the handling
>>> of mem= is reimplemented to clip top down, but take special care not to
>>> clip memory that covers the kernel image.
>>>
>>> Since mem= should not be considered a production feature, a panic
>>> notifier
>>> handler is installed that dumps the memory limit at panic time if one
>>> was
>>> set.
>>
>> Good idea!
>>
>> It would be great if we could follow up with a sizes.h update for SZ_4G,
>> though that's only a nice-to-have, and in no way should block this.
>>
>> Other than that, this looks good. Thanks for putting this together!
>>
>> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>>
>> For the Documentation/arm64 parts we'll need to ask Fu Wei to update the
>> zh_CN/ translation to match.
>
> Great thanks for your info
> Yes, I will working on it

The zh_CN patch has been prepared; once the English version is merged
into mainline, I will send it upstream immediately, because there is
another zh_CN patch for booting.txt already in the upstream process:
https://lkml.org/lkml/2016/2/16/164

Sorry for the delay. :-)

>
>>
>> Mark.
>>
>>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> ---
>>>   Documentation/arm64/booting.txt         |  20 ++--
>>>   arch/arm64/include/asm/boot.h           |   6 ++
>>>   arch/arm64/include/asm/kernel-pgtable.h |  12 +++
>>>   arch/arm64/include/asm/kvm_asm.h        |   2 +-
>>>   arch/arm64/include/asm/memory.h         |  15 +--
>>>   arch/arm64/kernel/head.S                |   6 +-
>>>   arch/arm64/kernel/image.h               |  13 ++-
>>>   arch/arm64/mm/init.c                    | 100 +++++++++++++++++++-
>>>   arch/arm64/mm/mmu.c                     |   3 +
>>>   9 files changed, 155 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/Documentation/arm64/booting.txt
>>> b/Documentation/arm64/booting.txt
>>> index 701d39d3171a..56d6d8b796db 100644
>>> --- a/Documentation/arm64/booting.txt
>>> +++ b/Documentation/arm64/booting.txt
>>> @@ -109,7 +109,13 @@ Header notes:
>>>               1 - 4K
>>>               2 - 16K
>>>               3 - 64K
>>> -  Bits 3-63:    Reserved.
>>> +  Bit 3:    Kernel physical placement
>>> +            0 - 2MB aligned base should be as close as possible
>>> +                to the base of DRAM, since memory below it is not
>>> +                accessible via the linear mapping
>>> +            1 - 2MB aligned base may be anywhere in physical
>>> +                memory
>>> +  Bits 4-63:    Reserved.
>>>
>>>   - When image_size is zero, a bootloader should attempt to keep as much
>>>     memory as possible free for use by the kernel immediately after the
>>> @@ -117,14 +123,14 @@ Header notes:
>>>     depending on selected features, and is effectively unbound.
>>>
>>>   The Image must be placed text_offset bytes from a 2MB aligned base
>>> -address near the start of usable system RAM and called there. Memory
>>> -below that base address is currently unusable by Linux, and
>>> therefore it
>>> -is strongly recommended that this location is the start of system RAM.
>>> -The region between the 2 MB aligned base address and the start of the
>>> -image has no special significance to the kernel, and may be used for
>>> -other purposes.
>>> +address anywhere in usable system RAM and called there. The region
>>> +between the 2 MB aligned base address and the start of the image has no
>>> +special significance to the kernel, and may be used for other purposes.
>>>   At least image_size bytes from the start of the image must be free for
>>>   use by the kernel.
>>> +NOTE: versions prior to v4.6 cannot make use of memory below the
>>> +physical offset of the Image so it is recommended that the Image be
>>> +placed as close as possible to the start of system RAM.
>>>
>>>   Any memory described to the kernel (even that below the start of the
>>>   image) which is not marked as reserved from the kernel (e.g., with a
>>> diff --git a/arch/arm64/include/asm/boot.h
>>> b/arch/arm64/include/asm/boot.h
>>> index 81151b67b26b..ebf2481889c3 100644
>>> --- a/arch/arm64/include/asm/boot.h
>>> +++ b/arch/arm64/include/asm/boot.h
>>> @@ -11,4 +11,10 @@
>>>   #define MIN_FDT_ALIGN        8
>>>   #define MAX_FDT_SIZE        SZ_2M
>>>
>>> +/*
>>> + * arm64 requires the kernel image to placed
>>> + * TEXT_OFFSET bytes beyond a 2 MB aligned base
>>> + */
>>> +#define MIN_KIMG_ALIGN        SZ_2M
>>> +
>>>   #endif
>>> diff --git a/arch/arm64/include/asm/kernel-pgtable.h
>>> b/arch/arm64/include/asm/kernel-pgtable.h
>>> index a459714ee29e..5c6375d8528b 100644
>>> --- a/arch/arm64/include/asm/kernel-pgtable.h
>>> +++ b/arch/arm64/include/asm/kernel-pgtable.h
>>> @@ -79,5 +79,17 @@
>>>   #define SWAPPER_MM_MMUFLAGS    (PTE_ATTRINDX(MT_NORMAL) |
>>> SWAPPER_PTE_FLAGS)
>>>   #endif
>>>
>>> +/*
>>> + * To make optimal use of block mappings when laying out the linear
>>> + * mapping, round down the base of physical memory to a size that can
>>> + * be mapped efficiently, i.e., either PUD_SIZE (4k granule) or
>>> PMD_SIZE
>>> + * (64k granule), or a multiple that can be mapped using contiguous
>>> bits
>>> + * in the page tables: 32 * PMD_SIZE (16k granule)
>>> + */
>>> +#ifdef CONFIG_ARM64_64K_PAGES
>>> +#define ARM64_MEMSTART_ALIGN    SZ_512M
>>> +#else
>>> +#define ARM64_MEMSTART_ALIGN    SZ_1G
>>> +#endif
>>>
>>>   #endif    /* __ASM_KERNEL_PGTABLE_H */
>>> diff --git a/arch/arm64/include/asm/kvm_asm.h
>>> b/arch/arm64/include/asm/kvm_asm.h
>>> index f5aee6e764e6..054ac25e7c2e 100644
>>> --- a/arch/arm64/include/asm/kvm_asm.h
>>> +++ b/arch/arm64/include/asm/kvm_asm.h
>>> @@ -26,7 +26,7 @@
>>>   #define KVM_ARM64_DEBUG_DIRTY_SHIFT    0
>>>   #define KVM_ARM64_DEBUG_DIRTY        (1 <<
>>> KVM_ARM64_DEBUG_DIRTY_SHIFT)
>>>
>>> -#define kvm_ksym_ref(sym)        ((void *)&sym - KIMAGE_VADDR +
>>> PAGE_OFFSET)
>>> +#define kvm_ksym_ref(sym)        phys_to_virt((u64)&sym -
>>> kimage_voffset)
>>>
>>>   #ifndef __ASSEMBLY__
>>>   struct kvm;
>>> diff --git a/arch/arm64/include/asm/memory.h
>>> b/arch/arm64/include/asm/memory.h
>>> index 4388651d1f0d..61005e7dd6cb 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -88,10 +88,10 @@
>>>   #define __virt_to_phys(x) ({                        \
>>>       phys_addr_t __x = (phys_addr_t)(x);                \
>>>       __x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :    \
>>> -                 (__x - KIMAGE_VADDR + PHYS_OFFSET); })
>>> +                 (__x - kimage_voffset); })
>>>
>>>   #define __phys_to_virt(x)    ((unsigned long)((x) - PHYS_OFFSET +
>>> PAGE_OFFSET))
>>> -#define __phys_to_kimg(x)    ((unsigned long)((x) - PHYS_OFFSET +
>>> KIMAGE_VADDR))
>>> +#define __phys_to_kimg(x)    ((unsigned long)((x) + kimage_voffset))
>>>
>>>   /*
>>>    * Convert a page to/from a physical address
>>> @@ -127,13 +127,14 @@ extern phys_addr_t        memstart_addr;
>>>   /* PHYS_OFFSET - the physical address of the start of memory. */
>>>   #define PHYS_OFFSET        ({ memstart_addr; })
>>>
>>> +/* the offset between the kernel virtual and physical mappings */
>>> +extern u64            kimage_voffset;
>>> +
>>>   /*
>>> - * The maximum physical address that the linear direct mapping
>>> - * of system RAM can cover. (PAGE_OFFSET can be interpreted as
>>> - * a 2's complement signed quantity and negated to derive the
>>> - * maximum size of the linear mapping.)
>>> + * Allow all memory at the discovery stage. We will clip it later.
>>>    */
>>> -#define MAX_MEMBLOCK_ADDR    ({ memstart_addr - PAGE_OFFSET - 1; })
>>> +#define MIN_MEMBLOCK_ADDR    0
>>> +#define MAX_MEMBLOCK_ADDR    U64_MAX
>>>
>>>   /*
>>>    * PFNs are used to describe any physical page; this means
>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>> index 04d38a058b19..05b98289093e 100644
>>> --- a/arch/arm64/kernel/head.S
>>> +++ b/arch/arm64/kernel/head.S
>>> @@ -428,7 +428,11 @@ __mmap_switched:
>>>       and    x4, x4, #~(THREAD_SIZE - 1)
>>>       msr    sp_el0, x4            // Save thread_info
>>>       str_l    x21, __fdt_pointer, x5        // Save FDT pointer
>>> -    str_l    x24, memstart_addr, x6        // Save PHYS_OFFSET
>>> +
>>> +    ldr    x4, =KIMAGE_VADDR        // Save the offset between
>>> +    sub    x4, x4, x24            // the kernel virtual and
>>> +    str_l    x4, kimage_voffset, x5        // physical mappings
>>> +
>>>       mov    x29, #0
>>>   #ifdef CONFIG_KASAN
>>>       bl    kasan_early_init
>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>>> index 999633bd7294..c9c62cab25a4 100644
>>> --- a/arch/arm64/kernel/image.h
>>> +++ b/arch/arm64/kernel/image.h
>>> @@ -42,15 +42,18 @@
>>>   #endif
>>>
>>>   #ifdef CONFIG_CPU_BIG_ENDIAN
>>> -#define __HEAD_FLAG_BE    1
>>> +#define __HEAD_FLAG_BE        1
>>>   #else
>>> -#define __HEAD_FLAG_BE    0
>>> +#define __HEAD_FLAG_BE        0
>>>   #endif
>>>
>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>> +#define __HEAD_FLAG_PAGE_SIZE    ((PAGE_SHIFT - 10) / 2)
>>>
>>> -#define __HEAD_FLAGS    ((__HEAD_FLAG_BE << 0) |    \
>>> -             (__HEAD_FLAG_PAGE_SIZE << 1))
>>> +#define __HEAD_FLAG_PHYS_BASE    1
>>> +
>>> +#define __HEAD_FLAGS        ((__HEAD_FLAG_BE << 0) |    \
>>> +                 (__HEAD_FLAG_PAGE_SIZE << 1) |    \
>>> +                 (__HEAD_FLAG_PHYS_BASE << 3))
>>>
>>>   /*
>>>    * These will output as part of the Image header, which should be
>>> little-endian
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 1d627cd8121c..e8e853a1024c 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -35,8 +35,10 @@
>>>   #include <linux/efi.h>
>>>   #include <linux/swiotlb.h>
>>>
>>> +#include <asm/boot.h>
>>>   #include <asm/fixmap.h>
>>>   #include <asm/kasan.h>
>>> +#include <asm/kernel-pgtable.h>
>>>   #include <asm/memory.h>
>>>   #include <asm/sections.h>
>>>   #include <asm/setup.h>
>>> @@ -158,9 +160,80 @@ static int __init early_mem(char *p)
>>>   }
>>>   early_param("mem", early_mem);
>>>
>>> +/*
>>> + * clip_mem_range() - remove memblock memory between @min and @max
>>> until
>>> + *                    we meet the limit in 'memory_limit'.
>>> + */
>>> +static void __init clip_mem_range(u64 min, u64 max)
>>> +{
>>> +    u64 mem_size, to_remove;
>>> +    int i;
>>> +
>>> +again:
>>> +    mem_size = memblock_phys_mem_size();
>>> +    if (mem_size <= memory_limit || max <= min)
>>> +        return;
>>> +
>>> +    to_remove = mem_size - memory_limit;
>>> +
>>> +    for (i = memblock.memory.cnt - 1; i >= 0; i--) {
>>> +        struct memblock_region *r = memblock.memory.regions + i;
>>> +        u64 start = max(min, r->base);
>>> +        u64 end = min(max, r->base + r->size);
>>> +
>>> +        if (start >= max || end <= min)
>>> +            continue;
>>> +
>>> +        if (end > min) {
>>> +            u64 size = min(to_remove, end - max(start, min));
>>> +
>>> +            memblock_remove(end - size, size);
>>> +        } else {
>>> +            memblock_remove(start, min(max - start, to_remove));
>>> +        }
>>> +        goto again;
>>> +    }
>>> +}
>>> +
>>>   void __init arm64_memblock_init(void)
>>>   {
>>> -    memblock_enforce_memory_limit(memory_limit);
>>> +    const s64 linear_region_size = -(s64)PAGE_OFFSET;
>>> +
>>> +    /*
>>> +     * Select a suitable value for the base of physical memory.
>>> +     */
>>> +    memstart_addr = round_down(memblock_start_of_DRAM(),
>>> +                   ARM64_MEMSTART_ALIGN);
>>> +
>>> +    /*
>>> +     * Remove the memory that we will not be able to cover with the
>>> +     * linear mapping. Take care not to clip the kernel which may be
>>> +     * high in memory.
>>> +     */
>>> +    memblock_remove(max(memstart_addr + linear_region_size,
>>> __pa(_end)),
>>> +            ULLONG_MAX);
>>> +    if (memblock_end_of_DRAM() > linear_region_size)
>>> +        memblock_remove(0, memblock_end_of_DRAM() -
>>> linear_region_size);
>>> +
>>> +    if (memory_limit != (phys_addr_t)ULLONG_MAX) {
>>> +        u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
>>> +        u64 kend = PAGE_ALIGN(__pa(_end));
>>> +        u64 const sz_4g = 0x100000000UL;
>>> +
>>> +        /*
>>> +         * Clip memory in order of preference:
>>> +         * - above the kernel and above 4 GB
>>> +         * - between 4 GB and the start of the kernel (if the kernel
>>> +         *   is loaded high in memory)
>>> +         * - between the kernel and 4 GB (if the kernel is loaded
>>> +         *   low in memory)
>>> +         * - below 4 GB
>>> +         */
>>> +        clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
>>> +        clip_mem_range(sz_4g, kbase);
>>> +        clip_mem_range(kend, sz_4g);
>>> +        clip_mem_range(0, min(kbase, sz_4g));
>>> +    }
>>>
>>>       /*
>>>        * Register the kernel text, kernel data, initrd, and initial
>>> @@ -381,3 +454,28 @@ static int __init keepinitrd_setup(char *__unused)
>>>
>>>   __setup("keepinitrd", keepinitrd_setup);
>>>   #endif
>>> +
>>> +/*
>>> + * Dump out memory limit information on panic.
>>> + */
>>> +static int dump_mem_limit(struct notifier_block *self, unsigned long
>>> v, void *p)
>>> +{
>>> +    if (memory_limit != (phys_addr_t)ULLONG_MAX) {
>>> +        pr_emerg("Memory Limit: %llu MB\n", memory_limit >> 20);
>>> +    } else {
>>> +        pr_emerg("Memory Limit: none\n");
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +static struct notifier_block mem_limit_notifier = {
>>> +    .notifier_call = dump_mem_limit,
>>> +};
>>> +
>>> +static int __init register_mem_limit_dumper(void)
>>> +{
>>> +    atomic_notifier_chain_register(&panic_notifier_list,
>>> +                       &mem_limit_notifier);
>>> +    return 0;
>>> +}
>>> +__initcall(register_mem_limit_dumper);
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 4c4b15932963..8dda38378959 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -46,6 +46,9 @@
>>>
>>>   u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>>
>>> +u64 kimage_voffset __read_mostly;
>>> +EXPORT_SYMBOL(kimage_voffset);
>>> +
>>>   /*
>>>    * Empty_zero_page is a special page that is used for
>>> zero-initialized data
>>>    * and COW.
>>> --
>>> 2.5.0
>>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-15 18:59                   ` Catalin Marinas
@ 2016-02-16 12:59                     ` Andrey Ryabinin
  2016-02-16 14:12                       ` Mark Rutland
                                         ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-16 12:59 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/15/2016 09:59 PM, Catalin Marinas wrote:
> On Mon, Feb 15, 2016 at 05:28:02PM +0300, Andrey Ryabinin wrote:
>> On 02/12/2016 07:06 PM, Catalin Marinas wrote:
>>> So far, we have:
>>>
>>> KASAN+for-next/kernmap goes wrong
>>> KASAN+UBSAN goes wrong
>>>
>>> Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
>>> have to trim for-next/core down until we figure out where the problem
>>> is.
>>>
>>> BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
>>
>> Can it be related to TLB conflicts, which supposed to be fixed in
>> "arm64: kasan: avoid TLB conflicts" patch from "arm64: mm: rework page
>> table creation" series ?
> 
> I can very easily reproduce this with a vanilla 4.5-rc1 series by
> enabling inline instrumentation (maybe Mark's theory is true w.r.t.
> image size).
> 
> Some information, maybe you can shed some light on this. It seems to
> happen only for secondary CPUs on the swapper stack (I think allocated
> via fork_idle()). The code generated looks sane to me, so KASAN should
> not complain but maybe there is some uninitialised shadow, hence the
> error.
> 
> The report:
>

Actually, the first report is a bit more useful. It shows that shadow memory was corrupted:

  ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
> ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
                      ^
F1 - left redzone, it marks the start of a stack frame
F3 - right redzone, it should mark the end of the stack frame.
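
(For reference, these are the stack poison values defined in
mm/kasan/kasan.h:)

#define KASAN_STACK_LEFT	0xF1
#define KASAN_STACK_MID		0xF2
#define KASAN_STACK_RIGHT	0xF3
#define KASAN_STACK_PARTIAL	0xF4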

But here we have a second set of F1s without the F3s that should close the first set of F1s.
Also, those two F3s in the middle cannot be right.

So shadow is corrupted.
Some hypotheses:

1) We share a stack between several tasks (e.g. stack overflow, a somehow-corrupted SP).
    But this would probably cause a kernel crash later, after the kasan reports.

2) Shadow memory wasn't cleared. GCC poisons memory on function entry and unpoisons it before return.
     If we exit from a function in some tricky way, e.g. via some hand-written assembly return code,
     this could cause false positives like that (see the sketch after this list).

3) A screwed-up shadow mapping. I think the patch further below should uncover such a problem.
It is boot-tested on qemu and didn't show any problems.
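
To make hypothesis 2 concrete, the stack instrumentation amounts to
roughly this (a hand-written sketch, not actual GCC output; the shadow
offset constant is the one visible in the disassembly earlier in the
thread). If a function is left via a path that never runs the epilogue,
the f1/f3 bytes stay behind in the shadow and the next user of that
stack region gets exactly this kind of report:

static void instrumented_function(void)	/* illustration only */
{
	unsigned long frame[8];		/* redzones + the real local live here */
	unsigned char *shadow = (unsigned char *)
			(((unsigned long)frame >> 3) + 0xdfffff9000000000UL);

	/* prologue: poison the redzones, unpoison the local itself */
	shadow[0] = shadow[1] = shadow[2] = shadow[3] = 0xf1;	/* left redzone */
	shadow[4] = 0;						/* the local */
	shadow[5] = shadow[6] = shadow[7] = 0xf3;		/* right redzone */

	/* ... body: accesses to the local are checked against the shadow ... */

	/* epilogue: clear the poison again before returning */
	__builtin_memset(shadow, 0, sizeof(frame) / 8);
}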


---
 arch/arm64/mm/kasan_init.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index cf038c7..25d685c 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -117,6 +117,59 @@ static void __init cpu_set_ttbr1(unsigned long ttbr1)
 	: "r" (ttbr1));
 }
 
+static void verify_shadow(void)
+{
+	struct memblock_region *reg;
+	int i = 0;
+
+	for_each_memblock(memory, reg) {
+		void *start = (void *)__phys_to_virt(reg->base);
+		void *end = (void *)__phys_to_virt(reg->base + reg->size);
+		int *shadow_start, *shadow_end;
+
+		if (start >= end)
+			break;
+		shadow_start = (int *)((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
+		shadow_end =  (int *)kasan_mem_to_shadow(end);
+		for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(int)) {
+			*shadow_start = i;
+			i++;
+		}
+	}
+
+	i = 0;
+	for_each_memblock(memory, reg) {
+		void *start = (void *)__phys_to_virt(reg->base);
+		void *end = (void *)__phys_to_virt(reg->base + reg->size);
+		int *shadow_start, *shadow_end;
+
+		if (start >= end)
+			break;
+		shadow_start = (int *)((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
+		shadow_end =  (int *)kasan_mem_to_shadow(end);
+		for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(int)) {
+			if (*shadow_start != i) {
+				pr_err("screwed shadow mapping %d, %d\n", *shadow_start, i);
+				goto clear;
+			}
+			i++;
+		}
+	}
+clear:
+	for_each_memblock(memory, reg) {
+		void *start = (void *)__phys_to_virt(reg->base);
+		void *end = (void *)__phys_to_virt(reg->base + reg->size);
+		unsigned long shadow_start, shadow_end;
+
+		if (start >= end)
+			break;
+		shadow_start =  ((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
+		shadow_end =  (unsigned long)kasan_mem_to_shadow(end);
+		memset((void *)shadow_start, 0, shadow_end - shadow_start);
+	}
+
+}
+
 void __init kasan_init(void)
 {
 	struct memblock_region *reg;
@@ -159,6 +212,8 @@ void __init kasan_init(void)
 	cpu_set_ttbr1(__pa(swapper_pg_dir));
 	flush_tlb_all();
 
+	verify_shadow();
+
 	/* At this point kasan is fully initialized. Enable error messages */
 	init_task.kasan_depth = 0;
 	pr_info("KernelAddressSanitizer initialized\n");
-- 

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-16 12:59                     ` Andrey Ryabinin
@ 2016-02-16 14:12                       ` Mark Rutland
  2016-02-16 14:29                         ` Mark Rutland
  2016-02-16 15:17                       ` Ard Biesheuvel
  2016-02-17 14:39                       ` Mark Rutland
  2 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2016-02-16 14:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 16, 2016 at 03:59:09PM +0300, Andrey Ryabinin wrote:
> 
> On 02/15/2016 09:59 PM, Catalin Marinas wrote:
> > On Mon, Feb 15, 2016 at 05:28:02PM +0300, Andrey Ryabinin wrote:
> >> On 02/12/2016 07:06 PM, Catalin Marinas wrote:
> >>> So far, we have:
> >>>
> >>> KASAN+for-next/kernmap goes wrong
> >>> KASAN+UBSAN goes wrong
> >>>
> >>> Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
> >>> have to trim for-next/core down until we figure out where the problem
> >>> is.
> >>>
> >>> BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
> >>
> >> Can it be related to TLB conflicts, which supposed to be fixed in
> >> "arm64: kasan: avoid TLB conflicts" patch from "arm64: mm: rework page
> >> table creation" series ?
> > 
> > I can very easily reproduce this with a vanilla 4.5-rc1 series by
> > enabling inline instrumentation (maybe Mark's theory is true w.r.t.
> > image size).
> > 
> > Some information, maybe you can shed some light on this. It seems to
> > happen only for secondary CPUs on the swapper stack (I think allocated
> > via fork_idle()). The code generated looks sane to me, so KASAN should
> > not complain but maybe there is some uninitialised shadow, hence the
> > error.
> > 
> > The report:
> >
> 
> Actually, the first report is a bit more useful. It shows that shadow memory was corrupted:
> 
>   ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
> > ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
>                       ^
> F1 - left redzone, it indicates start of stack frame
> F3 - right redzone, it should be the end of stack frame.
> 
> But here we have the second set of F1s without F3s which should close the first set of F1s.
> Also those two F3s in the middle cannot be right.
> 
> So shadow is corrupted.
> Some hypotheses:
> 
> 1) We share stack between several tasks (e.g. stack overflow, somehow corrupted SP).
>     But this probably should cause kernel crash later, after kasan reports.
> 
> 2) Shadow memory wasn't cleared. GCC poison memory on function entrance and unpoisons it before return.
>      If we use some tricky way to exit from function this could cause false-positives like that.
>      E.g. some hand-written assembly return code.
> 
> 3) Screwed shadow mapping. I think the patch below should uncover such problem.
> It boot-tested on qemu and didn't show any problem

With that patch applied I get:

[    0.000000] kasan: screwed shadow mapping 62184, 62182
[    0.000000] kasan: KernelAddressSanitizer initialized

I'm using v4.5-rc1 with KASAN_INLINE and a random collection of debug options
to bloat the kernel, per the earlier theory that the text size had something to
do with the issue.

Later in the boot process I see lots of failures like:

[   13.292190] ==================================================================
[   13.299543] BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x1950/0x19b8 at addr ffffffc936ad3c8c
[   13.309090] Read of size 4 by task swapper/3/0
[   13.313575] page:ffffffbde6dab4c0 count:0 mapcount:0 mapping:          (null) index:0x0
[   13.321657] flags: 0x4000000000000000()
[   13.325539] page dumped because: kasan: bad access detected
[   13.331150] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc1+ #19
[   13.337528] Hardware name: ARM Juno development board (r1) (DT)
[   13.343471] Call trace:
[   13.345978] [<ffffffc000091400>] dump_backtrace+0x0/0x3c0
[   13.351416] [<ffffffc0000917e4>] show_stack+0x24/0x30
[   13.356507] [<ffffffc0008c3a64>] dump_stack+0xc4/0x150
[   13.361685] [<ffffffc0004032bc>] kasan_report_error+0x52c/0x558
[   13.367640] [<ffffffc0004033fc>] __asan_report_load4_noabort+0x54/0x60
[   13.374200] [<ffffffc0001a46e8>] find_busiest_group+0x1950/0x19b8
[   13.380327] [<ffffffc0001a49ec>] load_balance+0x29c/0x19e0
[   13.385851] [<ffffffc0001a67c0>] pick_next_task_fair+0x690/0xd88
[   13.391896] [<ffffffc001213cf4>] __schedule+0x85c/0x13c8
[   13.397248] [<ffffffc001214d7c>] schedule+0xe4/0x228
[   13.402256] [<ffffffc00121549c>] schedule_preempt_disabled+0x24/0xb8
[   13.408642] [<ffffffc0001b97f8>] cpu_startup_entry+0x188/0x738
[   13.414511] [<ffffffc00009bcfc>] secondary_start_kernel+0x244/0x2b8
[   13.420806] [<0000000080082efc>] 0x80082efc
[   13.425023] Memory state around the buggy address:
[   13.429854]  ffffffc936ad3b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   13.437153]  ffffffc936ad3c00: 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f3 f3
[   13.444451] >ffffffc936ad3c80: f3 f3 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3
[   13.451742]                       ^
[   13.455274]  ffffffc936ad3d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   13.462572]  ffffffc936ad3d80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[   13.469863] ==================================================================

I guess the memory layout has something to do with this. FWIW, on this board my
memory map comes from EFI:

[    0.000000] Processing EFI memory map:
[    0.000000]   0x000008000000-0x00000bffffff [Memory Mapped I/O  |RUN|  |XP|  |  |  |   |  |  |  |UC]
[    0.000000]   0x00001c170000-0x00001c170fff [Memory Mapped I/O  |RUN|  |XP|  |  |  |   |  |  |  |UC]
[    0.000000]   0x000080000000-0x00008000ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x000080010000-0x00008007ffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x000080080000-0x000081dbffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x000081dc0000-0x00009fdfffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x00009fe00000-0x00009fe0ffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x00009fe10000-0x0000dfffffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000e00f0000-0x0000f5a58fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000f5a59000-0x0000f7793fff [Loader Code        |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000f7794000-0x0000f9431fff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000f9432000-0x0000f944ffff [Loader Code        |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000f9450000-0x0000f945ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9460000-0x0000f94dffff [ACPI Reclaim Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f94e0000-0x0000f94effff [ACPI Memory NVS    |   |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f94f0000-0x0000f94fffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9500000-0x0000f950ffff [Runtime Code       |RUN|  |  |  |  |RO|   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9510000-0x0000f953ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9540000-0x0000f954ffff [Runtime Code       |RUN|  |  |  |  |RO|   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9550000-0x0000f956ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9570000-0x0000f958ffff [ACPI Reclaim Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9590000-0x0000f960ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9610000-0x0000f961ffff [Runtime Code       |RUN|  |  |  |  |RO|   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9620000-0x0000f96effff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f96f0000-0x0000f96fffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9700000-0x0000f970ffff [Runtime Code       |RUN|  |  |  |  |RO|   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9710000-0x0000f974ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9750000-0x0000f975ffff [Runtime Code       |RUN|  |  |  |  |RO|   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9760000-0x0000f97cffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f97d0000-0x0000f97dffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f97e0000-0x0000f97effff [Runtime Code       |RUN|  |  |  |  |RO|   |WB|WT|WC|UC]*
[    0.000000]   0x0000f97f0000-0x0000f981ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f9820000-0x0000f9820fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000f9821000-0x0000f9827fff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000f9828000-0x0000f982bfff [Reserved           |   |  |  |  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000f982c000-0x0000fdaedfff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fdaee000-0x0000fdfbefff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fdfbf000-0x0000fdfbffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fdfc0000-0x0000fdffbfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fdffc000-0x0000fe018fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe019000-0x0000fe020fff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe021000-0x0000fe022fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe023000-0x0000fe02bfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe02c000-0x0000fe03afff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe03b000-0x0000fe03dfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe03e000-0x0000fe04efff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe04f000-0x0000fe057fff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe058000-0x0000fe073fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe074000-0x0000fe074fff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe075000-0x0000fe078fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe079000-0x0000fe07bfff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe07c000-0x0000fe07dfff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe07e000-0x0000fe085fff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe086000-0x0000fe087fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe088000-0x0000fe171fff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe172000-0x0000fe198fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe199000-0x0000fe65ffff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe660000-0x0000fe6a2fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe6a3000-0x0000fe7effff [Boot Code          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe7f0000-0x0000fe7fffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000fe800000-0x0000fe80ffff [Runtime Code       |RUN|  |  |  |  |RO|   |WB|WT|WC|UC]*
[    0.000000]   0x0000fe810000-0x0000fe82ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000fe830000-0x0000fe83ffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe840000-0x0000fe88ffff [Runtime Data       |RUN|  |XP|  |  |  |   |WB|WT|WC|UC]*
[    0.000000]   0x0000fe890000-0x0000fe891fff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x0000fe892000-0x0000feffffff [Boot Data          |   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x000880000000-0x00099bffffff [Conventional Memory|   |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000]   0x00099c000000-0x0009ffffffff [Loader Data        |   |  |  |  |  |  |   |WB|WT|WC|UC]

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-16 14:12                       ` Mark Rutland
@ 2016-02-16 14:29                         ` Mark Rutland
  0 siblings, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-16 14:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 16, 2016 at 02:12:59PM +0000, Mark Rutland wrote:
> On Tue, Feb 16, 2016 at 03:59:09PM +0300, Andrey Ryabinin wrote:
> > So shadow is corrupted.
> > Some hypotheses:
> > 
> > 1) We share stack between several tasks (e.g. stack overflow, somehow corrupted SP).
> >     But this probably should cause kernel crash later, after kasan reports.
> > 
> > 2) Shadow memory wasn't cleared. GCC poison memory on function entrance and unpoisons it before return.
> >      If we use some tricky way to exit from function this could cause false-positives like that.
> >      E.g. some hand-written assembly return code.
> > 
> > 3) Screwed shadow mapping. I think the patch below should uncover such problem.
> > It boot-tested on qemu and didn't show any problem
> 
> With that path applied I get:
> 
> [    0.000000] kasan: screwed shadow mapping 62184, 62182
> [    0.000000] kasan: KernelAddressSanitizer initialized
> 
> I'm using v4.5-rc1 with KASAN_INLINE, and a random collection of debug options
> to bloat the kernel per prior theory that the text size had somethign to do
> with the issue.

I hacked kasan_init to dump info as it created each shadow region:

[    0.000000] kasan_init shadowing [ffffffc000000000-ffffffc060000000] @ [ffffff8800000000-ffffff880c000001] nid 0
[    0.000000] kasan_init shadowing [ffffffc0600f0000-ffffffc079450000] @ [ffffff880c01e000-ffffff880f28a001] nid 0
[    0.000000] kasan_init shadowing [ffffffc079450000-ffffffc079820000] @ [ffffff880f28a000-ffffff880f304001] nid 0
[    0.000000] kasan_init shadowing [ffffffc079820000-ffffffc079821000] @ [ffffff880f304000-ffffff880f304201] nid 0
[    0.000000] kasan_init shadowing [ffffffc079821000-ffffffc079822000] @ [ffffff880f304200-ffffff880f304401] nid 0
[    0.000000] kasan_init shadowing [ffffffc079822000-ffffffc079828000] @ [ffffff880f304400-ffffff880f305001] nid 0
[    0.000000] kasan_init shadowing [ffffffc079828000-ffffffc07982c000] @ [ffffff880f305000-ffffff880f305801] nid 0
[    0.000000] kasan_init shadowing [ffffffc07982c000-ffffffc07e7f0000] @ [ffffff880f305800-ffffff880fcfe001] nid 0
[    0.000000] kasan_init shadowing [ffffffc07e7f0000-ffffffc07e830000] @ [ffffff880fcfe000-ffffff880fd06001] nid 0
[    0.000000] kasan_init shadowing [ffffffc07e830000-ffffffc07e840000] @ [ffffff880fd06000-ffffff880fd08001] nid 0
[    0.000000] kasan_init shadowing [ffffffc07e840000-ffffffc07e890000] @ [ffffff880fd08000-ffffff880fd12001] nid 0
[    0.000000] kasan_init shadowing [ffffffc07e890000-ffffffc07f000000] @ [ffffff880fd12000-ffffff880fe00001] nid 0
[    0.000000] kasan_init shadowing [ffffffc800000000-ffffffc980000000] @ [ffffff8900000000-ffffff8930000001] nid 0
[    0.000000] kasan: screwed shadow mapping 62184, 62182
[    0.000000] kasan: KernelAddressSanitizer initialized

I note that the end of each shadow region overlaps the beginning of the next,
due to the intentional end+1...

Other than the waste of memory (and the TLB conflict that gets solved by my
pgtable rework), I'm not sure that's a problem, though.
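
For completeness, the overlap is just the shadow arithmetic: a quick
userspace calculation over two of the adjacent regions in the dump above
reproduces the numbers (shadow offset and the >>3 scaling as in this
configuration; the helper and region values are purely illustrative):

#include <stdio.h>

#define KASAN_SHADOW_OFFSET	0xdfffff9000000000UL

/* mirrors kasan_mem_to_shadow(): 8 bytes of memory per shadow byte */
static unsigned long mem_to_shadow(unsigned long addr)
{
	return (addr >> 3) + KASAN_SHADOW_OFFSET;
}

int main(void)
{
	/* [ffffffc079820000-ffffffc079821000] and [..79821000-..79822000] */
	unsigned long r1s = 0xffffffc079820000UL, r1e = 0xffffffc079821000UL;
	unsigned long r2s = 0xffffffc079821000UL, r2e = 0xffffffc079822000UL;

	printf("region 1 shadow: %lx-%lx\n", mem_to_shadow(r1s), mem_to_shadow(r1e) + 1);
	printf("region 2 shadow: %lx-%lx\n", mem_to_shadow(r2s), mem_to_shadow(r2e) + 1);
	/* both ranges include addresses in the same shadow page, so the
	 * per-region mapping calls each cover that page */
	return 0;
}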

Mark.

> I guess memroy layout has something to do with this. FWIW on this board my
> memory map comes from EFI:
> 
> [...]
> 
> Thanks,
> Mark.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-16 12:59                     ` Andrey Ryabinin
  2016-02-16 14:12                       ` Mark Rutland
@ 2016-02-16 15:17                       ` Ard Biesheuvel
  2016-02-16 15:36                         ` Andrey Ryabinin
  2016-02-17 14:39                       ` Mark Rutland
  2 siblings, 1 reply; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-16 15:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 16 February 2016 at 13:59, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
>
> On 02/15/2016 09:59 PM, Catalin Marinas wrote:
>> On Mon, Feb 15, 2016 at 05:28:02PM +0300, Andrey Ryabinin wrote:
>>> On 02/12/2016 07:06 PM, Catalin Marinas wrote:
>>>> So far, we have:
>>>>
>>>> KASAN+for-next/kernmap goes wrong
>>>> KASAN+UBSAN goes wrong
>>>>
>>>> Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
>>>> have to trim for-next/core down until we figure out where the problem
>>>> is.
>>>>
>>>> BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
>>>
>>> Can it be related to TLB conflicts, which supposed to be fixed in
>>> "arm64: kasan: avoid TLB conflicts" patch from "arm64: mm: rework page
>>> table creation" series ?
>>
>> I can very easily reproduce this with a vanilla 4.5-rc1 series by
>> enabling inline instrumentation (maybe Mark's theory is true w.r.t.
>> image size).
>>
>> Some information, maybe you can shed some light on this. It seems to
>> happen only for secondary CPUs on the swapper stack (I think allocated
>> via fork_idle()). The code generated looks sane to me, so KASAN should
>> not complain but maybe there is some uninitialised shadow, hence the
>> error.
>>
>> The report:
>>
>
> Actually, the first report is a bit more useful. It shows that shadow memory was corrupted:
>
>   ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
>> ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
>                       ^
> F1 - left redzone, it indicates start of stack frame
> F3 - right redzone, it should be the end of stack frame.
>
> But here we have the second set of F1s without F3s which should close the first set of F1s.
> Also those two F3s in the middle cannot be right.
>
> So shadow is corrupted.
> Some hypotheses:
>
> 1) We share stack between several tasks (e.g. stack overflow, somehow corrupted SP).
>     But this probably should cause kernel crash later, after kasan reports.
>
> 2) Shadow memory wasn't cleared. GCC poison memory on function entrance and unpoisons it before return.
>      If we use some tricky way to exit from function this could cause false-positives like that.
>      E.g. some hand-written assembly return code.
>
> 3) Screwed shadow mapping. I think the patch below should uncover such problem.
> It boot-tested on qemu and didn't show any problem
>

I think this patch gives false positive warnings in some cases:

>
> ---
>  arch/arm64/mm/kasan_init.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
>
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index cf038c7..25d685c 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -117,6 +117,59 @@ static void __init cpu_set_ttbr1(unsigned long ttbr1)
>         : "r" (ttbr1));
>  }
>
> +static void verify_shadow(void)
> +{
> +       struct memblock_region *reg;
> +       int i = 0;
> +
> +       for_each_memblock(memory, reg) {
> +               void *start = (void *)__phys_to_virt(reg->base);
> +               void *end = (void *)__phys_to_virt(reg->base + reg->size);
> +               int *shadow_start, *shadow_end;
> +
> +               if (start >= end)
> +                       break;
> +               shadow_start = (int *)((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
> +               shadow_end =  (int *)kasan_mem_to_shadow(end);

shadow_start and shadow_end can refer to the same page as in the
previous iteration. For instance, I have these two regions

  0x00006e090000-0x00006e0adfff [Conventional Memory|   |  |  |  |  |
|   |WB|WT|WC|UC]
  0x00006e0ae000-0x00006e0affff [Loader Data        |   |  |  |  |  |
|   |WB|WT|WC|UC]

which are covered by different memblocks since the second one is
marked as MEMBLOCK_NOMAP, due to the fact that it contains the UEFI
memory map.

I get the following output

kasan: screwed shadow mapping 23575, 23573

which I think is simply a result of the fact that shadow_start refers
to the same page as in the previous iteration(s).
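
As a rough standalone illustration (not kernel code: the 1:8 shadow
scaling matches KASAN, but the shadow offset and 4 KiB page size are
made-up values, and the physical addresses are used directly rather
than via __phys_to_virt()), the arithmetic for those two regions works
out like this:

#include <stdint.h>
#include <stdio.h>

#define SHADOW_SCALE_SHIFT	3			/* 1 shadow byte covers 8 bytes of memory */
#define SHADOW_OFFSET		0xdfff200000000000UL	/* illustrative value only */
#define PAGE_MASK_4K		(~0xfffUL)

static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

int main(void)
{
	/* [base, end) for the two adjacent regions quoted above */
	uint64_t r1_base = 0x6e090000, r1_end = 0x6e0ae000;
	uint64_t r2_base = 0x6e0ae000;

	/* what verify_shadow() does: round shadow_start down to a page */
	uint64_t s1_start = mem_to_shadow(r1_base) & PAGE_MASK_4K;
	uint64_t s1_end   = mem_to_shadow(r1_end);
	uint64_t s2_start = mem_to_shadow(r2_base) & PAGE_MASK_4K;

	printf("region 1 shadow: 0x%llx .. 0x%llx\n",
	       (unsigned long long)s1_start, (unsigned long long)s1_end);
	printf("region 2 shadow starts on page 0x%llx\n",
	       (unsigned long long)s2_start);

	/*
	 * Region 1's last shadow page and region 2's first shadow page are
	 * the same page, so the second fill pass overwrites the marker the
	 * first pass wrote there and the check pass misfires.
	 */
	if (s2_start < s1_end)
		printf("-> the two regions share a shadow page\n");
	return 0;
}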


> +               for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(int)) {
> +                       *shadow_start = i;
> +                       i++;
> +               }
> +       }
> +
> +       i = 0;
> +       for_each_memblock(memory, reg) {
> +               void *start = (void *)__phys_to_virt(reg->base);
> +               void *end = (void *)__phys_to_virt(reg->base + reg->size);
> +               int *shadow_start, *shadow_end;
> +
> +               if (start >= end)
> +                       break;
> +               shadow_start = (int *)((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
> +               shadow_end =  (int *)kasan_mem_to_shadow(end);
> +               for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(int)) {
> +                       if (*shadow_start != i) {
> +                               pr_err("screwed shadow mapping %d, %d\n", *shadow_start, i);
> +                               goto clear;
> +                       }
> +                       i++;
> +               }
> +       }
> +clear:
> +       for_each_memblock(memory, reg) {
> +               void *start = (void *)__phys_to_virt(reg->base);
> +               void *end = (void *)__phys_to_virt(reg->base + reg->size);
> +               unsigned long shadow_start, shadow_end;
> +
> +               if (start >= end)
> +                       break;
> +               shadow_start =  ((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
> +               shadow_end =  (unsigned long)kasan_mem_to_shadow(end);
> +               memset((void *)shadow_start, 0, shadow_end - shadow_start);
> +       }
> +
> +}
> +
>  void __init kasan_init(void)
>  {
>         struct memblock_region *reg;
> @@ -159,6 +212,8 @@ void __init kasan_init(void)
>         cpu_set_ttbr1(__pa(swapper_pg_dir));
>         flush_tlb_all();
>
> +       verify_shadow();
> +
>         /* At this point kasan is fully initialized. Enable error messages */
>         init_task.kasan_depth = 0;
>         pr_info("KernelAddressSanitizer initialized\n");
> --
>
>
>
>
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-16 15:17                       ` Ard Biesheuvel
@ 2016-02-16 15:36                         ` Andrey Ryabinin
  2016-02-16 16:42                           ` Mark Rutland
  0 siblings, 1 reply; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-16 15:36 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/16/2016 06:17 PM, Ard Biesheuvel wrote:
> On 16 February 2016 at 13:59, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>>
>>
>> On 02/15/2016 09:59 PM, Catalin Marinas wrote:
>>> On Mon, Feb 15, 2016 at 05:28:02PM +0300, Andrey Ryabinin wrote:
>>>> On 02/12/2016 07:06 PM, Catalin Marinas wrote:
>>>>> So far, we have:
>>>>>
>>>>> KASAN+for-next/kernmap goes wrong
>>>>> KASAN+UBSAN goes wrong
>>>>>
>>>>> Enabled individually, KASAN, UBSAN and for-next/kernmap seem fine. I may
>>>>> have to trim for-next/core down until we figure out where the problem
>>>>> is.
>>>>>
>>>>> BUG: KASAN: stack-out-of-bounds in find_busiest_group+0x164/0x16a0 at addr ffffffc93665bc8c
>>>>
>>>> Can it be related to TLB conflicts, which supposed to be fixed in
>>>> "arm64: kasan: avoid TLB conflicts" patch from "arm64: mm: rework page
>>>> table creation" series ?
>>>
>>> I can very easily reproduce this with a vanilla 4.5-rc1 series by
>>> enabling inline instrumentation (maybe Mark's theory is true w.r.t.
>>> image size).
>>>
>>> Some information, maybe you can shed some light on this. It seems to
>>> happen only for secondary CPUs on the swapper stack (I think allocated
>>> via fork_idle()). The code generated looks sane to me, so KASAN should
>>> not complain but maybe there is some uninitialised shadow, hence the
>>> error.
>>>
>>> The report:
>>>
>>
>> Actually, the first report is a bit more useful. It shows that shadow memory was corrupted:
>>
>>   ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
>>> ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
>>                       ^
>> F1 - left redzone, it indicates start of stack frame
>> F3 - right redzone, it should be the end of stack frame.
>>
>> But here we have the second set of F1s without F3s which should close the first set of F1s.
>> Also those two F3s in the middle cannot be right.
>>
>> So shadow is corrupted.
>> Some hypotheses:
>>
>> 1) We share stack between several tasks (e.g. stack overflow, somehow corrupted SP).
>>     But this probably should cause kernel crash later, after kasan reports.
>>
>> 2) Shadow memory wasn't cleared. GCC poison memory on function entrance and unpoisons it before return.
>>      If we use some tricky way to exit from function this could cause false-positives like that.
>>      E.g. some hand-written assembly return code.
>>
>> 3) Screwed shadow mapping. I think the patch below should uncover such problem.
>> It boot-tested on qemu and didn't show any problem
>>
> 
> I think this patch gives false positive warnings in some cases:
> 
>>
>> ---
>>  arch/arm64/mm/kasan_init.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 55 insertions(+)
>>
>> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
>> index cf038c7..25d685c 100644
>> --- a/arch/arm64/mm/kasan_init.c
>> +++ b/arch/arm64/mm/kasan_init.c
>> @@ -117,6 +117,59 @@ static void __init cpu_set_ttbr1(unsigned long ttbr1)
>>         : "r" (ttbr1));
>>  }
>>
>> +static void verify_shadow(void)
>> +{
>> +       struct memblock_region *reg;
>> +       int i = 0;
>> +
>> +       for_each_memblock(memory, reg) {
>> +               void *start = (void *)__phys_to_virt(reg->base);
>> +               void *end = (void *)__phys_to_virt(reg->base + reg->size);
>> +               int *shadow_start, *shadow_end;
>> +
>> +               if (start >= end)
>> +                       break;
>> +               shadow_start = (int *)((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
>> +               shadow_end =  (int *)kasan_mem_to_shadow(end);
> 
> shadow_start and shadow_end can refer to the same page as in the
> previous iteration. For instance, I have these two regions
> 
>   0x00006e090000-0x00006e0adfff [Conventional Memory|   |  |  |  |  |
> |   |WB|WT|WC|UC]
>   0x00006e0ae000-0x00006e0affff [Loader Data        |   |  |  |  |  |
> |   |WB|WT|WC|UC]
> 
> which are covered by different memblocks since the second one is
> marked as MEMBLOCK_NOMAP, due to the fact that it contains the UEFI
> memory map.
> 
> I get the following output
> 
> kasan: screwed shadow mapping 23575, 23573
> 
> which I think is simply a result from the fact the shadow_start refers
> to the same page as in the previous iteration(s)
> 

You are right. 
So we should write 'shadow_start' instead of 'i'.

---
 arch/arm64/mm/kasan_init.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index cf038c7..ee035c2 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -117,6 +117,55 @@ static void __init cpu_set_ttbr1(unsigned long ttbr1)
 	: "r" (ttbr1));
 }
 
+static void verify_shadow(void)
+{
+	struct memblock_region *reg;
+
+	for_each_memblock(memory, reg) {
+		void *start = (void *)__phys_to_virt(reg->base);
+		void *end = (void *)__phys_to_virt(reg->base + reg->size);
+		unsigned long *shadow_start, *shadow_end;
+
+		if (start >= end)
+			break;
+		shadow_start = (unsigned long *)kasan_mem_to_shadow(start);
+		shadow_end =  (unsigned long *)kasan_mem_to_shadow(end);
+		for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(unsigned long)) {
+			*shadow_start = (unsigned long)shadow_start;
+		}
+	}
+
+	for_each_memblock(memory, reg) {
+		void *start = (void *)__phys_to_virt(reg->base);
+		void *end = (void *)__phys_to_virt(reg->base + reg->size);
+		unsigned long *shadow_start, *shadow_end;
+
+		if (start >= end)
+			break;
+		shadow_start = (unsigned long *)kasan_mem_to_shadow(start);
+		shadow_end =  (unsigned long *)kasan_mem_to_shadow(end);
+		for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(unsigned long)) {
+			if (*shadow_start != (unsigned long)shadow_start) {
+				pr_err("screwed shadow mapping %lx, %lx\n", *shadow_start, (unsigned long)shadow_start);
+				goto clear;
+			}
+		}
+	}
+clear:
+	for_each_memblock(memory, reg) {
+		void *start = (void *)__phys_to_virt(reg->base);
+		void *end = (void *)__phys_to_virt(reg->base + reg->size);
+		unsigned long shadow_start, shadow_end;
+
+		if (start >= end)
+			break;
+		shadow_start =  (unsigned long)kasan_mem_to_shadow(start);
+		shadow_end =  (unsigned long)kasan_mem_to_shadow(end);
+		memset((void *)shadow_start, 0, shadow_end - shadow_start);
+	}
+
+}
+
 void __init kasan_init(void)
 {
 	struct memblock_region *reg;
@@ -159,6 +208,8 @@ void __init kasan_init(void)
 	cpu_set_ttbr1(__pa(swapper_pg_dir));
 	flush_tlb_all();
 
+	verify_shadow();
+
 	/* At this point kasan is fully initialized. Enable error messages */
 	init_task.kasan_depth = 0;
 	pr_info("KernelAddressSanitizer initialized\n");
-- 

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-16 15:36                         ` Andrey Ryabinin
@ 2016-02-16 16:42                           ` Mark Rutland
  2016-02-17  9:15                             ` Andrey Ryabinin
  0 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2016-02-16 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 16, 2016 at 06:36:36PM +0300, Andrey Ryabinin wrote:
> 
> On 02/16/2016 06:17 PM, Ard Biesheuvel wrote:
> > On 16 February 2016 at 13:59, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >> +static void verify_shadow(void)
> >> +{
> >> +       struct memblock_region *reg;
> >> +       int i = 0;
> >> +
> >> +       for_each_memblock(memory, reg) {
> >> +               void *start = (void *)__phys_to_virt(reg->base);
> >> +               void *end = (void *)__phys_to_virt(reg->base + reg->size);
> >> +               int *shadow_start, *shadow_end;
> >> +
> >> +               if (start >= end)
> >> +                       break;
> >> +               shadow_start = (int *)((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
> >> +               shadow_end =  (int *)kasan_mem_to_shadow(end);
> > 
> > shadow_start and shadow_end can refer to the same page as in the
> > previous iteration. For instance, I have these two regions
> > 
> >   0x00006e090000-0x00006e0adfff [Conventional Memory|   |  |  |  |  |
> > |   |WB|WT|WC|UC]
> >   0x00006e0ae000-0x00006e0affff [Loader Data        |   |  |  |  |  |
> > |   |WB|WT|WC|UC]
> > 
> > which are covered by different memblocks since the second one is
> > marked as MEMBLOCK_NOMAP, due to the fact that it contains the UEFI
> > memory map.
> > 
> > I get the following output
> > 
> > kasan: screwed shadow mapping 23575, 23573
> > 
> > which I think is simply a result from the fact the shadow_start refers
> > to the same page as in the previous iteration(s)
> > 
> 
> You are right. 
> So we should write 'shadow_start' instead of 'i'.

FWIW with the below patch I don't see any "screwed shadow mapping"
warnings on my board, and still later see a tonne of KASAN splats in the
scheduler.

Mark.

> ---
>  arch/arm64/mm/kasan_init.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index cf038c7..ee035c2 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -117,6 +117,55 @@ static void __init cpu_set_ttbr1(unsigned long ttbr1)
>  	: "r" (ttbr1));
>  }
>  
> +static void verify_shadow(void)
> +{
> +	struct memblock_region *reg;
> +
> +	for_each_memblock(memory, reg) {
> +		void *start = (void *)__phys_to_virt(reg->base);
> +		void *end = (void *)__phys_to_virt(reg->base + reg->size);
> +		unsigned long *shadow_start, *shadow_end;
> +
> +		if (start >= end)
> +			break;
> +		shadow_start = (unsigned long *)kasan_mem_to_shadow(start);
> +		shadow_end =  (unsigned long *)kasan_mem_to_shadow(end);
> +		for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(unsigned long)) {
> +			*shadow_start = (unsigned long)shadow_start;
> +		}
> +	}
> +
> +	for_each_memblock(memory, reg) {
> +		void *start = (void *)__phys_to_virt(reg->base);
> +		void *end = (void *)__phys_to_virt(reg->base + reg->size);
> +		unsigned long *shadow_start, *shadow_end;
> +
> +		if (start >= end)
> +			break;
> +		shadow_start = (unsigned long *)kasan_mem_to_shadow(start);
> +		shadow_end =  (unsigned long *)kasan_mem_to_shadow(end);
> +		for (; shadow_start < shadow_end; shadow_start += PAGE_SIZE/sizeof(unsigned long)) {
> +			if (*shadow_start != (unsigned long)shadow_start) {
> +				pr_err("screwed shadow mapping %lx, %lx\n", *shadow_start, (unsigned long)shadow_start);
> +				goto clear;
> +			}
> +		}
> +	}
> +clear:
> +	for_each_memblock(memory, reg) {
> +		void *start = (void *)__phys_to_virt(reg->base);
> +		void *end = (void *)__phys_to_virt(reg->base + reg->size);
> +		unsigned long shadow_start, shadow_end;
> +
> +		if (start >= end)
> +			break;
> +		shadow_start =  (unsigned long)kasan_mem_to_shadow(start);
> +		shadow_end =  (unsigned long)kasan_mem_to_shadow(end);
> +		memset((void *)shadow_start, 0, shadow_end - shadow_start);
> +	}
> +
> +}
> +
>  void __init kasan_init(void)
>  {
>  	struct memblock_region *reg;
> @@ -159,6 +208,8 @@ void __init kasan_init(void)
>  	cpu_set_ttbr1(__pa(swapper_pg_dir));
>  	flush_tlb_all();
>  
> +	verify_shadow();
> +
>  	/* At this point kasan is fully initialized. Enable error messages */
>  	init_task.kasan_depth = 0;
>  	pr_info("KernelAddressSanitizer initialized\n");
> -- 
> 
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-16 16:42                           ` Mark Rutland
@ 2016-02-17  9:15                             ` Andrey Ryabinin
  2016-02-17 10:10                               ` James Morse
  2016-02-17 10:18                               ` Catalin Marinas
  0 siblings, 2 replies; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-17  9:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/16/2016 07:42 PM, Mark Rutland wrote:
> On Tue, Feb 16, 2016 at 06:36:36PM +0300, Andrey Ryabinin wrote:
>>
>> On 02/16/2016 06:17 PM, Ard Biesheuvel wrote:
>>> On 16 February 2016 at 13:59, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>>>> +static void verify_shadow(void)
>>>> +{
>>>> +       struct memblock_region *reg;
>>>> +       int i = 0;
>>>> +
>>>> +       for_each_memblock(memory, reg) {
>>>> +               void *start = (void *)__phys_to_virt(reg->base);
>>>> +               void *end = (void *)__phys_to_virt(reg->base + reg->size);
>>>> +               int *shadow_start, *shadow_end;
>>>> +
>>>> +               if (start >= end)
>>>> +                       break;
>>>> +               shadow_start = (int *)((unsigned long)kasan_mem_to_shadow(start) & ~(PAGE_SIZE - 1));
>>>> +               shadow_end =  (int *)kasan_mem_to_shadow(end);
>>>
>>> shadow_start and shadow_end can refer to the same page as in the
>>> previous iteration. For instance, I have these two regions
>>>
>>>   0x00006e090000-0x00006e0adfff [Conventional Memory|   |  |  |  |  |
>>> |   |WB|WT|WC|UC]
>>>   0x00006e0ae000-0x00006e0affff [Loader Data        |   |  |  |  |  |
>>> |   |WB|WT|WC|UC]
>>>
>>> which are covered by different memblocks since the second one is
>>> marked as MEMBLOCK_NOMAP, due to the fact that it contains the UEFI
>>> memory map.
>>>
>>> I get the following output
>>>
>>> kasan: screwed shadow mapping 23575, 23573
>>>
>>> which I think is simply a result from the fact the shadow_start refers
>>> to the same page as in the previous iteration(s)
>>>
>>
>> You are right. 
>> So we should write 'shadow_start' instead of 'i'.
> 
> FWIW with the below patch I don't see any "screwed shadow mapping"
> warnings on my board, and still later see a tonne of KASAN splats in the
> scheduler.
> 

It is possible that I missed something, but I think it means that shadow is alright.

I wonder whether this happens on 4.4. If not, then something in 4.5-rc1 caused this, and the obvious suspect
here is the irq stack.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-17  9:15                             ` Andrey Ryabinin
@ 2016-02-17 10:10                               ` James Morse
  2016-02-17 10:19                                 ` Catalin Marinas
  2016-02-17 10:18                               ` Catalin Marinas
  1 sibling, 1 reply; 78+ messages in thread
From: James Morse @ 2016-02-17 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 17/02/16 09:15, Andrey Ryabinin wrote:
> On 02/16/2016 07:42 PM, Mark Rutland wrote:
>> On Tue, Feb 16, 2016 at 06:36:36PM +0300, Andrey Ryabinin wrote:
>>> You are right. 
>>> So we should write 'shadow_start' instead of 'i'.
>>
>> FWIW with the below patch I don't see any "screwed shadow mapping"
>> warnings on my board, and still later see a tonne of KASAN splats in the
>> scheduler.
>>
> 
> It is possible that I missed something, but I think it means that shadow is alright.
> 
> I wonder whether this happens on 4.4. If not, than something in 4.5-rc1 caused this, and the obvious suspect
> here is irq stack.

This quick hack will prevent ever switching to the irq stack:

---------------------------%<---------------------------
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 1f7f5a2b61bf..83ae736429b6 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -188,7 +188,7 @@ alternative_endif
         */
        and     x25, x19, #~(THREAD_SIZE - 1)
        cmp     x25, tsk
-       b.ne    9998f
+       b       9998f

        this_cpu_ptr irq_stack, x25, x26
        mov     x26, #IRQ_STACK_START_SP
---------------------------%<---------------------------


James

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-17  9:15                             ` Andrey Ryabinin
  2016-02-17 10:10                               ` James Morse
@ 2016-02-17 10:18                               ` Catalin Marinas
  2016-02-17 10:48                                 ` Mark Rutland
  1 sibling, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-17 10:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 12:15:15PM +0300, Andrey Ryabinin wrote:
> On 02/16/2016 07:42 PM, Mark Rutland wrote:
> > FWIW with the below patch I don't see any "screwed shadow mapping"
> > warnings on my board, and still later see a tonne of KASAN splats in the
> > scheduler.
> 
> It is possible that I missed something, but I think it means that
> shadow is alright.
> 
> I wonder whether this happens on 4.4. If not, than something in
> 4.5-rc1 caused this, and the obvious suspect here is irq stack.

It doesn't seem to happen on 4.4; it starts somewhere before 4.5-rc1. I
tested the arm64 branch that we pushed upstream with the irq stack
changes, but it didn't trigger. It could just as well be a combination
of multiple changes, or just something else.

We'll do some bisecting, though it's not that fun going through the
merge window commits, especially since many are based on 4.4-rcX (we
could try bisecting on merges only first, as there are fewer of them).

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-17 10:10                               ` James Morse
@ 2016-02-17 10:19                                 ` Catalin Marinas
  2016-02-17 10:36                                   ` Catalin Marinas
  0 siblings, 1 reply; 78+ messages in thread
From: Catalin Marinas @ 2016-02-17 10:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 10:10:00AM +0000, James Morse wrote:
> On 17/02/16 09:15, Andrey Ryabinin wrote:
> > On 02/16/2016 07:42 PM, Mark Rutland wrote:
> >> On Tue, Feb 16, 2016 at 06:36:36PM +0300, Andrey Ryabinin wrote:
> >>> You are right. 
> >>> So we should write 'shadow_start' instead of 'i'.
> >>
> >> FWIW with the below patch I don't see any "screwed shadow mapping"
> >> warnings on my board, and still later see a tonne of KASAN splats in the
> >> scheduler.
> >>
> > 
> > It is possible that I missed something, but I think it means that shadow is alright.
> > 
> > I wonder whether this happens on 4.4. If not, than something in 4.5-rc1 caused this, and the obvious suspect
> > here is irq stack.
> 
> This quick hack will prevent ever switching to the irq stack:
> 
> ---------------------------%<---------------------------
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 1f7f5a2b61bf..83ae736429b6 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -188,7 +188,7 @@ alternative_endif
>          */
>         and     x25, x19, #~(THREAD_SIZE - 1)
>         cmp     x25, tsk
> -       b.ne    9998f
> +       b       9998f
> 
>         this_cpu_ptr irq_stack, x25, x26
>         mov     x26, #IRQ_STACK_START_SP

Thanks James. I'll give it a try.

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-17 10:19                                 ` Catalin Marinas
@ 2016-02-17 10:36                                   ` Catalin Marinas
  0 siblings, 0 replies; 78+ messages in thread
From: Catalin Marinas @ 2016-02-17 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 10:19:41AM +0000, Catalin Marinas wrote:
> On Wed, Feb 17, 2016 at 10:10:00AM +0000, James Morse wrote:
> > On 17/02/16 09:15, Andrey Ryabinin wrote:
> > > On 02/16/2016 07:42 PM, Mark Rutland wrote:
> > >> On Tue, Feb 16, 2016 at 06:36:36PM +0300, Andrey Ryabinin wrote:
> > >>> You are right. 
> > >>> So we should write 'shadow_start' instead of 'i'.
> > >>
> > >> FWIW with the below patch I don't see any "screwed shadow mapping"
> > >> warnings on my board, and still later see a tonne of KASAN splats in the
> > >> scheduler.
> > >>
> > > 
> > > It is possible that I missed something, but I think it means that shadow is alright.
> > > 
> > > I wonder whether this happens on 4.4. If not, than something in 4.5-rc1 caused this, and the obvious suspect
> > > here is irq stack.
> > 
> > This quick hack will prevent ever switching to the irq stack:
> > 
> > ---------------------------%<---------------------------
> > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> > index 1f7f5a2b61bf..83ae736429b6 100644
> > --- a/arch/arm64/kernel/entry.S
> > +++ b/arch/arm64/kernel/entry.S
> > @@ -188,7 +188,7 @@ alternative_endif
> >          */
> >         and     x25, x19, #~(THREAD_SIZE - 1)
> >         cmp     x25, tsk
> > -       b.ne    9998f
> > +       b       9998f
> > 
> >         this_cpu_ptr irq_stack, x25, x26
> >         mov     x26, #IRQ_STACK_START_SP
> 
> Thanks James. I'll give it a try.

And it didn't make any difference (on top of 4.5-rc1); still the same
KASAN warnings.

-- 
Catalin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-17 10:18                               ` Catalin Marinas
@ 2016-02-17 10:48                                 ` Mark Rutland
  0 siblings, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-17 10:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 10:18:03AM +0000, Catalin Marinas wrote:
> On Wed, Feb 17, 2016 at 12:15:15PM +0300, Andrey Ryabinin wrote:
> > On 02/16/2016 07:42 PM, Mark Rutland wrote:
> > > FWIW with the below patch I don't see any "screwed shadow mapping"
> > > warnings on my board, and still later see a tonne of KASAN splats in the
> > > scheduler.
> > 
> > It is possible that I missed something, but I think it means that
> > shadow is alright.
> > 
> > I wonder whether this happens on 4.4. If not, than something in
> > 4.5-rc1 caused this, and the obvious suspect here is irq stack.
> 
> It doesn't seem to happen on 4.4, it starts somewhere before 4.5-rc1. I
> tested the arm64 branch that we pushed upstream with the irq stack
> changes but it didn't trigger. It could as well be a combination of
> multiple change or just something else.
> 
> We'll do some bisecting, though it's not that fun going through the
> merging window commits, especially since many are based on 4.4-rcX (we
> could try on merges only first, there are fewer).

FWIW I did that bisect last night, and that fingered commit
f11aef69b235bc30 ("Merge branch 'pm-cpuidle'") as the first bad commit.

Either there's some subtle interaction, or it's less reproducible than I
thought and the "good" commits are only "possibly-good". I'll try to dig
into that.

FWIW, my bisect log is:

git bisect start
# bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1
git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d
# good: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
git bisect good afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
# good: [4dffbfc48d65e5d8157a634fd670065d237a9377] arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP
git bisect good 4dffbfc48d65e5d8157a634fd670065d237a9377
# good: [1289ace5b4f70f1e68ce785735b82c7e483de863] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good 1289ace5b4f70f1e68ce785735b82c7e483de863
# good: [984065055e6e39f8dd812529e11922374bd39352] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect good 984065055e6e39f8dd812529e11922374bd39352
# good: [6d1c244803f2c013fb9c31b0904c01f1830b73ab] Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 6d1c244803f2c013fb9c31b0904c01f1830b73ab
# bad: [0a13daedf7ffc71b0c374a036355da7fddb20d6d] Merge branch 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block
git bisect bad 0a13daedf7ffc71b0c374a036355da7fddb20d6d
# bad: [278e5acae1321978686e85ca92906054a36aa19b] Merge tag 'for-4.5' of git://git.osdn.jp/gitroot/uclinux-h8/linux
git bisect bad 278e5acae1321978686e85ca92906054a36aa19b
# good: [f9cd69fe5eb6347b4de56458d0378bc0fa44bce9] Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good f9cd69fe5eb6347b4de56458d0378bc0fa44bce9
# good: [9638685e32af961943b679fcb72d4ddd458eb18f] Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 9638685e32af961943b679fcb72d4ddd458eb18f
# good: [fa8bb4518771b19460a318fbab3eb36c81db3a50] Merge branch 'pm-devfreq'
git bisect good fa8bb4518771b19460a318fbab3eb36c81db3a50
# bad: [30f05309bde49295e02e45c7e615f73aa4e0ccc2] Merge tag 'pm+acpi-4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 30f05309bde49295e02e45c7e615f73aa4e0ccc2
# good: [5bb1729cbdfbe974ad6385be94b14afbac97e19f] cpuidle: menu: Avoid pointless checks in menu_select()
git bisect good 5bb1729cbdfbe974ad6385be94b14afbac97e19f
# bad: [db2b52f75250c88ee3c6ba3d91bef38f3f1a1e8c] Merge branch 'pm-tools'
git bisect bad db2b52f75250c88ee3c6ba3d91bef38f3f1a1e8c
# good: [38cb76a307821f76c7f9dff7449f73aeb014d5cc] cpupower: Fix build error in cpufreq-info
git bisect good 38cb76a307821f76c7f9dff7449f73aeb014d5cc
# bad: [f11aef69b235bc30c323776d75ac23b43aac45bb] Merge branch 'pm-cpuidle'
git bisect bad f11aef69b235bc30c323776d75ac23b43aac45bb
# first bad commit: [f11aef69b235bc30c323776d75ac23b43aac45bb] Merge branch 'pm-cpuidle'

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-16 12:59                     ` Andrey Ryabinin
  2016-02-16 14:12                       ` Mark Rutland
  2016-02-16 15:17                       ` Ard Biesheuvel
@ 2016-02-17 14:39                       ` Mark Rutland
  2016-02-17 16:31                         ` Andrey Ryabinin
  2016-02-17 17:01                         ` KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area) Mark Rutland
  2 siblings, 2 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-17 14:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 16, 2016 at 03:59:09PM +0300, Andrey Ryabinin wrote:
> Actually, the first report is a bit more useful. It shows that shadow memory was corrupted:
> 
>   ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
> > ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
>                       ^
> F1 - left redzone, it indicates start of stack frame
> F3 - right redzone, it should be the end of stack frame.
> 
> But here we have the second set of F1s without F3s which should close the first set of F1s.
> Also those two F3s in the middle cannot be right.
> 
> So shadow is corrupted.
> Some hypotheses:

> 2) Shadow memory wasn't cleared. GCC poison memory on function entrance and unpoisons it before return.
>      If we use some tricky way to exit from function this could cause false-positives like that.
>      E.g. some hand-written assembly return code.

I think this is what's happening, at least for the idle case.

A second attempt at bisecting led me to commit e679660dbb8347f2 ("ARM:
8481/2: drivers: psci: replace psci firmware calls"). Reverting that
makes v4.5-rc1 boot without KASAN splats.

That patch turned __invoke_psci_fn_{smc,hvc} into (ASAN-instrumented) C
functions. Prior to that commit, __invoke_psci_fn_{smc,hvc} were
pure assembly functions which used no stack.

When we go down for idle, in __cpu_suspend_enter we stash some context
to the stack (in assembly). The CPU may return from a cold state via
cpu_resume, where we restore context from the stack.

However, after storing the context we call psci_suspend_finisher, which
calls psci_cpu_suspend, which calls invoke_psci_fn_*. As
psci_cpu_suspend and invoke_psci_fn_* are instrumented, they poison
memory on function entrance, but we never perform the unpoisoning.

That was always the case for psci_suspend_finisher, so there was a
latent issue that we were somehow avoiding. Perhaps we got lucky with
the stack layout and never hit the poison.
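
As a purely illustrative userspace model (not kernel code: longjmp()
stands in for resuming cold via cpu_resume, and the f1/f3 bytes mirror
the stack redzone markers discussed above), the effect is roughly:

#include <setjmp.h>
#include <stdio.h>
#include <string.h>

#define FRAME_SHADOW_BYTES 8

static unsigned char shadow[FRAME_SHADOW_BYTES];	/* all zero == addressable */
static jmp_buf resume;

static void instrumented_callee(int resume_cold)
{
	/* "prologue": poison the redzones (f1 = stack left, f3 = stack right) */
	memset(shadow, 0xf1, 2);
	memset(shadow + 6, 0xf3, 2);

	if (resume_cold)
		longjmp(resume, 1);	/* never reach the epilogue */

	/* "epilogue": unpoison the frame's shadow again */
	memset(shadow, 0x00, sizeof(shadow));
}

int main(void)
{
	int i;

	if (!setjmp(resume))
		instrumented_callee(1);

	/* back at the "resume" point: the stale poison is still there */
	for (i = 0; i < FRAME_SHADOW_BYTES; i++)
		printf("%02x ", shadow[i]);
	printf("<- stale f1/f3 left behind for later frames to trip over\n");
	return 0;
}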

I'm not sure how we fix that, as invoke_psci_fn_* may or may not return
for arbitrary reasons (e.g. a CPU_SUSPEND call may or may not return
depending on whether an interrupt comes in at the right time).

Perhaps the simplest option is to not instrument invoke_psci_fn_* and
psci_suspend_finisher. Do we have a per-function annotation to avoid
KASAN instrumentation, like notrace? I need to investigate, but we may
also need notrace for similar reasons.

Andrey, on a tangential note, what do we do around hotplug? I assume
that we must unpoison the shadow region for the stack of a dead CPU,
but I wasn't able to figure out where we do that. Hopefully we're not
just getting lucky?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-17 14:39                       ` Mark Rutland
@ 2016-02-17 16:31                         ` Andrey Ryabinin
  2016-02-17 19:35                           ` Mark Rutland
  2016-02-17 17:01                         ` KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area) Mark Rutland
  1 sibling, 1 reply; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-17 16:31 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/17/2016 05:39 PM, Mark Rutland wrote:
> On Tue, Feb 16, 2016 at 03:59:09PM +0300, Andrey Ryabinin wrote:
>> Actually, the first report is a bit more useful. It shows that shadow memory was corrupted:
>>
>>   ffffffc93665bc00: f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f1 f1
>>> ffffffc93665bc80: f1 f1 00 00 00 00 f3 f3 00 f4 f4 f4 f3 f3 f3 f3
>>                       ^
>> F1 - left redzone, it indicates start of stack frame
>> F3 - right redzone, it should be the end of stack frame.
>>
>> But here we have the second set of F1s without F3s which should close the first set of F1s.
>> Also those two F3s in the middle cannot be right.
>>
>> So shadow is corrupted.
>> Some hypotheses:
> 
>> 2) Shadow memory wasn't cleared. GCC poison memory on function entrance and unpoisons it before return.
>>      If we use some tricky way to exit from function this could cause false-positives like that.
>>      E.g. some hand-written assembly return code.
> 
> I think this is what's happenening, at least for the idle case.
> 
> A second attempt at bisecting led me to commit e679660dbb8347f2 ("ARM:
> 8481/2: drivers: psci: replace psci firmware calls"). Reverting that
> makes v4.5-rc1 boot without KASAN splats.
> 
> That patch turned __invoke_psci_fn_{smc,hvc} into (ASAN-instrumented) C
> functions. Prior to that commit, __invoke_psci_fn_{smc,hvc} were
> pure assembly functions which used no stack.
> 
> When we go down for idle, in __cpu_suspend_enter we stash some context
> to the stack (in assembly). The CPU may return from a cold state via
> cpu_resume, where we restore context from the stack.
> 
> However, after storing the context we call psci_suspend_finisher, which
> calls psci_cpu_suspend, which calls invoke_psci_fn_*. As
> psci_cpu_suspend and invoke_psci_fn_* are instrumented, they poison
> memory on function entrance, but we never perform the unpoisoning.
> 
> That was always the case for psci_suspend_finisher, so there was a
> latent issue that we were somehow avoiding. Perhaps we got luck with
> stack layout and never hit the poison.
> 
> I'm not sure how we fix that, as invoke_psci_fn_* may or may not return
> for arbitrary reasons (e.g. a CPU_SUSPEND_CALL may or may not return
> depending on whether an interrupt comes in at the right time).
> 
> Perhaps the simplest option is to not instrument invoke_psci_fn_* and
> psci_suspend_finisher. Do we have a per-function annotation to avoid
> KASAN instrumentation, like notrace? I need to investigate, but we may
> also need notrace for similar reasons.

include/linux/compiler-gcc.h:
/*
* Tell the compiler that address safety instrumentation (KASAN)
* should not be applied to that function.
* Conflicts with inlining: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
*/
#define __no_sanitize_address __attribute__((no_sanitize_address))

> 
> Andrey, on a tangential note, what do we do around hotplug? I assume
> that we must unpoison the shadow region for the stack of a dead CPU,
> but I wasn't able to figure out where we do that. Hopefully we're not
> just getting lucky?
> 

We do nothing about it. AFAIU we need to clear the shadow for the swapper stack, somewhere in secondary_start_kernel() perhaps.
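
A minimal sketch of what that clearing might look like, assuming
kasan_unpoison_shadow() is usable from that context and that 'current'
is the idle task at that point in secondary bring-up (where exactly it
belongs is the open question):

#include <linux/kasan.h>
#include <linux/sched.h>
#include <linux/thread_info.h>

/* hedged sketch only -- the name and the call site are placeholders */
static void clear_idle_stack_shadow(void)
{
	kasan_unpoison_shadow(task_stack_page(current), THREAD_SIZE);
}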



> Thanks,
> Mark.
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area)
  2016-02-17 14:39                       ` Mark Rutland
  2016-02-17 16:31                         ` Andrey Ryabinin
@ 2016-02-17 17:01                         ` Mark Rutland
  2016-02-17 17:56                           ` Mark Rutland
  1 sibling, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2016-02-17 17:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 02:39:51PM +0000, Mark Rutland wrote:
> When we go down for idle, in __cpu_suspend_enter we stash some context
> to the stack (in assembly). The CPU may return from a cold state via
> cpu_resume, where we restore context from the stack.
> 
> However, after storing the context we call psci_suspend_finisher, which
> calls psci_cpu_suspend, which calls invoke_psci_fn_*. As
> psci_cpu_suspend and invoke_psci_fn_* are instrumented, they poison
> memory on function entrance, but we never perform the unpoisoning.
> 
> That was always the case for psci_suspend_finisher, so there was a
> latent issue that we were somehow avoiding. Perhaps we got luck with
> stack layout and never hit the poison.
> 
> I'm not sure how we fix that, as invoke_psci_fn_* may or may not return
> for arbitrary reasons (e.g. a CPU_SUSPEND_CALL may or may not return
> depending on whether an interrupt comes in at the right time).
> 
> Perhaps the simplest option is to not instrument invoke_psci_fn_* and
> psci_suspend_finisher. Do we have a per-function annotation to avoid
> KASAN instrumentation, like notrace? I need to investigate, but we may
> also need notrace for similar reasons.

I found __no_sanitize_address.

As an aside, could we rename that to nokasan? That would match the style
of notrace, is just as clear, and would make it far easier to write
consistent legible function prototypes...

Otherwise, I came up with the patch below, per the reasoning above.

It _changes_ the KASAN splats (I see errors in tick_program_event rather
than find_busiest_group), but doesn't seem to get rid of them. I'm not
sure if I've missed something, or if we also have another latent issue.

Ideas?

Mark.

---->8----
>From 8f7ae44d8f8862f5300483d45617b5bd05fc652f Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Wed, 17 Feb 2016 15:38:22 +0000
Subject: [PATCH] arm64/psci: avoid KASAN splats with idle

When a CPU goes into a deep idle state, we store CPU context in
__cpu_suspend_enter, then call psci_suspend_finisher to invoke the
firmware. If we entered a deep idle state, we do not return directly,
and instead start cold, restoring state in cpu_resume.

Thus we may execute the prologue and body of psci_suspend_finisher and
the PSCI invocation function, but not their epilogue. When using KASAN
this means that we poison a region of shadow memory, but never unpoison
it. After we resume, subsequent stack accesses may hit the stale poison
values, leading to false positives from KASAN.

To avoid this, we must ensure that functions called after the context
save are not instrumented, and do not poison the shadow region, by
annotating them with __no_sanitize_address. As the common inlines they may
call are not similarly annotated, and the compiler refuses to allow
function attribute mismatches, we must also avoid calls to such
functions.

ARM is not affected, as it does not support KASAN. When CONFIG_KASAN is
not selected, __no_sanitize_address expands to nothing, so the
annotation should not be harmful.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
---
 arch/arm64/kernel/psci.c | 14 ++++++++------
 drivers/firmware/psci.c  |  3 +++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index f67f35b..8324ce8 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -32,12 +32,16 @@
 
 static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
 
+static phys_addr_t cpu_resume_phys;
+
 static int __maybe_unused cpu_psci_cpu_init_idle(unsigned int cpu)
 {
 	int i, ret, count = 0;
 	u32 *psci_states;
 	struct device_node *state_node, *cpu_node;
 
+	cpu_resume_phys = virt_to_phys(cpu_resume);
+
 	cpu_node = of_get_cpu_node(cpu, NULL);
 	if (!cpu_node)
 		return -ENODEV;
@@ -178,12 +182,10 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
 }
 #endif
 
-static int psci_suspend_finisher(unsigned long index)
+__no_sanitize_address
+static int psci_suspend_finisher(unsigned long state)
 {
-	u32 *state = __this_cpu_read(psci_power_state);
-
-	return psci_ops.cpu_suspend(state[index - 1],
-				    virt_to_phys(cpu_resume));
+	return psci_ops.cpu_suspend(state, cpu_resume_phys);
 }
 
 static int __maybe_unused cpu_psci_cpu_suspend(unsigned long index)
@@ -200,7 +202,7 @@ static int __maybe_unused cpu_psci_cpu_suspend(unsigned long index)
 	if (!psci_power_state_loses_context(state[index - 1]))
 		ret = psci_ops.cpu_suspend(state[index - 1], 0);
 	else
-		ret = cpu_suspend(index, psci_suspend_finisher);
+		ret = cpu_suspend(state[index - 1], psci_suspend_finisher);
 
 	return ret;
 }
diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
index f25cd79..e4e8dc1 100644
--- a/drivers/firmware/psci.c
+++ b/drivers/firmware/psci.c
@@ -106,6 +106,7 @@ bool psci_power_state_is_valid(u32 state)
 	return !(state & ~valid_mask);
 }
 
+__no_sanitize_address
 static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
 			unsigned long arg0, unsigned long arg1,
 			unsigned long arg2)
@@ -116,6 +117,7 @@ static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
 	return res.a0;
 }
 
+__no_sanitize_address
 static unsigned long __invoke_psci_fn_smc(unsigned long function_id,
 			unsigned long arg0, unsigned long arg1,
 			unsigned long arg2)
@@ -148,6 +150,7 @@ static u32 psci_get_version(void)
 	return invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
 }
 
+__no_sanitize_address
 static int psci_cpu_suspend(u32 state, unsigned long entry_point)
 {
 	int err;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area)
  2016-02-17 17:01                         ` KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area) Mark Rutland
@ 2016-02-17 17:56                           ` Mark Rutland
  2016-02-17 19:16                             ` Mark Rutland
  0 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2016-02-17 17:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 05:01:11PM +0000, Mark Rutland wrote:
> On Wed, Feb 17, 2016 at 02:39:51PM +0000, Mark Rutland wrote:
> > Perhaps the simplest option is to not instrument invoke_psci_fn_* and
> > psci_suspend_finisher. Do we have a per-function annotation to avoid
> > KASAN instrumentation, like notrace? I need to investigate, but we may
> > also need notrace for similar reasons.
>
> I came up with the patch below, per the reasoning above.
> 
> It _changes_ the KASAN splats (I see errors in tick_program_event rather
> than find_busiest_group), but doesn't seem to get rid of them. I'm not
> sure if I've missed something, or if we also have another latent issue.
> 
> Ideas?

I'd missed annotating __cpu_suspend_save. I've fixed that up locally
(along with s/virt_to_phys/__virt_to_phys due to the inlining issue).

I'm still missing something; I'm getting KASAN warnings in find_busiest_group
again, and the shadow looks like it's corrupt (the second batch of f3 /
KASAN_STACK_RIGHT doesn't have a matching f1 / KASAN_STACK_LEFT):

[   13.138791] Memory state around the buggy address:
[   13.143624]  ffffffc936a7fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   13.150929]  ffffffc936a7fc00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
[   13.158232] >ffffffc936a7fc80: f3 f3 f3 f3 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3
[   13.165530]                       ^
[   13.169066]  ffffffc936a7fd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   13.176369]  ffffffc936a7fd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1

This is turning into a whack-a-mole game...
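
For reference, a small standalone decoder for shadow lines like the
one above; the stack redzone byte values mirror the KASAN_STACK_*
definitions in mm/kasan/kasan.h, as I understand them:

#include <stdio.h>

static const char *poison_name(unsigned char b)
{
	if (b == 0x00)
		return "addressable";
	if (b >= 0x01 && b <= 0x07)
		return "partially addressable";
	switch (b) {
	case 0xf1: return "stack left redzone";
	case 0xf2: return "stack mid redzone";
	case 0xf3: return "stack right redzone";
	case 0xf4: return "stack partial redzone";
	}
	return "other poison";
}

int main(void)
{
	/* the flagged shadow line from the report above */
	unsigned char line[] = { 0xf3, 0xf3, 0xf3, 0xf3, 0x00, 0x00, 0x00, 0x00,
				 0x00, 0xf4, 0xf4, 0xf4, 0xf3, 0xf3, 0xf3, 0xf3 };
	unsigned int i;

	for (i = 0; i < sizeof(line); i++)
		printf("%02x  %s\n", line[i], poison_name(line[i]));
	return 0;
}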

Mark.

> ---->8----
> From 8f7ae44d8f8862f5300483d45617b5bd05fc652f Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Wed, 17 Feb 2016 15:38:22 +0000
> Subject: [PATCH] arm64/psci: avoid KASAN splats with idle
> 
> When a CPU goes into a deep idle state, we store CPU context in
> __cpu_suspend_enter, then call psci_suspend_finisher to invoke the
> firmware. If we entered a deep idle state, we do not return directly,
> and instead start cold, restoring state in cpu_resume.
> 
> Thus we may execute the prologue and body of psci_suspend_finisher and
> the PSCI invocation function, but not their epilogue. When using KASAN
> this means that we poison a region of shadow memory, but never unpoison
> it. After we resume, subsequent stack accesses may hit the stale poison
> values, leading to false positives from KASAN.
> 
> To avoid this, we must ensure that functions called after the context
> save are not instrumented, and do not posion the shadow region, by
> annotating them with __no_sanitize_address. As common inlines they may
> call are not similarly annotated, and the compiler refuses to allow
> function attribute mismatches, we must also avoid calls to such
> functions.
> 
> ARM is not affected, as it does not support KASAN. When CONFIG_KASAN is
> not selected, __no_sanitize_address expands to nothing, so the
> annotation should not be harmful.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> ---
>  arch/arm64/kernel/psci.c | 14 ++++++++------
>  drivers/firmware/psci.c  |  3 +++
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index f67f35b..8324ce8 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -32,12 +32,16 @@
>  
>  static DEFINE_PER_CPU_READ_MOSTLY(u32 *, psci_power_state);
>  
> +static phys_addr_t cpu_resume_phys;
> +
>  static int __maybe_unused cpu_psci_cpu_init_idle(unsigned int cpu)
>  {
>  	int i, ret, count = 0;
>  	u32 *psci_states;
>  	struct device_node *state_node, *cpu_node;
>  
> +	cpu_resume_phys = virt_to_phys(cpu_resume);
> +
>  	cpu_node = of_get_cpu_node(cpu, NULL);
>  	if (!cpu_node)
>  		return -ENODEV;
> @@ -178,12 +182,10 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>  }
>  #endif
>  
> -static int psci_suspend_finisher(unsigned long index)
> +__no_sanitize_address
> +static int psci_suspend_finisher(unsigned long state)
>  {
> -	u32 *state = __this_cpu_read(psci_power_state);
> -
> -	return psci_ops.cpu_suspend(state[index - 1],
> -				    virt_to_phys(cpu_resume));
> +	return psci_ops.cpu_suspend(state, cpu_resume_phys);
>  }
>  
>  static int __maybe_unused cpu_psci_cpu_suspend(unsigned long index)
> @@ -200,7 +202,7 @@ static int __maybe_unused cpu_psci_cpu_suspend(unsigned long index)
>  	if (!psci_power_state_loses_context(state[index - 1]))
>  		ret = psci_ops.cpu_suspend(state[index - 1], 0);
>  	else
> -		ret = cpu_suspend(index, psci_suspend_finisher);
> +		ret = cpu_suspend(state[index - 1], psci_suspend_finisher);
>  
>  	return ret;
>  }
> diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
> index f25cd79..e4e8dc1 100644
> --- a/drivers/firmware/psci.c
> +++ b/drivers/firmware/psci.c
> @@ -106,6 +106,7 @@ bool psci_power_state_is_valid(u32 state)
>  	return !(state & ~valid_mask);
>  }
>  
> +__no_sanitize_address
>  static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
>  			unsigned long arg0, unsigned long arg1,
>  			unsigned long arg2)
> @@ -116,6 +117,7 @@ static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
>  	return res.a0;
>  }
>  
> +__no_sanitize_address
>  static unsigned long __invoke_psci_fn_smc(unsigned long function_id,
>  			unsigned long arg0, unsigned long arg1,
>  			unsigned long arg2)
> @@ -148,6 +150,7 @@ static u32 psci_get_version(void)
>  	return invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
>  }
>  
> +__no_sanitize_address
>  static int psci_cpu_suspend(u32 state, unsigned long entry_point)
>  {
>  	int err;
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area)
  2016-02-17 17:56                           ` Mark Rutland
@ 2016-02-17 19:16                             ` Mark Rutland
  2016-02-18  8:06                               ` Ard Biesheuvel
  2016-02-18  8:22                               ` KASAN issues with idle / hotplug area Andrey Ryabinin
  0 siblings, 2 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-17 19:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 05:56:56PM +0000, Mark Rutland wrote:
> On Wed, Feb 17, 2016 at 05:01:11PM +0000, Mark Rutland wrote:
> > On Wed, Feb 17, 2016 at 02:39:51PM +0000, Mark Rutland wrote:
> > > Perhaps the simplest option is to not instrument invoke_psci_fn_* and
> > > psci_suspend_finisher. Do we have a per-function annotation to avoid
> > > KASAN instrumentation, like notrace? I need to investigate, but we may
> > > also need notrace for similar reasons.
> >
> > I came up with the patch below, per the reasoning above.
> > 
> > It _changes_ the KASAN splats (I see errors in tick_program_event rather
> > than find_busiest_group), but doesn't seem to get rid of them. I'm not
> > sure if I've missed something, or if we also have another latent issue.
> > 
> > Ideas?
> 
> I'd missed annotating __cpu_suspend_save. I've fixed that up locally
> (along with s/virt_to_phys/__virt_to_phys due to the inlining issue).

Thinking about it more, I shouldn't have to annotate __cpu_suspend_save,
as it returns (and hence should have cleaned up after itself).

Looking at the assembly, functions seem to get instrumented regardless
of the __no_sanitize_address annotation. The assembly of
__invoke_psci_fn_{smc,hvc} look identical, even if one has the
annotation and one does not.

In the case below, it looks like __invoke_psci_fn_hvc is storing to the
shadow area even though it's annotated with __no_sanitize_address.  Note
that the adrp symbol resolution is bogus; psci_to_linux_errno happens to
be at offset 0 in the as-yet unlinked psci.o object.

0000000000000420 <__invoke_psci_fn_hvc>:
 420:   d10283ff        sub     sp, sp, #0xa0
 424:   90000004        adrp    x4, 0 <psci_to_linux_errno>
 428:   91000084        add     x4, x4, #0x0
 42c:   90000005        adrp    x5, 0 <psci_to_linux_errno>
 430:   910000a5        add     x5, x5, #0x0
 434:   d2800007        mov     x7, #0x0                        // #0
 438:   a9017bfd        stp     x29, x30, [sp,#16]
 43c:   910043fd        add     x29, sp, #0x10
 440:   d2800006        mov     x6, #0x0                        // #0
 444:   a90253f3        stp     x19, x20, [sp,#32]
 448:   9100c3b3        add     x19, x29, #0x30
 44c:   d2dff214        mov     x20, #0xff9000000000            // #280993940373504
 450:   a90393a5        stp     x5, x4, [x29,#56]
 454:   f2fbfff4        movk    x20, #0xdfff, lsl #48
 458:   d343fe73        lsr     x19, x19, #3
 45c:   d2915664        mov     x4, #0x8ab3                     // #35507
 460:   f9001bf5        str     x21, [sp,#48]
 464:   f2a836a4        movk    x4, #0x41b5, lsl #16
 468:   8b140275        add     x21, x19, x20
 46c:   f9001ba4        str     x4, [x29,#48]
 470:   3204d3e4        mov     w4, #0xf1f1f1f1                 // #-235802127
 474:   b8346a64        str     w4, [x19,x20]
 478:   3204d7e4        mov     w4, #0xf3f3f3f3                 // #-202116109
 47c:   b9000aa4        str     w4, [x21,#8]
 480:   910143a4        add     x4, x29, #0x50
 484:   d2800005        mov     x5, #0x0                        // #0
 488:   f90003e4        str     x4, [sp]
 48c:   d2800004        mov     x4, #0x0                        // #0
 490:   94000000        bl      0 <arm_smccc_hvc>
 494:   f9402ba0        ldr     x0, [x29,#80]
 498:   910003bf        mov     sp, x29
 49c:   b8346a7f        str     wzr, [x19,x20]
 4a0:   b9000abf        str     wzr, [x21,#8]
 4a4:   a94153f3        ldp     x19, x20, [sp,#16]
 4a8:   f94013f5        ldr     x21, [sp,#32]
 4ac:   a8c97bfd        ldp     x29, x30, [sp],#144
 4b0:   d65f03c0        ret
 4b4:   d503201f        nop

For comparison, without KASAN __invoke_psci_fn_hvc looks like:

0000000000000280 <__invoke_psci_fn_hvc>:
 280:   d10103ff        sub     sp, sp, #0x40
 284:   d2800007        mov     x7, #0x0                        // #0
 288:   d2800006        mov     x6, #0x0                        // #0
 28c:   d2800005        mov     x5, #0x0                        // #0
 290:   a9017bfd        stp     x29, x30, [sp,#16]
 294:   910043fd        add     x29, sp, #0x10
 298:   910043a4        add     x4, x29, #0x10
 29c:   f90003e4        str     x4, [sp]
 2a0:   d2800004        mov     x4, #0x0                        // #0
 2a4:   94000000        bl      0 <arm_smccc_hvc>
 2a8:   910003bf        mov     sp, x29
 2ac:   f9400ba0        ldr     x0, [x29,#16]
 2b0:   a8c37bfd        ldp     x29, x30, [sp],#48
 2b4:   d65f03c0        ret

I also tried using __attribute__((no_sanitize_address)) directly, in
case there was some header issue, but that doesn't seem to be the case.
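
For reference, the two forms being compared are roughly the following
(illustrative only, reusing prototypes from the patch quoted earlier in
the thread, not a snippet lifted from the patch itself):

/* kernel-provided annotation, as used in the patch */
__no_sanitize_address
static int psci_cpu_suspend(u32 state, unsigned long entry_point);

/* raw GCC attribute, tried directly in case of a header issue */
__attribute__((no_sanitize_address))
static unsigned long __invoke_psci_fn_smc(unsigned long function_id,
			unsigned long arg0, unsigned long arg1,
			unsigned long arg2);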

I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
confirm whether they see the same? Does the same happen for x86?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area
  2016-02-17 16:31                         ` Andrey Ryabinin
@ 2016-02-17 19:35                           ` Mark Rutland
  0 siblings, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-17 19:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 17, 2016 at 07:31:43PM +0300, Andrey Ryabinin wrote:
> On 02/17/2016 05:39 PM, Mark Rutland wrote:
> > Andrey, on a tangential note, what do we do around hotplug? I assume
> > that we must unpoison the shadow region for the stack of a dead CPU,
> > but I wasn't able to figure out where we do that. Hopefully we're not
> > just getting lucky?
> 
> We do nothing about it. AFAIU we need to clear swapper's stack,
> somewhere in secondary_start_kernel() perhaps.

Oh, joy...

Surely other architectures (e.g. x86) will need to do something similar?

Do they do anything currently? I can't see that they do...

Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area)
  2016-02-17 19:16                             ` Mark Rutland
@ 2016-02-18  8:06                               ` Ard Biesheuvel
  2016-02-18  8:22                               ` KASAN issues with idle / hotplug area Andrey Ryabinin
  1 sibling, 0 replies; 78+ messages in thread
From: Ard Biesheuvel @ 2016-02-18  8:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 17 February 2016 at 20:16, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Feb 17, 2016 at 05:56:56PM +0000, Mark Rutland wrote:
>> On Wed, Feb 17, 2016 at 05:01:11PM +0000, Mark Rutland wrote:
>> > On Wed, Feb 17, 2016 at 02:39:51PM +0000, Mark Rutland wrote:
>> > > Perhaps the simplest option is to not instrument invoke_psci_fn_* and
>> > > psci_suspend_finisher. Do we have a per-function annotation to avoid
>> > > KASAN instrumentation, like notrace? I need to investigate, but we may
>> > > also need notrace for similar reasons.
>> >
>> > I came up with the patch below, per the reasoning above.
>> >
>> > It _changes_ the KASAN splats (I see errors in tick_program_event rather
>> > than find_busiest_group), but doesn't seem to get rid of them. I'm not
>> > sure if I've missed something, or if we also have another latent issue.
>> >
>> > Ideas?
>>
>> I'd missed annotating __cpu_suspend_save. I've fixed that up locally
>> (along with s/virt_to_phys/__virt_to_phys due to the inlining issue).
>
> Thinking about it more, I shouldn't have to annotate __cpu_suspend_save,
> as it returns (and hence should have cleaned up after itself).
>
> Looking at the assembly, functions seem to get instrumented regardless
> of the __no_sanitize_address annotation. The assembly of
> __invoke_psci_fn_{smc,hvc} look identical, even if one has the
> annotation and one does not.
>
> In the case below, it looks like __invoke_psci_fn_hvc is storing to the
> shadow area even though it's annotated with __no_sanitize_address.  Note
> that the adrp symbol resolution is bogus; psci_to_linux_errno happens to
> be at offset 0 in the as-yet unlinked psci.o object.
>
> 0000000000000420 <__invoke_psci_fn_hvc>:
>  420:   d10283ff        sub     sp, sp, #0xa0
>  424:   90000004        adrp    x4, 0 <psci_to_linux_errno>
>  428:   91000084        add     x4, x4, #0x0
>  42c:   90000005        adrp    x5, 0 <psci_to_linux_errno>
>  430:   910000a5        add     x5, x5, #0x0
>  434:   d2800007        mov     x7, #0x0                        // #0
>  438:   a9017bfd        stp     x29, x30, [sp,#16]
>  43c:   910043fd        add     x29, sp, #0x10
>  440:   d2800006        mov     x6, #0x0                        // #0
>  444:   a90253f3        stp     x19, x20, [sp,#32]
>  448:   9100c3b3        add     x19, x29, #0x30
>  44c:   d2dff214        mov     x20, #0xff9000000000            // #280993940373504
>  450:   a90393a5        stp     x5, x4, [x29,#56]
>  454:   f2fbfff4        movk    x20, #0xdfff, lsl #48
>  458:   d343fe73        lsr     x19, x19, #3
>  45c:   d2915664        mov     x4, #0x8ab3                     // #35507
>  460:   f9001bf5        str     x21, [sp,#48]
>  464:   f2a836a4        movk    x4, #0x41b5, lsl #16
>  468:   8b140275        add     x21, x19, x20
>  46c:   f9001ba4        str     x4, [x29,#48]
>  470:   3204d3e4        mov     w4, #0xf1f1f1f1                 // #-235802127
>  474:   b8346a64        str     w4, [x19,x20]
>  478:   3204d7e4        mov     w4, #0xf3f3f3f3                 // #-202116109
>  47c:   b9000aa4        str     w4, [x21,#8]
>  480:   910143a4        add     x4, x29, #0x50
>  484:   d2800005        mov     x5, #0x0                        // #0
>  488:   f90003e4        str     x4, [sp]
>  48c:   d2800004        mov     x4, #0x0                        // #0
>  490:   94000000        bl      0 <arm_smccc_hvc>
>  494:   f9402ba0        ldr     x0, [x29,#80]
>  498:   910003bf        mov     sp, x29
>  49c:   b8346a7f        str     wzr, [x19,x20]
>  4a0:   b9000abf        str     wzr, [x21,#8]
>  4a4:   a94153f3        ldp     x19, x20, [sp,#16]
>  4a8:   f94013f5        ldr     x21, [sp,#32]
>  4ac:   a8c97bfd        ldp     x29, x30, [sp],#144
>  4b0:   d65f03c0        ret
>  4b4:   d503201f        nop
>
> For comparison, without KASAN __invoke_psci_fn_hvc looks like:
>
> 0000000000000280 <__invoke_psci_fn_hvc>:
>  280:   d10103ff        sub     sp, sp, #0x40
>  284:   d2800007        mov     x7, #0x0                        // #0
>  288:   d2800006        mov     x6, #0x0                        // #0
>  28c:   d2800005        mov     x5, #0x0                        // #0
>  290:   a9017bfd        stp     x29, x30, [sp,#16]
>  294:   910043fd        add     x29, sp, #0x10
>  298:   910043a4        add     x4, x29, #0x10
>  29c:   f90003e4        str     x4, [sp]
>  2a0:   d2800004        mov     x4, #0x0                        // #0
>  2a4:   94000000        bl      0 <arm_smccc_hvc>
>  2a8:   910003bf        mov     sp, x29
>  2ac:   f9400ba0        ldr     x0, [x29,#16]
>  2b0:   a8c37bfd        ldp     x29, x30, [sp],#48
>  2b4:   d65f03c0        ret
>
> I also tried using __attribute__((no_sanitize_address)) directly, in
> case there was some header issue, but that doesn't seem to be the case.
>
> I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
> confirm whether they see the same? Does the same happen for x86?
>

I am seeing the same problem with  GCC 5.2.1. Replacing the attribute with

__attribute__((optimize("-fno-sanitize=address")))

works, but appears to affect the whole object file, not just the
function to which it is attached.
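
To make that concrete, a minimal sketch of the attribute applied to one of
the invocation functions (the body is reconstructed from the patch quoted
earlier in the thread, so treat it as illustrative rather than a tested
change):

__attribute__((optimize("-fno-sanitize=address")))
static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
			unsigned long arg0, unsigned long arg1,
			unsigned long arg2)
{
	struct arm_smccc_res res;

	/* SMCCC call via HVC; only the first result register is needed here */
	arm_smccc_hvc(function_id, arg0, arg1, arg2, 0, 0, 0, 0, &res);
	return res.a0;
}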

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-17 19:16                             ` Mark Rutland
  2016-02-18  8:06                               ` Ard Biesheuvel
@ 2016-02-18  8:22                               ` Andrey Ryabinin
  2016-02-18  8:42                                 ` Andrey Ryabinin
                                                   ` (3 more replies)
  1 sibling, 4 replies; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-18  8:22 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/17/2016 10:16 PM, Mark Rutland wrote:
> On Wed, Feb 17, 2016 at 05:56:56PM +0000, Mark Rutland wrote:
>> On Wed, Feb 17, 2016 at 05:01:11PM +0000, Mark Rutland wrote:
>>> On Wed, Feb 17, 2016 at 02:39:51PM +0000, Mark Rutland wrote:
>>>> Perhaps the simplest option is to not instrument invoke_psci_fn_* and
>>>> psci_suspend_finisher. Do we have a per-function annotation to avoid
>>>> KASAN instrumentation, like notrace? I need to investigate, but we may
>>>> also need notrace for similar reasons.
>>>
>>> I came up with the patch below, per the reasoning above.
>>>
>>> It _changes_ the KASAN splats (I see errors in tick_program_event rather
>>> than find_busiest_group), but doesn't seem to get rid of them. I'm not
>>> sure if I've missed something, or if we also have another latent issue.
>>>
>>> Ideas?
>>
>> I'd missed annotating __cpu_suspend_save. I've fixed that up locally
>> (along with s/virt_to_phys/__virt_to_phys due to the inlining issue).
> 
> Thinking about it more, I shouldn't have to annotate __cpu_suspend_save,
> as it returns (and hence should have cleaned up after itself).
> 

Right, we only need to no-sanitize functions that are passed to 'cpu_suspend(arg, fn);'


> Looking at the assembly, functions seem to get instrumented regardless
> of the __no_sanitize_address annotation. The assembly of
> __invoke_psci_fn_{smc,hvc} look identical, even if one has the
> annotation and one does not.
> 
> In the case below, it looks like __invoke_psci_fn_hvc is storing to the
> > shadow area even though it's annotated with __no_sanitize_address.  Note
> that the adrp symbol resolution is bogus; psci_to_linux_errno happens to
> be at offset 0 in the as-yet unlinked psci.o object.
> 

...
> I also tried using __attribute__((no_sanitize_address)) directly, in
> case there was some header issue, but that doesn't seem to be the case.
> 
> I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
> confirm whether they see the same? Does the same happen for x86?
> 

Confirming, this happens on every GCC I have (including x86).
It seems that 'no_sanitize_address' in gcc removes only memory access checks
but it doesn't remove stack redzones.
I think this is wrong, e.g. clang removes instrumentation completely. I'll submit a bug.

But we need to fix this in the kernel.
I see two options here:
 * completely disable instrumentation for drivers/firmware/psci.c
 * get back to assembly implementation

> Thanks,
> Mark.
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18  8:22                               ` KASAN issues with idle / hotplug area Andrey Ryabinin
@ 2016-02-18  8:42                                 ` Andrey Ryabinin
  2016-02-18  9:38                                 ` Andrey Ryabinin
                                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-18  8:42 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/18/2016 11:22 AM, Andrey Ryabinin wrote:
> I'll submit a bug.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69863

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18  8:22                               ` KASAN issues with idle / hotplug area Andrey Ryabinin
  2016-02-18  8:42                                 ` Andrey Ryabinin
@ 2016-02-18  9:38                                 ` Andrey Ryabinin
  2016-02-18 11:34                                   ` Mark Rutland
  2016-02-18  9:39                                 ` Lorenzo Pieralisi
  2016-02-18 11:15                                 ` Mark Rutland
  3 siblings, 1 reply; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-18  9:38 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/18/2016 11:22 AM, Andrey Ryabinin wrote:

> I see two options here:
>  * completely disable instrumentation for drivers/firmware/psci.c
>  * get back to assembly implementation

One more option is to allocate struct arm_smccc_res on stack of arm_smccc_[hvc, smc](), and return res.a0
from arm_smccc_[hvc,smc]().

So it will look like this:

asmlinkage unsigned long arm_smccc_hvc(unsigned long a0, unsigned long a1,
			unsigned long a2, unsigned long a3, unsigned long a4,
			unsigned long a5, unsigned long a6, unsigned long a7);


static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
			unsigned long arg0, unsigned long arg1,
			unsigned long arg2)
{
	return arm_smccc_hvc(function_id, arg0, arg1, arg2, 0, 0, 0, 0);
}

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18  8:22                               ` KASAN issues with idle / hotplug area Andrey Ryabinin
  2016-02-18  8:42                                 ` Andrey Ryabinin
  2016-02-18  9:38                                 ` Andrey Ryabinin
@ 2016-02-18  9:39                                 ` Lorenzo Pieralisi
  2016-02-18 11:38                                   ` Mark Rutland
  2016-02-18 11:45                                   ` Andrey Ryabinin
  2016-02-18 11:15                                 ` Mark Rutland
  3 siblings, 2 replies; 78+ messages in thread
From: Lorenzo Pieralisi @ 2016-02-18  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 18, 2016 at 11:22:24AM +0300, Andrey Ryabinin wrote:

[...]

> > I also tried using __attribute__((no_sanitize_address)) directly, in
> > case there was some header issue, but that doesn't seem to be the case.
> > 
> > I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
> > confirm whether they see the same? Does the same happen for x86?
> > 
> 
> Confirming, this happens on every GCC I have (including x86).
> It seems that 'no_sanitize_address' in gcc removes only memory access checks
> but it doesn't remove stack redzones.
> I think this is wrong, e.g. clang removes instrumentation completely. I'll submit a bug.
> 
> But we need fix this in kernel.
> I see two options here:
>  * completely disable instrumentation for drivers/firmware/psci.c

We have to have a way to disable instrumentation for functions that
are used to call into FW and return via different code paths.

>  * get back to assembly implementation

No, we are certainly not reverting the SMCCC work because Kasan adds
instrumentation to C functions, that's not even an option.

Is it possible at all to implement a function to remove instrumentation
for a chunk of memory (ie resetting the shadow memory to a clean slate
for a range of stack addresses) ?

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18  8:22                               ` KASAN issues with idle / hotplug area Andrey Ryabinin
                                                   ` (2 preceding siblings ...)
  2016-02-18  9:39                                 ` Lorenzo Pieralisi
@ 2016-02-18 11:15                                 ` Mark Rutland
  2016-02-18 11:46                                   ` Andrey Ryabinin
  3 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2016-02-18 11:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 18, 2016 at 11:22:24AM +0300, Andrey Ryabinin wrote:
> 
> On 02/17/2016 10:16 PM, Mark Rutland wrote:
> > Looking at the assembly, functions seem to get instrumented regardless
> > of the __no_sanitize_address annotation. The assembly of
> > __invoke_psci_fn_{smc,hvc} look identical, even if one has the
> > annotation and one does not.
> > 
> > In the case below, it looks like __invoke_psci_fn_hvc is storing to the
> > shadow area even though it's annotated with __no_sanitize_address.  Note
> > that the adrp symbol resolution is bogus; psci_to_linux_errno happens to
> > be at offset 0 in the as-yet unlinked psci.o object.
> > 
> 
> ...
> > I also tried using __attribute__((no_sanitize_address)) directly, in
> > case there was some header issue, but that doesn't seem to be the case.
> > 
> > I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
> > confirm whether they see the same? Does the same happen for x86?
>
> Confirming, this happens on every GCC I have (including x86).
> It seems that 'no_sanitize_address' in gcc removes only memory access checks
> but it doesn't remove stack redzones.
> I think this is wrong, e.g. clang removes instrumentation completely. I'll submit a bug.

Ok.

Unless there's some clever trickery that we can employ, the above
renders the Linux __no_sanitize_address annotation useless for this
style of code.

We should certainly call that out in the commentary in
include/linux/compiler-gcc.h.

> But we need fix this in kernel.
> I see two options here:
>  * completely disable instrumentation for drivers/firmware/psci.c

This is somewhat overkill, and we'd also have to disable instrumentation
for arch/arm64/kernel/psci.c (for psci_suspend_finisher).

I would like to have instrumentation for everything we can safely
instrument.

This is probably the least worst option, though.

>  * get back to assembly implementation

We'd also have to convert psci_suspend_finisher and psci_cpu_suspend,
the latter being generic code. That goes against the consolidation we
were aiming for.

Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18  9:38                                 ` Andrey Ryabinin
@ 2016-02-18 11:34                                   ` Mark Rutland
  0 siblings, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-18 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 18, 2016 at 12:38:09PM +0300, Andrey Ryabinin wrote:
> 
> 
> On 02/18/2016 11:22 AM, Andrey Ryabinin wrote:
> 
> > I see two options here:
> >  * completely disable instrumentation for drivers/firmware/psci.c
> >  * get back to assembly implementation
> 
> One more option is to allocate struct arm_smccc_res on stack of arm_smccc_[hvc, smc](), and return res.a0
> from arm_smccc_[hvc,smc]().

In general ARM SMCCC calls can return multiple values, and there are
callers that may care (even if they're not here just yet).

So we can't change the arm_smccc_{smc,hvc} prototypes, and adding
another asm function is somewhat self-defeating (an asm caller
of arm_smccc_* is more complex and slower than a direct SMC/HVC).
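
For context, the interface being discussed looks roughly like this (a
sketch of the declarations in include/linux/arm-smccc.h around this time;
the exact declarations there may differ slightly):

/* Result registers x0-x3 returned by an SMC/HVC call, per SMCCC */
struct arm_smccc_res {
	unsigned long a0;
	unsigned long a1;
	unsigned long a2;
	unsigned long a3;
};

/* Make an HVC call following the SMC Calling Convention */
asmlinkage void arm_smccc_hvc(unsigned long a0, unsigned long a1,
			unsigned long a2, unsigned long a3, unsigned long a4,
			unsigned long a5, unsigned long a6, unsigned long a7,
			struct arm_smccc_res *res);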

> So it will look like this:
> 
> asmlinkage unsigned long arm_smccc_hvc(unsigned long a0, unsigned long a1,
> 			unsigned long a2, unsigned long a3, unsigned long a4,
> 			unsigned long a5, unsigned long a6, unsigned long a7);
> 
> 
> static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
> 			unsigned long arg0, unsigned long arg1,
> 			unsigned long arg2)
> {
> 	return arm_smccc_hvc(function_id, arg0, arg1, arg2, 0, 0, 0, 0);
> }

While this looks like it might work today, it's going to be _extremely_
fragile -- other instrumentation might cause stack allocation and hence
shadow dirtying.

I'm not keen on this.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18  9:39                                 ` Lorenzo Pieralisi
@ 2016-02-18 11:38                                   ` Mark Rutland
  2016-02-18 11:45                                   ` Andrey Ryabinin
  1 sibling, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-18 11:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 18, 2016 at 09:39:38AM +0000, Lorenzo Pieralisi wrote:
> On Thu, Feb 18, 2016 at 11:22:24AM +0300, Andrey Ryabinin wrote:
> 
> [...]
> 
> > > I also tried using __attribute__((no_sanitize_address)) directly, in
> > > case there was some header issue, but that doesn't seem to be the case.
> > > 
> > > I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
> > > confirm whether they see the same? Does the same happen for x86?
> > > 
> > 
> > Confirming, this happens on every GCC I have (including x86).
> > It seems that 'no_sanitize_address' in gcc removes only memory access checks
> > but it doesn't remove stack redzones.
> > I think this is wrong, e.g. clang removes instrumentation completely. I'll submit a bug.
> > 
> > But we need fix this in kernel.
> > I see two options here:
> >  * completely disable instrumentation for drivers/firmware/psci.c
> 
> We have to have a way to disable instrumentation for functions that
> are used to call into FW and return via different code paths.
> 
> >  * get back to assembly implementation
> 
> No, we are certainly not reverting the SMCCC work because Kasan adds
> instrumentation to C functions, that's not even an option.
> 
> Is it possible at all to implement a function to remove instrumentation
> for a chunk of memory (ie resetting the shadow memory to a clean slate
> for a range of stack addresses) ?

In mm/kasan/kasan.c (which is uninstrumented) there is:

void kasan_unpoison_shadow(const void *address, size_t size)
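
A minimal usage sketch (the wrapper name is illustrative, not an existing
kernel symbol): given the bounds of the region to reset, the primitive can
be called directly:

/* Mark a [low, high) address range as addressable again. */
static inline void unpoison_range(const void *low, const void *high)
{
	kasan_unpoison_shadow(low, high - low);
}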

Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18  9:39                                 ` Lorenzo Pieralisi
  2016-02-18 11:38                                   ` Mark Rutland
@ 2016-02-18 11:45                                   ` Andrey Ryabinin
  1 sibling, 0 replies; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-18 11:45 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/18/2016 12:39 PM, Lorenzo Pieralisi wrote:
> On Thu, Feb 18, 2016 at 11:22:24AM +0300, Andrey Ryabinin wrote:
> 
> [...]
> 
>>> I also tried using __attribute__((no_sanitize_address)) directly, in
>>> case there was some header issue, but that doesn't seem to be the case.
>>>
>>> I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
>>> confirm whether they see the same? Does the same happen for x86?
>>>
>>
>> Confirming, this happens on every GCC I have (including x86).
>> It seems that 'no_sanitize_address' in gcc removes only memory access checks
>> but it doesn't remove stack redzones.
>> I think this is wrong, e.g. clang removes instrumentation completely. I'll submit a bug.
>>
>> But we need fix this in kernel.
>> I see two options here:
>>  * completely disable instrumentation for drivers/firmware/psci.c
> 
> We have to have a way to disable instrumentation for functions that
> are used to call into FW and return via different code paths.
> 

Unfortunately gcc doesn't allow us to do this yet.


>>  * get back to assembly implementation
> 
> No, we are certainly not reverting the SMCCC work because Kasan adds
> instrumentation to C functions, that's not even an option.
> 
> Is it possible at all to implement a function to remove instrumentation
> for a chunk of memory (ie resetting the shadow memory to a clean slate
> for a range of stack addresses) ?
> 


Yes, that's possible. If we tell that function the resume SP, it can zero out all the shadow for the stack below that SP.
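
A minimal sketch of such a helper, built on the kasan_unpoison_shadow()
primitive Mark points at earlier in the thread (the function name and the
THREAD_SIZE-based stack-base calculation are assumptions for illustration
only):

void kasan_unpoison_stack_below(const void *watermark)
{
	/*
	 * Assuming the task stack is THREAD_SIZE-aligned, mask the
	 * watermark (e.g. the SP at resume) down to the stack base and
	 * clear any stale redzones left between the two.
	 */
	void *base = (void *)((unsigned long)watermark & ~(THREAD_SIZE - 1));

	kasan_unpoison_shadow(base, watermark - base);
}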

> Thanks,
> Lorenzo
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18 11:15                                 ` Mark Rutland
@ 2016-02-18 11:46                                   ` Andrey Ryabinin
  2016-02-18 12:08                                     ` Mark Rutland
  0 siblings, 1 reply; 78+ messages in thread
From: Andrey Ryabinin @ 2016-02-18 11:46 UTC (permalink / raw)
  To: linux-arm-kernel



On 02/18/2016 02:15 PM, Mark Rutland wrote:
> On Thu, Feb 18, 2016 at 11:22:24AM +0300, Andrey Ryabinin wrote:
>>
>> On 02/17/2016 10:16 PM, Mark Rutland wrote:
>>> Looking at the assembly, functions seem to get instrumented regardless
>>> of the __no_sanitize_address annotation. The assembly of
>>> __invoke_psci_fn_{smc,hvc} look identical, even if one has the
>>> annotation and one does not.
>>>
>>> In the case below, it looks like __invoke_psci_fn_hvc is storing to the
>>> shadow area even though it's annotated with __no_sanitize_address.  Note
>>> that the adrp symbol resolution is bogus; psci_to_linux_errno happens to
>>> be at offset 0 in the as-yet unlinked psci.o object.
>>>
>>
>> ...
>>> I also tried using __attribute__((no_sanitize_address)) directly, in
>>> case there was some header issue, but that doesn't seem to be the case.
>>>
>>> I'm using the Linaro 15.08 AArch64 GCC 5.1. Is anyone else able to
>>> confirm whether they see the same? Does the same happen for x86?
>>
>> Confirming, this happens on every GCC I have (including x86).
>> It seems that 'no_sanitize_address' in gcc removes only memory access checks
>> but it doesn't remove stack redzones.
>> I think this is wrong, e.g. clang removes instrumentation completely. I'll submit a bug.
> 
> Ok.
> 
> Unless there's some clever trickery that we can employ, the above
> renders the Linux __no_sanitize_address annotation useless for this
> style of code.
> 
> We should certainly call that out in the commentary in
> include/linux/compiler-gcc.h.
> 
>> But we need fix this in kernel.
>> I see two options here:
>>  * completely disable instrumentation for drivers/firmware/psci.c
> 
> This is somewhat overkill, and we'd also have to disable instrumentation
> for arch/arm64/kernel/psci.c (for psci_suspend_finisher).
> 
> I would like to have instrumentation for everything we can safely
> instrument.
> 
> This is probably the least worst option, though.
> 
>>  * get back to assembly implementation
> 
> We'd also have to convert psci_suspend_finisher and psci_cpu_suspend,
> the latter being generic code. That goes against the consolidation we
> were aiming for.
> 

Yup, I missed these two.
In that case the only way is to manually unpoison stack.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* KASAN issues with idle / hotplug area
  2016-02-18 11:46                                   ` Andrey Ryabinin
@ 2016-02-18 12:08                                     ` Mark Rutland
  0 siblings, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2016-02-18 12:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 18, 2016 at 02:46:30PM +0300, Andrey Ryabinin wrote:
> 
> On 02/18/2016 02:15 PM, Mark Rutland wrote:
> > On Thu, Feb 18, 2016 at 11:22:24AM +0300, Andrey Ryabinin wrote:
> >> I see two options here:
> >>  * completely disable instrumentation for drivers/firmware/psci.c
> > 
> > This is somewhat overkill, and we'd also have to disable instrumentation
> > for arch/arm64/kernel/psci.c (for psci_suspend_finisher).
> > 
> > I would like to have instrumentation for everything we can safely
> > instrument.
> > 
> > This is probably the least worst option, though.
> > 
> >>  * get back to assembly implementation
> > 
> > We'd also have to convert psci_suspend_finisher and psci_cpu_suspend,
> > the latter being generic code. That goes against the consolidation we
> > were aiming for.
> > 
> 
> Yup, I missed these two.
> In that case the only way is to manually unpoison stack.

I'm prototyping this now.

Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2016-02-18 12:08 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-01 10:54 [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Ard Biesheuvel
2016-02-01 10:54 ` [PATCH v5sub1 1/8] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
2016-02-01 10:54 ` [PATCH v5sub1 2/8] arm64: add support for ioremap() block mappings Ard Biesheuvel
2016-02-01 14:10   ` Mark Rutland
2016-02-01 14:56     ` Catalin Marinas
2016-02-01 10:54 ` [PATCH v5sub1 3/8] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region Ard Biesheuvel
2016-02-01 10:54 ` [PATCH v5sub1 4/8] arm64: pgtable: implement static [pte|pmd|pud]_offset variants Ard Biesheuvel
2016-02-01 10:54 ` [PATCH v5sub1 5/8] arm64: decouple early fixmap init from linear mapping Ard Biesheuvel
2016-02-01 10:54 ` [PATCH v5sub1 6/8] arm64: kvm: deal with kernel symbols outside of " Ard Biesheuvel
2016-02-01 10:54 ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
2016-02-01 12:24   ` Catalin Marinas
2016-02-01 12:27     ` Ard Biesheuvel
2016-02-01 13:41       ` Catalin Marinas
2016-02-01 14:32   ` Mark Rutland
2016-02-12 14:58   ` Catalin Marinas
2016-02-12 15:02     ` Ard Biesheuvel
2016-02-12 15:10       ` Catalin Marinas
2016-02-12 15:17         ` Ard Biesheuvel
2016-02-12 15:26           ` Catalin Marinas
2016-02-12 15:38             ` Sudeep Holla
2016-02-12 16:06               ` Catalin Marinas
2016-02-12 16:44                 ` Ard Biesheuvel
2016-02-15 14:28                 ` Andrey Ryabinin
2016-02-15 14:35                   ` Mark Rutland
2016-02-15 18:59                   ` Catalin Marinas
2016-02-16 12:59                     ` Andrey Ryabinin
2016-02-16 14:12                       ` Mark Rutland
2016-02-16 14:29                         ` Mark Rutland
2016-02-16 15:17                       ` Ard Biesheuvel
2016-02-16 15:36                         ` Andrey Ryabinin
2016-02-16 16:42                           ` Mark Rutland
2016-02-17  9:15                             ` Andrey Ryabinin
2016-02-17 10:10                               ` James Morse
2016-02-17 10:19                                 ` Catalin Marinas
2016-02-17 10:36                                   ` Catalin Marinas
2016-02-17 10:18                               ` Catalin Marinas
2016-02-17 10:48                                 ` Mark Rutland
2016-02-17 14:39                       ` Mark Rutland
2016-02-17 16:31                         ` Andrey Ryabinin
2016-02-17 19:35                           ` Mark Rutland
2016-02-17 17:01                         ` KASAN issues with idle / hotplug area (was: Re: [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area) Mark Rutland
2016-02-17 17:56                           ` Mark Rutland
2016-02-17 19:16                             ` Mark Rutland
2016-02-18  8:06                               ` Ard Biesheuvel
2016-02-18  8:22                               ` KASAN issues with idle / hotplug area Andrey Ryabinin
2016-02-18  8:42                                 ` Andrey Ryabinin
2016-02-18  9:38                                 ` Andrey Ryabinin
2016-02-18 11:34                                   ` Mark Rutland
2016-02-18  9:39                                 ` Lorenzo Pieralisi
2016-02-18 11:38                                   ` Mark Rutland
2016-02-18 11:45                                   ` Andrey Ryabinin
2016-02-18 11:15                                 ` Mark Rutland
2016-02-18 11:46                                   ` Andrey Ryabinin
2016-02-18 12:08                                     ` Mark Rutland
2016-02-12 17:47   ` [PATCH v5sub1 7/8] arm64: move kernel image to base of vmalloc area James Morse
2016-02-12 18:01     ` Ard Biesheuvel
2016-02-01 10:54 ` [PATCH v5sub1 8/8] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
2016-02-01 14:50   ` Mark Rutland
2016-02-01 16:28     ` Fu Wei
2016-02-16  8:55       ` Fu Wei
2016-02-01 15:06   ` Catalin Marinas
2016-02-01 15:13     ` Ard Biesheuvel
2016-02-01 16:31       ` Ard Biesheuvel
2016-02-01 17:31         ` Catalin Marinas
2016-02-01 17:57           ` Ard Biesheuvel
2016-02-01 18:02             ` Catalin Marinas
2016-02-01 18:30               ` [PATCH] arm64: move back to generic memblock_enforce_memory_limit() Ard Biesheuvel
2016-02-02 10:19                 ` Catalin Marinas
2016-02-02 10:28                   ` Ard Biesheuvel
2016-02-02 10:44                     ` Catalin Marinas
2016-02-12 19:45 ` [PATCH v5sub1 0/8] arm64: split linear and kernel mappings Matthias Brugger
2016-02-12 19:47   ` Ard Biesheuvel
2016-02-12 20:10     ` Matthias Brugger
2016-02-12 20:37       ` Ard Biesheuvel
2016-02-13 14:28       ` Ard Biesheuvel
2016-02-15 13:29         ` Matthias Brugger
2016-02-15 13:40           ` Will Deacon
2016-02-15 14:58           ` Ard Biesheuvel
