linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/3] KASAN for powerpc64 radix
@ 2019-12-12 15:16 Daniel Axtens
  2019-12-12 15:16 ` [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables Daniel Axtens
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Daniel Axtens @ 2019-12-12 15:16 UTC
  To: linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	christophe.leroy, aneesh.kumar, bsingharora
  Cc: Daniel Axtens

Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

This provides full inline instrumentation on radix, but does require
that you be able to specify the amount of physically contiguous memory
on the system at compile time. More details in patch 3.
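
As a rough illustration of what "physically contiguous memory" means here,
the sketch below computes the size, in MB, of the memory that is contiguous
from physical address zero, given a list of physical ranges. The struct and
function names are hypothetical and not part of this series; it only assumes
that the ranges are listed in ascending order, as the radix MMU boot messages
quoted in patch 3 are.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped physical range. */
struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Size in MB of the memory that is contiguous from physical address 0.
 * Assumes `ranges` is sorted by start address.
 */
uint64_t contiguous_mb_from_zero(const struct mem_range *ranges, size_t n)
{
	uint64_t top = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ranges[i].start > top)
			break;		/* gap: stop at the first hole */
		if (ranges[i].end > top)
			top = ranges[i].end;
	}

	return top >> 20;
}
```

For the Talos II layout quoted in the patch 3 documentation (0x0 to
0x40000000, 0x40000000 to 0x800000000, then 0x200000000000 to
0x200800000000), this yields 32768, the value that documentation suggests
configuring.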

v3: Reduce the overly ambitious scope of the MAX_PTRS change.
    Document more things, including around why some of the
    restrictions apply.
    Clean up the code more, thanks Christophe.

v2: The big change is the introduction of tree-wide(ish)
    MAX_PTRS_PER_{PTE,PMD,PUD} macros in preference to the previous
    approach, which was for the arch to override the page table array
    definitions with their own. (And I squashed the annoying
    intermittent crash!)

    Apart from that there's just a lot of cleanup. Christophe, I've
    addressed most of what you asked for and I will reply to your v1
    emails to clarify what remains unchanged.


Daniel Axtens (3):
  kasan: define and use MAX_PTRS_PER_* for early shadow tables
  kasan: Document support on 32-bit powerpc
  powerpc: Book3S 64-bit "heavyweight" KASAN support

 Documentation/dev-tools/kasan.rst             |   7 +-
 Documentation/powerpc/kasan.txt               | 122 ++++++++++++++++++
 arch/powerpc/Kconfig                          |   3 +
 arch/powerpc/Kconfig.debug                    |  21 +++
 arch/powerpc/Makefile                         |  11 ++
 arch/powerpc/include/asm/book3s/64/hash.h     |   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  |   7 +
 arch/powerpc/include/asm/book3s/64/radix.h    |   5 +
 arch/powerpc/include/asm/kasan.h              |  21 ++-
 arch/powerpc/kernel/process.c                 |   8 ++
 arch/powerpc/kernel/prom.c                    |  64 ++++++++-
 arch/powerpc/mm/kasan/Makefile                |   3 +-
 .../mm/kasan/{kasan_init_32.c => init_32.c}   |   0
 arch/powerpc/mm/kasan/init_book3s_64.c        |  72 +++++++++++
 include/linux/kasan.h                         |  18 ++-
 mm/kasan/init.c                               |   6 +-
 16 files changed, 359 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

-- 
2.20.1



* [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables
  2019-12-12 15:16 [PATCH v3 0/3] KASAN for powerpc64 radix Daniel Axtens
@ 2019-12-12 15:16 ` Daniel Axtens
  2019-12-12 15:55   ` Christophe Leroy
  2019-12-13 21:37   ` Balbir Singh
  2019-12-12 15:16 ` [PATCH v3 2/3] kasan: Document support on 32-bit powerpc Daniel Axtens
  2019-12-12 15:16 ` [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support Daniel Axtens
  2 siblings, 2 replies; 12+ messages in thread
From: Daniel Axtens @ 2019-12-12 15:16 UTC
  To: linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	christophe.leroy, aneesh.kumar, bsingharora
  Cc: Daniel Axtens

powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer compile-time constants, so using
them to size the early shadow page table arrays breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.
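
The fallback pattern can be illustrated outside the kernel. This is a minimal
sketch under stated assumptions: `ptrs_per_pte` is a hypothetical stand-in
for a PTRS_PER_PTE that is only known at runtime, and the value 512 for
MAX_PTRS_PER_PTE is purely illustrative, not the value used by any MMU.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in: PTRS_PER_PTE expands to something that is only
 * known once the MMU has been selected at boot.
 */
unsigned long ptrs_per_pte;
#define PTRS_PER_PTE ptrs_per_pte

/* "Arch code" supplies a compile-time upper bound (illustrative value). */
#define MAX_PTRS_PER_PTE 512

/* Generic code: fall back to PTRS_PER_PTE unless the arch overrode it. */
#ifndef MAX_PTRS_PER_PTE
#define MAX_PTRS_PER_PTE PTRS_PER_PTE
#endif

typedef unsigned long pte_t;

/*
 * A file-scope array needs a constant size, so MAX_PTRS_PER_PTE works
 * here where the runtime PTRS_PER_PTE would not compile.
 */
pte_t early_shadow_pte[MAX_PTRS_PER_PTE];

size_t early_shadow_pte_entries(void)
{
	return sizeof(early_shadow_pte) / sizeof(early_shadow_pte[0]);
}
```

Architectures whose PTRS_PER_* are already constants need no override and
get the same array sizes as before.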

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Suggested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 include/linux/kasan.h | 18 +++++++++++++++---
 mm/kasan/init.c       |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index e18fe54969e9..70865810d0e7 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -14,10 +14,22 @@ struct task_struct;
 #include <asm/kasan.h>
 #include <asm/pgtable.h>
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index ce45c491ebcd..8b54a96d3b3e 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -46,7 +46,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
 	return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -58,7 +58,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
 	return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -69,7 +69,7 @@ static inline bool kasan_pmd_table(pud_t pud)
 	return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE] __page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
 {
-- 
2.20.1



* [PATCH v3 2/3] kasan: Document support on 32-bit powerpc
  2019-12-12 15:16 [PATCH v3 0/3] KASAN for powerpc64 radix Daniel Axtens
  2019-12-12 15:16 ` [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables Daniel Axtens
@ 2019-12-12 15:16 ` Daniel Axtens
  2019-12-12 15:16 ` [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support Daniel Axtens
  2 siblings, 0 replies; 12+ messages in thread
From: Daniel Axtens @ 2019-12-12 15:16 UTC
  To: linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	christophe.leroy, aneesh.kumar, bsingharora
  Cc: Daniel Axtens

KASAN is supported on 32-bit powerpc and the docs should reflect this.

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 Documentation/dev-tools/kasan.rst |  3 ++-
 Documentation/powerpc/kasan.txt   | 12 ++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index e4d66e7c50de..4af2b5d2c9b4 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,7 +22,8 @@ global variables yet.
 Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
 
 Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
-architectures, and tag-based KASAN is supported only for arm64.
+architectures. It is also supported on 32-bit powerpc kernels. Tag-based KASAN
+is supported only on arm64.
 
 Usage
 -----
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index 000000000000..a85ce2ff8244
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==============
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is not currently supported, but modules
+are.
-- 
2.20.1



* [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support
  2019-12-12 15:16 [PATCH v3 0/3] KASAN for powerpc64 radix Daniel Axtens
  2019-12-12 15:16 ` [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables Daniel Axtens
  2019-12-12 15:16 ` [PATCH v3 2/3] kasan: Document support on 32-bit powerpc Daniel Axtens
@ 2019-12-12 15:16 ` Daniel Axtens
  2019-12-12 23:55   ` Jordan Niethe
  2019-12-13 12:27   ` Christophe Leroy
  2 siblings, 2 replies; 12+ messages in thread
From: Daniel Axtens @ 2019-12-12 15:16 UTC
  To: linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	christophe.leroy, aneesh.kumar, bsingharora
  Cc: Daniel Axtens, Michael Ellerman

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code in real mode after booting. Most notably a lot of KVM
   runs in real mode, and it would be good to be able to instrument it.

 - Because code runs in real mode after boot, the offset has to point to
   valid memory both in and out of real mode.

   [For those not immersed in ppc64, in real mode, the top nibble or 2 bits
   (depending on radix/hash mmu) of the address is ignored. The linear
   mapping is placed at 0xc000000000000000. This means that a pointer to
   part of the linear mapping will work both in real mode, where it will be
   interpreted as a physical address of the form 0x000..., and out of real
   mode, where it will go via the linear mapping.]
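
   The aside above can be modelled in a few lines of C. This is a
   deliberately simplified sketch, not how the hardware or kernel implements
   translation: the function names are hypothetical, and real mode is
   reduced to "drop the top N bits of the address".

```c
#include <stdint.h>

#define LINEAR_MAP_BASE 0xc000000000000000ULL

/*
 * Simplified model of real mode: the top `ignored_bits` bits of the
 * effective address are dropped (4 for the top nibble, or 2).
 */
uint64_t real_mode_addr(uint64_t ea, unsigned int ignored_bits)
{
	uint64_t mask = ~0ULL >> ignored_bits;

	return ea & mask;
}

/* Simplified model of the linear mapping with translation on. */
uint64_t linear_map_to_phys(uint64_t ea)
{
	return ea - LINEAR_MAP_BASE;
}
```

   Because 0xc is binary 1100, dropping either the top 2 bits or the top
   nibble of a linear-map address gives the same physical address as going
   through the linear mapping.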

One approach is just to give up on inline instrumentation. This way all
checks can be delayed until after everything is set up correctly, and the
address-to-shadow calculations can be overridden. However, the features and
speed boost provided by inline instrumentation are worth trying to do
better.

If _at compile time_ it is known how much contiguous physical memory a
system has, the top 1/8th of the first block of physical memory can be set
aside for the shadow. This is a big hammer and comes with 3 big
consequences:

 - there's no nice way to handle physically discontiguous memory, so only
   the first physical memory block can be used.

 - kernels will simply fail to boot on machines with less memory than
   specified when compiling.

 - kernels running on machines with more memory than specified when
   compiling will simply ignore the extra memory.

Implement and document KASAN this way. The current implementation is Radix
only.
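
The arithmetic behind the fixed offset can be sketched as follows. The
helper names are hypothetical; the formula mirrors the Makefile calculation
in this patch and the generic KASAN translation, shadow = (addr >> 3) +
offset.

```c
#include <stdint.h>

#define LINEAR_MAP_BASE		0xc000000000000000ULL
#define SHADOW_SCALE_SHIFT	3

/*
 * Offset chosen so that the shadow of LINEAR_MAP_BASE lands 7/8 of the
 * way through the configured memory, as seen through the linear mapping.
 */
uint64_t shadow_offset(uint64_t phys_mem_mb)
{
	uint64_t mem_bytes = phys_mem_mb << 20;

	return LINEAR_MAP_BASE - (LINEAR_MAP_BASE >> SHADOW_SCALE_SHIFT)
		+ mem_bytes * 7 / 8;
}

/* Generic KASAN address-to-shadow translation. */
uint64_t mem_to_shadow(uint64_t addr, uint64_t offset)
{
	return (addr >> SHADOW_SCALE_SHIFT) + offset;
}
```

With the default of 1024 MB, the offset is 0xa800000038000000. The shadow of
0xc000000000000000 is then 0xc000000038000000, and the shadow of
0xc000000040000000 (the end of the first 1 GB) is 0xc000000040000000, so the
shadow of the linear map exactly fills the last eighth of that memory.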

Despite the limitations, it can still find bugs,
e.g. http://patchwork.ozlabs.org/patch/1103775/

At the moment, this physical memory limit must be set _even for outline
mode_. This may be changed in a later series - a different implementation
could be added for outline mode that dynamically allocates shadow at a
fixed offset. For example, see https://patchwork.ozlabs.org/patch/795211/

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
Cc: Christophe Leroy <christophe.leroy@c-s.fr> # ppc32 version
Signed-off-by: Daniel Axtens <dja@axtens.net>

---
Changes since v2:

 - Address feedback from Christophe around cleanups and docs.
 - Address feedback from Balbir: at this point I don't have a good solution
   for the issues you identify around the limitations of the inline implementation
   but I think that it's worth trying to get the stack instrumentation support.
   I'm happy to have an alternative and more flexible outline mode - I had
   envisoned this would be called 'lightweight' mode as it imposes fewer restrictions.
   I've linked to your implementation. I think it's best to add it in a follow-up series.
 - Made the default PHYS_MEM_SIZE_FOR_KASAN value 1024MB. I think most people have
   guests with at least that much memory in the Radix 64s case so it's a much
   saner default - it means that if you just turn on KASAN without reading the
   docs you're much more likely to have a bootable kernel, which you will never
   have if the value is set to zero! I'm happy to bikeshed the value if we want.

Changes since v1:
 - Landed kasan vmalloc support upstream
 - Lots of feedback from Christophe.

Changes since the rfc:

 - Boots real and virtual hardware, kvm works.

 - disabled reporting when we're checking the stack for exception
   frames. The behaviour isn't wrong, just incompatible with KASAN.

 - Documentation!

 - Dropped old module stuff in favour of KASAN_VMALLOC.

The bugs with ftrace and kuap were due to kernel bloat pushing
prom_init calls to be done via the plt. Because we did not have
a relocatable kernel, and they are done very early, this caused
everything to explode. Compile with CONFIG_RELOCATABLE!
---
 Documentation/dev-tools/kasan.rst             |   8 +-
 Documentation/powerpc/kasan.txt               | 112 +++++++++++++++++-
 arch/powerpc/Kconfig                          |   3 +
 arch/powerpc/Kconfig.debug                    |  21 ++++
 arch/powerpc/Makefile                         |  11 ++
 arch/powerpc/include/asm/book3s/64/hash.h     |   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  |   7 ++
 arch/powerpc/include/asm/book3s/64/radix.h    |   5 +
 arch/powerpc/include/asm/kasan.h              |  21 +++-
 arch/powerpc/kernel/process.c                 |   8 ++
 arch/powerpc/kernel/prom.c                    |  64 +++++++++-
 arch/powerpc/mm/kasan/Makefile                |   3 +-
 .../mm/kasan/{kasan_init_32.c => init_32.c}   |   0
 arch/powerpc/mm/kasan/init_book3s_64.c        |  72 +++++++++++
 14 files changed, 330 insertions(+), 9 deletions(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 4af2b5d2c9b4..d99dc580bc11 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,8 +22,9 @@ global variables yet.
 Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
 
 Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
-architectures. It is also supported on 32-bit powerpc kernels. Tag-based KASAN
-is supported only on arm64.
+architectures. It is also supported on powerpc, for 32-bit kernels, and for
+64-bit kernels running under the Radix MMU. Tag-based KASAN is supported only
+on arm64.
 
 Usage
 -----
@@ -256,7 +257,8 @@ CONFIG_KASAN_VMALLOC
 ~~~~~~~~~~~~~~~~~~~~
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
-cost of greater memory usage. Currently this is only supported on x86.
+cost of greater memory usage. Currently this is optional on x86, and
+required on 64-bit powerpc.
 
 This works by hooking into vmalloc and vmap, and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
index a85ce2ff8244..f134a91600ad 100644
--- a/Documentation/powerpc/kasan.txt
+++ b/Documentation/powerpc/kasan.txt
@@ -1,4 +1,4 @@
-KASAN is supported on powerpc on 32-bit only.
+KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
 
 32 bit support
 ==============
@@ -10,3 +10,113 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
 
 Instrumentation of the vmalloc area is not currently supported, but modules
 are.
+
+64 bit support
+==============
+
+Currently, only the radix MMU is supported. There have been versions for Book3E
+processors floating around on the mailing list, but nothing has been merged.
+
+KASAN support on Book3S is a bit tricky to get right:
+
+ - It would be good to support inline instrumentation so as to be able to catch
+   stack issues that cannot be caught with outline mode.
+
+ - Inline instrumentation requires a fixed offset.
+
+ - Book3S runs code in real mode after booting. Most notably a lot of KVM runs
+   in real mode, and it would be good to be able to instrument it.
+
+ - Because code runs in real mode after boot, the offset has to point to
+   valid memory both in and out of real mode.
+
+One approach is just to give up on inline instrumentation. This way all checks
+can be delayed until after everything is set up correctly, and the
+address-to-shadow calculations can be overridden. However, the features and
+speed boost provided by inline instrumentation are worth trying to do better.
+
+If _at compile time_ it is known how much contiguous physical memory a system
+has, the top 1/8th of the first block of physical memory can be set aside for
+the shadow. This is a big hammer and comes with 3 big consequences:
+
+ - there's no nice way to handle physically discontiguous memory, so only the
+   first physical memory block can be used.
+
+ - kernels will simply fail to boot on machines with less memory than specified
+   when compiling.
+
+ - kernels running on machines with more memory than specified when compiling
+   will simply ignore the extra memory.
+
+At the moment, this physical memory limit must be set _even for outline mode_.
+This may be changed in a future version - a different implementation could be
+added for outline mode that dynamically allocates shadow at a fixed offset.
+For example, see https://patchwork.ozlabs.org/patch/795211/
+
+This value is configured in CONFIG_PHYS_MEM_SIZE_FOR_KASAN.
+
+Tips
+----
+
+ - Compile with CONFIG_RELOCATABLE.
+
+   In development, boot hangs were observed when building with ftrace and KUAP
+   on. These ended up being due to kernel bloat pushing prom_init calls to be
+   done via the PLT. Because the kernel was not relocatable, and the calls are
+   done very early, this caused execution to jump off into somewhere
+   invalid. Enabling relocation fixes this.
+
+NUMA/discontiguous physical memory
+----------------------------------
+
+Currently the code cannot really deal with discontiguous physical memory. Only
+physical memory that is contiguous from physical address zero can be used. The
+size of that memory, not total memory, must be specified when configuring the
+kernel.
+
+Discontiguous memory can occur on machines with memory spread across multiple
+nodes. For example, on a Talos II with 64GB of RAM:
+
+ - 32GB runs from 0x0 to 0x0000_0008_0000_0000,
+ - then there's a gap,
+ - then the final 32GB runs from 0x0000_2000_0000_0000 to 0x0000_2008_0000_0000
+
+This can create _significant_ issues:
+
+ - If the machine is treated as having 64GB of _contiguous_ RAM, the
+   instrumentation would assume that it ran from 0x0 to
+   0x0000_0010_0000_0000. The last 1/8th - 0x0000_000e_0000_0000 to
+   0x0000_0010_0000_0000 would be reserved as the shadow region. But when the
+   kernel tried to access any of that, it would be trying to access pages that
+   are not physically present.
+
+ - If the shadow region size is based on the top address, then the shadow
+   region would be 0x2008_0000_0000 / 8 = 0x0401_0000_0000 bytes = 4100 GB of
+   memory, clearly more than the 64GB of RAM physically present.
+
+Therefore, the code currently is restricted to dealing with memory in the node
+starting at 0x0. For this system, that's 32GB. If a contiguous physical memory
+size greater than the size of the first contiguous region of memory is
+specified, the system will be unable to boot or even print an error message.
+
+The layout of a system's memory can be observed in the messages that the Radix
+MMU prints on boot. The Talos II discussed earlier has:
+
+radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
+radix-mmu: Mapped 0x0000000040000000-0x0000000800000000 with 1.00 GiB pages
+radix-mmu: Mapped 0x0000200000000000-0x0000200800000000 with 1.00 GiB pages
+
+As discussed, this system would be configured for 32768 MB.
+
+Another system prints:
+
+radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
+radix-mmu: Mapped 0x0000000040000000-0x0000002000000000 with 1.00 GiB pages
+radix-mmu: Mapped 0x0000200000000000-0x0000202000000000 with 1.00 GiB pages
+
+This machine has more memory: 0x0000_0040_0000_0000 total, but only
+0x0000_0020_0000_0000 is physically contiguous from zero, so it would be
+configured for 131072 MB of physically contiguous memory.
+
+This restriction currently also affects outline mode, but this could be
+changed in future if an alternative outline implementation is added.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6987b0832e5f..2561446e85a8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -173,6 +173,9 @@ config PPC
 	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_KASAN			if PPC32
+	select HAVE_ARCH_KASAN			if PPC_BOOK3S_64 && PPC_RADIX_MMU
+	select HAVE_ARCH_KASAN_VMALLOC		if PPC_BOOK3S_64 && PPC_RADIX_MMU
+	select KASAN_VMALLOC			if KASAN && PPC_BOOK3S_64
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 4e1d39847462..5c454f8fa24b 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -394,6 +394,27 @@ config PPC_FAST_ENDIAN_SWITCH
 	help
 	  If you're unsure what this is, say N.
 
+config PHYS_MEM_SIZE_FOR_KASAN
+	int "Contiguous physical memory size for KASAN (MB)" if KASAN && PPC_BOOK3S_64
+	default 1024
+	help
+
+	  To get inline instrumentation support for KASAN on 64-bit Book3S
+	  machines, you need to know how much contiguous physical memory your
+	  system has. A shadow offset will be calculated based on this figure,
+	  which will be compiled in to the kernel. KASAN will use this offset
+	  to access its shadow region, which is used to verify memory accesses.
+
+	  If you attempt to boot on a system with less memory than you specify
+	  here, your system will fail to boot very early in the process. If you
+	  boot on a system with more memory than you specify, the extra memory
+	  will be wasted - it will be reserved and not used.
+
+	  For systems with discontiguous blocks of physical memory, specify the
+	  size of the block starting at 0x0. You can determine this by looking
+	  at the memory layout info printed to dmesg by the radix MMU code
+	  early in boot. See Documentation/powerpc/kasan.txt.
+
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f35730548e42..eff693527462 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -230,6 +230,17 @@ ifdef CONFIG_476FPE_ERR46
 		-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
 endif
 
+ifdef CONFIG_PPC_BOOK3S_64
+# The KASAN shadow offset is such that linear map (0xc000...) is shadowed by
+# the last 8th of linearly mapped physical memory. This way, if the code uses
+# 0xc addresses throughout, accesses work both in real mode (where the top
+# 2 bits are ignored) and outside of real mode.
+#
+# 0xc000000000000000 - (0xc000000000000000 >> 3) = 0xa800000000000000 = 12105675798371893248
+KASAN_SHADOW_OFFSET = $(shell echo 7 \* 1024 \* 1024 \* $(CONFIG_PHYS_MEM_SIZE_FOR_KASAN) / 8 + 12105675798371893248 | bc)
+KBUILD_CFLAGS += -DKASAN_SHADOW_OFFSET=$(KASAN_SHADOW_OFFSET)UL
+endif
+
 # No AltiVec or VSX instructions when building kernel
 KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
 KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 2781ebf6add4..fce329b8452e 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -18,6 +18,10 @@
 #include <asm/book3s/64/hash-4k.h>
 #endif
 
+#define H_PTRS_PER_PTE		(1 << H_PTE_INDEX_SIZE)
+#define H_PTRS_PER_PMD		(1 << H_PMD_INDEX_SIZE)
+#define H_PTRS_PER_PUD		(1 << H_PUD_INDEX_SIZE)
+
 /* Bits to set in a PMD/PUD/PGD entry valid bit*/
 #define HASH_PMD_VAL_BITS		(0x8000000000000000UL)
 #define HASH_PUD_VAL_BITS		(0x8000000000000000UL)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index b01624e5c467..209817235a44 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -231,6 +231,13 @@ extern unsigned long __pmd_frag_size_shift;
 #define PTRS_PER_PUD	(1 << PUD_INDEX_SIZE)
 #define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
 
+#define MAX_PTRS_PER_PTE	((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
+				  H_PTRS_PER_PTE : R_PTRS_PER_PTE)
+#define MAX_PTRS_PER_PMD	((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? \
+				  H_PTRS_PER_PMD : R_PTRS_PER_PMD)
+#define MAX_PTRS_PER_PUD	((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? \
+				  H_PTRS_PER_PUD : R_PTRS_PER_PUD)
+
 /* PMD_SHIFT determines what a second-level page table entry can map */
 #define PMD_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
 #define PMD_SIZE	(1UL << PMD_SHIFT)
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index d97db3ad9aae..4f826259de71 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -35,6 +35,11 @@
 #define RADIX_PMD_SHIFT		(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
 #define RADIX_PUD_SHIFT		(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
 #define RADIX_PGD_SHIFT		(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
+
+#define R_PTRS_PER_PTE		(1 << RADIX_PTE_INDEX_SIZE)
+#define R_PTRS_PER_PMD		(1 << RADIX_PMD_INDEX_SIZE)
+#define R_PTRS_PER_PUD		(1 << RADIX_PUD_INDEX_SIZE)
+
 /*
  * Size of EA range mapped by our pagetables.
  */
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 296e51c2f066..f18268cbdc33 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -2,6 +2,9 @@
 #ifndef __ASM_KASAN_H
 #define __ASM_KASAN_H
 
+#include <asm/page.h>
+#include <asm/pgtable.h>
+
 #ifdef CONFIG_KASAN
 #define _GLOBAL_KASAN(fn)	_GLOBAL(__##fn)
 #define _GLOBAL_TOC_KASAN(fn)	_GLOBAL_TOC(__##fn)
@@ -14,13 +17,19 @@
 
 #ifndef __ASSEMBLY__
 
-#include <asm/page.h>
+#ifdef CONFIG_KASAN
+void kasan_init(void);
+#else
+static inline void kasan_init(void) { }
+#endif
 
 #define KASAN_SHADOW_SCALE_SHIFT	3
 
 #define KASAN_SHADOW_START	(KASAN_SHADOW_OFFSET + \
 				 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
 
+#ifdef CONFIG_PPC32
+
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
 
 #define KASAN_SHADOW_END	0UL
@@ -30,11 +39,17 @@
 #ifdef CONFIG_KASAN
 void kasan_early_init(void);
 void kasan_mmu_init(void);
-void kasan_init(void);
 #else
-static inline void kasan_init(void) { }
 static inline void kasan_mmu_init(void) { }
 #endif
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+
+#define KASAN_SHADOW_SIZE ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN * \
+				1024 * 1024 * 1 / 8)
+
+#endif /* CONFIG_PPC_BOOK3S_64 */
 
 #endif /* __ASSEMBLY */
 #endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4df94b6e2f32..c60ff299f39b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2081,7 +2081,14 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 		/*
 		 * See if this is an exception frame.
 		 * We look for the "regshere" marker in the current frame.
+		 *
+		 * KASAN may complain about this. If it is an exception frame,
+		 * we won't have unpoisoned the stack in asm when we set the
+		 * exception marker. If it's not an exception frame, who knows
+		 * how things are laid out - the shadow could be in any state
+		 * at all. Just disable KASAN reporting for now.
 		 */
+		kasan_disable_current();
 		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
 		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
 			struct pt_regs *regs = (struct pt_regs *)
@@ -2091,6 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 			       regs->trap, (void *)regs->nip, (void *)lr);
 			firstframe = 1;
 		}
+		kasan_enable_current();
 
 		sp = newsp;
 	} while (count++ < kstack_depth_to_print);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 6620f37abe73..d994c7c39c8d 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -72,6 +72,7 @@ unsigned long tce_alloc_start, tce_alloc_end;
 u64 ppc64_rma_size;
 #endif
 static phys_addr_t first_memblock_size;
+static phys_addr_t top_phys_addr;
 static int __initdata boot_cpu_count;
 
 static int __init early_parse_mem(char *p)
@@ -449,6 +450,26 @@ static bool validate_mem_limit(u64 base, u64 *size)
 {
 	u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
 
+	/*
+	 * To handle the NUMA/discontiguous memory case, don't allow a block
+	 * to be added if it falls completely beyond the configured physical
+	 * memory. Print an informational message.
+	 *
+	 * Frustratingly we also see this with qemu - it seems to split the
+	 * specified memory into a number of smaller blocks. If this happens
+	 * under qemu, it probably represents misconfiguration. So we want
+	 * the message to be noticeable, but not shouty.
+	 *
+	 * See Documentation/powerpc/kasan.txt
+	 */
+	if (IS_ENABLED(CONFIG_KASAN) &&
+	    (base >= ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20))) {
+		pr_warn("KASAN: not adding memory block at %llx (size %llx)\n"
+			"This could be due to discontiguous memory or kernel misconfiguration.",
+			base, *size);
+		return false;
+	}
+
 	if (base >= max_mem)
 		return false;
 	if ((base + *size) > max_mem)
@@ -572,8 +593,11 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 
 	/* Add the chunk to the MEMBLOCK list */
 	if (add_mem_to_memblock) {
-		if (validate_mem_limit(base, &size))
+		if (validate_mem_limit(base, &size)) {
 			memblock_add(base, size);
+			if (base + size > top_phys_addr)
+				top_phys_addr = base + size;
+		}
 	}
 }
 
@@ -613,6 +637,8 @@ static void __init early_reserve_mem_dt(void)
 static void __init early_reserve_mem(void)
 {
 	__be64 *reserve_map;
+	phys_addr_t kasan_shadow_start;
+	phys_addr_t kasan_memory_size;
 
 	reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
 			fdt_off_mem_rsvmap(initial_boot_params));
@@ -651,6 +677,42 @@ static void __init early_reserve_mem(void)
 		return;
 	}
 #endif
+
+	if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
+		kasan_memory_size =
+			((phys_addr_t)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20);
+
+		if (top_phys_addr < kasan_memory_size) {
+			/*
+			 * We are doomed. We shouldn't even be able to get this
+			 * far, but we do in qemu. If we continue and turn
+			 * relocations on, we'll take fatal page faults for
+			 * memory that's not physically present. Instead,
+			 * panic() here: it will be saved to __log_buf even if
+			 * it doesn't get printed to the console.
+			 */
+			panic("Tried to boot a KASAN kernel configured for %u MB with only %llu MB! Aborting.",
+			      CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
+			      (u64)(top_phys_addr >> 20));
+		} else if (top_phys_addr > kasan_memory_size) {
+			/* print a biiiig warning in hopes people notice */
+			pr_err("===========================================\n"
+				"Physical memory exceeds compiled-in maximum!\n"
+				"This kernel was compiled for KASAN with %u MB physical memory.\n"
+				"The physical memory detected is at least %llu MB.\n"
+				"Memory above the compiled limit will not be used!\n"
+				"===========================================\n",
+				CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
+				(u64)(top_phys_addr >> 20));
+		}
+
+		kasan_shadow_start = _ALIGN_DOWN(kasan_memory_size * 7 / 8,
+						 PAGE_SIZE);
+		DBG("reserving %llx -> %llx for KASAN",
+		    kasan_shadow_start, top_phys_addr);
+		memblock_reserve(kasan_shadow_start,
+				 top_phys_addr - kasan_shadow_start);
+	}
 }
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index 6577897673dd..f02b15c78e4d 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,4 +2,5 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)           += kasan_init_32.o
+obj-$(CONFIG_PPC32)           += init_32.o
+obj-$(CONFIG_PPC_BOOK3S_64)   += init_book3s_64.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
new file mode 100644
index 000000000000..f961e96be136
--- /dev/null
+++ b/arch/powerpc/mm/kasan/init_book3s_64.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KASAN for 64-bit Book3S powerpc
+ *
+ * Copyright (C) 2019 IBM Corporation
+ * Author: Daniel Axtens <dja@axtens.net>
+ */
+
+#define DISABLE_BRANCH_PROFILING
+
+#include <linux/kasan.h>
+#include <linux/printk.h>
+#include <linux/sched/task.h>
+#include <asm/pgalloc.h>
+
+void __init kasan_init(void)
+{
+	int i;
+	void *k_start = kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START);
+	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
+
+	pte_t pte = __pte(__pa(kasan_early_shadow_page) |
+			  pgprot_val(PAGE_KERNEL) | _PAGE_PTE);
+
+	if (!early_radix_enabled())
+		panic("KASAN requires radix!");
+
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+			     &kasan_early_shadow_pte[i], pte, 0);
+
+	for (i = 0; i < PTRS_PER_PMD; i++)
+		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
+				    kasan_early_shadow_pte);
+
+	for (i = 0; i < PTRS_PER_PUD; i++)
+		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
+			     kasan_early_shadow_pmd);
+
+	memset(kasan_mem_to_shadow((void *)PAGE_OFFSET), KASAN_SHADOW_INIT,
+	       KASAN_SHADOW_SIZE);
+
+	kasan_populate_early_shadow(
+		kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START),
+		kasan_mem_to_shadow((void *)RADIX_VMALLOC_START));
+
+	/* leave a hole here for vmalloc */
+
+	kasan_populate_early_shadow(
+		kasan_mem_to_shadow((void *)RADIX_VMALLOC_END),
+		kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END));
+
+	flush_tlb_kernel_range((unsigned long)k_start, (unsigned long)k_end);
+
+	/* mark early shadow region as RO and wipe */
+	pte = __pte(__pa(kasan_early_shadow_page) |
+		    pgprot_val(PAGE_KERNEL_RO) | _PAGE_PTE);
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+			     &kasan_early_shadow_pte[i], pte, 0);
+
+	/*
+	 * clear_page relies on some cache info that hasn't been set up yet.
+	 * It ends up looping ~forever and blows up other data.
+	 * Use memset instead.
+	 */
+	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
+
+	/* Enable error messages */
+	init_task.kasan_depth = 0;
+	pr_info("KASAN init done (64-bit Book3S heavyweight mode)\n");
+}
-- 
2.20.1



* Re: [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables
  2019-12-12 15:16 ` [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables Daniel Axtens
@ 2019-12-12 15:55   ` Christophe Leroy
  2019-12-13 21:37   ` Balbir Singh
  1 sibling, 0 replies; 12+ messages in thread
From: Christophe Leroy @ 2019-12-12 15:55 UTC (permalink / raw)
  To: Daniel Axtens, linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	aneesh.kumar, bsingharora



On 12/12/2019 at 16:16, Daniel Axtens wrote:
> powerpc has a variable number of PTRS_PER_*, set at runtime based
> on the MMU that the kernel is booted under.
> 
> This means the PTRS_PER_* are no longer constants, and therefore
> breaks the build.
> 
> Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
> As KASAN is the only user at the moment, just define them in the kasan
> header, and have them default to PTRS_PER_* unless overridden in arch
> code.
> 
> Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Balbir Singh <bsingharora@gmail.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>

> ---
>   include/linux/kasan.h | 18 +++++++++++++++---
>   mm/kasan/init.c       |  6 +++---
>   2 files changed, 18 insertions(+), 6 deletions(-)
> 

Christophe


* Re: [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support
  2019-12-12 15:16 ` [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support Daniel Axtens
@ 2019-12-12 23:55   ` Jordan Niethe
  2019-12-18  7:01     ` Daniel Axtens
  2019-12-13 12:27   ` Christophe Leroy
  1 sibling, 1 reply; 12+ messages in thread
From: Jordan Niethe @ 2019-12-12 23:55 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	christophe.leroy, aneesh.kumar, bsingharora, Michael Ellerman

On Fri, Dec 13, 2019 at 2:19 AM Daniel Axtens <dja@axtens.net> wrote:
>
> KASAN support on Book3S is a bit tricky to get right:
>
>  - It would be good to support inline instrumentation so as to be able to
>    catch stack issues that cannot be caught with outline mode.
>
>  - Inline instrumentation requires a fixed offset.
>
>  - Book3S runs code in real mode after booting. Most notably a lot of KVM
>    runs in real mode, and it would be good to be able to instrument it.
>
>  - Because code runs in real mode after boot, the offset has to point to
>    valid memory both in and out of real mode.
>
>    [For those not immersed in ppc64, in real mode, the top nibble or 2 bits
>    (depending on radix/hash mmu) of the address is ignored. The linear
>    mapping is placed at 0xc000000000000000. This means that a pointer to
>    part of the linear mapping will work both in real mode, where it will be
>    interpreted as a physical address of the form 0x000..., and out of real
>    mode, where it will go via the linear mapping.]
>

How does hash or radix mmu mode affect how many bits are ignored in real mode?

> One approach is just to give up on inline instrumentation. This way all
> checks can be delayed until after everything set is up correctly, and the
> address-to-shadow calculations can be overridden. However, the features and
> speed boost provided by inline instrumentation are worth trying to do
> better.
>
> If _at compile time_ it is known how much contiguous physical memory a
> system has, the top 1/8th of the first block of physical memory can be set
> aside for the shadow. This is a big hammer and comes with 3 big
> consequences:
>
>  - there's no nice way to handle physically discontiguous memory, so only
>    the first physical memory block can be used.
>
>  - kernels will simply fail to boot on machines with less memory than
>    specified when compiling.
>
>  - kernels running on machines with more memory than specified when
>    compiling will simply ignore the extra memory.
>
> Implement and document KASAN this way. The current implementation is Radix
> only.
>
> Despite the limitations, it can still find bugs,
> e.g. http://patchwork.ozlabs.org/patch/1103775/
>
> At the moment, this physical memory limit must be set _even for outline
> mode_. This may be changed in a later series - a different implementation
> could be added for outline mode that dynamically allocates shadow at a
> fixed offset. For example, see https://patchwork.ozlabs.org/patch/795211/
>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
> Cc: Christophe Leroy <christophe.leroy@c-s.fr> # ppc32 version
> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> ---
> Changes since v2:
>
>  - Address feedback from Christophe around cleanups and docs.
>  - Address feedback from Balbir: at this point I don't have a good solution
>    for the issues you identify around the limitations of the inline implementation
>    but I think that it's worth trying to get the stack instrumentation support.
>    I'm happy to have an alternative and more flexible outline mode - I had
>    envisioned this would be called 'lightweight' mode as it imposes fewer restrictions.
>    I've linked to your implementation. I think it's best to add it in a follow-up series.
>  - Made the default PHYS_MEM_SIZE_FOR_KASAN value 1024MB. I think most people have
>    guests with at least that much memory in the Radix 64s case so it's a much
>    saner default - it means that if you just turn on KASAN without reading the
>    docs you're much more likely to have a bootable kernel, which you will never
>    have if the value is set to zero! I'm happy to bikeshed the value if we want.
>
> Changes since v1:
>  - Landed kasan vmalloc support upstream
>  - Lots of feedback from Christophe.
>
> Changes since the rfc:
>
>  - Boots real and virtual hardware, kvm works.
>
>  - disabled reporting when we're checking the stack for exception
>    frames. The behaviour isn't wrong, just incompatible with KASAN.
>
>  - Documentation!
>
>  - Dropped old module stuff in favour of KASAN_VMALLOC.
>
> The bugs with ftrace and kuap were due to kernel bloat pushing
> prom_init calls to be done via the plt. Because we did not have
> a relocatable kernel, and they are done very early, this caused
> everything to explode. Compile with CONFIG_RELOCATABLE!
> ---
>  Documentation/dev-tools/kasan.rst             |   8 +-
>  Documentation/powerpc/kasan.txt               | 112 +++++++++++++++++-
>  arch/powerpc/Kconfig                          |   3 +
>  arch/powerpc/Kconfig.debug                    |  21 ++++
>  arch/powerpc/Makefile                         |  11 ++
>  arch/powerpc/include/asm/book3s/64/hash.h     |   4 +
>  arch/powerpc/include/asm/book3s/64/pgtable.h  |   7 ++
>  arch/powerpc/include/asm/book3s/64/radix.h    |   5 +
>  arch/powerpc/include/asm/kasan.h              |  21 +++-
>  arch/powerpc/kernel/process.c                 |   8 ++
>  arch/powerpc/kernel/prom.c                    |  64 +++++++++-
>  arch/powerpc/mm/kasan/Makefile                |   3 +-
>  .../mm/kasan/{kasan_init_32.c => init_32.c}   |   0
>  arch/powerpc/mm/kasan/init_book3s_64.c        |  72 +++++++++++
>  14 files changed, 330 insertions(+), 9 deletions(-)
>  rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
>  create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c
>
> diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
> index 4af2b5d2c9b4..d99dc580bc11 100644
> --- a/Documentation/dev-tools/kasan.rst
> +++ b/Documentation/dev-tools/kasan.rst
> @@ -22,8 +22,9 @@ global variables yet.
>  Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
>
>  Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
> -architectures. It is also supported on 32-bit powerpc kernels. Tag-based KASAN
> -is supported only on arm64.
> +architectures. It is also supported on powerpc, for 32-bit kernels, and for
> +64-bit kernels running under the Radix MMU. Tag-based KASAN is supported only
> +on arm64.
>
>  Usage
>  -----
> @@ -256,7 +257,8 @@ CONFIG_KASAN_VMALLOC
>  ~~~~~~~~~~~~~~~~~~~~
>
>  With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
> -cost of greater memory usage. Currently this is only supported on x86.
> +cost of greater memory usage. Currently this is optional on x86, and
> +required on 64-bit powerpc.
>
>  This works by hooking into vmalloc and vmap, and dynamically
>  allocating real shadow memory to back the mappings.
> diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
> index a85ce2ff8244..f134a91600ad 100644
> --- a/Documentation/powerpc/kasan.txt
> +++ b/Documentation/powerpc/kasan.txt
> @@ -1,4 +1,4 @@
> -KASAN is supported on powerpc on 32-bit only.
> +KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
>
>  32 bit support
>  ==============
> @@ -10,3 +10,113 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
>
>  Instrumentation of the vmalloc area is not currently supported, but modules
>  are.
> +
> +64 bit support
> +==============
> +
> +Currently, only the radix MMU is supported. There have been versions for Book3E
> +processors floating around on the mailing list, but nothing has been merged.
> +
> +KASAN support on Book3S is a bit tricky to get right:
> +
> + - It would be good to support inline instrumentation so as to be able to catch
> +   stack issues that cannot be caught with outline mode.
> +
> + - Inline instrumentation requires a fixed offset.
> +
> + - Book3S runs code in real mode after booting. Most notably a lot of KVM runs
> +   in real mode, and it would be good to be able to instrument it.
> +
> + - Because code runs in real mode after boot, the offset has to point to
> +   valid memory both in and out of real mode.
> +
> +One approach is just to give up on inline instrumentation. This way all checks
> +can be delayed until after everything is set up correctly, and the
> +address-to-shadow calculations can be overridden. However, the features and
> +speed boost provided by inline instrumentation are worth trying to do better.
> +
> +If _at compile time_ it is known how much contiguous physical memory a system
> +has, the top 1/8th of the first block of physical memory can be set aside for
> +the shadow. This is a big hammer and comes with 3 big consequences:
> +
> + - there's no nice way to handle physically discontiguous memory, so only the
> +   first physical memory block can be used.
> +
> + - kernels will simply fail to boot on machines with less memory than specified
> +   when compiling.
> +
> + - kernels running on machines with more memory than specified when compiling
> +   will simply ignore the extra memory.
> +
> +At the moment, this physical memory limit must be set _even for outline mode_.
> +This may be changed in a future version - a different implementation could be
> +added for outline mode that dynamically allocates shadow at a fixed offset.
> +For example, see https://patchwork.ozlabs.org/patch/795211/
> +
> +This value is configured in CONFIG_PHYS_MEM_SIZE_FOR_KASAN.
> +
> +Tips
> +----
> +
> + - Compile with CONFIG_RELOCATABLE.
> +
> +   In development, boot hangs were observed when building with ftrace and KUAP
> +   on. These ended up being due to kernel bloat pushing prom_init calls to be
> +   done via the PLT. Because the kernel was not relocatable, and the calls are
> +   done very early, this caused execution to jump off into somewhere
> +   invalid. Enabling relocation fixes this.
> +
> +NUMA/discontiguous physical memory
> +----------------------------------
> +
> +Currently the code cannot really deal with discontiguous physical memory. Only
> +physical memory that is contiguous from physical address zero can be used. The
> +size of that memory, not total memory, must be specified when configuring the
> +kernel.
> +
> +Discontiguous memory can occur on machines with memory spread across multiple
> +nodes. For example, on a Talos II with 64GB of RAM:
> +
> + - 32GB runs from 0x0 to 0x0000_0008_0000_0000,
> + - then there's a gap,
> + - then the final 32GB runs from 0x0000_2000_0000_0000 to 0x0000_2008_0000_0000
> +
> +This can create _significant_ issues:
> +
> + - If the machine is treated as having 64GB of _contiguous_ RAM, the
> +   instrumentation would assume that it ran from 0x0 to
> +   0x0000_0010_0000_0000. The last 1/8th - 0x0000_000e_0000_0000 to
> +   0x0000_0010_0000_0000 would be reserved as the shadow region. But when the
> +   kernel tried to access any of that, it would be trying to access pages that
> +   are not physically present.
> +
> + - If the shadow region size is based on the top address, then the shadow
> +   region would be 0x2008_0000_0000 / 8 = 0x0401_0000_0000 bytes = 4100 GB of
> +   memory, clearly more than the 64GB of RAM physically present.
> +
> +Therefore, the code currently is restricted to dealing with memory in the node
> +starting at 0x0. For this system, that's 32GB. If a contiguous physical memory
> +size greater than the size of the first contiguous region of memory is
> +specified, the system will be unable to boot or even print an error message.
> +
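
For anyone checking the numbers in the Talos II example above, here is a
minimal sketch of the arithmetic in plain userspace C. The helper names are
made up for illustration and are not part of the patch; the only assumption
taken from the text is the 1/8 shadow ratio.

```c
#include <assert.h>	/* only needed for the checks in the note below */
#include <stdint.h>

#define GiB (1ULL << 30)

/* Shadow bytes needed to cover [0, top): one shadow byte per 8 bytes. */
static uint64_t shadow_bytes(uint64_t top)
{
	return top >> 3;
}

/* Start of the shadow when it occupies the top 1/8th of [0, size). */
static uint64_t shadow_start(uint64_t size)
{
	return size - (size >> 3);
}
```

With these, shadow_start(64 * GiB) gives 0x0000_000e_0000_0000 and
shadow_bytes(0x0000_2008_0000_0000) / GiB gives 4100, matching the two cases
in the documentation. For the 32GB that is actually contiguous from zero,
shadow_start(32 * GiB) is 0x0000_0007_0000_0000.
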
> +The layout of a system's memory can be observed in the messages that the Radix
> +MMU prints on boot. The Talos II discussed earlier has:
> +
> +radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
> +radix-mmu: Mapped 0x0000000040000000-0x0000000800000000 with 1.00 GiB pages
> +radix-mmu: Mapped 0x0000200000000000-0x0000200800000000 with 1.00 GiB pages
> +
> +As discussed, this system would be configured for 32768 MB.
> +
> +Another system prints:
> +
> +radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
> +radix-mmu: Mapped 0x0000000040000000-0x0000002000000000 with 1.00 GiB pages
> +radix-mmu: Mapped 0x0000200000000000-0x0000202000000000 with 1.00 GiB pages
> +
> +This machine has more memory: 0x0000_0040_0000_0000 total, but only
> +0x0000_0020_0000_0000 is physically contiguous from zero, so it would be
> +configured for 131072 MB of physically contiguous memory.
> +
> +This restriction currently also affects outline mode, but this could be
> +changed in future if an alternative outline implementation is added.
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 6987b0832e5f..2561446e85a8 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -173,6 +173,9 @@ config PPC
>         select HAVE_ARCH_HUGE_VMAP              if PPC_BOOK3S_64 && PPC_RADIX_MMU
>         select HAVE_ARCH_JUMP_LABEL
>         select HAVE_ARCH_KASAN                  if PPC32
> +       select HAVE_ARCH_KASAN                  if PPC_BOOK3S_64 && PPC_RADIX_MMU
> +       select HAVE_ARCH_KASAN_VMALLOC          if PPC_BOOK3S_64 && PPC_RADIX_MMU
> +       select KASAN_VMALLOC                    if KASAN && PPC_BOOK3S_64
>         select HAVE_ARCH_KGDB
>         select HAVE_ARCH_MMAP_RND_BITS
>         select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 4e1d39847462..5c454f8fa24b 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -394,6 +394,27 @@ config PPC_FAST_ENDIAN_SWITCH
>         help
>           If you're unsure what this is, say N.
>
> +config PHYS_MEM_SIZE_FOR_KASAN
> +       int "Contiguous physical memory size for KASAN (MB)" if KASAN && PPC_BOOK3S_64
> +       default 1024
> +       help
> +
> +         To get inline instrumentation support for KASAN on 64-bit Book3S
> +         machines, you need to know how much contiguous physical memory your
> +         system has. A shadow offset will be calculated based on this figure,
> +         which will be compiled in to the kernel. KASAN will use this offset
> +         to access its shadow region, which is used to verify memory accesses.
> +
> +         If you attempt to boot on a system with less memory than you specify
> +         here, your system will fail to boot very early in the process. If you
> +         boot on a system with more memory than you specify, the extra memory
> +         will be wasted - it will be reserved and not used.
> +
> +         For systems with discontiguous blocks of physical memory, specify the
> +         size of the block starting at 0x0. You can determine this by looking
> +         at the memory layout info printed to dmesg by the radix MMU code
> +         early in boot. See Documentation/powerpc/kasan.txt.
> +
>  config KASAN_SHADOW_OFFSET
>         hex
>         depends on KASAN
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index f35730548e42..eff693527462 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -230,6 +230,17 @@ ifdef CONFIG_476FPE_ERR46
>                 -T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
>  endif
>
> +ifdef CONFIG_PPC_BOOK3S_64
> +# The KASAN shadow offset is such that linear map (0xc000...) is shadowed by
> +# the last 8th of linearly mapped physical memory. This way, if the code uses
> +# 0xc addresses throughout, accesses work both in real mode (where the top
> +# 2 bits are ignored) and outside of real mode.
> +#
> +# 0xc000000000000000 - (0xc000000000000000 >> 3) = 0xa800000000000000 = 12105675798371893248
> +KASAN_SHADOW_OFFSET = $(shell echo 7 \* 1024 \* 1024 \* $(CONFIG_PHYS_MEM_SIZE_FOR_KASAN) / 8 + 12105675798371893248 | bc)
> +KBUILD_CFLAGS += -DKASAN_SHADOW_OFFSET=$(KASAN_SHADOW_OFFSET)UL
> +endif
> +
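
The offset calculation above can be checked with a small userspace sketch.
This assumes the generic KASAN mapping shadow = (addr >> 3) + offset; the
function names are hypothetical and only mirror the bc expression in the
Makefile hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the linear mapping, as stated in the Makefile comment. */
#define LINEAR_MAP_BASE		0xc000000000000000ULL
#define SHADOW_SCALE_SHIFT	3

/* Hypothetical helper mirroring the Makefile's bc expression. */
static uint64_t kasan_shadow_offset(uint64_t phys_mem_mb)
{
	uint64_t base = LINEAR_MAP_BASE - (LINEAR_MAP_BASE >> SHADOW_SCALE_SHIFT);

	/* This is the large decimal constant fed to bc. */
	assert(base == 0xa800000000000000ULL);
	return 7 * 1024 * 1024 * phys_mem_mb / 8 + base;
}

/* Generic KASAN address-to-shadow mapping (assumed, see above). */
static uint64_t mem_to_shadow(uint64_t addr, uint64_t offset)
{
	return (addr >> SHADOW_SCALE_SHIFT) + offset;
}
```

For the default of 1024 MB, the start of the linear map shadows to
0xc000000038000000, i.e. 7/8 of the way through the first 1 GB, and the end
of that 1 GB shadows to its own address. So the shadow exactly fills the
last 8th of the configured memory.
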
>  # No AltiVec or VSX instructions when building kernel
>  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
>  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index 2781ebf6add4..fce329b8452e 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -18,6 +18,10 @@
>  #include <asm/book3s/64/hash-4k.h>
>  #endif
>
> +#define H_PTRS_PER_PTE         (1 << H_PTE_INDEX_SIZE)
> +#define H_PTRS_PER_PMD         (1 << H_PMD_INDEX_SIZE)
> +#define H_PTRS_PER_PUD         (1 << H_PUD_INDEX_SIZE)
> +
>  /* Bits to set in a PMD/PUD/PGD entry valid bit*/
>  #define HASH_PMD_VAL_BITS              (0x8000000000000000UL)
>  #define HASH_PUD_VAL_BITS              (0x8000000000000000UL)
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index b01624e5c467..209817235a44 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -231,6 +231,13 @@ extern unsigned long __pmd_frag_size_shift;
>  #define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
>  #define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
>
> +#define MAX_PTRS_PER_PTE       ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
> +                                 H_PTRS_PER_PTE : R_PTRS_PER_PTE)
> +#define MAX_PTRS_PER_PMD       ((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? \
> +                                 H_PTRS_PER_PMD : R_PTRS_PER_PMD)
> +#define MAX_PTRS_PER_PUD       ((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? \
> +                                 H_PTRS_PER_PUD : R_PTRS_PER_PUD)
> +
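
As a rough illustration of why a compile-time maximum is wanted here: a
static array needs a constant expression for its size, and a ternary over two
constants still qualifies. The DEMO_ names and index sizes below are invented
for the example; the real values come from the hash and radix headers.

```c
#include <assert.h>

/* Hypothetical index sizes, chosen only so the two MMUs differ. */
#define DEMO_H_PTE_INDEX_SIZE	8
#define DEMO_R_PTE_INDEX_SIZE	5

#define DEMO_H_PTRS_PER_PTE	(1 << DEMO_H_PTE_INDEX_SIZE)
#define DEMO_R_PTRS_PER_PTE	(1 << DEMO_R_PTE_INDEX_SIZE)

/* Same shape as the MAX_PTRS_PER_PTE definition in the hunk above. */
#define DEMO_MAX_PTRS_PER_PTE	((DEMO_H_PTRS_PER_PTE > DEMO_R_PTRS_PER_PTE) ? \
				  DEMO_H_PTRS_PER_PTE : DEMO_R_PTRS_PER_PTE)

/* Sized for the larger layout, whichever MMU is selected at boot. */
static unsigned long demo_early_shadow_pte[DEMO_MAX_PTRS_PER_PTE];

_Static_assert(sizeof(demo_early_shadow_pte) / sizeof(demo_early_shadow_pte[0]) ==
	       DEMO_MAX_PTRS_PER_PTE, "array is sized by the compile-time maximum");
```

The array is then large enough for either MMU, at the cost of some unused
entries when the smaller layout is in use.
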
>  /* PMD_SHIFT determines what a second-level page table entry can map */
>  #define PMD_SHIFT      (PAGE_SHIFT + PTE_INDEX_SIZE)
>  #define PMD_SIZE       (1UL << PMD_SHIFT)
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index d97db3ad9aae..4f826259de71 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -35,6 +35,11 @@
>  #define RADIX_PMD_SHIFT                (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
>  #define RADIX_PUD_SHIFT                (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
>  #define RADIX_PGD_SHIFT                (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
> +
> +#define R_PTRS_PER_PTE         (1 << RADIX_PTE_INDEX_SIZE)
> +#define R_PTRS_PER_PMD         (1 << RADIX_PMD_INDEX_SIZE)
> +#define R_PTRS_PER_PUD         (1 << RADIX_PUD_INDEX_SIZE)
> +
>  /*
>   * Size of EA range mapped by our pagetables.
>   */
> diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
> index 296e51c2f066..f18268cbdc33 100644
> --- a/arch/powerpc/include/asm/kasan.h
> +++ b/arch/powerpc/include/asm/kasan.h
> @@ -2,6 +2,9 @@
>  #ifndef __ASM_KASAN_H
>  #define __ASM_KASAN_H
>
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +
>  #ifdef CONFIG_KASAN
>  #define _GLOBAL_KASAN(fn)      _GLOBAL(__##fn)
>  #define _GLOBAL_TOC_KASAN(fn)  _GLOBAL_TOC(__##fn)
> @@ -14,13 +17,19 @@
>
>  #ifndef __ASSEMBLY__
>
> -#include <asm/page.h>
> +#ifdef CONFIG_KASAN
> +void kasan_init(void);
> +#else
> +static inline void kasan_init(void) { }
> +#endif
>
>  #define KASAN_SHADOW_SCALE_SHIFT       3
>
>  #define KASAN_SHADOW_START     (KASAN_SHADOW_OFFSET + \
>                                  (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
>
> +#ifdef CONFIG_PPC32
> +
>  #define KASAN_SHADOW_OFFSET    ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
>
>  #define KASAN_SHADOW_END       0UL
> @@ -30,11 +39,17 @@
>  #ifdef CONFIG_KASAN
>  void kasan_early_init(void);
>  void kasan_mmu_init(void);
> -void kasan_init(void);
>  #else
> -static inline void kasan_init(void) { }
>  static inline void kasan_mmu_init(void) { }
>  #endif
> +#endif
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +#define KASAN_SHADOW_SIZE ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN * \
> +                               1024 * 1024 * 1 / 8)
> +
> +#endif /* CONFIG_PPC_BOOK3S_64 */
>
>  #endif /* __ASSEMBLY */
>  #endif
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 4df94b6e2f32..c60ff299f39b 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2081,7 +2081,14 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>                 /*
>                  * See if this is an exception frame.
>                  * We look for the "regshere" marker in the current frame.
> +                *
> +                * KASAN may complain about this. If it is an exception frame,
> +                * we won't have unpoisoned the stack in asm when we set the
> +                * exception marker. If it's not an exception frame, who knows
> +                * how things are laid out - the shadow could be in any state
> +                * at all. Just disable KASAN reporting for now.
>                  */
> +               kasan_disable_current();
>                 if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
>                     && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>                         struct pt_regs *regs = (struct pt_regs *)
> @@ -2091,6 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>                                regs->trap, (void *)regs->nip, (void *)lr);
>                         firstframe = 1;
>                 }
> +               kasan_enable_current();
>
>                 sp = newsp;
>         } while (count++ < kstack_depth_to_print);
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 6620f37abe73..d994c7c39c8d 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -72,6 +72,7 @@ unsigned long tce_alloc_start, tce_alloc_end;
>  u64 ppc64_rma_size;
>  #endif
>  static phys_addr_t first_memblock_size;
> +static phys_addr_t top_phys_addr;
>  static int __initdata boot_cpu_count;
>
>  static int __init early_parse_mem(char *p)
> @@ -449,6 +450,26 @@ static bool validate_mem_limit(u64 base, u64 *size)
>  {
>         u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
>
> +       /*
> +        * To handle the NUMA/discontiguous memory case, don't allow a block
> +        * to be added if it falls completely beyond the configured physical
> +        * memory. Print an informational message.
> +        *
> +        * Frustratingly we also see this with qemu - it seems to split the
> +        * specified memory into a number of smaller blocks. If this happens
> +        * under qemu, it probably represents misconfiguration. So we want
> +        * the message to be noticeable, but not shouty.
> +        *
> +        * See Documentation/powerpc/kasan.txt
> +        */
> +       if (IS_ENABLED(CONFIG_KASAN) &&
> +           (base >= ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20))) {
> +               pr_warn("KASAN: not adding memory block at %llx (size %llx)\n"
> +                       "This could be due to discontiguous memory or kernel misconfiguration.",
> +                       base, *size);
> +               return false;
> +       }
> +
>         if (base >= max_mem)
>                 return false;
>         if ((base + *size) > max_mem)
> @@ -572,8 +593,11 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>
>         /* Add the chunk to the MEMBLOCK list */
>         if (add_mem_to_memblock) {
> -               if (validate_mem_limit(base, &size))
> +               if (validate_mem_limit(base, &size)) {
>                         memblock_add(base, size);
> +                       if (base + size > top_phys_addr)
> +                               top_phys_addr = base + size;
> +               }
>         }
>  }
>
> @@ -613,6 +637,8 @@ static void __init early_reserve_mem_dt(void)
>  static void __init early_reserve_mem(void)
>  {
>         __be64 *reserve_map;
> +       phys_addr_t kasan_shadow_start;
> +       phys_addr_t kasan_memory_size;
>
>         reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
>                         fdt_off_mem_rsvmap(initial_boot_params));
> @@ -651,6 +677,42 @@ static void __init early_reserve_mem(void)
>                 return;
>         }
>  #endif
> +
> +       if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
> +               kasan_memory_size =
> +                       ((phys_addr_t)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20);
> +
> +               if (top_phys_addr < kasan_memory_size) {
> +                       /*
> +                        * We are doomed. We shouldn't even be able to get this
> +                        * far, but we do in qemu. If we continue and turn
> +                        * relocations on, we'll take fatal page faults for
> +                        * memory that's not physically present. Instead,
> +                        * panic() here: it will be saved to __log_buf even if
> +                        * it doesn't get printed to the console.
> +                        */
> +                       panic("Tried to boot a KASAN kernel configured for %u MB with only %llu MB! Aborting.",
> +                             CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
> +                             (u64)(top_phys_addr >> 20));
> +               } else if (top_phys_addr > kasan_memory_size) {
> +                       /* print a biiiig warning in hopes people notice */
> +                       pr_err("===========================================\n"
> +                               "Physical memory exceeds compiled-in maximum!\n"
> +                               "This kernel was compiled for KASAN with %u MB physical memory.\n"
> +                               "The physical memory detected is at least %llu MB.\n"
> +                               "Memory above the compiled limit will not be used!\n"
> +                               "===========================================\n",
> +                               CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
> +                               (u64)(top_phys_addr >> 20));
> +               }
> +
> +               kasan_shadow_start = _ALIGN_DOWN(kasan_memory_size * 7 / 8,
> +                                                PAGE_SIZE);
> +               DBG("reserving %llx -> %llx for KASAN",
> +                   kasan_shadow_start, top_phys_addr);
> +               memblock_reserve(kasan_shadow_start,
> +                                top_phys_addr - kasan_shadow_start);
> +       }
>  }
>
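
To make the three cases in early_reserve_mem() easier to follow, here is a
userspace sketch of the same decision logic. The struct, enum and function
names are hypothetical, and the 64K page size is an assumption; the kernel
code panics or prints where this sketch just records the outcome.

```c
#include <assert.h>	/* only needed for the checks in the note below */
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x10000ULL	/* assumption: 64K pages */

enum kasan_mem_check {
	KASAN_MEM_TOO_SMALL,	/* the patch calls panic() here */
	KASAN_MEM_EXACT,
	KASAN_MEM_EXCESS,	/* the patch prints the big warning here */
};

struct kasan_reservation {
	enum kasan_mem_check check;
	uint64_t start;		/* first reserved physical address */
	uint64_t size;		/* bytes reserved */
};

static struct kasan_reservation plan_reservation(uint64_t top_phys_addr,
						 uint64_t configured_mb)
{
	uint64_t kasan_memory_size = configured_mb << 20;
	struct kasan_reservation r = { KASAN_MEM_EXACT, 0, 0 };

	if (top_phys_addr < kasan_memory_size) {
		r.check = KASAN_MEM_TOO_SMALL;
		return r;
	}
	if (top_phys_addr > kasan_memory_size)
		r.check = KASAN_MEM_EXCESS;

	/* Shadow starts 7/8 of the way through the configured memory. */
	r.start = (kasan_memory_size * 7 / 8) & ~(DEMO_PAGE_SIZE - 1);
	/* Reserve through the top of memory, not just the shadow. */
	r.size = top_phys_addr - r.start;
	return r;
}
```

Note that the reservation runs to top_phys_addr rather than to the configured
size. With 2 GB present and 1024 MB configured, this reserves 0x38000000
through 0x80000000, which is why the extra memory ends up reserved and
unused, as the Kconfig help says.
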
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> index 6577897673dd..f02b15c78e4d 100644
> --- a/arch/powerpc/mm/kasan/Makefile
> +++ b/arch/powerpc/mm/kasan/Makefile
> @@ -2,4 +2,5 @@
>
>  KASAN_SANITIZE := n
>
> -obj-$(CONFIG_PPC32)           += kasan_init_32.o
> +obj-$(CONFIG_PPC32)           += init_32.o
> +obj-$(CONFIG_PPC_BOOK3S_64)   += init_book3s_64.o
> diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
> similarity index 100%
> rename from arch/powerpc/mm/kasan/kasan_init_32.c
> rename to arch/powerpc/mm/kasan/init_32.c
> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
> new file mode 100644
> index 000000000000..f961e96be136
> --- /dev/null
> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
> @@ -0,0 +1,72 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KASAN for 64-bit Book3S powerpc
> + *
> + * Copyright (C) 2019 IBM Corporation
> + * Author: Daniel Axtens <dja@axtens.net>
> + */
> +
> +#define DISABLE_BRANCH_PROFILING
> +
> +#include <linux/kasan.h>
> +#include <linux/printk.h>
> +#include <linux/sched/task.h>
> +#include <asm/pgalloc.h>
> +
> +void __init kasan_init(void)
> +{
> +       int i;
> +       void *k_start = kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START);
> +       void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
> +
> +       pte_t pte = __pte(__pa(kasan_early_shadow_page) |
> +                         pgprot_val(PAGE_KERNEL) | _PAGE_PTE);
> +
> +       if (!early_radix_enabled())
> +               panic("KASAN requires radix!");
> +
> +       for (i = 0; i < PTRS_PER_PTE; i++)
> +               __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> +                            &kasan_early_shadow_pte[i], pte, 0);
> +
> +       for (i = 0; i < PTRS_PER_PMD; i++)
> +               pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
> +                                   kasan_early_shadow_pte);
> +
> +       for (i = 0; i < PTRS_PER_PUD; i++)
> +               pud_populate(&init_mm, &kasan_early_shadow_pud[i],
> +                            kasan_early_shadow_pmd);
> +
> +       memset(kasan_mem_to_shadow((void *)PAGE_OFFSET), KASAN_SHADOW_INIT,
> +              KASAN_SHADOW_SIZE);
> +
> +       kasan_populate_early_shadow(
> +               kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START),
> +               kasan_mem_to_shadow((void *)RADIX_VMALLOC_START));
> +
> +       /* leave a hole here for vmalloc */
> +
> +       kasan_populate_early_shadow(
> +               kasan_mem_to_shadow((void *)RADIX_VMALLOC_END),
> +               kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END));
> +
> +       flush_tlb_kernel_range((unsigned long)k_start, (unsigned long)k_end);
> +
> +       /* mark early shadow region as RO and wipe */
> +       pte = __pte(__pa(kasan_early_shadow_page) |
> +                   pgprot_val(PAGE_KERNEL_RO) | _PAGE_PTE);
> +       for (i = 0; i < PTRS_PER_PTE; i++)
> +               __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> +                            &kasan_early_shadow_pte[i], pte, 0);
> +
> +       /*
> +        * clear_page relies on some cache info that hasn't been set up yet.
> +        * It ends up looping ~forever and blows up other data.
> +        * Use memset instead.
> +        */
> +       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> +
> +       /* Enable error messages */
> +       init_task.kasan_depth = 0;
> +       pr_info("KASAN init done (64-bit Book3S heavyweight mode)\n");
> +}
> --
> 2.20.1
>


* Re: [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support
  2019-12-12 15:16 ` [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support Daniel Axtens
  2019-12-12 23:55   ` Jordan Niethe
@ 2019-12-13 12:27   ` Christophe Leroy
  2019-12-17 13:30     ` Daniel Axtens
  1 sibling, 1 reply; 12+ messages in thread
From: Christophe Leroy @ 2019-12-13 12:27 UTC (permalink / raw)
  To: Daniel Axtens, linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	aneesh.kumar, bsingharora
  Cc: Michael Ellerman



On 12/12/2019 at 16:16, Daniel Axtens wrote:
> KASAN support on Book3S is a bit tricky to get right:
> 
>   - It would be good to support inline instrumentation so as to be able to
>     catch stack issues that cannot be caught with outline mode.
> 
>   - Inline instrumentation requires a fixed offset.
> 
>   - Book3S runs code in real mode after booting. Most notably a lot of KVM
>     runs in real mode, and it would be good to be able to instrument it.
> 
>   - Because code runs in real mode after boot, the offset has to point to
>     valid memory both in and out of real mode.
> 
>     [For those not immersed in ppc64, in real mode, the top nibble or 2 bits
>     (depending on radix/hash mmu) of the address is ignored. The linear
>     mapping is placed at 0xc000000000000000. This means that a pointer to
>     part of the linear mapping will work both in real mode, where it will be
>     interpreted as a physical address of the form 0x000..., and out of real
>     mode, where it will go via the linear mapping.]
> 
> One approach is just to give up on inline instrumentation. This way all
> checks can be delayed until after everything is set up correctly, and the
> address-to-shadow calculations can be overridden. However, the features and
> speed boost provided by inline instrumentation are worth trying to do
> better.
> 
> If _at compile time_ it is known how much contiguous physical memory a
> system has, the top 1/8th of the first block of physical memory can be set
> aside for the shadow. This is a big hammer and comes with 3 big
> consequences:
> 
>   - there's no nice way to handle physically discontiguous memory, so only
>     the first physical memory block can be used.
> 
>   - kernels will simply fail to boot on machines with less memory than
>     specified when compiling.
> 
>   - kernels running on machines with more memory than specified when
>     compiling will simply ignore the extra memory.
> 
> Implement and document KASAN this way. The current implementation is Radix
> only.
> 
> Despite the limitations, it can still find bugs,
> e.g. http://patchwork.ozlabs.org/patch/1103775/
> 
> At the moment, this physical memory limit must be set _even for outline
> mode_. This may be changed in a later series - a different implementation
> could be added for outline mode that dynamically allocates shadow at a
> fixed offset. For example, see https://patchwork.ozlabs.org/patch/795211/
> 
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
> Cc: Christophe Leroy <christophe.leroy@c-s.fr> # ppc32 version
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> 
> ---
> Changes since v2:
> 
>   - Address feedback from Christophe around cleanups and docs.
>   - Address feedback from Balbir: at this point I don't have a good solution
>     for the issues you identify around the limitations of the inline implementation
>     but I think that it's worth trying to get the stack instrumentation support.
>     I'm happy to have an alternative and more flexible outline mode - I had
>     envisioned this would be called 'lightweight' mode as it imposes fewer restrictions.
>     I've linked to your implementation. I think it's best to add it in a follow-up series.
>   - Made the default PHYS_MEM_SIZE_FOR_KASAN value 1024MB. I think most people have
>     guests with at least that much memory in the Radix 64s case so it's a much
>     saner default - it means that if you just turn on KASAN without reading the
>     docs you're much more likely to have a bootable kernel, which you will never
>     have if the value is set to zero! I'm happy to bikeshed the value if we want.
> 
> Changes since v1:
>   - Landed kasan vmalloc support upstream
>   - Lots of feedback from Christophe.
> 
> Changes since the rfc:
> 
>   - Boots real and virtual hardware, kvm works.
> 
>   - disabled reporting when we're checking the stack for exception
>     frames. The behaviour isn't wrong, just incompatible with KASAN.
> 
>   - Documentation!
> 
>   - Dropped old module stuff in favour of KASAN_VMALLOC.
> 
> The bugs with ftrace and kuap were due to kernel bloat pushing
> prom_init calls to be done via the plt. Because we did not have
> a relocatable kernel, and they are done very early, this caused
> everything to explode. Compile with CONFIG_RELOCATABLE!
> ---
>   Documentation/dev-tools/kasan.rst             |   8 +-
>   Documentation/powerpc/kasan.txt               | 112 +++++++++++++++++-
>   arch/powerpc/Kconfig                          |   3 +
>   arch/powerpc/Kconfig.debug                    |  21 ++++
>   arch/powerpc/Makefile                         |  11 ++
>   arch/powerpc/include/asm/book3s/64/hash.h     |   4 +
>   arch/powerpc/include/asm/book3s/64/pgtable.h  |   7 ++
>   arch/powerpc/include/asm/book3s/64/radix.h    |   5 +
>   arch/powerpc/include/asm/kasan.h              |  21 +++-
>   arch/powerpc/kernel/process.c                 |   8 ++
>   arch/powerpc/kernel/prom.c                    |  64 +++++++++-
>   arch/powerpc/mm/kasan/Makefile                |   3 +-
>   .../mm/kasan/{kasan_init_32.c => init_32.c}   |   0
>   arch/powerpc/mm/kasan/init_book3s_64.c        |  72 +++++++++++
>   14 files changed, 330 insertions(+), 9 deletions(-)
>   rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
>   create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c
> 
> diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
> index 4af2b5d2c9b4..d99dc580bc11 100644
> --- a/Documentation/dev-tools/kasan.rst
> +++ b/Documentation/dev-tools/kasan.rst
> @@ -22,8 +22,9 @@ global variables yet.
>   Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
>   
>   Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
> -architectures. It is also supported on 32-bit powerpc kernels. Tag-based KASAN
> -is supported only on arm64.
> +architectures. It is also supported on powerpc, for 32-bit kernels, and for
> +64-bit kernels running under the Radix MMU. Tag-based KASAN is supported only
> +on arm64.
>   
>   Usage
>   -----
> @@ -256,7 +257,8 @@ CONFIG_KASAN_VMALLOC
>   ~~~~~~~~~~~~~~~~~~~~
>   
>   With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
> -cost of greater memory usage. Currently this is only supported on x86.
> +cost of greater memory usage. Currently this is optional on x86, and
> +required on 64-bit powerpc.
>   
>   This works by hooking into vmalloc and vmap, and dynamically
>   allocating real shadow memory to back the mappings.
> diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
> index a85ce2ff8244..f134a91600ad 100644
> --- a/Documentation/powerpc/kasan.txt
> +++ b/Documentation/powerpc/kasan.txt
> @@ -1,4 +1,4 @@
> -KASAN is supported on powerpc on 32-bit only.
> +KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
>   
>   32 bit support
>   ==============
> @@ -10,3 +10,113 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
>   
>   Instrumentation of the vmalloc area is not currently supported, but modules
>   are.
> +
> +64 bit support
> +==============
> +
> +Currently, only the radix MMU is supported. There have been versions for Book3E
> +processors floating around on the mailing list, but nothing has been merged.
> +
> +KASAN support on Book3S is a bit tricky to get right:
> +
> + - It would be good to support inline instrumentation so as to be able to catch
> +   stack issues that cannot be caught with outline mode.
> +
> + - Inline instrumentation requires a fixed offset.
> +
> + - Book3S runs code in real mode after booting. Most notably a lot of KVM runs
> +   in real mode, and it would be good to be able to instrument it.
> +
> + - Because code runs in real mode after boot, the offset has to point to
> +   valid memory both in and out of real mode.
> +
> +One approach is just to give up on inline instrumentation. This way all checks
> +can be delayed until after everything is set up correctly, and the
> +address-to-shadow calculations can be overridden. However, the features and
> +speed boost provided by inline instrumentation are worth trying to do better.
> +
> +If _at compile time_ it is known how much contiguous physical memory a system
> +has, the top 1/8th of the first block of physical memory can be set aside for
> +the shadow. This is a big hammer and comes with 3 big consequences:
> +
> + - there's no nice way to handle physically discontiguous memory, so only the
> +   first physical memory block can be used.
> +
> + - kernels will simply fail to boot on machines with less memory than specified
> +   when compiling.
> +
> + - kernels running on machines with more memory than specified when compiling
> +   will simply ignore the extra memory.
> +
> +At the moment, this physical memory limit must be set _even for outline mode_.
> +This may be changed in a future version - a different implementation could be
> +added for outline mode that dynamically allocates shadow at a fixed offset.
> +For example, see https://patchwork.ozlabs.org/patch/795211/
> +
> +This value is configured in CONFIG_PHYS_MEM_SIZE_FOR_KASAN.
> +
> +Tips
> +----
> +
> + - Compile with CONFIG_RELOCATABLE.
> +
> +   In development, boot hangs were observed when building with ftrace and KUAP
> +   on. These ended up being due to kernel bloat pushing prom_init calls to be
> +   done via the PLT. Because the kernel was not relocatable, and the calls are
> +   done very early, this caused execution to jump off into somewhere
> +   invalid. Enabling relocation fixes this.
> +
> +NUMA/discontiguous physical memory
> +----------------------------------
> +
> +Currently the code cannot really deal with discontiguous physical memory. Only
> +physical memory that is contiguous from physical address zero can be used. The
> +size of that memory, not total memory, must be specified when configuring the
> +kernel.
> +
> +Discontiguous memory can occur on machines with memory spread across multiple
> +nodes. For example, on a Talos II with 64GB of RAM:
> +
> + - 32GB runs from 0x0 to 0x0000_0008_0000_0000,
> + - then there's a gap,
> + - then the final 32GB runs from 0x0000_2000_0000_0000 to 0x0000_2008_0000_0000
> +
> +This can create _significant_ issues:
> +
> + - If the machine is treated as having 64GB of _contiguous_ RAM, the
> +   instrumentation would assume that it ran from 0x0 to
> +   0x0000_0010_0000_0000. The last 1/8th - 0x0000_000e_0000_0000 to
> +   0x0000_0010_0000_0000 would be reserved as the shadow region. But when the
> +   kernel tried to access any of that, it would be trying to access pages that
> +   are not physically present.
> +
> + - If the shadow region size is based on the top address, then the shadow
> +   region would be 0x2008_0000_0000 / 8 = 0x0401_0000_0000 bytes = 4100 GB of
> +   memory, clearly more than the 64GB of RAM physically present.
> +
> +Therefore, the code currently is restricted to dealing with memory in the node
> +starting at 0x0. For this system, that's 32GB. If a contiguous physical memory
> +size greater than the size of the first contiguous region of memory is
> +specified, the system will be unable to boot or even print an error message.
> +
> +The layout of a system's memory can be observed in the messages that the Radix
> +MMU prints on boot. The Talos II discussed earlier has:
> +
> +radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
> +radix-mmu: Mapped 0x0000000040000000-0x0000000800000000 with 1.00 GiB pages
> +radix-mmu: Mapped 0x0000200000000000-0x0000200800000000 with 1.00 GiB pages
> +
> +As discussed, this system would be configured for 32768 MB.
> +
> +Another system prints:
> +
> +radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec)
> +radix-mmu: Mapped 0x0000000040000000-0x0000002000000000 with 1.00 GiB pages
> +radix-mmu: Mapped 0x0000200000000000-0x0000202000000000 with 1.00 GiB pages
> +
> +This machine has more memory: 0x0000_0040_0000_0000 total, but only
> +0x0000_0020_0000_0000 is physically contiguous from zero, so it would be
> +configured for 131072 MB of physically contiguous memory.
> +
> +This restriction currently also affects outline mode, but this could be
> +changed in future if an alternative outline implementation is added.
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 6987b0832e5f..2561446e85a8 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -173,6 +173,9 @@ config PPC
>   	select HAVE_ARCH_HUGE_VMAP		if PPC_BOOK3S_64 && PPC_RADIX_MMU
>   	select HAVE_ARCH_JUMP_LABEL
>   	select HAVE_ARCH_KASAN			if PPC32
> +	select HAVE_ARCH_KASAN			if PPC_BOOK3S_64 && PPC_RADIX_MMU
> +	select HAVE_ARCH_KASAN_VMALLOC		if PPC_BOOK3S_64 && PPC_RADIX_MMU
> +	select KASAN_VMALLOC			if KASAN && PPC_BOOK3S_64

This one should go somewhere else, most likely in the 'config 
PPC_BOOK3S_64' section in Kconfig.cputype.
This section is more or less dedicated to capabilities.

If you don't want to move it, at least keep this section in alphabetical 
order.

>   	select HAVE_ARCH_KGDB
>   	select HAVE_ARCH_MMAP_RND_BITS
>   	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 4e1d39847462..5c454f8fa24b 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -394,6 +394,27 @@ config PPC_FAST_ENDIAN_SWITCH
>   	help
>   	  If you're unsure what this is, say N.
>   
> +config PHYS_MEM_SIZE_FOR_KASAN
> +	int "Contiguous physical memory size for KASAN (MB)" if KASAN && PPC_BOOK3S_64
> +	default 1024
> +	help
> +
> +	  To get inline instrumentation support for KASAN on 64-bit Book3S
> +	  machines, you need to know how much contiguous physical memory your
> +	  system has. A shadow offset will be calculated based on this figure,
> +	  which will be compiled in to the kernel. KASAN will use this offset
> +	  to access its shadow region, which is used to verify memory accesses.
> +
> +	  If you attempt to boot on a system with less memory than you specify
> +	  here, your system will fail to boot very early in the process. If you
> +	  boot on a system with more memory than you specify, the extra memory
> +	  will be wasted - it will be reserved and not used.
> +
> +	  For systems with discontiguous blocks of physical memory, specify the
> +	  size of the block starting at 0x0. You can determine this by looking
> +	  at the memory layout info printed to dmesg by the radix MMU code
> +	  early in boot. See Documentation/powerpc/kasan.txt.
> +
>   config KASAN_SHADOW_OFFSET
>   	hex
>   	depends on KASAN
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index f35730548e42..eff693527462 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -230,6 +230,17 @@ ifdef CONFIG_476FPE_ERR46
>   		-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
>   endif
>   
> +ifdef CONFIG_PPC_BOOK3S_64
> +# The KASAN shadow offset is such that linear map (0xc000...) is shadowed by
> +# the last 8th of linearly mapped physical memory. This way, if the code uses
> +# 0xc addresses throughout, accesses work both in real mode (where the top
> +# 2 bits are ignored) and outside of real mode.
> +#
> +# 0xc000000000000000 - (0xc000000000000000 >> 3) = 0xa800000000000000 = 12105675798371893248
> +KASAN_SHADOW_OFFSET = $(shell echo 7 \* 1024 \* 1024 \* $(CONFIG_PHYS_MEM_SIZE_FOR_KASAN) / 8 + 12105675798371893248 | bc)
> +KBUILD_CFLAGS += -DKASAN_SHADOW_OFFSET=$(KASAN_SHADOW_OFFSET)UL
> +endif
> +
>   # No AltiVec or VSX instructions when building kernel
>   KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
>   KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index 2781ebf6add4..fce329b8452e 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -18,6 +18,10 @@
>   #include <asm/book3s/64/hash-4k.h>
>   #endif
>   
> +#define H_PTRS_PER_PTE		(1 << H_PTE_INDEX_SIZE)
> +#define H_PTRS_PER_PMD		(1 << H_PMD_INDEX_SIZE)
> +#define H_PTRS_PER_PUD		(1 << H_PUD_INDEX_SIZE)
> +
>   /* Bits to set in a PMD/PUD/PGD entry valid bit*/
>   #define HASH_PMD_VAL_BITS		(0x8000000000000000UL)
>   #define HASH_PUD_VAL_BITS		(0x8000000000000000UL)
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index b01624e5c467..209817235a44 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -231,6 +231,13 @@ extern unsigned long __pmd_frag_size_shift;
>   #define PTRS_PER_PUD	(1 << PUD_INDEX_SIZE)
>   #define PTRS_PER_PGD	(1 << PGD_INDEX_SIZE)
>   
> +#define MAX_PTRS_PER_PTE	((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
> +				  H_PTRS_PER_PTE : R_PTRS_PER_PTE)
> +#define MAX_PTRS_PER_PMD	((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? \
> +				  H_PTRS_PER_PMD : R_PTRS_PER_PMD)
> +#define MAX_PTRS_PER_PUD	((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? \
> +				  H_PTRS_PER_PUD : R_PTRS_PER_PUD)
> +
>   /* PMD_SHIFT determines what a second-level page table entry can map */
>   #define PMD_SHIFT	(PAGE_SHIFT + PTE_INDEX_SIZE)
>   #define PMD_SIZE	(1UL << PMD_SHIFT)
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index d97db3ad9aae..4f826259de71 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -35,6 +35,11 @@
>   #define RADIX_PMD_SHIFT		(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
>   #define RADIX_PUD_SHIFT		(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
>   #define RADIX_PGD_SHIFT		(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
> +
> +#define R_PTRS_PER_PTE		(1 << RADIX_PTE_INDEX_SIZE)
> +#define R_PTRS_PER_PMD		(1 << RADIX_PMD_INDEX_SIZE)
> +#define R_PTRS_PER_PUD		(1 << RADIX_PUD_INDEX_SIZE)
> +
>   /*
>    * Size of EA range mapped by our pagetables.
>    */
> diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
> index 296e51c2f066..f18268cbdc33 100644
> --- a/arch/powerpc/include/asm/kasan.h
> +++ b/arch/powerpc/include/asm/kasan.h
> @@ -2,6 +2,9 @@
>   #ifndef __ASM_KASAN_H
>   #define __ASM_KASAN_H
>   
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +
>   #ifdef CONFIG_KASAN
>   #define _GLOBAL_KASAN(fn)	_GLOBAL(__##fn)
>   #define _GLOBAL_TOC_KASAN(fn)	_GLOBAL_TOC(__##fn)
> @@ -14,13 +17,19 @@
>   
>   #ifndef __ASSEMBLY__
>   
> -#include <asm/page.h>
> +#ifdef CONFIG_KASAN
> +void kasan_init(void);
> +#else
> +static inline void kasan_init(void) { }
> +#endif
>   
>   #define KASAN_SHADOW_SCALE_SHIFT	3
>   
>   #define KASAN_SHADOW_START	(KASAN_SHADOW_OFFSET + \
>   				 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
>   
> +#ifdef CONFIG_PPC32
> +
>   #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
>   
>   #define KASAN_SHADOW_END	0UL
> @@ -30,11 +39,17 @@
>   #ifdef CONFIG_KASAN
>   void kasan_early_init(void);
>   void kasan_mmu_init(void);
> -void kasan_init(void);
>   #else
> -static inline void kasan_init(void) { }
>   static inline void kasan_mmu_init(void) { }
>   #endif
> +#endif
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +#define KASAN_SHADOW_SIZE ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN * \
> +				1024 * 1024 * 1 / 8)
> +
> +#endif /* CONFIG_PPC_BOOK3S_64 */
>   
>   #endif /* __ASSEMBLY */
>   #endif
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 4df94b6e2f32..c60ff299f39b 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2081,7 +2081,14 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>   		/*
>   		 * See if this is an exception frame.
>   		 * We look for the "regshere" marker in the current frame.
> +		 *
> +		 * KASAN may complain about this. If it is an exception frame,
> +		 * we won't have unpoisoned the stack in asm when we set the
> +		 * exception marker. If it's not an exception frame, who knows
> +		 * how things are laid out - the shadow could be in any state
> +		 * at all. Just disable KASAN reporting for now.
>   		 */
> +		kasan_disable_current();
>   		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
>   		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>   			struct pt_regs *regs = (struct pt_regs *)
> @@ -2091,6 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>   			       regs->trap, (void *)regs->nip, (void *)lr);
>   			firstframe = 1;
>   		}
> +		kasan_enable_current();

If this is really a concern for all targets including PPC32, should it 
be a separate patch with a Fixes: tag so that it can be backported to stable as well?

>   
>   		sp = newsp;
>   	} while (count++ < kstack_depth_to_print);
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 6620f37abe73..d994c7c39c8d 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -72,6 +72,7 @@ unsigned long tce_alloc_start, tce_alloc_end;
>   u64 ppc64_rma_size;
>   #endif
>   static phys_addr_t first_memblock_size;
> +static phys_addr_t top_phys_addr;
>   static int __initdata boot_cpu_count;
>   
>   static int __init early_parse_mem(char *p)
> @@ -449,6 +450,26 @@ static bool validate_mem_limit(u64 base, u64 *size)
>   {
>   	u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
>   
> +	/*
> +	 * To handle the NUMA/discontiguous memory case, don't allow a block
> +	 * to be added if it falls completely beyond the configured physical
> +	 * memory. Print an informational message.
> +	 *
> +	 * Frustratingly we also see this with qemu - it seems to split the
> +	 * specified memory into a number of smaller blocks. If this happens
> +	 * under qemu, it probably represents misconfiguration. So we want
> +	 * the message to be noticeable, but not shouty.
> +	 *
> +	 * See Documentation/powerpc/kasan.txt
> +	 */
> +	if (IS_ENABLED(CONFIG_KASAN) &&
> +	    (base >= ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20))) {
> +		pr_warn("KASAN: not adding memory block at %llx (size %llx)\n"
> +			"This could be due to discontiguous memory or kernel misconfiguration.",
> +			base, *size);
> +		return false;
> +	}
> +
>   	if (base >= max_mem)
>   		return false;
>   	if ((base + *size) > max_mem)
> @@ -572,8 +593,11 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>   
>   	/* Add the chunk to the MEMBLOCK list */
>   	if (add_mem_to_memblock) {
> -		if (validate_mem_limit(base, &size))
> +		if (validate_mem_limit(base, &size)) {
>   			memblock_add(base, size);
> +			if (base + size > top_phys_addr)
> +				top_phys_addr = base + size;
> +		}

Can we use max() here? Something like:

top_phys_addr = max(base + size, top_phys_addr);
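
To make the suggestion concrete, here is a rough user-space sketch, not kernel code. The max() macro is a simplified stand-in for the kernel's, and struct mem_block and track_top_phys_addr() are names made up for illustration. The "base >= limit" test only mimics the KASAN check this patch adds to validate_mem_limit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's max(); the real macro also type-checks. */
#define max(a, b) ((a) > (b) ? (a) : (b))

typedef uint64_t phys_addr_t;

/* Hypothetical description of one memory block from the device tree. */
struct mem_block {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical helper: return the highest end address among the blocks
 * that are accepted, i.e. those starting below the configured limit.
 */
static phys_addr_t track_top_phys_addr(const struct mem_block *blk, int n,
				       uint64_t limit)
{
	phys_addr_t top_phys_addr = 0;
	int i;

	assert(blk != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Mimics the patch: blocks wholly beyond the limit are not added. */
		if (blk[i].base >= limit)
			continue;

		/* The one-liner suggested above, replacing the if/assign pair. */
		top_phys_addr = max(blk[i].base + blk[i].size, top_phys_addr);
	}

	return top_phys_addr;
}
```

With the Talos II layout from the documentation hunk (32GB at 0x0, 32GB at 0x0000_2000_0000_0000) and a limit of 32768 MB, the second block is skipped and the result is 0x800000000.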

>   	}
>   }
>   
> @@ -613,6 +637,8 @@ static void __init early_reserve_mem_dt(void)
>   static void __init early_reserve_mem(void)
>   {
>   	__be64 *reserve_map;
> +	phys_addr_t kasan_shadow_start;
> +	phys_addr_t kasan_memory_size;
>   
>   	reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
>   			fdt_off_mem_rsvmap(initial_boot_params));
> @@ -651,6 +677,42 @@ static void __init early_reserve_mem(void)
>   		return;
>   	}
>   #endif
> +
> +	if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
> +		kasan_memory_size =
> +			((phys_addr_t)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20);
> +
> +		if (top_phys_addr < kasan_memory_size) {
> +			/*
> +			 * We are doomed. We shouldn't even be able to get this
> +			 * far, but we do in qemu. If we continue and turn
> +			 * relocations on, we'll take fatal page faults for
> +			 * memory that's not physically present. Instead,
> +			 * panic() here: it will be saved to __log_buf even if
> +			 * it doesn't get printed to the console.
> +			 */
> +			panic("Tried to book a KASAN kernel configured for %u MB with only %llu MB! Aborting.",

book ==> boot ?

> +			      CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
> +			      (u64)(top_phys_addr >> 20));
> +		} else if (top_phys_addr > kasan_memory_size) {
> +			/* print a biiiig warning in hopes people notice */
> +			pr_err("===========================================\n"
> +				"Physical memory exceeds compiled-in maximum!\n"
> +				"This kernel was compiled for KASAN with %u MB physical memory.\n"
> +				"The physical memory detected is at least %llu MB.\n"
> +				"Memory above the compiled limit will not be used!\n"
> +				"===========================================\n",
> +				CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
> +				(u64)(top_phys_addr >> 20));
> +		}
> +
> +		kasan_shadow_start = _ALIGN_DOWN(kasan_memory_size * 7 / 8,
> +						 PAGE_SIZE);

Can't this fit on a single line? powerpc allows 90 chars.

> +		DBG("reserving %llx -> %llx for KASAN",
> +		    kasan_shadow_start, top_phys_addr);
> +		memblock_reserve(kasan_shadow_start,
> +				 top_phys_addr - kasan_shadow_start);

Same ?
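
For anyone checking the numbers in this hunk against the Makefile one, the arithmetic can be reproduced in user space. This is only a sketch under stated assumptions: the function names are invented for illustration, 64K pages are assumed for the alignment, and the shadow mapping is taken to be the generic KASAN one, (addr >> 3) + offset.

```c
#include <assert.h>
#include <stdint.h>

#define LINEAR_MAP_START	0xc000000000000000ULL	/* PAGE_OFFSET on Book3S 64 */
#define SHADOW_SCALE_SHIFT	3			/* one shadow byte per 8 bytes */
#define ASSUMED_PAGE_SIZE	0x10000ULL		/* assumption: 64K pages */

/* Same expression as the Makefile hunk: 7/8 of memory plus 0xa800000000000000. */
static uint64_t sketch_shadow_offset(uint64_t mem_mb)
{
	return 7 * 1024 * 1024 * mem_mb / 8 + 0xa800000000000000ULL;
}

/* Generic KASAN address-to-shadow mapping. */
static uint64_t sketch_mem_to_shadow(uint64_t addr, uint64_t mem_mb)
{
	return (addr >> SHADOW_SCALE_SHIFT) + sketch_shadow_offset(mem_mb);
}

/* Physical start of the reserved shadow, as computed in early_reserve_mem(). */
static uint64_t sketch_shadow_phys_start(uint64_t mem_mb)
{
	uint64_t size = mem_mb << 20;

	/* Round down to a page boundary, like _ALIGN_DOWN(..., PAGE_SIZE). */
	return (size * 7 / 8) & ~(ASSUMED_PAGE_SIZE - 1);
}

/* The shadow of the start of the linear map lands at 7/8 of physical memory. */
static int sketch_shadow_is_consistent(uint64_t mem_mb)
{
	uint64_t size = mem_mb << 20;

	assert(mem_mb > 0);

	return sketch_mem_to_shadow(LINEAR_MAP_START, mem_mb) ==
		       LINEAR_MAP_START + sketch_shadow_phys_start(mem_mb) &&
	       sketch_mem_to_shadow(LINEAR_MAP_START + size, mem_mb) ==
		       LINEAR_MAP_START + size;
}
```

For the default of 1024 MB the reserved region starts at 0x38000000 (896 MB), and the shadow of 0xc000000000000000 is 0xc000000038000000, so the same 0xc address works in and out of real mode. For the 32768 MB Talos II case the region starts at 0x700000000.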

> +	}
>   }
>   
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> index 6577897673dd..f02b15c78e4d 100644
> --- a/arch/powerpc/mm/kasan/Makefile
> +++ b/arch/powerpc/mm/kasan/Makefile
> @@ -2,4 +2,5 @@
>   
>   KASAN_SANITIZE := n
>   
> -obj-$(CONFIG_PPC32)           += kasan_init_32.o
> +obj-$(CONFIG_PPC32)           += init_32.o

Shouldn't we do the ppc32 rename in a separate patch?

> +obj-$(CONFIG_PPC_BOOK3S_64)   += init_book3s_64.o
> diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
> similarity index 100%
> rename from arch/powerpc/mm/kasan/kasan_init_32.c
> rename to arch/powerpc/mm/kasan/init_32.c
> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
> new file mode 100644
> index 000000000000..f961e96be136
> --- /dev/null
> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
> @@ -0,0 +1,72 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KASAN for 64-bit Book3S powerpc
> + *
> + * Copyright (C) 2019 IBM Corporation
> + * Author: Daniel Axtens <dja@axtens.net>
> + */
> +
> +#define DISABLE_BRANCH_PROFILING
> +
> +#include <linux/kasan.h>
> +#include <linux/printk.h>
> +#include <linux/sched/task.h>
> +#include <asm/pgalloc.h>
> +
> +void __init kasan_init(void)
> +{
> +	int i;
> +	void *k_start = kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START);
> +	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
> +
> +	pte_t pte = __pte(__pa(kasan_early_shadow_page) |
> +			  pgprot_val(PAGE_KERNEL) | _PAGE_PTE);

Can't we do this with existing helpers? Something like:

pte = pte_mkpte(pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL));

> +
> +	if (!early_radix_enabled())
> +		panic("KASAN requires radix!");
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++)
> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> +			     &kasan_early_shadow_pte[i], pte, 0);
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++)
> +		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
> +				    kasan_early_shadow_pte);
> +
> +	for (i = 0; i < PTRS_PER_PUD; i++)
> +		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
> +			     kasan_early_shadow_pmd);
> +
> +	memset(kasan_mem_to_shadow((void *)PAGE_OFFSET), KASAN_SHADOW_INIT,
> +	       KASAN_SHADOW_SIZE);
> +
> +	kasan_populate_early_shadow(
> +		kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START),
> +		kasan_mem_to_shadow((void *)RADIX_VMALLOC_START));
> +
> +	/* leave a hole here for vmalloc */
> +
> +	kasan_populate_early_shadow(
> +		kasan_mem_to_shadow((void *)RADIX_VMALLOC_END),
> +		kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END));
> +
> +	flush_tlb_kernel_range((unsigned long)k_start, (unsigned long)k_end);
> +
> +	/* mark early shadow region as RO and wipe */
> +	pte = __pte(__pa(kasan_early_shadow_page) |
> +		    pgprot_val(PAGE_KERNEL_RO) | _PAGE_PTE);

Same comment as above: use helpers?

> +	for (i = 0; i < PTRS_PER_PTE; i++)
> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> +			     &kasan_early_shadow_pte[i], pte, 0);
> +
> +	/*
> +	 * clear_page relies on some cache info that hasn't been set up yet.
> +	 * It ends up looping ~forever and blows up other data.
> +	 * Use memset instead.
> +	 */
> +	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> +
> +	/* Enable error messages */
> +	init_task.kasan_depth = 0;
> +	pr_info("KASAN init done (64-bit Book3S heavyweight mode)\n");
> +}
> 

Christophe

* Re: [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables
  2019-12-12 15:16 ` [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables Daniel Axtens
  2019-12-12 15:55   ` Christophe Leroy
@ 2019-12-13 21:37   ` Balbir Singh
  1 sibling, 0 replies; 12+ messages in thread
From: Balbir Singh @ 2019-12-13 21:37 UTC (permalink / raw)
  To: Daniel Axtens, linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	christophe.leroy, aneesh.kumar



On 13/12/19 2:16 am, Daniel Axtens wrote:
> powerpc has a variable number of PTRS_PER_*, set at runtime based
> on the MMU that the kernel is booted under.
> 
> This means the PTRS_PER_* are no longer constants, and therefore
> breaks the build.
> 
> Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
> As KASAN is the only user at the moment, just define them in the kasan
> header, and have them default to PTRS_PER_* unless overridden in arch
> code.
> 
> Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Suggested-by: Balbir Singh <bsingharora@gmail.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
Reviewed-by: Balbir Singh <bsingharora@gmail.com>

Balbir

* Re: [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support
  2019-12-13 12:27   ` Christophe Leroy
@ 2019-12-17 13:30     ` Daniel Axtens
  2019-12-18  4:32       ` Daniel Axtens
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Axtens @ 2019-12-17 13:30 UTC (permalink / raw)
  To: Christophe Leroy, linux-kernel, linux-mm, linuxppc-dev,
	kasan-dev, aneesh.kumar, bsingharora
  Cc: Michael Ellerman

Hi Christophe,

I'm working through your feedback, thank you. Regarding this one:

>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -2081,7 +2081,14 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>>   		/*
>>   		 * See if this is an exception frame.
>>   		 * We look for the "regshere" marker in the current frame.
>> +		 *
>> +		 * KASAN may complain about this. If it is an exception frame,
>> +		 * we won't have unpoisoned the stack in asm when we set the
>> +		 * exception marker. If it's not an exception frame, who knows
>> +		 * how things are laid out - the shadow could be in any state
>> +		 * at all. Just disable KASAN reporting for now.
>>   		 */
>> +		kasan_disable_current();
>>   		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
>>   		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>>   			struct pt_regs *regs = (struct pt_regs *)
>> @@ -2091,6 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>>   			       regs->trap, (void *)regs->nip, (void *)lr);
>>   			firstframe = 1;
>>   		}
>> +		kasan_enable_current();
>
> If this is really a concern for all targets including PPC32, should it 
> be a separate patch with a Fixes: tag to be applied back in stable as well ?

I've managed to repro this by commenting out the kasan_disable/enable
lines, and just booting in qemu without a disk attached:

sudo qemu-system-ppc64 -accel kvm -m 2G -M pseries -cpu power9  -kernel ./vmlinux  -nographic -chardev stdio,id=charserial0,mux=on -device spapr-vty,chardev=charserial0,reg=0x30000000  -mon chardev=charserial0,mode=readline -nodefaults -smp 2 

...

[    0.210740] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    0.210789] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
[    0.210844] Call Trace:
[    0.210866] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
[    0.210915] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
[    0.210958] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
[    0.211005] ==================================================================
[    0.211054] BUG: KASAN: stack-out-of-bounds in show_stack+0x438/0x580
[    0.211095] Read of size 8 at addr c00000006a483b00 by task swapper/0/1
[    0.211134] 
[    0.211152] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
[    0.211207] Call Trace:
[    0.211225] [c00000006a483680] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
[    0.211274] [c00000006a4836d0] [c0000000008f877c] print_address_description.isra.10+0x7c/0x470
[    0.211330] [c00000006a483760] [c0000000008f8e7c] __kasan_report+0x1bc/0x244
[    0.211380] [c00000006a483830] [c0000000008f6eb8] kasan_report+0x18/0x30
[    0.211422] [c00000006a483850] [c0000000008fa5d4] __asan_report_load8_noabort+0x24/0x40
[    0.211471] [c00000006a483870] [c00000000003d448] show_stack+0x438/0x580
[    0.211512] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154
[    0.211553] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
[    0.211595] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
[    0.211644] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
[    0.211694] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
[    0.211745] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
[    0.211787] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
[    0.211834] 
[    0.211851] Allocated by task 0:
[    0.211878]  save_stack+0x2c/0xe0
[    0.211904]  __kasan_kmalloc.isra.16+0x11c/0x150
[    0.211937]  kmem_cache_alloc_node+0x114/0x3b0
[    0.211971]  copy_process+0x5b8/0x6410
[    0.211996]  _do_fork+0x130/0xbf0
[    0.212022]  kernel_thread+0xdc/0x130
[    0.212047]  rest_init+0x44/0x184
[    0.212072]  start_kernel+0x77c/0x7dc
[    0.212098]  start_here_common+0x1c/0x20
[    0.212122] 
[    0.212139] Freed by task 0:
[    0.212163] (stack is not available)
[    0.212187] 
[    0.212205] The buggy address belongs to the object at c00000006a480000
[    0.212205]  which belongs to the cache thread_stack of size 16384
[    0.212285] The buggy address is located 15104 bytes inside of
[    0.212285]  16384-byte region [c00000006a480000, c00000006a484000)
[    0.212356] The buggy address belongs to the page:
[    0.212391] page:c00c0000001a9200 refcount:1 mapcount:0 mapping:c00000006a019e00 index:0x0 compound_mapcount: 0
[    0.212455] raw: 007ffff000010200 5deadbeef0000100 5deadbeef0000122 c00000006a019e00
[    0.212504] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[    0.212551] page dumped because: kasan: bad access detected
[    0.212583] 
[    0.212600] addr c00000006a483b00 is located in stack of task swapper/0/1 at offset 0 in frame:
[    0.212656]  mount_block_root+0x0/0x7ac
[    0.212681] 
[    0.212698] this frame has 1 object:
[    0.212722]  [32, 64) 'b'
[    0.212723] 
[    0.212755] Memory state around the buggy address:
[    0.212788]  c00000006a483a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.212836]  c00000006a483a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.212884] >c00000006a483b00: f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3 00 00 00 00
[    0.212931]                    ^
[    0.212957]  c00000006a483b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.213005]  c00000006a483c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.213052] ==================================================================
[    0.213100] Disabling lock debugging due to kernel taint
[    0.213134] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
[    0.213182] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
[    0.213231] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
[    0.213272] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
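
For anyone reading the splat above, here is a small sketch of how the numbers fit together. It assumes generic KASAN's usual encoding (one shadow byte per 8 bytes of memory, 0xf1/0xf3 as the stack left/right redzone markers); the helper names are made up for illustration and are not kernel functions. The addresses and shadow bytes are copied from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes 8 bytes of memory. */
#define SHADOW_GRANULE 8u

/* Stack redzone markers as used by generic KASAN. */
#define STACK_LEFT_REDZONE  0xF1
#define STACK_RIGHT_REDZONE 0xF3

/* Values copied from the report above. */
static const uint64_t stack_base = 0xc00000006a480000ULL;
static const uint64_t bad_addr   = 0xc00000006a483b00ULL;

/* Shadow bytes from the ">c00000006a483b00:" row of the report. */
static const uint8_t shadow_row[16] = {
	0xf1, 0xf1, 0xf1, 0xf1, 0x00, 0x00, 0x00, 0x00,
	0xf3, 0xf3, 0xf3, 0xf3, 0x00, 0x00, 0x00, 0x00,
};

/* Hypothetical helper: byte offset of addr inside the thread stack. */
static uint64_t offset_in_stack(uint64_t addr)
{
	return addr - stack_base;
}

/* Hypothetical helper: shadow byte covering a byte offset into the row. */
static uint8_t shadow_for_row_offset(size_t off)
{
	assert(off < sizeof(shadow_row) * SHADOW_GRANULE);
	return shadow_row[off / SHADOW_GRANULE];
}

/* Hypothetical helper: human-readable meaning of a shadow byte. */
static const char *describe_shadow(uint8_t s)
{
	if (s == 0x00)
		return "fully accessible";
	if (s < SHADOW_GRANULE)
		return "partially accessible";
	if (s == STACK_LEFT_REDZONE)
		return "stack left redzone";
	if (s == STACK_RIGHT_REDZONE)
		return "stack right redzone";
	return "other poison";
}
```

offset_in_stack(bad_addr) gives 0x3b00 = 15104, matching "located 15104 bytes inside". The first four shadow bytes (frame offsets 0-31) are the left redzone, the next four (offsets 32-63) cover the object 'b' at [32, 64), and the following four are the right redzone. The 8-byte read at frame offset 0 therefore lands in the left redzone, which is why the report says stack-out-of-bounds.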

Is that something that reproduces on ppc32?

I don't see it running the test_kasan tests, so I guess that matches up
with your experience.

Regards,
Daniel



>
>>   
>>   		sp = newsp;
>>   	} while (count++ < kstack_depth_to_print);
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index 6620f37abe73..d994c7c39c8d 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -72,6 +72,7 @@ unsigned long tce_alloc_start, tce_alloc_end;
>>   u64 ppc64_rma_size;
>>   #endif
>>   static phys_addr_t first_memblock_size;
>> +static phys_addr_t top_phys_addr;
>>   static int __initdata boot_cpu_count;
>>   
>>   static int __init early_parse_mem(char *p)
>> @@ -449,6 +450,26 @@ static bool validate_mem_limit(u64 base, u64 *size)
>>   {
>>   	u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
>>   
>> +	/*
>> +	 * To handle the NUMA/discontiguous memory case, don't allow a block
>> +	 * to be added if it falls completely beyond the configured physical
>> +	 * memory. Print an informational message.
>> +	 *
>> +	 * Frustratingly we also see this with qemu - it seems to split the
>> +	 * specified memory into a number of smaller blocks. If this happens
>> +	 * under qemu, it probably represents misconfiguration. So we want
>> +	 * the message to be noticeable, but not shouty.
>> +	 *
>> +	 * See Documentation/powerpc/kasan.txt
>> +	 */
>> +	if (IS_ENABLED(CONFIG_KASAN) &&
>> +	    (base >= ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20))) {
>> +		pr_warn("KASAN: not adding memory block at %llx (size %llx)\n"
>> +			"This could be due to discontiguous memory or kernel misconfiguration.",
>> +			base, *size);
>> +		return false;
>> +	}
>> +
>>   	if (base >= max_mem)
>>   		return false;
>>   	if ((base + *size) > max_mem)
>> @@ -572,8 +593,11 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>>   
>>   	/* Add the chunk to the MEMBLOCK list */
>>   	if (add_mem_to_memblock) {
>> -		if (validate_mem_limit(base, &size))
>> +		if (validate_mem_limit(base, &size)) {
>>   			memblock_add(base, size);
>> +			if (base + size > top_phys_addr)
>> +				top_phys_addr = base + size;
>> +		}
>
> Can we use max() here ? Something like
>
> top_phys_addr = max(base + size, top_phys_addr);
>
>>   	}
>>   }
>>   
>> @@ -613,6 +637,8 @@ static void __init early_reserve_mem_dt(void)
>>   static void __init early_reserve_mem(void)
>>   {
>>   	__be64 *reserve_map;
>> +	phys_addr_t kasan_shadow_start;
>> +	phys_addr_t kasan_memory_size;
>>   
>>   	reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
>>   			fdt_off_mem_rsvmap(initial_boot_params));
>> @@ -651,6 +677,42 @@ static void __init early_reserve_mem(void)
>>   		return;
>>   	}
>>   #endif
>> +
>> +	if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
>> +		kasan_memory_size =
>> +			((phys_addr_t)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20);
>> +
>> +		if (top_phys_addr < kasan_memory_size) {
>> +			/*
>> +			 * We are doomed. We shouldn't even be able to get this
>> +			 * far, but we do in qemu. If we continue and turn
>> +			 * relocations on, we'll take fatal page faults for
>> +			 * memory that's not physically present. Instead,
>> +			 * panic() here: it will be saved to __log_buf even if
>> +			 * it doesn't get printed to the console.
>> +			 */
>> +			panic("Tried to book a KASAN kernel configured for %u MB with only %llu MB! Aborting.",
>
> book ==> boot ?
>
>> +			      CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
>> +			      (u64)(top_phys_addr >> 20));
>> +		} else if (top_phys_addr > kasan_memory_size) {
>> +			/* print a biiiig warning in hopes people notice */
>> +			pr_err("===========================================\n"
>> +				"Physical memory exceeds compiled-in maximum!\n"
>> +				"This kernel was compiled for KASAN with %u MB physical memory.\n"
>> +				"The physical memory detected is at least %llu MB.\n"
>> +				"Memory above the compiled limit will not be used!\n"
>> +				"===========================================\n",
>> +				CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
>> +				(u64)(top_phys_addr >> 20));
>> +		}
>> +
>> +		kasan_shadow_start = _ALIGN_DOWN(kasan_memory_size * 7 / 8,
>> +						 PAGE_SIZE);
>
> Can't this fit on a single line ? powerpc allows 90 chars.
>
>> +		DBG("reserving %llx -> %llx for KASAN",
>> +		    kasan_shadow_start, top_phys_addr);
>> +		memblock_reserve(kasan_shadow_start,
>> +				 top_phys_addr - kasan_shadow_start);
>
> Same ?
>
>> +	}
>>   }
>>   
>>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
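
The arithmetic in the two prom.c hunks quoted above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the function and enum names are hypothetical, the configured size is passed as a parameter rather than read from CONFIG_PHYS_MEM_SIZE_FOR_KASAN, and a 64 KiB page size is assumed in the usage note.

```c
#include <stdint.h>

/* The config option is expressed in MB, hence the shift by 20. */
#define MB_SHIFT 20

/* Round x down to a multiple of a (a must be a power of two). */
static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/*
 * Mirrors the quoted computation: the shadow occupies the top 1/8 of
 * the configured memory, so it starts at 7/8 of it, page-aligned down.
 */
static uint64_t kasan_shadow_start(uint64_t mem_mb, uint64_t page_size)
{
	uint64_t kasan_memory_size = mem_mb << MB_SHIFT;

	return align_down(kasan_memory_size * 7 / 8, page_size);
}

/* Mirrors the quoted check in validate_mem_limit(). */
static int block_rejected(uint64_t base, uint64_t mem_mb)
{
	return base >= (mem_mb << MB_SHIFT);
}

enum mem_check {
	MEM_TOO_SMALL,	/* the quoted code panics here */
	MEM_EXACT,	/* no message */
	MEM_EXCESS,	/* the quoted code prints the big warning */
};

/* Mirrors the three-way comparison in early_reserve_mem(). */
static enum mem_check classify_top(uint64_t top_phys_addr, uint64_t mem_mb)
{
	uint64_t kasan_memory_size = mem_mb << MB_SHIFT;

	if (top_phys_addr < kasan_memory_size)
		return MEM_TOO_SMALL;
	if (top_phys_addr > kasan_memory_size)
		return MEM_EXCESS;
	return MEM_EXACT;
}
```

For a kernel configured for 2048 MB with 64 KiB pages, kasan_shadow_start(2048, 65536) is 0x70000000 (1792 MB), so with exactly 2048 MB present the reserved range is the top 256 MB. A memory block whose base is at or above 0x80000000 would be rejected by the first check.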
>> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
>> index 6577897673dd..f02b15c78e4d 100644
>> --- a/arch/powerpc/mm/kasan/Makefile
>> +++ b/arch/powerpc/mm/kasan/Makefile
>> @@ -2,4 +2,5 @@
>>   
>>   KASAN_SANITIZE := n
>>   
>> -obj-$(CONFIG_PPC32)           += kasan_init_32.o
>> +obj-$(CONFIG_PPC32)           += init_32.o
>
> Shouldn't we do ppc32 name change in another patch ?
>
>> +obj-$(CONFIG_PPC_BOOK3S_64)   += init_book3s_64.o
>> diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
>> similarity index 100%
>> rename from arch/powerpc/mm/kasan/kasan_init_32.c
>> rename to arch/powerpc/mm/kasan/init_32.c
>> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
>> new file mode 100644
>> index 000000000000..f961e96be136
>> --- /dev/null
>> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
>> @@ -0,0 +1,72 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * KASAN for 64-bit Book3S powerpc
>> + *
>> + * Copyright (C) 2019 IBM Corporation
>> + * Author: Daniel Axtens <dja@axtens.net>
>> + */
>> +
>> +#define DISABLE_BRANCH_PROFILING
>> +
>> +#include <linux/kasan.h>
>> +#include <linux/printk.h>
>> +#include <linux/sched/task.h>
>> +#include <asm/pgalloc.h>
>> +
>> +void __init kasan_init(void)
>> +{
>> +	int i;
>> +	void *k_start = kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START);
>> +	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
>> +
>> +	pte_t pte = __pte(__pa(kasan_early_shadow_page) |
>> +			  pgprot_val(PAGE_KERNEL) | _PAGE_PTE);
>
> Can't we do something with existing helpers ? Something like:
>
> pte = pte_mkpte(pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL));
>
>> +
>> +	if (!early_radix_enabled())
>> +		panic("KASAN requires radix!");
>> +
>> +	for (i = 0; i < PTRS_PER_PTE; i++)
>> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
>> +			     &kasan_early_shadow_pte[i], pte, 0);
>> +
>> +	for (i = 0; i < PTRS_PER_PMD; i++)
>> +		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
>> +				    kasan_early_shadow_pte);
>> +
>> +	for (i = 0; i < PTRS_PER_PUD; i++)
>> +		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
>> +			     kasan_early_shadow_pmd);
>> +
>> +	memset(kasan_mem_to_shadow((void *)PAGE_OFFSET), KASAN_SHADOW_INIT,
>> +	       KASAN_SHADOW_SIZE);
>> +
>> +	kasan_populate_early_shadow(
>> +		kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START),
>> +		kasan_mem_to_shadow((void *)RADIX_VMALLOC_START));
>> +
>> +	/* leave a hole here for vmalloc */
>> +
>> +	kasan_populate_early_shadow(
>> +		kasan_mem_to_shadow((void *)RADIX_VMALLOC_END),
>> +		kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END));
>> +
>> +	flush_tlb_kernel_range((unsigned long)k_start, (unsigned long)k_end);
>> +
>> +	/* mark early shadow region as RO and wipe */
>> +	pte = __pte(__pa(kasan_early_shadow_page) |
>> +		    pgprot_val(PAGE_KERNEL_RO) | _PAGE_PTE);
>
> Same comment as above, use helpers ?
>
>> +	for (i = 0; i < PTRS_PER_PTE; i++)
>> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
>> +			     &kasan_early_shadow_pte[i], pte, 0);
>> +
>> +	/*
>> +	 * clear_page relies on some cache info that hasn't been set up yet.
>> +	 * It ends up looping ~forever and blows up other data.
>> +	 * Use memset instead.
>> +	 */
>> +	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
>> +
>> +	/* Enable error messages */
>> +	init_task.kasan_depth = 0;
>> +	pr_info("KASAN init done (64-bit Book3S heavyweight mode)\n");
>> +}
>> 
>
> Christophe

* Re: [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support
  2019-12-17 13:30     ` Daniel Axtens
@ 2019-12-18  4:32       ` Daniel Axtens
  2019-12-18 13:39         ` Christophe Leroy
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Axtens @ 2019-12-18  4:32 UTC (permalink / raw)
  To: Christophe Leroy, linux-kernel, linux-mm, linuxppc-dev,
	kasan-dev, aneesh.kumar, bsingharora
  Cc: Michael Ellerman

Daniel Axtens <dja@axtens.net> writes:

> Hi Christophe,
>
> I'm working through your feedback, thank you. Regarding this one:
>
>>> --- a/arch/powerpc/kernel/process.c
>>> +++ b/arch/powerpc/kernel/process.c
>>> @@ -2081,7 +2081,14 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>>>   		/*
>>>   		 * See if this is an exception frame.
>>>   		 * We look for the "regshere" marker in the current frame.
>>> +		 *
>>> +		 * KASAN may complain about this. If it is an exception frame,
>>> +		 * we won't have unpoisoned the stack in asm when we set the
>>> +		 * exception marker. If it's not an exception frame, who knows
>>> +		 * how things are laid out - the shadow could be in any state
>>> +		 * at all. Just disable KASAN reporting for now.
>>>   		 */
>>> +		kasan_disable_current();
>>>   		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
>>>   		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>>>   			struct pt_regs *regs = (struct pt_regs *)
>>> @@ -2091,6 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>>>   			       regs->trap, (void *)regs->nip, (void *)lr);
>>>   			firstframe = 1;
>>>   		}
>>> +		kasan_enable_current();
>>
>> If this is really a concern for all targets including PPC32, should it 
>> be a separate patch with a Fixes: tag to be applied back in stable as well ?
>
> I've managed to repro this by commenting out the kasan_disable/enable
> lines, and just booting in qemu without a disk attached:
>
> sudo qemu-system-ppc64 -accel kvm -m 2G -M pseries -cpu power9  -kernel ./vmlinux  -nographic -chardev stdio,id=charserial0,mux=on -device spapr-vty,chardev=charserial0,reg=0x30000000  -mon chardev=charserial0,mode=readline -nodefaults -smp 2 
>
> ...
>
> [    0.210740] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
> [    0.210789] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
> [    0.210844] Call Trace:
> [    0.210866] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
> [    0.210915] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
> [    0.210958] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
> [    0.211005] ==================================================================
> [    0.211054] BUG: KASAN: stack-out-of-bounds in show_stack+0x438/0x580
> [    0.211095] Read of size 8 at addr c00000006a483b00 by task swapper/0/1
> [    0.211134] 
> [    0.211152] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
> [    0.211207] Call Trace:
> [    0.211225] [c00000006a483680] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
> [    0.211274] [c00000006a4836d0] [c0000000008f877c] print_address_description.isra.10+0x7c/0x470
> [    0.211330] [c00000006a483760] [c0000000008f8e7c] __kasan_report+0x1bc/0x244
> [    0.211380] [c00000006a483830] [c0000000008f6eb8] kasan_report+0x18/0x30
> [    0.211422] [c00000006a483850] [c0000000008fa5d4] __asan_report_load8_noabort+0x24/0x40
> [    0.211471] [c00000006a483870] [c00000000003d448] show_stack+0x438/0x580
> [    0.211512] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154
> [    0.211553] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
> [    0.211595] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
> [    0.211644] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
> [    0.211694] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
> [    0.211745] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
> [    0.211787] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
> [    0.211834] 
> [    0.211851] Allocated by task 0:
> [    0.211878]  save_stack+0x2c/0xe0
> [    0.211904]  __kasan_kmalloc.isra.16+0x11c/0x150
> [    0.211937]  kmem_cache_alloc_node+0x114/0x3b0
> [    0.211971]  copy_process+0x5b8/0x6410
> [    0.211996]  _do_fork+0x130/0xbf0
> [    0.212022]  kernel_thread+0xdc/0x130
> [    0.212047]  rest_init+0x44/0x184
> [    0.212072]  start_kernel+0x77c/0x7dc
> [    0.212098]  start_here_common+0x1c/0x20
> [    0.212122] 
> [    0.212139] Freed by task 0:
> [    0.212163] (stack is not available)
> [    0.212187] 
> [    0.212205] The buggy address belongs to the object at c00000006a480000
> [    0.212205]  which belongs to the cache thread_stack of size 16384
> [    0.212285] The buggy address is located 15104 bytes inside of
> [    0.212285]  16384-byte region [c00000006a480000, c00000006a484000)
> [    0.212356] The buggy address belongs to the page:
> [    0.212391] page:c00c0000001a9200 refcount:1 mapcount:0 mapping:c00000006a019e00 index:0x0 compound_mapcount: 0
> [    0.212455] raw: 007ffff000010200 5deadbeef0000100 5deadbeef0000122 c00000006a019e00
> [    0.212504] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> [    0.212551] page dumped because: kasan: bad access detected
> [    0.212583] 
> [    0.212600] addr c00000006a483b00 is located in stack of task swapper/0/1 at offset 0 in frame:
> [    0.212656]  mount_block_root+0x0/0x7ac
> [    0.212681] 
> [    0.212698] this frame has 1 object:
> [    0.212722]  [32, 64) 'b'
> [    0.212723] 
> [    0.212755] Memory state around the buggy address:
> [    0.212788]  c00000006a483a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [    0.212836]  c00000006a483a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [    0.212884] >c00000006a483b00: f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3 00 00 00 00
> [    0.212931]                    ^
> [    0.212957]  c00000006a483b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [    0.213005]  c00000006a483c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [    0.213052] ==================================================================
> [    0.213100] Disabling lock debugging due to kernel taint
> [    0.213134] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
> [    0.213182] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
> [    0.213231] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
> [    0.213272] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
>
> Is that something that reproduces on ppc32?
>
> I don't see it running the test_kasan tests, so I guess that matches up
> with your experience.

I've debugged this a bit further. If I put a dump_stack() in
kernel_init() right before I call kernel_init_freeable(), I don't see
the splat. But if I put a dump_stack() immediately inside
kernel_init_freeable() I do see the splat. I wonder if some early init
code isn't setting up the stack quite right?

I don't see this in walking stacks that contain an interrupt frame, so I
think the correct thing is to tear out this code and debug the weird
stack frame stuff around kernel_init_freeable in parallel.

Thanks for your attention to detail.

Regards,
Daniel

>
> Regards,
> Daniel
>
>
>
>>
>>>   
>>>   		sp = newsp;
>>>   	} while (count++ < kstack_depth_to_print);
>>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>>> index 6620f37abe73..d994c7c39c8d 100644
>>> --- a/arch/powerpc/kernel/prom.c
>>> +++ b/arch/powerpc/kernel/prom.c
>>> @@ -72,6 +72,7 @@ unsigned long tce_alloc_start, tce_alloc_end;
>>>   u64 ppc64_rma_size;
>>>   #endif
>>>   static phys_addr_t first_memblock_size;
>>> +static phys_addr_t top_phys_addr;
>>>   static int __initdata boot_cpu_count;
>>>   
>>>   static int __init early_parse_mem(char *p)
>>> @@ -449,6 +450,26 @@ static bool validate_mem_limit(u64 base, u64 *size)
>>>   {
>>>   	u64 max_mem = 1UL << (MAX_PHYSMEM_BITS);
>>>   
>>> +	/*
>>> +	 * To handle the NUMA/discontiguous memory case, don't allow a block
>>> +	 * to be added if it falls completely beyond the configured physical
>>> +	 * memory. Print an informational message.
>>> +	 *
>>> +	 * Frustratingly we also see this with qemu - it seems to split the
>>> +	 * specified memory into a number of smaller blocks. If this happens
>>> +	 * under qemu, it probably represents misconfiguration. So we want
>>> +	 * the message to be noticeable, but not shouty.
>>> +	 *
>>> +	 * See Documentation/powerpc/kasan.txt
>>> +	 */
>>> +	if (IS_ENABLED(CONFIG_KASAN) &&
>>> +	    (base >= ((u64)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20))) {
>>> +		pr_warn("KASAN: not adding memory block at %llx (size %llx)\n"
>>> +			"This could be due to discontiguous memory or kernel misconfiguration.",
>>> +			base, *size);
>>> +		return false;
>>> +	}
>>> +
>>>   	if (base >= max_mem)
>>>   		return false;
>>>   	if ((base + *size) > max_mem)
>>> @@ -572,8 +593,11 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>>>   
>>>   	/* Add the chunk to the MEMBLOCK list */
>>>   	if (add_mem_to_memblock) {
>>> -		if (validate_mem_limit(base, &size))
>>> +		if (validate_mem_limit(base, &size)) {
>>>   			memblock_add(base, size);
>>> +			if (base + size > top_phys_addr)
>>> +				top_phys_addr = base + size;
>>> +		}
>>
>> Can we use max() here ? Something like
>>
>> top_phys_addr = max(base + size, top_phys_addr);
>>
>>>   	}
>>>   }
>>>   
>>> @@ -613,6 +637,8 @@ static void __init early_reserve_mem_dt(void)
>>>   static void __init early_reserve_mem(void)
>>>   {
>>>   	__be64 *reserve_map;
>>> +	phys_addr_t kasan_shadow_start;
>>> +	phys_addr_t kasan_memory_size;
>>>   
>>>   	reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
>>>   			fdt_off_mem_rsvmap(initial_boot_params));
>>> @@ -651,6 +677,42 @@ static void __init early_reserve_mem(void)
>>>   		return;
>>>   	}
>>>   #endif
>>> +
>>> +	if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
>>> +		kasan_memory_size =
>>> +			((phys_addr_t)CONFIG_PHYS_MEM_SIZE_FOR_KASAN << 20);
>>> +
>>> +		if (top_phys_addr < kasan_memory_size) {
>>> +			/*
>>> +			 * We are doomed. We shouldn't even be able to get this
>>> +			 * far, but we do in qemu. If we continue and turn
>>> +			 * relocations on, we'll take fatal page faults for
>>> +			 * memory that's not physically present. Instead,
>>> +			 * panic() here: it will be saved to __log_buf even if
>>> +			 * it doesn't get printed to the console.
>>> +			 */
>>> +			panic("Tried to book a KASAN kernel configured for %u MB with only %llu MB! Aborting.",
>>
>> book ==> boot ?
>>
>>> +			      CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
>>> +			      (u64)(top_phys_addr >> 20));
>>> +		} else if (top_phys_addr > kasan_memory_size) {
>>> +			/* print a biiiig warning in hopes people notice */
>>> +			pr_err("===========================================\n"
>>> +				"Physical memory exceeds compiled-in maximum!\n"
>>> +				"This kernel was compiled for KASAN with %u MB physical memory.\n"
>>> +				"The physical memory detected is at least %llu MB.\n"
>>> +				"Memory above the compiled limit will not be used!\n"
>>> +				"===========================================\n",
>>> +				CONFIG_PHYS_MEM_SIZE_FOR_KASAN,
>>> +				(u64)(top_phys_addr >> 20));
>>> +		}
>>> +
>>> +		kasan_shadow_start = _ALIGN_DOWN(kasan_memory_size * 7 / 8,
>>> +						 PAGE_SIZE);
>>
>> Can't this fit on a single line ? powerpc allows 90 chars.
>>
>>> +		DBG("reserving %llx -> %llx for KASAN",
>>> +		    kasan_shadow_start, top_phys_addr);
>>> +		memblock_reserve(kasan_shadow_start,
>>> +				 top_phys_addr - kasan_shadow_start);
>>
>> Same ?
>>
>>> +	}
>>>   }
>>>   
>>>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
>>> index 6577897673dd..f02b15c78e4d 100644
>>> --- a/arch/powerpc/mm/kasan/Makefile
>>> +++ b/arch/powerpc/mm/kasan/Makefile
>>> @@ -2,4 +2,5 @@
>>>   
>>>   KASAN_SANITIZE := n
>>>   
>>> -obj-$(CONFIG_PPC32)           += kasan_init_32.o
>>> +obj-$(CONFIG_PPC32)           += init_32.o
>>
>> Shouldn't we do ppc32 name change in another patch ?
>>
>>> +obj-$(CONFIG_PPC_BOOK3S_64)   += init_book3s_64.o
>>> diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
>>> similarity index 100%
>>> rename from arch/powerpc/mm/kasan/kasan_init_32.c
>>> rename to arch/powerpc/mm/kasan/init_32.c
>>> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
>>> new file mode 100644
>>> index 000000000000..f961e96be136
>>> --- /dev/null
>>> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
>>> @@ -0,0 +1,72 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * KASAN for 64-bit Book3S powerpc
>>> + *
>>> + * Copyright (C) 2019 IBM Corporation
>>> + * Author: Daniel Axtens <dja@axtens.net>
>>> + */
>>> +
>>> +#define DISABLE_BRANCH_PROFILING
>>> +
>>> +#include <linux/kasan.h>
>>> +#include <linux/printk.h>
>>> +#include <linux/sched/task.h>
>>> +#include <asm/pgalloc.h>
>>> +
>>> +void __init kasan_init(void)
>>> +{
>>> +	int i;
>>> +	void *k_start = kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START);
>>> +	void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
>>> +
>>> +	pte_t pte = __pte(__pa(kasan_early_shadow_page) |
>>> +			  pgprot_val(PAGE_KERNEL) | _PAGE_PTE);
>>
>> Can't we do something with existing helpers ? Something like:
>>
>> pte = pte_mkpte(pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL));
>>
>>> +
>>> +	if (!early_radix_enabled())
>>> +		panic("KASAN requires radix!");
>>> +
>>> +	for (i = 0; i < PTRS_PER_PTE; i++)
>>> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
>>> +			     &kasan_early_shadow_pte[i], pte, 0);
>>> +
>>> +	for (i = 0; i < PTRS_PER_PMD; i++)
>>> +		pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
>>> +				    kasan_early_shadow_pte);
>>> +
>>> +	for (i = 0; i < PTRS_PER_PUD; i++)
>>> +		pud_populate(&init_mm, &kasan_early_shadow_pud[i],
>>> +			     kasan_early_shadow_pmd);
>>> +
>>> +	memset(kasan_mem_to_shadow((void *)PAGE_OFFSET), KASAN_SHADOW_INIT,
>>> +	       KASAN_SHADOW_SIZE);
>>> +
>>> +	kasan_populate_early_shadow(
>>> +		kasan_mem_to_shadow((void *)RADIX_KERN_VIRT_START),
>>> +		kasan_mem_to_shadow((void *)RADIX_VMALLOC_START));
>>> +
>>> +	/* leave a hole here for vmalloc */
>>> +
>>> +	kasan_populate_early_shadow(
>>> +		kasan_mem_to_shadow((void *)RADIX_VMALLOC_END),
>>> +		kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END));
>>> +
>>> +	flush_tlb_kernel_range((unsigned long)k_start, (unsigned long)k_end);
>>> +
>>> +	/* mark early shadow region as RO and wipe */
>>> +	pte = __pte(__pa(kasan_early_shadow_page) |
>>> +		    pgprot_val(PAGE_KERNEL_RO) | _PAGE_PTE);
>>
>> Same comment as above, use helpers ?
>>
>>> +	for (i = 0; i < PTRS_PER_PTE; i++)
>>> +		__set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
>>> +			     &kasan_early_shadow_pte[i], pte, 0);
>>> +
>>> +	/*
>>> +	 * clear_page relies on some cache info that hasn't been set up yet.
>>> +	 * It ends up looping ~forever and blows up other data.
>>> +	 * Use memset instead.
>>> +	 */
>>> +	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
>>> +
>>> +	/* Enable error messages */
>>> +	init_task.kasan_depth = 0;
>>> +	pr_info("KASAN init done (64-bit Book3S heavyweight mode)\n");
>>> +}
>>> 
>>
>> Christophe

* Re: [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support
  2019-12-12 23:55   ` Jordan Niethe
@ 2019-12-18  7:01     ` Daniel Axtens
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Axtens @ 2019-12-18  7:01 UTC (permalink / raw)
  To: Jordan Niethe
  Cc: linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	christophe.leroy, aneesh.kumar, bsingharora, Michael Ellerman


>>    [For those not immersed in ppc64, in real mode, the top nibble or 2 bits
>>    (depending on radix/hash mmu) of the address is ignored. The linear
>>    mapping is placed at 0xc000000000000000. This means that a pointer to
>>    part of the linear mapping will work both in real mode, where it will be
>>    interpreted as a physical address of the form 0x000..., and out of real
>>    mode, where it will go via the linear mapping.]
>>
>
> How does hash or radix mmu mode affect how many bits are ignored in real mode?

Bah, you're picking on details that I picked up from random
conversations in the office rather than from reading the spec! :P

The ISA suggests that the real address space is limited to at most 60
bits. ISAv3, Book III s5.7:

| * Host real address space size is 2^m bytes, m <= 60;
|   see Note 1.
| * Guest real address space size is 2^m bytes, m <= 60;
|   see Notes 1 and 2.
...
| Notes:
| 1. The value of m is implementation-dependent
|    (subject to the maximum given above). When used to
|    address storage or to represent a guest real
|    address, the high-order 60-m bits of the “60-bit”
|    real address must be zeros.
| 2. The hypervisor may assign a guest real address
|    space size for each partition that uses Radix Tree
|    translation. Accesses to guest real storage
|    outside this range but still mappable by the second
|    level Radix Tree will cause an HISI or HDSI.
|    Accesses to storage outside the mappable range
|    will have boundedly undefined results.

However, it doesn't follow from that passage that the top 4 bits are
always ignored when translations are off ('real mode'): see for example
the discussion of the HRMOR in s 5.7.3 and s 5.7.3.1. 

I think I got the 'top 2 bits on radix' thing from the discussion of
'quadrants' in arch/powerpc/include/asm/book3s/64/radix.h, which in turn
is discussed in s 5.7.5.1. Table 20 in particular is really helpful for
understanding it. But it's not especially relevant to what I'm actually
doing here.

I think to fully understand all of what's going on I would need to spend
some serious time with the entirety of s5.7, because there are a lot of
quirks about how storage works! But I think for our purposes it suffices
to say:

  The kernel installs a linear mapping at effective address
  c000... onward. This is a one-to-one mapping with physical memory from
  0000... onward. Because of how memory accesses work on powerpc 64-bit
  Book3S, a kernel pointer in the linear map accesses the same memory
  both with translations on (accessing as an 'effective address'), and
  with translations off (accessing as a 'real address'). This works in
  both guests and the hypervisor. For more details, see s5.7 of Book III
  of version 3 of the ISA, in particular the Storage Control Overview,
  s5.7.3, and s5.7.5 - noting that this KASAN implementation currently
  only supports Radix.
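
As a rough illustration of the arithmetic that description implies,
here is a minimal user-space C sketch (not kernel code). It assumes
only what the paragraph above states: the linear map starts at
effective address c000... and maps one-to-one onto real addresses from
0000... onward. The helper names are hypothetical, and the sketch says
nothing about the HRMOR or quadrants.

```c
#include <assert.h>
#include <stdint.h>

/* Base of the kernel linear map, as described in the text above. */
#define LINEAR_MAP_BASE 0xc000000000000000ULL

/* Hypothetical helper: real address backing a linear-map effective address. */
static uint64_t linear_ea_to_ra(uint64_t ea)
{
	/* Only meaningful for addresses inside the linear map. */
	assert(ea >= LINEAR_MAP_BASE);
	return ea - LINEAR_MAP_BASE;
}

/* Hypothetical helper: linear-map effective address for a real address. */
static uint64_t ra_to_linear_ea(uint64_t ra)
{
	/* The real address must fit below the linear map base. */
	assert(ra < LINEAR_MAP_BASE);
	return ra + LINEAR_MAP_BASE;
}
```

For example, linear_ea_to_ra(0xc00000006a483b00) gives 0x6a483b00, and
ra_to_linear_ea() maps that value back to the original pointer.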

Thanks for your attention to detail!

Regards,
Daniel





* Re: [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support
  2019-12-18  4:32       ` Daniel Axtens
@ 2019-12-18 13:39         ` Christophe Leroy
  0 siblings, 0 replies; 12+ messages in thread
From: Christophe Leroy @ 2019-12-18 13:39 UTC (permalink / raw)
  To: Daniel Axtens, linux-kernel, linux-mm, linuxppc-dev, kasan-dev,
	aneesh.kumar, bsingharora
  Cc: Michael Ellerman



On 12/18/2019 04:32 AM, Daniel Axtens wrote:
> Daniel Axtens <dja@axtens.net> writes:
> 
>> Hi Christophe,
>>
>> I'm working through your feedback, thank you. Regarding this one:
>>
>>>> --- a/arch/powerpc/kernel/process.c
>>>> +++ b/arch/powerpc/kernel/process.c
>>>> @@ -2081,7 +2081,14 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>>>>    		/*
>>>>    		 * See if this is an exception frame.
>>>>    		 * We look for the "regshere" marker in the current frame.
>>>> +		 *
>>>> +		 * KASAN may complain about this. If it is an exception frame,
>>>> +		 * we won't have unpoisoned the stack in asm when we set the
>>>> +		 * exception marker. If it's not an exception frame, who knows
>>>> +		 * how things are laid out - the shadow could be in any state
>>>> +		 * at all. Just disable KASAN reporting for now.
>>>>    		 */
>>>> +		kasan_disable_current();
>>>>    		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
>>>>    		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>>>>    			struct pt_regs *regs = (struct pt_regs *)
>>>> @@ -2091,6 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
>>>>    			       regs->trap, (void *)regs->nip, (void *)lr);
>>>>    			firstframe = 1;
>>>>    		}
>>>> +		kasan_enable_current();
>>>
>>> If this is really a concern for all targets including PPC32, should it
>>> be a separate patch with a Fixes: tag to be applied back in stable as well ?
>>
>> I've managed to repro this by commenting out the kasan_disable/enable
>> lines, and just booting in qemu without a disk attached:
>>
>> sudo qemu-system-ppc64 -accel kvm -m 2G -M pseries -cpu power9  -kernel ./vmlinux  -nographic -chardev stdio,id=charserial0,mux=on -device spapr-vty,chardev=charserial0,reg=0x30000000  -mon chardev=charserial0,mode=readline -nodefaults -smp 2
>>
>> ...
>>
>> [    0.210740] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>> [    0.210789] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
>> [    0.210844] Call Trace:
>> [    0.210866] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
>> [    0.210915] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
>> [    0.210958] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
>> [    0.211005] ==================================================================
>> [    0.211054] BUG: KASAN: stack-out-of-bounds in show_stack+0x438/0x580
>> [    0.211095] Read of size 8 at addr c00000006a483b00 by task swapper/0/1
>> [    0.211134]
>> [    0.211152] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
>> [    0.211207] Call Trace:
>> [    0.211225] [c00000006a483680] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
>> [    0.211274] [c00000006a4836d0] [c0000000008f877c] print_address_description.isra.10+0x7c/0x470
>> [    0.211330] [c00000006a483760] [c0000000008f8e7c] __kasan_report+0x1bc/0x244
>> [    0.211380] [c00000006a483830] [c0000000008f6eb8] kasan_report+0x18/0x30
>> [    0.211422] [c00000006a483850] [c0000000008fa5d4] __asan_report_load8_noabort+0x24/0x40
>> [    0.211471] [c00000006a483870] [c00000000003d448] show_stack+0x438/0x580
>> [    0.211512] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154
>> [    0.211553] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
>> [    0.211595] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
>> [    0.211644] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
>> [    0.211694] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
>> [    0.211745] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
>> [    0.211787] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
>> [    0.211834]
>> [    0.211851] Allocated by task 0:
>> [    0.211878]  save_stack+0x2c/0xe0
>> [    0.211904]  __kasan_kmalloc.isra.16+0x11c/0x150
>> [    0.211937]  kmem_cache_alloc_node+0x114/0x3b0
>> [    0.211971]  copy_process+0x5b8/0x6410
>> [    0.211996]  _do_fork+0x130/0xbf0
>> [    0.212022]  kernel_thread+0xdc/0x130
>> [    0.212047]  rest_init+0x44/0x184
>> [    0.212072]  start_kernel+0x77c/0x7dc
>> [    0.212098]  start_here_common+0x1c/0x20
>> [    0.212122]
>> [    0.212139] Freed by task 0:
>> [    0.212163] (stack is not available)
>> [    0.212187]
>> [    0.212205] The buggy address belongs to the object at c00000006a480000
>> [    0.212205]  which belongs to the cache thread_stack of size 16384
>> [    0.212285] The buggy address is located 15104 bytes inside of
>> [    0.212285]  16384-byte region [c00000006a480000, c00000006a484000)
>> [    0.212356] The buggy address belongs to the page:
>> [    0.212391] page:c00c0000001a9200 refcount:1 mapcount:0 mapping:c00000006a019e00 index:0x0 compound_mapcount: 0
>> [    0.212455] raw: 007ffff000010200 5deadbeef0000100 5deadbeef0000122 c00000006a019e00
>> [    0.212504] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>> [    0.212551] page dumped because: kasan: bad access detected
>> [    0.212583]
>> [    0.212600] addr c00000006a483b00 is located in stack of task swapper/0/1 at offset 0 in frame:
>> [    0.212656]  mount_block_root+0x0/0x7ac
>> [    0.212681]
>> [    0.212698] this frame has 1 object:
>> [    0.212722]  [32, 64) 'b'
>> [    0.212723]
>> [    0.212755] Memory state around the buggy address:
>> [    0.212788]  c00000006a483a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [    0.212836]  c00000006a483a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [    0.212884] >c00000006a483b00: f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3 00 00 00 00
>> [    0.212931]                    ^
>> [    0.212957]  c00000006a483b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [    0.213005]  c00000006a483c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [    0.213052] ==================================================================
>> [    0.213100] Disabling lock debugging due to kernel taint
>> [    0.213134] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
>> [    0.213182] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
>> [    0.213231] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
>> [    0.213272] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
>>
>> Is that something that reproduces on ppc32?
>>
>> I don't see it running the test_kasan tests, so I guess that matches up
>> with your experience.
> 
> I've debugged this a bit further. If I put a dump_stack() in
> kernel_init() right before I call kernel_init_freeable(), I don't see
> the splat. But if I put a dump_stack() immediately inside
> kernel_init_freeable() I do see the splat. I wonder if some early init
> code isn't setting up the stack quite right?
> 
> I don't see this in walking stacks that contain an interrupt frame, so I
> think the correct thing is to tear out this code and debug the weird
> stack frame stuff around kernel_init_freeable in parallel.
> 
> Thanks for your attention to detail.
> 

I added a dump_stack() at the start of kernel_init_freeable() and I get 
nothing more than what follows:

[    0.000000] Activating Kernel Userspace Execution Prevention
[    0.000000] Activating Kernel Userspace Access Protection
[    0.000000] Linux version 5.5.0-rc2-s3k-dev-00932-gf5a548a2b0bc-dirty 
(root@po16098vm.idsi0.si.c-s.fr) (gcc version 5.5.0 (GCC)) #2596 PREEMPT 
Wed Dec 18 09:05:02 UTC 2019
[    0.000000] KASAN init done
[    0.000000] Using CMPC885 machine description
[    0.000000] -----------------------------------------------------
[    0.000000] phys_mem_size     = 0x8000000
[    0.000000] dcache_bsize      = 0x10
[    0.000000] icache_bsize      = 0x10
[    0.000000] cpu_features      = 0x0000000000000100
[    0.000000]   possible        = 0x0000000000000120
[    0.000000]   always          = 0x0000000000000000
[    0.000000] cpu_user_features = 0x84000000 0x00000000
[    0.000000] mmu_features      = 0x00000002
[    0.000000] -----------------------------------------------------
[    0.000000] SMC microcode patch installed
[    0.000000] Top of RAM: 0x8000000, Total RAM: 0x8000000
[    0.000000] Memory hole size: 0MB
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000007ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000007ffffff]
[    0.000000] Initmem setup node 0 [mem 
0x0000000000000000-0x0000000007ffffff]
[    0.000000] On node 0 totalpages: 8192
[    0.000000]   Normal zone: 16 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 8192 pages, LIFO batch:0
[    0.000000] MMU: Allocated 76 bytes of context maps for 16 contexts
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 8176
[    0.000000] Kernel command line: console=ttyCPM0,115200N8 
ip=192.168.0.3:192.168.0.1::255.0.0.0:vgoip:eth0:off
[    0.000000] Dentry cache hash table entries: 16384 (order: 2, 65536 
bytes, linear)
[    0.000000] Inode-cache hash table entries: 8192 (order: 1, 32768 
bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 95520K/131072K available (8640K kernel code, 
1840K rwdata, 2720K rodata, 656K init, 4679K bss, 35552K reserved, 0K 
cma-reserved)
[    0.000000] Kernel virtual memory layout:
[    0.000000]   * 0xf8000000..0x00000000  : kasan shadow mem
[    0.000000]   * 0xf7afc000..0xf7ffc000  : fixmap
[    0.000000]   * 0xc9000000..0xf7afc000  : vmalloc & ioremap
[    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] 	Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay 
is 10 jiffies.
[    0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[    0.000000] Decrementer Frequency = 0x7de290
[    0.000000] time_init: decrementer frequency = 8.250000 MHz
[    0.000000] time_init: processor frequency   = 132.000000 MHz
[    0.000153] clocksource: timebase: mask: 0xffffffffffffffff 
max_cycles: 0x1e717863c, max_idle_ns: 440795202213 ns
[    0.000336] clocksource: timebase mult[79364d93] shift[24] registered
[    0.000569] clockevent: decrementer mult[21cac08] shift[32] cpu[0]
[    0.246805] printk: console [ttyCPM0] enabled
[    0.251462] pid_max: default: 32768 minimum: 301
[    0.259445] Mount-cache hash table entries: 4096 (order: 0, 16384 
bytes, linear)
[    0.267030] Mountpoint-cache hash table entries: 4096 (order: 0, 
16384 bytes, linear)
[    0.295520] CPU: 0 PID: 1 Comm: swapper Not tainted 
5.5.0-rc2-s3k-dev-00932-gf5a548a2b0bc-dirty #2596
[    0.304710] Call Trace:
[    0.307115] [c5121ed8] [c0b19290] kernel_init_freeable+0x20/0x240 
(unreliable)
[    0.314283] [c5121f18] [c0003ddc] kernel_init+0x18/0x10c
[    0.319728] [c5121f38] [c00121cc] ret_from_kernel_thread+0x14/0x1c
[    0.337686] rcu: Hierarchical SRCU implementation.
[    0.350228] devtmpfs: initialized
[    0.533520] device: 'platform': device_add
[    0.535188] bus: 'platform': registered
...


Christophe


end of thread, other threads:[~2019-12-18 13:39 UTC | newest]

Thread overview: 12+ messages
2019-12-12 15:16 [PATCH v3 0/3] KASAN for powerpc64 radix Daniel Axtens
2019-12-12 15:16 ` [PATCH v3 1/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables Daniel Axtens
2019-12-12 15:55   ` Christophe Leroy
2019-12-13 21:37   ` Balbir Singh
2019-12-12 15:16 ` [PATCH v3 2/3] kasan: Document support on 32-bit powerpc Daniel Axtens
2019-12-12 15:16 ` [PATCH v3 3/3] powerpc: Book3S 64-bit "heavyweight" KASAN support Daniel Axtens
2019-12-12 23:55   ` Jordan Niethe
2019-12-18  7:01     ` Daniel Axtens
2019-12-13 12:27   ` Christophe Leroy
2019-12-17 13:30     ` Daniel Axtens
2019-12-18  4:32       ` Daniel Axtens
2019-12-18 13:39         ` Christophe Leroy
