Linux-mm Archive on lore.kernel.org
* [patch 064/158] mm: add generic ptdump
@ 2019-12-01  1:53 akpm
  2019-12-01  9:07 ` Borislav Petkov
  2019-12-03 10:47 ` David Hildenbrand
  0 siblings, 2 replies; 10+ messages in thread
From: akpm @ 2019-12-01  1:53 UTC (permalink / raw)
  To: akpm, alex, aou, ard.biesheuvel, arnd, aryabinin, benh,
	borntraeger, bp, cai, catalin.marinas, dave.hansen, dave.jiang,
	davem, dvyukov, glider, gor, heiko.carstens, hpa, james.morse,
	jhogan, kan.liang, linux-mm, linux, luto, mark.rutland, mawilcox,
	mingo, mm-commits, mpe, n-horiguchi, palmer, paul.burton,
	paul.walmsley, paulus, peterz, ralf, shashim, steven.price, tglx,
	torvalds, vgupta, will, zong.li

From: Steven Price <steven.price@arm.com>
Subject: mm: add generic ptdump

Add a generic version of page table dumping that architectures can opt-in
to

[steven.price@arm.com: v15]
  Link: http://lkml.kernel.org/r/20191101140942.51554-20-steven.price@arm.com
[cai@lca.pw: fix a -Wold-style-declaration warning]
  Link: http://lkml.kernel.org/r/1572895385-29194-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20191028135910.33253-20-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shiraz Hashim <shashim@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/ptdump.h |   21 +++++
 mm/Kconfig.debug       |   21 +++++
 mm/Makefile            |    1 
 mm/ptdump.c            |  151 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 194 insertions(+)

--- /dev/null
+++ a/include/linux/ptdump.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_PTDUMP_H
+#define _LINUX_PTDUMP_H
+
+#include <linux/mm_types.h>
+
+struct ptdump_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+struct ptdump_state {
+	void (*note_page)(struct ptdump_state *st, unsigned long addr,
+			  int level, unsigned long val);
+	const struct ptdump_range *range;
+};
+
+void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm);
+
+#endif /* _LINUX_PTDUMP_H */
--- a/mm/Kconfig.debug~mm-add-generic-ptdump
+++ a/mm/Kconfig.debug
@@ -117,3 +117,24 @@ config DEBUG_RODATA_TEST
     depends on STRICT_KERNEL_RWX
     ---help---
       This option enables a testcase for the setting rodata read-only.
+
+config GENERIC_PTDUMP
+	bool
+
+config PTDUMP_CORE
+	bool
+
+config PTDUMP_DEBUGFS
+	bool "Export kernel pagetable layout to userspace via debugfs"
+	depends on DEBUG_KERNEL
+	depends on DEBUG_FS
+	depends on GENERIC_PTDUMP
+	select PTDUMP_CORE
+	help
+	  Say Y here if you want to show the kernel pagetable layout in a
+	  debugfs file. This information is only useful for kernel developers
+	  who are working in architecture specific areas of the kernel.
+	  It is probably not a good idea to enable this feature in a production
+	  kernel.
+
+	  If in doubt, say N.
--- a/mm/Makefile~mm-add-generic-ptdump
+++ a/mm/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
+obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
--- /dev/null
+++ a/mm/ptdump.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/pagewalk.h>
+#include <linux/ptdump.h>
+#include <linux/kasan.h>
+
+static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	pgd_t val = READ_ONCE(*pgd);
+
+	if (pgd_leaf(val))
+		st->note_page(st, addr, 1, pgd_val(val));
+
+	return 0;
+}
+
+static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	p4d_t val = READ_ONCE(*p4d);
+
+	if (p4d_leaf(val))
+		st->note_page(st, addr, 2, p4d_val(val));
+
+	return 0;
+}
+
+static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	pud_t val = READ_ONCE(*pud);
+
+	if (pud_leaf(val))
+		st->note_page(st, addr, 3, pud_val(val));
+
+	return 0;
+}
+
+static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	pmd_t val = READ_ONCE(*pmd);
+
+	if (pmd_leaf(val))
+		st->note_page(st, addr, 4, pmd_val(val));
+
+	return 0;
+}
+
+static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+
+	st->note_page(st, addr, 5, pte_val(READ_ONCE(*pte)));
+
+	return 0;
+}
+
+#ifdef CONFIG_KASAN
+/*
+ * This is an optimization for KASAN=y case. Since all kasan page tables
+ * eventually point to the kasan_early_shadow_page we could call note_page()
+ * right away without walking through lower level page tables. This saves
+ * us dozens of seconds (minutes for 5-level config) while checking for
+ * W+X mapping or reading kernel_page_tables debugfs file.
+ */
+static inline int note_kasan_page_table(struct mm_walk *walk,
+					unsigned long addr)
+{
+	struct ptdump_state *st = walk->private;
+
+	st->note_page(st, addr, 5, pte_val(kasan_early_shadow_pte[0]));
+	return 1;
+}
+
+static int ptdump_test_p4d(unsigned long addr, unsigned long next,
+			   p4d_t *p4d, struct mm_walk *walk)
+{
+#if CONFIG_PGTABLE_LEVELS > 4
+	if (p4d == lm_alias(kasan_early_shadow_p4d))
+		return note_kasan_page_table(walk, addr);
+#endif
+	return 0;
+}
+
+static int ptdump_test_pud(unsigned long addr, unsigned long next,
+			   pud_t *pud, struct mm_walk *walk)
+{
+#if CONFIG_PGTABLE_LEVELS > 3
+	if (pud == lm_alias(kasan_early_shadow_pud))
+		return note_kasan_page_table(walk, addr);
+#endif
+	return 0;
+}
+
+static int ptdump_test_pmd(unsigned long addr, unsigned long next,
+			   pmd_t *pmd, struct mm_walk *walk)
+{
+#if CONFIG_PGTABLE_LEVELS > 2
+	if (pmd == lm_alias(kasan_early_shadow_pmd))
+		return note_kasan_page_table(walk, addr);
+#endif
+	return 0;
+}
+#endif /* CONFIG_KASAN */
+
+static int ptdump_hole(unsigned long addr, unsigned long next,
+		       int depth, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+
+	st->note_page(st, addr, depth + 1, 0);
+
+	return 0;
+}
+
+static const struct mm_walk_ops ptdump_ops = {
+	.pgd_entry	= ptdump_pgd_entry,
+	.p4d_entry	= ptdump_p4d_entry,
+	.pud_entry	= ptdump_pud_entry,
+	.pmd_entry	= ptdump_pmd_entry,
+	.pte_entry	= ptdump_pte_entry,
+#ifdef CONFIG_KASAN
+	.test_p4d	= ptdump_test_p4d,
+	.test_pud	= ptdump_test_pud,
+	.test_pmd	= ptdump_test_pmd,
+#endif
+	.pte_hole	= ptdump_hole,
+};
+
+void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
+{
+	const struct ptdump_range *range = st->range;
+
+	down_read(&mm->mmap_sem);
+	while (range->start != range->end) {
+		walk_page_range_novma(mm, range->start, range->end,
+				      &ptdump_ops, st);
+		range++;
+	}
+	up_read(&mm->mmap_sem);
+
+	/* Flush out the last page */
+	st->note_page(st, 0, 0, 0);
+}
_
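
As a rough illustration of how a caller might drive this interface, the
following userspace sketch mirrors the two structures from
include/linux/ptdump.h and the range loop in ptdump_walk_pgd(). Everything
prefixed demo_ or mock_ is hypothetical and not part of the patch:
mock_walk() merely stands in for walk_page_range_novma() by reporting one
level-5 entry per 4 KiB page, and the sketch assumes that level 0 only
ever marks the final flush call.

```c
/* Same layout as the structures added in include/linux/ptdump.h. */
struct ptdump_range {
	unsigned long start;
	unsigned long end;
};

struct ptdump_state {
	void (*note_page)(struct ptdump_state *st, unsigned long addr,
			  int level, unsigned long val);
	const struct ptdump_range *range;
};

/* Hypothetical caller state; ptdump_state must stay the first member. */
struct demo_state {
	struct ptdump_state ptdump;
	unsigned long pages;	/* entries reported by the walker */
	int flushed;		/* set by the final note_page(st, 0, 0, 0) */
};

static void demo_note_page(struct ptdump_state *st, unsigned long addr,
			   int level, unsigned long val)
{
	/* Valid only because ptdump is the first member of demo_state. */
	struct demo_state *ds = (struct demo_state *)st;

	(void)addr;
	(void)val;

	if (level == 0) {	/* assumption: level 0 is the flush call */
		ds->flushed = 1;
		return;
	}
	ds->pages++;
}

/* Stand-in for walk_page_range_novma(): one PTE-level entry per page. */
static void mock_walk(struct ptdump_state *st, unsigned long start,
		      unsigned long end)
{
	for (unsigned long addr = start; addr < end; addr += 0x1000UL)
		st->note_page(st, addr, 5, 0);
}

/* Same loop shape as ptdump_walk_pgd(), minus the mmap_sem locking. */
static void mock_ptdump_walk(struct ptdump_state *st)
{
	const struct ptdump_range *range = st->range;

	while (range->start != range->end) {
		mock_walk(st, range->start, range->end);
		range++;
	}
	st->note_page(st, 0, 0, 0);	/* flush out the last page */
}

static unsigned long demo_count_pages(void)
{
	static const struct ptdump_range ranges[] = {
		{ 0x10000UL, 0x14000UL },	/* 4 pages */
		{ 0x40000UL, 0x42000UL },	/* 2 pages */
		{ 0, 0 },			/* terminator: start == end */
	};
	struct demo_state ds = {
		.ptdump = { .note_page = demo_note_page, .range = ranges },
	};

	mock_ptdump_walk(&ds.ptdump);
	return ds.flushed ? ds.pages : 0;
}
```

The range array has to end with an entry whose start equals its end,
since that is the only termination condition in the loop. In kernel code
container_of() would be the usual way to get from the ptdump_state back
to the enclosing structure; the plain cast works here only because ptdump
is the first member.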



* Re: [patch 064/158] mm: add generic ptdump
  2019-12-01  1:53 [patch 064/158] mm: add generic ptdump akpm
@ 2019-12-01  9:07 ` Borislav Petkov
  2019-12-01 14:45   ` Linus Torvalds
  2019-12-03 10:47 ` David Hildenbrand
  1 sibling, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2019-12-01  9:07 UTC (permalink / raw)
  To: akpm
  Cc: alex, aou, ard.biesheuvel, arnd, aryabinin, benh, borntraeger,
	cai, catalin.marinas, dave.hansen, dave.jiang, davem, dvyukov,
	glider, gor, heiko.carstens, hpa, james.morse, jhogan, kan.liang,
	linux-mm, linux, luto, mark.rutland, mawilcox, mingo, mm-commits,
	mpe, n-horiguchi, palmer, paul.burton, paul.walmsley, paulus,
	peterz, ralf, shashim, steven.price, tglx, torvalds, vgupta,
	will, zong.li

On Sat, Nov 30, 2019 at 05:53:04PM -0800, akpm@linux-foundation.org wrote:
> From: Steven Price <steven.price@arm.com>
> Subject: mm: add generic ptdump
> 
> Add a generic version of page table dumping that architectures can opt-in
> to

That generic ptdump stuff is probably causing a splat on 32-bit:

https://lkml.kernel.org/r/20191125144946.GA6628@duo.ucw.cz

.config is attached in that thread too and triggers pretty reliably in a
vm but I haven't poked at it further.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



* Re: [patch 064/158] mm: add generic ptdump
  2019-12-01  9:07 ` Borislav Petkov
@ 2019-12-01 14:45   ` Linus Torvalds
  2019-12-01 15:10     ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2019-12-01 14:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andrew Morton, alex, aou, Ard Biesheuvel, Arnd Bergmann,
	Andrey Ryabinin, Benjamin Herrenschmidt, Christian Borntraeger,
	Qian Cai, Catalin Marinas, Dave Hansen, dave.jiang, David Miller,
	Dmitry Vyukov, Alexander Potapenko, Vasily Gorbik,
	Heiko Carstens, Peter Anvin, James Morse, James Hogan, Kan Liang,
	Linux-MM, Russell King - ARM Linux, Andrew Lutomirski,
	Mark Rutland, mawilcox, Ingo Molnar, mm-commits,
	Michael Ellerman, n-horiguchi, Palmer Dabbelt, Paul Burton,
	Paul Walmsley, Paul Mackerras, Peter Zijlstra, ralf, shashim,
	Steven Price, Thomas Gleixner, vgupta, Will Deacon, zong.li

On Sun, Dec 1, 2019 at 1:09 AM Borislav Petkov <bp@alien8.de> wrote:
>
> That generic ptdump stuff is probably causing a splat on 32-bit:
>
> https://lkml.kernel.org/r/20191125144946.GA6628@duo.ucw.cz

Hmm. I'm not sure about code generation, but for me that config gives me

  60:   55                      push   %ebp
  61:   89 e5                   mov    %esp,%ebp
  63:   57                      push   %edi
  64:   8b 4d 08                mov    0x8(%ebp),%ecx
  67:   56                      push   %esi
  68:   53                      push   %ebx
  69:   8b 30                   mov    (%eax),%esi
  6b:   8b 59 10                mov    0x10(%ecx),%ebx

so that "ptdump_pte_entry+9" is the "mov    (%eax),%esi"

And that is "READ_ONCE(*pte)"

So the pte pointer itself is broken. Which sounds really odd.

Hmm. I've applied the whole series to a local branch, but I'm not
merging it into my master branch yet. Can somebody figure out how the
page walking could get that broken?

             Linus



* Re: [patch 064/158] mm: add generic ptdump
  2019-12-01 14:45   ` Linus Torvalds
@ 2019-12-01 15:10     ` Borislav Petkov
  2019-12-01 15:21       ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2019-12-01 15:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, alex, aou, Ard Biesheuvel, Arnd Bergmann,
	Andrey Ryabinin, Benjamin Herrenschmidt, Christian Borntraeger,
	Qian Cai, Catalin Marinas, Dave Hansen, dave.jiang, David Miller,
	Dmitry Vyukov, Alexander Potapenko, Vasily Gorbik,
	Heiko Carstens, Peter Anvin, James Morse, James Hogan, Kan Liang,
	Linux-MM, Russell King - ARM Linux, Andrew Lutomirski,
	Mark Rutland, mawilcox, Ingo Molnar, mm-commits,
	Michael Ellerman, n-horiguchi, Palmer Dabbelt, Paul Burton,
	Paul Walmsley, Paul Mackerras, Peter Zijlstra, ralf, shashim,
	Steven Price, Thomas Gleixner, vgupta, Will Deacon, zong.li

On Sun, Dec 01, 2019 at 06:45:23AM -0800, Linus Torvalds wrote:
> On Sun, Dec 1, 2019 at 1:09 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > That generic ptdump stuff is probably causing a splat on 32-bit:
> >
> > https://lkml.kernel.org/r/20191125144946.GA6628@duo.ucw.cz
> 
> Hmm. I'm not sure about code generation, but for me that config gives me

Note that I typed "probably" above because I'm not 100% sure it is
those patches that would cause it. I mean, I saw EIP pointing to
ptdump_pte_entry and was able to repro on linux-next with the .config in
a vm.

But then your master or tip/master wouldn't trigger so I shelved that as
it is merge window and other 32-bit shit was broken, which needed more
attention.

So lemme first confirm it really is caused by those patches.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



* Re: [patch 064/158] mm: add generic ptdump
  2019-12-01 15:10     ` Borislav Petkov
@ 2019-12-01 15:21       ` Borislav Petkov
  2019-12-01 15:45         ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2019-12-01 15:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, alex, aou, Ard Biesheuvel, Arnd Bergmann,
	Andrey Ryabinin, Benjamin Herrenschmidt, Christian Borntraeger,
	Qian Cai, Catalin Marinas, Dave Hansen, dave.jiang, David Miller,
	Dmitry Vyukov, Alexander Potapenko, Vasily Gorbik,
	Heiko Carstens, Peter Anvin, James Morse, James Hogan, Kan Liang,
	Linux-MM, Russell King - ARM Linux, Andrew Lutomirski,
	Mark Rutland, mawilcox, Ingo Molnar, mm-commits,
	Michael Ellerman, n-horiguchi, Palmer Dabbelt, Paul Burton,
	Paul Walmsley, Paul Mackerras, Peter Zijlstra, ralf, shashim,
	Steven Price, Thomas Gleixner, vgupta, Will Deacon, zong.li

On Sun, Dec 01, 2019 at 04:10:11PM +0100, Borislav Petkov wrote:
> So lemme first confirm it really is caused by those patches.

Yeah, those patches are causing it. Tried your current master - it is OK
- and then applied Andrew's patches I was CCed on, on top, and I got this
in a VM:

VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
devtmpfs: mounted
Freeing unused kernel image (initmem) memory: 664K
Write protecting kernel text and read-only data: 18164k
NX-protecting the kernel data: 7416k
BUG: kernel NULL pointer dereference, address: 00000014
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
*pdpt = 0000000000000000 *pde = f000ff53f000ff53 
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
EIP: __lock_acquire.isra.0+0x2e8/0x4e0
Code: e8 bd a1 2f 00 85 c0 74 11 8b 1d 08 8f 26 c5 85 db 0f 84 05 1a 00 00 8d 76 00 31 db 8d 65 f4 89 d8 5b 5e 5f 5d c3 8d 74 26 00 <8b> 44 90 04 85 c0 0f 85 4c fd ff ff e9 33 fd ff ff 8d b4 26 00 00
EAX: 00000010 EBX: 00000010 ECX: 00000001 EDX: 00000000
ESI: f1070040 EDI: f1070040 EBP: f1073e04 ESP: f1073de0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010097
CR0: 80050033 CR2: 00000014 CR3: 05348000 CR4: 001406b0
Call Trace:
 lock_acquire+0x42/0x60
 ? __walk_page_range+0x4d9/0x590
 _raw_spin_lock+0x22/0x40
 ? __walk_page_range+0x4d9/0x590
 __walk_page_range+0x4d9/0x590
 walk_page_range_novma+0x57/0xa0
 ptdump_walk_pgd+0x38/0x70
 ptdump_walk_pgd_level_core+0x66/0x90
 ? ptdump_walk_pgd_level_core+0x90/0x90
 ptdump_walk_pgd_level_checkwx+0x16/0x19
 mark_rodata_ro+0x95/0x9a
 ? rest_init+0xfb/0xfb
 kernel_init+0x25/0xe5
 ret_from_fork+0x2e/0x38
Modules linked in:
CR2: 0000000000000014
---[ end trace 8b67ede738f0029a ]---
EIP: __lock_acquire.isra.0+0x2e8/0x4e0
Code: e8 bd a1 2f 00 85 c0 74 11 8b 1d 08 8f 26 c5 85 db 0f 84 05 1a 00 00 8d 76 00 31 db 8d 65 f4 89 d8 5b 5e 5f 5d c3 8d 74 26 00 <8b> 44 90 04 85 c0 0f 85 4c fd ff ff e9 33 fd ff ff 8d b4 26 00 00
EAX: 00000010 EBX: 00000010 ECX: 00000001 EDX: 00000000
ESI: f1070040 EDI: f1070040 EBP: f1073e04 ESP: f1073de0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010097
CR0: 80050033 CR2: 00000014 CR3: 05348000 CR4: 001406b0
note: swapper/0[1] exited with preempt_count 1
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



* Re: [patch 064/158] mm: add generic ptdump
  2019-12-01 15:21       ` Borislav Petkov
@ 2019-12-01 15:45         ` Borislav Petkov
  2019-12-02  9:09           ` Steven Price
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2019-12-01 15:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, alex, aou, Ard Biesheuvel, Arnd Bergmann,
	Andrey Ryabinin, Benjamin Herrenschmidt, Christian Borntraeger,
	Qian Cai, Catalin Marinas, Dave Hansen, dave.jiang, David Miller,
	Dmitry Vyukov, Alexander Potapenko, Vasily Gorbik,
	Heiko Carstens, Peter Anvin, James Morse, James Hogan, Kan Liang,
	Linux-MM, Russell King - ARM Linux, Andrew Lutomirski,
	Mark Rutland, mawilcox, Ingo Molnar, mm-commits,
	Michael Ellerman, n-horiguchi, Palmer Dabbelt, Paul Burton,
	Paul Walmsley, Paul Mackerras, Peter Zijlstra, ralf, shashim,
	Steven Price, Thomas Gleixner, vgupta, Will Deacon, zong.li

On Sun, Dec 01, 2019 at 04:21:19PM +0100, Borislav Petkov wrote:
> On Sun, Dec 01, 2019 at 04:10:11PM +0100, Borislav Petkov wrote:
> > So lemme first confirm it really is caused by those patches.
> 
> Yeah, those patches are causing it. Tried your current master - it is OK
> - and then applied Andrew's patches I was CCed on, ontop, and I got in a
> VM:
> 
> VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
> devtmpfs: mounted
> Freeing unused kernel image (initmem) memory: 664K
> Write protecting kernel text and read-only data: 18164k
> NX-protecting the kernel data: 7416k
> BUG: kernel NULL pointer dereference, address: 00000014
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
> Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0+ #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> EIP: __lock_acquire.isra.0+0x2e8/0x4e0
> Code: e8 bd a1 2f 00 85 c0 74 11 8b 1d 08 8f 26 c5 85 db 0f 84 05 1a 00 00 8d 76 00 31 db 8d 65 f4 89 d8 5b 5e 5f 5d c3 8d 74 26 00 <8b> 44 90 04 85 c0 0f 85 4c fd ff ff e9 33 fd ff ff 8d b4 26 00 00
> EAX: 00000010 EBX: 00000010 ECX: 00000001 EDX: 00000000
> ESI: f1070040 EDI: f1070040 EBP: f1073e04 ESP: f1073de0
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010097
> CR0: 80050033 CR2: 00000014 CR3: 05348000 CR4: 001406b0
> Call Trace:
>  lock_acquire+0x42/0x60
>  ? __walk_page_range+0x4d9/0x590
>  _raw_spin_lock+0x22/0x40
>  ? __walk_page_range+0x4d9/0x590
>  __walk_page_range+0x4d9/0x590

Ok, some more staring. That offset is:

# mm/pagewalk.c:31:     pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
        sall    $5, %eax        #, tmp235
        addl    -64(%ebp), %eax # %sfp, tmp236
        call    page_address    #
        addl    %eax, %esi      # tmp306, __pte
# ./include/linux/spinlock.h:338:       raw_spin_lock(&lock->rlock);
        movl    -76(%ebp), %eax # %sfp,
        call    _raw_spin_lock  #
        movl    %edi, %edx      # start, start
        movl    %ebx, -64(%ebp) # __boundary, %sfp
        movl    -80(%ebp), %edi # %sfp, ops
        movl    %esi, -40(%ebp) # __pte, %sfp

i.e., pte_offset_map_lock() and I *think* that ptl thing is NULL. The Code
section decodes to:

Code: e8 bd a1 2f 00 85 c0 74 11 8b 1d 08 8f 26 c5 85 db 0f 84 05 1a 00 00 8d 76 00 31 db 8d 65 f4 89 d8 5b 5e 5f 5d c3 8d 74 26 00 <8b> 44 90 04 85 c0 0f 85 4c fd ff ff e9 33 fd ff ff 8d b4 26 00 00
All code
========
   0:   e8 bd a1 2f 00          callq  0x2fa1c2
   5:   85 c0                   test   %eax,%eax
   7:   74 11                   je     0x1a
   9:   8b 1d 08 8f 26 c5       mov    -0x3ad970f8(%rip),%ebx        # 0xffffffffc5268f17
   f:   85 db                   test   %ebx,%ebx
  11:   0f 84 05 1a 00 00       je     0x1a1c
  17:   8d 76 00                lea    0x0(%rsi),%esi
  1a:   31 db                   xor    %ebx,%ebx
  1c:   8d 65 f4                lea    -0xc(%rbp),%esp
  1f:   89 d8                   mov    %ebx,%eax
  21:   5b                      pop    %rbx
  22:   5e                      pop    %rsi
  23:   5f                      pop    %rdi
  24:   5d                      pop    %rbp
  25:   c3                      retq   
  26:   8d 74 26 00             lea    0x0(%rsi,%riz,1),%esi
  2a:*  8b 44 90 04             mov    0x4(%rax,%rdx,4),%eax            <-- trapping instruction
  2e:   85 c0                   test   %eax,%eax
  30:   0f 85 4c fd ff ff       jne    0xfffffffffffffd82
  36:   e9 33 fd ff ff          jmpq   0xfffffffffffffd6e
  3b:   8d                      .byte 0x8d
  3c:   b4 26

which is this corresponding piece in __lock_acquire():

        call    debug_locks_off #
# kernel/locking/lockdep.c:3775:        if (!debug_locks_off())
        testl   %eax, %eax      # tmp325
        je      .L562   #,
# kernel/locking/lockdep.c:3777:        if (debug_locks_silent)
        movl    debug_locks_silent, %ebx        # debug_locks_silent, <retval>
# kernel/locking/lockdep.c:3777:        if (debug_locks_silent)
        testl   %ebx, %ebx      # <retval>
        je      .L642   #,
        .p2align 4,,10
        .p2align 3
.L562:
# kernel/locking/lockdep.c:3826:                return 0;
        xorl    %ebx, %ebx      # <retval>
.L557:
# kernel/locking/lockdep.c:3982: }
        leal    -12(%ebp), %esp #,
        movl    %ebx, %eax      # <retval>,
        popl    %ebx    #
        popl    %esi    #
        popl    %edi    #
        popl    %ebp    #
        ret     
        .p2align 4,,10
        .p2align 3
.L649:
# kernel/locking/lockdep.c:3832:                class = lock->class_cache[subclass];
        movl    4(%eax,%edx,4), %eax    # lock_7(D)->class_cache, class
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^

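(the LEA above is NOP padding). Per the register dump, %edx is 0 and %eax
is 0x10, so the lock pointer is near-NULL rather than exactly NULL:
0x10 + 0*4 + 4 = 0x14, which matches CR2.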

i.e., that thing:

        if (subclass < NR_LOCKDEP_CACHING_CLASSES)
                class = lock->class_cache[subclass];
			^^^^^^^^^^^^^^^

AFAICT, of course.
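
The faulting address can be cross-checked against the register dump with a
small userspace sketch. It only models the x86 base + index*scale +
displacement addressing of the trapping instruction; the function names
are made up for illustration, and reading the displacement of 4 as the
offset of class_cache within the lock structure is an inference from the
annotated assembly above, not something checked against the headers.

```c
#include <stdint.h>

/* x86 base + index*scale + displacement, in 32-bit arithmetic. */
static uint32_t effective_address(uint32_t base, uint32_t index,
				  uint32_t scale, uint32_t disp)
{
	return base + index * scale + disp;
}

/*
 * Address read by "movl 4(%eax,%edx,4), %eax" for the register values
 * shown in the oops.
 */
static uint32_t oops_fault_address(void)
{
	const uint32_t eax = 0x00000010;	/* EAX: the lock pointer */
	const uint32_t edx = 0x00000000;	/* EDX: subclass */

	return effective_address(eax, edx, 4, 4);
}
```

With EAX = 0x10 and EDX = 0 this gives 0x14, the address reported in CR2,
which fits the reading that the lock pointer passed down from
pte_offset_map_lock() is a small bogus value rather than a valid spinlock.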

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



* Re: [patch 064/158] mm: add generic ptdump
  2019-12-01 15:45         ` Borislav Petkov
@ 2019-12-02  9:09           ` Steven Price
  2019-12-02 15:42             ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Price @ 2019-12-02  9:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Andrew Morton, alex, aou, Ard Biesheuvel,
	Arnd Bergmann, Andrey Ryabinin, Benjamin Herrenschmidt,
	Christian Borntraeger, Qian Cai, Catalin Marinas, Dave Hansen,
	dave.jiang, David Miller, Dmitry Vyukov, Alexander Potapenko,
	Vasily Gorbik, Heiko Carstens, Peter Anvin, James Morse,
	James Hogan, Kan Liang, Linux-MM, Russell King - ARM Linux,
	Andrew Lutomirski, Mark Rutland, mawilcox, Ingo Molnar,
	mm-commits, Michael Ellerman, n-horiguchi, Palmer Dabbelt,
	Paul Burton, Paul Walmsley, Paul Mackerras, Peter Zijlstra, ralf,
	shashim, Thomas Gleixner, vgupta, Will Deacon, zong.li

On Sun, Dec 01, 2019 at 03:45:54PM +0000, Borislav Petkov wrote:
> On Sun, Dec 01, 2019 at 04:21:19PM +0100, Borislav Petkov wrote:
> > On Sun, Dec 01, 2019 at 04:10:11PM +0100, Borislav Petkov wrote:
> > > So lemme first confirm it really is caused by those patches.
> > 
> > Yeah, those patches are causing it. Tried your current master - it is OK
> > - and then applied Andrew's patches I was CCed on, ontop, and I got in a
> > VM:
> > 
> > VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
> > devtmpfs: mounted
> > Freeing unused kernel image (initmem) memory: 664K
> > Write protecting kernel text and read-only data: 18164k
> > NX-protecting the kernel data: 7416k
> > BUG: kernel NULL pointer dereference, address: 00000014
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
> > Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0+ #3
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> > EIP: __lock_acquire.isra.0+0x2e8/0x4e0
> > Code: e8 bd a1 2f 00 85 c0 74 11 8b 1d 08 8f 26 c5 85 db 0f 84 05 1a 00 00 8d 76 00 31 db 8d 65 f4 89 d8 5b 5e 5f 5d c3 8d 74 26 00 <8b> 44 90 04 85 c0 0f 85 4c fd ff ff e9 33 fd ff ff 8d b4 26 00 00
> > EAX: 00000010 EBX: 00000010 ECX: 00000001 EDX: 00000000
> > ESI: f1070040 EDI: f1070040 EBP: f1073e04 ESP: f1073de0
> > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010097
> > CR0: 80050033 CR2: 00000014 CR3: 05348000 CR4: 001406b0
> > Call Trace:
> >  lock_acquire+0x42/0x60
> >  ? __walk_page_range+0x4d9/0x590
> >  _raw_spin_lock+0x22/0x40
> >  ? __walk_page_range+0x4d9/0x590
> >  __walk_page_range+0x4d9/0x590
> 

Thanks for looking into this. I've been able to reproduce it locally
with that config and I can see what's going wrong here.

walk_pte_range() is being called with end=0xffffffff, but the comparison
in the function is:

	if (addr == end)
		break;

So addr never actually equals end: it advances one page at a time, so it
wraps from 0xfffff000 to 0x0. This means the function continues walking
straight off the end, dereferencing 'random' ptes. As a quick hack I
modified the condition to:

	if (addr == end || !addr)
		break;

and I can then boot the VM. Clearly that's not the correct solution -
I'll go away and have a think about the cleanest way of handling this
case and also do some more testing before I resubmit for 5.6.
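
The wraparound is easy to reproduce outside the kernel. The sketch below
is a simplified model, not the real walk_pte_range(): uint32_t stands in
for a 32-bit unsigned long, DEMO_PAGE_SIZE and walk_runs_off_end() are
names invented for the illustration, and addr is assumed to be page
aligned.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x1000u

/*
 * Model of a loop that steps addr by one page and stops only when
 * addr == end (plus, optionally, the "|| !addr" quick hack).
 * Returns true if the loop would keep going after addr wrapped to 0,
 * i.e. it walked off the end of the 32-bit address space.
 * Assumes addr is page aligned.
 */
static bool walk_runs_off_end(uint32_t addr, uint32_t end, bool stop_on_zero)
{
	for (;;) {
		/* the per-PTE callback would run here */
		addr += DEMO_PAGE_SIZE;		/* wraps modulo 2^32 */
		if (addr == end || (stop_on_zero && !addr))
			return false;		/* loop terminated */
		if (addr == 0)
			return true;		/* wrapped and still walking */
	}
}
```

Starting at 0xffffe000 with end = 0xffffffff, the plain comparison never
matches and the model reports a runaway walk; with the extra !addr check
it stops at the wrap. A page-aligned end such as 0xfffff000 terminates
either way, which is why only ranges ending at the very top of the address
space are affected.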

Sorry for the trouble and thanks again for investigating.

Steve




* Re: [patch 064/158] mm: add generic ptdump
  2019-12-02  9:09           ` Steven Price
@ 2019-12-02 15:42             ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2019-12-02 15:42 UTC (permalink / raw)
  To: Steven Price
  Cc: Linus Torvalds, Andrew Morton, alex, aou, Ard Biesheuvel,
	Arnd Bergmann, Andrey Ryabinin, Benjamin Herrenschmidt,
	Christian Borntraeger, Qian Cai, Catalin Marinas, Dave Hansen,
	dave.jiang, David Miller, Dmitry Vyukov, Alexander Potapenko,
	Vasily Gorbik, Heiko Carstens, Peter Anvin, James Morse,
	James Hogan, Kan Liang, Linux-MM, Russell King - ARM Linux,
	Andrew Lutomirski, Mark Rutland, mawilcox, Ingo Molnar,
	mm-commits, Michael Ellerman, n-horiguchi, Palmer Dabbelt,
	Paul Burton, Paul Walmsley, Paul Mackerras, Peter Zijlstra, ralf,
	shashim, Thomas Gleixner, vgupta, Will Deacon, zong.li

On Mon, Dec 02, 2019 at 09:09:24AM +0000, Steven Price wrote:
> Sorry for the trouble and thanks again for investigating.

You're very welcome! 8-)

Holler if you need the new version tested a bit.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



* Re: [patch 064/158] mm: add generic ptdump
  2019-12-01  1:53 [patch 064/158] mm: add generic ptdump akpm
  2019-12-01  9:07 ` Borislav Petkov
@ 2019-12-03 10:47 ` David Hildenbrand
  2019-12-03 11:00   ` David Hildenbrand
  1 sibling, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2019-12-03 10:47 UTC (permalink / raw)
  To: akpm, alex, aou, ard.biesheuvel, arnd, aryabinin, benh,
	borntraeger, bp, cai, catalin.marinas, dave.hansen, dave.jiang,
	davem, dvyukov, glider, gor, heiko.carstens, hpa, james.morse,
	jhogan, kan.liang, linux-mm, linux, luto, mark.rutland, mawilcox,
	mingo, mm-commits, mpe, n-horiguchi, palmer, paul.burton,
	paul.walmsley, paulus, peterz, ralf, shashim, steven.price, tglx,
	torvalds, vgupta, will, zong.li

On 01.12.19 02:53, akpm@linux-foundation.org wrote:
> From: Steven Price <steven.price@arm.com>
> Subject: mm: add generic ptdump
> 
> Add a generic version of page table dumping that architectures can opt-in
> to
> 
> [steven.price@arm.com: v15]
>   Link: http://lkml.kernel.org/r/20191101140942.51554-20-steven.price@arm.com
> [cai@lca.pw: fix a -Wold-style-declaration warning]
>   Link: http://lkml.kernel.org/r/1572895385-29194-1-git-send-email-cai@lca.pw
> Link: http://lkml.kernel.org/r/20191028135910.33253-20-steven.price@arm.com
> Signed-off-by: Steven Price <steven.price@arm.com>
> Signed-off-by: Qian Cai <cai@lca.pw>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Alexandre Ghiti <alex@ghiti.fr>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: James Hogan <jhogan@kernel.org>
> Cc: James Morse <james.morse@arm.com>
> Cc: "Liang, Kan" <kan.liang@linux.intel.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Paul Burton <paul.burton@mips.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Shiraz Hashim <shashim@codeaurora.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Zong Li <zong.li@sifive.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  include/linux/ptdump.h |   21 +++++
>  mm/Kconfig.debug       |   21 +++++
>  mm/Makefile            |    1 
>  mm/ptdump.c            |  151 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 194 insertions(+)
> 
> --- /dev/null
> +++ a/include/linux/ptdump.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_PTDUMP_H
> +#define _LINUX_PTDUMP_H
> +
> +#include <linux/mm_types.h>
> +
> +struct ptdump_range {
> +	unsigned long start;
> +	unsigned long end;
> +};
> +
> +struct ptdump_state {
> +	void (*note_page)(struct ptdump_state *st, unsigned long addr,
> +			  int level, unsigned long val);
> +	const struct ptdump_range *range;
> +};
> +
> +void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm);
> +
> +#endif /* _LINUX_PTDUMP_H */
> --- a/mm/Kconfig.debug~mm-add-generic-ptdump
> +++ a/mm/Kconfig.debug
> @@ -117,3 +117,24 @@ config DEBUG_RODATA_TEST
>      depends on STRICT_KERNEL_RWX
>      ---help---
>        This option enables a testcase for the setting rodata read-only.
> +
> +config GENERIC_PTDUMP
> +	bool
> +
> +config PTDUMP_CORE
> +	bool
> +
> +config PTDUMP_DEBUGFS
> +	bool "Export kernel pagetable layout to userspace via debugfs"
> +	depends on DEBUG_KERNEL
> +	depends on DEBUG_FS
> +	depends on GENERIC_PTDUMP
> +	select PTDUMP_CORE
> +	help
> +	  Say Y here if you want to show the kernel pagetable layout in a
> +	  debugfs file. This information is only useful for kernel developers
> +	  who are working in architecture specific areas of the kernel.
> +	  It is probably not a good idea to enable this feature in a production
> +	  kernel.
> +
> +	  If in doubt, say N.
> --- a/mm/Makefile~mm-add-generic-ptdump
> +++ a/mm/Makefile
> @@ -98,6 +98,7 @@ obj-$(CONFIG_CMA)	+= cma.o
>  obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
>  obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
>  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
> +obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
>  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
>  obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
> --- /dev/null
> +++ a/mm/ptdump.c
> @@ -0,0 +1,151 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/pagewalk.h>
> +#include <linux/ptdump.h>
> +#include <linux/kasan.h>
> +
> +static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
> +			    unsigned long next, struct mm_walk *walk)
> +{
> +	struct ptdump_state *st = walk->private;
> +	pgd_t val = READ_ONCE(*pgd);
> +
> +	if (pgd_leaf(val))
> +		st->note_page(st, addr, 1, pgd_val(val));
> +
> +	return 0;
> +}
> +
> +static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
> +			    unsigned long next, struct mm_walk *walk)
> +{
> +	struct ptdump_state *st = walk->private;
> +	p4d_t val = READ_ONCE(*p4d);
> +
> +	if (p4d_leaf(val))
> +		st->note_page(st, addr, 2, p4d_val(val));
> +
> +	return 0;
> +}
> +
> +static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
> +			    unsigned long next, struct mm_walk *walk)
> +{
> +	struct ptdump_state *st = walk->private;
> +	pud_t val = READ_ONCE(*pud);
> +
> +	if (pud_leaf(val))
> +		st->note_page(st, addr, 3, pud_val(val));
> +
> +	return 0;
> +}
> +
> +static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
> +			    unsigned long next, struct mm_walk *walk)
> +{
> +	struct ptdump_state *st = walk->private;
> +	pmd_t val = READ_ONCE(*pmd);
> +
> +	if (pmd_leaf(val))
> +		st->note_page(st, addr, 4, pmd_val(val));
> +
> +	return 0;
> +}
> +
> +static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
> +			    unsigned long next, struct mm_walk *walk)
> +{
> +	struct ptdump_state *st = walk->private;
> +
> +	st->note_page(st, addr, 5, pte_val(READ_ONCE(*pte)));
> +
> +	return 0;
> +}
> +
> +#ifdef CONFIG_KASAN
> +/*
> + * This is an optimization for KASAN=y case. Since all kasan page tables
> + * eventually point to the kasan_early_shadow_page we could call note_page()
> + * right away without walking through lower level page tables. This saves
> + * us dozens of seconds (minutes for 5-level config) while checking for
> + * W+X mapping or reading kernel_page_tables debugfs file.
> + */
> +static inline int note_kasan_page_table(struct mm_walk *walk,
> +					unsigned long addr)
> +{
> +	struct ptdump_state *st = walk->private;
> +
> +	st->note_page(st, addr, 5, pte_val(kasan_early_shadow_pte[0]));
> +	return 1;
> +}
> +
> +static int ptdump_test_p4d(unsigned long addr, unsigned long next,
> +			   p4d_t *p4d, struct mm_walk *walk)
> +{
> +#if CONFIG_PGTABLE_LEVELS > 4
> +	if (p4d == lm_alias(kasan_early_shadow_p4d))
> +		return note_kasan_page_table(walk, addr);
> +#endif
> +	return 0;
> +}
> +
> +static int ptdump_test_pud(unsigned long addr, unsigned long next,
> +			   pud_t *pud, struct mm_walk *walk)
> +{
> +#if CONFIG_PGTABLE_LEVELS > 3
> +	if (pud == lm_alias(kasan_early_shadow_pud))
> +		return note_kasan_page_table(walk, addr);
> +#endif
> +	return 0;
> +}
> +
> +static int ptdump_test_pmd(unsigned long addr, unsigned long next,
> +			   pmd_t *pmd, struct mm_walk *walk)
> +{
> +#if CONFIG_PGTABLE_LEVELS > 2
> +	if (pmd == lm_alias(kasan_early_shadow_pmd))
> +		return note_kasan_page_table(walk, addr);
> +#endif
> +	return 0;
> +}
> +#endif /* CONFIG_KASAN */
> +
> +static int ptdump_hole(unsigned long addr, unsigned long next,
> +		       int depth, struct mm_walk *walk)
> +{
> +	struct ptdump_state *st = walk->private;
> +
> +	st->note_page(st, addr, depth + 1, 0);
> +
> +	return 0;
> +}
> +
> +static const struct mm_walk_ops ptdump_ops = {
> +	.pgd_entry	= ptdump_pgd_entry,
> +	.p4d_entry	= ptdump_p4d_entry,
> +	.pud_entry	= ptdump_pud_entry,
> +	.pmd_entry	= ptdump_pmd_entry,
> +	.pte_entry	= ptdump_pte_entry,
> +#ifdef CONFIG_KASAN
> +	.test_p4d	= ptdump_test_p4d,
> +	.test_pud	= ptdump_test_pud,
> +	.test_pmd	= ptdump_test_pmd,
> +#endif
> +	.pte_hole	= ptdump_hole,
> +};
> +
> +void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
> +{
> +	const struct ptdump_range *range = st->range;
> +
> +	down_read(&mm->mmap_sem);
> +	while (range->start != range->end) {
> +		walk_page_range_novma(mm, range->start, range->end,
> +				      &ptdump_ops, st);
> +		range++;
> +	}
> +	up_read(&mm->mmap_sem);
> +
> +	/* Flush out the last page */
> +	st->note_page(st, 0, 0, 0);
> +}
> _
> 
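
(Illustrative aside on the interface quoted above, not part of the patch.)
The calling contract of ptdump_walk_pgd() in the quoted code seems to be:
st->range points to an array of ranges terminated by an entry whose start
equals its end, note_page() is invoked for each reported entry, and a final
note_page(st, 0, 0, 0) flushes the last one. Below is a minimal userspace
sketch of that contract. The names demo_state, demo_note_page, fake_walk,
demo_walk and demo_run are hypothetical; fake_walk only stands in for
walk_page_range_novma() and reports one dummy entry per range, and treating
level 0 as the flush call is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace copies of the two structures from the quoted ptdump.h. */
struct ptdump_range {
	unsigned long start;
	unsigned long end;
};

struct ptdump_state {
	void (*note_page)(struct ptdump_state *st, unsigned long addr,
			  int level, unsigned long val);
	const struct ptdump_range *range;
};

/* Hypothetical caller state; ptdump must stay the first member. */
struct demo_state {
	struct ptdump_state ptdump;
	int entries;
	int flushes;
};

static void demo_note_page(struct ptdump_state *st, unsigned long addr,
			   int level, unsigned long val)
{
	/* Valid because ptdump is the first member of demo_state. */
	struct demo_state *ds = (struct demo_state *)st;

	if (level == 0) {	/* assumption: level 0 marks the final flush */
		ds->flushes++;
		printf("flush\n");
		return;
	}
	ds->entries++;
	printf("addr=%#lx level=%d val=%#lx\n", addr, level, val);
}

/* Stand-in for walk_page_range_novma(): one dummy entry per range. */
static void fake_walk(struct ptdump_state *st, unsigned long start,
		      unsigned long end)
{
	(void)end;
	st->note_page(st, start, 5, 0x1UL);
}

/* Mirrors the loop shape of ptdump_walk_pgd(), without any locking. */
static void demo_walk(struct ptdump_state *st)
{
	const struct ptdump_range *range = st->range;

	assert(st->note_page != NULL);
	while (range->start != range->end) {
		fake_walk(st, range->start, range->end);
		range++;
	}
	st->note_page(st, 0, 0, 0);	/* flush out the last entry */
}

struct demo_state demo_run(void)
{
	static const struct ptdump_range ranges[] = {
		{ 0x1000UL, 0x2000UL },
		{ 0x8000UL, 0x9000UL },
		{ 0, 0 },	/* terminator: start == end */
	};
	struct demo_state ds = {
		.ptdump = { .note_page = demo_note_page, .range = ranges },
	};

	demo_walk(&ds.ptdump);
	return ds;
}
```

Calling demo_run() from a main() walks both ranges and then issues the
flush. The sketch models only the range iteration and the callback, not the
page table walk or its locking.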

Since updating from a pre-v5.4 base to current linux-next, booting a simple
QEMU x86-64 guest gives me:

[    1.231285] BUG: kernel NULL pointer dereference, address: 0000000000000018
[    1.231897] #PF: supervisor read access in kernel mode
[    1.232354] #PF: error_code(0x0000) - not-present page
[    1.232803] PGD 0 P4D 0 
[    1.233033] Oops: 0000 [#1] SMP NOPTI
[    1.233359] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.4.0-next-20191203+ #29
[    1.233998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[    1.235015] RIP: 0010:__lock_acquire+0x778/0x1940
[    1.235428] Code: 00 45 31 ff 48 8b 44 24 48 65 48 33 04 25 28 00 00 00 0f 85 fd 0d 00 00 48 83 c4 50 44 89 f8 5b7
[    1.237051] RSP: 0018:ffffbc6100637c48 EFLAGS: 00010002
[    1.237512] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    1.238147] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
[    1.238765] RBP: ffff92dd7db54d80 R08: 0000000000000001 R09: 0000000000000000
[    1.239395] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    1.240012] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[    1.240626] FS:  0000000000000000(0000) GS:ffff92dd7dd00000(0000) knlGS:0000000000000000
[    1.241316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.241808] CR2: 0000000000000018 CR3: 00000000a8610000 CR4: 00000000000006e0
[    1.242407] Call Trace:
[    1.242626]  ? check_usage_backwards+0x99/0x140
[    1.243023]  ? stack_trace_save+0x4b/0x70
[    1.243385]  lock_acquire+0xa2/0x1b0
[    1.243707]  ? __walk_page_range+0x6e5/0xa00
[    1.244104]  _raw_spin_lock+0x2c/0x40
[    1.244431]  ? __walk_page_range+0x6e5/0xa00
[    1.244817]  __walk_page_range+0x6e5/0xa00
[    1.245184]  walk_page_range_novma+0x69/0xb0
[    1.245562]  ptdump_walk_pgd+0x46/0x80
[    1.245904]  ptdump_walk_pgd_level_core+0xb7/0xe0
[    1.246318]  ? ptdump_walk_pgd_level_core+0xe0/0xe0
[    1.246748]  ? rest_init+0x23a/0x23a
[    1.247076]  ? rest_init+0x23a/0x23a
[    1.247392]  kernel_init+0x2c/0x106
[    1.247700]  ret_from_fork+0x27/0x50
[    1.248025] Modules linked in:
[    1.248298] CR2: 0000000000000018
[    1.248594] ---[ end trace d9ad45dca0b4f3a3 ]---
[    1.249020] RIP: 0010:__lock_acquire+0x778/0x1940
[    1.249432] Code: 00 45 31 ff 48 8b 44 24 48 65 48 33 04 25 28 00 00 00 0f 85 fd 0d 00 00 48 83 c4 50 44 89 f8 5b7
[    1.251059] RSP: 0018:ffffbc6100637c48 EFLAGS: 00010002
[    1.251514] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    1.252153] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
[    1.252773] RBP: ffff92dd7db54d80 R08: 0000000000000001 R09: 0000000000000000
[    1.253396] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    1.254026] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[    1.254648] FS:  0000000000000000(0000) GS:ffff92dd7dd00000(0000) knlGS:0000000000000000
[    1.255360] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.255867] CR2: 0000000000000018 CR3: 00000000a8610000 CR4: 00000000000006e0
[    1.256491] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
[    1.257268] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
[    1.257952] INFO: lockdep is turned off.
[    1.258299] irq event stamp: 1570043
[    1.258617] hardirqs last  enabled at (1570043): [<ffffffff9716dd2c>] console_unlock+0x45c/0x5c0
[    1.259386] hardirqs last disabled at (1570042): [<ffffffff9716d964>] console_unlock+0x94/0x5c0
[    1.260153] softirqs last  enabled at (1570040): [<ffffffff97e0035d>] __do_softirq+0x35d/0x45d
[    1.260898] softirqs last disabled at (1570033): [<ffffffff970efe54>] irq_exit+0xf4/0x100
[    1.261615] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G      D           5.4.0-next-20191203+ #29
[    1.262370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[    1.263372] Call Trace:
[    1.263595]  dump_stack+0x8f/0xd0
[    1.263895]  ___might_sleep.cold+0xb3/0xc3
[    1.264246]  exit_signals+0x30/0x2d0
[    1.264552]  do_exit+0xb4/0xc40
[    1.264832]  rewind_stack_do_exit+0x17/0x20
[    1.265198] note: swapper/0[1] exited with preempt_count 1
[    1.265700] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    1.266443] Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbff)
[    1.267394] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---

Is this related to this patch?

-- 
Thanks,

David / dhildenb




* Re: [patch 064/158] mm: add generic ptdump
  2019-12-03 10:47 ` David Hildenbrand
@ 2019-12-03 11:00   ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2019-12-03 11:00 UTC (permalink / raw)
  To: akpm, alex, aou, ard.biesheuvel, arnd, aryabinin, benh,
	borntraeger, bp, cai, catalin.marinas, dave.hansen, dave.jiang,
	davem, dvyukov, glider, gor, heiko.carstens, hpa, james.morse,
	jhogan, kan.liang, linux-mm, linux, luto, mark.rutland, mawilcox,
	mingo, mm-commits, mpe, n-horiguchi, palmer, paul.burton,
	paul.walmsley, paulus, peterz, ralf, shashim, steven.price, tglx,
	torvalds, vgupta, will, zong.li

On 03.12.19 11:47, David Hildenbrand wrote:
> On 01.12.19 02:53, akpm@linux-foundation.org wrote:
>> From: Steven Price <steven.price@arm.com>
>> Subject: mm: add generic ptdump
>> [...]
> 
> Since updating from a pre-v5.4 base to current linux-next, booting a simple
> QEMU x86-64 guest gives me:
> 
> [    1.231285] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [    1.231897] #PF: supervisor read access in kernel mode
> [    1.232354] #PF: error_code(0x0000) - not-present page
> [    1.232803] PGD 0 P4D 0 
> [    1.233033] Oops: 0000 [#1] SMP NOPTI
> [    1.233359] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.4.0-next-20191203+ #29
> [    1.233998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
> [    1.235015] RIP: 0010:__lock_acquire+0x778/0x1940
> [    1.235428] Code: 00 45 31 ff 48 8b 44 24 48 65 48 33 04 25 28 00 00 00 0f 85 fd 0d 00 00 48 83 c4 50 44 89 f8 5b7
> [    1.237051] RSP: 0018:ffffbc6100637c48 EFLAGS: 00010002
> [    1.237512] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [    1.238147] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
> [    1.238765] RBP: ffff92dd7db54d80 R08: 0000000000000001 R09: 0000000000000000
> [    1.239395] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [    1.240012] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
> [    1.240626] FS:  0000000000000000(0000) GS:ffff92dd7dd00000(0000) knlGS:0000000000000000
> [    1.241316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.241808] CR2: 0000000000000018 CR3: 00000000a8610000 CR4: 00000000000006e0
> [    1.242407] Call Trace:
> [    1.242626]  ? check_usage_backwards+0x99/0x140
> [    1.243023]  ? stack_trace_save+0x4b/0x70
> [    1.243385]  lock_acquire+0xa2/0x1b0
> [    1.243707]  ? __walk_page_range+0x6e5/0xa00
> [    1.244104]  _raw_spin_lock+0x2c/0x40
> [    1.244431]  ? __walk_page_range+0x6e5/0xa00
> [    1.244817]  __walk_page_range+0x6e5/0xa00
> [    1.245184]  walk_page_range_novma+0x69/0xb0
> [    1.245562]  ptdump_walk_pgd+0x46/0x80
> [    1.245904]  ptdump_walk_pgd_level_core+0xb7/0xe0
> [    1.246318]  ? ptdump_walk_pgd_level_core+0xe0/0xe0
> [    1.246748]  ? rest_init+0x23a/0x23a
> [    1.247076]  ? rest_init+0x23a/0x23a
> [    1.247392]  kernel_init+0x2c/0x106
> [    1.247700]  ret_from_fork+0x27/0x50
> [    1.248025] Modules linked in:
> [    1.248298] CR2: 0000000000000018
> [    1.248594] ---[ end trace d9ad45dca0b4f3a3 ]---
> [    1.249020] RIP: 0010:__lock_acquire+0x778/0x1940
> [    1.249432] Code: 00 45 31 ff 48 8b 44 24 48 65 48 33 04 25 28 00 00 00 0f 85 fd 0d 00 00 48 83 c4 50 44 89 f8 5b7
> [    1.251059] RSP: 0018:ffffbc6100637c48 EFLAGS: 00010002
> [    1.251514] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [    1.252153] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
> [    1.252773] RBP: ffff92dd7db54d80 R08: 0000000000000001 R09: 0000000000000000
> [    1.253396] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [    1.254026] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
> [    1.254648] FS:  0000000000000000(0000) GS:ffff92dd7dd00000(0000) knlGS:0000000000000000
> [    1.255360] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.255867] CR2: 0000000000000018 CR3: 00000000a8610000 CR4: 00000000000006e0
> [    1.256491] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
> [    1.257268] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> [    1.257952] INFO: lockdep is turned off.
> [    1.258299] irq event stamp: 1570043
> [    1.258617] hardirqs last  enabled at (1570043): [<ffffffff9716dd2c>] console_unlock+0x45c/0x5c0
> [    1.259386] hardirqs last disabled at (1570042): [<ffffffff9716d964>] console_unlock+0x94/0x5c0
> [    1.260153] softirqs last  enabled at (1570040): [<ffffffff97e0035d>] __do_softirq+0x35d/0x45d
> [    1.260898] softirqs last disabled at (1570033): [<ffffffff970efe54>] irq_exit+0xf4/0x100
> [    1.261615] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G      D           5.4.0-next-20191203+ #29
> [    1.262370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
> [    1.263372] Call Trace:
> [    1.263595]  dump_stack+0x8f/0xd0
> [    1.263895]  ___might_sleep.cold+0xb3/0xc3
> [    1.264246]  exit_signals+0x30/0x2d0
> [    1.264552]  do_exit+0xb4/0xc40
> [    1.264832]  rewind_stack_do_exit+0x17/0x20
> [    1.265198] note: swapper/0[1] exited with preempt_count 1
> [    1.265700] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> [    1.266443] Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbff)
> [    1.267394] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
> 
> Is this related to this patch?
> 

I just made sure that I am actually on the latest linux-next. I do have

commit d3634da666853cdff2258a49dd3ce3607c0fd6c5
Author: Steven Price <steven.price@arm.com>
Date:   Tue Nov 19 11:47:24 2019 +1100

    mm-pagewalk-allow-walking-without-vma-fix

    fix boot crash

    Reported-by: Qian Cai <cai@lca.pw>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>


The problem persists. I have a bunch of debug options enabled in my
config and can share it if required.


-- 
Thanks,

David / dhildenb




Thread overview: 10+ messages
2019-12-01  1:53 [patch 064/158] mm: add generic ptdump akpm
2019-12-01  9:07 ` Borislav Petkov
2019-12-01 14:45   ` Linus Torvalds
2019-12-01 15:10     ` Borislav Petkov
2019-12-01 15:21       ` Borislav Petkov
2019-12-01 15:45         ` Borislav Petkov
2019-12-02  9:09           ` Steven Price
2019-12-02 15:42             ` Borislav Petkov
2019-12-03 10:47 ` David Hildenbrand
2019-12-03 11:00   ` David Hildenbrand
