LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Steven Price <steven.price@arm.com>
To: Anshuman Khandual <anshuman.khandual@arm.com>, linux-mm@kvack.org
Cc: "Mark Rutland" <Mark.Rutland@arm.com>,
	x86@kernel.org, "Arnd Bergmann" <arnd@arndb.de>,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Andy Lutomirski" <luto@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"James Morse" <james.morse@arm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Will Deacon" <will@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linux-arm-kernel@lists.infradead.org, "Liang,
	Kan" <kan.liang@linux.intel.com>
Subject: Re: [PATCH v9 19/21] mm: Add generic ptdump
Date: Mon, 29 Jul 2019 14:56:41 +0100
Message-ID: <75e314f2-b4e6-0a5f-20f0-ad5f56ce77f6@arm.com> (raw)
In-Reply-To: <f8444b1f-c886-9bfd-4873-3ed9068d3c44@arm.com>

On 29/07/2019 03:59, Anshuman Khandual wrote:
> 
> On 07/22/2019 09:12 PM, Steven Price wrote:
>> Add a generic version of page table dumping that architectures can
>> opt-in to
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  include/linux/ptdump.h |  19 +++++
>>  mm/Kconfig.debug       |  21 ++++++
>>  mm/Makefile            |   1 +
>>  mm/ptdump.c            | 161 +++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 202 insertions(+)
>>  create mode 100644 include/linux/ptdump.h
>>  create mode 100644 mm/ptdump.c
>>
>> diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
>> new file mode 100644
>> index 000000000000..eb8e78154be3
>> --- /dev/null
>> +++ b/include/linux/ptdump.h
>> @@ -0,0 +1,19 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef _LINUX_PTDUMP_H
>> +#define _LINUX_PTDUMP_H
>> +
>> +struct ptdump_range {
>> +	unsigned long start;
>> +	unsigned long end;
>> +};
>> +
>> +struct ptdump_state {
>> +	void (*note_page)(struct ptdump_state *st, unsigned long addr,
>> +			  int level, unsigned long val);
>> +	const struct ptdump_range *range;
>> +};
>> +
>> +void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm);
>> +
>> +#endif /* _LINUX_PTDUMP_H */
>> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
>> index 82b6a20898bd..7ad939b7140f 100644
>> --- a/mm/Kconfig.debug
>> +++ b/mm/Kconfig.debug
>> @@ -115,3 +115,24 @@ config DEBUG_RODATA_TEST
>>      depends on STRICT_KERNEL_RWX
>>      ---help---
>>        This option enables a testcase for the setting rodata read-only.
>> +
>> +config GENERIC_PTDUMP
>> +	bool
>> +
>> +config PTDUMP_CORE
>> +	bool
>> +
>> +config PTDUMP_DEBUGFS
>> +	bool "Export kernel pagetable layout to userspace via debugfs"
>> +	depends on DEBUG_KERNEL
>> +	depends on DEBUG_FS
>> +	depends on GENERIC_PTDUMP
>> +	select PTDUMP_CORE
> 
> So PTDUMP_DEBUGFS depends on GENERIC_PTDUMP but selects PTDUMP_CORE. So any arch
> subscribing this new generic PTDUMP by selecting GENERIC_PTDUMP needs to provide
> some functions for PTDUMP_DEBUGFS which does not really have any code in generic
> MM. Also ptdump_walk_pgd() is wrapped in PTDUMP_CORE not GENERIC_PTDUMP. Then what
> does PTDUMP_GENERIC really indicate ? Bit confusing here.

The intention is:

* PTDUMP_DEBUGFS: Controls if the debugfs file is available. This
enables arch specific code which creates the debugfs file (as the files
available vary between architectures).

* GENERIC_PTDUMP: Architecture is opting in to the generic ptdump
infrastructure. The arch code is expected to provide the debugfs code
for PTDUMP_DBEUGFS.

* PTDUMP_CORE: The core page table walker is enabled. This code is used
by both PTDUMP_DEBUGFS as well as the DEBUG_WX ("Warn on W+X mappings at
boot"). x86 also has EFI_PGT_DUMP which uses the core.

> The new ptdump_walk_pgd() symbol needs to be wrapped in a config symbol for sure
> which should be selected in all platforms wishing to use it. GENERIC_PTDUMP can
> be that config.

The intention is that GENERIC_PTDUMP is signalling that the architecture
supports PTDUMP_DEBUGFS. PTDUMP_CORE is the configuration which chooses
whether ptdump_walk_pgd() is built - selected by the options that
require it.

> PTDUMP_DEBUGFS will require a full implementation (i.e PTDUMP_CORE) irrespective
> of whether the platform subscribes GENERIC_PTDUMP or not. It should be something
> like this.
> 
> config PTDUMP_DEBUGFS
> 	bool "Export kernel pagetable layout to userspace via debugfs"
> 	depends on DEBUG_KERNEL
> 	depends on DEBUG_FS
> 	select PTDUMP_CORE
> 
> PTDUMP_DEBUGFS need not depend on GENERIC_PTDUMP. All it requires is a PTDUMP_CORE
> implementation which can optionally use ptdump_walk_pgd() through GENERIC_PTDUMP.
> s/GENERIC_PTDUMP/PTDUMP_GENERIC to match and group with other configs.

The intention here is to hide PTDUMP_DEBUGFS on architectures that
haven't migrated to it. Because the generic code isn't responsible for
creating the debugfs entries if we don't hide it then it will compile
but nothing will appear in debugfs.

> DEBUG_WX can also be moved to generic MM like PTDUMP_DEBUGFS ?

Well the DEBUG_WX requires some arch specific code (separate from
PTDUMP_DEBUGFS), so while the config option could be moved you would
then require a "ARCH_HAS_DEBUG_WX" to hide it on the architectures that
don't support it. I'm not sure that's worth doing when only 3
architectures support it, and I would argue it's separate to this patch
series so can be done at a different time.

>> +	help
>> +	  Say Y here if you want to show the kernel pagetable layout in a
>> +	  debugfs file. This information is only useful for kernel developers
>> +	  who are working in architecture specific areas of the kernel.
>> +	  It is probably not a good idea to enable this feature in a production
>> +	  kernel.
>> +
>> +	  If in doubt, say N.
>> diff --git a/mm/Makefile b/mm/Makefile
>> index 338e528ad436..750a4c12d5da 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -104,3 +104,4 @@ obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
>>  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
>>  obj-$(CONFIG_HMM_MIRROR) += hmm.o
>>  obj-$(CONFIG_MEMFD_CREATE) += memfd.o
>> +obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
> 
> Should be GENERIC_PTDUMP instead ?

No - GENERIC_PTDUMP is just signalling the architecture support it, we
don't want to compile in the code unless it is used.

>> diff --git a/mm/ptdump.c b/mm/ptdump.c
>> new file mode 100644
>> index 000000000000..39befc9088b8
>> --- /dev/null
>> +++ b/mm/ptdump.c
>> @@ -0,0 +1,161 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include <linux/mm.h>
>> +#include <linux/ptdump.h>
>> +#include <linux/kasan.h>
>> +
>> +static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
>> +			    unsigned long next, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +	pgd_t val = READ_ONCE(*pgd);
>> +
>> +	if (pgd_leaf(val))
>> +		st->note_page(st, addr, 1, pgd_val(val));
>> +
>> +	return 0;
>> +}
>> +
>> +static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
>> +			    unsigned long next, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +	p4d_t val = READ_ONCE(*p4d);
>> +
>> +	if (p4d_leaf(val))
>> +		st->note_page(st, addr, 2, p4d_val(val));
>> +
>> +	return 0;
>> +}
>> +
>> +static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
>> +			    unsigned long next, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +	pud_t val = READ_ONCE(*pud);
>> +
>> +	if (pud_leaf(val))
>> +		st->note_page(st, addr, 3, pud_val(val));
>> +
>> +	return 0;
>> +}
>> +
>> +static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
>> +			    unsigned long next, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +	pmd_t val = READ_ONCE(*pmd);
>> +
>> +	if (pmd_leaf(val))
>> +		st->note_page(st, addr, 4, pmd_val(val));
>> +
>> +	return 0;
>> +}
>> +
>> +static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
>> +			    unsigned long next, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +
>> +	st->note_page(st, addr, 5, pte_val(READ_ONCE(*pte)));
>> +
>> +	return 0;
>> +}
>> +
>> +#ifdef CONFIG_KASAN
>> +/*
>> + * This is an optimization for KASAN=y case. Since all kasan page tables
>> + * eventually point to the kasan_early_shadow_page we could call note_page()
>> + * right away without walking through lower level page tables. This saves
>> + * us dozens of seconds (minutes for 5-level config) while checking for
>> + * W+X mapping or reading kernel_page_tables debugfs file.
>> + */
>> +static inline bool kasan_page_table(struct ptdump_state *st, void *pt,
>> +				    unsigned long addr)
>> +{
>> +	if (__pa(pt) == __pa(kasan_early_shadow_pmd) ||
>> +#ifdef CONFIG_X86
>> +	    (pgtable_l5_enabled() &&
>> +			__pa(pt) == __pa(kasan_early_shadow_p4d)) ||
>> +#endif
>> +	    __pa(pt) == __pa(kasan_early_shadow_pud)) {
>> +		st->note_page(st, addr, 5, pte_val(kasan_early_shadow_pte[0]));
>> +		return true;
>> +	}
>> +	return false;
>> +}
>> +#else
>> +static inline bool kasan_page_table(struct ptdump_state *st, void *pt,
>> +				    unsigned long addr)
>> +{
>> +	return false;
>> +}
>> +#endif
>> +
>> +static int ptdump_test_p4d(unsigned long addr, unsigned long next,
>> +			   p4d_t *p4d, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +
>> +	if (kasan_page_table(st, p4d, addr))
>> +		return 1;
>> +	return 0;
>> +}
>> +
>> +static int ptdump_test_pud(unsigned long addr, unsigned long next,
>> +			   pud_t *pud, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +
>> +	if (kasan_page_table(st, pud, addr))
>> +		return 1;
>> +	return 0;
>> +}
>> +
>> +static int ptdump_test_pmd(unsigned long addr, unsigned long next,
>> +			   pmd_t *pmd, struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +
>> +	if (kasan_page_table(st, pmd, addr))
>> +		return 1;
>> +	return 0;
>> +}
>> +
>> +static int ptdump_hole(unsigned long addr, unsigned long next,
>> +		       struct mm_walk *walk)
>> +{
>> +	struct ptdump_state *st = walk->private;
>> +
>> +	st->note_page(st, addr, -1, 0);
>> +
>> +	return 0;
>> +}
>> +
>> +void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
>> +{
>> +	struct mm_walk walk = {
>> +		.mm		= mm,
>> +		.pgd_entry	= ptdump_pgd_entry,
>> +		.p4d_entry	= ptdump_p4d_entry,
>> +		.pud_entry	= ptdump_pud_entry,
>> +		.pmd_entry	= ptdump_pmd_entry,
>> +		.pte_entry	= ptdump_pte_entry,
>> +		.test_p4d	= ptdump_test_p4d,
>> +		.test_pud	= ptdump_test_pud,
>> +		.test_pmd	= ptdump_test_pmd,
>> +		.pte_hole	= ptdump_hole,
>> +		.private	= st
>> +	};
>> +	const struct ptdump_range *range = st->range;
>> +
>> +	down_read(&mm->mmap_sem);
>> +	while (range->start != range->end) {
>> +		walk_page_range(range->start, range->end, &walk);
>> +		range++;
>> +	}
>> +	up_read(&mm->mmap_sem);
> 
> Does walk_page_range() really needed here when it is definitely walking a
> kernel page table. Why not directly use walk_pgd_range() instead which can
> save some cycles avoiding going over VMAs, checking for HugeTLB, taking the
> mmap_sem lock etc. AFAICS only thing it will miss is the opportunity to call
> walk->test_walk() via walk_page_test(). IIUC test_walk() callback is primarily
> for testing a VMA for it's eligibility and for kernel page table now there are
> test callbacks like p?d_test() for individual levels anyway.

Well it's a debug interface so saving a few cycles is largely
irrelevant. I'm reluctant to export walk_pgd_range() in case it gets
used to walk real VMAs. Having just one interface is cleanest. But I
agree for kernel mappings all the extra work in walk_page_range() isn't
needed.

Steve

  reply index

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-22 15:41 [PATCH v9 00/21] Generic page walk and ptdump Steven Price
2019-07-22 15:41 ` [PATCH v9 01/21] arc: mm: Add p?d_leaf() definitions Steven Price
2019-07-22 15:41 ` [PATCH v9 02/21] arm: " Steven Price
2019-07-22 15:41 ` [PATCH v9 03/21] arm64: " Steven Price
2019-07-22 15:41 ` [PATCH v9 04/21] mips: " Steven Price
2019-07-22 21:47   ` Paul Burton
2019-07-24 13:03     ` Steven Price
2019-07-22 15:41 ` [PATCH v9 05/21] powerpc: " Steven Price
2019-07-22 15:41 ` [PATCH v9 06/21] riscv: " Steven Price
2019-07-22 15:41 ` [PATCH v9 07/21] s390: " Steven Price
2019-07-22 15:41 ` [PATCH v9 08/21] sparc: " Steven Price
2019-07-22 15:41 ` [PATCH v9 09/21] x86: " Steven Price
2019-07-22 15:41 ` [PATCH v9 10/21] mm: Add generic p?d_leaf() macros Steven Price
2019-07-23  9:41   ` Mark Rutland
2019-07-24 13:48     ` Steven Price
2019-07-28 11:44     ` Anshuman Khandual
2019-07-29 11:38       ` Steven Price
2019-08-01  6:09         ` Anshuman Khandual
2019-08-01 12:22           ` Steven Price
2019-07-29 12:50       ` Mark Rutland
2019-08-01  6:13         ` Anshuman Khandual
2019-07-22 15:42 ` [PATCH v9 11/21] mm: pagewalk: Add p4d_entry() and pgd_entry() Steven Price
2019-07-23 10:14   ` Mark Rutland
2019-07-24 13:53     ` Steven Price
2019-07-24 14:09       ` Mark Rutland
2019-07-28 12:33   ` Anshuman Khandual
2019-07-29 12:17     ` Steven Price
2019-07-22 15:42 ` [PATCH v9 12/21] mm: pagewalk: Allow walking without vma Steven Price
2019-07-28 14:20   ` Anshuman Khandual
2019-07-29 12:29     ` Steven Price
2019-08-01  6:41       ` Anshuman Khandual
2019-07-22 15:42 ` [PATCH v9 13/21] mm: pagewalk: Add test_p?d callbacks Steven Price
2019-07-28 13:41   ` Anshuman Khandual
2019-07-29 12:34     ` Steven Price
2019-07-22 15:42 ` [PATCH v9 14/21] x86: mm: Don't display pages which aren't present in debugfs Steven Price
2019-07-22 15:42 ` [PATCH v9 15/21] x86: mm: Point to struct seq_file from struct pg_state Steven Price
2019-07-22 15:42 ` [PATCH v9 16/21] x86: mm+efi: Convert ptdump_walk_pgd_level() to take a mm_struct Steven Price
2019-07-22 15:42 ` [PATCH v9 17/21] x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct Steven Price
2019-07-22 15:42 ` [PATCH v9 18/21] x86: mm: Convert ptdump_walk_pgd_level_core() " Steven Price
2019-07-22 15:42 ` [PATCH v9 19/21] mm: Add generic ptdump Steven Price
2019-07-23  9:57   ` Mark Rutland
2019-07-24 16:36     ` Steven Price
2019-07-29  2:59   ` Anshuman Khandual
2019-07-29 13:56     ` Steven Price [this message]
2019-07-22 15:42 ` [PATCH v9 20/21] x86: mm: Convert dump_pagetables to use walk_page_range Steven Price
2019-07-22 15:42 ` [PATCH v9 21/21] arm64: mm: Convert mm/dump.c to use walk_page_range() Steven Price
2019-07-23  6:39 ` [PATCH v9 00/21] Generic page walk and ptdump Anshuman Khandual
2019-07-24 13:35   ` Steven Price
2019-07-25  9:09     ` Anshuman Khandual
2019-07-25  9:30       ` Will Deacon
2019-07-26  6:03         ` Anshuman Khandual
2019-07-25 10:15       ` Steven Price
2019-07-23 10:16 ` Mark Rutland
2019-07-24 13:35   ` Steven Price
2019-07-24 13:57     ` Thomas Gleixner
2019-07-24 14:07       ` Mark Rutland
2019-07-24 14:18       ` Steven Price
2019-07-24 14:37         ` Thomas Gleixner
2019-07-28 11:20 ` Anshuman Khandual
2019-07-29 11:32   ` Steven Price
2019-07-31  9:27     ` Sven Schnelle
2019-07-31 11:18       ` Steven Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=75e314f2-b4e6-0a5f-20f0-ad5f56ce77f6@arm.com \
    --to=steven.price@arm.com \
    --cc=Mark.Rutland@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=jglisse@redhat.com \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git