LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Steven Price <steven.price@arm.com>, linux-mm@kvack.org
Cc: "Mark Rutland" <Mark.Rutland@arm.com>,
	x86@kernel.org, "Arnd Bergmann" <arnd@arndb.de>,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Andy Lutomirski" <luto@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"James Morse" <james.morse@arm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Will Deacon" <will@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linux-arm-kernel@lists.infradead.org, "Liang,
	Kan" <kan.liang@linux.intel.com>
Subject: Re: [PATCH v9 00/21] Generic page walk and ptdump
Date: Thu, 25 Jul 2019 14:39:22 +0530
Message-ID: <6f59521e-1f3e-6765-9a6f-c8eca4c0c154@arm.com> (raw)
In-Reply-To: <c9d2042f-c731-4705-4148-b38deccf7963@arm.com>



On 07/24/2019 07:05 PM, Steven Price wrote:
> On 23/07/2019 07:39, Anshuman Khandual wrote:
>> Hello Steven,
>>
>> On 07/22/2019 09:11 PM, Steven Price wrote:
>>> This is a slight reworking and extension of my previous patch set
>>> (Convert x86 & arm64 to use generic page walk), but I've continued the
>>> version numbering as most of the changes are the same. In particular
>>> this series ends with a generic PTDUMP implemention for arm64 and x86.
>>>
>>> Many architectures current have a debugfs file for dumping the kernel
>>> page tables. Currently each architecture has to implement custom
>>> functions for this because the details of walking the page tables used
>>> by the kernel are different between architectures.
>>>
>>> This series extends the capabilities of walk_page_range() so that it can
>>> deal with the page tables of the kernel (which have no VMAs and can
>>> contain larger huge pages than exist for user space). A generic PTDUMP
>>> implementation is the implemented making use of the new functionality of
>>> walk_page_range() and finally arm64 and x86 are switch to using it,
>>> removing the custom table walkers.
>>
>> Could other architectures just enable this new generic PTDUMP feature if
>> required without much problem ?
> 
> The generic PTDUMP is implemented as a library - so the architectures
> would have to provide the call into ptdump_walk_pgd() and provide the
> necessary callback note_page() which formats the lines in the output.

Though I understand that the leaf flag (any given level) details are very much
arch specific would there be any possibility for note_page() call back to be
unified as well. This is extracted from current PTDUMP output on arm64.

0xffffffc000000000-0xffffffc000080000  512K PTE  RW NX SHD AF  UXN MEM/NORMAL

The first three columns are generic

1. Kernel virtual range span
2. Kernel virtual range size
3. Kernel virtual range mapping level

Where as rest of the output are architecture specific page table entry flags.
Just wondering if we could print the first three columns in ptdump_walk_pgd()
itself before calling arch specific callback to fetch a print buffer for rest
of the line bounded with some character limit so that line does not overflow.
Its not something which must be done but I guess it's worth giving it a try.
This will help consolidate ptdump_walk_pgd() further.

> 
> Hopefully the implementation is generic enough that it should be
> flexible enough to work for most architectures.
> 
> arm, powerpc and s390 are the obvious architectures to convert next as
> they already have note_page() functions which shouldn't be too difficult
> to convert to match the callback prototype.

Which can be done independently later on, fair enough.

> 
>>>
>>> To enable a generic page table walker to walk the unusual mappings of
>>> the kernel we need to implement a set of functions which let us know
>>> when the walker has reached the leaf entry. After a suggestion from Will
>>> Deacon I've chosen the name p?d_leaf() as this (hopefully) describes
>>> the purpose (and is a new name so has no historic baggage). Some
>>> architectures have p?d_large macros but this is easily confused with
>>> "large pages".
>>
>> I have not been following the previous version of the series closely, hence
>> might be missing something here. But p?d_large() which identifies large
>> mappings on a given level can only signify a leaf entry. Large pages on the
>> table exist only as leaf entries. So what is the problem for it being used
>> directly instead. Is there any possibility in the kernel mapping when these
>> large pages are not leaf entries ?
> 
> There isn't any problem as such with using p?d_large macros. However the
> name "large" has caused confusion in the past. In particular there are
> two types of "large" page:
> 
> 1. leaf entries at high levels than normal ('sections' on Arm, for 4K
> pages this gives you 2MB and 1GB pages).
> 
> 2. sets of contiguous entries that can share a TLB entry (the
> 'Contiguous bit' on Arm - which for 4K pages gives you 16 entries = 64
> KB 'pages').

This is arm64 specific and AFAIK there are no other architectures where there
will be any confusion wrt p?d_large() not meaning a single entry.

As you have noted before if we are printing individual entries with PTE_CONT
then they need not be identified as p??d_large(). In which case p?d_large()
can just safely point to p?d_sect() identifying regular huge leaf entries.

> 
> In many cases both give the same effect (reduce pressure on TLBs and
> requires contiguous and aligned physical addresses). But for this case
> we only care about the 'leaf' case (because the contiguous bit makes no
> difference to walking the page tables).

Right and we can just safely identify section entries with it. What will be
the problem with that ? Again this is only arm64 specific.

> 
> As far as I'm aware p?d_large() currently implements the first and
> p?d_(trans_)huge() implements either 1 or 2 depending on the architecture.

AFAIK option 2 exists only on arm6 platform. IIUC generic MM requires two
different huge page dentition from platform. HugeTLB identifies large entries
at PGD|PUD|PMD after converting it's content into PTE first. So there is no
need for direct large page definitions for other levels.

1. THP		- pmd_trans_huge()
2. HugeTLB	- pte_huge()	   CONFIG_ARCH_WANT_GENERAL_HUGETLB is set

A simple check for p?d_large() on mm/ and include/linux shows that there are
no existing usage for these in generic MM. Hence it is available.

p?d_trans_huge() cannot use contiguous entries, so its definitely 1 in above
example.

The problem is there is no other type of mapped leaf entries apart from a large
mapping at PGD, PUD, PMD level. Had there been another type of leaf entry then
p?d_leaf() would have made sense as p?d_large() would not have been sufficient.
Hence just wondering if it is really necessary to add brand new p?d_leaf() page
table helper in generic MM functions.

IMHO the new addition of p?d_leaf() can be avoided and p?d_large() should be
cleaned up (if required) in platforms and used in generic functions.

> 
> Will[1] suggested the same p?d_leaf() and this also avoids stepping on
> the existing usage of p?d_large() which isn't always available on every
> architecture.

PTDUMP now needs to identify large leaf entries uniformly on each platform.
Hence platforms enabling generic PTDUMP need to provide clean p?d_large()
definitions.

If there are existing definitions and usage of p?d_large() functions on some
platforms, those need to be fixed before they can use generic PTDUMP. IMHO we
should not be just adding new page table helpers in order to avoid cleaning
up these in platform code.

> 
> [1]
> https://lore.kernel.org/linux-mm/20190701101510.qup3nd6vm6cbdgjv@willie-the-truck/

I guess the primary concern was with the existence of p?d_large()or p?d_huge()
definitions in various platform code and p?d_leaf() was an way of working around
it. The problem is, it adds a new helper without a real need for one.

> 
>>>
>>> Mostly this is a clean up and there should be very little functional
>>> change. The exceptions are:
>>>
>>> * x86 PTDUMP debugfs output no longer display pages which aren't
>>>   present (patch 14).
>>
>> Hmm, kernel mappings pages which are not present! which ones are those ?
>> Just curious.
> 
> x86 currently outputs a line for every range, including those which are
> unpopulated. Patch 14 removes those lines from the output, only showing
> the populated ranges. This was discussed[2] previously.
> 
> [2]
> https://lore.kernel.org/lkml/26df02dd-c54e-ea91-bdd1-0a4aad3a30ac@arm.com/

Currently that is a difference between x86 and arm64 ptdump output. Whether to
show the gaps or not could not be achieved by defining a note_page() callback
function which does nothing but just return ? But if the single line output is
split between generic and callback as I had proposed earlier this will not be
possible any more as half the line would have been already printed.

  reply index

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-22 15:41 Steven Price
2019-07-22 15:41 ` [PATCH v9 01/21] arc: mm: Add p?d_leaf() definitions Steven Price
2019-07-22 15:41 ` [PATCH v9 02/21] arm: " Steven Price
2019-07-22 15:41 ` [PATCH v9 03/21] arm64: " Steven Price
2019-07-22 15:41 ` [PATCH v9 04/21] mips: " Steven Price
2019-07-22 21:47   ` Paul Burton
2019-07-24 13:03     ` Steven Price
2019-07-22 15:41 ` [PATCH v9 05/21] powerpc: " Steven Price
2019-07-22 15:41 ` [PATCH v9 06/21] riscv: " Steven Price
2019-07-22 15:41 ` [PATCH v9 07/21] s390: " Steven Price
2019-07-22 15:41 ` [PATCH v9 08/21] sparc: " Steven Price
2019-07-22 15:41 ` [PATCH v9 09/21] x86: " Steven Price
2019-07-22 15:41 ` [PATCH v9 10/21] mm: Add generic p?d_leaf() macros Steven Price
2019-07-23  9:41   ` Mark Rutland
2019-07-24 13:48     ` Steven Price
2019-07-28 11:44     ` Anshuman Khandual
2019-07-29 11:38       ` Steven Price
2019-08-01  6:09         ` Anshuman Khandual
2019-08-01 12:22           ` Steven Price
2019-07-29 12:50       ` Mark Rutland
2019-08-01  6:13         ` Anshuman Khandual
2019-07-22 15:42 ` [PATCH v9 11/21] mm: pagewalk: Add p4d_entry() and pgd_entry() Steven Price
2019-07-23 10:14   ` Mark Rutland
2019-07-24 13:53     ` Steven Price
2019-07-24 14:09       ` Mark Rutland
2019-07-28 12:33   ` Anshuman Khandual
2019-07-29 12:17     ` Steven Price
2019-07-22 15:42 ` [PATCH v9 12/21] mm: pagewalk: Allow walking without vma Steven Price
2019-07-28 14:20   ` Anshuman Khandual
2019-07-29 12:29     ` Steven Price
2019-08-01  6:41       ` Anshuman Khandual
2019-07-22 15:42 ` [PATCH v9 13/21] mm: pagewalk: Add test_p?d callbacks Steven Price
2019-07-28 13:41   ` Anshuman Khandual
2019-07-29 12:34     ` Steven Price
2019-07-22 15:42 ` [PATCH v9 14/21] x86: mm: Don't display pages which aren't present in debugfs Steven Price
2019-07-22 15:42 ` [PATCH v9 15/21] x86: mm: Point to struct seq_file from struct pg_state Steven Price
2019-07-22 15:42 ` [PATCH v9 16/21] x86: mm+efi: Convert ptdump_walk_pgd_level() to take a mm_struct Steven Price
2019-07-22 15:42 ` [PATCH v9 17/21] x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct Steven Price
2019-07-22 15:42 ` [PATCH v9 18/21] x86: mm: Convert ptdump_walk_pgd_level_core() " Steven Price
2019-07-22 15:42 ` [PATCH v9 19/21] mm: Add generic ptdump Steven Price
2019-07-23  9:57   ` Mark Rutland
2019-07-24 16:36     ` Steven Price
2019-07-29  2:59   ` Anshuman Khandual
2019-07-29 13:56     ` Steven Price
2019-07-22 15:42 ` [PATCH v9 20/21] x86: mm: Convert dump_pagetables to use walk_page_range Steven Price
2019-07-22 15:42 ` [PATCH v9 21/21] arm64: mm: Convert mm/dump.c to use walk_page_range() Steven Price
2019-07-23  6:39 ` [PATCH v9 00/21] Generic page walk and ptdump Anshuman Khandual
2019-07-24 13:35   ` Steven Price
2019-07-25  9:09     ` Anshuman Khandual [this message]
2019-07-25  9:30       ` Will Deacon
2019-07-26  6:03         ` Anshuman Khandual
2019-07-25 10:15       ` Steven Price
2019-07-23 10:16 ` Mark Rutland
2019-07-24 13:35   ` Steven Price
2019-07-24 13:57     ` Thomas Gleixner
2019-07-24 14:07       ` Mark Rutland
2019-07-24 14:18       ` Steven Price
2019-07-24 14:37         ` Thomas Gleixner
2019-07-28 11:20 ` Anshuman Khandual
2019-07-29 11:32   ` Steven Price
2019-07-31  9:27     ` Sven Schnelle
2019-07-31 11:18       ` Steven Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6f59521e-1f3e-6765-9a6f-c8eca4c0c154@arm.com \
    --to=anshuman.khandual@arm.com \
    --cc=Mark.Rutland@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=jglisse@redhat.com \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=steven.price@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git