[V12] mm/debug: Add tests validating architecture page table helpers
diff mbox series

Message ID 1580174873-18117-1-git-send-email-anshuman.khandual@arm.com
State New
Headers show
Series
  • [V12] mm/debug: Add tests validating architecture page table helpers
Related show

Commit Message

Anshuman Khandual Jan. 28, 2020, 1:27 a.m. UTC
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.

Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
right after page_alloc_init_late().

This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
arm64. Going forward, other architectures too can enable this after fixing
build or runtime problems (if any) with their page table helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non conformity here will
be reported as an warning which would need to be fixed. This test will help
catch any changes to the agreed upon semantics expected from generic MM and
enable platforms to accommodate it thereafter.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Steven Price <Steven.Price@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sri Krishna chowdary <schowdary@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org

Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>		#PPC32
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This adds a test validation for architecture exported page table helpers.
Patch adds basic transformation tests at various levels of the page table.

This test was originally suggested by Catalin during arm64 THP migration
RFC discussion earlier. Going forward it can include more specific tests
with respect to various generic MM functions like THP, HugeTLB etc and
platform specific tests.

https://lore.kernel.org/linux-mm/20190628102003.GA56463@arrakis.emea.arm.com/

Needs to be applied on linux V5.5-rc7

Changes in V12:

- Replaced __mmdrop() with mmdrop()
- Enable ARCH_HAS_DEBUG_VM_PGTABLE on X86 for non CONFIG_X86_PAE platforms as the
  test procedure interfere with pre-allocated PMDs attached to the PGD resulting
  in runtime failures with VM_BUG_ON()

Changes in V11: (https://patchwork.kernel.org/project/linux-mm/list/?series=221135)

- Rebased the patch on V5.4

Changes in V10: (https://patchwork.kernel.org/project/linux-mm/list/?series=205529)

- Always enable DEBUG_VM_PGTABLE when DEBUG_VM is enabled per Ingo
- Added tags from Ingo

Changes in V9: (https://patchwork.kernel.org/project/linux-mm/list/?series=201429)

- Changed feature support enumeration for powerpc platforms per Christophe
- Changed config wrapper for basic_[pmd|pud]_tests() to enable ARC platform
- Enabled the test on ARC platform

Changes in V8: (https://patchwork.kernel.org/project/linux-mm/list/?series=194297)

- Enabled ARCH_HAS_DEBUG_VM_PGTABLE on PPC32 platform per Christophe
- Updated feature documentation as DEBUG_VM_PGTABLE is now enabled on PPC32 platform
- Moved ARCH_HAS_DEBUG_VM_PGTABLE earlier to indent it with DEBUG_VM per Christophe
- Added an information message in debug_vm_pgtable() per Christophe
- Dropped random_vaddr boundary condition checks per Christophe and Qian
- Replaced virt_addr_valid() check with pfn_valid() check in debug_vm_pgtable()
- Slightly changed pr_fmt(fmt) information

Changes in V7: (https://patchwork.kernel.org/project/linux-mm/list/?series=193051)

- Memory allocation and free routines for mapped pages have been droped
- Mapped pfns are derived from standard kernel text symbol per Matthew
- Moved debug_vm_pgtaable() after page_alloc_init_late() per Michal and Qian 
- Updated the commit message per Michal
- Updated W=1 GCC warning problem on x86 per Qian Cai
- Addition of new alloc_contig_pages() helper has been submitted separately

Changes in V6: (https://patchwork.kernel.org/project/linux-mm/list/?series=187589)

- Moved alloc_gigantic_page_order() into mm/page_alloc.c per Michal
- Moved alloc_gigantic_page_order() within CONFIG_CONTIG_ALLOC in the test
- Folded Andrew's include/asm-generic/pgtable.h fix into the test patch 2/2

Changes in V5: (https://patchwork.kernel.org/project/linux-mm/list/?series=185991)

- Redefined and moved X86 mm_p4d_folded() into a different header per Kirill/Ingo
- Updated the config option comment per Ingo and dropped 'kernel module' reference
- Updated the commit message and dropped 'kernel module' reference
- Changed DEBUG_ARCH_PGTABLE_TEST into DEBUG_VM_PGTABLE per Ingo
- Moved config option from mm/Kconfig.debug into lib/Kconfig.debug
- Renamed core test function arch_pgtable_tests() as debug_vm_pgtable()
- Renamed mm/arch_pgtable_test.c as mm/debug_vm_pgtable.c
- debug_vm_pgtable() gets called from kernel_init_freeable() after init_mm_internals()
- Added an entry in Documentation/features/debug/ per Ingo
- Enabled the test on arm64 and x86 platforms for now

Changes in V4: (https://patchwork.kernel.org/project/linux-mm/list/?series=183465)

- Disable DEBUG_ARCH_PGTABLE_TEST for ARM and IA64 platforms

Changes in V3: (https://lore.kernel.org/patchwork/project/lkml/list/?series=411216)

- Changed test trigger from module format into late_initcall()
- Marked all functions with __init to be freed after completion
- Changed all __PGTABLE_PXX_FOLDED checks as mm_pxx_folded()
- Folded in PPC32 fixes from Christophe

Changes in V2:

https://lore.kernel.org/linux-mm/1568268173-31302-1-git-send-email-anshuman.khandual@arm.com/T/#t

- Fixed small typo error in MODULE_DESCRIPTION()
- Fixed m64k build problems for lvalue concerns in pmd_xxx_tests()
- Fixed dynamic page table level folding problems on x86 as per Kirril
- Fixed second pointers during pxx_populate_tests() per Kirill and Gerald
- Allocate and free pte table with pte_alloc_one/pte_free per Kirill
- Modified pxx_clear_tests() to accommodate s390 lower 12 bits situation
- Changed RANDOM_NZVALUE value from 0xbe to 0xff
- Changed allocation, usage, free sequence for saved_ptep
- Renamed VMA_FLAGS as VMFLAGS
- Implemented a new method for random vaddr generation
- Implemented some other cleanups
- Dropped extern reference to mm_alloc()
- Created and exported new alloc_gigantic_page_order()
- Dropped the custom allocator and used new alloc_gigantic_page_order()

Changes in V1:

https://lore.kernel.org/linux-mm/1567497706-8649-1-git-send-email-anshuman.khandual@arm.com/

- Added fallback mechanism for PMD aligned memory allocation failure

Changes in RFC V2:

https://lore.kernel.org/linux-mm/1565335998-22553-1-git-send-email-anshuman.khandual@arm.com/T/#u

- Moved test module and it's config from lib/ to mm/
- Renamed config TEST_ARCH_PGTABLE as DEBUG_ARCH_PGTABLE_TEST
- Renamed file from test_arch_pgtable.c to arch_pgtable_test.c
- Added relevant MODULE_DESCRIPTION() and MODULE_AUTHOR() details
- Dropped loadable module config option
- Basic tests now use memory blocks with required size and alignment
- PUD aligned memory block gets allocated with alloc_contig_range()
- If PUD aligned memory could not be allocated it falls back on PMD aligned
  memory block from page allocator and pud_* tests are skipped
- Clear and populate tests now operate on real in memory page table entries
- Dummy mm_struct gets allocated with mm_alloc()
- Dummy page table entries get allocated with [pud|pmd|pte]_alloc_[map]()
- Simplified [p4d|pgd]_basic_tests(), now has random values in the entries

Original RFC V1:

https://lore.kernel.org/linux-mm/1564037723-26676-1-git-send-email-anshuman.khandual@arm.com/

 .../debug/debug-vm-pgtable/arch-support.txt   |  35 ++
 arch/arc/Kconfig                              |   1 +
 arch/arm64/Kconfig                            |   1 +
 arch/powerpc/Kconfig                          |   1 +
 arch/x86/Kconfig                              |   1 +
 arch/x86/include/asm/pgtable_64.h             |   6 +
 include/asm-generic/pgtable.h                 |   6 +
 init/main.c                                   |   1 +
 lib/Kconfig.debug                             |  22 +
 mm/Makefile                                   |   1 +
 mm/debug_vm_pgtable.c                         | 388 ++++++++++++++++++
 11 files changed, 463 insertions(+)
 create mode 100644 Documentation/features/debug/debug-vm-pgtable/arch-support.txt
 create mode 100644 mm/debug_vm_pgtable.c

Comments

Qian Cai Jan. 28, 2020, 2:11 a.m. UTC | #1
> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> 
> This adds tests which will validate architecture page table helpers and
> other accessors in their compliance with expected generic MM semantics.
> This will help various architectures in validating changes to existing
> page table helpers or addition of new ones.
> 
> This test covers basic page table entry transformations including but not
> limited to old, young, dirty, clean, write, write protect etc at various
> level along with populating intermediate entries with next page table page
> and validating them.
> 
> Test page table pages are allocated from system memory with required size
> and alignments. The mapped pfns at page table levels are derived from a
> real pfn representing a valid kernel text symbol. This test gets called
> right after page_alloc_init_late().
> 
> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.

What’s the value of this block of new code? It only supports x86 and arm64 which are supposed to be good now. Did those tests ever find any regression or this is almost only useful for new architectures which only happened once in a few years? The worry if not many people will use this config and code those that much in the future because it is inefficient to find bugs, it will simply be rotten like a few other debugging options out there we have in the mainline that will be a pain to remove later on.
Anshuman Khandual Jan. 28, 2020, 3:06 a.m. UTC | #2
On 01/28/2020 07:41 AM, Qian Cai wrote:
> 
> 
>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>
>> This adds tests which will validate architecture page table helpers and
>> other accessors in their compliance with expected generic MM semantics.
>> This will help various architectures in validating changes to existing
>> page table helpers or addition of new ones.
>>
>> This test covers basic page table entry transformations including but not
>> limited to old, young, dirty, clean, write, write protect etc at various
>> level along with populating intermediate entries with next page table page
>> and validating them.
>>
>> Test page table pages are allocated from system memory with required size
>> and alignments. The mapped pfns at page table levels are derived from a
>> real pfn representing a valid kernel text symbol. This test gets called
>> right after page_alloc_init_late().
>>
>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>> arm64. Going forward, other architectures too can enable this after fixing
>> build or runtime problems (if any) with their page table helpers.

Hello Qian,

> 
> What’s the value of this block of new code? It only supports x86 and arm64
> which are supposed to be good now.

We have been over the usefulness of this code many times before as the patch is
already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc and
ppc32. There are build time or runtime problems with other archs which prevent
enablement of this test (for the moment) but then the goal is to integrate all
of them going forward. The test not only validates platform's adherence to the
expected semantics from generic MM but also helps in keeping it that way during
code changes in future as well.

> Did those tests ever find any regression or this is almost only useful for new

The test has already found problems with s390 page table helpers.

> architectures which only happened once in a few years?

Again, not only it validates what exist today but its also a tool to make
sure that all platforms continue adhere to a common agreed upon semantics
as reflected through the tests here.

> The worry if not many people will use this config and code those that much in

Debug features or tests in the kernel are used when required. These are never or
should not be enabled by default. AFAICT this is true even for entire DEBUG_VM
packaged tests. Do you have any particular data or precedence to substantiate
the fact that this test will be used any less often than the other similar ones
in the tree ? I can only speak for arm64 platform but the very idea for this
test came from Catalin when we were trying to understand the semantics for THP
helpers while enabling THP migration without split. Apart from going over the
commit messages from the past, there were no other way to figure out how any
particular page table helper is suppose to change given page table entry. This
test tries to formalize those semantics.

> the future because it is inefficient to find bugs, it will simply be rotten
Could you be more specific here ? What parts of the test are inefficient ? I
am happy to improve upon the test. Do let me know you if you have suggestions.

> like a few other debugging options out there we have in the mainline that
will be a pain to remove later on.
>

Even though I am not agreeing to your assessment about the usefulness of the
test without any substantial data backing up the claims, the test case in
itself is very much compartmentalized, staying clear from generic MM and
debug_vm_pgtable() is only function executing the test which is getting
called from kernel_init_freeable() path.

- Anshuman
Qian Cai Jan. 28, 2020, 3:33 a.m. UTC | #3
> On Jan 27, 2020, at 10:06 PM, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
> 
> 
> On 01/28/2020 07:41 AM, Qian Cai wrote:
>> 
>> 
>>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>> 
>>> This adds tests which will validate architecture page table helpers and
>>> other accessors in their compliance with expected generic MM semantics.
>>> This will help various architectures in validating changes to existing
>>> page table helpers or addition of new ones.
>>> 
>>> This test covers basic page table entry transformations including but not
>>> limited to old, young, dirty, clean, write, write protect etc at various
>>> level along with populating intermediate entries with next page table page
>>> and validating them.
>>> 
>>> Test page table pages are allocated from system memory with required size
>>> and alignments. The mapped pfns at page table levels are derived from a
>>> real pfn representing a valid kernel text symbol. This test gets called
>>> right after page_alloc_init_late().
>>> 
>>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>>> arm64. Going forward, other architectures too can enable this after fixing
>>> build or runtime problems (if any) with their page table helpers.
> 
> Hello Qian,
> 
>> 
>> What’s the value of this block of new code? It only supports x86 and arm64
>> which are supposed to be good now.
> 
> We have been over the usefulness of this code many times before as the patch is
> already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc and
> ppc32. There are build time or runtime problems with other archs which prevent

I am not sure if I care too much about arc and ppc32 which are pretty much legacy
platforms.

> enablement of this test (for the moment) but then the goal is to integrate all
> of them going forward. The test not only validates platform's adherence to the
> expected semantics from generic MM but also helps in keeping it that way during
> code changes in future as well.

Another option maybe to get some decent arches on board first before merging this
thing, so it have more changes to catch regressions for developers who might run this. 

> 
>> Did those tests ever find any regression or this is almost only useful for new
> 
> The test has already found problems with s390 page table helpers.

Hmm, that is pretty weak where s390 is not even official supported with this version.

> 
>> architectures which only happened once in a few years?
> 
> Again, not only it validates what exist today but its also a tool to make
> sure that all platforms continue adhere to a common agreed upon semantics
> as reflected through the tests here.
> 
>> The worry if not many people will use this config and code those that much in
> 
> Debug features or tests in the kernel are used when required. These are never or
> should not be enabled by default. AFAICT this is true even for entire DEBUG_VM
> packaged tests. Do you have any particular data or precedence to substantiate
> the fact that this test will be used any less often than the other similar ones
> in the tree ? I can only speak for arm64 platform but the very idea for this
> test came from Catalin when we were trying to understand the semantics for THP
> helpers while enabling THP migration without split. Apart from going over the
> commit messages from the past, there were no other way to figure out how any
> particular page table helper is suppose to change given page table entry. This
> test tries to formalize those semantics.

I am thinking about how we made so many mistakes before by merging too many of
those debugging options that many of them have been broken for many releases
proving that nobody actually used them regularly. We don’t need to repeat the same
mistake again. I am actually thinking about to remove things like  page_poisoning often
which is almost are never found any bug recently and only cause pains when interacting
with other new features that almost nobody will test them together to begin with.
We even have some SLUB debugging code sit there for almost 15 years that almost
nobody used it and maintainers refused to remove it.

> 
>> the future because it is inefficient to find bugs, it will simply be rotten
> Could you be more specific here ? What parts of the test are inefficient ? I
> am happy to improve upon the test. Do let me know you if you have suggestions.
> 
>> like a few other debugging options out there we have in the mainline that
> will be a pain to remove later on.
>> 
> 
> Even though I am not agreeing to your assessment about the usefulness of the
> test without any substantial data backing up the claims, the test case in
> itself is very much compartmentalized, staying clear from generic MM and
> debug_vm_pgtable() is only function executing the test which is getting
> called from kernel_init_freeable() path.

I am thinking exactly the other way around. You are proposing to merge this tests
without proving how useful it will be able to find regressions for future developers
to make sure it will actually get used.
Anshuman Khandual Jan. 28, 2020, 4:57 a.m. UTC | #4
On 01/28/2020 09:03 AM, Qian Cai wrote:
> 
> 
>> On Jan 27, 2020, at 10:06 PM, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>
>>
>>
>> On 01/28/2020 07:41 AM, Qian Cai wrote:
>>>
>>>
>>>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>>
>>>> This adds tests which will validate architecture page table helpers and
>>>> other accessors in their compliance with expected generic MM semantics.
>>>> This will help various architectures in validating changes to existing
>>>> page table helpers or addition of new ones.
>>>>
>>>> This test covers basic page table entry transformations including but not
>>>> limited to old, young, dirty, clean, write, write protect etc at various
>>>> level along with populating intermediate entries with next page table page
>>>> and validating them.
>>>>
>>>> Test page table pages are allocated from system memory with required size
>>>> and alignments. The mapped pfns at page table levels are derived from a
>>>> real pfn representing a valid kernel text symbol. This test gets called
>>>> right after page_alloc_init_late().
>>>>
>>>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>>>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>>>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>>>> arm64. Going forward, other architectures too can enable this after fixing
>>>> build or runtime problems (if any) with their page table helpers.
>>
>> Hello Qian,
>>
>>>
>>> What’s the value of this block of new code? It only supports x86 and arm64
>>> which are supposed to be good now.
>>
>> We have been over the usefulness of this code many times before as the patch is
>> already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc and
>> ppc32. There are build time or runtime problems with other archs which prevent
> 
> I am not sure if I care too much about arc and ppc32 which are pretty much legacy
> platforms.

Okay but FWIW the maintainers for all these enabled platforms cared for this test
at the least and really helped in shaping the test to it's current state. Besides
I am still failing to understand your point here about evaluating particular feature's
usefulness based on it's support on relative and perceived importance of some platforms
compared to others. Again the idea is to integrate all platforms eventually but we had
discovered build and runtime issues which needs to be resolved at platform level first.
Unless I am mistaken, debug feature like this which is putting down a framework while
also benefiting some initial platforms to start with, will be a potential candidate for
eventual inclusion in the mainline. Otherwise, please point to any other agreed upon
community criteria for debug feature's mainline inclusion which I will try to adhere.
I wonder if all other similar debug features from the past ever met 'the all inclusive
at the beginning' criteria which you are trying to propose here. This test also adds a
feature file, enlisting all supported archs as suggested by Ingo for the exact same
reason. This is not the first time, a feature is listing out archs which are supported
and archs which are not.

> 
>> enablement of this test (for the moment) but then the goal is to integrate all
>> of them going forward. The test not only validates platform's adherence to the
>> expected semantics from generic MM but also helps in keeping it that way during
>> code changes in future as well.
> 
> Another option maybe to get some decent arches on board first before merging this
> thing, so it have more changes to catch regressions for developers who might run this. 
> 
>>
>>> Did those tests ever find any regression or this is almost only useful for new
>>
>> The test has already found problems with s390 page table helpers.
> 
> Hmm, that is pretty weak where s390 is not even official supported with this version.

And there were valid reasons why s390 could not be enabled just yet as explained by s390
folks during our previous discussions. I just pointed out an example where this test was
useful as you had asked previously. Not being official supported in this version does
not take away the fact the it was indeed useful for that platform in discovering a bug.

> 
>>
>>> architectures which only happened once in a few years?
>>
>> Again, not only it validates what exist today but its also a tool to make
>> sure that all platforms continue adhere to a common agreed upon semantics
>> as reflected through the tests here.
>>
>>> The worry if not many people will use this config and code those that much in
>>
>> Debug features or tests in the kernel are used when required. These are never or
>> should not be enabled by default. AFAICT this is true even for entire DEBUG_VM
>> packaged tests. Do you have any particular data or precedence to substantiate
>> the fact that this test will be used any less often than the other similar ones
>> in the tree ? I can only speak for arm64 platform but the very idea for this
>> test came from Catalin when we were trying to understand the semantics for THP
>> helpers while enabling THP migration without split. Apart from going over the
>> commit messages from the past, there were no other way to figure out how any
>> particular page table helper is suppose to change given page table entry. This
>> test tries to formalize those semantics.
> 
> I am thinking about how we made so many mistakes before by merging too many of
> those debugging options that many of them have been broken for many releases
> proving that nobody actually used them regularly. We don’t need to repeat the same

Again will ask for some data to substantiate these claims. Though I am not really
sure but believe that there are integration test frameworks out there which regularly
validates each of these code path on multiple platforms. One such automation found
that V11 of the test was broken on X86 PAE platform which I fixed. Nonetheless, I can
speak only for arm64 platform and we intend to use this test to validate arm64 exported
page table helpers. Citing unsubstantiated past examples should not really block these
enabled platforms (arm64 at the very least) from getting this debug feature which has
already demonstrated it's usefulness during arm64 THP migration development and on s390
platforms as well.

> mistake again. I am actually thinking about to remove things like  page_poisoning often
> which is almost are never found any bug recently and only cause pains when interacting
> with other new features that almost nobody will test them together to begin with.
> We even have some SLUB debugging code sit there for almost 15 years that almost
> nobody used it and maintainers refused to remove it.

Unlike those, the proposed test here is isolated as a stand alone test and stays clear
off from any other code path. I have not been involved in or aware of the usefulness of
existing MM debug features and hence will just leave them upto the judgment of the
maintainers whether to keep or discard them.

> 
>>
>>> the future because it is inefficient to find bugs, it will simply be rotten
>> Could you be more specific here ? What parts of the test are inefficient ? I
>> am happy to improve upon the test. Do let me know you if you have suggestions.
>>
>>> like a few other debugging options out there we have in the mainline that
>> will be a pain to remove later on.
>>>
>>
>> Even though I am not agreeing to your assessment about the usefulness of the
>> test without any substantial data backing up the claims, the test case in
>> itself is very much compartmentalized, staying clear from generic MM and
>> debug_vm_pgtable() is only function executing the test which is getting
>> called from kernel_init_freeable() path.
> 
> I am thinking exactly the other way around. You are proposing to merge this tests
> without proving how useful it will be able to find regressions for future developers
> to make sure it will actually get used.

As I had mentioned before, the test attempts to formalize page table helper semantics
as expected from generic MM code paths and intend to catch deviations when enabled on
a given platform. How else should we test semantics errors otherwise ? There are past
examples of usefulness for this procedure on arm64 and on s390. I am wondering how
else to prove the usefulness of a debug feature if these references are not enough.

>
Qian Cai Jan. 28, 2020, 5:48 a.m. UTC | #5
> On Jan 27, 2020, at 11:58 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> 
> As I had mentioned before, the test attempts to formalize page table helper semantics
> as expected from generic MM code paths and intend to catch deviations when enabled on
> a given platform. How else should we test semantics errors otherwise ? There are past
> examples of usefulness for this procedure on arm64 and on s390. I am wondering how
> else to prove the usefulness of a debug feature if these references are not enough.

Not saying it will not be useful. As you mentioned it actually found a bug or two in the past. The problem is that there is always a cost to maintain something like this, and nobody knew how things could be broken even for the isolated code you mentioned in the future given how complicated the kernel code base is. I am not so positive that many developers would enable this debug feature and use it on a regular basis from the information you gave so far. 

On the other hand, it might just be good at maintaining this thing out of tree by yourself anyway, because if there isn’t going to be used by many developers, few people is going to contribute to this and even noticed when it is broken. What’s the point of getting this merged apart from being getting some meaningless credits?
Christophe Leroy Jan. 28, 2020, 6:13 a.m. UTC | #6
Le 28/01/2020 à 04:33, Qian Cai a écrit :
> 
> 
>> On Jan 27, 2020, at 10:06 PM, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>
>>
>>
>> On 01/28/2020 07:41 AM, Qian Cai wrote:
>>>
>>>
>>>> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>>
>>>> This adds tests which will validate architecture page table helpers and
>>>> other accessors in their compliance with expected generic MM semantics.
>>>> This will help various architectures in validating changes to existing
>>>> page table helpers or addition of new ones.
>>>>
>>>> This test covers basic page table entry transformations including but not
>>>> limited to old, young, dirty, clean, write, write protect etc at various
>>>> level along with populating intermediate entries with next page table page
>>>> and validating them.
>>>>
>>>> Test page table pages are allocated from system memory with required size
>>>> and alignments. The mapped pfns at page table levels are derived from a
>>>> real pfn representing a valid kernel text symbol. This test gets called
>>>> right after page_alloc_init_late().
>>>>
>>>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>>>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>>>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>>>> arm64. Going forward, other architectures too can enable this after fixing
>>>> build or runtime problems (if any) with their page table helpers.
>>
>> Hello Qian,
>>
>>>
>>> What’s the value of this block of new code? It only supports x86 and arm64
>>> which are supposed to be good now.
>>
>> We have been over the usefulness of this code many times before as the patch is
>> already in it's V12. Currently it is enabled on arm64, x86 (except PAE), arc and
>> ppc32. There are build time or runtime problems with other archs which prevent
> 
> I am not sure if I care too much about arc and ppc32 which are pretty much legacy
> platforms.
> 
>> enablement of this test (for the moment) but then the goal is to integrate all
>> of them going forward. The test not only validates platform's adherence to the
>> expected semantics from generic MM but also helps in keeping it that way during
>> code changes in future as well.
> 
> Another option maybe to get some decent arches on board first before merging this
> thing, so it have more changes to catch regressions for developers who might run this.
> 

ppc32 an indecent / legacy platform ? Are you kidying ?

Powerquicc II PRO for instance is fully supported by the manufacturer 
and widely used in many small networking devices.

Christophe
Christophe Leroy Jan. 28, 2020, 6:17 a.m. UTC | #7
Le 28/01/2020 à 06:48, Qian Cai a écrit :
> 
> 
>> On Jan 27, 2020, at 11:58 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>
>> As I had mentioned before, the test attempts to formalize page table helper semantics
>> as expected from generic MM code paths and intend to catch deviations when enabled on
>> a given platform. How else should we test semantics errors otherwise ? There are past
>> examples of usefulness for this procedure on arm64 and on s390. I am wondering how
>> else to prove the usefulness of a debug feature if these references are not enough.
> 
> Not saying it will not be useful. As you mentioned it actually found a bug or two in the past. The problem is that there is always a cost to maintain something like this, and nobody knew how things could be broken even for the isolated code you mentioned in the future given how complicated the kernel code base is. I am not so positive that many developers would enable this debug feature and use it on a regular basis from the information you gave so far.
> 
> On the other hand, it might just be good at maintaining this thing out of tree by yourself anyway, because if there isn’t going to be used by many developers, few people is going to contribute to this and even noticed when it is broken. What’s the point of getting this merged apart from being getting some meaningless credits?
> 

It is 'default y' so there is no much risk that it is forgotten, at 
least all test suites run with 'allyes_defconfig' will trigger the test, 
so I think it is really a good feature.

Christophe
Qian Cai Jan. 28, 2020, 6:36 a.m. UTC | #8
> On Jan 28, 2020, at 1:17 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
> It is 'default y' so there is no much risk that it is forgotten, at least all test suites run with 'allyes_defconfig' will trigger the test, so I think it is really a good feature.

This thing depends on DEBUG_VM which I don’t see it is selected by any defconfig. Am I missing anything?
Anshuman Khandual Jan. 28, 2020, 7:03 a.m. UTC | #9
On 01/28/2020 12:06 PM, Qian Cai wrote:
> 
> 
>> On Jan 28, 2020, at 1:17 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>> It is 'default y' so there is no much risk that it is forgotten, at least all test suites run with 'allyes_defconfig' will trigger the test, so I think it is really a good feature.
> 
> This thing depends on DEBUG_VM which I don’t see it is selected by any defconfig. Am I missing anything?
> 

'allyesconfig' makes 'DEBUG_VM = y' which in turn will enable 'DEBUG_VM_PGTABLE = y'
on platforms that subscribe ARCH_HAS_DEBUG_VM_PGTABLE.
Qian Cai Jan. 28, 2020, 7:07 a.m. UTC | #10
> On Jan 28, 2020, at 2:03 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> 
> 'allyesconfig' makes 'DEBUG_VM = y' which in turn will enable 'DEBUG_VM_PGTABLE = y'
> on platforms that subscribe ARCH_HAS_DEBUG_VM_PGTABLE.

Isn’t that only for compiling testing? Who is booting such a beast and make sure everything working as expected?
Qian Cai Jan. 28, 2020, 7:12 a.m. UTC | #11
> On Jan 28, 2020, at 1:13 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
> ppc32 an indecent / legacy platform ? Are you kidying ?
> 
> Powerquicc II PRO for instance is fully supported by the manufacturer and widely used in many small networking devices.

Of course I forgot about embedded devices. The problem is that how many developers are actually going to run this debug option on embedded devices?
Mark Brown Jan. 28, 2020, 11:58 a.m. UTC | #12
On Tue, Jan 28, 2020 at 02:12:56AM -0500, Qian Cai wrote:
> > On Jan 28, 2020, at 1:13 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:

> > ppc32 an indecent / legacy platform ? Are you kidying ?

> > Powerquicc II PRO for instance is fully supported by the
> > manufacturer and widely used in many small networking devices.

> Of course I forgot about embedded devices. The problem is that how
> many developers are actually going to run this debug option on
> embedded devices?

Much fewer if the code isn't upstream than if it is.  This isn't
something that every developer is going to enable all the time but that
doesn't mean it's not useful, it's more for people doing work on the
architectures or on memory management (or who suspect they're running
into a relevant problem), and I'm sure some of the automated testing
people will enable it.  The more barriers there are in place to getting
the testsuite up and running the less likely it is that any of these
groups will run it regularly.
Mike Rapoport Jan. 28, 2020, 12:09 p.m. UTC | #13
Hello Qian,

On Mon, Jan 27, 2020 at 10:33:08PM -0500, Qian Cai wrote:
> 
> > On Jan 27, 2020, at 10:06 PM, Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> >
> > enablement of this test (for the moment) but then the goal is to integrate all
> > of them going forward. The test not only validates platform's adherence to the
> > expected semantics from generic MM but also helps in keeping it that way during
> > code changes in future as well.
> 
> Another option maybe to get some decent arches on board first before merging this
> thing, so it have more changes to catch regressions for developers who might run this. 

Aren't x86 and arm64 not decent enough?
Even if this test could be used to detect regressions only on these two
platforms, the test is valuable.
Christophe Leroy Jan. 28, 2020, 5:05 p.m. UTC | #14
Le 28/01/2020 à 02:27, Anshuman Khandual a écrit :
> This adds tests which will validate architecture page table helpers and
> other accessors in their compliance with expected generic MM semantics.
> This will help various architectures in validating changes to existing
> page table helpers or addition of new ones.
> 
> This test covers basic page table entry transformations including but not
> limited to old, young, dirty, clean, write, write protect etc at various
> level along with populating intermediate entries with next page table page
> and validating them.
> 
> Test page table pages are allocated from system memory with required size
> and alignments. The mapped pfns at page table levels are derived from a
> real pfn representing a valid kernel text symbol. This test gets called
> right after page_alloc_init_late().
> 
> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.
> 
> Folks interested in making sure that a given platform's page table helpers
> conform to expected generic MM semantics should enable the above config
> which will just trigger this test during boot. Any non conformity here will
> be reported as an warning which would need to be fixed. This test will help
> catch any changes to the agreed upon semantics expected from generic MM and
> enable platforms to accommodate it thereafter.
> 

[...]

> 
> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>		#PPC32

Also tested on PPC64 (under QEMU): book3s/64 64k pages, book3s/64 4k 
pages and book3e/64

> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---

[...]

> 
> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> new file mode 100644
> index 000000000000..f3f8111edbe3
> --- /dev/null
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -0,0 +1,35 @@
> +#
> +# Feature name:          debug-vm-pgtable
> +#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
> +#         description:   arch supports pgtable tests for semantics compliance
> +#
> +    -----------------------
> +    |         arch |status|
> +    -----------------------
> +    |       alpha: | TODO |
> +    |         arc: |  ok  |
> +    |         arm: | TODO |
> +    |       arm64: |  ok  |
> +    |         c6x: | TODO |
> +    |        csky: | TODO |
> +    |       h8300: | TODO |
> +    |     hexagon: | TODO |
> +    |        ia64: | TODO |
> +    |        m68k: | TODO |
> +    |  microblaze: | TODO |
> +    |        mips: | TODO |
> +    |       nds32: | TODO |
> +    |       nios2: | TODO |
> +    |    openrisc: | TODO |
> +    |      parisc: | TODO |
> +    |  powerpc/32: |  ok  |
> +    |  powerpc/64: | TODO |

You can change the two above lines by

	powerpc: ok

> +    |       riscv: | TODO |
> +    |        s390: | TODO |
> +    |          sh: | TODO |
> +    |       sparc: | TODO |
> +    |          um: | TODO |
> +    |   unicore32: | TODO |
> +    |         x86: |  ok  |
> +    |      xtensa: | TODO |
> +    -----------------------

> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 1ec34e16ed65..253dcab0bebc 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -120,6 +120,7 @@ config PPC
>   	#
>   	select ARCH_32BIT_OFF_T if PPC32
>   	select ARCH_HAS_DEBUG_VIRTUAL
> +	select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32

Remove the 'if PPC32' as we now know it also work on PPC64.

>   	select ARCH_HAS_DEVMEM_IS_ALLOWED
>   	select ARCH_HAS_ELF_RANDOMIZE
>   	select ARCH_HAS_FORTIFY_SOURCE

> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 0b6c4042942a..fb0e76d254b3 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { }
>   
>   struct mm_struct;
>   
> +#define mm_p4d_folded mm_p4d_folded
> +static inline bool mm_p4d_folded(struct mm_struct *mm)
> +{
> +	return !pgtable_l5_enabled();
> +}
> +

For me this should be part of another patch, it is not directly linked 
to the tests.

>   void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
>   void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
>   
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 798ea36a0549..e0b04787e789 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -1208,6 +1208,12 @@ static inline bool arch_has_pfn_modify_check(void)
>   # define PAGE_KERNEL_EXEC PAGE_KERNEL
>   #endif
>   
> +#ifdef CONFIG_DEBUG_VM_PGTABLE

Not sure it is a good idea to put that in include/asm-generic/pgtable.h

By doing this you are forcing a rebuild of almost all files, whereas 
only init/main.o and mm/debug_vm_pgtable.o should be rebuilt when 
activating this config option.

> +extern void debug_vm_pgtable(void);

Please don't use the 'extern' keyword, it is useless and not to be used 
for functions declaration.

> +#else
> +static inline void debug_vm_pgtable(void) { }
> +#endif
> +
>   #endif /* !__ASSEMBLY__ */
>   
>   #ifndef io_remap_pfn_range
> diff --git a/init/main.c b/init/main.c
> index da1bc0b60a7d..5e59e6ac0780 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1197,6 +1197,7 @@ static noinline void __init kernel_init_freeable(void)
>   	sched_init_smp();
>   
>   	page_alloc_init_late();
> +	debug_vm_pgtable();

Wouldn't it be better to call debug_vm_pgtable() in kernel_init() 
between the call to async_synchronise_full() and ftrace_free_init_mem() ?

>   	/* Initialize page ext after all struct pages are initialized. */
>   	page_ext_init();
>   
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 5ffe144c9794..7cceae923c05 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -653,6 +653,12 @@ config SCHED_STACK_END_CHECK
>   	  data corruption or a sporadic crash at a later stage once the region
>   	  is examined. The runtime overhead introduced is minimal.
>   
> +config ARCH_HAS_DEBUG_VM_PGTABLE
> +	bool
> +	help
> +	  An architecture should select this when it can successfully
> +	  build and run DEBUG_VM_PGTABLE.
> +
>   config DEBUG_VM
>   	bool "Debug VM"
>   	depends on DEBUG_KERNEL
> @@ -688,6 +694,22 @@ config DEBUG_VM_PGFLAGS
>   
>   	  If unsure, say N.
>   
> +config DEBUG_VM_PGTABLE
> +	bool "Debug arch page table for semantics compliance"
> +	depends on MMU
> +	depends on DEBUG_VM

Does it really need to depend on DEBUG_VM ?
I think we could make it standalone and 'default y if DEBUG_VM' instead.

> +	depends on ARCH_HAS_DEBUG_VM_PGTABLE
> +	default y
> +	help
> +	  This option provides a debug method which can be used to test
> +	  architecture page table helper functions on various platforms in
> +	  verifying if they comply with expected generic MM semantics. This
> +	  will help architecture code in making sure that any changes or
> +	  new additions of these helpers still conform to expected
> +	  semantics of the generic MM.
> +
> +	  If unsure, say N.
> +

Does it make sense to make it 'default y' and say 'If unsure, say N' ?

>   config ARCH_HAS_DEBUG_VIRTUAL
>   	bool
>   

Christophe
Catalin Marinas Jan. 28, 2020, 5:47 p.m. UTC | #15
On Mon, Jan 27, 2020 at 09:11:53PM -0500, Qian Cai wrote:
> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> > This adds tests which will validate architecture page table helpers and
> > other accessors in their compliance with expected generic MM semantics.
> > This will help various architectures in validating changes to existing
> > page table helpers or addition of new ones.
[...]
> What’s the value of this block of new code? It only supports x86 and
> arm64 which are supposed to be good now. Did those tests ever find any
> regression or this is almost only useful for new architectures which
> only happened once in a few years?

The primary goal here is not finding regressions but having clearly
defined semantics of the page table accessors across architectures. x86
and arm64 are a good starting point and other architectures will be
enabled as they are aligned to the same semantics.

See for example this past discussion:

https://lore.kernel.org/linux-mm/20190628102003.GA56463@arrakis.emea.arm.com/

These tests should act as the 'contract' between the generic mm code and
the architecture port. Without clear semantics, some bugs may be a lot
subtler than a boot failure.

FTR, I fully support this patch (and I should get around to review it
properly; thanks for the reminder ;)).
Qian Cai Jan. 28, 2020, 7:07 p.m. UTC | #16
> On Jan 28, 2020, at 12:47 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> The primary goal here is not finding regressions but having clearly
> defined semantics of the page table accessors across architectures. x86
> and arm64 are a good starting point and other architectures will be
> enabled as they are aligned to the same semantics.

This still does not answer the fundamental question. If this test is simply inefficient to find bugs, who wants to spend time to use it regularly?  If this is just one off test that may get running once in a few years (when introducing a new arch), how does it justify the ongoing cost to maintain it?

I do agree there could be a need to clearly define this thing but that belongs to documentation rather than testing purpose. It is confusing to mix this with other config options which have somewhat a different purpose, it will then be a waste of time for people who mistakenly enable this for regular automatic testing and never found any bug from it.
Catalin Marinas Jan. 29, 2020, 10:36 a.m. UTC | #17
On Tue, Jan 28, 2020 at 02:07:10PM -0500, Qian Cai wrote:
> On Jan 28, 2020, at 12:47 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > The primary goal here is not finding regressions but having clearly
> > defined semantics of the page table accessors across architectures. x86
> > and arm64 are a good starting point and other architectures will be
> > enabled as they are aligned to the same semantics.
> 
> This still does not answer the fundamental question. If this test is
> simply inefficient to find bugs,

Who said this is inefficient (other than you)?

> who wants to spend time to use it regularly? 

Arch maintainers, mm maintainers introducing new macros or assuming
certain new semantics of the existing macros.

> If this is just one off test that may get running once in a few years
> (when introducing a new arch), how does it justify the ongoing cost to
> maintain it?

You are really missing the point. It's not only for a new arch but
changes to existing arch code. And if the arch code churn in this area
is relatively small, I'd expect a similarly small cost of maintaining
this test.

If you only turn DEBUG_VM on once every few years, don't generalise this
to the rest of the kernel developers (as others pointed out, this test
is default y if DEBUG_VM).

Anyway, I think that's a pointless discussion, so not going to reply
further (unless you have technical content to add).
Qian Cai Jan. 29, 2020, 11:09 a.m. UTC | #18
> On Jan 29, 2020, at 5:36 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> On Tue, Jan 28, 2020 at 02:07:10PM -0500, Qian Cai wrote:
>> On Jan 28, 2020, at 12:47 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> The primary goal here is not finding regressions but having clearly
>>> defined semantics of the page table accessors across architectures. x86
>>> and arm64 are a good starting point and other architectures will be
>>> enabled as they are aligned to the same semantics.
>> 
>> This still does not answer the fundamental question. If this test is
>> simply inefficient to find bugs,
> 
> Who said this is inefficient (other than you)?

Inefficient of finding bugs. It said only found a bug or two in its lifetime?

> 
>> who wants to spend time to use it regularly? 
> 
> Arch maintainers, mm maintainers introducing new macros or assuming
> certain new semantics of the existing macros.
> 
>> If this is just one off test that may get running once in a few years
>> (when introducing a new arch), how does it justify the ongoing cost to
>> maintain it?
> 
> You are really missing the point. It's not only for a new arch but
> changes to existing arch code. And if the arch code churn in this area
> is relatively small, I'd expect a similarly small cost of maintaining
> this test.
> 
> If you only turn DEBUG_VM on once every few years, don't generalise this
> to the rest of the kernel developers (as others pointed out, this test
> is default y if DEBUG_VM).

Quite the opposite, I am running DEBUG_VM almost daily for regression
workload while I felt strongly this thing does not add any value mixing there.

So, I would suggest to decouple this away from DEBUG_VM, and clearly
document that this test is not something intended for automated regression
workloads, so those people don’t need to waste time running this.

> 
> Anyway, I think that's a pointless discussion, so not going to reply
> further (unless you have technical content to add).
> 
> -- 
> Catalin
Gerald Schaefer Jan. 29, 2020, 10:20 p.m. UTC | #19
On Tue, 28 Jan 2020 06:57:53 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> This adds tests which will validate architecture page table helpers and
> other accessors in their compliance with expected generic MM semantics.
> This will help various architectures in validating changes to existing
> page table helpers or addition of new ones.
> 
> This test covers basic page table entry transformations including but not
> limited to old, young, dirty, clean, write, write protect etc at various
> level along with populating intermediate entries with next page table page
> and validating them.
> 
> Test page table pages are allocated from system memory with required size
> and alignments. The mapped pfns at page table levels are derived from a
> real pfn representing a valid kernel text symbol. This test gets called
> right after page_alloc_init_late().
> 
> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.
> 
> Folks interested in making sure that a given platform's page table helpers
> conform to expected generic MM semantics should enable the above config
> which will just trigger this test during boot. Any non conformity here will
> be reported as an warning which would need to be fixed. This test will help
> catch any changes to the agreed upon semantics expected from generic MM and
> enable platforms to accommodate it thereafter.
> 

[...]

> 
> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>		#PPC32

Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> # s390

Thanks again for this effort, and for keeping up the spirit against
all odds and even after 12 iterations :-)

> 
> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> new file mode 100644
> index 000000000000..f3f8111edbe3
> --- /dev/null
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -0,0 +1,35 @@
> +#
> +# Feature name:          debug-vm-pgtable
> +#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
> +#         description:   arch supports pgtable tests for semantics compliance
> +#
> +    -----------------------
> +    |         arch |status|
> +    -----------------------
> +    |       alpha: | TODO |
> +    |         arc: |  ok  |
> +    |         arm: | TODO |
> +    |       arm64: |  ok  |
> +    |         c6x: | TODO |
> +    |        csky: | TODO |
> +    |       h8300: | TODO |
> +    |     hexagon: | TODO |
> +    |        ia64: | TODO |
> +    |        m68k: | TODO |
> +    |  microblaze: | TODO |
> +    |        mips: | TODO |
> +    |       nds32: | TODO |
> +    |       nios2: | TODO |
> +    |    openrisc: | TODO |
> +    |      parisc: | TODO |
> +    |  powerpc/32: |  ok  |
> +    |  powerpc/64: | TODO |
> +    |       riscv: | TODO |
> +    |        s390: | TODO |

s390 is ok now, with my patches included in v5.5-rc1. So you can now add

--- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -25,7 +25,7 @@
     |  powerpc/32: |  ok  |
     |  powerpc/64: | TODO |
     |       riscv: | TODO |
-    |        s390: | TODO |
+    |        s390: |  ok  |
     |          sh: | TODO |
     |       sparc: | TODO |
     |          um: | TODO |
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -59,6 +59,7 @@ config KASAN_SHADOW_OFFSET
 config S390
 	def_bool y
 	select ARCH_BINFMT_ELF_STATE
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
Gerald Schaefer Jan. 29, 2020, 10:20 p.m. UTC | #20
On Mon, 27 Jan 2020 22:33:08 -0500
Qian Cai <cai@lca.pw> wrote:

> > 
> >> Did those tests ever find any regression or this is almost only useful for new
> > 
> > The test has already found problems with s390 page table helpers.
> 
> Hmm, that is pretty weak where s390 is not even official supported with this version.
> 

I first had to get the three patches upstream, each fixing non-conform
behavior on s390, and each issue was found by this extremely useful test:

2416cefc504b s390/mm: add mm_pxd_folded() checks to pxd_free()
2d1fc1eb9b54 s390/mm: simplify page table helpers for large entries
1c27a4bc817b s390/mm: make pmd/pud_bad() report large entries as bad

I did not see any direct effect of this misbehavior yet, but I am
very happy that this could be found and fixed in order to prevent
future issues. And this is exactly the value of this test, to make
sure that all architectures have a common understanding of how
the various page table helpers are supposed to work.

For example, who would have thought that pXd_bad() is supposed to
report large entries as bad? It's not really documented anywhere,
so we just checked them for sanity like normal entries, which
apparently worked fine so far, but for how long?
Mike Rapoport Jan. 30, 2020, 7:27 a.m. UTC | #21
On Wed, Jan 29, 2020 at 11:20:44PM +0100, Gerald Schaefer wrote:
> On Mon, 27 Jan 2020 22:33:08 -0500
> 
> For example, who would have thought that pXd_bad() is supposed to
> report large entries as bad? It's not really documented anywhere,

A bit off-topic,

@Anshuman, maybe you could start a Documentation/ patch that describes at
least some of the pXd_whaterver()?
Or that would be too much to ask? ;-)

> so we just checked them for sanity like normal entries, which
> apparently worked fine so far, but for how long?
Anshuman Khandual Jan. 30, 2020, 1:04 p.m. UTC | #22
On 01/28/2020 10:35 PM, Christophe Leroy wrote:
> 
> 
> Le 28/01/2020 à 02:27, Anshuman Khandual a écrit :
>> This adds tests which will validate architecture page table helpers and
>> other accessors in their compliance with expected generic MM semantics.
>> This will help various architectures in validating changes to existing
>> page table helpers or addition of new ones.
>>
>> This test covers basic page table entry transformations including but not
>> limited to old, young, dirty, clean, write, write protect etc at various
>> level along with populating intermediate entries with next page table page
>> and validating them.
>>
>> Test page table pages are allocated from system memory with required size
>> and alignments. The mapped pfns at page table levels are derived from a
>> real pfn representing a valid kernel text symbol. This test gets called
>> right after page_alloc_init_late().
>>
>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>> arm64. Going forward, other architectures too can enable this after fixing
>> build or runtime problems (if any) with their page table helpers.
>>
>> Folks interested in making sure that a given platform's page table helpers
>> conform to expected generic MM semantics should enable the above config
>> which will just trigger this test during boot. Any non conformity here will
>> be reported as an warning which would need to be fixed. This test will help
>> catch any changes to the agreed upon semantics expected from generic MM and
>> enable platforms to accommodate it thereafter.
>>
> 
> [...]
> 
>>
>> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>        #PPC32
> 
> Also tested on PPC64 (under QEMU): book3s/64 64k pages, book3s/64 4k pages and book3e/64

Hmm but earlier Michael Ellerman had reported some problems while
running these tests on PPC64, a soft lock up in hash__pte_update()
and a kernel BUG (radix MMU). Are those problems gone away now ?

Details in this thread - https://patchwork.kernel.org/patch/11214603/

> 
>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
> 
> [...]
> 
>>
>> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> new file mode 100644
>> index 000000000000..f3f8111edbe3
>> --- /dev/null
>> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> @@ -0,0 +1,35 @@
>> +#
>> +# Feature name:          debug-vm-pgtable
>> +#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
>> +#         description:   arch supports pgtable tests for semantics compliance
>> +#
>> +    -----------------------
>> +    |         arch |status|
>> +    -----------------------
>> +    |       alpha: | TODO |
>> +    |         arc: |  ok  |
>> +    |         arm: | TODO |
>> +    |       arm64: |  ok  |
>> +    |         c6x: | TODO |
>> +    |        csky: | TODO |
>> +    |       h8300: | TODO |
>> +    |     hexagon: | TODO |
>> +    |        ia64: | TODO |
>> +    |        m68k: | TODO |
>> +    |  microblaze: | TODO |
>> +    |        mips: | TODO |
>> +    |       nds32: | TODO |
>> +    |       nios2: | TODO |
>> +    |    openrisc: | TODO |
>> +    |      parisc: | TODO |
>> +    |  powerpc/32: |  ok  |
>> +    |  powerpc/64: | TODO |
> 
> You can change the two above lines by
> 
>     powerpc: ok
>
>> +    |       riscv: | TODO |
>> +    |        s390: | TODO |
>> +    |          sh: | TODO |
>> +    |       sparc: | TODO |
>> +    |          um: | TODO |
>> +    |   unicore32: | TODO |
>> +    |         x86: |  ok  |
>> +    |      xtensa: | TODO |
>> +    -----------------------
> 
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 1ec34e16ed65..253dcab0bebc 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -120,6 +120,7 @@ config PPC
>>       #
>>       select ARCH_32BIT_OFF_T if PPC32
>>       select ARCH_HAS_DEBUG_VIRTUAL
>> +    select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32
> 
> Remove the 'if PPC32' as we now know it also work on PPC64.

But in case there is a subset of PPC64 which still does not work
(problem reported earlier) with the test, will have to adjust the
config accordingly.

> 
>>       select ARCH_HAS_DEVMEM_IS_ALLOWED
>>       select ARCH_HAS_ELF_RANDOMIZE
>>       select ARCH_HAS_FORTIFY_SOURCE
> 
>> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
>> index 0b6c4042942a..fb0e76d254b3 100644
>> --- a/arch/x86/include/asm/pgtable_64.h
>> +++ b/arch/x86/include/asm/pgtable_64.h
>> @@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { }
>>     struct mm_struct;
>>   +#define mm_p4d_folded mm_p4d_folded
>> +static inline bool mm_p4d_folded(struct mm_struct *mm)
>> +{
>> +    return !pgtable_l5_enabled();
>> +}
>> +
> 
> For me this should be part of another patch, it is not directly linked to the tests.

We did discuss about this earlier and Kirril mentioned its not worth
a separate patch.

https://lore.kernel.org/linux-arm-kernel/20190913091305.rkds4f3fqv3yjhjy@box/

> 
>>   void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
>>   void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
>>   diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
>> index 798ea36a0549..e0b04787e789 100644
>> --- a/include/asm-generic/pgtable.h
>> +++ b/include/asm-generic/pgtable.h
>> @@ -1208,6 +1208,12 @@ static inline bool arch_has_pfn_modify_check(void)
>>   # define PAGE_KERNEL_EXEC PAGE_KERNEL
>>   #endif
>>   +#ifdef CONFIG_DEBUG_VM_PGTABLE
> 
> Not sure it is a good idea to put that in include/asm-generic/pgtable.h

Logically that is the right place, as it is related to page table but
not something platform related.

> 
> By doing this you are forcing a rebuild of almost all files, whereas only init/main.o and mm/debug_vm_pgtable.o should be rebuilt when activating this config option.

I agreed but whats the alternative ? We could move these into init/main.c
to make things simpler but will that be a right place, given its related
to generic page table.

> 
>> +extern void debug_vm_pgtable(void);
> 
> Please don't use the 'extern' keyword, it is useless and not to be used for functions declaration.

Really ? But, there are tons of examples doing the same thing both in
generic and platform code as well.

> 
>> +#else
>> +static inline void debug_vm_pgtable(void) { }
>> +#endif
>> +
>>   #endif /* !__ASSEMBLY__ */
>>     #ifndef io_remap_pfn_range
>> diff --git a/init/main.c b/init/main.c
>> index da1bc0b60a7d..5e59e6ac0780 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -1197,6 +1197,7 @@ static noinline void __init kernel_init_freeable(void)
>>       sched_init_smp();
>>         page_alloc_init_late();
>> +    debug_vm_pgtable();
> 
> Wouldn't it be better to call debug_vm_pgtable() in kernel_init() between the call to async_synchronise_full() and ftrace_free_init_mem() ?

IIRC, proposed location is the earliest we could call debug_vm_pgtable().
Is there any particular benefit or reason to move it into kernel_init() ?

> 
>>       /* Initialize page ext after all struct pages are initialized. */
>>       page_ext_init();
>>   diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index 5ffe144c9794..7cceae923c05 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -653,6 +653,12 @@ config SCHED_STACK_END_CHECK
>>         data corruption or a sporadic crash at a later stage once the region
>>         is examined. The runtime overhead introduced is minimal.
>>   +config ARCH_HAS_DEBUG_VM_PGTABLE
>> +    bool
>> +    help
>> +      An architecture should select this when it can successfully
>> +      build and run DEBUG_VM_PGTABLE.
>> +
>>   config DEBUG_VM
>>       bool "Debug VM"
>>       depends on DEBUG_KERNEL
>> @@ -688,6 +694,22 @@ config DEBUG_VM_PGFLAGS
>>           If unsure, say N.
>>   +config DEBUG_VM_PGTABLE
>> +    bool "Debug arch page table for semantics compliance"
>> +    depends on MMU
>> +    depends on DEBUG_VM
> 
> Does it really need to depend on DEBUG_VM ?

No. It seemed better to package this test along with DEBUG_VM (although I
dont remember the conversation around it) and hence this dependency.

> I think we could make it standalone and 'default y if DEBUG_VM' instead.

Which will yield the same result like before but in a different way. But
yes, this test could go about either way but unless there is a good enough
reason why change the current one.

> 
>> +    depends on ARCH_HAS_DEBUG_VM_PGTABLE
>> +    default y
>> +    help
>> +      This option provides a debug method which can be used to test
>> +      architecture page table helper functions on various platforms in
>> +      verifying if they comply with expected generic MM semantics. This
>> +      will help architecture code in making sure that any changes or
>> +      new additions of these helpers still conform to expected
>> +      semantics of the generic MM.
>> +
>> +      If unsure, say N.
>> +
> 
> Does it make sense to make it 'default y' and say 'If unsure, say N' ?

No it does. Not when it defaults 'y' unconditionally. Will drop the last
sentence "If unsure, say N". Nice catch, thank you.

> 
>>   config ARCH_HAS_DEBUG_VIRTUAL
>>       bool
>>   
> 
> Christophe
>
Anshuman Khandual Jan. 30, 2020, 1:11 p.m. UTC | #23
On 01/30/2020 03:50 AM, Gerald Schaefer wrote:
> On Tue, 28 Jan 2020 06:57:53 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>> This adds tests which will validate architecture page table helpers and
>> other accessors in their compliance with expected generic MM semantics.
>> This will help various architectures in validating changes to existing
>> page table helpers or addition of new ones.
>>
>> This test covers basic page table entry transformations including but not
>> limited to old, young, dirty, clean, write, write protect etc at various
>> level along with populating intermediate entries with next page table page
>> and validating them.
>>
>> Test page table pages are allocated from system memory with required size
>> and alignments. The mapped pfns at page table levels are derived from a
>> real pfn representing a valid kernel text symbol. This test gets called
>> right after page_alloc_init_late().
>>
>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>> arm64. Going forward, other architectures too can enable this after fixing
>> build or runtime problems (if any) with their page table helpers.
>>
>> Folks interested in making sure that a given platform's page table helpers
>> conform to expected generic MM semantics should enable the above config
>> which will just trigger this test during boot. Any non conformity here will
>> be reported as an warning which would need to be fixed. This test will help
>> catch any changes to the agreed upon semantics expected from generic MM and
>> enable platforms to accommodate it thereafter.
>>
> 
> [...]
> 
>>
>> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>		#PPC32
> 
> Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> # s390

Thanks for testing.

> 
> Thanks again for this effort, and for keeping up the spirit against
> all odds and even after 12 iterations :-)
> 
>>
>> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> new file mode 100644
>> index 000000000000..f3f8111edbe3
>> --- /dev/null
>> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> @@ -0,0 +1,35 @@
>> +#
>> +# Feature name:          debug-vm-pgtable
>> +#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
>> +#         description:   arch supports pgtable tests for semantics compliance
>> +#
>> +    -----------------------
>> +    |         arch |status|
>> +    -----------------------
>> +    |       alpha: | TODO |
>> +    |         arc: |  ok  |
>> +    |         arm: | TODO |
>> +    |       arm64: |  ok  |
>> +    |         c6x: | TODO |
>> +    |        csky: | TODO |
>> +    |       h8300: | TODO |
>> +    |     hexagon: | TODO |
>> +    |        ia64: | TODO |
>> +    |        m68k: | TODO |
>> +    |  microblaze: | TODO |
>> +    |        mips: | TODO |
>> +    |       nds32: | TODO |
>> +    |       nios2: | TODO |
>> +    |    openrisc: | TODO |
>> +    |      parisc: | TODO |
>> +    |  powerpc/32: |  ok  |
>> +    |  powerpc/64: | TODO |
>> +    |       riscv: | TODO |
>> +    |        s390: | TODO |
> 
> s390 is ok now, with my patches included in v5.5-rc1. So you can now add
> 
> --- a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -25,7 +25,7 @@
>      |  powerpc/32: |  ok  |
>      |  powerpc/64: | TODO |
>      |       riscv: | TODO |
> -    |        s390: | TODO |
> +    |        s390: |  ok  |
>      |          sh: | TODO |
>      |       sparc: | TODO |
>      |          um: | TODO |
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -59,6 +59,7 @@ config KASAN_SHADOW_OFFSET
>  config S390
>  	def_bool y
>  	select ARCH_BINFMT_ELF_STATE
> +	select ARCH_HAS_DEBUG_VM_PGTABLE
>  	select ARCH_HAS_DEVMEM_IS_ALLOWED
>  	select ARCH_HAS_ELF_RANDOMIZE
>  	select ARCH_HAS_FORTIFY_SOURCE

Sure, will add this up.
Anshuman Khandual Jan. 30, 2020, 1:32 p.m. UTC | #24
On 01/30/2020 12:57 PM, Mike Rapoport wrote:
> On Wed, Jan 29, 2020 at 11:20:44PM +0100, Gerald Schaefer wrote:
>> On Mon, 27 Jan 2020 22:33:08 -0500
>>
>> For example, who would have thought that pXd_bad() is supposed to
>> report large entries as bad? It's not really documented anywhere,
> 
> A bit off-topic,
> 
> @Anshuman, maybe you could start a Documentation/ patch that describes at
> least some of the pXd_whaterver()?
> Or that would be too much to ask? ;-)

No, it would not be :) I have been documenting the expected semantics for
the helpers in the test itself. The idea is to collate them all (have been
working on some additional tests but waiting for this one to get merged
first) here and once most of the test gets settled, will move semantics
documentation from here into Documentation/ directory in a proper format.

/*
 * Basic operations
 *
 * mkold(entry)			= An old and not a young entry
 * mkyoung(entry)		= A young and not an old entry
 * mkdirty(entry)		= A dirty and not a clean entry
 * mkclean(entry)		= A clean and not a dirty entry
 * mkwrite(entry)		= A write and not a write protected entry
 * wrprotect(entry)		= A write protected and not a write entry
 * pxx_bad(entry)		= A mapped and non-table entry
 * pxx_same(entry1, entry2)	= Both entries hold the exact same value
 */ 



> 
>> so we just checked them for sanity like normal entries, which
>> apparently worked fine so far, but for how long?
>
Christophe Leroy Jan. 30, 2020, 2:13 p.m. UTC | #25
Le 30/01/2020 à 14:04, Anshuman Khandual a écrit :
> 
> On 01/28/2020 10:35 PM, Christophe Leroy wrote:
>>
>>
>> Le 28/01/2020 à 02:27, Anshuman Khandual a écrit :
>>> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
>>> index 0b6c4042942a..fb0e76d254b3 100644
>>> --- a/arch/x86/include/asm/pgtable_64.h
>>> +++ b/arch/x86/include/asm/pgtable_64.h
>>> @@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { }
>>>      struct mm_struct;
>>>    +#define mm_p4d_folded mm_p4d_folded
>>> +static inline bool mm_p4d_folded(struct mm_struct *mm)
>>> +{
>>> +    return !pgtable_l5_enabled();
>>> +}
>>> +
>>
>> For me this should be part of another patch, it is not directly linked to the tests.
> 
> We did discuss about this earlier and Kirril mentioned its not worth
> a separate patch.
> 
> https://lore.kernel.org/linux-arm-kernel/20190913091305.rkds4f3fqv3yjhjy@box/

For me it would make sense to not mix this patch which implement tests, 
and changes that are needed for the test to work (or even build) on the 
different architectures.

But that's up to you.

> 
>>
>>>    void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
>>>    void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
>>>    diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
>>> index 798ea36a0549..e0b04787e789 100644
>>> --- a/include/asm-generic/pgtable.h
>>> +++ b/include/asm-generic/pgtable.h
>>> @@ -1208,6 +1208,12 @@ static inline bool arch_has_pfn_modify_check(void)
>>>    # define PAGE_KERNEL_EXEC PAGE_KERNEL
>>>    #endif
>>>    +#ifdef CONFIG_DEBUG_VM_PGTABLE
>>
>> Not sure it is a good idea to put that in include/asm-generic/pgtable.h
> 
> Logically that is the right place, as it is related to page table but
> not something platform related.

I can't see any debug related features in that file.

> 
>>
>> By doing this you are forcing a rebuild of almost all files, whereas only init/main.o and mm/debug_vm_pgtable.o should be rebuilt when activating this config option.
> 
> I agreed but whats the alternative ? We could move these into init/main.c
> to make things simpler but will that be a right place, given its related
> to generic page table.

What about linux/mmdebug.h instead ? (I have not checked if it would 
reduce the impact, but that's where things related to CONFIG_DEBUG_VM 
seems to be).

Otherwise, you can just create new file, for instance 
<linux/mmdebug-pgtable.h> and include that file only in the init/main.c 
and mm/debug_vm_pgtable.c



> 
>>
>>> +extern void debug_vm_pgtable(void);
>>
>> Please don't use the 'extern' keyword, it is useless and not to be used for functions declaration.
> 
> Really ? But, there are tons of examples doing the same thing both in
> generic and platform code as well.

Yes, but how can we improve if we blindly copy the errors from the past 
? Having tons of 'extern' doesn't mean we must add more.

I think checkpatch.pl usually complains when a patch brings a new 
unreleval extern symbol.

> 
>>
>>> +#else
>>> +static inline void debug_vm_pgtable(void) { }
>>> +#endif
>>> +
>>>    #endif /* !__ASSEMBLY__ */
>>>      #ifndef io_remap_pfn_range
>>> diff --git a/init/main.c b/init/main.c
>>> index da1bc0b60a7d..5e59e6ac0780 100644
>>> --- a/init/main.c
>>> +++ b/init/main.c
>>> @@ -1197,6 +1197,7 @@ static noinline void __init kernel_init_freeable(void)
>>>        sched_init_smp();
>>>          page_alloc_init_late();
>>> +    debug_vm_pgtable();
>>
>> Wouldn't it be better to call debug_vm_pgtable() in kernel_init() between the call to async_synchronise_full() and ftrace_free_init_mem() ?
> 
> IIRC, proposed location is the earliest we could call debug_vm_pgtable().
> Is there any particular benefit or reason to move it into kernel_init() ?

It would avoid having it lost in the middle of drivers logs, would be 
close to the end of init, at a place we can't miss it, close to the 
result of other tests like CONFIG_DEBUG_RODATA_TEST for instance.

At the moment, you have to look for it to be sure the test is done and 
what the result is.

> 
>>
>>>        /* Initialize page ext after all struct pages are initialized. */
>>>        page_ext_init();
>>>    diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>>> index 5ffe144c9794..7cceae923c05 100644
>>> --- a/lib/Kconfig.debug
>>> +++ b/lib/Kconfig.debug
>>> @@ -653,6 +653,12 @@ config SCHED_STACK_END_CHECK
>>>          data corruption or a sporadic crash at a later stage once the region
>>>          is examined. The runtime overhead introduced is minimal.
>>>    +config ARCH_HAS_DEBUG_VM_PGTABLE
>>> +    bool
>>> +    help
>>> +      An architecture should select this when it can successfully
>>> +      build and run DEBUG_VM_PGTABLE.
>>> +
>>>    config DEBUG_VM
>>>        bool "Debug VM"
>>>        depends on DEBUG_KERNEL
>>> @@ -688,6 +694,22 @@ config DEBUG_VM_PGFLAGS
>>>            If unsure, say N.
>>>    +config DEBUG_VM_PGTABLE
>>> +    bool "Debug arch page table for semantics compliance"
>>> +    depends on MMU
>>> +    depends on DEBUG_VM
>>
>> Does it really need to depend on DEBUG_VM ?
> 
> No. It seemed better to package this test along with DEBUG_VM (although I
> dont remember the conversation around it) and hence this dependency.

Yes but it perfectly work as standalone. The more easy it is to activate 
and the more people will use it. DEBUG_VM obliges to rebuild the kernel 
entirely and could modify the behaviour. Could the helpers we are 
testing behave differently when DEBUG_VM is not set ? I think it's good 
the test things as close as possible to final config.

> 
>> I think we could make it standalone and 'default y if DEBUG_VM' instead.
> 
> Which will yield the same result like before but in a different way. But
> yes, this test could go about either way but unless there is a good enough
> reason why change the current one.

I think if we want people to really use it on other architectures it 
must be possible to activate it without having to modify Kconfig. 
Otherwise people won't even know the test exists and the architecture 
fails the test.

The purpose of a test suite is to detect bugs. If you can't run the test 
until you have fixed the bugs, I guess nobody will ever detect the bugs 
and they will never be fixed.

So I think:
- the test should be 'default y' when ARCH_HAS_DEBUG_VM_PGTABLE is selected
- the test should be 'default n' when ARCH_HAS_DEBUG_VM_PGTABLE is not 
selected, and it should be user selectable if EXPERT is selected.

Something like:

config DEBUG_VM_PGTABLE
     bool "Debug arch page table for semantics compliance" if 
ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
     depends on MMU
     default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
     default 'y' if DEBUG_VM


> 
>>
>>> +    depends on ARCH_HAS_DEBUG_VM_PGTABLE
>>> +    default y
>>> +    help
>>> +      This option provides a debug method which can be used to test
>>> +      architecture page table helper functions on various platforms in
>>> +      verifying if they comply with expected generic MM semantics. This
>>> +      will help architecture code in making sure that any changes or
>>> +      new additions of these helpers still conform to expected
>>> +      semantics of the generic MM.
>>> +
>>> +      If unsure, say N.
>>> +
>>
>> Does it make sense to make it 'default y' and say 'If unsure, say N' ?
> 
> No it does. Not when it defaults 'y' unconditionally. Will drop the last
> sentence "If unsure, say N". Nice catch, thank you.

Well I was not asking if 'default y' was making sense but only if 'If 
unsure say N' was making sense due to the 'default y'. You got it.

Christophe
Anshuman Khandual Jan. 30, 2020, 3:06 p.m. UTC | #26
On 01/28/2020 06:57 AM, Anshuman Khandual wrote:
> This adds tests which will validate architecture page table helpers and
> other accessors in their compliance with expected generic MM semantics.
> This will help various architectures in validating changes to existing
> page table helpers or addition of new ones.
> 
> This test covers basic page table entry transformations including but not
> limited to old, young, dirty, clean, write, write protect etc at various
> level along with populating intermediate entries with next page table page
> and validating them.
> 
> Test page table pages are allocated from system memory with required size
> and alignments. The mapped pfns at page table levels are derived from a
> real pfn representing a valid kernel text symbol. This test gets called
> right after page_alloc_init_late().
> 
> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.
> 
> Folks interested in making sure that a given platform's page table helpers
> conform to expected generic MM semantics should enable the above config
> which will just trigger this test during boot. Any non conformity here will
> be reported as an warning which would need to be fixed. This test will help
> catch any changes to the agreed upon semantics expected from generic MM and
> enable platforms to accommodate it thereafter.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Steven Price <Steven.Price@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Sri Krishna chowdary <schowdary@nvidia.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: James Hogan <jhogan@kernel.org>
> Cc: Paul Burton <paul.burton@mips.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: linux-snps-arc@lists.infradead.org
> Cc: linux-mips@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org

I should have included mailing lists for all missing platforms here.
Will add them in the patch next time around but for now just adding
them here explicitly so that hopefully in case some of them can build
and run the test successfully on respective platforms.

ALPHA:

+ linux-alpha@vger.kernel.org
+ Richard Henderson <rth@twiddle.net>
+ Ivan Kokshaysky <ink@jurassic.park.msu.ru>
+ Matt Turner <mattst88@gmail.com>

C6X:

+ linux-c6x-dev@linux-c6x.org
+ Mark Salter <msalter@redhat.com>
+ Aurelien Jacquiot <jacquiot.aurelien@gmail.com>

H8300:

+ uclinux-h8-devel@lists.sourceforge.jp
+ Yoshinori Sato <ysato@users.sourceforge.jp>

HEXAGON:

+ linux-hexagon@vger.kernel.org
+ Brian Cain <bcain@codeaurora.org>

M68K:

+ linux-m68k@lists.linux-m68k.org
+ Geert Uytterhoeven <geert@linux-m68k.org>

MICROBLAZE:

+ Michal Simek <monstr@monstr.eu>

RISCV:

+ linux-riscv@lists.infradead.org
+ Paul Walmsley <paul.walmsley@sifive.com>
+ Palmer Dabbelt <palmer@dabbelt.com>

UNICORE32:

+ Guan Xuetao <gxt@pku.edu.cn>

XTENSA:

+ linux-xtensa@linux-xtensa.org
+ Chris Zankel <chris@zankel.net>
+ Max Filippov <jcmvbkbc@gmail.com>

Please feel free to add others if I have missed.
Anshuman Khandual Feb. 2, 2020, 7:18 a.m. UTC | #27
On 01/30/2020 07:43 PM, Christophe Leroy wrote:
> 
> 
> Le 30/01/2020 à 14:04, Anshuman Khandual a écrit :
>>
>> On 01/28/2020 10:35 PM, Christophe Leroy wrote:
>>>
>>>
>>> Le 28/01/2020 à 02:27, Anshuman Khandual a écrit :
>>>> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
>>>> index 0b6c4042942a..fb0e76d254b3 100644
>>>> --- a/arch/x86/include/asm/pgtable_64.h
>>>> +++ b/arch/x86/include/asm/pgtable_64.h
>>>> @@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { }
>>>>      struct mm_struct;
>>>>    +#define mm_p4d_folded mm_p4d_folded
>>>> +static inline bool mm_p4d_folded(struct mm_struct *mm)
>>>> +{
>>>> +    return !pgtable_l5_enabled();
>>>> +}
>>>> +
>>>
>>> For me this should be part of another patch, it is not directly linked to the tests.
>>
>> We did discuss about this earlier and Kirril mentioned its not worth
>> a separate patch.
>>
>> https://lore.kernel.org/linux-arm-kernel/20190913091305.rkds4f3fqv3yjhjy@box/
> 
> For me it would make sense to not mix this patch which implement tests, and changes that are needed for the test to work (or even build) on the different architectures.
> 
> But that's up to you.
> 
>>
>>>
>>>>    void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
>>>>    void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
>>>>    diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
>>>> index 798ea36a0549..e0b04787e789 100644
>>>> --- a/include/asm-generic/pgtable.h
>>>> +++ b/include/asm-generic/pgtable.h
>>>> @@ -1208,6 +1208,12 @@ static inline bool arch_has_pfn_modify_check(void)
>>>>    # define PAGE_KERNEL_EXEC PAGE_KERNEL
>>>>    #endif
>>>>    +#ifdef CONFIG_DEBUG_VM_PGTABLE
>>>
>>> Not sure it is a good idea to put that in include/asm-generic/pgtable.h
>>
>> Logically that is the right place, as it is related to page table but
>> not something platform related.
> 
> I can't see any debug related features in that file.
> 
>>
>>>
>>> By doing this you are forcing a rebuild of almost all files, whereas only init/main.o and mm/debug_vm_pgtable.o should be rebuilt when activating this config option.
>>
>> I agreed but whats the alternative ? We could move these into init/main.c
>> to make things simpler but will that be a right place, given its related
>> to generic page table.
> 
> What about linux/mmdebug.h instead ? (I have not checked if it would reduce the impact, but that's where things related to CONFIG_DEBUG_VM seems to be).
> 
> Otherwise, you can just create new file, for instance <linux/mmdebug-pgtable.h> and include that file only in the init/main.c and mm/debug_vm_pgtable.c

IMHO it might not be wise to add yet another header file for this purpose.
Instead lets use <linux/mmdebug.h> in line with DEBUG_VM, DEBUG_VM_PGFLAGS,
DEBUG_VIRTUAL (which is also a stand alone test). A simple grep shows that
the impact of mmdebug.h would be less than generic pgtable.h header.

> 
> 
> 
>>
>>>
>>>> +extern void debug_vm_pgtable(void);
>>>
>>> Please don't use the 'extern' keyword, it is useless and not to be used for functions declaration.
>>
>> Really ? But, there are tons of examples doing the same thing both in
>> generic and platform code as well.
> 
> Yes, but how can we improve if we blindly copy the errors from the past ? Having tons of 'extern' doesn't mean we must add more.
> 
> I think checkpatch.pl usually complains when a patch brings a new unreleval extern symbol.

Sure np, will drop it. But checkpatch.pl never complained.

> 
>>
>>>
>>>> +#else
>>>> +static inline void debug_vm_pgtable(void) { }
>>>> +#endif
>>>> +
>>>>    #endif /* !__ASSEMBLY__ */
>>>>      #ifndef io_remap_pfn_range
>>>> diff --git a/init/main.c b/init/main.c
>>>> index da1bc0b60a7d..5e59e6ac0780 100644
>>>> --- a/init/main.c
>>>> +++ b/init/main.c
>>>> @@ -1197,6 +1197,7 @@ static noinline void __init kernel_init_freeable(void)
>>>>        sched_init_smp();
>>>>          page_alloc_init_late();
>>>> +    debug_vm_pgtable();
>>>
>>> Wouldn't it be better to call debug_vm_pgtable() in kernel_init() between the call to async_synchronise_full() and ftrace_free_init_mem() ?
>>
>> IIRC, proposed location is the earliest we could call debug_vm_pgtable().
>> Is there any particular benefit or reason to move it into kernel_init() ?
> 
> It would avoid having it lost in the middle of drivers logs, would be close to the end of init, at a place we can't miss it, close to the result of other tests like CONFIG_DEBUG_RODATA_TEST for instance.
> 
> At the moment, you have to look for it to be sure the test is done and what the result is.

Sure, will move it.

> 
>>
>>>
>>>>        /* Initialize page ext after all struct pages are initialized. */
>>>>        page_ext_init();
>>>>    diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>>>> index 5ffe144c9794..7cceae923c05 100644
>>>> --- a/lib/Kconfig.debug
>>>> +++ b/lib/Kconfig.debug
>>>> @@ -653,6 +653,12 @@ config SCHED_STACK_END_CHECK
>>>>          data corruption or a sporadic crash at a later stage once the region
>>>>          is examined. The runtime overhead introduced is minimal.
>>>>    +config ARCH_HAS_DEBUG_VM_PGTABLE
>>>> +    bool
>>>> +    help
>>>> +      An architecture should select this when it can successfully
>>>> +      build and run DEBUG_VM_PGTABLE.
>>>> +
>>>>    config DEBUG_VM
>>>>        bool "Debug VM"
>>>>        depends on DEBUG_KERNEL
>>>> @@ -688,6 +694,22 @@ config DEBUG_VM_PGFLAGS
>>>>            If unsure, say N.
>>>>    +config DEBUG_VM_PGTABLE
>>>> +    bool "Debug arch page table for semantics compliance"
>>>> +    depends on MMU
>>>> +    depends on DEBUG_VM
>>>
>>> Does it really need to depend on DEBUG_VM ?
>>
>> No. It seemed better to package this test along with DEBUG_VM (although I
>> dont remember the conversation around it) and hence this dependency.
> 
> Yes but it perfectly work as standalone. The more easy it is to activate and the more people will use it. DEBUG_VM obliges to rebuild the kernel entirely and could modify the behaviour. Could the helpers we are testing behave differently when DEBUG_VM is not set ? I think it's good the test things as close as possible to final config.

Makes sense. There is no functional dependency for the individual tests
here on DEBUG_VM.

> 
>>
>>> I think we could make it standalone and 'default y if DEBUG_VM' instead.
>>
>> Which will yield the same result like before but in a different way. But
>> yes, this test could go about either way but unless there is a good enough
>> reason why change the current one.
> 
> I think if we want people to really use it on other architectures it must be possible to activate it without having to modify Kconfig. Otherwise people won't even know the test exists and the architecture fails the test.
> 
> The purpose of a test suite is to detect bugs. If you can't run the test until you have fixed the bugs, I guess nobody will ever detect the bugs and they will never be fixed.
> 
> So I think:
> - the test should be 'default y' when ARCH_HAS_DEBUG_VM_PGTABLE is selected
> - the test should be 'default n' when ARCH_HAS_DEBUG_VM_PGTABLE is not selected, and it should be user selectable if EXPERT is selected.
> 
> Something like:
> 
> config DEBUG_VM_PGTABLE
>     bool "Debug arch page table for semantics compliance" if ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>     depends on MMU

(ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT) be moved along side MMU on the same line ?

>     default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
>     default 'y' if DEBUG_VM

This looks good, at least until we get all platforms enabled. Will do all these
changes along with s390 enablement and re-spin.

> 
> 
>>
>>>
>>>> +    depends on ARCH_HAS_DEBUG_VM_PGTABLE
>>>> +    default y
>>>> +    help
>>>> +      This option provides a debug method which can be used to test
>>>> +      architecture page table helper functions on various platforms in
>>>> +      verifying if they comply with expected generic MM semantics. This
>>>> +      will help architecture code in making sure that any changes or
>>>> +      new additions of these helpers still conform to expected
>>>> +      semantics of the generic MM.
>>>> +
>>>> +      If unsure, say N.
>>>> +
>>>
>>> Does it make sense to make it 'default y' and say 'If unsure, say N' ?
>>
>> No it does. Not when it defaults 'y' unconditionally. Will drop the last
>> sentence "If unsure, say N". Nice catch, thank you.
> 
> Well I was not asking if 'default y' was making sense but only if 'If unsure say N' was making sense due to the 'default y'. You got it.
> 
> Christophe
> 
>
Anshuman Khandual Feb. 2, 2020, 8:26 a.m. UTC | #28
On 01/30/2020 06:34 PM, Anshuman Khandual wrote:
> On 01/28/2020 10:35 PM, Christophe Leroy wrote:
>>
>> Le 28/01/2020 à 02:27, Anshuman Khandual a écrit :
>>> This adds tests which will validate architecture page table helpers and
>>> other accessors in their compliance with expected generic MM semantics.
>>> This will help various architectures in validating changes to existing
>>> page table helpers or addition of new ones.
>>>
>>> This test covers basic page table entry transformations including but not
>>> limited to old, young, dirty, clean, write, write protect etc at various
>>> level along with populating intermediate entries with next page table page
>>> and validating them.
>>>
>>> Test page table pages are allocated from system memory with required size
>>> and alignments. The mapped pfns at page table levels are derived from a
>>> real pfn representing a valid kernel text symbol. This test gets called
>>> right after page_alloc_init_late().
>>>
>>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>>> arm64. Going forward, other architectures too can enable this after fixing
>>> build or runtime problems (if any) with their page table helpers.
>>>
>>> Folks interested in making sure that a given platform's page table helpers
>>> conform to expected generic MM semantics should enable the above config
>>> which will just trigger this test during boot. Any non conformity here will
>>> be reported as an warning which would need to be fixed. This test will help
>>> catch any changes to the agreed upon semantics expected from generic MM and
>>> enable platforms to accommodate it thereafter.
>>>
>> [...]
>>
>>> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>        #PPC32
>> Also tested on PPC64 (under QEMU): book3s/64 64k pages, book3s/64 4k pages and book3e/64
> Hmm but earlier Michael Ellerman had reported some problems while
> running these tests on PPC64, a soft lock up in hash__pte_update()
> and a kernel BUG (radix MMU). Are those problems gone away now ?
> 
> Details in this thread - https://patchwork.kernel.org/patch/11214603/
> 

It is always better to have more platforms enabled than not. But lets keep
this test disabled on PPC64 for now, if there is any inconsistency between
results while running this under QEMU and on actual systems.
Christophe Leroy Feb. 2, 2020, 8:31 a.m. UTC | #29
Le 02/02/2020 à 08:18, Anshuman Khandual a écrit :
> On 01/30/2020 07:43 PM, Christophe Leroy wrote:
>>
>>
>> Le 30/01/2020 à 14:04, Anshuman Khandual a écrit :
>>>
>>> On 01/28/2020 10:35 PM, Christophe Leroy wrote:
>>
>>>
>>>> I think we could make it standalone and 'default y if DEBUG_VM' instead.
>>>
>>> Which will yield the same result like before but in a different way. But
>>> yes, this test could go about either way but unless there is a good enough
>>> reason why change the current one.
>>
>> I think if we want people to really use it on other architectures it must be possible to activate it without having to modify Kconfig. Otherwise people won't even know the test exists and the architecture fails the test.
>>
>> The purpose of a test suite is to detect bugs. If you can't run the test until you have fixed the bugs, I guess nobody will ever detect the bugs and they will never be fixed.
>>
>> So I think:
>> - the test should be 'default y' when ARCH_HAS_DEBUG_VM_PGTABLE is selected
>> - the test should be 'default n' when ARCH_HAS_DEBUG_VM_PGTABLE is not selected, and it should be user selectable if EXPERT is selected.
>>
>> Something like:
>>
>> config DEBUG_VM_PGTABLE
>>      bool "Debug arch page table for semantics compliance" if ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>      depends on MMU
> 
> (ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT) be moved along side MMU on the same line ?

Yes could also go along side MMU, or could be a depend by itself:
	depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT

> 
>>      default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
>>      default 'y' if DEBUG_VM
> 
> This looks good, at least until we get all platforms enabled. Will do all these
> changes along with s390 enablement and re-spin.

Christophe
Qian Cai Feb. 2, 2020, 11:26 a.m. UTC | #30
> On Jan 30, 2020, at 9:13 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
> config DEBUG_VM_PGTABLE
>    bool "Debug arch page table for semantics compliance" if ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>    depends on MMU
>    default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
>    default 'y' if DEBUG_VM

Does it really necessary to potentially force all bots to run this? Syzbot, kernel test robot etc? Does it ever pay off for all their machine times there?
Christophe Leroy Feb. 3, 2020, 3:14 p.m. UTC | #31
Le 02/02/2020 à 12:26, Qian Cai a écrit :
> 
> 
>> On Jan 30, 2020, at 9:13 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>
>> config DEBUG_VM_PGTABLE
>>     bool "Debug arch page table for semantics compliance" if ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>     depends on MMU
>>     default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
>>     default 'y' if DEBUG_VM
> 
> Does it really necessary to potentially force all bots to run this? Syzbot, kernel test robot etc? Does it ever pay off for all their machine times there?
> 

Machine time ?

On a 32 bits powerpc running at 132 MHz, the tests takes less than 10ms. 
Is it worth taking the risk of not detecting faults by not selecting it 
by default ?

[    5.656916] debug_vm_pgtable: debug_vm_pgtable: Validating 
architecture page table helpers
[    5.665661] debug_vm_pgtable: debug_vm_pgtable: Validated 
architecture page table helpers

Christophe
Qian Cai Feb. 3, 2020, 3:48 p.m. UTC | #32
On Mon, 2020-02-03 at 16:14 +0100, Christophe Leroy wrote:
> 
> Le 02/02/2020 à 12:26, Qian Cai a écrit :
> > 
> > 
> > > On Jan 30, 2020, at 9:13 AM, Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> > > 
> > > config DEBUG_VM_PGTABLE
> > >     bool "Debug arch page table for semantics compliance" if ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> > >     depends on MMU
> > >     default 'n' if !ARCH_HAS_DEBUG_VM_PGTABLE
> > >     default 'y' if DEBUG_VM
> > 
> > Does it really necessary to potentially force all bots to run this? Syzbot, kernel test robot etc? Does it ever pay off for all their machine times there?
> > 
> 
> Machine time ?
> 
> On a 32 bits powerpc running at 132 MHz, the tests takes less than 10ms. 
> Is it worth taking the risk of not detecting faults by not selecting it 
> by default ?

The risk is quite low as Catalin mentioned this thing is not to detect
regressions but rather for arch/mm maintainers.

I do appreciate the efforts to get everyone as possible to run this thing,
so it get more notices once it is broken. However, DEBUG_VM seems like such
a generic Kconfig those days that have even been enabled by default for
Fedora Linux, so I would rather see a more sensitive default been taken
even though the test runtime is fairly quickly on a small machine for now.

> 
> [    5.656916] debug_vm_pgtable: debug_vm_pgtable: Validating 
> architecture page table helpers
> [    5.665661] debug_vm_pgtable: debug_vm_pgtable: Validated 
> architecture page table helpers
Catalin Marinas Feb. 10, 2020, 3:37 p.m. UTC | #33
On Tue, Jan 28, 2020 at 06:57:53AM +0530, Anshuman Khandual wrote:
> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
> arm64. Going forward, other architectures too can enable this after fixing
> build or runtime problems (if any) with their page table helpers.

It may be worth posting the next version to linux-arch to reach out to
other arch maintainers.

Also I've seen that you posted a v13 but it hasn't reached
linux-arm-kernel (likely held in moderation because of the large amount
of addresses cc'ed) and I don't normally follow LKML. I'm not cc'ed to
this patch either (which is fine as long as you post to a list that I
read).

Since I started the reply on v12 about a week ago, I'll follow up here.
When you post a v14, please trim the people on cc only to those strictly
necessary (e.g. arch maintainers, linux-mm, linux-arch and lkml).

> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> new file mode 100644
> index 000000000000..f3f8111edbe3
> --- /dev/null
> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
> @@ -0,0 +1,35 @@
> +#
> +# Feature name:          debug-vm-pgtable
> +#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
> +#         description:   arch supports pgtable tests for semantics compliance
> +#
> +    -----------------------
> +    |         arch |status|
> +    -----------------------
> +    |       alpha: | TODO |
> +    |         arc: |  ok  |
> +    |         arm: | TODO |

I'm sure you can find some arm32 hardware around (or a VM) to give this
a try ;).

> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 0b6c4042942a..fb0e76d254b3 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
[...]
> @@ -1197,6 +1197,7 @@ static noinline void __init kernel_init_freeable(void)
>  	sched_init_smp();
>  
>  	page_alloc_init_late();
> +	debug_vm_pgtable();
>  	/* Initialize page ext after all struct pages are initialized. */
>  	page_ext_init();

I guess you could even make debug_vm_pgtable() an early_initcall(). I
don't have a strong opinion either way.

> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> new file mode 100644
> index 000000000000..0f37f32d15f1
> --- /dev/null
> +++ b/mm/debug_vm_pgtable.c
> @@ -0,0 +1,388 @@
[...]
> +/*
> + * Basic operations
> + *
> + * mkold(entry)			= An old and not a young entry
> + * mkyoung(entry)		= A young and not an old entry
> + * mkdirty(entry)		= A dirty and not a clean entry
> + * mkclean(entry)		= A clean and not a dirty entry
> + * mkwrite(entry)		= A write and not a write protected entry
> + * wrprotect(entry)		= A write protected and not a write entry
> + * pxx_bad(entry)		= A mapped and non-table entry
> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
> + */
> +#define VMFLAGS	(VM_READ|VM_WRITE|VM_EXEC)
> +
> +/*
> + * On s390 platform, the lower 12 bits are used to identify given page table
> + * entry type and for other arch specific requirements. But these bits might
> + * affect the ability to clear entries with pxx_clear(). So while loading up
> + * the entries skip all lower 12 bits in order to accommodate s390 platform.
> + * It does not have affect any other platform.
> + */
> +#define RANDOM_ORVALUE	(0xfffffffffffff000UL)

I'd suggest you generate this mask with something like
GENMASK(BITS_PER_LONG, PAGE_SHIFT).

> +#define RANDOM_NZVALUE	(0xff)
> +
> +static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> +	pte_t pte = pfn_pte(pfn, prot);
> +
> +	WARN_ON(!pte_same(pte, pte));
> +	WARN_ON(!pte_young(pte_mkyoung(pte)));
> +	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
> +	WARN_ON(!pte_write(pte_mkwrite(pte)));
> +	WARN_ON(pte_young(pte_mkold(pte)));
> +	WARN_ON(pte_dirty(pte_mkclean(pte)));
> +	WARN_ON(pte_write(pte_wrprotect(pte)));

Given that you start with rwx permissions set,
some of these ops would not have any effect. For example, on arm64 at
least, mkwrite clears a bit already cleared here. You could try with
multiple rwx combinations values (e.g. all set and all cleared) or maybe
something like below:

	WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte))));

You could also try something like this:

	WARN_ON(!pte_same(pte_wrprotect(pte), pte_wrprotect(pte_mkwrite(pte))));

though the above approach may not work for arm64 ptep_set_wrprotect() on
a dirty pte (if you extend these tests later).

> +}
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> +	pmd_t pmd = pfn_pmd(pfn, prot);
> +
> +	WARN_ON(!pmd_same(pmd, pmd));
> +	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
> +	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
> +	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
> +	WARN_ON(pmd_young(pmd_mkold(pmd)));
> +	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
> +	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
> +	/*
> +	 * A huge page does not point to next level page table
> +	 * entry. Hence this must qualify as pmd_bad().
> +	 */
> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
> +}
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> +	pud_t pud = pfn_pud(pfn, prot);
> +
> +	WARN_ON(!pud_same(pud, pud));
> +	WARN_ON(!pud_young(pud_mkyoung(pud)));
> +	WARN_ON(!pud_write(pud_mkwrite(pud)));
> +	WARN_ON(pud_write(pud_wrprotect(pud)));
> +	WARN_ON(pud_young(pud_mkold(pud)));
> +
> +	if (mm_pmd_folded(mm) || __is_defined(ARCH_HAS_4LEVEL_HACK))
> +		return;
> +
> +	/*
> +	 * A huge page does not point to next level page table
> +	 * entry. Hence this must qualify as pud_bad().
> +	 */
> +	WARN_ON(!pud_bad(pud_mkhuge(pud)));
> +}
> +#else
> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +#else
> +static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
> +#endif
> +
> +static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> +	p4d_t p4d;
> +
> +	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
> +	WARN_ON(!p4d_same(p4d, p4d));
> +}
> +
> +static void __init pgd_basic_tests(unsigned long pfn, pgprot_t prot)
> +{
> +	pgd_t pgd;
> +
> +	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
> +	WARN_ON(!pgd_same(pgd, pgd));
> +}
> +
> +#ifndef __ARCH_HAS_4LEVEL_HACK

This macro doesn't exist in the kernel anymore (it's a 5LEVEL now). But
can you not use the __PAGETABLE_PUD_FOLDED instead?

> +static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp)
> +{
> +	pud_t pud = READ_ONCE(*pudp);
> +
> +	if (mm_pmd_folded(mm))
> +		return;
> +
> +	pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
> +	WRITE_ONCE(*pudp, pud);
> +	pud_clear(pudp);
> +	pud = READ_ONCE(*pudp);
> +	WARN_ON(!pud_none(pud));
> +}
> +
> +static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
> +				      pmd_t *pmdp)
> +{
> +	pud_t pud;
> +
> +	if (mm_pmd_folded(mm))
> +		return;
> +	/*
> +	 * This entry points to next level page table page.
> +	 * Hence this must not qualify as pud_bad().
> +	 */
> +	pmd_clear(pmdp);
> +	pud_clear(pudp);
> +	pud_populate(mm, pudp, pmdp);
> +	pud = READ_ONCE(*pudp);
> +	WARN_ON(pud_bad(pud));
> +}
> +#else
> +static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp) { }
> +static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
> +				      pmd_t *pmdp)
> +{
> +}
> +#endif
> +
> +#ifndef __ARCH_HAS_5LEVEL_HACK

Could you use __PAGETABLE_P4D_FOLDED instead?

> +static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp)
> +{
> +	p4d_t p4d = READ_ONCE(*p4dp);
> +
> +	if (mm_pud_folded(mm))
> +		return;
> +
> +	p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE);
> +	WRITE_ONCE(*p4dp, p4d);
> +	p4d_clear(p4dp);
> +	p4d = READ_ONCE(*p4dp);
> +	WARN_ON(!p4d_none(p4d));
> +}

Otherwise the patch looks fine. As per the comment on v13, make sure you
don't break the build on any architecture, so this could either be an
opt-in or patch those architectures before this patch is applied.

Thanks.
Anshuman Khandual Feb. 12, 2020, 9:42 a.m. UTC | #34
On 02/10/2020 09:07 PM, Catalin Marinas wrote:
> On Tue, Jan 28, 2020 at 06:57:53AM +0530, Anshuman Khandual wrote:
>> This gets build and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
>> CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
>> select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
>> arm64. Going forward, other architectures too can enable this after fixing
>> build or runtime problems (if any) with their page table helpers.
> 
> It may be worth posting the next version to linux-arch to reach out to
> other arch maintainers.

Sure, will do.

> 
> Also I've seen that you posted a v13 but it hasn't reached
> linux-arm-kernel (likely held in moderation because of the large amount
> of addresses cc'ed) and I don't normally follow LKML. I'm not cc'ed to
> this patch either (which is fine as long as you post to a list that I
> read).

Right, the CC list on V13 was a disaster. I did not realize that it will
exceed the permitted limit when the lists will start refusing to take. In
fact, it looks like LKML did not get the email either.

> 
> Since I started the reply on v12 about a week ago, I'll follow up here.
> When you post a v14, please trim the people on cc only to those strictly
> necessary (e.g. arch maintainers, linux-mm, linux-arch and lkml).

Sure, will do.

> 
>> diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> new file mode 100644
>> index 000000000000..f3f8111edbe3
>> --- /dev/null
>> +++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
>> @@ -0,0 +1,35 @@
>> +#
>> +# Feature name:          debug-vm-pgtable
>> +#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
>> +#         description:   arch supports pgtable tests for semantics compliance
>> +#
>> +    -----------------------
>> +    |         arch |status|
>> +    -----------------------
>> +    |       alpha: | TODO |
>> +    |         arc: |  ok  |
>> +    |         arm: | TODO |
> 
> I'm sure you can find some arm32 hardware around (or a VM) to give this
> a try ;).

It does not build on arm32 and we dont have an agreement on how to go about
that either, hence will disable this test on IA64 and ARM (32) in order to
prevent the known build failures (as Andrew had requested).

> 
>> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
>> index 0b6c4042942a..fb0e76d254b3 100644
>> --- a/arch/x86/include/asm/pgtable_64.h
>> +++ b/arch/x86/include/asm/pgtable_64.h
> [...]
>> @@ -1197,6 +1197,7 @@ static noinline void __init kernel_init_freeable(void)
>>  	sched_init_smp();
>>  
>>  	page_alloc_init_late();
>> +	debug_vm_pgtable();
>>  	/* Initialize page ext after all struct pages are initialized. */
>>  	page_ext_init();
> 
> I guess you could even make debug_vm_pgtable() an early_initcall(). I
> don't have a strong opinion either way.
> 
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> new file mode 100644
>> index 000000000000..0f37f32d15f1
>> --- /dev/null
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -0,0 +1,388 @@
> [...]
>> +/*
>> + * Basic operations
>> + *
>> + * mkold(entry)			= An old and not a young entry
>> + * mkyoung(entry)		= A young and not an old entry
>> + * mkdirty(entry)		= A dirty and not a clean entry
>> + * mkclean(entry)		= A clean and not a dirty entry
>> + * mkwrite(entry)		= A write and not a write protected entry
>> + * wrprotect(entry)		= A write protected and not a write entry
>> + * pxx_bad(entry)		= A mapped and non-table entry
>> + * pxx_same(entry1, entry2)	= Both entries hold the exact same value
>> + */
>> +#define VMFLAGS	(VM_READ|VM_WRITE|VM_EXEC)
>> +
>> +/*
>> + * On s390 platform, the lower 12 bits are used to identify given page table
>> + * entry type and for other arch specific requirements. But these bits might
>> + * affect the ability to clear entries with pxx_clear(). So while loading up
>> + * the entries skip all lower 12 bits in order to accommodate s390 platform.
>> + * It does not have affect any other platform.
>> + */
>> +#define RANDOM_ORVALUE	(0xfffffffffffff000UL)
> 
> I'd suggest you generate this mask with something like
> GENMASK(BITS_PER_LONG, PAGE_SHIFT).

IIRC the lower 12 bits constrains on s390 platform might not be really related
to it's PAGE_SHIFT which can be a variable, but instead just a constant number.
But can definitely use GENMASK or it's variants here.

https://lkml.org/lkml/2019/9/5/862

> 
>> +#define RANDOM_NZVALUE	(0xff)
>> +
>> +static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +	pte_t pte = pfn_pte(pfn, prot);
>> +
>> +	WARN_ON(!pte_same(pte, pte));
>> +	WARN_ON(!pte_young(pte_mkyoung(pte)));
>> +	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
>> +	WARN_ON(!pte_write(pte_mkwrite(pte)));
>> +	WARN_ON(pte_young(pte_mkold(pte)));
>> +	WARN_ON(pte_dirty(pte_mkclean(pte)));
>> +	WARN_ON(pte_write(pte_wrprotect(pte)));
> 
> Given that you start with rwx permissions set,
> some of these ops would not have any effect. For example, on arm64 at
> least, mkwrite clears a bit already cleared here. You could try with

PTE_RDONLY !

> multiple rwx combinations values (e.g. all set and all cleared) or maybe

Which will require running the sequence of tests multiple times, each
time with different prot value (e.g all set or all clear). Wondering
if that would be better than the proposed single pass.

> something like below:
> 
> 	WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte))));

Hmm, we should run invert functions first for each function we are
trying to test ? That makes sense because any platform specific bit
combination (clear or set) for the function to be tested, will first
be flipped with it's invert function.

> 
> You could also try something like this:
> 
> 	WARN_ON(!pte_same(pte_wrprotect(pte), pte_wrprotect(pte_mkwrite(pte))));
> 
> though the above approach may not work for arm64 ptep_set_wrprotect() on
> a dirty pte (if you extend these tests later).

Okay, will use the previous method (invert function -> actual function) for
basic tests on each level.

> 
>> +}
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +	pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> +	WARN_ON(!pmd_same(pmd, pmd));
>> +	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
>> +	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
>> +	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
>> +	WARN_ON(pmd_young(pmd_mkold(pmd)));
>> +	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
>> +	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
>> +	/*
>> +	 * A huge page does not point to next level page table
>> +	 * entry. Hence this must qualify as pmd_bad().
>> +	 */
>> +	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>> +}
>> +
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +	pud_t pud = pfn_pud(pfn, prot);
>> +
>> +	WARN_ON(!pud_same(pud, pud));
>> +	WARN_ON(!pud_young(pud_mkyoung(pud)));
>> +	WARN_ON(!pud_write(pud_mkwrite(pud)));
>> +	WARN_ON(pud_write(pud_wrprotect(pud)));
>> +	WARN_ON(pud_young(pud_mkold(pud)));
>> +
>> +	if (mm_pmd_folded(mm) || __is_defined(ARCH_HAS_4LEVEL_HACK))
>> +		return;
>> +
>> +	/*
>> +	 * A huge page does not point to next level page table
>> +	 * entry. Hence this must qualify as pud_bad().
>> +	 */
>> +	WARN_ON(!pud_bad(pud_mkhuge(pud)));
>> +}
>> +#else
>> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +#else
>> +static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +#endif
>> +
>> +static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +	p4d_t p4d;
>> +
>> +	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
>> +	WARN_ON(!p4d_same(p4d, p4d));
>> +}
>> +
>> +static void __init pgd_basic_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +	pgd_t pgd;
>> +
>> +	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
>> +	WARN_ON(!pgd_same(pgd, pgd));
>> +}
>> +
>> +#ifndef __ARCH_HAS_4LEVEL_HACK
> 
> This macro doesn't exist in the kernel anymore (it's a 5LEVEL now). But

I was aware about the work to drop __ARCH_HAS_4LEVEL_HACK but did not realize
that it has already merged.

> can you not use the __PAGETABLE_PUD_FOLDED instead?

Sure, will try.

> 
>> +static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp)
>> +{
>> +	pud_t pud = READ_ONCE(*pudp);
>> +
>> +	if (mm_pmd_folded(mm))
>> +		return;
>> +
>> +	pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
>> +	WRITE_ONCE(*pudp, pud);
>> +	pud_clear(pudp);
>> +	pud = READ_ONCE(*pudp);
>> +	WARN_ON(!pud_none(pud));
>> +}
>> +
>> +static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
>> +				      pmd_t *pmdp)
>> +{
>> +	pud_t pud;
>> +
>> +	if (mm_pmd_folded(mm))
>> +		return;
>> +	/*
>> +	 * This entry points to next level page table page.
>> +	 * Hence this must not qualify as pud_bad().
>> +	 */
>> +	pmd_clear(pmdp);
>> +	pud_clear(pudp);
>> +	pud_populate(mm, pudp, pmdp);
>> +	pud = READ_ONCE(*pudp);
>> +	WARN_ON(pud_bad(pud));
>> +}
>> +#else
>> +static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp) { }
>> +static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
>> +				      pmd_t *pmdp)
>> +{
>> +}
>> +#endif
>> +
>> +#ifndef __ARCH_HAS_5LEVEL_HACK
> 
> Could you use __PAGETABLE_P4D_FOLDED instead?

Sure, will try.

Initial tests with __PAGETABLE_PUD_FOLDED and __PAGETABLE_P4D_FOLDED
replacement looks okay.
 
> 
>> +static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp)
>> +{
>> +	p4d_t p4d = READ_ONCE(*p4dp);
>> +
>> +	if (mm_pud_folded(mm))
>> +		return;
>> +
>> +	p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE);
>> +	WRITE_ONCE(*p4dp, p4d);
>> +	p4d_clear(p4dp);
>> +	p4d = READ_ONCE(*p4dp);
>> +	WARN_ON(!p4d_none(p4d));
>> +}
> 
> Otherwise the patch looks fine. As per the comment on v13, make sure you
> don't break the build on any architecture, so this could either be an
> opt-in or patch those architectures before this patch is applied.

We already have an opt-in method through ARCH_HAS_DEBUG_VM_PGTABLE config.
But lately (v13) we had decided to enable the test through CONFIG_EXPERT,
for better adaptability on non supported platforms without requiring it's
Kconfig change. This exposed the existing build failures on IA64 and ARM.
I will probably disable the test on those platforms as agreed upon on V13
thread.

> 
> Thanks.
>
Gerald Schaefer Feb. 12, 2020, 5:55 p.m. UTC | #35
On Wed, 12 Feb 2020 15:12:54 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> >> +/*
> >> + * On s390 platform, the lower 12 bits are used to identify given page table
> >> + * entry type and for other arch specific requirements. But these bits might
> >> + * affect the ability to clear entries with pxx_clear(). So while loading up
> >> + * the entries skip all lower 12 bits in order to accommodate s390 platform.
> >> + * It does not have affect any other platform.
> >> + */
> >> +#define RANDOM_ORVALUE	(0xfffffffffffff000UL)  
> > 
> > I'd suggest you generate this mask with something like
> > GENMASK(BITS_PER_LONG, PAGE_SHIFT).  
> 
> IIRC the lower 12 bits constrains on s390 platform might not be really related
> to it's PAGE_SHIFT which can be a variable, but instead just a constant number.
> But can definitely use GENMASK or it's variants here.
> 
> https://lkml.org/lkml/2019/9/5/862

PAGE_SHIFT would be fine, it is 12 on s390. However, in order to be
more precise, we do not really need all 12 bits, only the last 4 bits.
So, something like this would work:

#define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, 4)

The text in the comment could then also be changed from 12 to 4, and
be a bit more specific on the fact that the impact on pxx_clear()
results from the dynamic page table folding logic on s390:

/*
 * On s390 platform, the lower 4 bits are used to identify given page table
 * entry type. But these bits might affect the ability to clear entries with
 * pxx_clear() because of how dynamic page table folding works on s390. So
 * while loading up the entries do not change the lower 4 bits.
 * It does not have affect any other platform.
 */
Anshuman Khandual Feb. 13, 2020, 2:15 a.m. UTC | #36
On 02/12/2020 11:25 PM, Gerald Schaefer wrote:
> On Wed, 12 Feb 2020 15:12:54 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>>>> +/*
>>>> + * On s390 platform, the lower 12 bits are used to identify given page table
>>>> + * entry type and for other arch specific requirements. But these bits might
>>>> + * affect the ability to clear entries with pxx_clear(). So while loading up
>>>> + * the entries skip all lower 12 bits in order to accommodate s390 platform.
>>>> + * It does not have affect any other platform.
>>>> + */
>>>> +#define RANDOM_ORVALUE	(0xfffffffffffff000UL)  
>>>
>>> I'd suggest you generate this mask with something like
>>> GENMASK(BITS_PER_LONG, PAGE_SHIFT).  
>>
>> IIRC the lower 12 bits constrains on s390 platform might not be really related
>> to it's PAGE_SHIFT which can be a variable, but instead just a constant number.
>> But can definitely use GENMASK or it's variants here.
>>
>> https://lkml.org/lkml/2019/9/5/862
> 
> PAGE_SHIFT would be fine, it is 12 on s390. However, in order to be
> more precise, we do not really need all 12 bits, only the last 4 bits.
> So, something like this would work:
> 
> #define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, 4)
> 
> The text in the comment could then also be changed from 12 to 4, and
> be a bit more specific on the fact that the impact on pxx_clear()
> results from the dynamic page table folding logic on s390:
> 
> /*
>  * On s390 platform, the lower 4 bits are used to identify given page table
>  * entry type. But these bits might affect the ability to clear entries with
>  * pxx_clear() because of how dynamic page table folding works on s390. So
>  * while loading up the entries do not change the lower 4 bits.
>  * It does not have affect any other platform.
>  */

Sure, will update accordingly.

Patch
diff mbox series

diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
new file mode 100644
index 000000000000..f3f8111edbe3
--- /dev/null
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -0,0 +1,35 @@ 
+#
+# Feature name:          debug-vm-pgtable
+#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
+#         description:   arch supports pgtable tests for semantics compliance
+#
+    -----------------------
+    |         arch |status|
+    -----------------------
+    |       alpha: | TODO |
+    |         arc: |  ok  |
+    |         arm: | TODO |
+    |       arm64: |  ok  |
+    |         c6x: | TODO |
+    |        csky: | TODO |
+    |       h8300: | TODO |
+    |     hexagon: | TODO |
+    |        ia64: | TODO |
+    |        m68k: | TODO |
+    |  microblaze: | TODO |
+    |        mips: | TODO |
+    |       nds32: | TODO |
+    |       nios2: | TODO |
+    |    openrisc: | TODO |
+    |      parisc: | TODO |
+    |  powerpc/32: |  ok  |
+    |  powerpc/64: | TODO |
+    |       riscv: | TODO |
+    |        s390: | TODO |
+    |          sh: | TODO |
+    |       sparc: | TODO |
+    |          um: | TODO |
+    |   unicore32: | TODO |
+    |         x86: |  ok  |
+    |      xtensa: | TODO |
+    -----------------------
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 26108ea785c2..6a76f3b609b2 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -6,6 +6,7 @@ 
 config ARC
 	def_bool y
 	select ARC_TIMERS
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DMA_PREP_COHERENT
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e688dfad0b72..876c9a672db1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,7 @@  config ARM64
 	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_DMA_PREP_COHERENT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1ec34e16ed65..253dcab0bebc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -120,6 +120,7 @@  config PPC
 	#
 	select ARCH_32BIT_OFF_T if PPC32
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5e8949953660..218536a4581d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -61,6 +61,7 @@  config X86
 	select ARCH_CLOCKSOURCE_INIT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE	if ACPI
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE	if !X86_PAE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 0b6c4042942a..fb0e76d254b3 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -53,6 +53,12 @@  static inline void sync_initial_page_table(void) { }
 
 struct mm_struct;
 
+#define mm_p4d_folded mm_p4d_folded
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+	return !pgtable_l5_enabled();
+}
+
 void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
 void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 798ea36a0549..e0b04787e789 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1208,6 +1208,12 @@  static inline bool arch_has_pfn_modify_check(void)
 # define PAGE_KERNEL_EXEC PAGE_KERNEL
 #endif
 
+#ifdef CONFIG_DEBUG_VM_PGTABLE
+extern void debug_vm_pgtable(void);
+#else
+static inline void debug_vm_pgtable(void) { }
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #ifndef io_remap_pfn_range
diff --git a/init/main.c b/init/main.c
index da1bc0b60a7d..5e59e6ac0780 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1197,6 +1197,7 @@  static noinline void __init kernel_init_freeable(void)
 	sched_init_smp();
 
 	page_alloc_init_late();
+	debug_vm_pgtable();
 	/* Initialize page ext after all struct pages are initialized. */
 	page_ext_init();
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5ffe144c9794..7cceae923c05 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -653,6 +653,12 @@  config SCHED_STACK_END_CHECK
 	  data corruption or a sporadic crash at a later stage once the region
 	  is examined. The runtime overhead introduced is minimal.
 
+config ARCH_HAS_DEBUG_VM_PGTABLE
+	bool
+	help
+	  An architecture should select this when it can successfully
+	  build and run DEBUG_VM_PGTABLE.
+
 config DEBUG_VM
 	bool "Debug VM"
 	depends on DEBUG_KERNEL
@@ -688,6 +694,22 @@  config DEBUG_VM_PGFLAGS
 
 	  If unsure, say N.
 
+config DEBUG_VM_PGTABLE
+	bool "Debug arch page table for semantics compliance"
+	depends on MMU
+	depends on DEBUG_VM
+	depends on ARCH_HAS_DEBUG_VM_PGTABLE
+	default y
+	help
+	  This option provides a debug method which can be used to test
+	  architecture page table helper functions on various platforms in
+	  verifying if they comply with expected generic MM semantics. This
+	  will help architecture code in making sure that any changes or
+	  new additions of these helpers still conform to expected
+	  semantics of the generic MM.
+
+	  If unsure, say N.
+
 config ARCH_HAS_DEBUG_VIRTUAL
 	bool
 
diff --git a/mm/Makefile b/mm/Makefile
index 1937cc251883..eba423a26b19 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -86,6 +86,7 @@  obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
+obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o
 obj-$(CONFIG_PAGE_OWNER) += page_owner.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
new file mode 100644
index 000000000000..0f37f32d15f1
--- /dev/null
+++ b/mm/debug_vm_pgtable.c
@@ -0,0 +1,388 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This kernel test validates architecture page table helpers and
+ * accessors and helps in verifying their continued compliance with
+ * expected generic MM semantics.
+ *
+ * Copyright (C) 2019 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define pr_fmt(fmt) "debug_vm_pgtable: %s: " fmt, __func__
+
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/hugetlb.h>
+#include <linux/kernel.h>
+#include <linux/kconfig.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/pfn_t.h>
+#include <linux/printk.h>
+#include <linux/random.h>
+#include <linux/spinlock.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/start_kernel.h>
+#include <linux/sched/mm.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+/*
+ * Basic operations
+ *
+ * mkold(entry)			= An old and not a young entry
+ * mkyoung(entry)		= A young and not an old entry
+ * mkdirty(entry)		= A dirty and not a clean entry
+ * mkclean(entry)		= A clean and not a dirty entry
+ * mkwrite(entry)		= A write and not a write protected entry
+ * wrprotect(entry)		= A write protected and not a write entry
+ * pxx_bad(entry)		= A mapped and non-table entry
+ * pxx_same(entry1, entry2)	= Both entries hold the exact same value
+ */
+#define VMFLAGS	(VM_READ|VM_WRITE|VM_EXEC)
+
+/*
+ * On s390 platform, the lower 12 bits are used to identify given page table
+ * entry type and for other arch specific requirements. But these bits might
+ * affect the ability to clear entries with pxx_clear(). So while loading up
+ * the entries skip all lower 12 bits in order to accommodate s390 platform.
+ * It does not have affect any other platform.
+ */
+#define RANDOM_ORVALUE	(0xfffffffffffff000UL)
+#define RANDOM_NZVALUE	(0xff)
+
+static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pte_t pte = pfn_pte(pfn, prot);
+
+	WARN_ON(!pte_same(pte, pte));
+	WARN_ON(!pte_young(pte_mkyoung(pte)));
+	WARN_ON(!pte_dirty(pte_mkdirty(pte)));
+	WARN_ON(!pte_write(pte_mkwrite(pte)));
+	WARN_ON(pte_young(pte_mkold(pte)));
+	WARN_ON(pte_dirty(pte_mkclean(pte)));
+	WARN_ON(pte_write(pte_wrprotect(pte)));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pmd_t pmd = pfn_pmd(pfn, prot);
+
+	WARN_ON(!pmd_same(pmd, pmd));
+	WARN_ON(!pmd_young(pmd_mkyoung(pmd)));
+	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd)));
+	WARN_ON(!pmd_write(pmd_mkwrite(pmd)));
+	WARN_ON(pmd_young(pmd_mkold(pmd)));
+	WARN_ON(pmd_dirty(pmd_mkclean(pmd)));
+	WARN_ON(pmd_write(pmd_wrprotect(pmd)));
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pmd_bad().
+	 */
+	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pud_t pud = pfn_pud(pfn, prot);
+
+	WARN_ON(!pud_same(pud, pud));
+	WARN_ON(!pud_young(pud_mkyoung(pud)));
+	WARN_ON(!pud_write(pud_mkwrite(pud)));
+	WARN_ON(pud_write(pud_wrprotect(pud)));
+	WARN_ON(pud_young(pud_mkold(pud)));
+
+	if (mm_pmd_folded(mm) || __is_defined(ARCH_HAS_4LEVEL_HACK))
+		return;
+
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pud_bad().
+	 */
+	WARN_ON(!pud_bad(pud_mkhuge(pud)));
+}
+#else
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+#else
+static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif
+
+static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	p4d_t p4d;
+
+	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
+	WARN_ON(!p4d_same(p4d, p4d));
+}
+
+static void __init pgd_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pgd_t pgd;
+
+	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
+	WARN_ON(!pgd_same(pgd, pgd));
+}
+
+#ifndef __ARCH_HAS_4LEVEL_HACK
+static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp)
+{
+	pud_t pud = READ_ONCE(*pudp);
+
+	if (mm_pmd_folded(mm))
+		return;
+
+	pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pudp, pud);
+	pud_clear(pudp);
+	pud = READ_ONCE(*pudp);
+	WARN_ON(!pud_none(pud));
+}
+
+static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
+				      pmd_t *pmdp)
+{
+	pud_t pud;
+
+	if (mm_pmd_folded(mm))
+		return;
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pud_bad().
+	 */
+	pmd_clear(pmdp);
+	pud_clear(pudp);
+	pud_populate(mm, pudp, pmdp);
+	pud = READ_ONCE(*pudp);
+	WARN_ON(pud_bad(pud));
+}
+#else
+static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp) { }
+static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
+				      pmd_t *pmdp)
+{
+}
+#endif
+
+#ifndef __ARCH_HAS_5LEVEL_HACK
+static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp)
+{
+	p4d_t p4d = READ_ONCE(*p4dp);
+
+	if (mm_pud_folded(mm))
+		return;
+
+	p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE);
+	WRITE_ONCE(*p4dp, p4d);
+	p4d_clear(p4dp);
+	p4d = READ_ONCE(*p4dp);
+	WARN_ON(!p4d_none(p4d));
+}
+
+static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp,
+				      pud_t *pudp)
+{
+	p4d_t p4d;
+
+	if (mm_pud_folded(mm))
+		return;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as p4d_bad().
+	 */
+	pud_clear(pudp);
+	p4d_clear(p4dp);
+	p4d_populate(mm, p4dp, pudp);
+	p4d = READ_ONCE(*p4dp);
+	WARN_ON(p4d_bad(p4d));
+}
+
+static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp)
+{
+	pgd_t pgd = READ_ONCE(*pgdp);
+
+	if (mm_p4d_folded(mm))
+		return;
+
+	pgd = __pgd(pgd_val(pgd) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pgdp, pgd);
+	pgd_clear(pgdp);
+	pgd = READ_ONCE(*pgdp);
+	WARN_ON(!pgd_none(pgd));
+}
+
+static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
+				      p4d_t *p4dp)
+{
+	pgd_t pgd;
+
+	if (mm_p4d_folded(mm))
+		return;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pgd_bad().
+	 */
+	p4d_clear(p4dp);
+	pgd_clear(pgdp);
+	pgd_populate(mm, pgdp, p4dp);
+	pgd = READ_ONCE(*pgdp);
+	WARN_ON(pgd_bad(pgd));
+}
+#else
+static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp) { }
+static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp) { }
+static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp,
+				      pud_t *pudp)
+{
+}
+static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
+				      p4d_t *p4dp)
+{
+}
+#endif
+
+static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep)
+{
+	pte_t pte = READ_ONCE(*ptep);
+
+	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+	WRITE_ONCE(*ptep, pte);
+	pte_clear(mm, 0, ptep);
+	pte = READ_ONCE(*ptep);
+	WARN_ON(!pte_none(pte));
+}
+
+static void __init pmd_clear_tests(struct mm_struct *mm, pmd_t *pmdp)
+{
+	pmd_t pmd = READ_ONCE(*pmdp);
+
+	pmd = __pmd(pmd_val(pmd) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pmdp, pmd);
+	pmd_clear(pmdp);
+	pmd = READ_ONCE(*pmdp);
+	WARN_ON(!pmd_none(pmd));
+}
+
+static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
+				      pgtable_t pgtable)
+{
+	pmd_t pmd;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pmd_bad().
+	 */
+	pmd_clear(pmdp);
+	pmd_populate(mm, pmdp, pgtable);
+	pmd = READ_ONCE(*pmdp);
+	WARN_ON(pmd_bad(pmd));
+}
+
+static unsigned long __init get_random_vaddr(void)
+{
+	unsigned long random_vaddr, random_pages, total_user_pages;
+
+	total_user_pages = (TASK_SIZE - FIRST_USER_ADDRESS) / PAGE_SIZE;
+
+	random_pages = get_random_long() % total_user_pages;
+	random_vaddr = FIRST_USER_ADDRESS + random_pages * PAGE_SIZE;
+
+	return random_vaddr;
+}
+
+void __init debug_vm_pgtable(void)
+{
+	struct mm_struct *mm;
+	pgd_t *pgdp;
+	p4d_t *p4dp, *saved_p4dp;
+	pud_t *pudp, *saved_pudp;
+	pmd_t *pmdp, *saved_pmdp, pmd;
+	pte_t *ptep;
+	pgtable_t saved_ptep;
+	pgprot_t prot;
+	phys_addr_t paddr;
+	unsigned long vaddr, pte_aligned, pmd_aligned;
+	unsigned long pud_aligned, p4d_aligned, pgd_aligned;
+
+	pr_info("Validating architecture page table helpers\n");
+	prot = vm_get_page_prot(VMFLAGS);
+	vaddr = get_random_vaddr();
+	mm = mm_alloc();
+	if (!mm) {
+		pr_err("mm_struct allocation failed\n");
+		return;
+	}
+
+	/*
+	 * PFN for mapping at PTE level is determined from a standard kernel
+	 * text symbol. But pfns for higher page table levels are derived by
+	 * masking lower bits of this real pfn. These derived pfns might not
+	 * exist on the platform but that does not really matter as pfn_pxx()
+	 * helpers will still create appropriate entries for the test. This
+	 * helps avoid large memory block allocations to be used for mapping
+	 * at higher page table levels.
+	 */
+	paddr = __pa(&start_kernel);
+
+	pte_aligned = (paddr & PAGE_MASK) >> PAGE_SHIFT;
+	pmd_aligned = (paddr & PMD_MASK) >> PAGE_SHIFT;
+	pud_aligned = (paddr & PUD_MASK) >> PAGE_SHIFT;
+	p4d_aligned = (paddr & P4D_MASK) >> PAGE_SHIFT;
+	pgd_aligned = (paddr & PGDIR_MASK) >> PAGE_SHIFT;
+	WARN_ON(!pfn_valid(pte_aligned));
+
+	pgdp = pgd_offset(mm, vaddr);
+	p4dp = p4d_alloc(mm, pgdp, vaddr);
+	pudp = pud_alloc(mm, p4dp, vaddr);
+	pmdp = pmd_alloc(mm, pudp, vaddr);
+	ptep = pte_alloc_map(mm, pmdp, vaddr);
+
+	/*
+	 * Save all the page table page addresses as the page table
+	 * entries will be used for testing with random or garbage
+	 * values. These saved addresses will be used for freeing
+	 * page table pages.
+	 */
+	pmd = READ_ONCE(*pmdp);
+	saved_p4dp = p4d_offset(pgdp, 0UL);
+	saved_pudp = pud_offset(p4dp, 0UL);
+	saved_pmdp = pmd_offset(pudp, 0UL);
+	saved_ptep = pmd_pgtable(pmd);
+
+	pte_basic_tests(pte_aligned, prot);
+	pmd_basic_tests(pmd_aligned, prot);
+	pud_basic_tests(pud_aligned, prot);
+	p4d_basic_tests(p4d_aligned, prot);
+	pgd_basic_tests(pgd_aligned, prot);
+
+	pte_clear_tests(mm, ptep);
+	pmd_clear_tests(mm, pmdp);
+	pud_clear_tests(mm, pudp);
+	p4d_clear_tests(mm, p4dp);
+	pgd_clear_tests(mm, pgdp);
+
+	pte_unmap(ptep);
+
+	pmd_populate_tests(mm, pmdp, saved_ptep);
+	pud_populate_tests(mm, pudp, saved_pmdp);
+	p4d_populate_tests(mm, p4dp, saved_pudp);
+	pgd_populate_tests(mm, pgdp, saved_p4dp);
+
+	p4d_free(mm, saved_p4dp);
+	pud_free(mm, saved_pudp);
+	pmd_free(mm, saved_pmdp);
+	pte_free(mm, saved_ptep);
+
+	mm_dec_nr_puds(mm);
+	mm_dec_nr_pmds(mm);
+	mm_dec_nr_ptes(mm);
+	mmdrop(mm);
+}