Linux-RISC-V Archive on lore.kernel.org
 help / color / Atom feed
* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
       [not found] <20200902114222.181353-1-aneesh.kumar@linux.ibm.com>
@ 2020-09-04  6:48 ` Anshuman Khandual
  2020-09-04 15:26   ` Gerald Schaefer
       [not found] ` <20200902114222.181353-14-aneesh.kumar@linux.ibm.com>
  1 sibling, 1 reply; 15+ messages in thread
From: Anshuman Khandual @ 2020-09-04  6:48 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm
  Cc: linux-s390, mpe, Vineet Gupta, linux-riscv, linux-snps-arc,
	linuxppc-dev, Gerald Schaefer



On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
> This patch series includes fixes for debug_vm_pgtable test code so that
> they follow page table updates rules correctly. The first two patches introduce
> changes w.r.t ppc64. The patches are included in this series for completeness. We can
> merge them via ppc64 tree if required.
> 
> Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
> page table update rules.
> 
> These tests are broken w.r.t page table update rules and results in kernel
> crash as below. 
> 
> [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
> cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
>     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
>     lr: c0000000005eeeec: pte_update+0x11c/0x190
>     sp: c000000c6d1e7950
>    msr: 8000000002029033
>   current = 0xc000000c6d172c80
>   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
>     pid   = 1, comm = swapper/0
> kernel BUG at arch/powerpc/mm/pgtable.c:304!
> [link register   ] c0000000005eeeec pte_update+0x11c/0x190
> [c000000c6d1e7950] 0000000000000001 (unreliable)
> [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
> [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
> [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
> [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
> [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
> [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> 
> With DEBUG_VM disabled
> 
> [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [   20.530183] Faulting instruction address: 0xc0000000000df330
> cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
>     pc: c0000000000df330: memset+0x68/0x104
>     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
>     sp: c000000c6d19f990
>    msr: 8000000002009033
>    dar: 0
>   current = 0xc000000c6d177480
>   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
>     pid   = 1, comm = swapper/0
> [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
> [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
> [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
> [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
> [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
> [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> 
> Changes from v3:
> * Address review feedback
> * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.

This version

- Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
- Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
- Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well

+ linux-riscv <linux-riscv@lists.infradead.org>
+ linux-snps-arc@lists.infradead.org <linux-snps-arc@lists.infradead.org>
+ linux-s390@vger.kernel.org
+ Gerald Schaefer <gerald.schaefer@de.ibm.com>
+ Vineet Gupta <vgupta@synopsys.com>

There is still an open git bisect issue on arm64 platform which ideally should be fixed.

- Anshuman

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-04  6:48 ` [PATCH v4 00/13] mm/debug_vm_pgtable fixes Anshuman Khandual
@ 2020-09-04 15:26   ` Gerald Schaefer
  2020-09-04 16:01     ` Gerald Schaefer
  2020-09-09  8:08     ` Anshuman Khandual
  0 siblings, 2 replies; 15+ messages in thread
From: Gerald Schaefer @ 2020-09-04 15:26 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On Fri, 4 Sep 2020 12:18:05 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> 
> 
> On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
> > This patch series includes fixes for debug_vm_pgtable test code so that
> > they follow page table updates rules correctly. The first two patches introduce
> > changes w.r.t ppc64. The patches are included in this series for completeness. We can
> > merge them via ppc64 tree if required.
> > 
> > Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
> > page table update rules.
> > 
> > These tests are broken w.r.t page table update rules and results in kernel
> > crash as below. 
> > 
> > [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
> > cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
> >     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
> >     lr: c0000000005eeeec: pte_update+0x11c/0x190
> >     sp: c000000c6d1e7950
> >    msr: 8000000002029033
> >   current = 0xc000000c6d172c80
> >   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
> >     pid   = 1, comm = swapper/0
> > kernel BUG at arch/powerpc/mm/pgtable.c:304!
> > [link register   ] c0000000005eeeec pte_update+0x11c/0x190
> > [c000000c6d1e7950] 0000000000000001 (unreliable)
> > [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
> > [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
> > [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
> > [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> > [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
> > [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
> > [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> > 
> > With DEBUG_VM disabled
> > 
> > [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > [   20.530183] Faulting instruction address: 0xc0000000000df330
> > cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
> >     pc: c0000000000df330: memset+0x68/0x104
> >     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> >     sp: c000000c6d19f990
> >    msr: 8000000002009033
> >    dar: 0
> >   current = 0xc000000c6d177480
> >   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
> >     pid   = 1, comm = swapper/0
> > [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> > [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
> > [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
> > [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
> > [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> > [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
> > [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
> > [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> > 
> > Changes from v3:
> > * Address review feedback
> > * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.
> 
> This version
> 
> - Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
> - Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
> - Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well

When I quickly tested v3, it worked fine, but now it turned out to
only work fine "sometimes", both v3 and v4. I need to look into it
further, but so far it seems related to the hugetlb_advanced_tests().

I guess there was already some discussion on this test, but we did
not receive all of the thread(s). Please always add at least
linux-s390@vger.kernel.org and maybe myself and Vasily Gorbik <gor@linux.ibm.com>
for further discussions.

That being said, sorry for duplications, this might already have been
discussed. Preliminary analysis showed that it only seems to go wrong
for certain random vaddr values. I cannot make any sense of that yet,
but what seems strange to me is that the hugetlb_advanced_tests()
take a (real) pte_t pointer as input, and also use that for all
kinds of operations (set_huge_pte_at, huge_ptep_get_and_clear, etc.).

Although all the hugetlb code in the kernel is (mis)using pte_t
pointers instead of the correct pmd/pud_t pointers like THP, that
is just for historic reasons. The pointers will actually never point
to a real pte_t (i.e. page table entry), but of course to a pmd
or pud entry, depending on hugepage size.

What is passed in as ptep to hugetlb_advanced_tests() seems to be
the result from the previous ptep = pte_alloc_map(mm, pmdp, vaddr),
so I would expect that it points to a real page table entry. Need
to investigate further, but IIUC, using such a pointer for adding
large pte entries (i.e. pmd/pud entries) at least feels very wrong
to me, and I assume it is related to the issues we see on s390.

We actually see different issues, e.g. once a panic directly in
hugetlb_advanced_tests() -> huge_ptep_get_and_clear(), but also
indirect symptoms after debug_vm_pgtable() completes, like this:

[   10.533901] BUG task_struct (Not tainted): Padding overwritten. 0x0000000019f798c7-0x0000000019f798c7 @offset=30087

Last but not least, what I said about the pte vs. pmd/pud of
course also should apply to the hugetlb_basic_tests(), although
they are not directly using a pte_t pointer, and especially
also not writing to it. Still, the pte_aligned pfn parameter
is not guaranteed to also be pmd/pud_aligned, which doesn't
feel right.

So, for now, until this is sorted out, I guess we also need
to exclude s390 at least from the hugetlb_advanced_tests().
The hugetlb_basic_tests() seem to work fine so far (probably
by chance :-))

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-04 15:26   ` Gerald Schaefer
@ 2020-09-04 16:01     ` Gerald Schaefer
  2020-09-04 17:53       ` Gerald Schaefer
                         ` (2 more replies)
  2020-09-09  8:08     ` Anshuman Khandual
  1 sibling, 3 replies; 15+ messages in thread
From: Gerald Schaefer @ 2020-09-04 16:01 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On Fri, 4 Sep 2020 17:26:47 +0200
Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:

> On Fri, 4 Sep 2020 12:18:05 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
> > 
> > 
> > On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
> > > This patch series includes fixes for debug_vm_pgtable test code so that
> > > they follow page table updates rules correctly. The first two patches introduce
> > > changes w.r.t ppc64. The patches are included in this series for completeness. We can
> > > merge them via ppc64 tree if required.
> > > 
> > > Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
> > > page table update rules.
> > > 
> > > These tests are broken w.r.t page table update rules and results in kernel
> > > crash as below. 
> > > 
> > > [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
> > > cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
> > >     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
> > >     lr: c0000000005eeeec: pte_update+0x11c/0x190
> > >     sp: c000000c6d1e7950
> > >    msr: 8000000002029033
> > >   current = 0xc000000c6d172c80
> > >   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
> > >     pid   = 1, comm = swapper/0
> > > kernel BUG at arch/powerpc/mm/pgtable.c:304!
> > > [link register   ] c0000000005eeeec pte_update+0x11c/0x190
> > > [c000000c6d1e7950] 0000000000000001 (unreliable)
> > > [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
> > > [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
> > > [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
> > > [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> > > [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
> > > [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
> > > [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> > > 
> > > With DEBUG_VM disabled
> > > 
> > > [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > [   20.530183] Faulting instruction address: 0xc0000000000df330
> > > cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
> > >     pc: c0000000000df330: memset+0x68/0x104
> > >     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> > >     sp: c000000c6d19f990
> > >    msr: 8000000002009033
> > >    dar: 0
> > >   current = 0xc000000c6d177480
> > >   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
> > >     pid   = 1, comm = swapper/0
> > > [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> > > [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
> > > [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
> > > [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
> > > [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> > > [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
> > > [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
> > > [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> > > 
> > > Changes from v3:
> > > * Address review feedback
> > > * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.
> > 
> > This version
> > 
> > - Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
> > - Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
> > - Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well
> 
> When I quickly tested v3, it worked fine, but now it turned out to
> only work fine "sometimes", both v3 and v4. I need to look into it
> further, but so far it seems related to the hugetlb_advanced_tests().
> 
> I guess there was already some discussion on this test, but we did
> not receive all of the thread(s). Please always add at least
> linux-s390@vger.kernel.org and maybe myself and Vasily Gorbik <gor@linux.ibm.com>
> for further discussions.

BTW, with myself I mean the new address gerald.schaefer@linux.ibm.com.
The old gerald.schaefer@de.ibm.com seems to work (again), but is not
very reliable.

BTW2, a quick test with this change (so far) made the issues on s390
go away:

@@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
        spin_unlock(ptl);
 
 #ifndef CONFIG_PPC_BOOK3S_64
-       hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+       hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot);
 #endif
 
        spin_lock(&mm->page_table_lock);

That would more match the "pte_t pointer" usage for hugetlb code,
i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
but I think the root cause is the pte_t pointer.

Not entirely sure though if that would really be the correct fix.
I somehow lost whatever little track I had about what these tests
really want to check, and if that would still be valid with that
change.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-04 16:01     ` Gerald Schaefer
@ 2020-09-04 17:53       ` Gerald Schaefer
  2020-09-09  8:38         ` Anshuman Khandual
  2020-09-08 15:39       ` Gerald Schaefer
  2020-09-09  8:15       ` Anshuman Khandual
  2 siblings, 1 reply; 15+ messages in thread
From: Gerald Schaefer @ 2020-09-04 17:53 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On Fri, 4 Sep 2020 18:01:15 +0200
Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:

> On Fri, 4 Sep 2020 17:26:47 +0200
> Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:
> 
> > On Fri, 4 Sep 2020 12:18:05 +0530
> > Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> > 
> > > 
> > > 
> > > On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
> > > > This patch series includes fixes for debug_vm_pgtable test code so that
> > > > they follow page table updates rules correctly. The first two patches introduce
> > > > changes w.r.t ppc64. The patches are included in this series for completeness. We can
> > > > merge them via ppc64 tree if required.
> > > > 
> > > > Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
> > > > page table update rules.
> > > > 
> > > > These tests are broken w.r.t page table update rules and results in kernel
> > > > crash as below. 
> > > > 
> > > > [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
> > > > cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
> > > >     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
> > > >     lr: c0000000005eeeec: pte_update+0x11c/0x190
> > > >     sp: c000000c6d1e7950
> > > >    msr: 8000000002029033
> > > >   current = 0xc000000c6d172c80
> > > >   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
> > > >     pid   = 1, comm = swapper/0
> > > > kernel BUG at arch/powerpc/mm/pgtable.c:304!
> > > > [link register   ] c0000000005eeeec pte_update+0x11c/0x190
> > > > [c000000c6d1e7950] 0000000000000001 (unreliable)
> > > > [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
> > > > [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
> > > > [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
> > > > [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> > > > [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
> > > > [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
> > > > [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> > > > 
> > > > With DEBUG_VM disabled
> > > > 
> > > > [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > > [   20.530183] Faulting instruction address: 0xc0000000000df330
> > > > cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
> > > >     pc: c0000000000df330: memset+0x68/0x104
> > > >     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> > > >     sp: c000000c6d19f990
> > > >    msr: 8000000002009033
> > > >    dar: 0
> > > >   current = 0xc000000c6d177480
> > > >   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
> > > >     pid   = 1, comm = swapper/0
> > > > [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> > > > [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
> > > > [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
> > > > [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
> > > > [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> > > > [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
> > > > [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
> > > > [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> > > > 
> > > > Changes from v3:
> > > > * Address review feedback
> > > > * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.
> > > 
> > > This version
> > > 
> > > - Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
> > > - Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
> > > - Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well
> > 
> > When I quickly tested v3, it worked fine, but now it turned out to
> > only work fine "sometimes", both v3 and v4. I need to look into it
> > further, but so far it seems related to the hugetlb_advanced_tests().
> > 
> > I guess there was already some discussion on this test, but we did
> > not receive all of the thread(s). Please always add at least
> > linux-s390@vger.kernel.org and maybe myself and Vasily Gorbik <gor@linux.ibm.com>
> > for further discussions.
> 
> BTW, with myself I mean the new address gerald.schaefer@linux.ibm.com.
> The old gerald.schaefer@de.ibm.com seems to work (again), but is not
> very reliable.
> 
> BTW2, a quick test with this change (so far) made the issues on s390
> go away:
> 
> @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
>         spin_unlock(ptl);
> 
>  #ifndef CONFIG_PPC_BOOK3S_64
> -       hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
> +       hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot);
>  #endif
> 
>         spin_lock(&mm->page_table_lock);
> 
> That would more match the "pte_t pointer" usage for hugetlb code,
> i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
> but I think the root cause is the pte_t pointer.
> 
> Not entirely sure though if that would really be the correct fix.
> I somehow lost whatever little track I had about what these tests
> really want to check, and if that would still be valid with that
> change.

Another potential issue, apparently not for s390, but maybe for
others, is that the vaddr passed to hugetlb_advanced_tests() is
also not pmd/pud size aligned, like you did in pmd/pud_advanced_tests().

I guess for the hugetlb_advanced_tests() you need to choose if
you want to test pmd or pud hugepages, and accordingly prepare
the *ptep, pfn and vaddr input. If you only check for CONFIG_HUGETLB_PAGE,
then probably only pmd hugepages would be safe, there might be
architectures only supporting one hugepage size.

So, for s390, at least the ptep input value is a problem. Still
need to better understand how it goes wrong, but it seems to be
fixed when using proper pmdp, and also works with pudp.

For others, especially the apparent issues on ppc64, the other
non-hugepage aligned input pfn and vaddr might also be an issue,
e.g. power at least seems to use the vaddr in its set_huge_pte_at()
implementation for some pmd_off(mm, addr) calculation.

Again, sorry if this was already discussed, I missed most of it
and honestly didn't properly look at the scarce mails that we did
receive...

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-04 16:01     ` Gerald Schaefer
  2020-09-04 17:53       ` Gerald Schaefer
@ 2020-09-08 15:39       ` Gerald Schaefer
  2020-09-09  6:08         ` Aneesh Kumar K.V
  2020-09-09  8:15       ` Anshuman Khandual
  2 siblings, 1 reply; 15+ messages in thread
From: Gerald Schaefer @ 2020-09-08 15:39 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On Fri, 4 Sep 2020 18:01:15 +0200
Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:

[...]
> 
> BTW2, a quick test with this change (so far) made the issues on s390
> go away:
> 
> @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
>         spin_unlock(ptl);
> 
>  #ifndef CONFIG_PPC_BOOK3S_64
> -       hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
> +       hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot);
>  #endif
> 
>         spin_lock(&mm->page_table_lock);
> 
> That would more match the "pte_t pointer" usage for hugetlb code,
> i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
> but I think the root cause is the pte_t pointer.
> 
> Not entirely sure though if that would really be the correct fix.
> I somehow lost whatever little track I had about what these tests
> really want to check, and if that would still be valid with that
> change.

Uh oh, wasn't aware that this (or some predecessor) already went
upstream, and broke our debug kernel today.

I found out now what goes (horribly) wrong on s390, see below for
more details. In short, using hugetlb primitives with ptep pointers
that do _not_ point to a pmd or pud entry will not work on s390.
It also seems to make no sense to verify / test such a thing in general,
as it would also be a severe bug if any kernel code would do that.
After all, with hugepages, there are no pte tables, only pmd etc.
tables.

My change above would fix the issue for s390, but I can still not
completely judge if that would not break other things for your
tests. In general, for normal kernel code, much of what you do would
be very broken, but I guess your tests are doing such "special" things
because they can. E.g. because they operate on some "sandbox" mm
and page tables, and you also do not need properly populated page
tables for some exit / free cleanup, you just throw them away
explicitly with pXd_free at the end. So it might just be "the right
thing" to pass a casted pmd pointer to hugetlb_advanced_tests(),
to simulate and test (proper) usage of the hugetlb primitives.
I also see no other way to make this work for s390, than using a
proper pmd/pud pointer. If not possible, please add us to the
#ifndef.

So, for all those interested, here is what goes wrong on s390.
huge_ptep_get_and_clear() uses the "idte" instruction for the
clearing (and TLB invalidation) part. That instruction expects
a "region or segment table" origin, which is a pmd/pud/p4d/pgd,
but not a pte table. Even worse, when we calculate the table
origin from the given ptep (which *should* not point to a pte),
due to different table sizes for pte / pXd tables, we end up
at some place before the given pte table.

The "idte" instruction also gets the virtual address, and does
corresponding index addition to the given table origin. Depending
on the pmd_index we now end up either within the pte table again,
in which case we see a panic because idte complains about seeing
a pte value. If we are unlucky, then we end up outside the pte
table, and depending on the content of that memory location, idte
might succeed, effectively corrupting that memory.

That explains why we only see the panic sometimes, depending on
random vaddr, other symptoms other times, and probably completely
silent memory corruption for the rest...

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-08 15:39       ` Gerald Schaefer
@ 2020-09-09  6:08         ` Aneesh Kumar K.V
  2020-09-09 11:16           ` Gerald Schaefer
  0 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2020-09-09  6:08 UTC (permalink / raw)
  To: Gerald Schaefer, Anshuman Khandual
  Cc: linux-s390, mpe, linux-mm, Vineet Gupta, akpm, linux-snps-arc,
	linuxppc-dev, linux-riscv, Gerald Schaefer

Gerald Schaefer <gerald.schaefer@linux.ibm.com> writes:

> On Fri, 4 Sep 2020 18:01:15 +0200
> Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:
>
> [...]
>> 
>> BTW2, a quick test with this change (so far) made the issues on s390
>> go away:
>> 
>> @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
>>         spin_unlock(ptl);
>> 
>>  #ifndef CONFIG_PPC_BOOK3S_64
>> -       hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>> +       hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot);
>>  #endif
>> 
>>         spin_lock(&mm->page_table_lock);
>> 
>> That would more match the "pte_t pointer" usage for hugetlb code,
>> i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
>> but I think the root cause is the pte_t pointer.
>> 
>> Not entirely sure though if that would really be the correct fix.
>> I somehow lost whatever little track I had about what these tests
>> really want to check, and if that would still be valid with that
>> change.
>
> Uh oh, wasn't aware that this (or some predecessor) already went
> upstream, and broke our debug kernel today.

Not sure i followed the above. Are you finding that s390 kernel crash
after this patch series or the original patchset? As noted in my patch
the hugetlb test is broken and we should fix that. A quick fix is to
comment out that test for s390 too as i have done for PPC64.


-aneesh

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-04 15:26   ` Gerald Schaefer
  2020-09-04 16:01     ` Gerald Schaefer
@ 2020-09-09  8:08     ` Anshuman Khandual
  2020-09-09 11:36       ` Gerald Schaefer
  1 sibling, 1 reply; 15+ messages in thread
From: Anshuman Khandual @ 2020-09-09  8:08 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer



On 09/04/2020 08:56 PM, Gerald Schaefer wrote:
> On Fri, 4 Sep 2020 12:18:05 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>>
>>
>> On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
>>> This patch series includes fixes for debug_vm_pgtable test code so that
>>> they follow page table updates rules correctly. The first two patches introduce
>>> changes w.r.t ppc64. The patches are included in this series for completeness. We can
>>> merge them via ppc64 tree if required.
>>>
>>> Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
>>> page table update rules.
>>>
>>> These tests are broken w.r.t page table update rules and results in kernel
>>> crash as below. 
>>>
>>> [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
>>> cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
>>>     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
>>>     lr: c0000000005eeeec: pte_update+0x11c/0x190
>>>     sp: c000000c6d1e7950
>>>    msr: 8000000002029033
>>>   current = 0xc000000c6d172c80
>>>   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
>>>     pid   = 1, comm = swapper/0
>>> kernel BUG at arch/powerpc/mm/pgtable.c:304!
>>> [link register   ] c0000000005eeeec pte_update+0x11c/0x190
>>> [c000000c6d1e7950] 0000000000000001 (unreliable)
>>> [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
>>> [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
>>> [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
>>> [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
>>> [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
>>> [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
>>> [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
>>>
>>> With DEBUG_VM disabled
>>>
>>> [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>> [   20.530183] Faulting instruction address: 0xc0000000000df330
>>> cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
>>>     pc: c0000000000df330: memset+0x68/0x104
>>>     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
>>>     sp: c000000c6d19f990
>>>    msr: 8000000002009033
>>>    dar: 0
>>>   current = 0xc000000c6d177480
>>>   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
>>>     pid   = 1, comm = swapper/0
>>> [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
>>> [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
>>> [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
>>> [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
>>> [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
>>> [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
>>> [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
>>> [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
>>>
>>> Changes from v3:
>>> * Address review feedback
>>> * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.
>>
>> This version
>>
>> - Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
>> - Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
>> - Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well
> 
> When I quickly tested v3, it worked fine, but now it turned out to
> only work fine "sometimes", both v3 and v4. I need to look into it
> further, but so far it seems related to the hugetlb_advanced_tests().
> 
> I guess there was already some discussion on this test, but we did
> not receive all of the thread(s). Please always add at least
> linux-s390@vger.kernel.org and maybe myself and Vasily Gorbik <gor@linux.ibm.com>
> for further discussions.

IIRC, the V3 series previously had all these addresses copied properly
but this version once again missed copying all required addresses.

> 
> That being said, sorry for duplications, this might already have been
> discussed. Preliminary analysis showed that it only seems to go wrong
> for certain random vaddr values. I cannot make any sense of that yet,
> but what seems strange to me is that the hugetlb_advanced_tests()
> take a (real) pte_t pointer as input, and also use that for all
> kinds of operations (set_huge_pte_at, huge_ptep_get_and_clear, etc.).
> 
> Although all the hugetlb code in the kernel is (mis)using pte_t
> pointers instead of the correct pmd/pud_t pointers like THP, that
> is just for historic reasons. The pointers will actually never point
> to a real pte_t (i.e. page table entry), but of course to a pmd
> or pud entry, depending on hugepage size.

HugeTLB logically operates on a PTE entry irrespective of it's real
page table level position. Nonetheless, IIUC, vaddr here should have
been aligned to real page table level in which the entry is being
mapped currently.

> 
> What is passed in as ptep to hugetlb_advanced_tests() seems to be
> the result from the previous ptep = pte_alloc_map(mm, pmdp, vaddr),
> so I would expect that it points to a real page table entry. Need
> to investigate further, but IIUC, using such a pointer for adding
> large pte entries (i.e. pmd/pud entries) at least feels very wrong
> to me, and I assume it is related to the issues we see on s390.
Will look into this further.

> 
> We actually see different issues, e.g. once a panic directly in
> hugetlb_advanced_tests() -> huge_ptep_get_and_clear(), but also
> indirect symptoms after debug_vm_pgtable() completes, like this:
> 
> [   10.533901] BUG task_struct (Not tainted): Padding overwritten. 0x0000000019f798c7-0x0000000019f798c7 @offset=30087
> 
> Last but not least, what I said about the pte vs. pmd/pud of
> course also should apply to the hugetlb_basic_tests(), although
> they are not directly using a pte_t pointer, and especially
> also not writing to it. Still, the pte_aligned pfn parameter
> is not guaranteed to also be pmd/pud_aligned, which doesn't
> feel right.
hugetlb_basic_tests() does not directly operate on real page table
entries. But I do see the point wrt using pmd_aligned pfn instead.
I will look into this in detail and send out something after this
series settles down.

> 
> So, for now, until this is sorted out, I guess we also need
> to exclude s390 at least from the hugetlb_advanced_tests().
> The hugetlb_basic_tests() seem to work fine so far (probably
> by chance :-))

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-04 16:01     ` Gerald Schaefer
  2020-09-04 17:53       ` Gerald Schaefer
  2020-09-08 15:39       ` Gerald Schaefer
@ 2020-09-09  8:15       ` Anshuman Khandual
  2020-09-09 11:10         ` Gerald Schaefer
  2 siblings, 1 reply; 15+ messages in thread
From: Anshuman Khandual @ 2020-09-09  8:15 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer



On 09/04/2020 09:31 PM, Gerald Schaefer wrote:
> On Fri, 4 Sep 2020 17:26:47 +0200
> Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:
> 
>> On Fri, 4 Sep 2020 12:18:05 +0530
>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>
>>>
>>>
>>> On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
>>>> This patch series includes fixes for debug_vm_pgtable test code so that
>>>> they follow page table updates rules correctly. The first two patches introduce
>>>> changes w.r.t ppc64. The patches are included in this series for completeness. We can
>>>> merge them via ppc64 tree if required.
>>>>
>>>> Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
>>>> page table update rules.
>>>>
>>>> These tests are broken w.r.t page table update rules and results in kernel
>>>> crash as below. 
>>>>
>>>> [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
>>>> cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
>>>>     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
>>>>     lr: c0000000005eeeec: pte_update+0x11c/0x190
>>>>     sp: c000000c6d1e7950
>>>>    msr: 8000000002029033
>>>>   current = 0xc000000c6d172c80
>>>>   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
>>>>     pid   = 1, comm = swapper/0
>>>> kernel BUG at arch/powerpc/mm/pgtable.c:304!
>>>> [link register   ] c0000000005eeeec pte_update+0x11c/0x190
>>>> [c000000c6d1e7950] 0000000000000001 (unreliable)
>>>> [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
>>>> [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
>>>> [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
>>>> [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
>>>> [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
>>>> [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
>>>> [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
>>>>
>>>> With DEBUG_VM disabled
>>>>
>>>> [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>>> [   20.530183] Faulting instruction address: 0xc0000000000df330
>>>> cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
>>>>     pc: c0000000000df330: memset+0x68/0x104
>>>>     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
>>>>     sp: c000000c6d19f990
>>>>    msr: 8000000002009033
>>>>    dar: 0
>>>>   current = 0xc000000c6d177480
>>>>   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
>>>>     pid   = 1, comm = swapper/0
>>>> [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
>>>> [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
>>>> [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
>>>> [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
>>>> [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
>>>> [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
>>>> [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
>>>> [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
>>>>
>>>> Changes from v3:
>>>> * Address review feedback
>>>> * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.
>>>
>>> This version
>>>
>>> - Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
>>> - Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
>>> - Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well
>>
>> When I quickly tested v3, it worked fine, but now it turned out to
>> only work fine "sometimes", both v3 and v4. I need to look into it
>> further, but so far it seems related to the hugetlb_advanced_tests().
>>
>> I guess there was already some discussion on this test, but we did
>> not receive all of the thread(s). Please always add at least
>> linux-s390@vger.kernel.org and maybe myself and Vasily Gorbik <gor@linux.ibm.com>
>> for further discussions.
> 
> BTW, with myself I mean the new address gerald.schaefer@linux.ibm.com.
> The old gerald.schaefer@de.ibm.com seems to work (again), but is not
> very reliable.

Sure, noted.

> 
> BTW2, a quick test with this change (so far) made the issues on s390
> go away:
> 
> @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
>         spin_unlock(ptl);
>  
>  #ifndef CONFIG_PPC_BOOK3S_64
> -       hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
> +       hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot);
>  #endif
>  
>         spin_lock(&mm->page_table_lock);
> 
> That would more match the "pte_t pointer" usage for hugetlb code,
> i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
> but I think the root cause is the pte_t pointer.

Ideally, the pte_t pointer used here should be from huge_pte_alloc()
not from pte_alloc_map_lock() as the case currently.

> 
> Not entirely sure though if that would really be the correct fix.
> I somehow lost whatever little track I had about what these tests
> really want to check, and if that would still be valid with that
> change.
> 

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-04 17:53       ` Gerald Schaefer
@ 2020-09-09  8:38         ` Anshuman Khandual
  0 siblings, 0 replies; 15+ messages in thread
From: Anshuman Khandual @ 2020-09-09  8:38 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On 09/04/2020 11:23 PM, Gerald Schaefer wrote:
> On Fri, 4 Sep 2020 18:01:15 +0200
> Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:
> 
>> On Fri, 4 Sep 2020 17:26:47 +0200
>> Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:
>>
>>> On Fri, 4 Sep 2020 12:18:05 +0530
>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>
>>>>
>>>>
>>>> On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
>>>>> This patch series includes fixes for debug_vm_pgtable test code so that
>>>>> they follow page table updates rules correctly. The first two patches introduce
>>>>> changes w.r.t ppc64. The patches are included in this series for completeness. We can
>>>>> merge them via ppc64 tree if required.
>>>>>
>>>>> Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
>>>>> page table update rules.
>>>>>
>>>>> These tests are broken w.r.t page table update rules and results in kernel
>>>>> crash as below. 
>>>>>
>>>>> [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
>>>>> cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
>>>>>     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
>>>>>     lr: c0000000005eeeec: pte_update+0x11c/0x190
>>>>>     sp: c000000c6d1e7950
>>>>>    msr: 8000000002029033
>>>>>   current = 0xc000000c6d172c80
>>>>>   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
>>>>>     pid   = 1, comm = swapper/0
>>>>> kernel BUG at arch/powerpc/mm/pgtable.c:304!
>>>>> [link register   ] c0000000005eeeec pte_update+0x11c/0x190
>>>>> [c000000c6d1e7950] 0000000000000001 (unreliable)
>>>>> [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
>>>>> [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
>>>>> [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
>>>>> [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
>>>>> [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
>>>>> [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
>>>>> [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
>>>>>
>>>>> With DEBUG_VM disabled
>>>>>
>>>>> [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>>>> [   20.530183] Faulting instruction address: 0xc0000000000df330
>>>>> cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
>>>>>     pc: c0000000000df330: memset+0x68/0x104
>>>>>     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
>>>>>     sp: c000000c6d19f990
>>>>>    msr: 8000000002009033
>>>>>    dar: 0
>>>>>   current = 0xc000000c6d177480
>>>>>   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
>>>>>     pid   = 1, comm = swapper/0
>>>>> [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
>>>>> [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
>>>>> [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
>>>>> [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
>>>>> [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
>>>>> [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
>>>>> [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
>>>>> [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
>>>>>
>>>>> Changes from v3:
>>>>> * Address review feedback
>>>>> * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.
>>>>
>>>> This version
>>>>
>>>> - Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
>>>> - Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
>>>> - Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well
>>>
>>> When I quickly tested v3, it worked fine, but now it turned out to
>>> only work fine "sometimes", both v3 and v4. I need to look into it
>>> further, but so far it seems related to the hugetlb_advanced_tests().
>>>
>>> I guess there was already some discussion on this test, but we did
>>> not receive all of the thread(s). Please always add at least
>>> linux-s390@vger.kernel.org and maybe myself and Vasily Gorbik <gor@linux.ibm.com>
>>> for further discussions.
>>
>> BTW, with myself I mean the new address gerald.schaefer@linux.ibm.com.
>> The old gerald.schaefer@de.ibm.com seems to work (again), but is not
>> very reliable.
>>
>> BTW2, a quick test with this change (so far) made the issues on s390
>> go away:
>>
>> @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
>>         spin_unlock(ptl);
>>
>>  #ifndef CONFIG_PPC_BOOK3S_64
>> -       hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>> +       hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot);
>>  #endif
>>
>>         spin_lock(&mm->page_table_lock);
>>
>> That would more match the "pte_t pointer" usage for hugetlb code,
>> i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
>> but I think the root cause is the pte_t pointer.
>>
>> Not entirely sure though if that would really be the correct fix.
>> I somehow lost whatever little track I had about what these tests
>> really want to check, and if that would still be valid with that
>> change.
> 
> Another potential issue, apparently not for s390, but maybe for
> others, is that the vaddr passed to hugetlb_advanced_tests() is
> also not pmd/pud size aligned, like you did in pmd/pud_advanced_tests().
> 
> I guess for the hugetlb_advanced_tests() you need to choose if
> you want to test pmd or pud hugepages, and accordingly prepare
> the *ptep, pfn and vaddr input. If you only check for CONFIG_HUGETLB_PAGE,
> then probably only pmd hugepages would be safe, there might be
> architectures only supporting one hugepage size.

I guess preparing for PMD based HugeTLB tests should be sufficient
for now, which can be improved later on to cover other levels.

> 
> So, for s390, at least the ptep input value is a problem. Still
> need to better understand how it goes wrong, but it seems to be
> fixed when using proper pmdp, and also works with pudp.
> 
> For others, especially the apparent issues on ppc64, the other
> non-hugepage aligned input pfn and vaddr might also be an issue,
> e.g. power at least seems to use the vaddr in its set_huge_pte_at()
> implementation for some pmd_off(mm, addr) calculation.
> 
> Again, sorry if this was already discussed, I missed most of it
> and honestly didn't properly look at the scarce mails that we did
> receive...

Sure, will consider these points and try improve tests afterwards.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-09  8:15       ` Anshuman Khandual
@ 2020-09-09 11:10         ` Gerald Schaefer
  0 siblings, 0 replies; 15+ messages in thread
From: Gerald Schaefer @ 2020-09-09 11:10 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On Wed, 9 Sep 2020 13:45:48 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

[...]
> > 
> > That would more match the "pte_t pointer" usage for hugetlb code,
> > i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
> > but I think the root cause is the pte_t pointer.
> 
> Ideally, the pte_t pointer used here should be from huge_pte_alloc()
> not from pte_alloc_map_lock() as the case currently.

Ah, good point. I assumed that this would also always return casted
pmd etc. pointers, and never pte pointers. Unfortunately, that doesn't
seem to be true for all architectures, e.g. ia64, parisc, (some) powerpc,
where they really do a pte_alloc_map() for some reason.

I guess that means you cannot simply cast the pmd pointer, as suggested,
although I really do not understand how any architecture can work with
real ptes for hugepages. But that's fair, s390 also does some things that
nobody would expect or understand for other architectures...

So, for using huge_pte_alloc() you'd also need some size, maybe
iterating over hstates with for_each_hstate() could be an option,
if they are already initialized at that point. Then you have the
size(s) with huge_page_size(hstate) and can actually call the
hugetlb tests for all supported sizes, and with proper pointer
from huge_pte_alloc().

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-09  6:08         ` Aneesh Kumar K.V
@ 2020-09-09 11:16           ` Gerald Schaefer
  0 siblings, 0 replies; 15+ messages in thread
From: Gerald Schaefer @ 2020-09-09 11:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-s390, Anshuman Khandual, mpe, linux-mm, Vineet Gupta, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On Wed, 09 Sep 2020 11:38:39 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> Gerald Schaefer <gerald.schaefer@linux.ibm.com> writes:
> 
> > On Fri, 4 Sep 2020 18:01:15 +0200
> > Gerald Schaefer <gerald.schaefer@linux.ibm.com> wrote:
> >
> > [...]
> >> 
> >> BTW2, a quick test with this change (so far) made the issues on s390
> >> go away:
> >> 
> >> @@ -1069,7 +1074,7 @@ static int __init debug_vm_pgtable(void)
> >>         spin_unlock(ptl);
> >> 
> >>  #ifndef CONFIG_PPC_BOOK3S_64
> >> -       hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
> >> +       hugetlb_advanced_tests(mm, vma, (pte_t *) pmdp, pmd_aligned, vaddr, prot);
> >>  #endif
> >> 
> >>         spin_lock(&mm->page_table_lock);
> >> 
> >> That would more match the "pte_t pointer" usage for hugetlb code,
> >> i.e. just cast a pmd_t pointer to it. Also changed to pmd_aligned,
> >> but I think the root cause is the pte_t pointer.
> >> 
> >> Not entirely sure though if that would really be the correct fix.
> >> I somehow lost whatever little track I had about what these tests
> >> really want to check, and if that would still be valid with that
> >> change.
> >
> > Uh oh, wasn't aware that this (or some predecessor) already went
> > upstream, and broke our debug kernel today.
> 
> Not sure i followed the above. Are you finding that s390 kernel crash
> after this patch series or the original patchset? As noted in my patch
> the hugetlb test is broken and we should fix that. A quick fix is to
> comment out that test for s390 too as i have done for PPC64.

We see it with both, it basically is broken since there is a hugetlb
test using real pte pointers. It doesn't always show, depending on
random vaddr, so it slipped through earlier testing.

I guess we also would have had one or the other chance to notice
that earlier, through better review, or better reading of previous
mails. I must admit that I neglected this a bit.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
  2020-09-09  8:08     ` Anshuman Khandual
@ 2020-09-09 11:36       ` Gerald Schaefer
  0 siblings, 0 replies; 15+ messages in thread
From: Gerald Schaefer @ 2020-09-09 11:36 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-s390, Aneesh Kumar K.V, linux-mm, Vineet Gupta, mpe, akpm,
	linux-snps-arc, linuxppc-dev, linux-riscv, Gerald Schaefer

On Wed, 9 Sep 2020 13:38:25 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> 
> 
> On 09/04/2020 08:56 PM, Gerald Schaefer wrote:
> > On Fri, 4 Sep 2020 12:18:05 +0530
> > Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> > 
> >>
> >>
> >> On 09/02/2020 05:12 PM, Aneesh Kumar K.V wrote:
> >>> This patch series includes fixes for debug_vm_pgtable test code so that
> >>> they follow page table updates rules correctly. The first two patches introduce
> >>> changes w.r.t ppc64. The patches are included in this series for completeness. We can
> >>> merge them via ppc64 tree if required.
> >>>
> >>> Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
> >>> page table update rules.
> >>>
> >>> These tests are broken w.r.t page table update rules and results in kernel
> >>> crash as below. 
> >>>
> >>> [   21.083519] kernel BUG at arch/powerpc/mm/pgtable.c:304!
> >>> cpu 0x0: Vector: 700 (Program Check) at [c000000c6d1e76c0]
> >>>     pc: c00000000009a5ec: assert_pte_locked+0x14c/0x380
> >>>     lr: c0000000005eeeec: pte_update+0x11c/0x190
> >>>     sp: c000000c6d1e7950
> >>>    msr: 8000000002029033
> >>>   current = 0xc000000c6d172c80
> >>>   paca    = 0xc000000003ba0000   irqmask: 0x03   irq_happened: 0x01
> >>>     pid   = 1, comm = swapper/0
> >>> kernel BUG at arch/powerpc/mm/pgtable.c:304!
> >>> [link register   ] c0000000005eeeec pte_update+0x11c/0x190
> >>> [c000000c6d1e7950] 0000000000000001 (unreliable)
> >>> [c000000c6d1e79b0] c0000000005eee14 pte_update+0x44/0x190
> >>> [c000000c6d1e7a10] c000000001a2ca9c pte_advanced_tests+0x160/0x3d8
> >>> [c000000c6d1e7ab0] c000000001a2d4fc debug_vm_pgtable+0x7e8/0x1338
> >>> [c000000c6d1e7ba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> >>> [c000000c6d1e7c80] c0000000019e4fac kernel_init_freeable+0x4dc/0x5a4
> >>> [c000000c6d1e7db0] c000000000012474 kernel_init+0x24/0x160
> >>> [c000000c6d1e7e20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> >>>
> >>> With DEBUG_VM disabled
> >>>
> >>> [   20.530152] BUG: Kernel NULL pointer dereference on read at 0x00000000
> >>> [   20.530183] Faulting instruction address: 0xc0000000000df330
> >>> cpu 0x33: Vector: 380 (Data SLB Access) at [c000000c6d19f700]
> >>>     pc: c0000000000df330: memset+0x68/0x104
> >>>     lr: c00000000009f6d8: hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> >>>     sp: c000000c6d19f990
> >>>    msr: 8000000002009033
> >>>    dar: 0
> >>>   current = 0xc000000c6d177480
> >>>   paca    = 0xc00000001ec4f400   irqmask: 0x03   irq_happened: 0x01
> >>>     pid   = 1, comm = swapper/0
> >>> [link register   ] c00000000009f6d8 hash__pmdp_huge_get_and_clear+0xe8/0x1b0
> >>> [c000000c6d19f990] c00000000009f748 hash__pmdp_huge_get_and_clear+0x158/0x1b0 (unreliable)
> >>> [c000000c6d19fa10] c0000000019ebf30 pmd_advanced_tests+0x1f0/0x378
> >>> [c000000c6d19fab0] c0000000019ed088 debug_vm_pgtable+0x79c/0x1244
> >>> [c000000c6d19fba0] c0000000000116ec do_one_initcall+0xac/0x5f0
> >>> [c000000c6d19fc80] c0000000019a4fac kernel_init_freeable+0x4dc/0x5a4
> >>> [c000000c6d19fdb0] c000000000012474 kernel_init+0x24/0x160
> >>> [c000000c6d19fe20] c00000000000cbd0 ret_from_kernel_thread+0x5c/0x6c
> >>>
> >>> Changes from v3:
> >>> * Address review feedback
> >>> * Move page table depost and withdraw patch after adding pmdlock to avoid bisect failure.
> >>
> >> This version
> >>
> >> - Builds on x86, arm64, s390, arc, powerpc and riscv (defconfig with DEBUG_VM_PGTABLE)
> >> - Runs on arm64 and x86 without any regression, atleast nothing that I have noticed
> >> - Will be great if this could get tested on s390, arc, riscv, ppc32 platforms as well
> > 
> > When I quickly tested v3, it worked fine, but now it turned out to
> > only work fine "sometimes", both v3 and v4. I need to look into it
> > further, but so far it seems related to the hugetlb_advanced_tests().
> > 
> > I guess there was already some discussion on this test, but we did
> > not receive all of the thread(s). Please always add at least
> > linux-s390@vger.kernel.org and maybe myself and Vasily Gorbik <gor@linux.ibm.com>
> > for further discussions.
> 
> IIRC, the V3 series previously had all these addresses copied properly
> but this version once again missed copying all required addresses.

I also had issues with the de.ibm.com address, which might also have
made some mails disappear, and others might simply have been overlooked
be me. Don't bother, my bad.

> 
> > 
> > That being said, sorry for duplications, this might already have been
> > discussed. Preliminary analysis showed that it only seems to go wrong
> > for certain random vaddr values. I cannot make any sense of that yet,
> > but what seems strange to me is that the hugetlb_advanced_tests()
> > take a (real) pte_t pointer as input, and also use that for all
> > kinds of operations (set_huge_pte_at, huge_ptep_get_and_clear, etc.).
> > 
> > Although all the hugetlb code in the kernel is (mis)using pte_t
> > pointers instead of the correct pmd/pud_t pointers like THP, that
> > is just for historic reasons. The pointers will actually never point
> > to a real pte_t (i.e. page table entry), but of course to a pmd
> > or pud entry, depending on hugepage size.
> 
> HugeTLB logically operates on a PTE entry irrespective of it's real
> page table level position. Nonetheless, IIUC, vaddr here should have
> been aligned to real page table level in which the entry is being
> mapped currently.

That goes back to the time where only x86 had hugepages, and they
have the same layout for pte/pmd/etc entries, so it simply didn't
matter that the code (mis)used pte pointers / entries. But even for
x86, the hugetlb pte pointers would never have pointed to real ptes,
but pmds instead. That's why I call it misuse.

s390 is very sensitive to page table level, and we can also determine
the level from the entry value, which is used for some primitives.
Others have implicit assumptions and calculations, which go wrong
if a wrong level is passed in, like in this case for
huge_ptep_get_and_clear(). Simply aligning vaddr / pfn will not
be enough to fix this for s390, it has to be a pmd/pud pointer.
Or, as you already mentioned, the result of huge_pte_alloc().

Furthermore, the pmd and pte layout are different, so we simply cannot
use any pte_xxx primitives for hugepages. That was the reason for
introducing huge_ptep_get(), which will do an implicit conversion
from the real pmd/pud entry to a "fake" pte entry, which can then
be used with such pte_xxx primitives. Before writing it back in
set_huge_pte_at() we then do the reverse conversion to a proper
pmd/pud again.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test
       [not found]   ` <20200911021358.GA3656343@ubuntu-n2-xlarge-x86>
@ 2020-09-11  5:21     ` Aneesh Kumar K.V
  2020-09-23  3:14       ` Anshuman Khandual
  0 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2020-09-11  5:21 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Anshuman Khandual, mpe, linux-mm, akpm, linuxppc-dev, linux-riscv

Nathan Chancellor <natechancellor@gmail.com> writes:

> On Wed, Sep 02, 2020 at 05:12:22PM +0530, Aneesh Kumar K.V wrote:
>> pte_clear_tests operate on an existing pte entry. Make sure that
>> is not a none pte entry.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  mm/debug_vm_pgtable.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 9afa1354326b..c36530c69e33 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -542,9 +542,10 @@ static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
>>  #endif /* PAGETABLE_P4D_FOLDED */
>>  
>>  static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>> -				   unsigned long vaddr)
>> +				   unsigned long pfn, unsigned long vaddr,
>> +				   pgprot_t prot)
>>  {
>> -	pte_t pte = ptep_get(ptep);
>> +	pte_t pte = pfn_pte(pfn, prot);
>>  
>>  	pr_debug("Validating PTE clear\n");
>>  	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>> @@ -1049,7 +1050,7 @@ static int __init debug_vm_pgtable(void)
>>  
>>  	ptl = pte_lockptr(mm, pmdp);
>>  	spin_lock(ptl);
>> -	pte_clear_tests(mm, ptep, vaddr);
>> +	pte_clear_tests(mm, ptep, pte_aligned, vaddr, prot);
>>  	pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>  	pte_unmap_unlock(ptep, ptl);
>>  
>> -- 
> This patch causes a panic at boot for RISC-V defconfig. The rootfs is here if it is needed:
> https://github.com/ClangBuiltLinux/boot-utils/blob/3b21a5b71451742866349ba4f18638c5a754e660/images/riscv/rootfs.cpio.zst
>
> $ make -skj"$(nproc)" ARCH=riscv CROSS_COMPILE=riscv64-linux- O=out/riscv distclean defconfig Image
>
> $ qemu-system-riscv64 -bios default -M virt -display none -initrd rootfs.cpio -kernel Image -m 512m -nodefaults -serial mon:stdio
> ...
>
> OpenSBI v0.6
>    ____                    _____ ____ _____
>   / __ \                  / ____|  _ \_   _|
>  | |  | |_ __   ___ _ __ | (___ | |_) || |
>  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>  | |__| | |_) |  __/ | | |____) | |_) || |_
>   \____/| .__/ \___|_| |_|_____/|____/_____|
>         | |
>         |_|
>
> Platform Name          : QEMU Virt Machine
> Platform HART Features : RV64ACDFIMSU
> Platform Max HARTs     : 8
> Current Hart           : 0
> Firmware Base          : 0x80000000
> Firmware Size          : 120 KB
> Runtime SBI Version    : 0.2
>
> MIDELEG : 0x0000000000000222
> MEDELEG : 0x000000000000b109
> PMP0    : 0x0000000080000000-0x000000008001ffff (A)
> PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> [    0.000000] Linux version 5.9.0-rc4-next-20200910 (nathan@ubuntu-n2-xlarge-x86) (riscv64-linux-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35) #1 SMP Thu Sep 10 19:10:43 MST 2020
> ...
> [    0.294593] NET: Registered protocol family 17
> [    0.295781] 9pnet: Installing 9P2000 support
> [    0.296153] Key type dns_resolver registered
> [    0.296694] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
> [    0.297635] Unable to handle kernel paging request at virtual address 0a7fffe01dafefc8
> [    0.298029] Oops [#1]
> [    0.298153] Modules linked in:
> [    0.298433] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4-next-20200910 #1
> [    0.298792] epc: ffffffe000205afc ra : ffffffe0008be0aa sp : ffffffe01ae73d40
> [    0.299078]  gp : ffffffe0010b9b48 tp : ffffffe01ae68000 t0 : ffffffe008152000
> [    0.299362]  t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffe01ae73d60
> [    0.299648]  s1 : bffffffffffffffb a0 : 0a7fffe01dafefc8 a1 : bffffffffffffffb
> [    0.299948]  a2 : ffffffe0010a2698 a3 : 0000000000000001 a4 : 0000000000000003
> [    0.300231]  a5 : 0000000000000800 a6 : fffffffff0000080 a7 : 000000001b642000
> [    0.300521]  s2 : ffffffe0081517b8 s3 : ffffffe008150a80 s4 : ffffffe01af30000
> [    0.300806]  s5 : ffffffe01f8ca9b8 s6 : ffffffe008150000 s7 : ffffffe0010bb100
> [    0.301161]  s8 : ffffffe0010bb108 s9 : 0000000000080202 s10: ffffffe0010bb928
> [    0.301481]  s11: 000000002008085b t3 : 0000000000000000 t4 : 0000000000000000
> [    0.301722]  t5 : 0000000000000000 t6 : ffffffe008150000
> [    0.301947] status: 0000000000000120 badaddr: 0a7fffe01dafefc8 cause: 000000000000000f
> [    0.302569] ---[ end trace 7ffb153d816164cf ]---
> [    0.302797] note: swapper/0[1] exited with preempt_count 1
> [    0.303101] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    0.303614] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---


I guess it is the combination of a valid pte and usage of
RANDOM_ORVALUE. The below change get the kernel to boot. Can somebody
faimilar with riscv pte format take a look at the RANDOM_ORVALUE?

modified   mm/debug_vm_pgtable.c
@@ -548,7 +548,7 @@ static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
 	pte_t pte = pfn_pte(pfn, prot);
 
 	pr_debug("Validating PTE clear\n");
-	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+//	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
 	set_pte_at(mm, vaddr, ptep, pte);
 	barrier();
 	pte_clear(mm, vaddr, ptep);


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test
  2020-09-11  5:21     ` [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test Aneesh Kumar K.V
@ 2020-09-23  3:14       ` Anshuman Khandual
  0 siblings, 0 replies; 15+ messages in thread
From: Anshuman Khandual @ 2020-09-23  3:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Nathan Chancellor
  Cc: linux-mm, akpm, linuxppc-dev, linux-riscv, mpe



On 09/11/2020 10:51 AM, Aneesh Kumar K.V wrote:
> Nathan Chancellor <natechancellor@gmail.com> writes:
> 
>> On Wed, Sep 02, 2020 at 05:12:22PM +0530, Aneesh Kumar K.V wrote:
>>> pte_clear_tests operate on an existing pte entry. Make sure that
>>> is not a none pte entry.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>  mm/debug_vm_pgtable.c | 7 ++++---
>>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 9afa1354326b..c36530c69e33 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -542,9 +542,10 @@ static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
>>>  #endif /* PAGETABLE_P4D_FOLDED */
>>>  
>>>  static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>>> -				   unsigned long vaddr)
>>> +				   unsigned long pfn, unsigned long vaddr,
>>> +				   pgprot_t prot)
>>>  {
>>> -	pte_t pte = ptep_get(ptep);
>>> +	pte_t pte = pfn_pte(pfn, prot);
>>>  
>>>  	pr_debug("Validating PTE clear\n");
>>>  	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> @@ -1049,7 +1050,7 @@ static int __init debug_vm_pgtable(void)
>>>  
>>>  	ptl = pte_lockptr(mm, pmdp);
>>>  	spin_lock(ptl);
>>> -	pte_clear_tests(mm, ptep, vaddr);
>>> +	pte_clear_tests(mm, ptep, pte_aligned, vaddr, prot);
>>>  	pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
>>>  	pte_unmap_unlock(ptep, ptl);
>>>  
>>> -- 
>> This patch causes a panic at boot for RISC-V defconfig. The rootfs is here if it is needed:
>> https://github.com/ClangBuiltLinux/boot-utils/blob/3b21a5b71451742866349ba4f18638c5a754e660/images/riscv/rootfs.cpio.zst
>>
>> $ make -skj"$(nproc)" ARCH=riscv CROSS_COMPILE=riscv64-linux- O=out/riscv distclean defconfig Image
>>
>> $ qemu-system-riscv64 -bios default -M virt -display none -initrd rootfs.cpio -kernel Image -m 512m -nodefaults -serial mon:stdio
>> ...
>>
>> OpenSBI v0.6
>>    ____                    _____ ____ _____
>>   / __ \                  / ____|  _ \_   _|
>>  | |  | |_ __   ___ _ __ | (___ | |_) || |
>>  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>>  | |__| | |_) |  __/ | | |____) | |_) || |_
>>   \____/| .__/ \___|_| |_|_____/|____/_____|
>>         | |
>>         |_|
>>
>> Platform Name          : QEMU Virt Machine
>> Platform HART Features : RV64ACDFIMSU
>> Platform Max HARTs     : 8
>> Current Hart           : 0
>> Firmware Base          : 0x80000000
>> Firmware Size          : 120 KB
>> Runtime SBI Version    : 0.2
>>
>> MIDELEG : 0x0000000000000222
>> MEDELEG : 0x000000000000b109
>> PMP0    : 0x0000000080000000-0x000000008001ffff (A)
>> PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
>> [    0.000000] Linux version 5.9.0-rc4-next-20200910 (nathan@ubuntu-n2-xlarge-x86) (riscv64-linux-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35) #1 SMP Thu Sep 10 19:10:43 MST 2020
>> ...
>> [    0.294593] NET: Registered protocol family 17
>> [    0.295781] 9pnet: Installing 9P2000 support
>> [    0.296153] Key type dns_resolver registered
>> [    0.296694] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
>> [    0.297635] Unable to handle kernel paging request at virtual address 0a7fffe01dafefc8
>> [    0.298029] Oops [#1]
>> [    0.298153] Modules linked in:
>> [    0.298433] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4-next-20200910 #1
>> [    0.298792] epc: ffffffe000205afc ra : ffffffe0008be0aa sp : ffffffe01ae73d40
>> [    0.299078]  gp : ffffffe0010b9b48 tp : ffffffe01ae68000 t0 : ffffffe008152000
>> [    0.299362]  t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffe01ae73d60
>> [    0.299648]  s1 : bffffffffffffffb a0 : 0a7fffe01dafefc8 a1 : bffffffffffffffb
>> [    0.299948]  a2 : ffffffe0010a2698 a3 : 0000000000000001 a4 : 0000000000000003
>> [    0.300231]  a5 : 0000000000000800 a6 : fffffffff0000080 a7 : 000000001b642000
>> [    0.300521]  s2 : ffffffe0081517b8 s3 : ffffffe008150a80 s4 : ffffffe01af30000
>> [    0.300806]  s5 : ffffffe01f8ca9b8 s6 : ffffffe008150000 s7 : ffffffe0010bb100
>> [    0.301161]  s8 : ffffffe0010bb108 s9 : 0000000000080202 s10: ffffffe0010bb928
>> [    0.301481]  s11: 000000002008085b t3 : 0000000000000000 t4 : 0000000000000000
>> [    0.301722]  t5 : 0000000000000000 t6 : ffffffe008150000
>> [    0.301947] status: 0000000000000120 badaddr: 0a7fffe01dafefc8 cause: 000000000000000f
>> [    0.302569] ---[ end trace 7ffb153d816164cf ]---
>> [    0.302797] note: swapper/0[1] exited with preempt_count 1
>> [    0.303101] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>> [    0.303614] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> 
> I guess it is the combination of a valid pte and usage of
> RANDOM_ORVALUE. The below change get the kernel to boot. Can somebody
> faimilar with riscv pte format take a look at the RANDOM_ORVALUE?
> 
> modified   mm/debug_vm_pgtable.c
> @@ -548,7 +548,7 @@ static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
>  	pte_t pte = pfn_pte(pfn, prot);
>  
>  	pr_debug("Validating PTE clear\n");
> -	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> +//	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>  	set_pte_at(mm, vaddr, ptep, pte);
>  	barrier();
>  	pte_clear(mm, vaddr, ptep);

Do we have a fix for this problem ? Otherwise we just risk going into
the next release with this regression on riscv platforms.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test
       [not found]   ` <20201011200258.GA91021@roeck-us.net>
@ 2020-10-12  4:29     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2020-10-12  4:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Anshuman Khandual, mpe, linux-mm, akpm, linuxppc-dev, linux-riscv

Guenter Roeck <linux@roeck-us.net> writes:

> On Wed, Sep 02, 2020 at 05:12:22PM +0530, Aneesh Kumar K.V wrote:
>> pte_clear_tests operate on an existing pte entry. Make sure that
>> is not a none pte entry.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>
> This patch causes all riscv64 images to crash. Reverting it
> as well as the follow-up patch fixes the problem, but there are
> still several warning messages starting with
> 	BUG kmem_cache (Not tainted): Freechain corrupt
> I did not try to track down this other problem.
>
> A detailed crash log is at
> 	https://kerneltests.org/builders/qemu-riscv64-next/builds/523/steps/qemubuildcommand/logs/stdio
>
> Bisect log is attached.


https://lore.kernel.org/linux-mm/87zh5wx51b.fsf@linux.ibm.com

This was mentioned earlier. The RANDOM_OR_VALUE used is interacting with
some of the riscv page table accessors. 

-aneesh

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, back to index

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200902114222.181353-1-aneesh.kumar@linux.ibm.com>
2020-09-04  6:48 ` [PATCH v4 00/13] mm/debug_vm_pgtable fixes Anshuman Khandual
2020-09-04 15:26   ` Gerald Schaefer
2020-09-04 16:01     ` Gerald Schaefer
2020-09-04 17:53       ` Gerald Schaefer
2020-09-09  8:38         ` Anshuman Khandual
2020-09-08 15:39       ` Gerald Schaefer
2020-09-09  6:08         ` Aneesh Kumar K.V
2020-09-09 11:16           ` Gerald Schaefer
2020-09-09  8:15       ` Anshuman Khandual
2020-09-09 11:10         ` Gerald Schaefer
2020-09-09  8:08     ` Anshuman Khandual
2020-09-09 11:36       ` Gerald Schaefer
     [not found] ` <20200902114222.181353-14-aneesh.kumar@linux.ibm.com>
     [not found]   ` <20200911021358.GA3656343@ubuntu-n2-xlarge-x86>
2020-09-11  5:21     ` [PATCH v4 13/13] mm/debug_vm_pgtable: Avoid none pte in pte_clear_test Aneesh Kumar K.V
2020-09-23  3:14       ` Anshuman Khandual
     [not found]   ` <20201011200258.GA91021@roeck-us.net>
2020-10-12  4:29     ` Aneesh Kumar K.V

Linux-RISC-V Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-riscv/0 linux-riscv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-riscv linux-riscv/ https://lore.kernel.org/linux-riscv \
		linux-riscv@lists.infradead.org
	public-inbox-index linux-riscv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-riscv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git