linux-kernel.vger.kernel.org archive mirror
* [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
@ 2018-06-29 20:42 Larry Finger
  2018-06-29 21:01 ` Linus Torvalds
  2018-06-30  9:31 ` christophe leroy
  0 siblings, 2 replies; 11+ messages in thread
From: Larry Finger @ 2018-06-29 20:42 UTC (permalink / raw)
  To: Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
	Christoph Lameter, Dave Hansen, Jérôme Glisse,
	Lai Jiangshan, Martin Schwidefsky, Pekka Enberg, Randy Dunlap,
	Andrey Ryabinin, Andrew Morton, Linus Torvalds,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, LKML

My PowerBook G4 Aluminum crashes on boot with 4.18-rcX kernels with a kernel BUG 
at include/linux/page-flags.h:700! The problem was bisected to commit 
1d40a5ea01d5 ("mm: mark pages in use for page tables"). It is not possible to 
capture the bug with anything other than a camera. The first few lines of the 
traceback are as follows:

free_pgd_range+0x19c/0x30c (unreliable)
free_pgtables+0xa0/0xb0
exit_mmap+0xf4/0x16c
mmput+0x64/0xf0
do_exit+0x33c/0x89c
oops_end+0x13c/0x144
_exception_pkey+0x58/0x128
ret_from_except_full+0x0/0x4
--- interrupt: 700 at free_pgd_range+0x19c/0x30c
     LR = free_pgd_range+0x19c/0x30c
free_pgtables+0xa/0xb
exit_mmap+0xf4/0x16c
mmput+0x64/0xf0
flush_old_exec+0x490/0x550

I have more information regarding this BUG. Line 700 of page-flags.h is the 
macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded 
the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page) 
in routine __ClearPageTable(), which is called from pgtable_page_dtor() in 
include/linux/mm.h. I also added a printk call to PageTable() that logs 
page->page_type. The routine was called twice. The first had page_type of 
0xfffffbff, which would have been expected for a page table. The second call had 
0xffffffff, which led to the BUG.
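
For readers following the two values, here is a minimal user-space sketch of the inverted page_type encoding. It assumes the 4.18-era constants PAGE_TYPE_BASE = 0xf0000000 and PG_table = 0x00000400 from include/linux/page-flags.h; struct fake_page and the helper names are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modeled on include/linux/page-flags.h around 4.18. */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_table       0x00000400u

/* Hypothetical stand-in for the page_type field of struct page. */
struct fake_page {
	uint32_t page_type;
};

/* A type is "set" when its bit is CLEAR and the base bits are all set. */
static bool page_is_table(const struct fake_page *p)
{
	return (p->page_type & (PAGE_TYPE_BASE | PG_table)) == PAGE_TYPE_BASE;
}

/* Marking a page as a page table clears the bit (the bits are reversed). */
static void set_page_table(struct fake_page *p)
{
	p->page_type &= ~PG_table;
}

/* Unmarking it sets the bit again. */
static void clear_page_table(struct fake_page *p)
{
	p->page_type |= PG_table;
}

/* Walks through the two values seen in the printk output. */
static void demo_page_type(void)
{
	struct fake_page p = { 0xffffffffu };   /* untyped page */

	set_page_table(&p);
	assert(p.page_type == 0xfffffbffu);     /* first call: is a page table */
	clear_page_table(&p);
	assert(p.page_type == 0xffffffffu);     /* second call: no longer one */
}
```

Under these assumptions, 0xfffffbff is the "is a page table" value and 0xffffffff is the value left behind once the type has been cleared.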

Larry


* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 20:42 [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5 Larry Finger
@ 2018-06-29 21:01 ` Linus Torvalds
  2018-06-29 21:46   ` Kirill A. Shutemov
                     ` (3 more replies)
  2018-06-30  9:31 ` christophe leroy
  1 sibling, 4 replies; 11+ messages in thread
From: Linus Torvalds @ 2018-06-29 21:01 UTC (permalink / raw)
  To: Larry Finger
  Cc: Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
	Christoph Lameter, Dave Hansen, Jerome Glisse, Lai Jiangshan,
	Martin Schwidefsky, Pekka Enberg, Randy Dunlap, Andrey Ryabinin,
	Andrew Morton, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, ppc-dev, Linux Kernel Mailing List

On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
>
> I have more information regarding this BUG. Line 700 of page-flags.h is the
> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
> include/linux/mm.h. I also added a printk call to PageTable() that logs
> page->page_type. The routine was called twice. The first had page_type of
> 0xfffffbff, which would have been expected for a page table. The second call had
> 0xffffffff, which led to the BUG.

So it looks to me like the tear-down of the page tables first found a
page that is indeed a page table, and cleared the page table bit
(well, it set it - the bits are reversed).

Then it took an exception (that "interrupt: 700") and that causes
do_exit() again, and it tries to free the same page table - and now
it's no longer marked as a page table, because it already went through
the __ClearPageTable() dance once.

So on the second path through, it catches that "the bit already said
it wasn't a page table" and does the BUG.

But the real question is what the problem was the *first* time around.
I assume that has scrolled off the screen? This part:

  _exception_pkey+0x58/0x128
  ret_from_except_full+0x0/0x4
  --- interrupt: 700 at free_pgd_range+0x19c/0x30c
       LR = free_pgd_range+0x19c/0x30c
  free_pgtables+0xa/0xb
  exit_mnap+0xf4/0x16c
  mmput+0x64/0xf0

Does reverting that commit 1d40a5ea01d5 make everything work for you?
Because if so, judging by the deafening silence on this so far, I
think that's what we should do.

That said, can some ppc person who knows the 32-bit ppc code and maybe
knows what that "interrupt: 700" means talk about that oddity in the
trace, please?

                    Linus


* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 21:01 ` Linus Torvalds
@ 2018-06-29 21:46   ` Kirill A. Shutemov
  2018-06-30  2:22     ` Linus Torvalds
  2018-06-30  6:23     ` Aneesh Kumar K.V
  2018-06-30  0:55   ` Segher Boessenkool
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2018-06-29 21:46 UTC (permalink / raw)
  To: Linus Torvalds, Aneesh Kumar K.V
  Cc: Larry Finger, Matthew Wilcox, Kirill A. Shutemov,
	Vlastimil Babka, Christoph Lameter, Dave Hansen, Jerome Glisse,
	Lai Jiangshan, Martin Schwidefsky, Pekka Enberg, Randy Dunlap,
	Andrey Ryabinin, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, ppc-dev,
	Linux Kernel Mailing List

On Fri, Jun 29, 2018 at 02:01:46PM -0700, Linus Torvalds wrote:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
> >
> > I have more information regarding this BUG. Line 700 of page-flags.h is the
> > macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
> > the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
> > in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
> > include/linux/mm.h. I also added a printk call to PageTable() that logs
> > page->page_type. The routine was called twice. The first had page_type of
> > 0xfffffbff, which would have been expected for a page table. The second call had
> > 0xffffffff, which led to the BUG.
> 
> So it looks to me like the tear-down of the page tables first found a
> page that is indeed a page table, and cleared the page table bit
> (well, it set it - the bits are reversed).
> 
> Then it took an exception (that "interrupt: 700") and that causes
> do_exit() again, and it tries to free the same page table - and now
> it's no longer marked as a page table, because it already went through
> the __ClearPageTable() dance once.
> 
> So on the second path through, it catches that "the bit already said
> it wasn't a page table" and does the BUG.
> 
> But the real question is what the problem was the *first* time around.

+Aneesh.

Looks like pgtable_page_dtor() gets called in __pte_free_tlb() path twice.
Once in __pte_free_tlb() itself and the second time in pgtable_free().

Would this help?

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 6a6673907e45..e7a2f0e6b695 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -137,7 +137,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 				  unsigned long address)
 {
-	pgtable_page_dtor(table);
 	pgtable_free_tlb(tlb, page_address(table), 0);
 }
 #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 1707781d2f20..30a13b80fd58 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -139,7 +139,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
 				  unsigned long address)
 {
 	tlb_flush_pgtable(tlb, address);
-	pgtable_page_dtor(table);
 	pgtable_free_tlb(tlb, page_address(table), 0);
 }
 #endif /* _ASM_POWERPC_PGALLOC_32_H */
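
The effect of the change can be modeled outside the kernel. This is only a sketch under stated assumptions: the fake_* and pte_free_tlb_* names are hypothetical, the constants are the assumed 4.18-era page_type values, the deferred (batched) free is collapsed into a direct call, and a false return stands in for the point where VM_BUG_ON_PAGE(!PageTable(page), page) would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4.18-era values; a type is set when its bit is clear. */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_table       0x00000400u

/* Hypothetical stand-in for struct page. */
struct fake_page {
	uint32_t page_type;
};

static bool page_is_table(const struct fake_page *p)
{
	return (p->page_type & (PAGE_TYPE_BASE | PG_table)) == PAGE_TYPE_BASE;
}

/* Models the constructor: mark the page as a page table. */
static void fake_ctor(struct fake_page *p)
{
	p->page_type &= ~PG_table;
}

/* Models the destructor; returns false where the kernel would BUG. */
static bool fake_dtor(struct fake_page *p)
{
	if (!page_is_table(p))
		return false;
	p->page_type |= PG_table;
	return true;
}

/* Models the free path, which already runs the destructor itself. */
static bool fake_pgtable_free(struct fake_page *p)
{
	return fake_dtor(p);
}

/* Before the patch: destructor runs here and again in the free path. */
static bool pte_free_tlb_before(struct fake_page *p)
{
	bool first = fake_dtor(p);
	bool second = fake_pgtable_free(p);

	return first && second;
}

/* After the patch: only the free path runs the destructor. */
static bool pte_free_tlb_after(struct fake_page *p)
{
	return fake_pgtable_free(p);
}
```

In this model the "before" path fails on the second destructor call, while the "after" path clears the type exactly once.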
-- 
 Kirill A. Shutemov


* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 21:01 ` Linus Torvalds
  2018-06-29 21:46   ` Kirill A. Shutemov
@ 2018-06-30  0:55   ` Segher Boessenkool
  2018-06-30  2:38   ` Denise Finger
  2018-07-02  4:16   ` Michael Ellerman
  3 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2018-06-30  0:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Larry Finger, Randy Dunlap, Dave Hansen, Lai Jiangshan,
	Linux Kernel Mailing List, Matthew Wilcox, Pekka Enberg,
	Jerome Glisse, Paul Mackerras, Kirill A. Shutemov,
	Martin Schwidefsky, Andrey Ryabinin, Christoph Lameter, ppc-dev,
	Andrew Morton, Vlastimil Babka

On Fri, Jun 29, 2018 at 02:01:46PM -0700, Linus Torvalds wrote:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
> But the real question is what the problem was the *first* time around.
> I assume that has scrolled off the screen? This part:
> 
>   _exception_pkey+0x58/0x128
>   ret_from_except_full+0x0/0x4
>   --- interrupt: 700 at free_pgd_range+0x19c/0x30c
>        LR = free_pgd_range+0x19c/0x30c
>   free_pgtables+0xa/0xb
>   exit_mnap+0xf4/0x16c
>   mmput+0x64/0xf0
> 
> Does reverting that commit 1d40a5ea01d5 make everything work for you?
> Because if so, judging by the deafening silence on this so far, I
> think that's what we should do.
> 
> That said, can some ppc person who knows the 32-bit ppc code and maybe
> knows what that "interrupt: 700" means talk about that oddity in the
> trace, please?

700 is "program interrupt"; here it probably means a BUG() happened (which
does a trap instruction, which causes a 700).  The stuff that scrolled away
should tell more.
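
As a rough aid for reading such traces, a small lookup of the classic 32-bit PowerPC exception vectors might look like the sketch below. The function name is hypothetical and the table covers only the common vectors; individual CPU families define more.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: map a classic 32-bit PowerPC exception vector
 * offset (the number after "interrupt:" in the trace) to a short name.
 * Only the common vectors are listed.
 */
static const char *ppc32_vector_name(unsigned int vec)
{
	switch (vec) {
	case 0x100: return "system reset";
	case 0x200: return "machine check";
	case 0x300: return "data storage";
	case 0x400: return "instruction storage";
	case 0x500: return "external";
	case 0x600: return "alignment";
	case 0x700: return "program";   /* traps, illegal instructions */
	case 0x800: return "floating-point unavailable";
	case 0x900: return "decrementer";
	case 0xc00: return "system call";
	default:    return "unknown";
	}
}
```

With this table, "interrupt: 700" decodes to the program interrupt, which is what a trap instruction raises.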


Segher


* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 21:46   ` Kirill A. Shutemov
@ 2018-06-30  2:22     ` Linus Torvalds
  2018-06-30  6:23     ` Aneesh Kumar K.V
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2018-06-30  2:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: aneesh.kumar, Larry Finger, Matthew Wilcox, Kirill A. Shutemov,
	Vlastimil Babka, Christoph Lameter, Dave Hansen, Jerome Glisse,
	Lai Jiangshan, Martin Schwidefsky, Pekka Enberg, Randy Dunlap,
	Andrey Ryabinin, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, ppc-dev,
	Linux Kernel Mailing List

On Fri, Jun 29, 2018 at 2:46 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> Looks like pgtable_page_dtor() gets called in __pte_free_tlb() path twice.
> Once in __pte_free_tlb() itself and the second time in pgtable_free().

Ahh, that would certainly do it, and explains why this hits ppc32 but
not x86, for example.

                Linus


* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 21:01 ` Linus Torvalds
  2018-06-29 21:46   ` Kirill A. Shutemov
  2018-06-30  0:55   ` Segher Boessenkool
@ 2018-06-30  2:38   ` Denise Finger
  2018-07-02  4:16   ` Michael Ellerman
  3 siblings, 0 replies; 11+ messages in thread
From: Denise Finger @ 2018-06-30  2:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
	Christoph Lameter, Dave Hansen, Jerome Glisse, Lai Jiangshan,
	Martin Schwidefsky, Pekka Enberg, Randy Dunlap, Andrey Ryabinin,
	Andrew Morton, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, ppc-dev, Linux Kernel Mailing List

On 06/29/2018 04:01 PM, Linus Torvalds wrote:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>
>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>> page->page_type. The routine was called twice. The first had page_type of
>> 0xfffffbff, which would have been expected for a page table. The second call had
>> 0xffffffff, which led to the BUG.
> 
> So it looks to me like the tear-down of the page tables first found a
> page that is indeed a page table, and cleared the page table bit
> (well, it set it - the bits are reversed).
> 
> Then it took an exception (that "interrupt: 700") and that causes
> do_exit() again, and it tries to free the same page table - and now
> it's no longer marked as a page table, because it already went through
> the __ClearPageTable() dance once.
> 
> So on the second path through, it catches that "the bit already said
> it wasn't a page table" and does the BUG.
> 
> But the real question is what the problem was the *first* time around.
> I assume that has scrolled off the screen? This part:
> 
>    _exception_pkey+0x58/0x128
>    ret_from_except_full+0x0/0x4
>    --- interrupt: 700 at free_pgd_range+0x19c/0x30c
>         LR = free_pgd_range+0x19c/0x30c
>    free_pgtables+0xa/0xb
>    exit_mnap+0xf4/0x16c
>    mmput+0x64/0xf0
> 
> Does reverting that commit 1d40a5ea01d5 make everything work for you?
> Because if so, judging by the deafening silence on this so far, I
> think that's what we should do.
> 
> That said, can some ppc person who knows the 32-bit ppc code and maybe
> knows what that "interrupt: 700" means talk about that oddity in the
> trace, please?

The deafening silence may be due to my having an old Microsoft address for 
Matthew Wilcox in my first posting. He should now have received the BUG report, 
and he may have some suggestions. Yes, reverting commit 1d40a5ea01d5 does permit 
the box to boot.

Kirill's patch also works, which seems like a better solution. If any other 
architecture bugs on boot, at least we will know where to look. :)

@Kirill: You may add a Reported-by: and Tested-by: Larry Finger 
<Larry.Finger@lwfinger.net> to the patch.

Thanks for the help,

Larry



* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 21:46   ` Kirill A. Shutemov
  2018-06-30  2:22     ` Linus Torvalds
@ 2018-06-30  6:23     ` Aneesh Kumar K.V
  1 sibling, 0 replies; 11+ messages in thread
From: Aneesh Kumar K.V @ 2018-06-30  6:23 UTC (permalink / raw)
  To: Kirill A. Shutemov, Linus Torvalds
  Cc: Larry Finger, Matthew Wilcox, Kirill A. Shutemov,
	Vlastimil Babka, Christoph Lameter, Dave Hansen, Jerome Glisse,
	Lai Jiangshan, Martin Schwidefsky, Pekka Enberg, Randy Dunlap,
	Andrey Ryabinin, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, ppc-dev,
	Linux Kernel Mailing List

On 06/30/2018 03:16 AM, Kirill A. Shutemov wrote:
> On Fri, Jun 29, 2018 at 02:01:46PM -0700, Linus Torvalds wrote:
>> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>>
>>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>>> page->page_type. The routine was called twice. The first had page_type of
>>> 0xfffffbff, which would have been expected for a page table. The second call had
>>> 0xffffffff, which led to the BUG.
>>
>> So it looks to me like the tear-down of the page tables first found a
>> page that is indeed a page table, and cleared the page table bit
>> (well, it set it - the bits are reversed).
>>
>> Then it took an exception (that "interrupt: 700") and that causes
>> do_exit() again, and it tries to free the same page table - and now
>> it's no longer marked as a page table, because it already went through
>> the __ClearPageTable() dance once.
>>
>> So on the second path through, it catches that "the bit already said
>> it wasn't a page table" and does the BUG.
>>
>> But the real question is what the problem was the *first* time around.
> 
> +Aneesh.
> 
> Looks like pgtable_page_dtor() gets called in __pte_free_tlb() path twice.
> Once in __pte_free_tlb() itself and the second time in pgtable_free().
> 
> Would this help?
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> index 6a6673907e45..e7a2f0e6b695 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
> @@ -137,7 +137,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
>   static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
>   				  unsigned long address)
>   {
> -	pgtable_page_dtor(table);
>   	pgtable_free_tlb(tlb, page_address(table), 0);
>   }
>   #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
> diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> index 1707781d2f20..30a13b80fd58 100644
> --- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
> +++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
> @@ -139,7 +139,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
>   				  unsigned long address)
>   {
>   	tlb_flush_pgtable(tlb, address);
> -	pgtable_page_dtor(table);
>   	pgtable_free_tlb(tlb, page_address(table), 0);
>   }
>   #endif /* _ASM_POWERPC_PGALLOC_32_H */
> 


https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-June/175015.html

Also part of pull request from Michael Ellerman

-aneesh



* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 20:42 [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5 Larry Finger
  2018-06-29 21:01 ` Linus Torvalds
@ 2018-06-30  9:31 ` christophe leroy
  2018-06-30 16:25   ` Larry Finger
  1 sibling, 1 reply; 11+ messages in thread
From: christophe leroy @ 2018-06-30  9:31 UTC (permalink / raw)
  To: Larry Finger, Matthew Wilcox, Kirill A. Shutemov,
	Vlastimil Babka, Christoph Lameter, Dave Hansen,
	Jérôme Glisse, Lai Jiangshan, Martin Schwidefsky,
	Pekka Enberg, Randy Dunlap, Andrey Ryabinin, Andrew Morton,
	Linus Torvalds, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, LKML



On 29/06/2018 at 22:42, Larry Finger wrote:
> My PowerBook G4 Aluminum crashes on boot with 4.18-rcX kernels with a 
> kernel BUG at include/linux/page-flags.h:700! The problem was bisected 
> to commit 1d40a5ea01d5 ("mm: mark pages in use for page tables"). It is 
> not possible to capture the bug with anything other than a camera. The 
> first few lines of the traceback are as follows:
> 
> free_pgd_range+0x19c/0x30c (unreliable)
> free_pgtables_0xa0/0xb0
> exit_pmap+0xf4/0x16c
> mmput+0x64/0xf0
> do_exit+0x33c/0x89c
> oops_end+0x13c/0x144
> _exception_pkey+0x58/0x128
> ret_from_except_full+0x0/0x4
> --- interrupt: 700 at free_pgd_range+0x19c/0x30c
>      LR = free_pgd_range+0x19c/0x30c
> free_pgtables+0xa/0xb
> exit_mnap+0xf4/0x16c
> mmput+0x64/0xf0
> flush_old_exec+0x490/0x550
> 
> I have more information regarding this BUG. Line 700 of page-flags.h is 
> the macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually 
> expanded the macro, and found that the bug line is 
> VM_BUG_ON_PAGE(!PageTable(page), page) in routine __ClearPageTable(), 
> which is called from pgtable_page_dtor() in include/linux/mm.h. I also 
> added a printk call to PageTable() that logs page->page_type. The 
> routine was called twice. The first had page_type of 0xfffffbff, which 
> would have been expected for a page table. The second call had 0xffffffff, which 
> led to the BUG.
> 

Oh, seems to be the one I noticed and told Aneesh about 
(https://patchwork.ozlabs.org/patch/922771/)

Aneesh provided the patch https://patchwork.ozlabs.org/patch/934111/ for 
it. Does it help?

Christophe

> Larry




* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-30  9:31 ` christophe leroy
@ 2018-06-30 16:25   ` Larry Finger
  0 siblings, 0 replies; 11+ messages in thread
From: Larry Finger @ 2018-06-30 16:25 UTC (permalink / raw)
  To: christophe leroy, Matthew Wilcox, Kirill A. Shutemov,
	Vlastimil Babka, Christoph Lameter, Dave Hansen,
	Jérôme Glisse, Lai Jiangshan, Martin Schwidefsky,
	Pekka Enberg, Randy Dunlap, Andrey Ryabinin, Andrew Morton,
	Linus Torvalds, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, LKML

On 06/30/2018 04:31 AM, christophe leroy wrote:
> 
> 
> On 29/06/2018 at 22:42, Larry Finger wrote:
>> My PowerBook G4 Aluminum crashes on boot with 4.18-rcX kernels with a kernel 
>> BUG at include/linux/page-flags.h:700! The problem was bisected to commit 
>> 1d40a5ea01d5 ("mm: mark pages in use for page tables"). It is not possible to 
>> capture the bug with anything other than a camera. The first few lines of the 
>> traceback are as follows:
>>
>> free_pgd_range+0x19c/0x30c (unreliable)
>> free_pgtables_0xa0/0xb0
>> exit_pmap+0xf4/0x16c
>> mmput+0x64/0xf0
>> do_exit+0x33c/0x89c
>> oops_end+0x13c/0x144
>> _exception_pkey+0x58/0x128
>> ret_from_except_full+0x0/0x4
>> --- interrupt: 700 at free_pgd_range+0x19c/0x30c
>>      LR = free_pgd_range+0x19c/0x30c
>> free_pgtables+0xa/0xb
>> exit_mnap+0xf4/0x16c
>> mmput+0x64/0xf0
>> flush_old_exec+0x490/0x550
>>
>> I have more information regarding this BUG. Line 700 of page-flags.h is the 
>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded 
>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), 
>> page) in routine __ClearPageTable(), which is called from pgtable_page_dtor() 
>> in include/linux/mm.h. I also added a printk call to PageTable() that logs 
>> page->page_type. The routine was called twice. The first had page_type of 
>> 0xfffffbff, which would have been expected for a page table. The second call had 
>> 0xffffffff, which led to the BUG.
>>
> 
> Oh, seems to be the one I noticed and told Aneesh about 
> (https://patchwork.ozlabs.org/patch/922771/)
> 
> Aneesh provided the patch https://patchwork.ozlabs.org/patch/934111/ for it. 
> Does it help?

Yes, those changes fix the problem.

Larry



* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-06-29 21:01 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2018-06-30  2:38   ` Denise Finger
@ 2018-07-02  4:16   ` Michael Ellerman
  2018-07-02 20:51     ` Larry Finger
  3 siblings, 1 reply; 11+ messages in thread
From: Michael Ellerman @ 2018-07-02  4:16 UTC (permalink / raw)
  To: Linus Torvalds, Larry Finger
  Cc: Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
	Christoph Lameter, Dave Hansen, Jerome Glisse, Lai Jiangshan,
	Martin Schwidefsky, Pekka Enberg, Randy Dunlap, Andrey Ryabinin,
	Andrew Morton, Benjamin Herrenschmidt, Paul Mackerras, ppc-dev,
	Linux Kernel Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>
>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>> page->page_type. The routine was called twice. The first had page_type of
>> 0xfffffbff, which would have been expected for a page table. The second call had
>> 0xffffffff, which led to the BUG.
>
> So it looks to me like the tear-down of the page tables first found a
> page that is indeed a page table, and cleared the page table bit
> (well, it set it - the bits are reversed).
...
>
> That said, can some ppc person who knows the 32-bit ppc code and maybe
> knows what that "interrupt: 700" means talk about that oddity in the
> trace, please?

I think everyone else answered your questions here, and it should be
fixed now in your tree.

Larry let me know if you're still seeing a crash with 4.18-rc3.

cheers


* Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5
  2018-07-02  4:16   ` Michael Ellerman
@ 2018-07-02 20:51     ` Larry Finger
  0 siblings, 0 replies; 11+ messages in thread
From: Larry Finger @ 2018-07-02 20:51 UTC (permalink / raw)
  To: Michael Ellerman, Linus Torvalds
  Cc: Matthew Wilcox, Kirill A. Shutemov, Vlastimil Babka,
	Christoph Lameter, Dave Hansen, Jerome Glisse, Lai Jiangshan,
	Martin Schwidefsky, Pekka Enberg, Randy Dunlap, Andrey Ryabinin,
	Andrew Morton, Benjamin Herrenschmidt, Paul Mackerras, ppc-dev,
	Linux Kernel Mailing List

On 07/01/2018 11:16 PM, Michael Ellerman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> On Fri, Jun 29, 2018 at 1:42 PM Larry Finger <Larry.Finger@lwfinger.net> wrote:
>>>
>>> I have more information regarding this BUG. Line 700 of page-flags.h is the
>>> macro PAGE_TYPE_OPS(Table, table). For further debugging, I manually expanded
>>> the macro, and found that the bug line is VM_BUG_ON_PAGE(!PageTable(page), page)
>>> in routine __ClearPageTable(), which is called from pgtable_page_dtor() in
>>> include/linux/mm.h. I also added a printk call to PageTable() that logs
>>> page->page_type. The routine was called twice. The first had page_type of
>>> 0xfffffbff, which would have been expected for a page table. The second call had
>>> 0xffffffff, which led to the BUG.
>>
>> So it looks to me like the tear-down of the page tables first found a
>> page that is indeed a page table, and cleared the page table bit
>> (well, it set it - the bits are reversed).
> ...
>>
>> That said, can some ppc person who knows the 32-bit ppc code and maybe
>> knows what that "interrupt: 700" means talk about that oddity in the
>> trace, please?
> 
> I think everyone else answered your questions here, and it should be
> fixed now in your tree.
> 
> Larry let me know if you're still seeing a crash with 4.18-rc3.

The problem is fixed in 4.18-rc3. Thanks to all that helped.

Larry

