linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* BUG_ON() in pfn_swap_entry_to_page()
@ 2022-03-22 17:25 Sebastian Andrzej Siewior
  2022-03-22 17:41 ` Matthew Wilcox
  2022-03-22 18:53 ` Ritesh Harjani
  0 siblings, 2 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-22 17:25 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton, Thomas Gleixner

Run into

|kernel BUG at include/linux/swapops.h:258!
|invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
|CPU: 10 PID: 1741 Comm: malloc-mem Not tainted 5.17.0-next-20220322 #19
|Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
|RIP: 0010:migration_entry_wait_on_locked+0x233/0x2c0
|Code: 66 90 e9 8a fe ff ff 66 90 e9 3d ff ff ff 48 8b 43 08 a8 01 0f 85 87 00 00 00 66 90 48 89 d8 48 8b 00 a8 01 0f 85 23 fe ff ff <0f> 0b 48 83 e8 01 48 89 c3 e9 29 fe ff ff  65 48 8b 04 25 00 c0 01
|RSP: 0000:ffffc9000a44bd78 EFLAGS: 00010246
|RAX: 002fffff80000000 RBX: ffffea0014a20000 RCX: ffffffff5aeffe00
|RDX: ffff88811b42e200 RSI: 0000000000000000 RDI: 000000000000001f
|RBP: ffff88811b42e200 R08: 0000000000000001 R09: 0000000000000000
|R10: 0000000000000000 R11: 0000000000000000 R12: 00007f7131200000
|R13: 0000000000000c48 R14: 0400000000000000 R15: fff000003fffffff
|FS:  00007f7145570580(0000) GS:ffff888a2aa80000(0000) knlGS:0000000000000000
|CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
|CR2: 00007f7131200000 CR3: 00000006f42c4006 CR4: 00000000000606e0
|Call Trace:
| <TASK>
| __handle_mm_fault+0x2bf/0x1810
| handle_mm_fault+0x136/0x3e0
| exc_page_fault+0x1d2/0x850
| asm_exc_page_fault+0x22/0x30

This is the BUG_ON() in pfn_swap_entry_to_page(). The box itself has no
swapspace configured. Is this something known?

Sebastian


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2022-03-22 17:25 BUG_ON() in pfn_swap_entry_to_page() Sebastian Andrzej Siewior
@ 2022-03-22 17:41 ` Matthew Wilcox
  2022-03-23  0:29   ` Alistair Popple
  2022-03-22 18:53 ` Ritesh Harjani
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-03-22 17:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-mm, Andrew Morton, Thomas Gleixner, Alistair Popple

On Tue, Mar 22, 2022 at 06:25:02PM +0100, Sebastian Andrzej Siewior wrote:
> Run into
> 
> |kernel BUG at include/linux/swapops.h:258!
> |RIP: 0010:migration_entry_wait_on_locked+0x233/0x2c0
> |Call Trace:
> | <TASK>
> | __handle_mm_fault+0x2bf/0x1810
> | handle_mm_fault+0x136/0x3e0
> | exc_page_fault+0x1d2/0x850
> | asm_exc_page_fault+0x22/0x30
> 
> This is the BUG_ON() in pfn_swap_entry_to_page(). The box itself has no
> swapspace configured. Is this something known?

It's not been reported before, but it seems obvious why it triggers.
I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
to not take a pageref".

There's a lot of inlining going on, so let's write this out:

__handle_mm_fault
  handle_pte_fault
    do_swap_page():

        entry = pte_to_swp_entry(vmf->orig_pte);
        if (unlikely(non_swap_entry(entry))) {
                if (is_migration_entry(entry)) {
                        migration_entry_wait(vma->vm_mm, vmf->pmd,
                                             vmf->address);

We don't (and can't) have the underlying page locked here.  We're just
waiting for it to finish migration.

      migration_entry_wait
        migration_entry_wait_on_locked():

        struct folio *folio = page_folio(pfn_swap_entry_to_page(entry));

	  pfn_swap_entry_to_page():

        struct page *p = pfn_to_page(swp_offset(entry));

        /*
         * Any use of migration entries may only occur while the
         * corresponding page is locked
         */
        BUG_ON(is_migration_entry(entry) && !PageLocked(p));

So why did we not hit this earlier?  Surely somebody's testing page
migration?  I'm not familiar with the migration code, so maybe this
assertion is obsolete and it's now legitimate to use a migration entry
while the page is unlocked.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2022-03-22 17:25 BUG_ON() in pfn_swap_entry_to_page() Sebastian Andrzej Siewior
  2022-03-22 17:41 ` Matthew Wilcox
@ 2022-03-22 18:53 ` Ritesh Harjani
  1 sibling, 0 replies; 11+ messages in thread
From: Ritesh Harjani @ 2022-03-22 18:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-mm, Andrew Morton, Thomas Gleixner

On 22/03/22 06:25PM, Sebastian Andrzej Siewior wrote:
> Run into
>
> |kernel BUG at include/linux/swapops.h:258!
> |invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> |CPU: 10 PID: 1741 Comm: malloc-mem Not tainted 5.17.0-next-20220322 #19
> |Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
> |RIP: 0010:migration_entry_wait_on_locked+0x233/0x2c0
> |Code: 66 90 e9 8a fe ff ff 66 90 e9 3d ff ff ff 48 8b 43 08 a8 01 0f 85 87 00 00 00 66 90 48 89 d8 48 8b 00 a8 01 0f 85 23 fe ff ff <0f> 0b 48 83 e8 01 48 89 c3 e9 29 fe ff ff  65 48 8b 04 25 00 c0 01
> |RSP: 0000:ffffc9000a44bd78 EFLAGS: 00010246
> |RAX: 002fffff80000000 RBX: ffffea0014a20000 RCX: ffffffff5aeffe00
> |RDX: ffff88811b42e200 RSI: 0000000000000000 RDI: 000000000000001f
> |RBP: ffff88811b42e200 R08: 0000000000000001 R09: 0000000000000000
> |R10: 0000000000000000 R11: 0000000000000000 R12: 00007f7131200000
> |R13: 0000000000000c48 R14: 0400000000000000 R15: fff000003fffffff
> |FS:  00007f7145570580(0000) GS:ffff888a2aa80000(0000) knlGS:0000000000000000
> |CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> |CR2: 00007f7131200000 CR3: 00000006f42c4006 CR4: 00000000000606e0
> |Call Trace:
> | <TASK>
> | __handle_mm_fault+0x2bf/0x1810
> | handle_mm_fault+0x136/0x3e0
> | exc_page_fault+0x1d2/0x850
> | asm_exc_page_fault+0x22/0x30

Based on call stack, this looks similar to the issue reported here [1].
I just happen to stumble upon it, but haven't checked any details of why this
can trigger though.

[1]: https://lore.kernel.org/all/20220317080504.GC735@xsang-OptiPlex-9020/


-ritesh

>
> This is the BUG_ON() in pfn_swap_entry_to_page(). The box itself has no
> swapspace configured. Is this something known?
>
> Sebastian
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2022-03-22 17:41 ` Matthew Wilcox
@ 2022-03-23  0:29   ` Alistair Popple
  2022-03-24  3:24     ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Alistair Popple @ 2022-03-23  0:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 3831 bytes --]

Matthew Wilcox <willy@infradead.org> writes:

> On Tue, Mar 22, 2022 at 06:25:02PM +0100, Sebastian Andrzej Siewior wrote:
>> Run into
>>
>> |kernel BUG at include/linux/swapops.h:258!
>> |RIP: 0010:migration_entry_wait_on_locked+0x233/0x2c0
>> |Call Trace:
>> | <TASK>
>> | __handle_mm_fault+0x2bf/0x1810
>> | handle_mm_fault+0x136/0x3e0
>> | exc_page_fault+0x1d2/0x850
>> | asm_exc_page_fault+0x22/0x30
>>
>> This is the BUG_ON() in pfn_swap_entry_to_page(). The box itself has no
>> swapspace configured. Is this something known?
>
> It's not been reported before, but it seems obvious why it triggers.
> I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
> to not take a pageref".

It's obvious the BUG_ON() is because the page isn't locked, but it's less
obvious to me at least why we have a migration entry pointing to an unlocked
page.

> There's a lot of inlining going on, so let's write this out:
>
> __handle_mm_fault
>   handle_pte_fault
>     do_swap_page():
>
>         entry = pte_to_swp_entry(vmf->orig_pte);
>         if (unlikely(non_swap_entry(entry))) {
>                 if (is_migration_entry(entry)) {
>                         migration_entry_wait(vma->vm_mm, vmf->pmd,
>                                              vmf->address);
>
> We don't (and can't) have the underlying page locked here.  We're just
> waiting for it to finish migration.

Why can't the page be locked here? Waiting for migration to finish is equivalent
to waiting for the page to be unlocked. __unmap_and_move() follows this rough
sequence:

__unmap_and_move()
	if (!trylock_page(page)) {
		if (!force || mode == MIGRATE_ASYNC)
			goto out;

/*
* It's not safe for direct compaction to call lock_page.
* For example, during page readahead pages are added locked
* to the LRU. Later, when the IO completes the pages are
* marked uptodate and unlocked. However, the queueing
* could be merging multiple pages for one bio (e.g.
* mpage_readahead). If an allocation happens for the
* second or third page, the process can end up locking
* the same page twice and deadlocking. Rather than
* trying to be clever about what pages can be locked,
* avoid the use of lock_page for direct compaction
* altogether.
 */
if (current->flags & PF_MEMALLOC)
    goto out;

    lock_page(page);
}

[...]

if (unlikely(!trylock_page(newpage)))
    goto out_unlock;

[...]

} else if (page_mapped(page)) {
    /* Establish migration ptes */
    VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
            page);
    try_to_migrate(page, 0);
    page_was_mapped = true;
}

if (!page_mapped(page))
    rc = move_to_new_page(newpage, page, mode);

if (page_was_mapped)
    remove_migration_ptes(page,
        rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);

Both the source and destination pages (page and newpage respectively) are locked
prior to installing migration entries in try_to_migrate() and unlocked after
migration entries are removed. In order to remove a migration entry the PTL is
required. Therefore while holding the PTL for a migration entry it should be
impossible to observe it pointing to an unlocked page.

>       migration_entry_wait
>         migration_entry_wait_on_locked():
>
>         struct folio *folio = page_folio(pfn_swap_entry_to_page(entry));
>
> 	  pfn_swap_entry_to_page():
>
>         struct page *p = pfn_to_page(swp_offset(entry));
>
>         /*
>          * Any use of migration entries may only occur while the
>          * corresponding page is locked
>          */
>         BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>
> So why did we not hit this earlier?  Surely somebody's testing page
> migration?  I'm not familiar with the migration code, so maybe this
> assertion is obsolete and it's now legitimate to use a migration entry
> while the page is unlocked.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2022-03-23  0:29   ` Alistair Popple
@ 2022-03-24  3:24     ` Matthew Wilcox
  2022-03-24  3:51       ` Alistair Popple
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-03-24  3:24 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner

On Wed, Mar 23, 2022 at 11:29:12AM +1100, Alistair Popple wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> > It's not been reported before, but it seems obvious why it triggers.
> > I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
> > to not take a pageref".
> 
> It's obvious the BUG_ON() is because the page isn't locked, but it's less
> obvious to me at least why we have a migration entry pointing to an unlocked
> page.
> 
> > There's a lot of inlining going on, so let's write this out:
> >
> > __handle_mm_fault
> >   handle_pte_fault
> >     do_swap_page():
> >
> >         entry = pte_to_swp_entry(vmf->orig_pte);
> >         if (unlikely(non_swap_entry(entry))) {
> >                 if (is_migration_entry(entry)) {
> >                         migration_entry_wait(vma->vm_mm, vmf->pmd,
> >                                              vmf->address);
> >
> > We don't (and can't) have the underlying page locked here.  We're just
> > waiting for it to finish migration.
> 
> Why can't the page be locked here? Waiting for migration to finish is equivalent
> to waiting for the page to be unlocked. __unmap_and_move() follows this rough
> sequence:
> 
> __unmap_and_move()
> 	if (!trylock_page(page)) {
> 		if (!force || mode == MIGRATE_ASYNC)
> 			goto out;
> 
> /*
> * It's not safe for direct compaction to call lock_page.
> * For example, during page readahead pages are added locked
> * to the LRU. Later, when the IO completes the pages are
> * marked uptodate and unlocked. However, the queueing
> * could be merging multiple pages for one bio (e.g.
> * mpage_readahead). If an allocation happens for the
> * second or third page, the process can end up locking
> * the same page twice and deadlocking. Rather than
> * trying to be clever about what pages can be locked,
> * avoid the use of lock_page for direct compaction
> * altogether.
>  */
> if (current->flags & PF_MEMALLOC)
>     goto out;
> 
>     lock_page(page);
> }
> 
> [...]
> 
> if (unlikely(!trylock_page(newpage)))
>     goto out_unlock;
> 
> [...]
> 
> } else if (page_mapped(page)) {
>     /* Establish migration ptes */
>     VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
>             page);
>     try_to_migrate(page, 0);
>     page_was_mapped = true;
> }
> 
> if (!page_mapped(page))
>     rc = move_to_new_page(newpage, page, mode);
> 
> if (page_was_mapped)
>     remove_migration_ptes(page,
>         rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
> 
> Both the source and destination pages (page and newpage respectively) are locked
> prior to installing migration entries in try_to_migrate() and unlocked after
> migration entries are removed. In order to remove a migration entry the PTL is
> required. Therefore while holding the PTL for a migration entry it should be
> impossible to observe it pointing to an unlocked page.

Hm, right.  We check the PTE initially, then take the PTL in
__migration_entry_wait() and then re-check that it's still a migration
entry.  I can't think of how we're seeing this ...


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2022-03-24  3:24     ` Matthew Wilcox
@ 2022-03-24  3:51       ` Alistair Popple
  2024-04-24 19:45         ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: Alistair Popple @ 2022-03-24  3:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 3423 bytes --]

Matthew Wilcox <willy@infradead.org> writes:

> On Wed, Mar 23, 2022 at 11:29:12AM +1100, Alistair Popple wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>> > It's not been reported before, but it seems obvious why it triggers.
>> > I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
>> > to not take a pageref".
>>
>> It's obvious the BUG_ON() is because the page isn't locked, but it's less
>> obvious to me at least why we have a migration entry pointing to an unlocked
>> page.
>>
>> > There's a lot of inlining going on, so let's write this out:
>> >
>> > __handle_mm_fault
>> >   handle_pte_fault
>> >     do_swap_page():
>> >
>> >         entry = pte_to_swp_entry(vmf->orig_pte);
>> >         if (unlikely(non_swap_entry(entry))) {
>> >                 if (is_migration_entry(entry)) {
>> >                         migration_entry_wait(vma->vm_mm, vmf->pmd,
>> >                                              vmf->address);
>> >
>> > We don't (and can't) have the underlying page locked here.  We're just
>> > waiting for it to finish migration.
>>
>> Why can't the page be locked here? Waiting for migration to finish is equivalent
>> to waiting for the page to be unlocked. __unmap_and_move() follows this rough
>> sequence:
>>
>> __unmap_and_move()
>> 	if (!trylock_page(page)) {
>> 		if (!force || mode `= MIGRATE_ASYNC)
>> 			goto out;
>>
>> /*
>> * It's not safe for direct compaction to call lock_page.
>> * For example, during page readahead pages are added locked
>> * to the LRU. Later, when the IO completes the pages are
>> * marked uptodate and unlocked. However, the queueing
>> * could be merging multiple pages for one bio (e.g.
>> * mpage_readahead). If an allocation happens for the
>> * second or third page, the process can end up locking
>> * the same page twice and deadlocking. Rather than
>> * trying to be clever about what pages can be locked,
>> * avoid the use of lock_page for direct compaction
>> * altogether.
>>  */
>> if (current->flags & PF_MEMALLOC)
>>     goto out;
>>
>>     lock_page(page);
>> }
>>
>> [...]
>>
>> if (unlikely(!trylock_page(newpage)))
>>     goto out_unlock;
>>
>> [...]
>>
>> } else if (page_mapped(page)) {
>>     /* Establish migration ptes */
>>     VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
>>             page);
>>     try_to_migrate(page, 0);
>>     page_was_mapped = true;
>> }
>>
>> if (!page_mapped(page))
>>     rc = move_to_new_page(newpage, page, mode);
>>
>> if (page_was_mapped)
>>     remove_migration_ptes(page,
>>         rc =' MIGRATEPAGE_SUCCESS ? newpage : page, false);
>>
>> Both the source and destination pages (page and newpage respectively) are locked
>> prior to installing migration entries in try_to_migrate() and unlocked after
>> migration entries are removed. In order to remove a migration entry the PTL is
>> required. Therefore while holding the PTL for a migration entry it should be
>> impossible to observe it pointing to an unlocked page.
>
> Hm, right.  We check the PTE initially, then take the PTL in
> __migration_entry_wait() and then re-check that it's still a migration
> entry.  I can't think of how we're seeing this ...

Right. The only thing I can think of is if we somehow didn't find all the
migration entries to remove once migration is complete. There have been a few
changes to page_vma_mapped_walk() but I haven't spotted anything yet that would
cause this.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2022-03-24  3:51       ` Alistair Popple
@ 2024-04-24 19:45         ` Felix Kuehling
  2024-04-25  9:32           ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Felix Kuehling @ 2024-04-24 19:45 UTC (permalink / raw)
  To: Alistair Popple, Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 4302 bytes --]

Sorry for top-posting. I'm resurrecting an old thread here because I 
think I ran into the same problem with this assertion failing on Linux 6.7:

static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
{
         struct page *p = pfn_to_page(swp_offset_pfn(entry));

         /*
          * Any use of migration entries may only occur while the
          * corresponding page is locked
          */
-->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));

         return p;
}

It looks like this thread just fizzled two years ago. Did anything ever 
come of this?

Maybe I should add that I saw this in a pre-silicon test environment. 
I've never seen this on real hardware. Maybe something timing-sensitive.

Regards,
   Felix


On 2022-03-23 23:51, Alistair Popple wrote:
> Matthew Wilcox<willy@infradead.org>  writes:
>
>> On Wed, Mar 23, 2022 at 11:29:12AM +1100, Alistair Popple wrote:
>>> Matthew Wilcox<willy@infradead.org>  writes:
>>>> It's not been reported before, but it seems obvious why it triggers.
>>>> I think it's ffa65753c431 "mm/migrate.c: rework migration_entry_wait()
>>>> to not take a pageref".
>>> It's obvious the BUG_ON() is because the page isn't locked, but it's less
>>> obvious to me at least why we have a migration entry pointing to an unlocked
>>> page.
>>>
>>>> There's a lot of inlining going on, so let's write this out:
>>>>
>>>> __handle_mm_fault
>>>>    handle_pte_fault
>>>>      do_swap_page():
>>>>
>>>>          entry = pte_to_swp_entry(vmf->orig_pte);
>>>>          if (unlikely(non_swap_entry(entry))) {
>>>>                  if (is_migration_entry(entry)) {
>>>>                          migration_entry_wait(vma->vm_mm, vmf->pmd,
>>>>                                               vmf->address);
>>>>
>>>> We don't (and can't) have the underlying page locked here.  We're just
>>>> waiting for it to finish migration.
>>> Why can't the page be locked here? Waiting for migration to finish is equivalent
>>> to waiting for the page to be unlocked. __unmap_and_move() follows this rough
>>> sequence:
>>>
>>> __unmap_and_move()
>>> 	if (!trylock_page(page)) {
>>> 		if (!force || mode `= MIGRATE_ASYNC)
>>> 			goto out;
>>>
>>> /*
>>> * It's not safe for direct compaction to call lock_page.
>>> * For example, during page readahead pages are added locked
>>> * to the LRU. Later, when the IO completes the pages are
>>> * marked uptodate and unlocked. However, the queueing
>>> * could be merging multiple pages for one bio (e.g.
>>> * mpage_readahead). If an allocation happens for the
>>> * second or third page, the process can end up locking
>>> * the same page twice and deadlocking. Rather than
>>> * trying to be clever about what pages can be locked,
>>> * avoid the use of lock_page for direct compaction
>>> * altogether.
>>>   */
>>> if (current->flags & PF_MEMALLOC)
>>>      goto out;
>>>
>>>      lock_page(page);
>>> }
>>>
>>> [...]
>>>
>>> if (unlikely(!trylock_page(newpage)))
>>>      goto out_unlock;
>>>
>>> [...]
>>>
>>> } else if (page_mapped(page)) {
>>>      /* Establish migration ptes */
>>>      VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
>>>              page);
>>>      try_to_migrate(page, 0);
>>>      page_was_mapped = true;
>>> }
>>>
>>> if (!page_mapped(page))
>>>      rc = move_to_new_page(newpage, page, mode);
>>>
>>> if (page_was_mapped)
>>>      remove_migration_ptes(page,
>>>          rc =' MIGRATEPAGE_SUCCESS ? newpage : page, false);
>>>
>>> Both the source and destination pages (page and newpage respectively) are locked
>>> prior to installing migration entries in try_to_migrate() and unlocked after
>>> migration entries are removed. In order to remove a migration entry the PTL is
>>> required. Therefore while holding the PTL for a migration entry it should be
>>> impossible to observe it pointing to an unlocked page.
>> Hm, right.  We check the PTE initially, then take the PTL in
>> __migration_entry_wait() and then re-check that it's still a migration
>> entry.  I can't think of how we're seeing this ...
> Right. The only thing I can think of is if we somehow didn't find all the
> migration entries to remove once migration is complete. There have been a few
> changes to page_vma_mapped_walk() but I haven't spotted anything yet that would
> cause this.

[-- Attachment #2: Type: text/html, Size: 5570 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2024-04-24 19:45         ` Felix Kuehling
@ 2024-04-25  9:32           ` David Hildenbrand
  2024-04-25 14:33             ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-04-25  9:32 UTC (permalink / raw)
  To: Felix Kuehling, Alistair Popple, Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner

On 24.04.24 21:45, Felix Kuehling wrote:
> Sorry for top-posting. I'm resurrecting an old thread here because I 
> think I ran into the same problem with this assertion failing on Linux 6.7:
> 
> static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
> {
>          struct page *p = pfn_to_page(swp_offset_pfn(entry));
> 
>          /*
>           * Any use of migration entries may only occur while the
>           * corresponding page is locked
>           */
> -->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));
> 
>          return p;
> }
> 
> It looks like this thread just fizzled two years ago. Did anything ever 
> come of this?
> 
> Maybe I should add that I saw this in a pre-silicon test environment. 
> I've never seen this on real hardware. Maybe something timing-sensitive.

In the past, it indicated a swp pte corruption, that would e.g., mess up 
the stored PFN ot the swap entry type.

On which call chain do you see that?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2024-04-25  9:32           ` David Hildenbrand
@ 2024-04-25 14:33             ` Felix Kuehling
  2024-04-26  8:49               ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Felix Kuehling @ 2024-04-25 14:33 UTC (permalink / raw)
  To: David Hildenbrand, Alistair Popple, Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner



On 2024-04-25 5:32, David Hildenbrand wrote:
> On 24.04.24 21:45, Felix Kuehling wrote:
>> Sorry for top-posting. I'm resurrecting an old thread here because I 
>> think I ran into the same problem with this assertion failing on Linux 
>> 6.7:
>>
>> static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
>> {
>>          struct page *p = pfn_to_page(swp_offset_pfn(entry));
>>
>>          /*
>>           * Any use of migration entries may only occur while the
>>           * corresponding page is locked
>>           */
>> -->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>>
>>          return p;
>> }
>>
>> It looks like this thread just fizzled two years ago. Did anything 
>> ever come of this?
>>
>> Maybe I should add that I saw this in a pre-silicon test environment. 
>> I've never seen this on real hardware. Maybe something timing-sensitive.
> 
> In the past, it indicated a swp pte corruption, that would e.g., mess up 
> the stored PFN ot the swap entry type.
> 
> On which call chain do you see that?
> 

This is the backtrace, it's coming from hmm_range_fault. Looks like the 
swap entries are from migrated DEVICE_PRIVATE pages.

[Apr 3 20:11] ------------[ cut here ]------------
[  +0.000041] kernel BUG at include/linux/swapops.h:466!
[  +0.000691] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  +0.000342] CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 
6.7.0-kfd-compute-rocm-npi-186 #1
[  +0.000556] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  +0.000703] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
[  +0.000501] RIP: 0010:migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000389] Code: fe ff ff 48 8d 7c 24 07 e8 02 7e f0 ff e9 58 fe ff 
ff 48 8b 43 08 a8 01 75 3f 66 90 48 89 d8 48 8b 00 a8 01 0f 85 f1 fd ff 
ff <0f> 0b 48 8d 58 ff e9 f7 fd ff ff 48 89 d8 f7 c3 ff 0f 00 00 75 df
[  +0.001161] RSP: 0018:ffffb211c01bb788 EFLAGS: 00010246
[  +0.000339] RAX: 017fff8000080018 RBX: fffff682c40ce8c0 RCX: 
0000000000000001
[  +0.000463] RDX: 0000000000000000 RSI: ffff977a45034840 RDI: 
000000000000001a
[  +0.000454] RBP: ffff977a45034840 R08: 68000000001033a3 R09: 
0000000000000030
[  +0.000451] R10: ffffb211c01bb6a8 R11: 0000000000000001 R12: 
ffff977a46bd1318
[  +0.000461] R13: 0000000000000003 R14: 4000000000000000 R15: 
ffffb211c01bb9b8
[  +0.000454] FS:  0000000000000000(0000) GS:ffff977dafd00000(0000) 
knlGS:0000000000000000
[  +0.000518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000372] CR2: 00007fa2d1cba000 CR3: 00000001030d2004 CR4: 
0000000000770ef0
[  +0.000453] PKRU: 55555554
[  +0.000182] Call Trace:
[  +0.000171]  <TASK>
[  +0.000147]  ? die+0x37/0x90
[  +0.000211]  ? do_trap+0xe0/0x110
[  +0.000221]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000351]  ? do_error_trap+0x98/0x120
[  +0.000252]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000346]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000355]  ? exc_invalid_op+0x52/0x70
[  +0.000254]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000345]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.000274]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000361]  ? migration_entry_wait+0x4e/0x160
[  +0.000293]  ? lock_release+0x119/0x260
[  +0.000255]  migration_entry_wait+0x105/0x160
[  +0.000290]  hmm_vma_walk_pmd+0x822/0x8a0
[  +0.000263]  walk_pgd_range+0x40b/0x900
[  +0.000268]  __walk_page_range+0x205/0x220
[  +0.000267]  walk_page_range+0x13a/0x250
[  +0.000259]  hmm_range_fault+0x5d/0xb0
[  +0.000247]  amdgpu_hmm_range_get_pages+0x144/0x240 [amdgpu]
[  +0.000491]  svm_range_validate_and_map+0x2e5/0x1310 [amdgpu]
[  +0.000479]  ? svm_migrate_ram_to_vram+0x360/0x630 [amdgpu]
[  +0.000453]  svm_range_restore_pages+0xd1e/0x11b0 [amdgpu]
[  +0.000462]  amdgpu_vm_handle_fault+0xc0/0x370 [amdgpu]
[  +0.000428]  gmc_v9_0_process_interrupt+0x10d/0x670 [amdgpu]
[  +0.000463]  ? __wake_up+0x21/0x60
[  +0.000427]  ? find_held_lock+0x2b/0x80
[  +0.000435]  ? process_one_work+0x16a/0x4b0
[  +0.000446]  ? amdgpu_irq_dispatch+0xc2/0x220 [amdgpu]
[  +0.000596]  amdgpu_irq_dispatch+0xc2/0x220 [amdgpu]
[  +0.000579]  amdgpu_ih_process+0x7d/0xe0 [amdgpu]
[  +0.000561]  process_one_work+0x1d1/0x4b0
[  +0.000435]  worker_thread+0x1d3/0x3d0
[  +0.000400]  ? rescuer_thread+0x360/0x360
[  +0.000410]  kthread+0xee/0x120
[  +0.000367]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000452]  ret_from_fork+0x31/0x50
[  +0.000371]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000448]  ret_from_fork_asm+0x11/0x20
[  +0.000390]  </TASK>
[  +0.000281] Modules linked in: amdgpu drm_ttm_helper ttm video wmi 
drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched 
drm_display_helper fuse ip_tables x_tables virtio_gpu virtio_dma_buf 
drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks
[  +0.002319] ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2024-04-25 14:33             ` Felix Kuehling
@ 2024-04-26  8:49               ` David Hildenbrand
  2024-04-26 14:56                 ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-04-26  8:49 UTC (permalink / raw)
  To: Felix Kuehling, Alistair Popple, Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner

On 25.04.24 16:33, Felix Kuehling wrote:
> 
> 
> On 2024-04-25 5:32, David Hildenbrand wrote:
>> On 24.04.24 21:45, Felix Kuehling wrote:
>>> Sorry for top-posting. I'm resurrecting an old thread here because I
>>> think I ran into the same problem with this assertion failing on Linux
>>> 6.7:
>>>
>>> static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
>>> {
>>>           struct page *p = pfn_to_page(swp_offset_pfn(entry));
>>>
>>>           /*
>>>            * Any use of migration entries may only occur while the
>>>            * corresponding page is locked
>>>            */
>>> -->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>>>
>>>           return p;
>>> }
>>>
>>> It looks like this thread just fizzled two years ago. Did anything
>>> ever come of this?
>>>
>>> Maybe I should add that I saw this in a pre-silicon test environment.
>>> I've never seen this on real hardware. Maybe something timing-sensitive.
>>
>> In the past, it indicated a swp pte corruption, that would e.g., mess up
>> the stored PFN ot the swap entry type.
>>
>> On which call chain do you see that?
>>
> 
> This is the backtrace, it's coming from hmm_range_fault. Looks like the
> swap entries are from migrated DEVICE_PRIVATE pages.

Thanks, on which kernel version can you reproduce this?

> 
> [Apr 3 20:11] ------------[ cut here ]------------
> [  +0.000041] kernel BUG at include/linux/swapops.h:466!
> [  +0.000691] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [  +0.000342] CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted
> 6.7.0-kfd-compute-rocm-npi-186 #1
> [  +0.000556] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [  +0.000703] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
> [  +0.000501] RIP: 0010:migration_entry_wait_on_locked+0x26b/0x2b0
> [  +0.000389] Code: fe ff ff 48 8d 7c 24 07 e8 02 7e f0 ff e9 58 fe ff
> ff 48 8b 43 08 a8 01 75 3f 66 90 48 89 d8 48 8b 00 a8 01 0f 85 f1 fd ff
> ff <0f> 0b 48 8d 58 ff e9 f7 fd ff ff 48 89 d8 f7 c3 ff 0f 00 00 75 df
> [  +0.001161] RSP: 0018:ffffb211c01bb788 EFLAGS: 00010246
> [  +0.000339] RAX: 017fff8000080018 RBX: fffff682c40ce8c0 RCX:
> 0000000000000001
> [  +0.000463] RDX: 0000000000000000 RSI: ffff977a45034840 RDI:
> 000000000000001a
> [  +0.000454] RBP: ffff977a45034840 R08: 68000000001033a3 R09:
> 0000000000000030
> [  +0.000451] R10: ffffb211c01bb6a8 R11: 0000000000000001 R12:
> ffff977a46bd1318
> [  +0.000461] R13: 0000000000000003 R14: 4000000000000000 R15:
> ffffb211c01bb9b8
> [  +0.000454] FS:  0000000000000000(0000) GS:ffff977dafd00000(0000)
> knlGS:0000000000000000
> [  +0.000518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  +0.000372] CR2: 00007fa2d1cba000 CR3: 00000001030d2004 CR4:
> 0000000000770ef0
> [  +0.000453] PKRU: 55555554
> [  +0.000182] Call Trace:
> [  +0.000171]  <TASK>
> [  +0.000147]  ? die+0x37/0x90
> [  +0.000211]  ? do_trap+0xe0/0x110
> [  +0.000221]  ? migration_entry_wait_on_locked+0x26b/0x2b0
> [  +0.000351]  ? do_error_trap+0x98/0x120
> [  +0.000252]  ? migration_entry_wait_on_locked+0x26b/0x2b0
> [  +0.000346]  ? migration_entry_wait_on_locked+0x26b/0x2b0
> [  +0.000355]  ? exc_invalid_op+0x52/0x70
> [  +0.000254]  ? migration_entry_wait_on_locked+0x26b/0x2b0
> [  +0.000345]  ? asm_exc_invalid_op+0x1a/0x20
> [  +0.000274]  ? migration_entry_wait_on_locked+0x26b/0x2b0
> [  +0.000361]  ? migration_entry_wait+0x4e/0x160
> [  +0.000293]  ? lock_release+0x119/0x260
> [  +0.000255]  migration_entry_wait+0x105/0x160
> [  +0.000290]  hmm_vma_walk_pmd+0x822/0x8a0
> [  +0.000263]  walk_pgd_range+0x40b/0x900
> [  +0.000268]  __walk_page_range+0x205/0x220

I wonder if that is coming from pmd_migration_entry_wait() or 
migration_entry_wait() --  the "?" above adds uncertainty :)

Likely it's from migration_entry_wait().

I was first concerned about the lack of PTL in this function, but 
migration_entry_wait() will take the PTL and re-read the PTE.

So when we call into migration_entry_wait_on_locked(), we are holding 
the PTL and we verified that we indeed have a migration entry.

So if we fail in 
migration_entry_wait_on_locked()->pfn_swap_entry_folio(), we verified 
under PTL and still have a migration entry.

The referenced folio is indeed not locked then.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: BUG_ON() in pfn_swap_entry_to_page()
  2024-04-26  8:49               ` David Hildenbrand
@ 2024-04-26 14:56                 ` Felix Kuehling
  0 siblings, 0 replies; 11+ messages in thread
From: Felix Kuehling @ 2024-04-26 14:56 UTC (permalink / raw)
  To: David Hildenbrand, Alistair Popple, Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, Andrew Morton, Thomas Gleixner


On 2024-04-26 4:49, David Hildenbrand wrote:
> On 25.04.24 16:33, Felix Kuehling wrote:
>>
>>
>> On 2024-04-25 5:32, David Hildenbrand wrote:
>>> On 24.04.24 21:45, Felix Kuehling wrote:
>>>> Sorry for top-posting. I'm resurrecting an old thread here because I
>>>> think I ran into the same problem with this assertion failing on Linux
>>>> 6.7:
>>>>
>>>> static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
>>>> {
>>>>           struct page *p = pfn_to_page(swp_offset_pfn(entry));
>>>>
>>>>           /*
>>>>            * Any use of migration entries may only occur while the
>>>>            * corresponding page is locked
>>>>            */
>>>> -->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>>>>
>>>>           return p;
>>>> }
>>>>
>>>> It looks like this thread just fizzled two years ago. Did anything
>>>> ever come of this?
>>>>
>>>> Maybe I should add that I saw this in a pre-silicon test environment.
>>>> I've never seen this on real hardware. Maybe something 
>>>> timing-sensitive.
>>>
>>> In the past, it indicated a swp pte corruption, that would e.g., mess up
>>> the stored PFN ot the swap entry type.
>>>
>>> On which call chain do you see that?
>>>
>>
>> This is the backtrace, it's coming from hmm_range_fault. Looks like the
>> swap entries are from migrated DEVICE_PRIVATE pages.
> 
> Thanks, on which kernel version can you reproduce this?

This is on a branch based on v6.7: $ git describe HEAD
v6.7-2677-g065851796b25

The branch mostly changes code in drivers. No changes in kernel/ or mm/. 
A few changes in include/linux, but nothing that looks related to core 
memory management.


> 
>>
>> [Apr 3 20:11] ------------[ cut here ]------------
>> [  +0.000041] kernel BUG at include/linux/swapops.h:466!
>> [  +0.000691] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> [  +0.000342] CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted
>> 6.7.0-kfd-compute-rocm-npi-186 #1
>> [  +0.000556] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>> rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>> [  +0.000703] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
>> [  +0.000501] RIP: 0010:migration_entry_wait_on_locked+0x26b/0x2b0
>> [  +0.000389] Code: fe ff ff 48 8d 7c 24 07 e8 02 7e f0 ff e9 58 fe ff
>> ff 48 8b 43 08 a8 01 75 3f 66 90 48 89 d8 48 8b 00 a8 01 0f 85 f1 fd ff
>> ff <0f> 0b 48 8d 58 ff e9 f7 fd ff ff 48 89 d8 f7 c3 ff 0f 00 00 75 df
>> [  +0.001161] RSP: 0018:ffffb211c01bb788 EFLAGS: 00010246
>> [  +0.000339] RAX: 017fff8000080018 RBX: fffff682c40ce8c0 RCX:
>> 0000000000000001
>> [  +0.000463] RDX: 0000000000000000 RSI: ffff977a45034840 RDI:
>> 000000000000001a
>> [  +0.000454] RBP: ffff977a45034840 R08: 68000000001033a3 R09:
>> 0000000000000030
>> [  +0.000451] R10: ffffb211c01bb6a8 R11: 0000000000000001 R12:
>> ffff977a46bd1318
>> [  +0.000461] R13: 0000000000000003 R14: 4000000000000000 R15:
>> ffffb211c01bb9b8
>> [  +0.000454] FS:  0000000000000000(0000) GS:ffff977dafd00000(0000)
>> knlGS:0000000000000000
>> [  +0.000518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  +0.000372] CR2: 00007fa2d1cba000 CR3: 00000001030d2004 CR4:
>> 0000000000770ef0
>> [  +0.000453] PKRU: 55555554
>> [  +0.000182] Call Trace:
>> [  +0.000171]  <TASK>
>> [  +0.000147]  ? die+0x37/0x90
>> [  +0.000211]  ? do_trap+0xe0/0x110
>> [  +0.000221]  ? migration_entry_wait_on_locked+0x26b/0x2b0
>> [  +0.000351]  ? do_error_trap+0x98/0x120
>> [  +0.000252]  ? migration_entry_wait_on_locked+0x26b/0x2b0
>> [  +0.000346]  ? migration_entry_wait_on_locked+0x26b/0x2b0
>> [  +0.000355]  ? exc_invalid_op+0x52/0x70
>> [  +0.000254]  ? migration_entry_wait_on_locked+0x26b/0x2b0
>> [  +0.000345]  ? asm_exc_invalid_op+0x1a/0x20
>> [  +0.000274]  ? migration_entry_wait_on_locked+0x26b/0x2b0
>> [  +0.000361]  ? migration_entry_wait+0x4e/0x160
>> [  +0.000293]  ? lock_release+0x119/0x260
>> [  +0.000255]  migration_entry_wait+0x105/0x160
>> [  +0.000290]  hmm_vma_walk_pmd+0x822/0x8a0
>> [  +0.000263]  walk_pgd_range+0x40b/0x900
>> [  +0.000268]  __walk_page_range+0x205/0x220
> 
> I wonder if that is coming from pmd_migration_entry_wait() or 
> migration_entry_wait() --  the "?" above adds uncertainty :)

This is weird. I only see a call to pmd_migration_entry_wait in 
hmm_vma_walk_pmd.

> 
> Likely it's from migration_entry_wait().
> 
> I was first concerned about the lack of PTL in this function, but 
> migration_entry_wait() will take the PTL and re-read the PTE.
> 
> So when we call into migration_entry_wait_on_locked(), we are holding 
> the PTL and we verified that we indeed have a migration entry.
> 
> So if we fail in 
> migration_entry_wait_on_locked()->pfn_swap_entry_folio(), we verified 
> under PTL and still have a migration entry.
> 
> The referenced folio is indeed not locked then.

I must admit, I'm not familiar with this code at all, so my observations 
and questions are probably naive. So is the BUG_ON bad, or is 
migration_entry_wait_on_locked missing some page locking?

I see that migration_entry_wait_on_locked does a 
folio_trylock_flag(folio, PG_locked, wait), but _after_ getting the 
folio with page_folio(pfn_swap_entry_to_page(entry)).

Maybe as a workaround for the team stumbling over this, I'll suggest 
disabling THP.

Regards,
   Felix


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-04-26 14:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-22 17:25 BUG_ON() in pfn_swap_entry_to_page() Sebastian Andrzej Siewior
2022-03-22 17:41 ` Matthew Wilcox
2022-03-23  0:29   ` Alistair Popple
2022-03-24  3:24     ` Matthew Wilcox
2022-03-24  3:51       ` Alistair Popple
2024-04-24 19:45         ` Felix Kuehling
2024-04-25  9:32           ` David Hildenbrand
2024-04-25 14:33             ` Felix Kuehling
2024-04-26  8:49               ` David Hildenbrand
2024-04-26 14:56                 ` Felix Kuehling
2022-03-22 18:53 ` Ritesh Harjani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).