Re: [PATCH] mm/migrate: fix race between lock page and clear PG

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
       [not found] <20220315030515.20263-1-andrew.yang@mediatek.com>
  2022-03-15  3:32   ` Matthew Wilcox
@ 2022-03-15  3:32   ` Matthew Wilcox
  1 sibling, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2022-03-15  3:32 UTC (permalink / raw)
  To: Andrew Yang
  Cc: Andrew Morton, Matthias Brugger, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, Mar 15, 2022 at 11:05:15AM +0800, Andrew Yang wrote:
> When memory is tight, system may start to compact memory for large
> continuous memory demands. If one process tries to lock a memory page
> that is being locked and isolated for compaction, it may wait a long time
> or even forever. This is because compaction will perform non-atomic
> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> set by the process that can't obtain the page lock and add itself to the
> waiting queue to wait for the lock to be unlocked.
> 
> CPU1                            CPU2
> lock_page(page); (successful)
>                                 lock_page(); (failed)
> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> unlock_page(page);
> 
> The solution is to not perform non-atomic operation on page flags while
> holding page lock.
> 
> Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
> ---
>  include/linux/page-flags.h |  2 +-
>  mm/migrate.c               | 12 ++++++------
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 1c3b6e5c8bfd..64a84a9835cb 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -918,7 +918,7 @@ PAGE_TYPE_OPS(Guard, guard)
>  
>  extern bool is_free_buddy_page(struct page *page);
>  
> -__PAGEFLAG(Isolated, isolated, PF_ANY);
> +PAGEFLAG(Isolated, isolated, PF_ANY);

Agreed.  Further, page cannot be a tail page (this is implied by the
get_page_unless_zero() as tailpages have a zero refcount, and it
is assumed by __PageMovable() as page->mapping is undefined for tail
pages).  So this can actually be:

+PAGEFLAG(Isolated, isolated, PF_NO_TAIL);

I considered PF_ONLY_HEAD, but there are a lot more places that _check_
PageIsolated() and I don't want to prove that they're all definitely
working on head pages.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15  3:32   ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2022-03-15  3:32 UTC (permalink / raw)
  To: Andrew Yang
  Cc: Andrew Morton, Matthias Brugger, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, Mar 15, 2022 at 11:05:15AM +0800, Andrew Yang wrote:
> When memory is tight, system may start to compact memory for large
> continuous memory demands. If one process tries to lock a memory page
> that is being locked and isolated for compaction, it may wait a long time
> or even forever. This is because compaction will perform non-atomic
> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> set by the process that can't obtain the page lock and add itself to the
> waiting queue to wait for the lock to be unlocked.
> 
> CPU1                            CPU2
> lock_page(page); (successful)
>                                 lock_page(); (failed)
> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> unlock_page(page);
> 
> The solution is to not perform non-atomic operation on page flags while
> holding page lock.
> 
> Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
> ---
>  include/linux/page-flags.h |  2 +-
>  mm/migrate.c               | 12 ++++++------
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 1c3b6e5c8bfd..64a84a9835cb 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -918,7 +918,7 @@ PAGE_TYPE_OPS(Guard, guard)
>  
>  extern bool is_free_buddy_page(struct page *page);
>  
> -__PAGEFLAG(Isolated, isolated, PF_ANY);
> +PAGEFLAG(Isolated, isolated, PF_ANY);

Agreed.  Further, page cannot be a tail page (this is implied by the
get_page_unless_zero() as tailpages have a zero refcount, and it
is assumed by __PageMovable() as page->mapping is undefined for tail
pages).  So this can actually be:

+PAGEFLAG(Isolated, isolated, PF_NO_TAIL);

I considered PF_ONLY_HEAD, but there are a lot more places that _check_
PageIsolated() and I don't want to prove that they're all definitely
working on head pages.


_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15  3:32   ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2022-03-15  3:32 UTC (permalink / raw)
  To: Andrew Yang
  Cc: Andrew Morton, Matthias Brugger, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, Mar 15, 2022 at 11:05:15AM +0800, Andrew Yang wrote:
> When memory is tight, system may start to compact memory for large
> continuous memory demands. If one process tries to lock a memory page
> that is being locked and isolated for compaction, it may wait a long time
> or even forever. This is because compaction will perform non-atomic
> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> set by the process that can't obtain the page lock and add itself to the
> waiting queue to wait for the lock to be unlocked.
> 
> CPU1                            CPU2
> lock_page(page); (successful)
>                                 lock_page(); (failed)
> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> unlock_page(page);
> 
> The solution is to not perform non-atomic operation on page flags while
> holding page lock.
> 
> Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
> ---
>  include/linux/page-flags.h |  2 +-
>  mm/migrate.c               | 12 ++++++------
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 1c3b6e5c8bfd..64a84a9835cb 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -918,7 +918,7 @@ PAGE_TYPE_OPS(Guard, guard)
>  
>  extern bool is_free_buddy_page(struct page *page);
>  
> -__PAGEFLAG(Isolated, isolated, PF_ANY);
> +PAGEFLAG(Isolated, isolated, PF_ANY);

Agreed.  Further, page cannot be a tail page (this is implied by the
get_page_unless_zero() as tailpages have a zero refcount, and it
is assumed by __PageMovable() as page->mapping is undefined for tail
pages).  So this can actually be:

+PAGEFLAG(Isolated, isolated, PF_NO_TAIL);

I considered PF_ONLY_HEAD, but there are a lot more places that _check_
PageIsolated() and I don't want to prove that they're all definitely
working on head pages.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
       [not found] <20220315030515.20263-1-andrew.yang@mediatek.com>
  2022-03-15  3:32   ` Matthew Wilcox
@ 2022-03-15  4:21   ` Andrew Morton
  1 sibling, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2022-03-15  4:21 UTC (permalink / raw)
  To: Andrew Yang
  Cc: Matthias Brugger, Matthew Wilcox, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:

> When memory is tight, system may start to compact memory for large
> continuous memory demands. If one process tries to lock a memory page
> that is being locked and isolated for compaction, it may wait a long time
> or even forever. This is because compaction will perform non-atomic
> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> set by the process that can't obtain the page lock and add itself to the
> waiting queue to wait for the lock to be unlocked.
> 
> CPU1                            CPU2
> lock_page(page); (successful)
>                                 lock_page(); (failed)
> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> unlock_page(page);
> 
> The solution is to not perform non-atomic operation on page flags while
> holding page lock.

Sure, the non-atomic bitop optimization is really risky and I suspect
we reach for it too often.  Or at least without really clearly
demonstrating that it is safe, and documenting our assumptions.

I'm thinking this one should be backported, so I'll queue it for
5.18-rc1, with a cc:stable so it gets trickled back.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15  4:21   ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2022-03-15  4:21 UTC (permalink / raw)
  To: Andrew Yang
  Cc: Matthias Brugger, Matthew Wilcox, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:

> When memory is tight, system may start to compact memory for large
> continuous memory demands. If one process tries to lock a memory page
> that is being locked and isolated for compaction, it may wait a long time
> or even forever. This is because compaction will perform non-atomic
> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> set by the process that can't obtain the page lock and add itself to the
> waiting queue to wait for the lock to be unlocked.
> 
> CPU1                            CPU2
> lock_page(page); (successful)
>                                 lock_page(); (failed)
> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> unlock_page(page);
> 
> The solution is to not perform non-atomic operation on page flags while
> holding page lock.

Sure, the non-atomic bitop optimization is really risky and I suspect
we reach for it too often.  Or at least without really clearly
demonstrating that it is safe, and documenting our assumptions.

I'm thinking this one should be backported, so I'll queue it for
5.18-rc1, with a cc:stable so it gets trickled back.

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15  4:21   ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2022-03-15  4:21 UTC (permalink / raw)
  To: Andrew Yang
  Cc: Matthias Brugger, Matthew Wilcox, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:

> When memory is tight, system may start to compact memory for large
> continuous memory demands. If one process tries to lock a memory page
> that is being locked and isolated for compaction, it may wait a long time
> or even forever. This is because compaction will perform non-atomic
> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> set by the process that can't obtain the page lock and add itself to the
> waiting queue to wait for the lock to be unlocked.
> 
> CPU1                            CPU2
> lock_page(page); (successful)
>                                 lock_page(); (failed)
> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> unlock_page(page);
> 
> The solution is to not perform non-atomic operation on page flags while
> holding page lock.

Sure, the non-atomic bitop optimization is really risky and I suspect
we reach for it too often.  Or at least without really clearly
demonstrating that it is safe, and documenting our assumptions.

I'm thinking this one should be backported, so I'll queue it for
5.18-rc1, with a cc:stable so it gets trickled back.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
  2022-03-15  4:21   ` Andrew Morton
  (?)
@ 2022-03-15 15:45     ` David Hildenbrand
  -1 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2022-03-15 15:45 UTC (permalink / raw)
  To: Andrew Morton, Andrew Yang
  Cc: Matthias Brugger, Matthew Wilcox, Vlastimil Babka, David Howells,
	William Kucharski, Yang Shi, Marc Zyngier, linux-kernel,
	linux-mm, linux-arm-kernel, linux-mediatek, wsd_upstream,
	Nicholas Tang, Kuan-Ying Lee

On 15.03.22 05:21, Andrew Morton wrote:
> On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> 
>> When memory is tight, system may start to compact memory for large
>> continuous memory demands. If one process tries to lock a memory page
>> that is being locked and isolated for compaction, it may wait a long time
>> or even forever. This is because compaction will perform non-atomic
>> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
>> set by the process that can't obtain the page lock and add itself to the
>> waiting queue to wait for the lock to be unlocked.
>>
>> CPU1                            CPU2
>> lock_page(page); (successful)
>>                                 lock_page(); (failed)
>> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
>> unlock_page(page);
>>
>> The solution is to not perform non-atomic operation on page flags while
>> holding page lock.
> 
> Sure, the non-atomic bitop optimization is really risky and I suspect
> we reach for it too often.  Or at least without really clearly
> demonstrating that it is safe, and documenting our assumptions.

I agree. IIRC, non-atomic variants are mostly only safe while the
refcount is 0. Everything else is just absolutely fragile.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 15:45     ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2022-03-15 15:45 UTC (permalink / raw)
  To: Andrew Morton, Andrew Yang
  Cc: Matthias Brugger, Matthew Wilcox, Vlastimil Babka, David Howells,
	William Kucharski, Yang Shi, Marc Zyngier, linux-kernel,
	linux-mm, linux-arm-kernel, linux-mediatek, wsd_upstream,
	Nicholas Tang, Kuan-Ying Lee

On 15.03.22 05:21, Andrew Morton wrote:
> On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> 
>> When memory is tight, system may start to compact memory for large
>> continuous memory demands. If one process tries to lock a memory page
>> that is being locked and isolated for compaction, it may wait a long time
>> or even forever. This is because compaction will perform non-atomic
>> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
>> set by the process that can't obtain the page lock and add itself to the
>> waiting queue to wait for the lock to be unlocked.
>>
>> CPU1                            CPU2
>> lock_page(page); (successful)
>>                                 lock_page(); (failed)
>> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
>> unlock_page(page);
>>
>> The solution is to not perform non-atomic operation on page flags while
>> holding page lock.
> 
> Sure, the non-atomic bitop optimization is really risky and I suspect
> we reach for it too often.  Or at least without really clearly
> demonstrating that it is safe, and documenting our assumptions.

I agree. IIRC, non-atomic variants are mostly only safe while the
refcount is 0. Everything else is just absolutely fragile.

-- 
Thanks,

David / dhildenb


_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 15:45     ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2022-03-15 15:45 UTC (permalink / raw)
  To: Andrew Morton, Andrew Yang
  Cc: Matthias Brugger, Matthew Wilcox, Vlastimil Babka, David Howells,
	William Kucharski, Yang Shi, Marc Zyngier, linux-kernel,
	linux-mm, linux-arm-kernel, linux-mediatek, wsd_upstream,
	Nicholas Tang, Kuan-Ying Lee

On 15.03.22 05:21, Andrew Morton wrote:
> On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> 
>> When memory is tight, system may start to compact memory for large
>> continuous memory demands. If one process tries to lock a memory page
>> that is being locked and isolated for compaction, it may wait a long time
>> or even forever. This is because compaction will perform non-atomic
>> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
>> set by the process that can't obtain the page lock and add itself to the
>> waiting queue to wait for the lock to be unlocked.
>>
>> CPU1                            CPU2
>> lock_page(page); (successful)
>>                                 lock_page(); (failed)
>> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
>> unlock_page(page);
>>
>> The solution is to not perform non-atomic operation on page flags while
>> holding page lock.
> 
> Sure, the non-atomic bitop optimization is really risky and I suspect
> we reach for it too often.  Or at least without really clearly
> demonstrating that it is safe, and documenting our assumptions.

I agree. IIRC, non-atomic variants are mostly only safe while the
refcount is 0. Everything else is just absolutely fragile.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
  2022-03-15 15:45     ` David Hildenbrand
  (?)
@ 2022-03-15 17:43       ` Matthew Wilcox
  -1 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2022-03-15 17:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Vlastimil Babka,
	David Howells, William Kucharski, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

We could add an assertion ... I just tried this:

+++ b/include/linux/page-flags.h
@@ -342,14 +342,16 @@ static __always_inline                                                    \
 void __folio_set_##lname(struct folio *folio)                          \
 { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
 static __always_inline void __SetPage##uname(struct page *page)                \
-{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __set_bit(PG_##lname, &policy(page, 1)->flags); }

 #define __CLEARPAGEFLAG(uname, lname, policy)                          \
 static __always_inline                                                 \
 void __folio_clear_##lname(struct folio *folio)                                \
 { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
 static __always_inline void __ClearPage##uname(struct page *page)      \
-{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __clear_bit(PG_##lname, &policy(page, 1)->flags); }

 #define TESTSETFLAG(uname, lname, policy)                              \
 static __always_inline                                                 \

... but it dies _really_ early:

(gdb) bt
#0  0xffffffff820055e5 in native_halt ()
    at ../arch/x86/include/asm/irqflags.h:57
#1  halt () at ../arch/x86/include/asm/irqflags.h:98
#2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
    trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
#3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
    trapnr=6) at ../arch/x86/kernel/head64.c:419
#4  0xffffffff81ff214f in early_idt_handler_common ()
    at ../arch/x86/kernel/head_64.S:417
#5  0x0000000000000000 in ?? ()

and honestly, I'm not sure how to debug something that goes wrong this
early.  Maybe I need to make that start warning 5 seconds after boot
or only if we're not in pid 1, or something ...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 17:43       ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2022-03-15 17:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Vlastimil Babka,
	David Howells, William Kucharski, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

We could add an assertion ... I just tried this:

+++ b/include/linux/page-flags.h
@@ -342,14 +342,16 @@ static __always_inline                                                    \
 void __folio_set_##lname(struct folio *folio)                          \
 { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
 static __always_inline void __SetPage##uname(struct page *page)                \
-{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __set_bit(PG_##lname, &policy(page, 1)->flags); }

 #define __CLEARPAGEFLAG(uname, lname, policy)                          \
 static __always_inline                                                 \
 void __folio_clear_##lname(struct folio *folio)                                \
 { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
 static __always_inline void __ClearPage##uname(struct page *page)      \
-{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __clear_bit(PG_##lname, &policy(page, 1)->flags); }

 #define TESTSETFLAG(uname, lname, policy)                              \
 static __always_inline                                                 \

... but it dies _really_ early:

(gdb) bt
#0  0xffffffff820055e5 in native_halt ()
    at ../arch/x86/include/asm/irqflags.h:57
#1  halt () at ../arch/x86/include/asm/irqflags.h:98
#2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
    trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
#3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
    trapnr=6) at ../arch/x86/kernel/head64.c:419
#4  0xffffffff81ff214f in early_idt_handler_common ()
    at ../arch/x86/kernel/head_64.S:417
#5  0x0000000000000000 in ?? ()

and honestly, I'm not sure how to debug something that goes wrong this
early.  Maybe I need to make that start warning 5 seconds after boot
or only if we're not in pid 1, or something ...

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 17:43       ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2022-03-15 17:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Vlastimil Babka,
	David Howells, William Kucharski, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

We could add an assertion ... I just tried this:

+++ b/include/linux/page-flags.h
@@ -342,14 +342,16 @@ static __always_inline                                                    \
 void __folio_set_##lname(struct folio *folio)                          \
 { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
 static __always_inline void __SetPage##uname(struct page *page)                \
-{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __set_bit(PG_##lname, &policy(page, 1)->flags); }

 #define __CLEARPAGEFLAG(uname, lname, policy)                          \
 static __always_inline                                                 \
 void __folio_clear_##lname(struct folio *folio)                                \
 { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
 static __always_inline void __ClearPage##uname(struct page *page)      \
-{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __clear_bit(PG_##lname, &policy(page, 1)->flags); }

 #define TESTSETFLAG(uname, lname, policy)                              \
 static __always_inline                                                 \

... but it dies _really_ early:

(gdb) bt
#0  0xffffffff820055e5 in native_halt ()
    at ../arch/x86/include/asm/irqflags.h:57
#1  halt () at ../arch/x86/include/asm/irqflags.h:98
#2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
    trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
#3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
    trapnr=6) at ../arch/x86/kernel/head64.c:419
#4  0xffffffff81ff214f in early_idt_handler_common ()
    at ../arch/x86/kernel/head_64.S:417
#5  0x0000000000000000 in ?? ()

and honestly, I'm not sure how to debug something that goes wrong this
early.  Maybe I need to make that start warning 5 seconds after boot
or only if we're not in pid 1, or something ...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
  2022-03-15 17:43       ` Matthew Wilcox
  (?)
@ 2022-03-15 19:07         ` David Hildenbrand
  -1 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2022-03-15 19:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Vlastimil Babka,
	David Howells, William Kucharski, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On 15.03.22 18:43, Matthew Wilcox wrote:
> On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
>> On 15.03.22 05:21, Andrew Morton wrote:
>>> On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
>>>
>>>> When memory is tight, system may start to compact memory for large
>>>> continuous memory demands. If one process tries to lock a memory page
>>>> that is being locked and isolated for compaction, it may wait a long time
>>>> or even forever. This is because compaction will perform non-atomic
>>>> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
>>>> set by the process that can't obtain the page lock and add itself to the
>>>> waiting queue to wait for the lock to be unlocked.
>>>>
>>>> CPU1                            CPU2
>>>> lock_page(page); (successful)
>>>>                                 lock_page(); (failed)
>>>> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
>>>> unlock_page(page);
>>>>
>>>> The solution is to not perform non-atomic operation on page flags while
>>>> holding page lock.
>>>
>>> Sure, the non-atomic bitop optimization is really risky and I suspect
>>> we reach for it too often.  Or at least without really clearly
>>> demonstrating that it is safe, and documenting our assumptions.
>>
>> I agree. IIRC, non-atomic variants are mostly only safe while the
>> refcount is 0. Everything else is just absolutely fragile.
> 
> We could add an assertion ... I just tried this:
> 
> +++ b/include/linux/page-flags.h
> @@ -342,14 +342,16 @@ static __always_inline                                                    \
>  void __folio_set_##lname(struct folio *folio)                          \
>  { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
>  static __always_inline void __SetPage##uname(struct page *page)                \
> -{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
> +{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
> +  __set_bit(PG_##lname, &policy(page, 1)->flags); }
> 
>  #define __CLEARPAGEFLAG(uname, lname, policy)                          \
>  static __always_inline                                                 \
>  void __folio_clear_##lname(struct folio *folio)                                \
>  { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
>  static __always_inline void __ClearPage##uname(struct page *page)      \
> -{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
> +{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
> +  __clear_bit(PG_##lname, &policy(page, 1)->flags); }
> 
>  #define TESTSETFLAG(uname, lname, policy)                              \
>  static __always_inline                                                 \
> 
> ... but it dies _really_ early:
> 
> (gdb) bt
> #0  0xffffffff820055e5 in native_halt ()
>     at ../arch/x86/include/asm/irqflags.h:57
> #1  halt () at ../arch/x86/include/asm/irqflags.h:98
> #2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
>     trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
> #3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
>     trapnr=6) at ../arch/x86/kernel/head64.c:419
> #4  0xffffffff81ff214f in early_idt_handler_common ()
>     at ../arch/x86/kernel/head_64.S:417
> #5  0x0000000000000000 in ?? ()
> 
> and honestly, I'm not sure how to debug something that goes wrong this
> early.  Maybe I need to make that start warning 5 seconds after boot
> or only if we're not in pid 1, or something ...

Maybe checking for "system_state >= SYSTEM_RUNNING" or "system_state >=
SYSTEM_SCHEDULING" to exclude early boot where no (real) concurrency is
happening. But I assume you'll still get plenty of such reports.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 19:07         ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2022-03-15 19:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Vlastimil Babka,
	David Howells, William Kucharski, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On 15.03.22 18:43, Matthew Wilcox wrote:
> On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
>> On 15.03.22 05:21, Andrew Morton wrote:
>>> On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
>>>
>>>> When memory is tight, system may start to compact memory for large
>>>> continuous memory demands. If one process tries to lock a memory page
>>>> that is being locked and isolated for compaction, it may wait a long time
>>>> or even forever. This is because compaction will perform non-atomic
>>>> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
>>>> set by the process that can't obtain the page lock and add itself to the
>>>> waiting queue to wait for the lock to be unlocked.
>>>>
>>>> CPU1                            CPU2
>>>> lock_page(page); (successful)
>>>>                                 lock_page(); (failed)
>>>> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
>>>> unlock_page(page);
>>>>
>>>> The solution is to not perform non-atomic operation on page flags while
>>>> holding page lock.
>>>
>>> Sure, the non-atomic bitop optimization is really risky and I suspect
>>> we reach for it too often.  Or at least without really clearly
>>> demonstrating that it is safe, and documenting our assumptions.
>>
>> I agree. IIRC, non-atomic variants are mostly only safe while the
>> refcount is 0. Everything else is just absolutely fragile.
> 
> We could add an assertion ... I just tried this:
> 
> +++ b/include/linux/page-flags.h
> @@ -342,14 +342,16 @@ static __always_inline                                                    \
>  void __folio_set_##lname(struct folio *folio)                          \
>  { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
>  static __always_inline void __SetPage##uname(struct page *page)                \
> -{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
> +{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
> +  __set_bit(PG_##lname, &policy(page, 1)->flags); }
> 
>  #define __CLEARPAGEFLAG(uname, lname, policy)                          \
>  static __always_inline                                                 \
>  void __folio_clear_##lname(struct folio *folio)                                \
>  { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
>  static __always_inline void __ClearPage##uname(struct page *page)      \
> -{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
> +{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
> +  __clear_bit(PG_##lname, &policy(page, 1)->flags); }
> 
>  #define TESTSETFLAG(uname, lname, policy)                              \
>  static __always_inline                                                 \
> 
> ... but it dies _really_ early:
> 
> (gdb) bt
> #0  0xffffffff820055e5 in native_halt ()
>     at ../arch/x86/include/asm/irqflags.h:57
> #1  halt () at ../arch/x86/include/asm/irqflags.h:98
> #2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
>     trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
> #3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
>     trapnr=6) at ../arch/x86/kernel/head64.c:419
> #4  0xffffffff81ff214f in early_idt_handler_common ()
>     at ../arch/x86/kernel/head_64.S:417
> #5  0x0000000000000000 in ?? ()
> 
> and honestly, I'm not sure how to debug something that goes wrong this
> early.  Maybe I need to make that start warning 5 seconds after boot
> or only if we're not in pid 1, or something ...

Maybe checking for "system_state >= SYSTEM_RUNNING" or "system_state >=
SYSTEM_SCHEDULING" to exclude early boot where no (real) concurrency is
happening. But I assume you'll still get plenty of such reports.

-- 
Thanks,

David / dhildenb


_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 19:07         ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2022-03-15 19:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Vlastimil Babka,
	David Howells, William Kucharski, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On 15.03.22 18:43, Matthew Wilcox wrote:
> On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
>> On 15.03.22 05:21, Andrew Morton wrote:
>>> On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
>>>
>>>> When memory is tight, system may start to compact memory for large
>>>> continuous memory demands. If one process tries to lock a memory page
>>>> that is being locked and isolated for compaction, it may wait a long time
>>>> or even forever. This is because compaction will perform non-atomic
>>>> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
>>>> set by the process that can't obtain the page lock and add itself to the
>>>> waiting queue to wait for the lock to be unlocked.
>>>>
>>>> CPU1                            CPU2
>>>> lock_page(page); (successful)
>>>>                                 lock_page(); (failed)
>>>> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
>>>> unlock_page(page);
>>>>
>>>> The solution is to not perform non-atomic operation on page flags while
>>>> holding page lock.
>>>
>>> Sure, the non-atomic bitop optimization is really risky and I suspect
>>> we reach for it too often.  Or at least without really clearly
>>> demonstrating that it is safe, and documenting our assumptions.
>>
>> I agree. IIRC, non-atomic variants are mostly only safe while the
>> refcount is 0. Everything else is just absolutely fragile.
> 
> We could add an assertion ... I just tried this:
> 
> +++ b/include/linux/page-flags.h
> @@ -342,14 +342,16 @@ static __always_inline                                                    \
>  void __folio_set_##lname(struct folio *folio)                          \
>  { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
>  static __always_inline void __SetPage##uname(struct page *page)                \
> -{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
> +{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
> +  __set_bit(PG_##lname, &policy(page, 1)->flags); }
> 
>  #define __CLEARPAGEFLAG(uname, lname, policy)                          \
>  static __always_inline                                                 \
>  void __folio_clear_##lname(struct folio *folio)                                \
>  { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
>  static __always_inline void __ClearPage##uname(struct page *page)      \
> -{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
> +{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
> +  __clear_bit(PG_##lname, &policy(page, 1)->flags); }
> 
>  #define TESTSETFLAG(uname, lname, policy)                              \
>  static __always_inline                                                 \
> 
> ... but it dies _really_ early:
> 
> (gdb) bt
> #0  0xffffffff820055e5 in native_halt ()
>     at ../arch/x86/include/asm/irqflags.h:57
> #1  halt () at ../arch/x86/include/asm/irqflags.h:98
> #2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
>     trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
> #3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
>     trapnr=6) at ../arch/x86/kernel/head64.c:419
> #4  0xffffffff81ff214f in early_idt_handler_common ()
>     at ../arch/x86/kernel/head_64.S:417
> #5  0x0000000000000000 in ?? ()
> 
> and honestly, I'm not sure how to debug something that goes wrong this
> early.  Maybe I need to make that start warning 5 seconds after boot
> or only if we're not in pid 1, or something ...

Maybe checking for "system_state >= SYSTEM_RUNNING" or "system_state >=
SYSTEM_SCHEDULING" to exclude early boot where no (real) concurrency is
happening. But I assume you'll still get plenty of such reports.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
  2022-03-15 15:45     ` David Hildenbrand
  (?)
@ 2022-03-15 20:34       ` Hugh Dickins
  -1 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2022-03-15 20:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Matthew Wilcox,
	Vlastimil Babka, David Howells, William Kucharski, Yang Shi,
	Marc Zyngier, linux-kernel, linux-mm, linux-arm-kernel,
	linux-mediatek, wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

It is normal and correct to use __SetPageFlag(page) on a page just allocated
from the buddy, and not yet logically visible to others: that has refcount 1.

Of course, it might have refcount 2 or more, through being speculatively
visible to get_page_unless_zero() users: perhaps through earlier usage of
the same struct page, or by physical scan of memmap.

Those few such others - compaction's isolate_migratepages_block() is the
one I know best - must be very careful in their sequence of operations.

Preliminary read-only checks are usually okay (but some VM_BUG_ON_PGFLAGS
are increasingly problematic: I've had to turn off that CONFIG), then
get_page_unless_zero(), then read-only check that the page is of the
manageable kind (PageLRU in my world), and only then can it be safe to
lock the page - which of course touches page flags, and so would be
problematic for a racing user's __SetPageFlag(page).

But PageMovable and PageIsolated are beyond my ken: I can't say there.

Hugh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 20:34       ` Hugh Dickins
  0 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2022-03-15 20:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Matthew Wilcox,
	Vlastimil Babka, David Howells, William Kucharski, Yang Shi,
	Marc Zyngier, linux-kernel, linux-mm, linux-arm-kernel,
	linux-mediatek, wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

It is normal and correct to use __SetPageFlag(page) on a page just allocated
from the buddy, and not yet logically visible to others: that has refcount 1.

Of course, it might have refcount 2 or more, through being speculatively
visible to get_page_unless_zero() users: perhaps through earlier usage of
the same struct page, or by physical scan of memmap.

Those few such others - compaction's isolate_migratepages_block() is the
one I know best - must be very careful in their sequence of operations.

Preliminary read-only checks are usually okay (but some VM_BUG_ON_PGFLAGS
are increasingly problematic: I've had to turn off that CONFIG), then
get_page_unless_zero(), then read-only check that the page is of the
manageable kind (PageLRU in my world), and only then can it be safe to
lock the page - which of course touches page flags, and so would be
problematic for a racing user's __SetPageFlag(page).

But PageMovable and PageIsolated are beyond my ken: I can't say there.

Hugh

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-15 20:34       ` Hugh Dickins
  0 siblings, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2022-03-15 20:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Andrew Yang, Matthias Brugger, Matthew Wilcox,
	Vlastimil Babka, David Howells, William Kucharski, Yang Shi,
	Marc Zyngier, linux-kernel, linux-mm, linux-arm-kernel,
	linux-mediatek, wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

It is normal and correct to use __SetPageFlag(page) on a page just allocated
from the buddy, and not yet logically visible to others: that has refcount 1.

Of course, it might have refcount 2 or more, through being speculatively
visible to get_page_unless_zero() users: perhaps through earlier usage of
the same struct page, or by physical scan of memmap.

Those few such others - compaction's isolate_migratepages_block() is the
one I know best - must be very careful in their sequence of operations.

Preliminary read-only checks are usually okay (but some VM_BUG_ON_PGFLAGS
are increasingly problematic: I've had to turn off that CONFIG), then
get_page_unless_zero(), then read-only check that the page is of the
manageable kind (PageLRU in my world), and only then can it be safe to
lock the page - which of course touches page flags, and so would be
problematic for a racing user's __SetPageFlag(page).

But PageMovable and PageIsolated are beyond my ken: I can't say there.

Hugh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
  2022-03-15  3:32   ` Matthew Wilcox
  (?)
@ 2022-03-21 23:26     ` Andrew Morton
  -1 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2022-03-21 23:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Yang, Matthias Brugger, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022 03:32:16 +0000 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, Mar 15, 2022 at 11:05:15AM +0800, Andrew Yang wrote:
> > When memory is tight, system may start to compact memory for large
> > continuous memory demands. If one process tries to lock a memory page
> > that is being locked and isolated for compaction, it may wait a long time
> > or even forever. This is because compaction will perform non-atomic
> > PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> > set by the process that can't obtain the page lock and add itself to the
> > waiting queue to wait for the lock to be unlocked.
> > 
> > CPU1                            CPU2
> > lock_page(page); (successful)
> >                                 lock_page(); (failed)
> > __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> > unlock_page(page);
> > 
> > The solution is to not perform non-atomic operation on page flags while
> > holding page lock.
> > 
> > Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
> > ---
> >  include/linux/page-flags.h |  2 +-
> >  mm/migrate.c               | 12 ++++++------
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 1c3b6e5c8bfd..64a84a9835cb 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -918,7 +918,7 @@ PAGE_TYPE_OPS(Guard, guard)
> >  
> >  extern bool is_free_buddy_page(struct page *page);
> >  
> > -__PAGEFLAG(Isolated, isolated, PF_ANY);
> > +PAGEFLAG(Isolated, isolated, PF_ANY);
> 
> Agreed.  Further, page cannot be a tail page (this is implied by the
> get_page_unless_zero() as tailpages have a zero refcount, and it
> is assumed by __PageMovable() as page->mapping is undefined for tail
> pages).  So this can actually be:
> 
> +PAGEFLAG(Isolated, isolated, PF_NO_TAIL);
> 
> I considered PF_ONLY_HEAD, but there are a lot more places that _check_
> PageIsolated() and I don't want to prove that they're all definitely
> working on head pages.

There was no followup to this.  I'll send the patch upstream as-is, but
please let's not forget this idea.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-21 23:26     ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2022-03-21 23:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Yang, Matthias Brugger, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022 03:32:16 +0000 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, Mar 15, 2022 at 11:05:15AM +0800, Andrew Yang wrote:
> > When memory is tight, system may start to compact memory for large
> > continuous memory demands. If one process tries to lock a memory page
> > that is being locked and isolated for compaction, it may wait a long time
> > or even forever. This is because compaction will perform non-atomic
> > PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> > set by the process that can't obtain the page lock and add itself to the
> > waiting queue to wait for the lock to be unlocked.
> > 
> > CPU1                            CPU2
> > lock_page(page); (successful)
> >                                 lock_page(); (failed)
> > __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> > unlock_page(page);
> > 
> > The solution is to not perform non-atomic operation on page flags while
> > holding page lock.
> > 
> > Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
> > ---
> >  include/linux/page-flags.h |  2 +-
> >  mm/migrate.c               | 12 ++++++------
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 1c3b6e5c8bfd..64a84a9835cb 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -918,7 +918,7 @@ PAGE_TYPE_OPS(Guard, guard)
> >  
> >  extern bool is_free_buddy_page(struct page *page);
> >  
> > -__PAGEFLAG(Isolated, isolated, PF_ANY);
> > +PAGEFLAG(Isolated, isolated, PF_ANY);
> 
> Agreed.  Further, page cannot be a tail page (this is implied by the
> get_page_unless_zero() as tailpages have a zero refcount, and it
> is assumed by __PageMovable() as page->mapping is undefined for tail
> pages).  So this can actually be:
> 
> +PAGEFLAG(Isolated, isolated, PF_NO_TAIL);
> 
> I considered PF_ONLY_HEAD, but there are a lot more places that _check_
> PageIsolated() and I don't want to prove that they're all definitely
> working on head pages.

There was no followup to this.  I'll send the patch upstream as-is, but
please let's not forget this idea.


_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
@ 2022-03-21 23:26     ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2022-03-21 23:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Yang, Matthias Brugger, Vlastimil Babka, David Howells,
	William Kucharski, David Hildenbrand, Yang Shi, Marc Zyngier,
	linux-kernel, linux-mm, linux-arm-kernel, linux-mediatek,
	wsd_upstream, Nicholas Tang, Kuan-Ying Lee

On Tue, 15 Mar 2022 03:32:16 +0000 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, Mar 15, 2022 at 11:05:15AM +0800, Andrew Yang wrote:
> > When memory is tight, system may start to compact memory for large
> > continuous memory demands. If one process tries to lock a memory page
> > that is being locked and isolated for compaction, it may wait a long time
> > or even forever. This is because compaction will perform non-atomic
> > PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> > set by the process that can't obtain the page lock and add itself to the
> > waiting queue to wait for the lock to be unlocked.
> > 
> > CPU1                            CPU2
> > lock_page(page); (successful)
> >                                 lock_page(); (failed)
> > __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> > unlock_page(page);
> > 
> > The solution is to not perform non-atomic operation on page flags while
> > holding page lock.
> > 
> > Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
> > ---
> >  include/linux/page-flags.h |  2 +-
> >  mm/migrate.c               | 12 ++++++------
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 1c3b6e5c8bfd..64a84a9835cb 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -918,7 +918,7 @@ PAGE_TYPE_OPS(Guard, guard)
> >  
> >  extern bool is_free_buddy_page(struct page *page);
> >  
> > -__PAGEFLAG(Isolated, isolated, PF_ANY);
> > +PAGEFLAG(Isolated, isolated, PF_ANY);
> 
> Agreed.  Further, page cannot be a tail page (this is implied by the
> get_page_unless_zero() as tailpages have a zero refcount, and it
> is assumed by __PageMovable() as page->mapping is undefined for tail
> pages).  So this can actually be:
> 
> +PAGEFLAG(Isolated, isolated, PF_NO_TAIL);
> 
> I considered PF_ONLY_HEAD, but there are a lot more places that _check_
> PageIsolated() and I don't want to prove that they're all definitely
> working on head pages.

There was no followup to this.  I'll send the patch upstream as-is, but
please let's not forget this idea.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-03-21 23:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220315030515.20263-1-andrew.yang@mediatek.com>
2022-03-15  3:32 ` [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated Matthew Wilcox
2022-03-15  3:32   ` Matthew Wilcox
2022-03-15  3:32   ` Matthew Wilcox
2022-03-21 23:26   ` Andrew Morton
2022-03-21 23:26     ` Andrew Morton
2022-03-21 23:26     ` Andrew Morton
2022-03-15  4:21 ` Andrew Morton
2022-03-15  4:21   ` Andrew Morton
2022-03-15  4:21   ` Andrew Morton
2022-03-15 15:45   ` David Hildenbrand
2022-03-15 15:45     ` David Hildenbrand
2022-03-15 15:45     ` David Hildenbrand
2022-03-15 17:43     ` Matthew Wilcox
2022-03-15 17:43       ` Matthew Wilcox
2022-03-15 17:43       ` Matthew Wilcox
2022-03-15 19:07       ` David Hildenbrand
2022-03-15 19:07         ` David Hildenbrand
2022-03-15 19:07         ` David Hildenbrand
2022-03-15 20:34     ` Hugh Dickins
2022-03-15 20:34       ` Hugh Dickins
2022-03-15 20:34       ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.