All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrew Yang <andrew.yang@mediatek.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Howells <dhowells@redhat.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Yang Shi <shy828301@gmail.com>, Marc Zyngier <maz@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org, wsd_upstream@mediatek.com,
	Nicholas Tang <nicholas.tang@mediatek.com>,
	Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Subject: Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
Date: Tue, 15 Mar 2022 17:43:33 +0000	[thread overview]
Message-ID: <YjDQRdShE6syVSnM@casper.infradead.org> (raw)
In-Reply-To: <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com>

On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

We could add an assertion ... I just tried this:

+++ b/include/linux/page-flags.h
@@ -342,14 +342,16 @@ static __always_inline                                                    \
 void __folio_set_##lname(struct folio *folio)                          \
 { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
 static __always_inline void __SetPage##uname(struct page *page)                \
-{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __set_bit(PG_##lname, &policy(page, 1)->flags); }

 #define __CLEARPAGEFLAG(uname, lname, policy)                          \
 static __always_inline                                                 \
 void __folio_clear_##lname(struct folio *folio)                                \
 { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
 static __always_inline void __ClearPage##uname(struct page *page)      \
-{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __clear_bit(PG_##lname, &policy(page, 1)->flags); }

 #define TESTSETFLAG(uname, lname, policy)                              \
 static __always_inline                                                 \

... but it dies _really_ early:

(gdb) bt
#0  0xffffffff820055e5 in native_halt ()
    at ../arch/x86/include/asm/irqflags.h:57
#1  halt () at ../arch/x86/include/asm/irqflags.h:98
#2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
    trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
#3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
    trapnr=6) at ../arch/x86/kernel/head64.c:419
#4  0xffffffff81ff214f in early_idt_handler_common ()
    at ../arch/x86/kernel/head_64.S:417
#5  0x0000000000000000 in ?? ()

and honestly, I'm not sure how to debug something that goes wrong this
early.  Maybe I need to make that start warning 5 seconds after boot
or only if we're not in pid 1, or something ...

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrew Yang <andrew.yang@mediatek.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Howells <dhowells@redhat.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Yang Shi <shy828301@gmail.com>, Marc Zyngier <maz@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org, wsd_upstream@mediatek.com,
	Nicholas Tang <nicholas.tang@mediatek.com>,
	Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Subject: Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
Date: Tue, 15 Mar 2022 17:43:33 +0000	[thread overview]
Message-ID: <YjDQRdShE6syVSnM@casper.infradead.org> (raw)
In-Reply-To: <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com>

On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

We could add an assertion ... I just tried this:

+++ b/include/linux/page-flags.h
@@ -342,14 +342,16 @@ static __always_inline                                                    \
 void __folio_set_##lname(struct folio *folio)                          \
 { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
 static __always_inline void __SetPage##uname(struct page *page)                \
-{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __set_bit(PG_##lname, &policy(page, 1)->flags); }

 #define __CLEARPAGEFLAG(uname, lname, policy)                          \
 static __always_inline                                                 \
 void __folio_clear_##lname(struct folio *folio)                                \
 { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
 static __always_inline void __ClearPage##uname(struct page *page)      \
-{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __clear_bit(PG_##lname, &policy(page, 1)->flags); }

 #define TESTSETFLAG(uname, lname, policy)                              \
 static __always_inline                                                 \

... but it dies _really_ early:

(gdb) bt
#0  0xffffffff820055e5 in native_halt ()
    at ../arch/x86/include/asm/irqflags.h:57
#1  halt () at ../arch/x86/include/asm/irqflags.h:98
#2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
    trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
#3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
    trapnr=6) at ../arch/x86/kernel/head64.c:419
#4  0xffffffff81ff214f in early_idt_handler_common ()
    at ../arch/x86/kernel/head_64.S:417
#5  0x0000000000000000 in ?? ()

and honestly, I'm not sure how to debug something that goes wrong this
early.  Maybe I need to make that start warning 5 seconds after boot
or only if we're not in pid 1, or something ...

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrew Yang <andrew.yang@mediatek.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Howells <dhowells@redhat.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Yang Shi <shy828301@gmail.com>, Marc Zyngier <maz@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org, wsd_upstream@mediatek.com,
	Nicholas Tang <nicholas.tang@mediatek.com>,
	Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Subject: Re: [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated
Date: Tue, 15 Mar 2022 17:43:33 +0000	[thread overview]
Message-ID: <YjDQRdShE6syVSnM@casper.infradead.org> (raw)
In-Reply-To: <4cb789a5-c49c-f095-1f7e-67be65ba508a@redhat.com>

On Tue, Mar 15, 2022 at 04:45:13PM +0100, David Hildenbrand wrote:
> On 15.03.22 05:21, Andrew Morton wrote:
> > On Tue, 15 Mar 2022 11:05:15 +0800 Andrew Yang <andrew.yang@mediatek.com> wrote:
> > 
> >> When memory is tight, system may start to compact memory for large
> >> continuous memory demands. If one process tries to lock a memory page
> >> that is being locked and isolated for compaction, it may wait a long time
> >> or even forever. This is because compaction will perform non-atomic
> >> PG_Isolated clear while holding page lock, this may overwrite PG_waiters
> >> set by the process that can't obtain the page lock and add itself to the
> >> waiting queue to wait for the lock to be unlocked.
> >>
> >> CPU1                            CPU2
> >> lock_page(page); (successful)
> >>                                 lock_page(); (failed)
> >> __ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
> >> unlock_page(page);
> >>
> >> The solution is to not perform non-atomic operation on page flags while
> >> holding page lock.
> > 
> > Sure, the non-atomic bitop optimization is really risky and I suspect
> > we reach for it too often.  Or at least without really clearly
> > demonstrating that it is safe, and documenting our assumptions.
> 
> I agree. IIRC, non-atomic variants are mostly only safe while the
> refcount is 0. Everything else is just absolutely fragile.

We could add an assertion ... I just tried this:

+++ b/include/linux/page-flags.h
@@ -342,14 +342,16 @@ static __always_inline                                                    \
 void __folio_set_##lname(struct folio *folio)                          \
 { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }         \
 static __always_inline void __SetPage##uname(struct page *page)                \
-{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __set_bit(PG_##lname, &policy(page, 1)->flags); }

 #define __CLEARPAGEFLAG(uname, lname, policy)                          \
 static __always_inline                                                 \
 void __folio_clear_##lname(struct folio *folio)                                \
 { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); }       \
 static __always_inline void __ClearPage##uname(struct page *page)      \
-{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ VM_BUG_ON_PGFLAGS(atomic_read(&policy(page, 1)->_refcount), page);   \
+  __clear_bit(PG_##lname, &policy(page, 1)->flags); }

 #define TESTSETFLAG(uname, lname, policy)                              \
 static __always_inline                                                 \

... but it dies _really_ early:

(gdb) bt
#0  0xffffffff820055e5 in native_halt ()
    at ../arch/x86/include/asm/irqflags.h:57
#1  halt () at ../arch/x86/include/asm/irqflags.h:98
#2  early_fixup_exception (regs=regs@entry=0xffffffff81e03cf8,
    trapnr=trapnr@entry=6) at ../arch/x86/mm/extable.c:283
#3  0xffffffff81ff243c in do_early_exception (regs=0xffffffff81e03cf8,
    trapnr=6) at ../arch/x86/kernel/head64.c:419
#4  0xffffffff81ff214f in early_idt_handler_common ()
    at ../arch/x86/kernel/head_64.S:417
#5  0x0000000000000000 in ?? ()

and honestly, I'm not sure how to debug something that goes wrong this
early.  Maybe I need to make that start warning 5 seconds after boot
or only if we're not in pid 1, or something ...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-03-15 17:43 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220315030515.20263-1-andrew.yang@mediatek.com>
2022-03-15  3:32 ` [PATCH] mm/migrate: fix race between lock page and clear PG_Isolated Matthew Wilcox
2022-03-15  3:32   ` Matthew Wilcox
2022-03-15  3:32   ` Matthew Wilcox
2022-03-21 23:26   ` Andrew Morton
2022-03-21 23:26     ` Andrew Morton
2022-03-21 23:26     ` Andrew Morton
2022-03-15  4:21 ` Andrew Morton
2022-03-15  4:21   ` Andrew Morton
2022-03-15  4:21   ` Andrew Morton
2022-03-15 15:45   ` David Hildenbrand
2022-03-15 15:45     ` David Hildenbrand
2022-03-15 15:45     ` David Hildenbrand
2022-03-15 17:43     ` Matthew Wilcox [this message]
2022-03-15 17:43       ` Matthew Wilcox
2022-03-15 17:43       ` Matthew Wilcox
2022-03-15 19:07       ` David Hildenbrand
2022-03-15 19:07         ` David Hildenbrand
2022-03-15 19:07         ` David Hildenbrand
2022-03-15 20:34     ` Hugh Dickins
2022-03-15 20:34       ` Hugh Dickins
2022-03-15 20:34       ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjDQRdShE6syVSnM@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=Kuan-Ying.Lee@mediatek.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.yang@mediatek.com \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=matthias.bgg@gmail.com \
    --cc=maz@kernel.org \
    --cc=nicholas.tang@mediatek.com \
    --cc=shy828301@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=william.kucharski@oracle.com \
    --cc=wsd_upstream@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.