From: Yang Shi <shy828301@gmail.com>
To: Barry Song <21cnbao@gmail.com>
Cc: "Catalin Marinas" <catalin.marinas@arm.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Will Deacon" <will@kernel.org>, Linux-MM <linux-mm@kvack.org>,
	LAK <linux-arm-kernel@lists.infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	hanchuanhua <hanchuanhua@oppo.com>,
	"张诗明(Simon Zhang)" <zhangshiming@oppo.com>, 郭健 <guojian@oppo.com>,
	"Barry Song" <v-songbaohua@oppo.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Hugh Dickins" <hughd@google.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Steven Price" <steven.price@arm.com>
Subject: Re: [PATCH] arm64: enable THP_SWAP for arm64
Date: Thu, 26 May 2022 10:02:58 -0700
Message-ID: <CAHbLzko=tVVCozxZsny-_kn5GOERWYsUQXFQ18J7h2gcs_LjPg@mail.gmail.com>
In-Reply-To: <CAGsJ_4xeOqOvnf3t7PDM4EJ9YuUUvi7w88rk_KrN6StMkYKUYg@mail.gmail.com>

On Thu, May 26, 2022 at 2:19 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Thu, May 26, 2022 at 5:49 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 4:10 AM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > >
> > > > On Tue, May 24, 2022 at 10:05:35PM +1200, Barry Song wrote:
> > > > > On Tue, May 24, 2022 at 8:12 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > > > On Tue, May 24, 2022 at 07:14:03PM +1200, Barry Song wrote:
> > > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > > > index d550f5acfaf3..8e3771c56fbf 100644
> > > > > > > --- a/arch/arm64/Kconfig
> > > > > > > +++ b/arch/arm64/Kconfig
> > > > > > > @@ -98,6 +98,7 @@ config ARM64
> > > > > > >       select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> > > > > > >       select ARCH_WANT_LD_ORPHAN_WARN
> > > > > > >       select ARCH_WANTS_NO_INSTR
> > > > > > > +     select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> > > > > >
> > > > > > I'm not opposed to this but I think it would break pages mapped with
> > > > > > PROT_MTE. We have an assumption in mte_sync_tags() that compound pages
> > > > > > are not swapped out (or in). With MTE, we store the tags in a slab
> > > > >
> > > > > I assume you mean mte_sync_tags() requires that THP is not swapped as a whole,
> > > > > as without THP_SWAP, THP is still swapped after being split. MTE doesn't stop
> > > > > THP from being swapped as a set of split pages, does it?
> > > >
> > > > That's correct, split THP pages are swapped out/in just fine.
> > > >
> > > > > > object (128-bytes per swapped page) and restore them when pages are
> > > > > > swapped in. At some point we may teach the core swap code about such
> > > > > > metadata but in the meantime that was the easiest way.
> > > > >
> > > > > If my previous assumption is true, the easiest way to enable THP_SWAP
> > > > > for now might be to always let mm fall back to splitting
> > > > > on MTE hardware. For the moment, I care more about THP_SWAP, as
> > > > > none of my hardware has MTE.
> > > > >
> > > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > > index 45c358538f13..d55a2a3e41a9 100644
> > > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > > @@ -44,6 +44,8 @@
> > > > >         __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> > > > >  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > > >
> > > > > +#define arch_thp_swp_supported !system_supports_mte
> > > > > +
> > > > >  /*
> > > > >   * Outside of a few very special situations (e.g. hibernation), we always
> > > > >   * use broadcast TLB invalidation instructions, therefore a spurious page
> > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > index 2999190adc22..064b6b03df9e 100644
> > > > > --- a/include/linux/huge_mm.h
> > > > > +++ b/include/linux/huge_mm.h
> > > > > @@ -447,4 +447,16 @@ static inline int split_folio_to_list(struct folio *folio,
> > > > >         return split_huge_page_to_list(&folio->page, list);
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > > > > + * limitations in the implementation like arm64 MTE can override this to
> > > > > + * false
> > > > > + */
> > > > > +#ifndef arch_thp_swp_supported
> > > > > +static inline bool arch_thp_swp_supported(void)
> > > > > +{
> > > > > +       return true;
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > >  #endif /* _LINUX_HUGE_MM_H */
> > > > > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > > > > index 2b5531840583..dde685836328 100644
> > > > > --- a/mm/swap_slots.c
> > > > > +++ b/mm/swap_slots.c
> > > > > @@ -309,7 +309,7 @@ swp_entry_t get_swap_page(struct page *page)
> > > > >         entry.val = 0;
> > > > >
> > > > >         if (PageTransHuge(page)) {
> > > > > -               if (IS_ENABLED(CONFIG_THP_SWAP))
> > > > > +               if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> > > > >                         get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > > > >                 goto out;
> > > >
> > > > I think this should work and with your other proposal it would be
> > > > limited to MTE pages:
> > > >
> > > > #define arch_thp_swp_supported(page)    (!test_bit(PG_mte_tagged, &page->flags))
> > > >
> > > > Are THP pages loaded from swap as a whole or are they split? IIRC the
> > >
> > > I can confirm THP is written as a whole through:
> > > [   90.622863]  __swap_writepage+0xe8/0x580
> > > [   90.622881]  swap_writepage+0x44/0xf8
> > > [   90.622891]  pageout+0xe0/0x2a8
> > > [   90.622906]  shrink_page_list+0x9dc/0xde0
> > > [   90.622917]  shrink_inactive_list+0x1ec/0x3c8
> > > [   90.622928]  shrink_lruvec+0x3dc/0x628
> > > [   90.622939]  shrink_node+0x37c/0x6a0
> > > [   90.622950]  balance_pgdat+0x354/0x668
> > > [   90.622961]  kswapd+0x1e0/0x3c0
> > > [   90.622972]  kthread+0x110/0x120
> > >
> > > But I have never got a backtrace in which THP is loaded as a whole, though it
> > > seems the code has this path:
> >
> > THP could be swapped out as a whole, but is never swapped in as a THP. Only
> > a single base page (4K on x86) is swapped in.
>
> Yep. It seems swapin_readahead() never reads a THP, or even the split
> pages of this 2MB THP.
>
> The number of pages to be read ahead is determined either by
> /proc/sys/vm/page-cluster if /sys/kernel/mm/swap/vma_ra_enabled is false,
> or by the VMA read-ahead algorithm if /sys/kernel/mm/swap/vma_ra_enabled
> is true. And the number is usually quite small.
>
> Am I missing any case in which 2MB can be swapped in as a whole, either as
> split pages or as a THP?

Even if readahead swaps in 2MB, it does so as 512 individual base pages
rather than as a THP. They may not be physically contiguous at all.
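
To make the arithmetic concrete, here is a rough userspace sketch (not
kernel code). It assumes 4K base pages and a 2MB PMD-sized THP, and it
treats page-cluster as a log2 upper bound on the readahead window when
VMA readahead is disabled. All demo_* names are hypothetical and exist
only for illustration.

```c
#include <assert.h>

/* Assumed sizes: 4K base pages and a 2MB PMD-mapped THP. */
#define DEMO_BASE_PAGE_SIZE (4UL * 1024)
#define DEMO_THP_SIZE       (2UL * 1024 * 1024)

/* Number of base pages needed to cover one THP: 2MB / 4K. */
static unsigned long demo_pages_per_thp(void)
{
	return DEMO_THP_SIZE / DEMO_BASE_PAGE_SIZE;
}

/*
 * Upper bound on the cluster readahead window, in pages.
 * page-cluster is a logarithmic value, so the bound is 2^page_cluster.
 */
static unsigned long demo_cluster_window(unsigned int page_cluster)
{
	/* Guard against shifting past the width of unsigned long. */
	assert(page_cluster < 8 * sizeof(unsigned long));
	return 1UL << page_cluster;
}
```

With a page-cluster value of 3, the bound is 8 pages, far below the 512
base pages that make up one 2MB THP.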

>
> Thanks
> Barry
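
For reference, the default-plus-override pattern from the quoted patch,
combined with the per-page check Catalin suggested, can be sketched in
userspace roughly as below. This is only an illustration under stated
assumptions: struct demo_page, DEMO_PG_TAGGED and the demo_* helpers are
hypothetical stand-ins, not the kernel's struct page, PG_mte_tagged or
get_swap_page().

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; bit 0 plays the role of a tag flag. */
struct demo_page {
	unsigned long flags;
};
#define DEMO_PG_TAGGED (1UL << 0)

/* Pretend the "arch" header is in use; remove this to get the generic default. */
#define DEMO_ARCH_HAS_TAGS 1

/* "Arch" side: refuse whole-THP swap only for pages that carry tags. */
#ifdef DEMO_ARCH_HAS_TAGS
#define demo_thp_swap_supported(page) (!((page)->flags & DEMO_PG_TAGGED))
#endif

/* "Generic" side: used only when no arch override was defined above. */
#ifndef demo_thp_swap_supported
static inline bool demo_thp_swap_supported(const struct demo_page *page)
{
	(void)page;
	return true;
}
#endif

/*
 * Caller: true means the huge page may be swapped out as a whole,
 * false means the caller falls back to splitting it first.
 */
static inline bool demo_swap_whole_thp(const struct demo_page *page,
				       bool thp_swap_enabled)
{
	return thp_swap_enabled && demo_thp_swap_supported(page);
}
```

The point of the per-page form is that only tagged pages fall back to
splitting, while untagged huge pages on the same system can still be
swapped out as a whole.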
