* Re: [PATCH] arm64: enable THP_SWAP for arm64
@ 2022-05-25 11:10 ` Barry Song
0 siblings, 0 replies; 24+ messages in thread
From: Barry Song @ 2022-05-25 11:10 UTC (permalink / raw)
To: Catalin Marinas
Cc: Andrew Morton, Will Deacon, Linux-MM, LAK, LKML, hanchuanhua,
张诗明(Simon Zhang), 郭健,
Barry Song, Huang, Ying, Minchan Kim, Johannes Weiner,
Hugh Dickins, Shaohua Li, Rik van Riel, Andrea Arcangeli,
Steven Price
On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, May 24, 2022 at 10:05:35PM +1200, Barry Song wrote:
> > On Tue, May 24, 2022 at 8:12 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Tue, May 24, 2022 at 07:14:03PM +1200, Barry Song wrote:
> > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > index d550f5acfaf3..8e3771c56fbf 100644
> > > > --- a/arch/arm64/Kconfig
> > > > +++ b/arch/arm64/Kconfig
> > > > @@ -98,6 +98,7 @@ config ARM64
> > > > select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> > > > select ARCH_WANT_LD_ORPHAN_WARN
> > > > select ARCH_WANTS_NO_INSTR
> > > > + select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> > >
> > > I'm not opposed to this but I think it would break pages mapped with
> > > PROT_MTE. We have an assumption in mte_sync_tags() that compound pages
> > > are not swapped out (or in). With MTE, we store the tags in a slab
> >
> > I assume you mean mte_sync_tags() require that THP is not swapped as a whole,
> > as without THP_SWP, THP is still swapping after being splitted. MTE doesn't stop
> > THP from swapping through a couple of splitted pages, does it?
>
> That's correct, split THP page are swapped out/in just fine.
>
> > > object (128-bytes per swapped page) and restore them when pages are
> > > swapped in. At some point we may teach the core swap code about such
> > > metadata but in the meantime that was the easiest way.
> >
> > If my previous assumption is true, the easiest way to enable THP_SWP
> > for this moment might be always letting mm fallback to the splitting
> > way for MTE hardware. For this moment, I care about THP_SWP more as
> > none of my hardware has MTE.
> >
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 45c358538f13..d55a2a3e41a9 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -44,6 +44,8 @@
> > __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > +#define arch_thp_swp_supported !system_supports_mte
> > +
> > /*
> > * Outside of a few very special situations (e.g. hibernation), we always
> > * use broadcast TLB invalidation instructions, therefore a spurious page
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 2999190adc22..064b6b03df9e 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -447,4 +447,16 @@ static inline int split_folio_to_list(struct folio *folio,
> > return split_huge_page_to_list(&folio->page, list);
> > }
> >
> > +/*
> > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > + * limitations in the implementation like arm64 MTE can override this to
> > + * false
> > + */
> > +#ifndef arch_thp_swp_supported
> > +static inline bool arch_thp_swp_supported(void)
> > +{
> > + return true;
> > +}
> > +#endif
> > +
> > #endif /* _LINUX_HUGE_MM_H */
> > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > index 2b5531840583..dde685836328 100644
> > --- a/mm/swap_slots.c
> > +++ b/mm/swap_slots.c
> > @@ -309,7 +309,7 @@ swp_entry_t get_swap_page(struct page *page)
> > entry.val = 0;
> >
> > if (PageTransHuge(page)) {
> > - if (IS_ENABLED(CONFIG_THP_SWAP))
> > + if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> > get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > goto out;
>
> I think this should work and with your other proposal it would be
> limited to MTE pages:
>
> #define arch_thp_swp_supported(page) (!test_bit(PG_mte_tagged, &page->flags))
>
> Are THP pages loaded from swap as a whole or are they split? IIRC the
I can confirm that a THP is written out as a whole, as shown by this backtrace:
[ 90.622863] __swap_writepage+0xe8/0x580
[ 90.622881] swap_writepage+0x44/0xf8
[ 90.622891] pageout+0xe0/0x2a8
[ 90.622906] shrink_page_list+0x9dc/0xde0
[ 90.622917] shrink_inactive_list+0x1ec/0x3c8
[ 90.622928] shrink_lruvec+0x3dc/0x628
[ 90.622939] shrink_node+0x37c/0x6a0
[ 90.622950] balance_pgdat+0x354/0x668
[ 90.622961] kswapd+0x1e0/0x3c0
[ 90.622972] kthread+0x110/0x120
but I have never captured a backtrace in which a THP is loaded as a whole, though
the code does seem to have this path:
int swap_readpage(struct page *page, bool synchronous)
{
...
bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
bio->bi_iter.bi_sector = swap_page_sector(page);
bio->bi_end_io = end_swap_bio_read;
bio_add_page(bio, page, thp_size(page), 0);
...
submit_bio(bio);
}
> splitting still happens but after the swapping out finishes. Even if
> they are loaded as 4K pages, we still have the mte_save_tags() that only
> understands small pages currently, so rejecting THP pages is probably
> best.
Since I don't have MTE hardware to do a valid test and go any further, I will
disable THP_SWP entirely for hardware with MTE for now in patch v2.
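For what it's worth, the shape of the fallback being discussed can be mocked in plain userspace C. This is only a sketch of the proposed gate, not kernel code: `mock_system_supports_mte` is a stand-in for the real arm64 cpufeature check, and HPAGE_PMD_NR assumes 2MB THPs on 4K base pages.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_NR 512 /* base pages per 2MB THP with 4K pages */

/* Stand-in for the arm64 capability query (assumption for illustration). */
static bool mock_system_supports_mte;

/* arm64 override from the patch: THP_SWP only when MTE is absent. */
static bool arch_thp_swp_supported(void)
{
	return !mock_system_supports_mte;
}

/* Mirrors the gate added in get_swap_page(): request HPAGE_PMD_NR
 * contiguous swap slots only when the arch allows THP swap; returning 1
 * models the fallback where mm splits the huge page first. */
static int swap_slots_requested(bool page_is_thp)
{
	if (page_is_thp && arch_thp_swp_supported())
		return HPAGE_PMD_NR;
	return 1;
}
```

On MTE-capable hardware the gate simply routes every THP through the existing split-then-swap path, which is why no mte_save_tags() changes are needed.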
>
> --
> Catalin
Thanks
Barry
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH] arm64: enable THP_SWAP for arm64
2022-05-25 11:10 ` Barry Song
@ 2022-05-25 16:54 ` Catalin Marinas
-1 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2022-05-25 16:54 UTC (permalink / raw)
To: Barry Song
Cc: Andrew Morton, Will Deacon, Linux-MM, LAK, LKML, hanchuanhua,
张诗明(Simon Zhang), 郭健,
Barry Song, Huang, Ying, Minchan Kim, Johannes Weiner,
Hugh Dickins, Shaohua Li, Rik van Riel, Andrea Arcangeli,
Steven Price
On Wed, May 25, 2022 at 11:10:41PM +1200, Barry Song wrote:
> On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > I think this should work and with your other proposal it would be
> > limited to MTE pages:
> >
> > #define arch_thp_swp_supported(page) (!test_bit(PG_mte_tagged, &page->flags))
> >
> > Are THP pages loaded from swap as a whole or are they split? IIRC the
>
> i can confirm thp is written as a whole through:
> [ 90.622863] __swap_writepage+0xe8/0x580
> [ 90.622881] swap_writepage+0x44/0xf8
> [ 90.622891] pageout+0xe0/0x2a8
> [ 90.622906] shrink_page_list+0x9dc/0xde0
> [ 90.622917] shrink_inactive_list+0x1ec/0x3c8
> [ 90.622928] shrink_lruvec+0x3dc/0x628
> [ 90.622939] shrink_node+0x37c/0x6a0
> [ 90.622950] balance_pgdat+0x354/0x668
> [ 90.622961] kswapd+0x1e0/0x3c0
> [ 90.622972] kthread+0x110/0x120
>
> but i have never got a backtrace in which thp is loaded as a whole though it
> seems the code has this path:
> int swap_readpage(struct page *page, bool synchronous)
> {
> ...
> bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
> bio->bi_iter.bi_sector = swap_page_sector(page);
> bio->bi_end_io = end_swap_bio_read;
> bio_add_page(bio, page, thp_size(page), 0);
> ...
> submit_bio(bio);
> }
>
> > splitting still happens but after the swapping out finishes. Even if
> > they are loaded as 4K pages, we still have the mte_save_tags() that only
> > understands small pages currently, so rejecting THP pages is probably
> > best.
>
> as anyway i don't have a mte-hardware to do a valid test to go any
> further, so i will totally disable thp_swp for hardware having mte for
> this moment in patch v2.
It makes sense. If we decide to improve this for MTE, we'll change the
arch check.
Thanks.
--
Catalin
* Re: [PATCH] arm64: enable THP_SWAP for arm64
2022-05-25 11:10 ` Barry Song
@ 2022-05-25 17:49 ` Yang Shi
-1 siblings, 0 replies; 24+ messages in thread
From: Yang Shi @ 2022-05-25 17:49 UTC (permalink / raw)
To: Barry Song
Cc: Catalin Marinas, Andrew Morton, Will Deacon, Linux-MM, LAK, LKML,
hanchuanhua, 张诗明(Simon Zhang),
郭健,
Barry Song, Huang, Ying, Minchan Kim, Johannes Weiner,
Hugh Dickins, Shaohua Li, Rik van Riel, Andrea Arcangeli,
Steven Price
On Wed, May 25, 2022 at 4:10 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> >
> > On Tue, May 24, 2022 at 10:05:35PM +1200, Barry Song wrote:
> > > On Tue, May 24, 2022 at 8:12 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > On Tue, May 24, 2022 at 07:14:03PM +1200, Barry Song wrote:
> > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > index d550f5acfaf3..8e3771c56fbf 100644
> > > > > --- a/arch/arm64/Kconfig
> > > > > +++ b/arch/arm64/Kconfig
> > > > > @@ -98,6 +98,7 @@ config ARM64
> > > > > select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> > > > > select ARCH_WANT_LD_ORPHAN_WARN
> > > > > select ARCH_WANTS_NO_INSTR
> > > > > + select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> > > >
> > > > I'm not opposed to this but I think it would break pages mapped with
> > > > PROT_MTE. We have an assumption in mte_sync_tags() that compound pages
> > > > are not swapped out (or in). With MTE, we store the tags in a slab
> > >
> > > I assume you mean mte_sync_tags() require that THP is not swapped as a whole,
> > > as without THP_SWP, THP is still swapping after being splitted. MTE doesn't stop
> > > THP from swapping through a couple of splitted pages, does it?
> >
> > That's correct, split THP page are swapped out/in just fine.
> >
> > > > object (128-bytes per swapped page) and restore them when pages are
> > > > swapped in. At some point we may teach the core swap code about such
> > > > metadata but in the meantime that was the easiest way.
> > >
> > > If my previous assumption is true, the easiest way to enable THP_SWP
> > > for this moment might be always letting mm fallback to the splitting
> > > way for MTE hardware. For this moment, I care about THP_SWP more as
> > > none of my hardware has MTE.
> > >
> > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > index 45c358538f13..d55a2a3e41a9 100644
> > > --- a/arch/arm64/include/asm/pgtable.h
> > > +++ b/arch/arm64/include/asm/pgtable.h
> > > @@ -44,6 +44,8 @@
> > > __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > >
> > > +#define arch_thp_swp_supported !system_supports_mte
> > > +
> > > /*
> > > * Outside of a few very special situations (e.g. hibernation), we always
> > > * use broadcast TLB invalidation instructions, therefore a spurious page
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 2999190adc22..064b6b03df9e 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -447,4 +447,16 @@ static inline int split_folio_to_list(struct folio *folio,
> > > return split_huge_page_to_list(&folio->page, list);
> > > }
> > >
> > > +/*
> > > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > > + * limitations in the implementation like arm64 MTE can override this to
> > > + * false
> > > + */
> > > +#ifndef arch_thp_swp_supported
> > > +static inline bool arch_thp_swp_supported(void)
> > > +{
> > > + return true;
> > > +}
> > > +#endif
> > > +
> > > #endif /* _LINUX_HUGE_MM_H */
> > > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > > index 2b5531840583..dde685836328 100644
> > > --- a/mm/swap_slots.c
> > > +++ b/mm/swap_slots.c
> > > @@ -309,7 +309,7 @@ swp_entry_t get_swap_page(struct page *page)
> > > entry.val = 0;
> > >
> > > if (PageTransHuge(page)) {
> > > - if (IS_ENABLED(CONFIG_THP_SWAP))
> > > + if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> > > get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > > goto out;
> >
> > I think this should work and with your other proposal it would be
> > limited to MTE pages:
> >
> > #define arch_thp_swp_supported(page) (!test_bit(PG_mte_tagged, &page->flags))
> >
> > Are THP pages loaded from swap as a whole or are they split? IIRC the
>
> i can confirm thp is written as a whole through:
> [ 90.622863] __swap_writepage+0xe8/0x580
> [ 90.622881] swap_writepage+0x44/0xf8
> [ 90.622891] pageout+0xe0/0x2a8
> [ 90.622906] shrink_page_list+0x9dc/0xde0
> [ 90.622917] shrink_inactive_list+0x1ec/0x3c8
> [ 90.622928] shrink_lruvec+0x3dc/0x628
> [ 90.622939] shrink_node+0x37c/0x6a0
> [ 90.622950] balance_pgdat+0x354/0x668
> [ 90.622961] kswapd+0x1e0/0x3c0
> [ 90.622972] kthread+0x110/0x120
>
> but i have never got a backtrace in which thp is loaded as a whole though it
> seems the code has this path:
A THP can be swapped out as a whole, but it is never swapped in as a THP; only
the single base page (4K on x86) is swapped in.
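The swap-out/swap-in asymmetry described here can be condensed into a small sketch (userspace C, illustrative only; the constants assume 2MB THPs on 4K base pages):

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_NR 512 /* base pages per 2MB THP with 4K pages */

/* Swap-out: with CONFIG_THP_SWAP a PMD-mapped THP goes out in a single
 * bio covering all of its base pages, as the backtrace above shows. */
static int swapout_io_pages(bool is_thp)
{
	return is_thp ? HPAGE_PMD_NR : 1;
}

/* Swap-in: a fault always brings back one base page; neighbours arrive
 * only via readahead, and the THP is never reassembled. */
static int swapin_fault_pages(void)
{
	return 1;
}
```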
> int swap_readpage(struct page *page, bool synchronous)
> {
> ...
> bio = bio_alloc(sis->bdev, 1, REQ_OP_READ, GFP_KERNEL);
> bio->bi_iter.bi_sector = swap_page_sector(page);
> bio->bi_end_io = end_swap_bio_read;
> bio_add_page(bio, page, thp_size(page), 0);
> ...
> submit_bio(bio);
> }
>
>
> > splitting still happens but after the swapping out finishes. Even if
> > they are loaded as 4K pages, we still have the mte_save_tags() that only
> > understands small pages currently, so rejecting THP pages is probably
> > best.
>
> as anyway i don't have a mte-hardware to do a valid test to go any
> further, so i will totally disable thp_swp for hardware having mte for
> this moment in patch v2.
>
> >
> > --
> > Catalin
>
> Thanks
> Barry
>
* Re: [PATCH] arm64: enable THP_SWAP for arm64
2022-05-25 17:49 ` Yang Shi
@ 2022-05-26 9:19 ` Barry Song
-1 siblings, 0 replies; 24+ messages in thread
From: Barry Song @ 2022-05-26 9:19 UTC (permalink / raw)
To: Yang Shi
Cc: Catalin Marinas, Andrew Morton, Will Deacon, Linux-MM, LAK, LKML,
hanchuanhua, 张诗明(Simon Zhang),
郭健,
Barry Song, Huang, Ying, Minchan Kim, Johannes Weiner,
Hugh Dickins, Shaohua Li, Rik van Riel, Andrea Arcangeli,
Steven Price
On Thu, May 26, 2022 at 5:49 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, May 25, 2022 at 4:10 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > >
> > > On Tue, May 24, 2022 at 10:05:35PM +1200, Barry Song wrote:
> > > > On Tue, May 24, 2022 at 8:12 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > > On Tue, May 24, 2022 at 07:14:03PM +1200, Barry Song wrote:
> > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > > index d550f5acfaf3..8e3771c56fbf 100644
> > > > > > --- a/arch/arm64/Kconfig
> > > > > > +++ b/arch/arm64/Kconfig
> > > > > > @@ -98,6 +98,7 @@ config ARM64
> > > > > > select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> > > > > > select ARCH_WANT_LD_ORPHAN_WARN
> > > > > > select ARCH_WANTS_NO_INSTR
> > > > > > + select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> > > > >
> > > > > I'm not opposed to this but I think it would break pages mapped with
> > > > > PROT_MTE. We have an assumption in mte_sync_tags() that compound pages
> > > > > are not swapped out (or in). With MTE, we store the tags in a slab
> > > >
> > > > I assume you mean mte_sync_tags() require that THP is not swapped as a whole,
> > > > as without THP_SWP, THP is still swapping after being splitted. MTE doesn't stop
> > > > THP from swapping through a couple of splitted pages, does it?
> > >
> > > That's correct, split THP page are swapped out/in just fine.
> > >
> > > > > object (128-bytes per swapped page) and restore them when pages are
> > > > > swapped in. At some point we may teach the core swap code about such
> > > > > metadata but in the meantime that was the easiest way.
> > > >
> > > > If my previous assumption is true, the easiest way to enable THP_SWP
> > > > for this moment might be always letting mm fallback to the splitting
> > > > way for MTE hardware. For this moment, I care about THP_SWP more as
> > > > none of my hardware has MTE.
> > > >
> > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > index 45c358538f13..d55a2a3e41a9 100644
> > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > @@ -44,6 +44,8 @@
> > > > __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> > > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > >
> > > > +#define arch_thp_swp_supported !system_supports_mte
> > > > +
> > > > /*
> > > > * Outside of a few very special situations (e.g. hibernation), we always
> > > > * use broadcast TLB invalidation instructions, therefore a spurious page
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 2999190adc22..064b6b03df9e 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -447,4 +447,16 @@ static inline int split_folio_to_list(struct folio *folio,
> > > > return split_huge_page_to_list(&folio->page, list);
> > > > }
> > > >
> > > > +/*
> > > > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > > > + * limitations in the implementation like arm64 MTE can override this to
> > > > + * false
> > > > + */
> > > > +#ifndef arch_thp_swp_supported
> > > > +static inline bool arch_thp_swp_supported(void)
> > > > +{
> > > > + return true;
> > > > +}
> > > > +#endif
> > > > +
> > > > #endif /* _LINUX_HUGE_MM_H */
> > > > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > > > index 2b5531840583..dde685836328 100644
> > > > --- a/mm/swap_slots.c
> > > > +++ b/mm/swap_slots.c
> > > > @@ -309,7 +309,7 @@ swp_entry_t get_swap_page(struct page *page)
> > > > entry.val = 0;
> > > >
> > > > if (PageTransHuge(page)) {
> > > > - if (IS_ENABLED(CONFIG_THP_SWAP))
> > > > + if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> > > > get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > > > goto out;
> > >
> > > I think this should work and with your other proposal it would be
> > > limited to MTE pages:
> > >
> > > #define arch_thp_swp_supported(page) (!test_bit(PG_mte_tagged, &page->flags))
> > >
> > > Are THP pages loaded from swap as a whole or are they split? IIRC the
> >
> > i can confirm thp is written as a whole through:
> > [ 90.622863] __swap_writepage+0xe8/0x580
> > [ 90.622881] swap_writepage+0x44/0xf8
> > [ 90.622891] pageout+0xe0/0x2a8
> > [ 90.622906] shrink_page_list+0x9dc/0xde0
> > [ 90.622917] shrink_inactive_list+0x1ec/0x3c8
> > [ 90.622928] shrink_lruvec+0x3dc/0x628
> > [ 90.622939] shrink_node+0x37c/0x6a0
> > [ 90.622950] balance_pgdat+0x354/0x668
> > [ 90.622961] kswapd+0x1e0/0x3c0
> > [ 90.622972] kthread+0x110/0x120
> >
> > but i have never got a backtrace in which thp is loaded as a whole though it
> > seems the code has this path:
>
> THP could be swapped out in a whole, but never swapped in as THP. Just
> the single base page (4K on x86) is swapped in.
Yep. It seems swapin_readahead() never reads in a THP, or even its split pages,
for this 2MB THP.
The number of pages to read ahead is determined either by
/proc/sys/vm/page-cluster if /sys/kernel/mm/swap/vma_ra_enabled is false,
or by the VMA read-ahead algorithm if /sys/kernel/mm/swap/vma_ra_enabled is true.
And that number is usually quite small.
Am I missing any case in which 2MB can be swapped in as a whole, either as
split pages or as one THP?
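To put a number on "quite small": under the cluster-based policy the window is 1 << page-cluster pages, and the default page-cluster of 3 gives 8 pages, far below the 512 base pages of a 2MB THP. A sketch of that arithmetic (userspace C, illustrative only):

```c
#include <assert.h>

#define HPAGE_PMD_NR 512 /* base pages per 2MB THP with 4K pages */

/* Cluster-based swap readahead window: 1 << page_cluster pages,
 * where page_cluster comes from /proc/sys/vm/page-cluster. */
static unsigned int swapin_ra_window(unsigned int page_cluster)
{
	return 1u << page_cluster;
}
```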
Thanks
Barry
* Re: [PATCH] arm64: enable THP_SWAP for arm64
@ 2022-05-26 9:19 ` Barry Song
0 siblings, 0 replies; 24+ messages in thread
From: Barry Song @ 2022-05-26 9:19 UTC (permalink / raw)
To: Yang Shi
Cc: Catalin Marinas, Andrew Morton, Will Deacon, Linux-MM, LAK, LKML,
hanchuanhua, 张诗明(Simon Zhang),
郭健,
Barry Song, Huang, Ying, Minchan Kim, Johannes Weiner,
Hugh Dickins, Shaohua Li, Rik van Riel, Andrea Arcangeli,
Steven Price
On Thu, May 26, 2022 at 5:49 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, May 25, 2022 at 4:10 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > >
> > > On Tue, May 24, 2022 at 10:05:35PM +1200, Barry Song wrote:
> > > > On Tue, May 24, 2022 at 8:12 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > > On Tue, May 24, 2022 at 07:14:03PM +1200, Barry Song wrote:
> > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > > index d550f5acfaf3..8e3771c56fbf 100644
> > > > > > --- a/arch/arm64/Kconfig
> > > > > > +++ b/arch/arm64/Kconfig
> > > > > > @@ -98,6 +98,7 @@ config ARM64
> > > > > > select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> > > > > > select ARCH_WANT_LD_ORPHAN_WARN
> > > > > > select ARCH_WANTS_NO_INSTR
> > > > > > + select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> > > > >
> > > > > I'm not opposed to this but I think it would break pages mapped with
> > > > > PROT_MTE. We have an assumption in mte_sync_tags() that compound pages
> > > > > are not swapped out (or in). With MTE, we store the tags in a slab
> > > >
> > > > I assume you mean mte_sync_tags() require that THP is not swapped as a whole,
> > > > as without THP_SWP, THP is still swapping after being splitted. MTE doesn't stop
> > > > THP from swapping through a couple of splitted pages, does it?
> > >
> > > That's correct, split THP page are swapped out/in just fine.
> > >
> > > > > object (128-bytes per swapped page) and restore them when pages are
> > > > > swapped in. At some point we may teach the core swap code about such
> > > > > metadata but in the meantime that was the easiest way.
> > > >
> > > > If my previous assumption is true, the easiest way to enable THP_SWP
> > > > for this moment might be always letting mm fallback to the splitting
> > > > way for MTE hardware. For this moment, I care about THP_SWP more as
> > > > none of my hardware has MTE.
> > > >
> > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > index 45c358538f13..d55a2a3e41a9 100644
> > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > @@ -44,6 +44,8 @@
> > > > __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> > > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > >
> > > > +#define arch_thp_swp_supported !system_supports_mte
> > > > +
> > > > /*
> > > > * Outside of a few very special situations (e.g. hibernation), we always
> > > > * use broadcast TLB invalidation instructions, therefore a spurious page
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 2999190adc22..064b6b03df9e 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -447,4 +447,16 @@ static inline int split_folio_to_list(struct folio *folio,
> > > > return split_huge_page_to_list(&folio->page, list);
> > > > }
> > > >
> > > > +/*
> > > > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > > > + * limitations in the implementation like arm64 MTE can override this to
> > > > + * false
> > > > + */
> > > > +#ifndef arch_thp_swp_supported
> > > > +static inline bool arch_thp_swp_supported(void)
> > > > +{
> > > > + return true;
> > > > +}
> > > > +#endif
> > > > +
> > > > #endif /* _LINUX_HUGE_MM_H */
> > > > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > > > index 2b5531840583..dde685836328 100644
> > > > --- a/mm/swap_slots.c
> > > > +++ b/mm/swap_slots.c
> > > > @@ -309,7 +309,7 @@ swp_entry_t get_swap_page(struct page *page)
> > > > entry.val = 0;
> > > >
> > > > if (PageTransHuge(page)) {
> > > > - if (IS_ENABLED(CONFIG_THP_SWAP))
> > > > + if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> > > > get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > > > goto out;
> > >
> > > I think this should work and with your other proposal it would be
> > > limited to MTE pages:
> > >
> > > #define arch_thp_swp_supported(page) (!test_bit(PG_mte_tagged, &page->flags))
> > >
> > > Are THP pages loaded from swap as a whole or are they split? IIRC the
> >
> > i can confirm thp is written as a whole through:
> > [ 90.622863] __swap_writepage+0xe8/0x580
> > [ 90.622881] swap_writepage+0x44/0xf8
> > [ 90.622891] pageout+0xe0/0x2a8
> > [ 90.622906] shrink_page_list+0x9dc/0xde0
> > [ 90.622917] shrink_inactive_list+0x1ec/0x3c8
> > [ 90.622928] shrink_lruvec+0x3dc/0x628
> > [ 90.622939] shrink_node+0x37c/0x6a0
> > [ 90.622950] balance_pgdat+0x354/0x668
> > [ 90.622961] kswapd+0x1e0/0x3c0
> > [ 90.622972] kthread+0x110/0x120
> >
> > but i have never got a backtrace in which thp is loaded as a whole though it
> > seems the code has this path:
>
> THP could be swapped out in a whole, but never swapped in as THP. Just
> the single base page (4K on x86) is swapped in.
Yep. It seems swapin_readahead() never reads in a THP, or even the split
pages of this 2MB THP.
The number of pages to read ahead is determined either by
/proc/sys/vm/page-cluster if /sys/kernel/mm/swap/vma_ra_enabled is false,
or by the VMA read-ahead algorithm if /sys/kernel/mm/swap/vma_ra_enabled
is true. And the number is usually quite small.
Am I missing any case in which 2MB can be swapped in as a whole, either as
split pages or as a THP?
Thanks
Barry
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] arm64: enable THP_SWAP for arm64
2022-05-26 9:19 ` Barry Song
@ 2022-05-26 17:02 ` Yang Shi
0 siblings, 0 replies; 24+ messages in thread
From: Yang Shi @ 2022-05-26 17:02 UTC (permalink / raw)
To: Barry Song
Cc: Catalin Marinas, Andrew Morton, Will Deacon, Linux-MM, LAK, LKML,
hanchuanhua, 张诗明(Simon Zhang),
郭健,
Barry Song, Huang, Ying, Minchan Kim, Johannes Weiner,
Hugh Dickins, Andrea Arcangeli, Steven Price
On Thu, May 26, 2022 at 2:19 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Thu, May 26, 2022 at 5:49 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 4:10 AM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > >
> > > > On Tue, May 24, 2022 at 10:05:35PM +1200, Barry Song wrote:
> > > > > On Tue, May 24, 2022 at 8:12 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > > > On Tue, May 24, 2022 at 07:14:03PM +1200, Barry Song wrote:
> > > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > > > index d550f5acfaf3..8e3771c56fbf 100644
> > > > > > > --- a/arch/arm64/Kconfig
> > > > > > > +++ b/arch/arm64/Kconfig
> > > > > > > @@ -98,6 +98,7 @@ config ARM64
> > > > > > > select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> > > > > > > select ARCH_WANT_LD_ORPHAN_WARN
> > > > > > > select ARCH_WANTS_NO_INSTR
> > > > > > > + select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> > > > > >
> > > > > > I'm not opposed to this but I think it would break pages mapped with
> > > > > > PROT_MTE. We have an assumption in mte_sync_tags() that compound pages
> > > > > > are not swapped out (or in). With MTE, we store the tags in a slab
> > > > >
> > > > > I assume you mean mte_sync_tags() require that THP is not swapped as a whole,
> > > > > as without THP_SWP, THP is still swapping after being splitted. MTE doesn't stop
> > > > > THP from swapping through a couple of splitted pages, does it?
> > > >
> > > > That's correct, split THP page are swapped out/in just fine.
> > > >
> > > > > > object (128-bytes per swapped page) and restore them when pages are
> > > > > > swapped in. At some point we may teach the core swap code about such
> > > > > > metadata but in the meantime that was the easiest way.
> > > > >
> > > > > If my previous assumption is true, the easiest way to enable THP_SWP
> > > > > for this moment might be always letting mm fallback to the splitting
> > > > > way for MTE hardware. For this moment, I care about THP_SWP more as
> > > > > none of my hardware has MTE.
> > > > >
> > > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > > index 45c358538f13..d55a2a3e41a9 100644
> > > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > > @@ -44,6 +44,8 @@
> > > > > __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> > > > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > > >
> > > > > +#define arch_thp_swp_supported !system_supports_mte
> > > > > +
> > > > > /*
> > > > > * Outside of a few very special situations (e.g. hibernation), we always
> > > > > * use broadcast TLB invalidation instructions, therefore a spurious page
> > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > index 2999190adc22..064b6b03df9e 100644
> > > > > --- a/include/linux/huge_mm.h
> > > > > +++ b/include/linux/huge_mm.h
> > > > > @@ -447,4 +447,16 @@ static inline int split_folio_to_list(struct folio *folio,
> > > > > return split_huge_page_to_list(&folio->page, list);
> > > > > }
> > > > >
> > > > > +/*
> > > > > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > > > > + * limitations in the implementation like arm64 MTE can override this to
> > > > > + * false
> > > > > + */
> > > > > +#ifndef arch_thp_swp_supported
> > > > > +static inline bool arch_thp_swp_supported(void)
> > > > > +{
> > > > > + return true;
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > > #endif /* _LINUX_HUGE_MM_H */
> > > > > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > > > > index 2b5531840583..dde685836328 100644
> > > > > --- a/mm/swap_slots.c
> > > > > +++ b/mm/swap_slots.c
> > > > > @@ -309,7 +309,7 @@ swp_entry_t get_swap_page(struct page *page)
> > > > > entry.val = 0;
> > > > >
> > > > > if (PageTransHuge(page)) {
> > > > > - if (IS_ENABLED(CONFIG_THP_SWAP))
> > > > > + if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> > > > > get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > > > > goto out;
> > > >
> > > > I think this should work and with your other proposal it would be
> > > > limited to MTE pages:
> > > >
> > > > #define arch_thp_swp_supported(page) (!test_bit(PG_mte_tagged, &page->flags))
> > > >
> > > > Are THP pages loaded from swap as a whole or are they split? IIRC the
> > >
> > > i can confirm thp is written as a whole through:
> > > [ 90.622863] __swap_writepage+0xe8/0x580
> > > [ 90.622881] swap_writepage+0x44/0xf8
> > > [ 90.622891] pageout+0xe0/0x2a8
> > > [ 90.622906] shrink_page_list+0x9dc/0xde0
> > > [ 90.622917] shrink_inactive_list+0x1ec/0x3c8
> > > [ 90.622928] shrink_lruvec+0x3dc/0x628
> > > [ 90.622939] shrink_node+0x37c/0x6a0
> > > [ 90.622950] balance_pgdat+0x354/0x668
> > > [ 90.622961] kswapd+0x1e0/0x3c0
> > > [ 90.622972] kthread+0x110/0x120
> > >
> > > but i have never got a backtrace in which thp is loaded as a whole though it
> > > seems the code has this path:
> >
> > THP could be swapped out in a whole, but never swapped in as THP. Just
> > the single base page (4K on x86) is swapped in.
>
> yep. it seems swapin_readahead() is never reading a THP or even splitted
> pages for this 2MB THP.
>
> the number of pages to be read-ahead is determined either by
> /proc/sys/vm/page-cluster if /sys/kernel/mm/swap/vma_ra_enabled is fase
> or
> by vma read-ahead algorithm if /sys//kernel/mm/swap/vma_ra_enabled is true
> And the number is usually quite small.
>
> Am I missing any case in which 2MB can be swapped in as whole either by
> splitted pages or a THP?
Even though readahead swaps in 2MB, they are 512 single base pages
rather than a THP. They may not be physically contiguous at all.
>
> Thanks
> Barry
* Re: [PATCH] arm64: enable THP_SWAP for arm64
2022-05-26 17:02 ` Yang Shi
@ 2022-05-27 7:29 ` Barry Song
0 siblings, 0 replies; 24+ messages in thread
From: Barry Song @ 2022-05-27 7:29 UTC (permalink / raw)
To: Yang Shi
Cc: Catalin Marinas, Andrew Morton, Will Deacon, Linux-MM, LAK, LKML,
hanchuanhua, 张诗明(Simon Zhang),
郭健,
Barry Song, Huang, Ying, Minchan Kim, Johannes Weiner,
Hugh Dickins, Andrea Arcangeli, Steven Price
On Fri, May 27, 2022 at 5:03 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, May 26, 2022 at 2:19 AM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Thu, May 26, 2022 at 5:49 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Wed, May 25, 2022 at 4:10 AM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Wed, May 25, 2022 at 7:14 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > >
> > > > > On Tue, May 24, 2022 at 10:05:35PM +1200, Barry Song wrote:
> > > > > > On Tue, May 24, 2022 at 8:12 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > > > > On Tue, May 24, 2022 at 07:14:03PM +1200, Barry Song wrote:
> > > > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > > > > index d550f5acfaf3..8e3771c56fbf 100644
> > > > > > > > --- a/arch/arm64/Kconfig
> > > > > > > > +++ b/arch/arm64/Kconfig
> > > > > > > > @@ -98,6 +98,7 @@ config ARM64
> > > > > > > > select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> > > > > > > > select ARCH_WANT_LD_ORPHAN_WARN
> > > > > > > > select ARCH_WANTS_NO_INSTR
> > > > > > > > + select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> > > > > > >
> > > > > > > I'm not opposed to this but I think it would break pages mapped with
> > > > > > > PROT_MTE. We have an assumption in mte_sync_tags() that compound pages
> > > > > > > are not swapped out (or in). With MTE, we store the tags in a slab
> > > > > >
> > > > > > I assume you mean mte_sync_tags() require that THP is not swapped as a whole,
> > > > > > as without THP_SWP, THP is still swapping after being splitted. MTE doesn't stop
> > > > > > THP from swapping through a couple of splitted pages, does it?
> > > > >
> > > > > That's correct, split THP page are swapped out/in just fine.
> > > > >
> > > > > > > object (128-bytes per swapped page) and restore them when pages are
> > > > > > > swapped in. At some point we may teach the core swap code about such
> > > > > > > metadata but in the meantime that was the easiest way.
> > > > > >
> > > > > > If my previous assumption is true, the easiest way to enable THP_SWP
> > > > > > for this moment might be always letting mm fallback to the splitting
> > > > > > way for MTE hardware. For this moment, I care about THP_SWP more as
> > > > > > none of my hardware has MTE.
> > > > > >
> > > > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > > > index 45c358538f13..d55a2a3e41a9 100644
> > > > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > > > @@ -44,6 +44,8 @@
> > > > > > __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> > > > > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > > > >
> > > > > > +#define arch_thp_swp_supported !system_supports_mte
> > > > > > +
> > > > > > /*
> > > > > > * Outside of a few very special situations (e.g. hibernation), we always
> > > > > > * use broadcast TLB invalidation instructions, therefore a spurious page
> > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > > index 2999190adc22..064b6b03df9e 100644
> > > > > > --- a/include/linux/huge_mm.h
> > > > > > +++ b/include/linux/huge_mm.h
> > > > > > @@ -447,4 +447,16 @@ static inline int split_folio_to_list(struct folio *folio,
> > > > > > return split_huge_page_to_list(&folio->page, list);
> > > > > > }
> > > > > >
> > > > > > +/*
> > > > > > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > > > > > + * limitations in the implementation like arm64 MTE can override this to
> > > > > > + * false
> > > > > > + */
> > > > > > +#ifndef arch_thp_swp_supported
> > > > > > +static inline bool arch_thp_swp_supported(void)
> > > > > > +{
> > > > > > + return true;
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > > #endif /* _LINUX_HUGE_MM_H */
> > > > > > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > > > > > index 2b5531840583..dde685836328 100644
> > > > > > --- a/mm/swap_slots.c
> > > > > > +++ b/mm/swap_slots.c
> > > > > > @@ -309,7 +309,7 @@ swp_entry_t get_swap_page(struct page *page)
> > > > > > entry.val = 0;
> > > > > >
> > > > > > if (PageTransHuge(page)) {
> > > > > > - if (IS_ENABLED(CONFIG_THP_SWAP))
> > > > > > + if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> > > > > > get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > > > > > goto out;
> > > > >
> > > > > I think this should work and with your other proposal it would be
> > > > > limited to MTE pages:
> > > > >
> > > > > #define arch_thp_swp_supported(page) (!test_bit(PG_mte_tagged, &page->flags))
> > > > >
> > > > > Are THP pages loaded from swap as a whole or are they split? IIRC the
> > > >
> > > > i can confirm thp is written as a whole through:
> > > > [ 90.622863] __swap_writepage+0xe8/0x580
> > > > [ 90.622881] swap_writepage+0x44/0xf8
> > > > [ 90.622891] pageout+0xe0/0x2a8
> > > > [ 90.622906] shrink_page_list+0x9dc/0xde0
> > > > [ 90.622917] shrink_inactive_list+0x1ec/0x3c8
> > > > [ 90.622928] shrink_lruvec+0x3dc/0x628
> > > > [ 90.622939] shrink_node+0x37c/0x6a0
> > > > [ 90.622950] balance_pgdat+0x354/0x668
> > > > [ 90.622961] kswapd+0x1e0/0x3c0
> > > > [ 90.622972] kthread+0x110/0x120
> > > >
> > > > but i have never got a backtrace in which thp is loaded as a whole though it
> > > > seems the code has this path:
> > >
> > > THP could be swapped out in a whole, but never swapped in as THP. Just
> > > the single base page (4K on x86) is swapped in.
> >
> > yep. it seems swapin_readahead() is never reading a THP or even splitted
> > pages for this 2MB THP.
> >
> > the number of pages to be read-ahead is determined either by
> > /proc/sys/vm/page-cluster if /sys/kernel/mm/swap/vma_ra_enabled is fase
> > or
> > by vma read-ahead algorithm if /sys//kernel/mm/swap/vma_ra_enabled is true
> > And the number is usually quite small.
> >
> > Am I missing any case in which 2MB can be swapped in as whole either by
> > splitted pages or a THP?
>
> Even though readahead swaps in 2MB, they are 512 single base pages
> rather than THP. They may not be physically continuous at all.
I actually haven't observed readahead swapping in 2MB, either as a THP or
as 512 single base pages. Per my log, swapin_vma_readahead() usually
swaps in 2, 3, 4 or 8 pages.
But we do have a case in which we can swap in up to 2MB while doing
collapse:
static bool __collapse_huge_page_swapin(struct mm_struct *mm,
                                        struct vm_area_struct *vma,
                                        unsigned long haddr, pmd_t *pmd,
                                        int referenced)
{
        int swapped_in = 0;
        vm_fault_t ret = 0;
        unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);

        for (address = haddr; address < end; address += PAGE_SIZE) {
                struct vm_fault vmf = {
                        .vma = vma,
                        .address = address,
                        .pgoff = linear_page_index(vma, haddr),
                        .flags = FAULT_FLAG_ALLOW_RETRY,
                        .pmd = pmd,
                };

                vmf.pte = pte_offset_map(pmd, address);
                vmf.orig_pte = *vmf.pte;
                if (!is_swap_pte(vmf.orig_pte)) {
                        pte_unmap(vmf.pte);
                        continue;
                }
                swapped_in++;
                ret = do_swap_page(&vmf);
                ...
        }
}
It seems Huang Ying once mentioned there was a plan to avoid splitting
the THP throughout the whole process.
Thanks
Barry