From: Barry Song <21cnbao@gmail.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	akpm@linux-foundation.org,  linux-mm@kvack.org,
	chengming.zhou@linux.dev, chrisl@kernel.org,  david@redhat.com,
	hannes@cmpxchg.org, kasong@tencent.com,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  mhocko@suse.com,
	nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com,
	 surenb@google.com, wangkefeng.wang@huawei.com, xiang@kernel.org,
	 yosryahmed@google.com, yuzhao@google.com,
	Chuanhua Han <hanchuanhua@oppo.com>,
	 Barry Song <v-songbaohua@oppo.com>
Subject: Re: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole
Date: Thu, 21 Mar 2024 23:20:23 +1300	[thread overview]
Message-ID: <CAGsJ_4wb6JTuBs=_xvUUB+80W7rXOeR2OhJT1418_Xnmu-1VvA@mail.gmail.com> (raw)
In-Reply-To: <f918354d-12ee-4349-9356-fc02d2457a26@arm.com>

On Wed, Mar 20, 2024 at 1:19 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 19/03/2024 09:20, Huang, Ying wrote:
> > Ryan Roberts <ryan.roberts@arm.com> writes:
> >
> >>>>> I agree phones are not the only platform. But Rome wasn't built in a
> >>>>> day. I can only get
> >>>>> started on a hardware which I can easily reach and have enough hardware/test
> >>>>> resources on it. So we may take the first step which can be applied on
> >>>>> a real product
> >>>>> and improve its performance, and step by step, we broaden it and make it
> >>>>> widely useful to various areas  in which I can't reach :-)
> >>>>
> >>>> We must guarantee the normal swap path runs correctly and has no
> >>>> performance regression when developing SWP_SYNCHRONOUS_IO optimization.
> >>>> So we have to put some effort on the normal path test anyway.
> >>>>
> >>>>> so probably we can have a sysfs "enable" entry with default "n" or
> >>>>> have a maximum
> >>>>> swap-in order as Ryan's suggestion [1] at the beginning,
> >>>>>
> >>>>> "
> >>>>> So in the common case, swap-in will pull in the same size of folio as was
> >>>>> swapped-out. Is that definitely the right policy for all folio sizes? Certainly
> >>>>> it makes sense for "small" large folios (e.g. up to 64K IMHO). But I'm not sure
> >>>>> it makes sense for 2M THP; As the size increases the chances of actually needing
> >>>>> all of the folio reduces so chances are we are wasting IO. There are similar
> >>>>> arguments for CoW, where we currently copy 1 page per fault - it probably makes
> >>>>> sense to copy the whole folio up to a certain size.
> >>>>> "
> >>
> >> I thought about this a bit more. No clear conclusions, but hoped this might help
> >> the discussion around policy:
> >>
> >> The decision about the size of the THP is made at first fault, with some help
> >> from user space and in future we might make decisions to split based on
> >> munmap/mremap/etc hints. In an ideal world, the fact that we have had to swap
> >> the THP out at some point in its lifetime should not impact on its size. It's
> >> just being moved around in the system and the reason for our original decision
> >> should still hold.
> >>
> >> So from that PoV, it would be good to swap-in to the same size that was
> >> swapped-out.
> >
> > Sorry, I don't agree with this.  It's better to swap-in and swap-out in
> > smallest size if the page is only accessed seldom to avoid to waste
> > memory.
>
> If we want to optimize only for memory consumption, I'm sure there are many
> things we would do differently. We need to find a balance between memory and
> performance. The benefits of folios are well documented and the kernel is
> heading in the direction of managing memory in variable-sized blocks. So I don't
> think it's as simple as saying we should always swap-in the smallest possible
> amount of memory.

Absolutely agreed. With 64KiB large folios implemented, there may have been
a slight uptick in memory usage due to fragmentation. Nevertheless, by
optimizing zRAM and zsmalloc to compress entire large folios, we found that
the compressed data could be up to 1GiB smaller than compressing the same
data in 4KiB increments on a typical phone with 12~16GiB of memory.
Consequently, we not only recovered the memory lost to fragmentation but
also gained the benefits of CONT-PTE, reduced TLB misses, etc.

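As a userspace-only illustration of the effect (assumptions: zlib instead of
the kernel compressors, synthetic data, made-up names; this is not the
zRAM/zsmalloc code path), compressing one 64KiB buffer in a single call
usually produces less output than compressing the same bytes as sixteen
independent 4KiB chunks, because the larger window can exploit redundancy
that crosses 4KiB boundaries:

/* Hedged userspace sketch, not kernel code. Build with: gcc demo.c -lz */
#include <stdio.h>
#include <zlib.h>

#define FOLIO_SIZE	(64 * 1024)
#define CHUNK_SIZE	(4 * 1024)

static unsigned long compressed_size(const unsigned char *src, unsigned long len)
{
	unsigned char out[2 * FOLIO_SIZE];
	uLongf out_len = sizeof(out);

	if (compress2(out, &out_len, src, len, Z_DEFAULT_COMPRESSION) != Z_OK)
		return 0;
	return out_len;
}

int main(void)
{
	static unsigned char folio[FOLIO_SIZE];
	unsigned long whole, chunked = 0;
	unsigned long i;

	/* Synthetic, mildly redundant data standing in for anonymous pages. */
	for (i = 0; i < FOLIO_SIZE; i++)
		folio[i] = (unsigned char)((i / 128) ^ (i % 7));

	whole = compressed_size(folio, FOLIO_SIZE);
	for (i = 0; i < FOLIO_SIZE; i += CHUNK_SIZE)
		chunked += compressed_size(folio + i, CHUNK_SIZE);

	printf("whole 64KiB folio: %lu bytes\n", whole);
	printf("16 x 4KiB chunks : %lu bytes\n", chunked);
	return 0;
}
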
>
> You also said we should swap *out* in smallest size possible. Have I
> misunderstood you? I thought the case for swapping-out a whole folio without
> splitting was well established and non-controversial?
>
> >
> >> But we only kind-of keep that information around, via the swap
> >> entry contiguity and alignment. With that scheme it is possible that multiple
> >> virtually adjacent but not physically contiguous folios get swapped-out to
> >> adjacent swap slot ranges and then they would be swapped-in to a single, larger
> >> folio. This is not ideal, and I think it would be valuable to try to maintain
> >> the original folio size information with the swap slot. One way to do this would
> >> be to store the original order for which the cluster was allocated in the
> >> cluster. Then we at least know that a given swap slot is either for a folio of
> >> that order or an order-0 folio (due to cluster exhaustion/scanning). Can we
> >> steal a bit from swap_map to determine which case it is? Or are there better
> >> approaches?
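
To make that a bit more concrete, here is a rough sketch of the bookkeeping
(the struct layout, field and flag names are invented for illustration; this
is not the current swap_cluster_info or swap_map layout):

/* Hedged sketch only: remember the order a cluster was allocated for, and
 * use one spare bit per slot (the "stolen" swap_map bit mentioned above) to
 * say whether an entry belongs to a folio of that order or fell back to
 * order-0 during cluster exhaustion/scanning. */
#include <stdint.h>

#define SLOTS_PER_CLUSTER	512

#define SLOT_COUNT_MASK		0x7f	/* usage count, as swap_map holds today */
#define SLOT_LARGE_FOLIO	0x80	/* hypothetical stolen bit */

struct demo_swap_cluster {
	uint8_t alloc_order;			/* order this cluster serves */
	uint8_t slot_map[SLOTS_PER_CLUSTER];	/* stand-in for swap_map[] */
};

/* Order to aim for when swapping a slot back in. */
static unsigned int demo_swapin_order(const struct demo_swap_cluster *c,
				      unsigned int slot)
{
	if (c->slot_map[slot] & SLOT_LARGE_FOLIO)
		return c->alloc_order;	/* entry is part of a large folio */
	return 0;			/* order-0 fallback allocation */
}
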
> >
> > [snip]
> >
> > --
> > Best Regards,
> > Huang, Ying

Thanks
Barry


Thread overview: 110+ messages
2024-03-04  8:13 [RFC PATCH v3 0/5] mm: support large folios swap-in Barry Song
2024-03-04  8:13 ` [RFC PATCH v3 1/5] arm64: mm: swap: support THP_SWAP on hardware with MTE Barry Song
2024-03-11 16:55   ` Ryan Roberts
2024-03-21  8:42     ` Barry Song
2024-03-21 10:31       ` Ryan Roberts
2024-03-21 10:43         ` Barry Song
2024-03-22  2:51         ` Barry Song
2024-03-22  7:41           ` Barry Song
2024-03-22 10:19             ` Ryan Roberts
2024-03-23  2:15               ` Chris Li
2024-03-23  3:50                 ` Barry Song
2024-03-04  8:13 ` [RFC PATCH v3 2/5] mm: swap: introduce swap_nr_free() for batched swap_free() Barry Song
2024-03-11 18:51   ` Ryan Roberts
2024-03-14 13:12     ` Chuanhua Han
2024-03-14 13:43       ` Ryan Roberts
2024-03-15  8:34         ` Chuanhua Han
2024-03-15 10:57           ` Ryan Roberts
2024-03-18  1:28             ` Chuanhua Han
2024-03-04  8:13 ` [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio Barry Song
2024-03-12 12:34   ` Ryan Roberts
2024-03-13  2:21     ` Chuanhua Han
2024-03-13  9:09       ` Ryan Roberts
2024-03-13  9:24         ` Chuanhua Han
2024-03-04  8:13 ` [RFC PATCH v3 4/5] mm: swap: introduce swapcache_prepare_nr and swapcache_clear_nr for large folios swap-in Barry Song
2024-03-12 15:35   ` Ryan Roberts
2024-03-18 22:35     ` Barry Song
2024-03-04  8:13 ` [RFC PATCH v3 5/5] mm: support large folios swapin as a whole Barry Song
2024-03-12 16:33   ` Ryan Roberts
2024-03-14 12:56     ` Chuanhua Han
2024-03-14 13:57       ` Ryan Roberts
2024-03-14 20:43         ` Barry Song
2024-03-15 10:59           ` Ryan Roberts
2024-03-15  1:16         ` Chuanhua Han
2024-03-15  8:41   ` Huang, Ying
2024-03-15  8:54     ` Barry Song
2024-03-15  9:15       ` Huang, Ying
2024-03-15 10:01         ` Barry Song
2024-03-15 12:06           ` Ryan Roberts
2024-03-17  6:11             ` Barry Song
2024-03-18  1:52           ` Huang, Ying
2024-03-18  2:41             ` Barry Song
2024-03-18 16:45               ` Ryan Roberts
2024-03-19  6:27                 ` Barry Song
2024-03-19  9:05                   ` Ryan Roberts
2024-03-21  9:22                     ` Barry Song
2024-03-21 11:13                       ` Ryan Roberts
2024-03-19  9:20                 ` Huang, Ying
2024-03-19 12:19                   ` Ryan Roberts
2024-03-20  2:18                     ` Huang, Ying
2024-03-20  2:47                       ` Barry Song
2024-03-20  6:20                         ` Huang, Ying
2024-03-20 18:38                           ` Barry Song
2024-03-21  4:23                             ` Huang, Ying
2024-03-21  5:12                               ` Barry Song
2024-03-21 10:20                     ` Barry Song [this message]
