From: David Hildenbrand <david@redhat.com>
To: Gavin Shan <gshan@redhat.com>,
Alexander Duyck <alexander.duyck@gmail.com>
Cc: linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
shan.gavin@gmail.com,
Anshuman Khandual <anshuman.khandual@arm.com>
Subject: Re: [RFC PATCH] mm/page_reporting: Adjust threshold according to MAX_ORDER
Date: Wed, 16 Jun 2021 14:07:35 +0200
Message-ID: <249e5814-e644-3d82-9b38-232928af4dbd@redhat.com>
In-Reply-To: <3adbcad8-1016-cf48-4574-799de0bba6e4@redhat.com>
> Indeed. 512MB pageblocks are rare, especially on systems which have been
> up and running for a long time.
>
> Free page reporting starts from the guest. Take an extreme case: the guest has
> 512MB of memory and it's backed by one THP on the host. Free page reporting
> won't work at all.
>
> Besides, it seems free page reporting isn't guaranteed to work all the time.
> For example, on a system with a 4KB base page size, freeing individual
> 4KB pages can't produce a free 2MB pageblock due to fragmentation.
> In this case, the freed pages won't be reported immediately, but might be
> reported after swapping or compaction due to memory pressure.

Exactly, it's a pure optimization that won't always work, especially when
guest memory is heavily fragmented. There has to be a balance between
reclaiming free memory in the hypervisor, degrading VM performance, and
the overhead of the feature.

Further, there are no guarantees about when a VM will reuse the memory again.
In the worst case, all VMs that reported free pages reuse memory at the
same time. In that case, one definitely needs sufficient backend memory
in the hypervisor (-> swap) to not run out of memory, and performance
will be degraded.

As MST once phrased it, if the feature has a higher overhead than
swapping in the hypervisor, it's of little use.

>
> David, how about taking your suggestion to have a different threshold size only
> for arm64 (64KB base page size)? The threshold will be smaller than pageblock_order
> for sure. There are two ways to do so; please let me know which is the preferred
> way to go if you (and Alex) agree to do it.
>
> (a) Introduce CONFIG_PAGE_REPORTING_ORDER for individual archs to choose the
> value. The threshold falls back to pageblock_order if it isn't configured.
> (b) Rename PAGE_REPORTING_MIN_ORDER to PAGE_REPORTING_ORDER. Archs can decide
> its value. If it's not provided by the arch, it falls back to pageblock_order.
>

I wonder if we could further define it as a (module/cmdline) parameter
and make it configurable when booting. The default could then be set
based on CONFIG_PAGE_REPORTING_ORDER. CONFIG_PAGE_REPORTING_ORDER would
default to pageblock_order (if easily possible) and could be
special-cased for arm64 with 64k.

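To make the fallback chain concrete, here is a minimal userspace sketch of
the precedence described above: boot-time parameter first, then the Kconfig
default, then pageblock_order. It is not kernel code. The function name
resolve_reporting_order() and the convention that a negative value means
"not set" are assumptions made only for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not an existing kernel function.
 * A negative cmdline_order or config_order means "not provided"
 * (an assumption of this sketch).
 */
static unsigned int resolve_reporting_order(int cmdline_order,
                                            int config_order,
                                            unsigned int pageblock_order)
{
    /* A value given at boot (module/cmdline parameter) wins. */
    if (cmdline_order >= 0)
        return (unsigned int)cmdline_order;

    /* Otherwise use the build-time default chosen by the arch. */
    if (config_order >= 0)
        return (unsigned int)config_order;

    /* Nothing configured: fall back to pageblock_order. */
    return pageblock_order;
}

/* Print the resolved order for one combination of inputs. */
static void show_reporting_order(int cmdline_order, int config_order,
                                 unsigned int pageblock_order)
{
    printf("cmdline=%d config=%d pageblock_order=%u -> order %u\n",
           cmdline_order, config_order, pageblock_order,
           resolve_reporting_order(cmdline_order, config_order,
                                   pageblock_order));
}
```

Calling show_reporting_order(-1, -1, 13) models an arm64 64k kernel with
nothing configured, so the threshold stays at pageblock_order. Passing a
non-negative second or first argument models the Kconfig default or the
boot-time override, respectively.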
> By the way, I recently did some performance testing on different page sizes.
> We get much more performance gain from 64KB (vs 4KB) page size in guest than
> 512MB (vs 2MB) THP on host. It means the performance won't be affected too
> much even if the 512MB THP is split on the arm64 host.

Yes, if one is even able to get 512MB THP populated in the hypervisor --
because once again, 512MB THP are just a bad fit for many workloads.
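
For reference, the 2MB and 512MB figures in this thread follow from the
base page size shifted left by the block order. A small sketch of that
arithmetic is below. The helper name block_bytes() is made up for this
example; the orders used (9 for 4KB pages, 13 for 64KB pages) are the
ones that yield the sizes quoted above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: size in bytes of a block of 2^order pages,
 * where each page is 2^page_shift bytes.
 */
static unsigned long long block_bytes(unsigned int page_shift,
                                      unsigned int order)
{
    return 1ULL << (page_shift + order);
}

/* Print the block size in MB for a given page shift and order. */
static void show_block_size(unsigned int page_shift, unsigned int order)
{
    printf("page_shift=%u order=%u -> %llu MB\n",
           page_shift, order, block_bytes(page_shift, order) >> 20);
}
```

With 4KB pages (page_shift 12) and order 9 this gives 2MB; with 64KB
pages (page_shift 16) and order 13 it gives 512MB, which is why a
threshold tied to pageblock_order is so much coarser on arm64 with 64k.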
--
Thanks,
David / dhildenb