From: David Hildenbrand <david@redhat.com>
To: Gavin Shan <gshan@redhat.com>, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, alexander.h.duyck@linux.intel.com,
akpm@linux-foundation.org, shan.gavin@gmail.com,
Anshuman Khandual <anshuman.khandual@arm.com>
Subject: Re: [RFC PATCH] mm/page_reporting: Adjust threshold according to MAX_ORDER
Date: Tue, 1 Jun 2021 10:01:25 +0200
Message-ID: <76516781-6a70-f2b0-f3e3-da999c84350f@redhat.com>
In-Reply-To: <20210601033319.100737-1-gshan@redhat.com>
On 01.06.21 05:33, Gavin Shan wrote:
> The PAGE_REPORTING_MIN_ORDER is equal to @pageblock_order, taken as
> minimal order (threshold) to trigger page reporting. The page reporting
> is never triggered with the following configurations and settings on
> aarch64. In the particular scenario, the page reporting won't be triggered
> until the largest (2 ^ (MAX_ORDER-1)) free area is achieved from the
> page freeing. The condition is very hard, or even impossible to be met.
>
> CONFIG_ARM64_PAGE_SHIFT: 16
> CONFIG_HUGETLB_PAGE: Y
> CONFIG_HUGETLB_PAGE_SIZE_VARIABLE: N
> pageblock_order: 13
> CONFIG_FORCE_MAX_ZONEORDER: 14
> MAX_ORDER: 14
>
> The issue can be reproduced in VM, running kernel with above configurations
> and settings. The 'memhog' is used inside the VM to access 512MB anonymous
> area. The QEMU's RSS doesn't drop accordingly after 'memhog' exits.
>
> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> -accel kvm -machine virt,gic-version=host \
> -cpu host -smp 8,sockets=2,cores=4,threads=1 -m 4096M,maxmem=64G \
> -object memory-backend-ram,id=mem0,size=2048M \
> -object memory-backend-ram,id=mem1,size=2048M \
> -numa node,nodeid=0,cpus=0-3,memdev=mem0 \
> -numa node,nodeid=1,cpus=4-7,memdev=mem1 \
> : \
> -device virtio-balloon-pci,id=balloon0,free-page-reporting=yes
>
> This tries to fix the issue by adjusting the threshold to the smaller value
> of @pageblock_order and (MAX_ORDER/2). With this applied, the QEMU's RSS
> drops after 'memhog' exits.
IIRC, we use pageblock_order to:
a) Reduce the free page reporting overhead. Reporting on small chunks
can make us report constantly, even with little system activity.
b) Avoid splitting THP in the hypervisor, which would degrade VM
performance.
c) Avoid affecting creation of pageblock_order pages while hinting is
active. I think there are cases where "temporarily pulling
sub-pageblock pages" can negatively affect creation of pageblock_order
pages. Concurrent compaction would be one of these cases.
The monstrosity called aarch64 64k is really special in that sense:
a) does not apply because pageblocks are just very big, b) sometimes
does not apply because either our VM isn't backed by (rare) 512MB THP
or the hypervisor uses 4k base pages with 2MB THP, and c) similarly
doesn't apply in smallish VMs because we hardly ever get to create
512MB THP either way.
For example, on x86-64, going from reporting 2MB chunks to something
like 32KB is absolutely undesirable.
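To make the sizes concrete, here is a minimal arithmetic sketch of the current threshold (pageblock_order) against the proposed min(pageblock_order, MAX_ORDER/2). The aarch64 numbers come from the quoted configuration; the x86-64 row assumes MAX_ORDER = 11 and pageblock_order = 9 with 4k pages, which is an assumption about a typical build and not something stated in this thread. The function and dictionary names are purely illustrative.

```python
def chunk_bytes(order: int, page_shift: int) -> int:
    """Size in bytes of a free area of the given buddy order."""
    return (1 << order) << page_shift


def current_threshold(pageblock_order: int) -> int:
    """Threshold as described in the RFC: equal to pageblock_order."""
    return pageblock_order


def proposed_threshold(pageblock_order: int, max_order: int) -> int:
    """Threshold proposed by the RFC: min(pageblock_order, MAX_ORDER / 2)."""
    return min(pageblock_order, max_order // 2)  # integer division, as in C


# aarch64 values are taken from the quoted configuration.
# x86-64 values are ASSUMED typical defaults, not taken from the thread.
CONFIGS = {
    "aarch64-64k": {"page_shift": 16, "pageblock_order": 13, "max_order": 14},
    "x86-64-4k": {"page_shift": 12, "pageblock_order": 9, "max_order": 11},
}


def summarize(name: str) -> tuple:
    """Return (current, proposed) reporting granularity in bytes."""
    cfg = CONFIGS[name]
    cur = current_threshold(cfg["pageblock_order"])
    new = proposed_threshold(cfg["pageblock_order"], cfg["max_order"])
    return (chunk_bytes(cur, cfg["page_shift"]),
            chunk_bytes(new, cfg["page_shift"]))


if __name__ == "__main__":
    for name in CONFIGS:
        cur, new = summarize(name)
        print(f"{name}: current {cur >> 10} KiB, proposed {new >> 10} KiB")
```

Under these assumptions the aarch64 64k threshold drops from 512 MiB to 8 MiB, while the x86-64 threshold drops from 2 MiB to 128 KiB. The exact x86-64 figure depends on the configuration, but in any case it lands well below the 2MB THP size, which is the concern raised above.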
I think if we want to go down that path (and I am not 100% sure yet if
we want to), we really want to treat only the special case in a special
way. Note that even when doing it only for aarch64 with 64k, you will
still end up splitting THP in a hypervisor that uses 64k base pages
(b), and you can still affect creation of THP, for example when
compacting (c), so there is a negative side to that.
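One way to picture "treating only the special case in a special way" is the sketch below. It is a hypothetical policy, not anything proposed or merged in this thread: the threshold is lowered only when pageblock_order is already the largest buddy order (MAX_ORDER - 1), which is the situation described in the quoted report. The helper splits_host_thp() illustrates concern b) and assumes the host backs the guest with PMD-sized THP and 8-byte page-table entries. All names are made up for illustration.

```python
def thp_bytes(base_page_shift: int) -> int:
    """PMD-level THP size, ASSUMING 8-byte page-table entries."""
    entries_per_table = (1 << base_page_shift) // 8
    return entries_per_table << base_page_shift


def reporting_order(pageblock_order: int, max_order: int) -> int:
    """Hypothetical policy: lower the threshold only in the special case
    where a pageblock is as large as the largest buddy order."""
    if pageblock_order >= max_order - 1:
        return min(pageblock_order, max_order // 2)
    return pageblock_order


def splits_host_thp(guest_order: int, guest_page_shift: int,
                    host_page_shift: int) -> bool:
    """True if one reported chunk is smaller than a host THP, so that
    discarding it could split a THP backing the guest (concern b)."""
    report_bytes = (1 << guest_order) << guest_page_shift
    return report_bytes < thp_bytes(host_page_shift)


if __name__ == "__main__":
    order = reporting_order(13, 14)  # aarch64 64k guest from the report
    print("guest reporting order:", order)
    print("splits THP on 64k host:", splits_host_thp(order, 16, 16))
    print("splits THP on 4k host:", splits_host_thp(order, 16, 12))
```

With this policy a configuration where pageblock_order is below MAX_ORDER - 1 keeps its threshold unchanged, while the aarch64 64k guest reports at order 7. The check then shows the trade-off named above: those reports are smaller than a 512MB THP on a 64k host but not smaller than a 2MB THP on a 4k host. Concern c) (interference with compaction) is not modeled here.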
--
Thanks,
David / dhildenb