From: Muchun Song <songmuchun@bytedance.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Xiongchun duan <duanxiongchun@bytedance.com>,
	Muchun Song <smuchun@gmail.com>
Subject: Re: [PATCH v6 4/4] mm: hugetlb_vmemmap: add hugetlb_free_vmemmap sysctl
Date: Thu, 31 Mar 2022 11:45:29 +0800
Message-ID: <CAMZfGtWqaM5n38kSvjTJxCSYVq-ic30-VmshrYK9xXsF=Fe10A@mail.gmail.com>
In-Reply-To: <20220330193657.88f68bbf13fb198fb189bc15@linux-foundation.org>

On Thu, Mar 31, 2022 at 10:37 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Wed, 30 Mar 2022 23:37:45 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
>
> > We must add "hugetlb_free_vmemmap=on" to boot cmdline and reboot the
> > server to enable the feature of freeing vmemmap pages of HugeTLB
> > pages.  Rebooting usually takes a long time.  Add a sysctl to enable
> > or disable the feature at runtime without rebooting.
>
> I forget, why did we add the hugetlb_free_vmemmap option in the first
> place? Why not just leave the feature enabled in all cases?

The first reason is that the original version disabled PMD (huge page)
mapping of vmemmap pages, which increased the number of page table
pages.  So if a user/sysadmin only uses a small number of HugeTLB
pages (as a percentage of system memory), they could end up using
more memory with hugetlb_free_vmemmap on than with it off.  This
tradeoff is now gone.

The second reason is that this feature adds overhead to the paths
that allocate HugeTLB pages from, and free them to, the buddy system.
As Mike said in [1]:
"
There are still some instances where huge pages
are allocated 'on the fly' instead of being pulled from the pool.  Michal
pointed out the case of page migration.  It is also possible for someone to
use hugetlbfs without pre-allocating huge pages to the pool.  I remember the
use case pointed out in commit 099730d67417.  It says, "I have a hugetlbfs
user which is never explicitly allocating huge pages with 'nr_hugepages'.
They only set 'nr_overcommit_hugepages' and then let the pages be allocated
from the buddy allocator at fault time."  In this case, I suspect they were
using 'page fault' allocation for initialization much like someone using
/proc/sys/vm/nr_hugepages.  So, the overhead may not be as noticeable.
"

Because workloads differ, we introduced hugetlb_free_vmemmap and
expect users to decide based on their own workloads.

[1] https://patchwork.kernel.org/comment/23752641/

>
> Furthermore, why would anyone want to tweak this at runtime?  What is
> the use case?  Where is the end-user value in all of this?

If the workload on a server changes in the future, users should be
able to adapt this setting at runtime without rebooting the server.

>
> > Disabling requires that there are no optimized HugeTLB pages in the
> > system.  If disabling fails, you can set "nr_hugepages" to 0
> > and then retry.
> >
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -561,6 +561,20 @@ Change the minimum size of the hugepage pool.
> >  See Documentation/admin-guide/mm/hugetlbpage.rst
> >
> >
> > +hugetlb_free_vmemmap
> > +====================
> > +
> > +Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap
> > +pages associated with each HugeTLB page.  Once enabled, the vmemmap pages
> > +of HugeTLB pages subsequently allocated from the buddy system will be
> > +optimized, whereas already allocated HugeTLB pages will not be optimized.
> > +If you fail to disable this feature, you can set "nr_hugepages" to 0 and
> > +then retry, since it can only be disabled when there are no optimized
> > +HugeTLB pages in the system.
> > +
>
> Pity the poor user who is looking at this and wondering whether it will
> improve or worsen things.  If we don't tell them, who will?  Are they
> supposed to just experiment?
>
> What can we add here to help them understand whether this might be
> beneficial?
>

My bad.  I should explain this in more detail so that users can make
better decisions.

Thanks.

Thread overview: 13+ messages
2022-03-30 15:37 [PATCH v6 0/4] add hugetlb_free_vmemmap sysctl Muchun Song
2022-03-30 15:37 ` [PATCH v6 1/4] mm: hugetlb_vmemmap: introduce STRUCT_PAGE_SIZE_IS_POWER_OF_2 Muchun Song
2022-03-31  2:28   ` Andrew Morton
2022-03-31  2:52     ` Muchun Song
2022-03-31  2:57       ` Andrew Morton
2022-03-31 12:39   ` kernel test robot
2022-03-31 15:26     ` Muchun Song
2022-03-31 15:26       ` Muchun Song
2022-03-30 15:37 ` [PATCH v6 2/4] mm: memory_hotplug: override memmap_on_memory when hugetlb_free_vmemmap=on Muchun Song
2022-03-30 15:37 ` [PATCH v6 3/4] sysctl: allow to set extra1 to SYSCTL_ONE Muchun Song
2022-03-30 15:37 ` [PATCH v6 4/4] mm: hugetlb_vmemmap: add hugetlb_free_vmemmap sysctl Muchun Song
2022-03-31  2:36   ` Andrew Morton
2022-03-31  3:45     ` Muchun Song [this message]
