linux-kernel.vger.kernel.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: 黄杰 <huangjie.albert@bytedance.com>
Cc: Hugh Dickins <hughd@google.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andi Kleen <ak@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
Date: Sun, 23 Oct 2022 13:16:33 -0700 (PDT)	[thread overview]
Message-ID: <b96374a-c5df-6ac3-c598-341eface774@google.com> (raw)
In-Reply-To: <CABKxMyNEWVBd8hvZfOHqxgtLi7gts+X53pVvag-zEsvqYqRnhw@mail.gmail.com>


On Wed, 19 Oct 2022, 黄杰 wrote:
> On Thu, 13 Oct 2022 at 03:45, Hugh Dickins <hughd@google.com> wrote:
> > On Wed, 12 Oct 2022, Albert Huang wrote:
> > > From: "huangjie.albert" <huangjie.albert@bytedance.com>
> > >
> > > Implement these two functions so that we can set the mempolicy on
> > > the inode of a hugetlb file. This ensures that the mempolicy of
> > > all processes sharing this huge page file is consistent.
> > >
> > > An example scenario where huge pages are shared: to limit the
> > > memory usage of a VM to node0, I set qemu's mempolicy to bind to
> > > node0; but if another process (such as virtiofsd) shares memory
> > > with the VM and the page fault is triggered by virtiofsd, the
> > > memory may be allocated on node1, depending on virtiofsd's own
> > > mempolicy.
> > >
> > > Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
> >
> > Aha!  Congratulations for noticing, after all this time.  hugetlbfs
> > contains various little pieces of code that pretend to be supporting
> > shared NUMA mempolicy, but in fact there was nothing connecting it up.
> >
> > It will be for Mike to decide, but personally I oppose adding
> > shared NUMA mempolicy support to hugetlbfs, after eighteen years.
> >
> > The thing is, it will change the behaviour of NUMA on hugetlbfs:
> > in ways that would have been sensible way back then, yes; but surely
> > those who have invested in NUMA and hugetlbfs have developed other
> > ways of administering it successfully, without shared NUMA mempolicy.
> >
> > At the least, I would expect some tests to break (I could easily be
> > wrong), and there's a chance that some app or tool would break too.
> 
> Hi Hugh,
> 
> Could you share some of those issues here?

Sorry, I don't think I can: precisely because it's been such a relief
to know that hugetlbfs is not in the shared NUMA mempolicy game, I've
given no thought to what issues it might have if it joined the game.

Not much memory is wasted on the unused fields in hugetlbfs_inode_info,
just a few bytes per inode, so that aspect doesn't concern me much.
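For context: connecting up those unused fields would presumably mirror
shmem's existing get_policy/set_policy implementation. A hypothetical,
non-buildable sketch modeled on shmem_get_policy()/shmem_set_policy();
the hugetlb-side function names and the policy field in
hugetlbfs_inode_info are assumptions, not code from the patch:

```
/* Sketch only: names on the hugetlb side are assumed. */
static int hugetlb_vm_op_set_policy(struct vm_area_struct *vma,
				    struct mempolicy *mpol)
{
	struct inode *inode = file_inode(vma->vm_file);

	/* store the policy in the per-inode shared_policy tree */
	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy,
				      vma, mpol);
}

static struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma,
						  unsigned long addr)
{
	struct inode *inode = file_inode(vma->vm_file);
	pgoff_t index;

	/* index in base pages, matching how shmem keys its tree */
	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
}
```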

Reference counting of shared mempolicy has certainly been a recurrent
problem in the past (see mpol_needs_cond_ref() etc.): stable nowadays,
I believe. Whether supporting hugetlbfs would cause new problems to
surface there, I don't know; but whatever did surface would just be
bugs to be fixed.

/proc/pid/numa_maps does not represent shared NUMA mempolicies
correctly: not for tmpfs, and would not for hugetlbfs.  I did have
old patches to fix that, but not patches that I'm ever likely to
have time to resurrect and present and push.

My main difficulties in tmpfs were with how to deal correctly and
consistently with non-hugepage-aligned mempolicies when hugepages
are in use.  In the case of hugetlbfs, it would be simpler, since
you're always dealing in hugepages of a known size: I recommend
being as strict as possible, demanding correctly aligned mempolicy
or else EINVAL.  (That may already be enforced, I've not looked.)

But my main concern in extending shared NUMA mempolicy to hugetlbfs
is exactly what I already said earlier:

The thing is, it will change the behaviour of NUMA on hugetlbfs:
in ways that would have been sensible way back then, yes; but surely
those who have invested in NUMA and hugetlbfs have developed other
ways of administering it successfully, without shared NUMA mempolicy.

At the least, I would expect some tests to break (I could easily be
wrong), and there's a chance that some app or tool would break too.

It's a risk, and a body of complication, that I would keep away from
myself.  The shared mempolicy rbtree makes sense in itself; but no
madvise() since has implemented such a tree, to attach its advice to
ranges of the shared object rather than to the vma.

Hugh


Thread overview: 17+ messages
2022-10-12  8:15 [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops Albert Huang
2022-10-12 19:45 ` Hugh Dickins
2022-10-14 16:56   ` Mike Kravetz
2022-10-17  3:35     ` [External] " 黄杰
2022-10-19  9:29     ` [PATCH v2] mm: hugetlb: support for shared memory policy Albert Huang
2022-10-19 11:49       ` Aneesh Kumar K V
2022-10-19  9:33   ` [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops 黄杰
2022-10-23 20:16     ` Hugh Dickins [this message]
2022-10-17  8:44 ` David Hildenbrand
2022-10-17  9:48   ` [External] " 黄杰
2022-10-17 11:33     ` David Hildenbrand
2022-10-17 11:46       ` 黄杰
2022-10-17 12:00         ` David Hildenbrand
2022-10-18  9:27           ` 黄杰
2022-10-18  9:35             ` David Hildenbrand
2022-10-17 17:59       ` [External] " Mike Kravetz
2022-10-18  9:24         ` 黄杰
