From: Mel Gorman <mgorman@techsingularity.net>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>,
	David Hildenbrand <david@redhat.com>,
	akpm@linux-foundation.org, wangkefeng.wang@huawei.com,
	willy@infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, John Hubbard <jhubbard@nvidia.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH] mm: support large folio numa balancing
Date: Fri, 17 Nov 2023 10:07:45 +0000	[thread overview]
Message-ID: <20231117100745.fnpijbk4xgmals3k@techsingularity.net> (raw)
In-Reply-To: <87sf57en8n.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Nov 15, 2023 at 10:58:32AM +0800, Huang, Ying wrote:
> Baolin Wang <baolin.wang@linux.alibaba.com> writes:
> 
> > On 11/14/2023 9:12 AM, Huang, Ying wrote:
> >> David Hildenbrand <david@redhat.com> writes:
> >> 
> >>> On 13.11.23 11:45, Baolin Wang wrote:
> >>>> Currently, file pages already support large folios, and support for
> >>>> anonymous pages is also under discussion[1]. Moreover, the NUMA balancing
> >>>> code was converted to use folios in a previous thread[2], and the
> >>>> migrate_pages() function already supports large folio migration.
> >>>> So now I do not see any reason to continue restricting NUMA balancing
> >>>> for large folios.
> >>>
> >>> I recall John wanted to look into that. CCing him.
> >>>
> >>> I'll note that the "head page mapcount" heuristic to detect sharers will
> >>> now strike on the PTE path and make us believe that a large folio is
> >>> exclusive, although it isn't.
> >> Even a 4K folio may be shared by multiple processes/threads.  So, NUMA
> >> balancing uses a multi-stage node selection algorithm (mostly
> >> implemented in should_numa_migrate_memory()) to identify shared folios.
> >> I think that algorithm needs to be adjusted to handle shared,
> >> PTE-mapped large folios.
> >
> > Not sure I follow you here. should_numa_migrate_memory() uses the last
> > CPU id, the last PID and the group's NUMA faults to determine whether
> > this page can be migrated to the target node. So for a large folio,
> > wouldn't a precise check of the folio's sharers make the group's NUMA
> > faults accurate enough for should_numa_migrate_memory() to make a decision?
> 
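
For reference, the core of that filtering is, roughly, "remember who took the
previous hinting fault on the page, and only act on a repeat". Below is a
minimal userspace sketch of that idea; the names and the exact decision rule
are simplified and are not the kernel implementation, which also weighs
things like the task's preferred node and the numa_group fault statistics.

/*
 * Rough sketch of the "two-stage filter" idea: remember which node/task
 * took the previous hinting fault, and only migrate once the same task
 * faults again from the same node. Illustration only, not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

struct page_hint {
	int last_nid;	/* node of the previous hinting fault, -1 if none */
	int last_pid;	/* task that took the previous fault */
};

static bool two_stage_filter(struct page_hint *hint, int this_nid, int this_pid)
{
	int last_nid = hint->last_nid;
	int last_pid = hint->last_pid;

	/* Record the current fault for the next decision. */
	hint->last_nid = this_nid;
	hint->last_pid = this_pid;

	/* First fault seen on this page: build history, do not migrate yet. */
	if (last_nid < 0)
		return false;

	/*
	 * Migrate only when the previous fault came from the same node and
	 * the same task, i.e. the task<->page relation looks stable. A page
	 * bouncing between nodes/tasks keeps failing this check and stays put.
	 */
	return last_nid == this_nid && last_pid == this_pid;
}

int main(void)
{
	struct page_hint hint = { .last_nid = -1, .last_pid = -1 };

	/* Two consecutive faults by task 100 from node 1: second one migrates. */
	printf("%d\n", two_stage_filter(&hint, 1, 100));	/* 0 */
	printf("%d\n", two_stage_filter(&hint, 1, 100));	/* 1 */
	/* A fault from another node/task resets the relation. */
	printf("%d\n", two_stage_filter(&hint, 0, 200));	/* 0 */
	return 0;
}
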
> A large folio that is mapped by multiple processes may still be accessed
> from only one remote NUMA node, so we still want to migrate it.  A large
> folio that is mapped by one process but accessed by threads on multiple
> NUMA nodes may be better left unmigrated.
> 
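
To make that distinction concrete, here is a toy decision helper (purely
illustrative; the structure, threshold and names are made up and are not
kernel code) that bases the choice on where the recent hint faults came from
rather than on how many processes map the folio:

#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES 4

struct folio_stats {
	int mappers;		/* how many processes map the folio */
	int faults[MAX_NODES];	/* recent hinting faults seen per node */
};

/*
 * Migrate only if one node clearly dominates the recent accesses.
 * Note that ->mappers is deliberately ignored: the mapcount alone
 * does not tell us where the accesses actually come from.
 */
static bool worth_migrating(const struct folio_stats *s, int *dst_nid)
{
	int total = 0, best = 0, best_nid = -1;

	for (int nid = 0; nid < MAX_NODES; nid++) {
		total += s->faults[nid];
		if (s->faults[nid] > best) {
			best = s->faults[nid];
			best_nid = nid;
		}
	}
	if (!total)
		return false;

	*dst_nid = best_nid;
	return best * 4 > total * 3;	/* >75% of faults from one node */
}

int main(void)
{
	/* Mapped by two processes, but all accesses from node 1: migrate. */
	struct folio_stats shared_one_node = { .mappers = 2, .faults = { 0, 40, 0, 0 } };
	/* Mapped by one process, threads spread over nodes 0 and 1: leave it. */
	struct folio_stats spread_access = { .mappers = 1, .faults = { 20, 22, 0, 0 } };
	int dst;

	printf("shared_one_node: migrate=%d\n", worth_migrating(&shared_one_node, &dst));
	printf("spread_access:   migrate=%d\n", worth_migrating(&spread_access, &dst));
	return 0;
}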

This leads into a generic problem with anything large under NUMA
balancing -- false sharing. As it stands, a THP can be falsely shared by
threads if thread-local data is split within the THP range. In that case,
the ideal would be for the THP to be migrated to the hottest node, but such
support doesn't exist. The same applies to large folios. If not handled
properly, a large folio of any type can ping-pong between nodes, so
migrating just because we can is not necessarily a good idea. The patch
should cover a realistic case showing why this matters, explain why
splitting the folio is not better, and include supporting data.
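
As a hypothetical illustration of that pattern (not part of the patch under
discussion), the sketch below keeps two threads' private working sets in
different 4K pages of the same 2M region, so a THP or PTE-mapped large folio
covering the region sees hinting faults from both threads. Pin each thread
to a CPU on a different node (e.g. with pthread_setaffinity_np()) and the
folio has no single "right" home, which is exactly the ping-pong risk:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE (2UL * 1024 * 1024)	/* one THP-sized region */

static char *region;

static void *worker(void *arg)
{
	/* Each thread touches only its own 4K page inside the region. */
	char *mine = region + (long)arg * 4096;

	for (long i = 0; i < 400000000L; i++)
		mine[i % 4096]++;
	return NULL;
}

int main(void)
{
	pthread_t threads[2];

	/*
	 * For a real experiment the region should also be 2M-aligned so
	 * that the kernel can actually back it with a single THP.
	 */
	region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(region, REGION_SIZE, MADV_HUGEPAGE);
	memset(region, 0, REGION_SIZE);		/* fault everything in */

	for (long i = 0; i < 2; i++)
		pthread_create(&threads[i], NULL, worker, (void *)i);
	for (int i = 0; i < 2; i++)
		pthread_join(threads[i], NULL);

	munmap(region, REGION_SIZE);
	return 0;
}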

-- 
Mel Gorman
SUSE Labs

Thread overview: 20+ messages
2023-11-13 10:45 [RFC PATCH] mm: support large folio numa balancing Baolin Wang
2023-11-13 10:53 ` David Hildenbrand
2023-11-13 12:10   ` Kefeng Wang
2023-11-13 13:01     ` Baolin Wang
2023-11-13 22:15       ` John Hubbard
2023-11-14 11:35         ` David Hildenbrand
2023-11-14 13:12           ` Kefeng Wang
2023-11-13 12:59   ` Baolin Wang
2023-11-13 14:49     ` David Hildenbrand
2023-11-14 10:53       ` Baolin Wang
2023-11-14  1:12   ` Huang, Ying
2023-11-14 11:11     ` Baolin Wang
2023-11-15  2:58       ` Huang, Ying
2023-11-17 10:07         ` Mel Gorman [this message]
2023-11-17 10:13           ` Peter Zijlstra
2023-11-17 16:04             ` Mel Gorman
2023-11-20  8:01           ` Baolin Wang
2023-11-15 10:46 ` David Hildenbrand
2023-11-15 10:47   ` David Hildenbrand
2023-11-20  3:28     ` Baolin Wang
