[RFC PATCH] mm: support large folio numa balancing

* [RFC PATCH] mm: support large folio numa balancing
@ 2023-11-13 10:45 Baolin Wang
  2023-11-13 10:53 ` David Hildenbrand
  2023-11-15 10:46 ` David Hildenbrand
  0 siblings, 2 replies; 20+ messages in thread
From: Baolin Wang @ 2023-11-13 10:45 UTC (permalink / raw)
  To: akpm
  Cc: david, ying.huang, wangkefeng.wang, willy, baolin.wang, linux-mm,
	linux-kernel

Currently, the file pages already support large folio, and supporting for
anonymous pages is also under discussion[1]. Moreover, the numa balancing
code are converted to use a folio by previous thread[2], and the migrate_pages
function also already supports the large folio migration.

So now I did not see any reason to continue restricting NUMA balancing for
large folio.

[1] https://lkml.org/lkml/2023/9/29/342
[2] https://lore.kernel.org/all/20230921074417.24004-4-wangkefeng.wang@huawei.com/T/#md9d10fe34587229a72801f0d731f7457ab3f4a6e
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/memory.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c32954e16b28..8ca21eff294c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4804,7 +4804,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	int last_cpupid;
 	int target_nid;
 	pte_t pte, old_pte;
-	int flags = 0;
+	int flags = 0, nr_pages = 0;
 
 	/*
 	 * The "pte" at this point cannot be used safely without
@@ -4834,10 +4834,6 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	if (!folio || folio_is_zone_device(folio))
 		goto out_map;
 
-	/* TODO: handle PTE-mapped THP */
-	if (folio_test_large(folio))
-		goto out_map;
-
 	/*
 	 * Avoid grouping on RO pages in general. RO pages shouldn't hurt as
 	 * much anyway since they can be in shared cache state. This misses
@@ -4857,6 +4853,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 		flags |= TNF_SHARED;
 
 	nid = folio_nid(folio);
+	nr_pages = folio_nr_pages(folio);
 	/*
 	 * For memory tiering mode, cpupid of slow memory page is used
 	 * to record page access time.  So use default value.
@@ -4893,7 +4890,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 
 out:
 	if (nid != NUMA_NO_NODE)
-		task_numa_fault(last_cpupid, nid, 1, flags);
+		task_numa_fault(last_cpupid, nid, nr_pages, flags);
 	return 0;
 out_map:
 	/*
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread