From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>, Christoph Lameter <cl@linux.com>,
	Matthew Wilcox <willy@infradead.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	David Hildenbrand <david@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Yang Shi <shy828301@gmail.com>,
	Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Tejun Heo <tj@kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Michal Hocko <mhocko@suse.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 12/12] mempolicy: migration attempt to match interleave nodes
Date: Tue, 3 Oct 2023 02:29:00 -0700 (PDT)
Message-ID: <77954a5-9c9b-1c11-7d5c-3262c01b895f@google.com>
In-Reply-To: <ebc0987e-beff-8bfb-9283-234c2cbd17c5@google.com>

Improve alloc_migration_target_by_mpol()'s treatment of MPOL_INTERLEAVE.

Make an effort in do_mbind() to identify the correct interleave index
for the first page to be migrated, so that it and all subsequent pages
from the same vma will be targeted to precisely their intended nodes.
Pages from following vmas will still be interleaved from the requested
nodemask, but perhaps starting from a different base.
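
As an aside on the arithmetic: MPOL_INTERLEAVE reduces an interleave
index to a node by taking the index modulo the weight of the nodemask,
then choosing the nth set bit.  Below is a minimal userspace sketch of
that mapping, for illustration only; it is a simplified model, not the
kernel code (the real reduction lives in offset_il_node() in
mm/mempolicy.c):

#include <stdio.h>

/* Map an interleave index to the nth set bit of a node bitmask */
static int interleave_node(unsigned long nodemask, unsigned long ilx)
{
	int target = ilx % __builtin_popcountl(nodemask);
	int nid = -1;

	do {
		nid++;
		while (!(nodemask & (1UL << nid)))
			nid++;
	} while (target--);
	return nid;
}

int main(void)
{
	/* nodes 1, 3 and 4 allowed: indices cycle 1, 3, 4, 1, 3, 4, ... */
	unsigned long mask = (1UL << 1) | (1UL << 3) | (1UL << 4);
	unsigned long ilx;

	for (ilx = 0; ilx < 6; ilx++)
		printf("ilx %lu -> node %d\n", ilx, interleave_node(mask, ilx));
	return 0;
}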

Whether this is worth doing at all, or worth improving further, is
arguable: queue_folio_required() is right not to care about the precise
placement on interleaved nodes; but this little effort seems appropriate.
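
The base adjustment in the hunk below, mmpol.ilx -= page->index >> order,
stores a base from which alloc_migration_target_by_mpol() can increment
by each folio's index (as the in-line comment notes); pgoff_t is
unsigned, so the subtraction may wrap, and the later addition wraps back
exactly.  A hypothetical standalone illustration, with invented values:

#include <stdio.h>

typedef unsigned long pgoff_t;

int main(void)
{
	pgoff_t first_index = 10;	/* page->index of first migratable page */
	pgoff_t first_ilx = 3;		/* its index, per get_vma_policy() */
	pgoff_t base = first_ilx - first_index;	/* may wrap: harmless */
	pgoff_t index;

	/* later pages from the same vma recover their own indices */
	for (index = first_index; index < first_index + 4; index++)
		printf("page->index %lu -> ilx %lu\n", index, base + index);
	return 0;
}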

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/mempolicy.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a7b34b9c00ef..b01922e88548 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -430,6 +430,11 @@ static bool strictly_unmovable(unsigned long flags)
 			 MPOL_MF_STRICT;
 }
 
+struct migration_mpol {		/* for alloc_migration_target_by_mpol() */
+	struct mempolicy *pol;
+	pgoff_t ilx;
+};
+
 struct queue_pages {
 	struct list_head *pagelist;
 	unsigned long flags;
@@ -1178,8 +1183,9 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 static struct folio *alloc_migration_target_by_mpol(struct folio *src,
 						    unsigned long private)
 {
-	struct mempolicy *pol = (struct mempolicy *)private;
-	pgoff_t ilx = 0;	/* improve on this later */
+	struct migration_mpol *mmpol = (struct migration_mpol *)private;
+	struct mempolicy *pol = mmpol->pol;
+	pgoff_t ilx = mmpol->ilx;
 	struct page *page;
 	unsigned int order;
 	int nid = numa_node_id();
@@ -1234,6 +1240,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev;
 	struct vma_iterator vmi;
+	struct migration_mpol mmpol;
 	struct mempolicy *new;
 	unsigned long end;
 	long err;
@@ -1314,9 +1321,45 @@ static long do_mbind(unsigned long start, unsigned long len,
 			new = get_task_policy(current);
 			mpol_get(new);
 		}
+		mmpol.pol = new;
+		mmpol.ilx = 0;
+
+		/*
+		 * In the interleaved case, attempt to allocate on exactly the
+		 * targeted nodes, for the first VMA to be migrated; for later
+		 * VMAs, the nodes will still be interleaved from the targeted
+		 * nodemask, but one by one may be selected differently.
+		 */
+		if (new->mode == MPOL_INTERLEAVE) {
+			struct page *page;
+			unsigned int order;
+			unsigned long addr = -EFAULT;
+
+			list_for_each_entry(page, &pagelist, lru) {
+				if (!PageKsm(page))
+					break;
+			}
+			if (!list_entry_is_head(page, &pagelist, lru)) {
+				vma_iter_init(&vmi, mm, start);
+				for_each_vma_range(vmi, vma, end) {
+					addr = page_address_in_vma(page, vma);
+					if (addr != -EFAULT)
+						break;
+				}
+			}
+			if (addr != -EFAULT) {
+				order = compound_order(page);
+				/* We already know the pol, but not the ilx */
+				mpol_cond_put(get_vma_policy(vma, addr, order,
+							     &mmpol.ilx));
+				/* Set base from which to increment by index */
+				mmpol.ilx -= page->index >> order;
+			}
+		}
+
 		nr_failed |= migrate_pages(&pagelist,
 				alloc_migration_target_by_mpol, NULL,
-				(unsigned long)new, MIGRATE_SYNC,
+				(unsigned long)&mmpol, MIGRATE_SYNC,
 				MR_MEMPOLICY_MBIND, NULL);
 	}
 
-- 
2.35.3


