All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@amd.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Bharata B Rao <bharata@amd.com>, Ingo Molnar <mingo@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 5/6] sched/numa: Complete scanning of partial VMAs regardless of PID activity
Date: Tue, 10 Oct 2023 09:31:42 +0100	[thread overview]
Message-ID: <20231010083143.19593-6-mgorman@techsingularity.net> (raw)
In-Reply-To: <20231010083143.19593-1-mgorman@techsingularity.net>

NUMA Balancing skips VMAs when the current task has not trapped a NUMA
fault within the VMA. If the VMA is skipped then mm->numa_scan_offset
advances and a task that is trapping faults within the VMA may never
fully update PTEs within the VMA.

Force tasks to update PTEs for partially scanned PTEs. The VMA will
be tagged for NUMA hints by some task but this removes some of the
benefit of tracking PID activity within a VMA. A follow-on patch
will mitigate this problem.

The test cases and machines evaluated did not trigger the corner case so
the performance results are neutral with only small changes within the
noise from normal test-to-test variance. However, the next patch makes
the corner case easier to trigger.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/sched/numa_balancing.h |  1 +
 include/trace/events/sched.h         |  3 ++-
 kernel/sched/fair.c                  | 18 +++++++++++++++---
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index c127a1509e2f..7dcc0bdfddbb 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -21,6 +21,7 @@ enum numa_vmaskip_reason {
 	NUMAB_SKIP_INACCESSIBLE,
 	NUMAB_SKIP_SCAN_DELAY,
 	NUMAB_SKIP_PID_INACTIVE,
+	NUMAB_SKIP_IGNORE_PID,
 };
 
 #ifdef CONFIG_NUMA_BALANCING
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index b0d0dbf491ea..27b51c81b106 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -670,7 +670,8 @@ DEFINE_EVENT(sched_numa_pair_template, sched_swap_numa,
 	EM( NUMAB_SKIP_SHARED_RO,		"shared_ro" )	\
 	EM( NUMAB_SKIP_INACCESSIBLE,		"inaccessible" )	\
 	EM( NUMAB_SKIP_SCAN_DELAY,		"scan_delay" )	\
-	EMe(NUMAB_SKIP_PID_INACTIVE,		"pid_inactive" )
+	EM( NUMAB_SKIP_PID_INACTIVE,		"pid_inactive" )	\
+	EMe(NUMAB_SKIP_IGNORE_PID,		"ignore_pid_inactive" )
 
 /* Redefine for export. */
 #undef EM
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 05e89a7950d0..150f01948ec6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3130,7 +3130,7 @@ static void reset_ptenuma_scan(struct task_struct *p)
 	p->mm->numa_scan_offset = 0;
 }
 
-static bool vma_is_accessed(struct vm_area_struct *vma)
+static bool vma_is_accessed(struct mm_struct *mm, struct vm_area_struct *vma)
 {
 	unsigned long pids;
 	/*
@@ -3143,7 +3143,19 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 		return true;
 
 	pids = vma->numab_state->pids_active[0] | vma->numab_state->pids_active[1];
-	return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
+	if (test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids))
+		return true;
+
+	/*
+	 * Complete a scan that has already started regardless of PID access or
+	 * some VMAs may never be scanned in multi-threaded applications
+	 */
+	if (mm->numa_scan_offset > vma->vm_start) {
+		trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_IGNORE_PID);
+		return true;
+	}
+
+	return false;
 }
 
 #define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
@@ -3287,7 +3299,7 @@ static void task_numa_work(struct callback_head *work)
 		}
 
 		/* Do not scan the VMA if task has not accessed */
-		if (!vma_is_accessed(vma)) {
+		if (!vma_is_accessed(mm, vma)) {
 			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_PID_INACTIVE);
 			continue;
 		}
-- 
2.35.3


  parent reply	other threads:[~2023-10-10  8:32 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-10  8:31 [PATCH 0/6] sched/numa: Complete scanning of partial and inactive VMAs Mel Gorman
2023-10-10  8:31 ` [PATCH 1/6] sched/numa: Document vma_numab_state fields Mel Gorman
2023-10-10  9:43   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2023-10-10  8:31 ` [PATCH 2/6] sched/numa: Rename vma_numab_state.access_pids Mel Gorman
2023-10-10  9:43   ` [tip: sched/core] sched/numa: Rename vma_numab_state::access_pids[] => ::pids_active[], ::next_pid_reset => ::pids_active_reset tip-bot2 for Mel Gorman
2023-10-10  8:31 ` [PATCH 3/6] sched/numa: Trace decisions related to skipping VMAs Mel Gorman
2023-10-10  9:43   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2023-10-10  8:31 ` [PATCH 4/6] sched/numa: Move up the access pid reset logic Mel Gorman
2023-10-10  9:43   ` [tip: sched/core] " tip-bot2 for Raghavendra K T
2023-10-10  8:31 ` Mel Gorman [this message]
2023-10-10  9:43   ` [tip: sched/core] sched/numa: Complete scanning of partial VMAs regardless of PID activity tip-bot2 for Mel Gorman
2023-10-10 21:47   ` tip-bot2 for Mel Gorman
2023-10-10  8:31 ` [PATCH 6/6] sched/numa: Complete scanning of inactive VMAs when there is no alternative Mel Gorman
2023-10-10  9:23   ` Ingo Molnar
2023-10-10  9:57     ` Mel Gorman
2023-10-10 21:39       ` Ingo Molnar
2023-10-10 11:40     ` Raghavendra K T
2024-02-24  4:50       ` Raghavendra K T
2023-10-10  9:43   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2023-10-10 11:42   ` [PATCH 6/6] " Raghavendra K T
2023-10-10 21:47   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2023-10-10 11:39 ` [PATCH 0/6] sched/numa: Complete scanning of partial and inactive VMAs Raghavendra K T
2023-10-10 21:45   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231010083143.19593-6-mgorman@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=bharata@amd.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.