linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/4] Fix excessive CPU usage during compaction
@ 2023-01-25 13:44 Mel Gorman
  2023-01-25 13:44 ` [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock Mel Gorman
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Mel Gorman @ 2023-01-25 13:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

Commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
fixed a problem where pageblocks found by fast_find_migrateblock() were
ignored. Unfortunately, there were numerous bug reports complaining
about high CPU usage and massive stalls once 6.1 was released. Due to
the severity, the patch was reverted by Vlastimil as a short-term fix[1]
for -stable and is currently sitting in Andrew's git branch
mm/mm-hotfixes-unstable.

The underlying problem for each of the bugs is suspected to be the
repeated scanning of the same pageblocks. This series should guarantee
forward progress even with commit 7efc3b726103. More information is in
the changelog for patch 4.

If this series is accepted and merged after the revert of 7efc3b726103
then a "revert of the revert" will be needed.

[1] http://lore.kernel.org/r/20230113173345.9692-1-vbabka@suse.cz

Mel Gorman (4):
  mm, compaction: Rename compact_control->rescan to finish_pageblock
  mm, compaction: Check if a page has been captured before draining PCP
    pages
  mm, compaction: Finish scanning the current pageblock if requested
  mm, compaction: Finish pageblocks on complete migration failure

 mm/compaction.c | 73 +++++++++++++++++++++++++++++++------------------
 mm/internal.h   |  6 +++-
 2 files changed, 52 insertions(+), 27 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock
  2023-01-25 13:44 [RFC PATCH 0/4] Fix excessive CPU usage during compaction Mel Gorman
@ 2023-01-25 13:44 ` Mel Gorman
  2023-02-07 16:22   ` Vlastimil Babka
  2023-01-25 13:44 ` [PATCH 2/4] mm, compaction: Check if a page has been captured before draining PCP pages Mel Gorman
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2023-01-25 13:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

The rescan field was accurate at the time but not well named. Rename the
field to finish_pageblock to indicate that the remainder of the pageblock
should be scanned regardless of COMPACT_CLUSTER_MAX. The intent is that
pageblocks with transient failures get marked for skipping to avoid
revisiting the same pageblock.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 24 ++++++++++++------------
 mm/internal.h   |  6 +++++-
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ca1603524bbe..c018b0e65720 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1102,12 +1102,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Avoid isolating too much unless this block is being
-		 * rescanned (e.g. dirty/writeback pages, parallel allocation)
+		 * fully scanned (e.g. dirty/writeback pages, parallel allocation)
 		 * or a lock is contended. For contention, isolate quickly to
 		 * potentially remove one source of contention.
 		 */
 		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
-		    !cc->rescan && !cc->contended) {
+		    !cc->finish_pageblock && !cc->contended) {
 			++low_pfn;
 			break;
 		}
@@ -1172,14 +1172,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	}
 
 	/*
-	 * Updated the cached scanner pfn once the pageblock has been scanned
+	 * Update the cached scanner pfn once the pageblock has been scanned.
 	 * Pages will either be migrated in which case there is no point
 	 * scanning in the near future or migration failed in which case the
 	 * failure reason may persist. The block is marked for skipping if
 	 * there were no pages isolated in the block or if the block is
 	 * rescanned twice in a row.
 	 */
-	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
+	if (low_pfn == end_pfn && (!nr_isolated || cc->finish_pageblock)) {
 		if (valid_page && !skip_updated)
 			set_pageblock_skip(valid_page);
 		update_cached_migrate(cc, low_pfn);
@@ -2374,17 +2374,17 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 		unsigned long iteration_start_pfn = cc->migrate_pfn;
 
 		/*
-		 * Avoid multiple rescans which can happen if a page cannot be
-		 * isolated (dirty/writeback in async mode) or if the migrated
-		 * pages are being allocated before the pageblock is cleared.
-		 * The first rescan will capture the entire pageblock for
-		 * migration. If it fails, it'll be marked skip and scanning
-		 * will proceed as normal.
+		 * Avoid multiple rescans of the same pageblock which can
+		 * happen if a page cannot be isolated (dirty/writeback in
+		 * async mode) or if the migrated pages are being allocated
+		 * before the pageblock is cleared.  The first rescan will
+		 * capture the entire pageblock for migration. If it fails,
+		 * it'll be marked skip and scanning will proceed as normal.
 		 */
-		cc->rescan = false;
+		cc->finish_pageblock = false;
 		if (pageblock_start_pfn(last_migrated_pfn) ==
 		    pageblock_start_pfn(iteration_start_pfn)) {
-			cc->rescan = true;
+			cc->finish_pageblock = true;
 		}
 
 		switch (isolate_migratepages(cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..21466d0ab22f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -422,7 +422,11 @@ struct compact_control {
 	bool proactive_compaction;	/* kcompactd proactive compaction */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock contention */
-	bool rescan;			/* Rescanning the same pageblock */
+	bool finish_pageblock;		/* Scan the remainder of a pageblock. Used
+					 * when there are potentially transient
+					 * isolation or migration failures to
+					 * ensure forward progress.
+					 */
 	bool alloc_contig;		/* alloc_contig_range allocation */
 };
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] mm, compaction: Check if a page has been captured before draining PCP pages
  2023-01-25 13:44 [RFC PATCH 0/4] Fix excessive CPU usage during compaction Mel Gorman
  2023-01-25 13:44 ` [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock Mel Gorman
@ 2023-01-25 13:44 ` Mel Gorman
  2023-02-07 16:53   ` Vlastimil Babka
  2023-01-25 13:44 ` [PATCH 3/4] mm, compaction: Finish scanning the current pageblock if requested Mel Gorman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2023-01-25 13:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

If a page has been captured then draining the PCP lists is unnecessary,
so check for a captured page first.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c018b0e65720..28711a21a8a2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2441,6 +2441,12 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			}
 		}
 
+		/* Stop if a page has been captured */
+		if (capc && capc->page) {
+			ret = COMPACT_SUCCESS;
+			break;
+		}
+
 check_drain:
 		/*
 		 * Has the migration scanner moved away from the previous
@@ -2459,12 +2465,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				last_migrated_pfn = 0;
 			}
 		}
-
-		/* Stop if a page has been captured */
-		if (capc && capc->page) {
-			ret = COMPACT_SUCCESS;
-			break;
-		}
 	}
 
 out:
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] mm, compaction: Finish scanning the current pageblock if requested
  2023-01-25 13:44 [RFC PATCH 0/4] Fix excessive CPU usage during compaction Mel Gorman
  2023-01-25 13:44 ` [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock Mel Gorman
  2023-01-25 13:44 ` [PATCH 2/4] mm, compaction: Check if a page has been captured before draining PCP pages Mel Gorman
@ 2023-01-25 13:44 ` Mel Gorman
  2023-02-07 17:10   ` Vlastimil Babka
  2023-01-25 13:44 ` [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure Mel Gorman
  2023-01-26  1:11 ` [RFC PATCH 0/4] Fix excessive CPU usage during compaction Andrew Morton
  4 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2023-01-25 13:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

cc->finish_pageblock is set when the current pageblock should be
rescanned but fast_find_migrateblock can select an alternative
block. Disable fast_find_migrateblock when the current pageblock
scan should be completed.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 28711a21a8a2..4b3a0238879c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1762,6 +1762,13 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 	if (cc->ignore_skip_hint)
 		return pfn;
 
+	/*
+	 * If the pageblock should be finished then do not select a different
+	 * pageblock.
+	 */
+	if (cc->finish_pageblock)
+		return pfn;
+
 	/*
 	 * If the migrate_pfn is not at the start of a zone or the start
 	 * of a pageblock then assume this is a continuation of a previous
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure
  2023-01-25 13:44 [RFC PATCH 0/4] Fix excessive CPU usage during compaction Mel Gorman
                   ` (2 preceding siblings ...)
  2023-01-25 13:44 ` [PATCH 3/4] mm, compaction: Finish scanning the current pageblock if requested Mel Gorman
@ 2023-01-25 13:44 ` Mel Gorman
  2023-02-07 17:42   ` Vlastimil Babka
  2023-01-26  1:11 ` [RFC PATCH 0/4] Fix excessive CPU usage during compaction Andrew Morton
  4 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2023-01-25 13:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

Commit 7efc3b726103 ("mm/compaction: fix set skip in
fast_find_migrateblock") addressed an issue where a pageblock selected
by fast_find_migrateblock() was ignored. Unfortunately, the same fix
resulted in numerous reports of khugepaged or kcompactd stalling for
long periods of time or consuming 100% of CPU.

Tracing showed a lot of rescanning of a small subset of pageblocks
because the conditions for marking a block skip were not being met. The
scan was not reaching the end of the pageblock because enough pages had
been isolated but none were migrated successfully, so the scanner
eventually circled back to the same block.

Pageblock skip tracking tries to minimise both latency and excessive
scanning but tracking exactly when a block is fully scanned requires an
excessive amount of state. This patch forcibly rescans a pageblock when
all isolated pages fail to migrate even though it could be for transient
reasons such as page writeback or page dirty. This will sometimes migrate
too many pages but pageblocks will be marked skip and forward progress
will be made.

"usemem" from the mmtests configuration
workload-usemem-stress-numa-compact was used to stress compaction.
The compaction trace events were recorded using a 6.2-rc5 kernel that
includes commit 7efc3b726103, and the number of scans of each unique
range was counted. The top 5 ranges were

   3076 range=(0x10ca00-0x10cc00)
   3076 range=(0x110a00-0x110c00)
   3098 range=(0x13b600-0x13b800)
   3104 range=(0x141c00-0x141e00)
  11424 range=(0x11b600-0x11b800)
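The per-range counts above can be reproduced from the raw trace output with a pipeline along these lines (a sketch: the file name and event text are illustrative stand-ins, only the range=(start-end) field emitted by the mm_compaction trace events matters):

```shell
# Count how many times each pageblock range appears in compaction trace
# output. 'trace.txt' and the event text are hypothetical examples; the
# pipeline only relies on the range=(start-end) field.
printf '%s\n' \
  'mm_compaction_isolate_migratepages: range=(0x10ca00-0x10cc00)' \
  'mm_compaction_isolate_migratepages: range=(0x11b600-0x11b800)' \
  'mm_compaction_isolate_migratepages: range=(0x11b600-0x11b800)' > trace.txt

# Extract the range field, then count and rank: most-scanned ranges last.
grep -o 'range=([^)]*)' trace.txt | sort | uniq -c | sort -n | tail -5
```

Repeated rescanning shows up as a handful of ranges with counts far above the rest.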

While this workload is very different from the reported cases, the
pattern of the same subset of blocks being repeatedly scanned is
observed. At one point, *only* the range range=(0x11b600-0x11b800)
was scanned for 2 seconds. 14 seconds passed between the first
migration-related event and the last.

With the series applied including this patch, the top 5 ranges were

      1 range=(0x11607e-0x116200)
      1 range=(0x116200-0x116278)
      1 range=(0x116278-0x116400)
      1 range=(0x116400-0x116424)
      1 range=(0x116424-0x116600)

Only unique ranges were scanned and the time between the first and last
migration-related events was 0.11 milliseconds.

Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4b3a0238879c..937ec2f05f2c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2394,6 +2394,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			cc->finish_pageblock = true;
 		}
 
+rescan:
 		switch (isolate_migratepages(cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
@@ -2436,15 +2437,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				goto out;
 			}
 			/*
-			 * We failed to migrate at least one page in the current
-			 * order-aligned block, so skip the rest of it.
+			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
+			 * within the current order-aligned block, scan the
+			 * remainder of the pageblock. This will mark the
+			 * pageblock "skip" to avoid rescanning in the near
+			 * future. This will isolate more pages than necessary
+			 * for the request but avoid loops due to
+			 * fast_find_migrateblock revisiting blocks that were
+			 * recently partially scanned.
 			 */
-			if (cc->direct_compaction &&
-						(cc->mode == MIGRATE_ASYNC)) {
-				cc->migrate_pfn = block_end_pfn(
-						cc->migrate_pfn - 1, cc->order);
-				/* Draining pcplists is useless in this case */
-				last_migrated_pfn = 0;
+			if (cc->direct_compaction && !cc->finish_pageblock &&
+						(cc->mode < MIGRATE_SYNC)) {
+				cc->finish_pageblock = true;
+
+				/*
+				 * Draining pcplists does not help THP if
+				 * any page failed to migrate. Even after
+				 * drain, the pageblock will not be free.
+				 */
+				if (cc->order == COMPACTION_HPAGE_ORDER)
+					last_migrated_pfn = 0;
+
+				goto rescan;
 			}
 		}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/4] Fix excessive CPU usage during compaction
  2023-01-25 13:44 [RFC PATCH 0/4] Fix excessive CPU usage during compaction Mel Gorman
                   ` (3 preceding siblings ...)
  2023-01-25 13:44 ` [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure Mel Gorman
@ 2023-01-26  1:11 ` Andrew Morton
  2023-01-26  9:04   ` Mel Gorman
  2023-01-29 18:03   ` Vlastimil Babka
  4 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2023-01-26  1:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On Wed, 25 Jan 2023 13:44:30 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:

> Commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> fixed a problem where pageblocks found by fast_find_migrateblock() were
> ignored. Unfortunately there were numerous bug reports complaining about high
> CPU usage and massive stalls once 6.1 was released. Due to the severity,
> the patch was reverted by Vlastimil as a short-term fix[1] to -stable and
> is currently sitting in Andrew's git branch mm/mm-hotfixes-unstable.
> 
> The underlying problem for each of the bugs is suspected to be the
> repeated scanning of the same pageblocks. This series should guarantee
> forward progress even with commit 7efc3b726103. More information is in
> the changelog for patch 4.
> 
> If this series is accepted and merged after the revert of 7efc3b726103
> then a "revert of the revert" will be needed.

If we drop Vlastimil's reversion and apply this, the whole series
would need to be cc:stable, and it isn't really designed for that.

So I think either

a) drop Vlastimil's reversion and persuade Mel to send us a minimal
   version of patch #4 for -stable consumption.  Patches 1-3 of this
   series come later.

b) go ahead with Vlastimil's revert for -stable, queue up this
   series for 6.3-rc1 and redo the original "fix set skip in
   fast_find_migrateblock" some time in the future.

If we go with b) then the Fixes: tag in "[PATCH 4/4] mm, compaction:
Finish pageblocks on complete migration failure" is inappropriate -
fixing a reverted commit which Vlastimil's revert already fixed.

I'll plan on b) for now.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/4] Fix excessive CPU usage during compaction
  2023-01-26  1:11 ` [RFC PATCH 0/4] Fix excessive CPU usage during compaction Andrew Morton
@ 2023-01-26  9:04   ` Mel Gorman
  2023-01-29 18:03   ` Vlastimil Babka
  1 sibling, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2023-01-26  9:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On Wed, Jan 25, 2023 at 05:11:59PM -0800, Andrew Morton wrote:
> On Wed, 25 Jan 2023 13:44:30 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > Commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> > fixed a problem where pageblocks found by fast_find_migrateblock() were
> > ignored. Unfortunately there were numerous bug reports complaining about high
> > CPU usage and massive stalls once 6.1 was released. Due to the severity,
> > the patch was reverted by Vlastimil as a short-term fix[1] to -stable and
> > is currently sitting in Andrew's git branch mm/mm-hotfixes-unstable.
> > 
> > The underlying problem for each of the bugs is suspected to be the
> > repeated scanning of the same pageblocks. This series should guarantee
> > forward progress even with commit 7efc3b726103. More information is in
> > the changelog for patch 4.
> > 
> > If this series is accepted and merged after the revert of 7efc3b726103
> > then a "revert of the revert" will be needed.
> 
> If we drop Vlastimil's reversion and apply this, the whole series
> should be cc:stable and it isn't really designed for that.
> 
> So I think either
> 
> a) drop Vlastimil's reversion and persuade Mel to send us a minimal
>    version of patch #4 for -stable consumption.  Patches 1-3 of this
>    series come later.
> 
> b) go ahead with Vlastimil's revert for -stable, queue up this
>    series for 6.3-rc1 and redo the original "fix set skip in
>    fast_find_migrateblock" some time in the future.
> 
> If we go with b) then the Fixes: tag in "[PATCH 4/4] mm, compaction:
> Finish pageblocks on complete migration failure" is inappropriate -
> fixing a reverted commit which Vlastimil's revert already fixed.
> 
> I'll plan on b) for now.

I think plan b) is more appropriate because Vlastimil's revert contains
information on this class of bug that may be useful again in the
future. There is no harm in preserving that in both the mainline and
-stable git history.

The series worked for me but that was a limited test case and high CPU
usage bugs could crop up again in 6.2. If such bugs show up then reverting
commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
becomes a short-term option again while the problem is analysed.

I disagree that the Fixes tag is inappropriate. Commit 7efc3b726103 is simply
a messenger that fast_find_migrateblock() was broken; while it fixed one
problem, it triggered many more because the fix was incomplete. Vlastimil's
revert was a short-term "fix" to mitigate the fallout until a proper fix was
available. This series is a more comprehensive fix, but both are "Fixes"
for commit 7efc3b726103 in terms of which kernels have very obvious problems
that need to be fixed. Any kernel including 7efc3b726103 should include
either Vlastimil's fix, mine, or both if git history is being preserved by
the backports. While there is an older commit that broke
fast_find_migrateblock, most people simply didn't notice, whereas commit
7efc3b726103 was noticed by a number of people very quickly.

However, if you disagree then the appropriate Fixes tag would be

Fixes: 70b44595eafe9 ("mm, compaction: use free lists to quickly locate a migration source")

That would at least indicate to anyone doing the backport that they need
to ensure the backport does not have any hidden dependencies.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/4] Fix excessive CPU usage during compaction
  2023-01-26  1:11 ` [RFC PATCH 0/4] Fix excessive CPU usage during compaction Andrew Morton
  2023-01-26  9:04   ` Mel Gorman
@ 2023-01-29 18:03   ` Vlastimil Babka
  2023-01-29 21:00     ` Mel Gorman
  1 sibling, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2023-01-29 18:03 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman
  Cc: Jiri Slaby, Maxim Levitsky, Michal Hocko, Pedro Falcato,
	Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 1/26/23 02:11, Andrew Morton wrote:
> On Wed, 25 Jan 2023 13:44:30 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> If we drop Vlastimil's reversion and apply this, the whole series
> should be cc:stable and it isn't really designed for that.
> 
> So I think either
> 
> a) drop Vlastimil's reversion and persuade Mel to send us a minimal
>    version of patch #4 for -stable consumption.  Patches 1-3 of this
>    series come later.
> 
> b) go ahead with Vlastimil's revert for -stable, queue up this
>    series for 6.3-rc1 and redo the original "fix set skip in
>    fast_find_migrateblock" some time in the future.
> 
> If we go with b) then the Fixes: tag in "[PATCH 4/4] mm, compaction:
> Finish pageblocks on complete migration failure" is inappropriate -
> fixing a reverted commit which Vlastimil's revert already fixed.
> 
> I'll plan on b) for now.

Agreed with plan b). I couldn't review this yet due to being sick,
but I doubt I would have enough confidence to fast-track the series to
6.2 and 6.1-stable. It's a subtle enough area and the extra time in
-next and a full -rc cycle will help.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/4] Fix excessive CPU usage during compaction
  2023-01-29 18:03   ` Vlastimil Babka
@ 2023-01-29 21:00     ` Mel Gorman
  0 siblings, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2023-01-29 21:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On Sun, Jan 29, 2023 at 07:03:54PM +0100, Vlastimil Babka wrote:
> On 1/26/23 02:11, Andrew Morton wrote:
> > On Wed, 25 Jan 2023 13:44:30 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> > 
> > If we drop Vlastimil's reversion and apply this, the whole series
> > should be cc:stable and it isn't really designed for that.
> > 
> > So I think either
> > 
> > a) drop Vlastimil's reversion and persuade Mel to send us a minimal
> >    version of patch #4 for -stable consumption.  Patches 1-3 of this
> >    series come later.
> > 
> > b) go ahead with Vlastimil's revert for -stable, queue up this
> >    series for 6.3-rc1 and redo the original "fix set skip in
> >    fast_find_migrateblock" some time in the future.
> > 
> > If we go with b) then the Fixes: tag in "[PATCH 4/4] mm, compaction:
> > Finish pageblocks on complete migration failure" is inappropriate -
> > fixing a reverted commit which Vlastimil's revert already fixed.
> > 
> > I'll plan on b) for now.
> 
> Agreed with the plan b). I couldn't review this yet due to being sick,
> but I doubt I would have enough confidence to fast-track the series to
> 6.2 and 6.1-stable. It's subtle enough area and extra time in -next and
> full -rc cycle will help.

I hope you feel better soon but for what it's worth, I think it also
deserves a full -rc cycle. I've been running it on my own machine for the
last few days using an openSUSE stable kernel with this series applied and
I haven't had problems with kcompactd or khugepaged getting out of control
(monitored via top -b -i). However, I have noticed at least one audio glitch
and I'm not sure whether it is related to the series or not. Compaction is
more active than I would have expected intuitively, but I've also never
had reason to monitor compaction on my desktop so that's not very useful
in itself. My desktop is a very basic environment (awesome WM, no fancy
animations) and the times when my machine starts thrashing are expected,
because that's the weekly "run git gc on every git tree on Friday evening
if X is idle for more than an hour".
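For reference, a crude /proc-based alternative to top for spotting a kernel thread spinning out of control (a sketch that reads only /proc, so it works without procps; which names float to the top depends entirely on the system):

```shell
# Rank all tasks by accumulated CPU time (utime + stime, in clock
# ticks) read from /proc/<pid>/stat. On an affected system, kcompactd
# or khugepaged would appear near the top of this list during a stall.
for stat in /proc/[0-9]*/stat; do
    # Field 2 is the comm name in parentheses; fields 14 and 15 are
    # utime and stime. Tasks may exit mid-scan, hence the error guard.
    awk '{printf "%d %s\n", $14 + $15, $2}' "$stat" 2>/dev/null
done | sort -rn | head -5
```

Kernel thread names never contain spaces, so the naive awk field split is safe for the threads of interest here.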

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock
  2023-01-25 13:44 ` [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock Mel Gorman
@ 2023-02-07 16:22   ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2023-02-07 16:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 1/25/23 14:44, Mel Gorman wrote:
> The rescan field was not well named albeit accurate at the time. Rename the
> field to finish_pageblock to indicate that the remainder of the pageblock
> should be scanned regardless of COMPACT_CLUSTER_MAX. The intent is that
> pageblocks with transient failures get marked for skipping to avoid
> revisiting the same pageblock.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] mm, compaction: Check if a page has been captured before draining PCP pages
  2023-01-25 13:44 ` [PATCH 2/4] mm, compaction: Check if a page has been captured before draining PCP pages Mel Gorman
@ 2023-02-07 16:53   ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2023-02-07 16:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 1/25/23 14:44, Mel Gorman wrote:
> If a page has been captured then draining is unnecessary so check first
> for a captured page.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c018b0e65720..28711a21a8a2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2441,6 +2441,12 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			}
>  		}
>  
> +		/* Stop if a page has been captured */
> +		if (capc && capc->page) {
> +			ret = COMPACT_SUCCESS;
> +			break;
> +		}
> +
>  check_drain:
>  		/*
>  		 * Has the migration scanner moved away from the previous
> @@ -2459,12 +2465,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  				last_migrated_pfn = 0;
>  			}
>  		}
> -
> -		/* Stop if a page has been captured */
> -		if (capc && capc->page) {
> -			ret = COMPACT_SUCCESS;
> -			break;
> -		}
>  	}
>  
>  out:


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] mm, compaction: Finish scanning the current pageblock if requested
  2023-01-25 13:44 ` [PATCH 3/4] mm, compaction: Finish scanning the current pageblock if requested Mel Gorman
@ 2023-02-07 17:10   ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2023-02-07 17:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 1/25/23 14:44, Mel Gorman wrote:
> cc->finish_pageblock is set when the current pageblock should be
> rescanned but fast_find_migrateblock can select an alternative
> block. Disable fast_find_migrateblock when the current pageblock
> scan should be completed.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 28711a21a8a2..4b3a0238879c 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1762,6 +1762,13 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
>  	if (cc->ignore_skip_hint)
>  		return pfn;
>  
> +	/*
> +	 * If the pageblock should be finished then do not select a different
> +	 * pageblock.
> +	 */
> +	if (cc->finish_pageblock)
> +		return pfn;
> +
>  	/*
>  	 * If the migrate_pfn is not at the start of a zone or the start
>  	 * of a pageblock then assume this is a continuation of a previous


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure
  2023-01-25 13:44 ` [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure Mel Gorman
@ 2023-02-07 17:42   ` Vlastimil Babka
  2023-02-13 21:50     ` Vlastimil Babka
  0 siblings, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2023-02-07 17:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 1/25/23 14:44, Mel Gorman wrote:
> Commit 7efc3b726103 ("mm/compaction: fix set skip in
> fast_find_migrateblock") addressed an issue where a pageblock selected
> by fast_find_migrateblock() was ignored. Unfortunately, the same fix
> resulted in numerous reports of khugepaged or kcompactd stalling for
> long periods of time or consuming 100% of CPU.
> 
> Tracing showed that there was a lot of rescanning between a small subset
> of pageblocks because the conditions for marking the block skip are not
> met. The scan is not reaching the end of the pageblock because enough
> pages were isolated but none were migrated successfully. Eventually it
> circles back to the same block.
> 
> Pageblock skip tracking tries to minimise both latency and excessive
> scanning but tracking exactly when a block is fully scanned requires an
> excessive amount of state. This patch forcibly rescans a pageblock when
> all isolated pages fail to migrate even though it could be for transient
> reasons such as page writeback or page dirty. This will sometimes migrate
> too many pages but pageblocks will be marked skip and forward progress
> will be made.
> 
> "usemem" from the mmtests configuration
> workload-usemem-stress-numa-compact was used to stress compaction.
> The compaction trace events were recorded using a 6.2-rc5 kernel that
> includes commit 7efc3b726103 and count of unique ranges were measured. The
> top 5 ranges were
> 
>    3076 range=(0x10ca00-0x10cc00)
>    3076 range=(0x110a00-0x110c00)
>    3098 range=(0x13b600-0x13b800)
>    3104 range=(0x141c00-0x141e00)
>   11424 range=(0x11b600-0x11b800)
> 
> While this workload is very different from those in the bug reports,
> the pattern of the same subset of blocks being repeatedly scanned is
> observed. At one point, *only* the range=(0x11b600-0x11b800)
> was scanned for 2 seconds. 14 seconds passed between the first
> migration-related event and the last.
> 
> With the series applied including this patch, the top 5 ranges were
> 
>       1 range=(0x11607e-0x116200)
>       1 range=(0x116200-0x116278)
>       1 range=(0x116278-0x116400)
>       1 range=(0x116400-0x116424)
>       1 range=(0x116424-0x116600)
> 
> Only unique ranges were scanned and the time between the first and last
> migration-related events was 0.11 milliseconds.
> 
> Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

While this seems like it will mostly prevent the issue at hand (I think
kcompactd is still a hazard, see below), I'm not super happy about some of
the implementation details.

1. It modifies code that was meant to quickly skip an order-aligned block
where a page migration failed during MIGRATE_ASYNC direct compaction, as
it's very unlikely to successfully form a free page of that order in that
block. Now it will instead finish the whole pageblock in that case, which
could be lots of useless work and thus exactly the opposite of what we
wanted for MIGRATE_ASYNC direct compaction.

2. The conditions "cc->direct_compaction" and "(cc->mode < MIGRATE_SYNC)"
seem a bit hazardous. I think we can have a compaction where these are not
true, and yet it uses fast_find_migrateblock() and thus can exhibit the bug
but won't be forced to rescan?
AFAICS kcompactd_do_work()
- is MIGRATE_SYNC_LIGHT
- has ignore_skip_hint = false
- has direct_compaction = false

so AFAICS it will use fast_find_migrateblock() and not bail out in one of
the preconditions. But the cc->direct_compaction condition here won't apply.

So it might be better to leave the current "skip the rest of block" check
alone, and add a separate check for the finish_pageblock rescan that will
not miss cases where it should apply - maybe it could check for a
complete migration failure specifically as well?
> ---
>  mm/compaction.c | 30 ++++++++++++++++++++++--------
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4b3a0238879c..937ec2f05f2c 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2394,6 +2394,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			cc->finish_pageblock = true;
>  		}
>  
> +rescan:
>  		switch (isolate_migratepages(cc)) {
>  		case ISOLATE_ABORT:
>  			ret = COMPACT_CONTENDED;
> @@ -2436,15 +2437,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  				goto out;
>  			}
>  			/*
> -			 * We failed to migrate at least one page in the current
> -			 * order-aligned block, so skip the rest of it.
> +			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
> +			 * within the current order-aligned block, scan the
> +			 * remainder of the pageblock. This will mark the
> +			 * pageblock "skip" to avoid rescanning in the near
> +			 * future. This will isolate more pages than necessary
> +			 * for the request but avoid loops due to
> +			 * fast_find_migrateblock revisiting blocks that were
> +			 * recently partially scanned.
>  			 */
> -			if (cc->direct_compaction &&
> -						(cc->mode == MIGRATE_ASYNC)) {
> -				cc->migrate_pfn = block_end_pfn(
> -						cc->migrate_pfn - 1, cc->order);
> -				/* Draining pcplists is useless in this case */
> -				last_migrated_pfn = 0;
> +			if (cc->direct_compaction && !cc->finish_pageblock &&
> +						(cc->mode < MIGRATE_SYNC)) {
> +				cc->finish_pageblock = true;
> +
> +				/*
> +				 * Draining pcplists does not help THP if
> +				 * any page failed to migrate. Even after
> +				 * drain, the pageblock will not be free.
> +				 */
> +				if (cc->order == COMPACTION_HPAGE_ORDER)
> +					last_migrated_pfn = 0;
> +
> +				goto rescan;
>  			}
>  		}
>  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure
  2023-02-07 17:42   ` Vlastimil Babka
@ 2023-02-13 21:50     ` Vlastimil Babka
  0 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2023-02-13 21:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 2/7/23 18:42, Vlastimil Babka wrote:
> On 1/25/23 14:44, Mel Gorman wrote:
>> Commit 7efc3b726103 ("mm/compaction: fix set skip in
>> fast_find_migrateblock") addressed an issue where a pageblock selected
>> by fast_find_migrateblock() was ignored. Unfortunately, the same fix
>> resulted in numerous reports of khugepaged or kcompactd stalling for
>> long periods of time or consuming 100% of CPU.
>>
>> Tracing showed that there was a lot of rescanning of a small subset
>> of pageblocks because the conditions for marking the block skip are not
>> met. The scan is not reaching the end of the pageblock because enough
>> pages were isolated but none were migrated successfully. Eventually it
>> circles back to the same block.
>>
>> Pageblock skip tracking tries to minimise both latency and excessive
>> scanning but tracking exactly when a block is fully scanned requires an
>> excessive amount of state. This patch forcibly rescans a pageblock when
>> all isolated pages fail to migrate even though it could be for transient
>> reasons such as page writeback or page dirty. This will sometimes migrate
>> too many pages but pageblocks will be marked skip and forward progress
>> will be made.
>>
>> The "usemem" workload from the mmtests configuration
>> workload-usemem-stress-numa-compact was used to stress compaction.
>> The compaction trace events were recorded using a 6.2-rc5 kernel that
>> includes commit 7efc3b726103, and counts of unique ranges were measured.
>> The top 5 ranges were
>>
>>    3076 range=(0x10ca00-0x10cc00)
>>    3076 range=(0x110a00-0x110c00)
>>    3098 range=(0x13b600-0x13b800)
>>    3104 range=(0x141c00-0x141e00)
>>   11424 range=(0x11b600-0x11b800)
>>
>> While this workload is very different from those in the bug reports,
>> the pattern of the same subset of blocks being repeatedly scanned is
>> observed. At one point, *only* the range=(0x11b600-0x11b800)
>> was scanned for 2 seconds. 14 seconds passed between the first
>> migration-related event and the last.
>>
>> With the series applied including this patch, the top 5 ranges were
>>
>>       1 range=(0x11607e-0x116200)
>>       1 range=(0x116200-0x116278)
>>       1 range=(0x116278-0x116400)
>>       1 range=(0x116400-0x116424)
>>       1 range=(0x116424-0x116600)
>>
>> Only unique ranges were scanned and the time between the first and last
>> migration-related events was 0.11 milliseconds.
>>
>> Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
>> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> While this seems like it will mostly prevent the issue at hand (I think
> kcompactd is still a hazard, see below), I'm not super happy about some of
> the implementation details.

For the record, after some discussion with Mel, my concerns are not a blocker
and the series can proceed from mm-stable to 6.3.

> 1. It modifies code that was meant to quickly skip an order-aligned block
> where a page migration failed during MIGRATE_ASYNC direct compaction, as
> it's very unlikely to successfully form a free page of that order in that
> block. Now it will instead finish the whole pageblock in that case, which
> could be lots of useless work and thus exactly the opposite of what we
> wanted for MIGRATE_ASYNC direct compaction.

There are both advantages and disadvantages to this, so it's not a clear win or loss.

> 2. The conditions "cc->direct_compaction" and "(cc->mode < MIGRATE_SYNC)"
> seem a bit hazardous. I think we can have a compaction where these are not
> true, and yet it uses fast_find_migrateblock() and thus can exhibit the bug
> but won't be forced to rescan?
> AFAICS kcompactd_do_work()
> - is MIGRATE_SYNC_LIGHT
> - has ignore_skip_hint = false
> - has direct_compaction = false
> 
> so AFAICS it will use fast_find_migrateblock() and not bail out in one of
> the preconditions. But the cc->direct_compaction condition here won't apply.

This is only a concern once we attempt to un-revert 7efc3b726103 again, so it
only needs to be addressed as part of a future series leading to the un-revert.

> So it might be better to leave the current "skip the rest of block" check
> alone, and add a separate check for the finish_pageblock rescan that will
> not miss cases where it should apply - maybe it could check for a
> complete migration failure specifically as well?
>> ---
>>  mm/compaction.c | 30 ++++++++++++++++++++++--------
>>  1 file changed, 22 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4b3a0238879c..937ec2f05f2c 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2394,6 +2394,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>  			cc->finish_pageblock = true;
>>  		}
>>  
>> +rescan:
>>  		switch (isolate_migratepages(cc)) {
>>  		case ISOLATE_ABORT:
>>  			ret = COMPACT_CONTENDED;
>> @@ -2436,15 +2437,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>  				goto out;
>>  			}
>>  			/*
>> -			 * We failed to migrate at least one page in the current
>> -			 * order-aligned block, so skip the rest of it.
>> +			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
>> +			 * within the current order-aligned block, scan the
>> +			 * remainder of the pageblock. This will mark the
>> +			 * pageblock "skip" to avoid rescanning in the near
>> +			 * future. This will isolate more pages than necessary
>> +			 * for the request but avoid loops due to
>> +			 * fast_find_migrateblock revisiting blocks that were
>> +			 * recently partially scanned.
>>  			 */
>> -			if (cc->direct_compaction &&
>> -						(cc->mode == MIGRATE_ASYNC)) {
>> -				cc->migrate_pfn = block_end_pfn(
>> -						cc->migrate_pfn - 1, cc->order);
>> -				/* Draining pcplists is useless in this case */
>> -				last_migrated_pfn = 0;
>> +			if (cc->direct_compaction && !cc->finish_pageblock &&
>> +						(cc->mode < MIGRATE_SYNC)) {
>> +				cc->finish_pageblock = true;
>> +
>> +				/*
>> +				 * Draining pcplists does not help THP if
>> +				 * any page failed to migrate. Even after
>> +				 * drain, the pageblock will not be free.
>> +				 */
>> +				if (cc->order == COMPACTION_HPAGE_ORDER)
>> +					last_migrated_pfn = 0;
>> +
>> +				goto rescan;
>>  			}
>>  		}
>>  
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-02-13 21:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-25 13:44 [RFC PATCH 0/4] Fix excessive CPU usage during compaction Mel Gorman
2023-01-25 13:44 ` [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock Mel Gorman
2023-02-07 16:22   ` Vlastimil Babka
2023-01-25 13:44 ` [PATCH 2/4] mm, compaction: Check if a page has been captured before draining PCP pages Mel Gorman
2023-02-07 16:53   ` Vlastimil Babka
2023-01-25 13:44 ` [PATCH 3/4] mm, compaction: Finish scanning the current pageblock if requested Mel Gorman
2023-02-07 17:10   ` Vlastimil Babka
2023-01-25 13:44 ` [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure Mel Gorman
2023-02-07 17:42   ` Vlastimil Babka
2023-02-13 21:50     ` Vlastimil Babka
2023-01-26  1:11 ` [RFC PATCH 0/4] Fix excessive CPU usage during compaction Andrew Morton
2023-01-26  9:04   ` Mel Gorman
2023-01-29 18:03   ` Vlastimil Babka
2023-01-29 21:00     ` Mel Gorman
