linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction"
@ 2023-05-15 11:33 Mel Gorman
  2023-05-15 11:33 ` [PATCH 1/4] mm: compaction: Ensure rescanning only happens on partially scanned pageblocks Mel Gorman
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Mel Gorman @ 2023-05-15 11:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

The series "Fix excessive CPU usage during compaction" [1] attempted to
fix a bug [2] but Vlastimil noted that the fix was incomplete [3]. While
the series was merged, fast_find_migrateblock was still disabled. This
series should fix the corner cases and allow 95e7a450b819 ("Revert
"mm/compaction: fix set skip in fast_find_migrateblock"") to be safely
reverted. Details on how many pageblocks are rescanned are in the
changelog of the last patch.

[1] https://lore.kernel.org/r/20230125134434.18017-1-mgorman@techsingularity.net
[2] https://bugzilla.suse.com/show_bug.cgi?id=1206848
[3] https://lore.kernel.org/r/a55cf026-a2f9-ef01-9a4c-398693e048ea@suse.cz

 mm/compaction.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] mm: compaction: Ensure rescanning only happens on partially scanned pageblocks
  2023-05-15 11:33 [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Mel Gorman
@ 2023-05-15 11:33 ` Mel Gorman
  2023-05-25  9:57   ` Vlastimil Babka
  2023-05-15 11:33 ` [PATCH 2/4] mm: compaction: Only force pageblock scan completion when skip hints are obeyed Mel Gorman
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2023-05-15 11:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

compact_zone() intends to rescan pageblocks if there is a failure to
migrate "within the current order-aligned block". However, the pageblock
scan may already be complete and moved to the next block causing the
next pageblock to be "rescanned". Ensure only the most recent pageblock
is rescanned.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c8bcdea15f5f..81791c124bb8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2464,8 +2464,9 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			 * fast_find_migrateblock revisiting blocks that were
 			 * recently partially scanned.
 			 */
-			if (cc->direct_compaction && !cc->finish_pageblock &&
-						(cc->mode < MIGRATE_SYNC)) {
+			if (!pageblock_aligned(cc->migrate_pfn) &&
+			    cc->direct_compaction && !cc->finish_pageblock &&
+			    (cc->mode < MIGRATE_SYNC)) {
 				cc->finish_pageblock = true;
 
 				/*
-- 
2.35.3


* [PATCH 2/4] mm: compaction: Only force pageblock scan completion when skip hints are obeyed
  2023-05-15 11:33 [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Mel Gorman
  2023-05-15 11:33 ` [PATCH 1/4] mm: compaction: Ensure rescanning only happens on partially scanned pageblocks Mel Gorman
@ 2023-05-15 11:33 ` Mel Gorman
  2023-05-25 10:01   ` Vlastimil Babka
  2023-05-15 11:33 ` [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start Mel Gorman
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2023-05-15 11:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

fast_find_migrateblock relies on skip hints to avoid rescanning a recently
selected pageblock but compact_zone() only forces the pageblock scan
completion to set the skip hint if in direct compaction.  While this
prevents direct compaction repeatedly scanning a subset of blocks due
to fast_find_migrateblock(), it does not prevent proactive compaction,
node compaction and kcompactd encountering the same problem described
in commit cfccd2e63e7e ("mm, compaction: finish pageblocks on complete
migration failure").

Force the scan completion of a pageblock to set the skip hint if skip
hints are obeyed to prevent fast_find_migrateblock() repeatedly selecting
a subset of pageblocks.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 81791c124bb8..accc6568091a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2456,7 +2456,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			}
 			/*
 			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
-			 * within the current order-aligned block, scan the
+			 * within the current order-aligned block and
+			 * fast_find_migrateblock may be used then scan the
 			 * remainder of the pageblock. This will mark the
 			 * pageblock "skip" to avoid rescanning in the near
 			 * future. This will isolate more pages than necessary
@@ -2465,7 +2466,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			 * recently partially scanned.
 			 */
 			if (!pageblock_aligned(cc->migrate_pfn) &&
-			    cc->direct_compaction && !cc->finish_pageblock &&
+			    !cc->ignore_skip_hint && !cc->finish_pageblock &&
 			    (cc->mode < MIGRATE_SYNC)) {
 				cc->finish_pageblock = true;
 
-- 
2.35.3


* [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-05-15 11:33 [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Mel Gorman
  2023-05-15 11:33 ` [PATCH 1/4] mm: compaction: Ensure rescanning only happens on partially scanned pageblocks Mel Gorman
  2023-05-15 11:33 ` [PATCH 2/4] mm: compaction: Only force pageblock scan completion when skip hints are obeyed Mel Gorman
@ 2023-05-15 11:33 ` Mel Gorman
  2023-05-25 13:37   ` Vlastimil Babka
  2023-05-15 11:33 ` [PATCH 4/4] Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock"" Mel Gorman
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2023-05-15 11:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

isolate_migratepages_block should mark a pageblock as skip if scanning
started on an aligned pageblock boundary but it only updates the skip
flag if the first migration candidate is also aligned. Tracing during
a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
showed that many pageblocks are not marked skip, causing excessive scanning
of blocks that had been recently checked. Update pageblock skip based on
"valid_page" which is set if scanning started on a pageblock boundary.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index accc6568091a..d7be990b1d60 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -392,18 +392,14 @@ void reset_isolation_suitable(pg_data_t *pgdat)
  * Sets the pageblock skip bit if it was clear. Note that this is a hint as
  * locks are not required for read/writers. Returns true if it was already set.
  */
-static bool test_and_set_skip(struct compact_control *cc, struct page *page,
-							unsigned long pfn)
+static bool test_and_set_skip(struct compact_control *cc, struct page *page)
 {
 	bool skip;
 
-	/* Do no update if skip hint is being ignored */
+	/* Do not update if skip hint is being ignored */
 	if (cc->ignore_skip_hint)
 		return false;
 
-	if (!pageblock_aligned(pfn))
-		return false;
-
 	skip = get_pageblock_skip(page);
 	if (!skip && !cc->no_set_skip_hint)
 		set_pageblock_skip(page);
@@ -470,8 +466,7 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
 {
 }
 
-static bool test_and_set_skip(struct compact_control *cc, struct page *page,
-							unsigned long pfn)
+static bool test_and_set_skip(struct compact_control *cc, struct page *page)
 {
 	return false;
 }
@@ -1075,9 +1070,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			lruvec_memcg_debug(lruvec, page_folio(page));
 
 			/* Try get exclusive access under lock */
-			if (!skip_updated) {
+			if (!skip_updated && valid_page) {
 				skip_updated = true;
-				if (test_and_set_skip(cc, page, low_pfn))
+				if (test_and_set_skip(cc, valid_page))
 					goto isolate_abort;
 			}
 
-- 
2.35.3


* [PATCH 4/4] Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""
  2023-05-15 11:33 [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Mel Gorman
                   ` (2 preceding siblings ...)
  2023-05-15 11:33 ` [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start Mel Gorman
@ 2023-05-15 11:33 ` Mel Gorman
  2023-05-25 13:42   ` Vlastimil Babka
  2023-05-19  6:43 ` [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Raghavendra K T
  2023-05-23 13:47 ` Baolin Wang
  5 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2023-05-15 11:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML,
	Mel Gorman

This reverts commit 95e7a450b819 ("Revert "mm/compaction: fix set skip
in fast_find_migrateblock"").

Commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
was reverted due to bug reports about khugepaged consuming large amounts
of CPU without making progress. The underlying bug was partially fixed
by commit cfccd2e63e7e ("mm, compaction: finish pageblocks on complete
migration failure") but it only mitigated the problem, and Vlastimil Babka
pointed out that the same issue could theoretically happen to kcompactd.

As pageblocks containing pages that fail to migrate should now be
forcibly rescanned to set the skip hint if skip hints are used,
fast_find_migrateblock() should no longer loop on a small subset
of pageblocks for prolonged periods of time. Revert the revert so
fast_find_migrateblock() is effective again.

Using the mmtests config workload-usemem-stress-numa-compact, the number
of unique ranges scanned was analysed for both kcompactd and !kcompactd
activity.

6.4.0-rc1-vanilla
kcompactd
      7 range=(0x10d600~0x10d800)
      7 range=(0x110c00~0x110e00)
      7 range=(0x110e00~0x111000)
      7 range=(0x111800~0x111a00)
      7 range=(0x111a00~0x111c00)
!kcompactd
      1 range=(0x113e00~0x114000)
      1 range=(0x114000~0x114020)
      1 range=(0x114400~0x114489)
      1 range=(0x114489~0x1144aa)
      1 range=(0x1144aa~0x114600)

6.4.0-rc1-mm-revertfastmigrate
kcompactd
     17 range=(0x104200~0x104400)
     17 range=(0x104400~0x104600)
     17 range=(0x104600~0x104800)
     17 range=(0x104800~0x104a00)
     17 range=(0x104a00~0x104c00)
!kcompactd
   1793 range=(0x15c200~0x15c400)
   5436 range=(0x105800~0x105a00)
  19826 range=(0x150a00~0x150c00)
  19833 range=(0x150800~0x150a00)
  19834 range=(0x11ce00~0x11d000)

6.4.0-rc1-mm-follupfastfind
kcompactd
     22 range=(0x107200~0x107400)
     23 range=(0x107400~0x107600)
     23 range=(0x107600~0x107800)
     23 range=(0x107c00~0x107e00)
     23 range=(0x107e00~0x108000)
!kcompactd
      3 range=(0x890240~0x890400)
      5 range=(0x886e00~0x887000)
      5 range=(0x88a400~0x88a600)
      6 range=(0x88f800~0x88fa00)
      9 range=(0x88a400~0x88a420)

Note that the vanilla kernel and the full series had some duplication of
ranges scanned but it was not severe and would be in line with compaction
resets when the skip hints are cleared.  Just a revert of commit 7efc3b726103
("mm/compaction: fix set skip in fast_find_migrateblock") showed excessive
rescans of the same ranges so the series should not reintroduce bug 1206848.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d7be990b1d60..91af6a8b7a98 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1859,7 +1859,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 					pfn = cc->zone->zone_start_pfn;
 				cc->fast_search_fail = 0;
 				found_block = true;
-				set_pageblock_skip(freepage);
 				break;
 			}
 		}
-- 
2.35.3


* Re: [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction"
  2023-05-15 11:33 [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Mel Gorman
                   ` (3 preceding siblings ...)
  2023-05-15 11:33 ` [PATCH 4/4] Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock"" Mel Gorman
@ 2023-05-19  6:43 ` Raghavendra K T
  2023-05-21 19:20   ` Mel Gorman
  2023-05-23 13:47 ` Baolin Wang
  5 siblings, 1 reply; 20+ messages in thread
From: Raghavendra K T @ 2023-05-19  6:43 UTC (permalink / raw)
  To: Mel Gorman, Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 5/15/2023 5:03 PM, Mel Gorman wrote:
> The series "Fix excessive CPU usage during compaction" [1] attempted to
> fix a bug [2] but Vlastimil noted that the fix was incomplete [3]. While
> the series was merged, fast_find_migrateblock was still disabled. This
> series should fix the corner cases and allow 95e7a450b819 ("Revert
> "mm/compaction: fix set skip in fast_find_migrateblock"") to be safely
> reverted. Details on how many pageblocks are rescanned are in the
> changelog of the last patch.
> 
> [1] https://lore.kernel.org/r/20230125134434.18017-1-mgorman@techsingularity.net
> [2] https://bugzilla.suse.com/show_bug.cgi?id=1206848
> [3] https://lore.kernel.org/r/a55cf026-a2f9-ef01-9a4c-398693e048ea@suse.cz
> 
>   mm/compaction.c | 24 ++++++++++--------------
>   1 file changed, 10 insertions(+), 14 deletions(-)
> 

Hello Mel,

Not sure how much this info would help (I also saw it is in Andrew's
tree already), but I was curious to evaluate the patchset from a perf
perspective, so I ran mmtests usemem; here are the results (only for
the compact cases).

kernel configuration:
1) base 6.4.0-rc2
2) patchseries
3) base - revert of 7efc3b726103

Summary: the results show a decent improvement from a perf perspective
as well as in the compaction-related data.

$ cat kcompact_fix_result_thp_madv

SUT : 256 cpu two node milan

usemem
                      6.4.0-rc2   6.4.0-rc2-compactfix       6.4.0-rc2-revert
Amean     syst-1      102.18 (   0.00%)       98.98 *   3.13%*       96.96 *   5.10%*
Amean     elsp-1      220.57 (   0.00%)      215.96 *   2.09%*      212.49 *   3.67%*
Amean     syst-3      126.34 (   0.00%)      122.39 *   3.13%*      124.09 *   1.78%*
Amean     elsp-3       84.72 (   0.00%)       81.91 *   3.31%*       82.46 *   2.66%*
Amean     syst-4      142.64 (   0.00%)      130.69 *   8.38%*      131.19 *   8.03%*
Amean     elsp-4       79.48 (   0.00%)       66.60 *  16.21%*       66.37 *  16.49%*

                      6.4.0-rc2  6.4.0-rc2-compactfix    6.4.0-rc2-revert
Duration User        2106.24             2065.61        2047.87
Duration System      2598.68             2464.98        2466.33
Duration Elapsed     2693.98             2551.89        2529.83

                                      6.4.0-rc2   6.4.0-rc2-compactfix      6.4.0-rc2-revert
Ops Minor Faults                  2440289187.00          2356626072.00         2354663705.00
Ops Sector Reads                         556.00                1028.00               2400.00
Ops Sector Writes                      20388.00               15556.00              16744.00
Ops Page migrate success            94642602.00            40234601.00           38486059.00
Ops Page migrate failure                2161.00                 467.00                725.00
Ops Compaction pages isolated      116977416.00            80467188.00           76966075.00
Ops Compaction migrate scanned     136909038.00            60060978.00           60479389.00
Ops Compaction free scanned        165907615.00            95661742.00           89794809.00
Ops Compact scan efficiency               82.52                  62.78                 67.35
Ops Compaction cost                   101419.99               43712.83              41834.25
Ops Kcompactd wake                         0.00                   0.00                  0.00
Ops Kcompactd migrate scanned      136909038.00            60060978.00           60479389.00
Ops Kcompactd free scanned         165907615.00            95661742.00           89794809.00
Ops NUMA alloc hit                1834950351.00          1853285684.00         1867258209.00
Ops NUMA alloc miss                413982625.00           359494012.00          345537623.00
Ops NUMA alloc local              1834950349.00          1853285649.00         1867258632.00
Ops NUMA base-page range updates   202326681.00           149242271.00          147460952.00
Ops NUMA PTE updates               202326681.00           149242271.00          147460952.00
Ops NUMA hint faults               195595665.00           148152760.00          146173618.00
Ops NUMA hint local faults %       159439097.00           148150869.00          146169658.00
Ops NUMA hint local percent               81.51                 100.00                100.00
Ops NUMA pages migrated             36155998.00                1454.00               3596.00
Ops AutoNUMA cost                     980081.58              741808.52             731900.38

Thanks

* Re: [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction"
  2023-05-19  6:43 ` [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Raghavendra K T
@ 2023-05-21 19:20   ` Mel Gorman
  0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2023-05-21 19:20 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Vlastimil Babka, Andrew Morton, Jiri Slaby, Maxim Levitsky,
	Michal Hocko, Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM,
	LKML

On Fri, May 19, 2023 at 12:13:11PM +0530, Raghavendra K T wrote:
> On 5/15/2023 5:03 PM, Mel Gorman wrote:
> > The series "Fix excessive CPU usage during compaction" [1] attempted to
> > fix a bug [2] but Vlastimil noted that the fix was incomplete [3]. While
> > the series was merged, fast_find_migrateblock was still disabled. This
> > series should fix the corner cases and allow 95e7a450b819 ("Revert
> > "mm/compaction: fix set skip in fast_find_migrateblock"") to be safely
> > reverted. Details on how many pageblocks are rescanned are in the
> > changelog of the last patch.
> > 
> > [1] https://lore.kernel.org/r/20230125134434.18017-1-mgorman@techsingularity.net
> > [2] https://bugzilla.suse.com/show_bug.cgi?id=1206848
> > [3] https://lore.kernel.org/r/a55cf026-a2f9-ef01-9a4c-398693e048ea@suse.cz
> > 
> >   mm/compaction.c | 24 ++++++++++--------------
> >   1 file changed, 10 insertions(+), 14 deletions(-)
> > 
> 
> Hello Mel,
> 
> Not sure how much this info would help, (also I saw it is in Andrew's
> tree already) But was curious to evaluate the  patchset from perf
> perspective, and I have run mmtest usemem so here is the result (Only
> for  compact cases).
> 
> <SNIP>
>

Thanks Raghavendra!

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction"
  2023-05-15 11:33 [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Mel Gorman
                   ` (4 preceding siblings ...)
  2023-05-19  6:43 ` [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Raghavendra K T
@ 2023-05-23 13:47 ` Baolin Wang
  5 siblings, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2023-05-23 13:47 UTC (permalink / raw)
  To: Mel Gorman, Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

Hi Mel,

On 5/15/2023 7:33 PM, Mel Gorman wrote:
> The series "Fix excessive CPU usage during compaction" [1] attempted to
> fix a bug [2] but Vlastimil noted that the fix was incomplete [3]. While
> the series was merged, fast_find_migrateblock was still disabled. This
> series should fix the corner cases and allow 95e7a450b819 ("Revert
> "mm/compaction: fix set skip in fast_find_migrateblock"") to be safely
> reverted. Details on how many pageblocks are rescanned are in the
> changelog of the last patch.
> 
> [1] https://lore.kernel.org/r/20230125134434.18017-1-mgorman@techsingularity.net
> [2] https://bugzilla.suse.com/show_bug.cgi?id=1206848
> [3] https://lore.kernel.org/r/a55cf026-a2f9-ef01-9a4c-398693e048ea@suse.cz

I've run this series on my machine; it solves the regression I hit
before [1], and no other issues were found. So please feel free to add:
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>

[1] https://lore.kernel.org/all/3576e3520c044beb2a81860aecb2d4f597089300.1682521303.git.baolin.wang@linux.alibaba.com/

> 
>   mm/compaction.c | 24 ++++++++++--------------
>   1 file changed, 10 insertions(+), 14 deletions(-)
> 

* Re: [PATCH 1/4] mm: compaction: Ensure rescanning only happens on partially scanned pageblocks
  2023-05-15 11:33 ` [PATCH 1/4] mm: compaction: Ensure rescanning only happens on partially scanned pageblocks Mel Gorman
@ 2023-05-25  9:57   ` Vlastimil Babka
  0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2023-05-25  9:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 5/15/23 13:33, Mel Gorman wrote:
> compact_zone() intends to rescan pageblocks if there is a failure to
> migrate "within the current order-aligned block". However, the pageblock
> scan may already be complete and moved to the next block causing the
> next pageblock to be "rescanned". Ensure only the most recent pageblock
> is rescanned.
> 
> Reported-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c8bcdea15f5f..81791c124bb8 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2464,8 +2464,9 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			 * fast_find_migrateblock revisiting blocks that were
>  			 * recently partially scanned.
>  			 */
> -			if (cc->direct_compaction && !cc->finish_pageblock &&
> -						(cc->mode < MIGRATE_SYNC)) {
> +			if (!pageblock_aligned(cc->migrate_pfn) &&
> +			    cc->direct_compaction && !cc->finish_pageblock &&
> +			    (cc->mode < MIGRATE_SYNC)) {
>  				cc->finish_pageblock = true;
>  
>  				/*


* Re: [PATCH 2/4] mm: compaction: Only force pageblock scan completion when skip hints are obeyed
  2023-05-15 11:33 ` [PATCH 2/4] mm: compaction: Only force pageblock scan completion when skip hints are obeyed Mel Gorman
@ 2023-05-25 10:01   ` Vlastimil Babka
  0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2023-05-25 10:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 5/15/23 13:33, Mel Gorman wrote:
> fast_find_migrateblock relies on skip hints to avoid rescanning a recently
> selected pageblock but compact_zone() only forces the pageblock scan
> completion to set the skip hint if in direct compaction.  While this
> prevents direct compaction repeatedly scanning a subset of blocks due
> to fast_find_migrateblock(), it does not prevent proactive compaction,
> node compaction and kcompactd encountering the same problem described
> in commit cfccd2e63e7e ("mm, compaction: finish pageblocks on complete
> migration failure").
> 
> Force the scan completion of a pageblock to set the skip hint if skip
> hints are obeyed to prevent fast_find_migrateblock() repeatedly selecting
> a subset of pageblocks.
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 81791c124bb8..accc6568091a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2456,7 +2456,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			}
>  			/*
>  			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
> -			 * within the current order-aligned block, scan the
> +			 * within the current order-aligned block and
> +			 * fast_find_migrateblock may be used then scan the
>  			 * remainder of the pageblock. This will mark the
>  			 * pageblock "skip" to avoid rescanning in the near
>  			 * future. This will isolate more pages than necessary
> @@ -2465,7 +2466,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  			 * recently partially scanned.
>  			 */
>  			if (!pageblock_aligned(cc->migrate_pfn) &&
> -			    cc->direct_compaction && !cc->finish_pageblock &&
> +			    !cc->ignore_skip_hint && !cc->finish_pageblock &&
>  			    (cc->mode < MIGRATE_SYNC)) {
>  				cc->finish_pageblock = true;
>  


* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-05-15 11:33 ` [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start Mel Gorman
@ 2023-05-25 13:37   ` Vlastimil Babka
  2023-05-29 10:33     ` Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2023-05-25 13:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 5/15/23 13:33, Mel Gorman wrote:
> isolate_migratepages_block should mark a pageblock as skip if scanning
> started on an aligned pageblock boundary but it only updates the skip
> flag if the first migration candidate is also aligned. Tracing during
> a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
> showed that many pageblocks are not marked skip, causing excessive scanning
> of blocks that had been recently checked. Update pageblock skip based on
> "valid_page" which is set if scanning started on a pageblock boundary.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

I wonder if this has an unintended side-effect: if we resume
isolate_migratepages_block() of a partially compacted pageblock to finish
it, test_and_set_skip() will now tell us to abort, because we already set
the skip bit in the previous call. This would include the
cc->finish_pageblock rescan cases.

So unless I miss something that already prevents that, I agree we should not
tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
pageblock aligned, we should ignore the already-set skip bit, as it was most
likely set by us in the previous iteration and should not prevent us
from finishing the pageblock?

> ---
>  mm/compaction.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index accc6568091a..d7be990b1d60 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -392,18 +392,14 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>   * Sets the pageblock skip bit if it was clear. Note that this is a hint as
>   * locks are not required for read/writers. Returns true if it was already set.
>   */
> -static bool test_and_set_skip(struct compact_control *cc, struct page *page,
> -							unsigned long pfn)
> +static bool test_and_set_skip(struct compact_control *cc, struct page *page)
>  {
>  	bool skip;
>  
> -	/* Do no update if skip hint is being ignored */
> +	/* Do not update if skip hint is being ignored */
>  	if (cc->ignore_skip_hint)
>  		return false;
>  
> -	if (!pageblock_aligned(pfn))
> -		return false;
> -
>  	skip = get_pageblock_skip(page);
>  	if (!skip && !cc->no_set_skip_hint)
>  		set_pageblock_skip(page);
> @@ -470,8 +466,7 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
>  {
>  }
>  
> -static bool test_and_set_skip(struct compact_control *cc, struct page *page,
> -							unsigned long pfn)
> +static bool test_and_set_skip(struct compact_control *cc, struct page *page)
>  {
>  	return false;
>  }
> @@ -1075,9 +1070,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			lruvec_memcg_debug(lruvec, page_folio(page));
>  
>  			/* Try get exclusive access under lock */
> -			if (!skip_updated) {
> +			if (!skip_updated && valid_page) {
>  				skip_updated = true;
> -				if (test_and_set_skip(cc, page, low_pfn))
> +				if (test_and_set_skip(cc, valid_page))
>  					goto isolate_abort;
>  			}
>  


* Re: [PATCH 4/4] Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""
  2023-05-15 11:33 ` [PATCH 4/4] Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock"" Mel Gorman
@ 2023-05-25 13:42   ` Vlastimil Babka
  0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2023-05-25 13:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 5/15/23 13:33, Mel Gorman wrote:
> This reverts commit 95e7a450b819 ("Revert "mm/compaction: fix set skip
> in fast_find_migrateblock"").
> 
> Commit 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> was reverted due to bug reports about khugepaged consuming large amounts
> of CPU without making progress. The underlying bug was partially fixed
> by commit cfccd2e63e7e ("mm, compaction: finish pageblocks on complete
> migration failure") but it only mitigated the problem, and Vlastimil Babka
> pointed out that the same issue could theoretically happen to kcompactd.
> 
> As pageblocks containing pages that fail to migrate should now be
> forcibly rescanned to set the skip hint if skip hints are used,
> fast_find_migrateblock() should no longer loop on a small subset
> of pageblocks for prolonged periods of time. Revert the revert so
> fast_find_migrateblock() is effective again.
> 
> Using the mmtests config workload-usemem-stress-numa-compact, the number
> of unique ranges scanned was analysed for both kcompactd and !kcompactd
> activity.
> 
> 6.4.0-rc1-vanilla
> kcompactd
>       7 range=(0x10d600~0x10d800)
>       7 range=(0x110c00~0x110e00)
>       7 range=(0x110e00~0x111000)
>       7 range=(0x111800~0x111a00)
>       7 range=(0x111a00~0x111c00)
> !kcompactd
>       1 range=(0x113e00~0x114000)
>       1 range=(0x114000~0x114020)
>       1 range=(0x114400~0x114489)
>       1 range=(0x114489~0x1144aa)
>       1 range=(0x1144aa~0x114600)
> 
> 6.4.0-rc1-mm-revertfastmigrate
> kcompactd
>      17 range=(0x104200~0x104400)
>      17 range=(0x104400~0x104600)
>      17 range=(0x104600~0x104800)
>      17 range=(0x104800~0x104a00)
>      17 range=(0x104a00~0x104c00)
> !kcompactd
>    1793 range=(0x15c200~0x15c400)
>    5436 range=(0x105800~0x105a00)
>   19826 range=(0x150a00~0x150c00)
>   19833 range=(0x150800~0x150a00)
>   19834 range=(0x11ce00~0x11d000)
> 
> 6.4.0-rc1-mm-follupfastfind
> kcompactd
>      22 range=(0x107200~0x107400)
>      23 range=(0x107400~0x107600)
>      23 range=(0x107600~0x107800)
>      23 range=(0x107c00~0x107e00)
>      23 range=(0x107e00~0x108000)
> !kcompactd
>       3 range=(0x890240~0x890400)
>       5 range=(0x886e00~0x887000)
>       5 range=(0x88a400~0x88a600)
>       6 range=(0x88f800~0x88fa00)
>       9 range=(0x88a400~0x88a420)
> 
> Note that the vanilla kernel and the full series had some duplication of
> ranges scanned but it was not severe and would be in line with compaction
> resets when the skip hints are cleared.  Just a revert of commit 7efc3b726103
> ("mm/compaction: fix set skip in fast_find_migrateblock") showed excessive
> rescans of the same ranges so the series should not reintroduce bug 1206848.
> 
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

My concerns about patch 3/4 don't affect this part so

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d7be990b1d60..91af6a8b7a98 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1859,7 +1859,6 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
>  					pfn = cc->zone->zone_start_pfn;
>  				cc->fast_search_fail = 0;
>  				found_block = true;
> -				set_pageblock_skip(freepage);
>  				break;
>  			}
>  		}


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-05-25 13:37   ` Vlastimil Babka
@ 2023-05-29 10:33     ` Mel Gorman
  2023-05-29 12:43       ` Vlastimil Babka
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2023-05-29 10:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
> On 5/15/23 13:33, Mel Gorman wrote:
> > isolate_migratepages_block should mark a pageblock as skip if scanning
> > started on an aligned pageblock boundary but it only updates the skip
> > flag if the first migration candidate is also aligned. Tracing during
> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
> > showed that many pageblocks are not marked skip, causing excessive scanning of
> > blocks that had been recently checked. Update pageblock skip based on
> > "valid_page" which is set if scanning started on a pageblock boundary.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> I wonder if this has an unintended side-effect that if we resume
> isolate_migratepages_block() of a partially compacted pageblock to finish
> it, test_and_set_skip() will now tell us to abort, because we already set
> the skip bit in the previous call. This would include the
> cc->finish_pageblock rescan cases.
> 
> So unless I miss something that already prevents that, I agree we should not
> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
> pageblock aligned, we should ignore the already-set skip bit, as it was most
> likely being set by us in the previous iteration and should not prevent us
> from finishing the pageblock?
> 

Hmm, I think you're right. While it should not hit the original bug,
migration candidates are missed until the next compaction scan which
could be tricky to detect. Something like this as a separate patch?
Build tested only but the intent is for an unaligned start to set the skip
bit if already unset but otherwise complete the scan. Like earlier fixes,
this might overscan some pageblocks in a given context but we are probably
hitting the limits on how compaction can run efficiently in the current
scheme without causing other side-effects :(

diff --git a/mm/compaction.c b/mm/compaction.c
index 91af6a8b7a98..761a2dd7d78a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -792,6 +792,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	bool skip_on_failure = false;
 	unsigned long next_skip_pfn = 0;
 	bool skip_updated = false;
+	bool start_aligned;
 	int ret = 0;
 
 	cc->migrate_pfn = low_pfn;
@@ -824,6 +825,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	}
 
 	/* Time to isolate some pages for migration */
+	start_aligned = pageblock_aligned(start_pfn);
 	for (; low_pfn < end_pfn; low_pfn++) {
 
 		if (skip_on_failure && low_pfn >= next_skip_pfn) {
@@ -1069,10 +1071,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 			lruvec_memcg_debug(lruvec, page_folio(page));
 
-			/* Try get exclusive access under lock */
+			/* Try get exclusive access under lock. Isolation is
+			 * only aborted if the start was pageblock aligned
+			 * as this may be a partial resumed scan that set
+			 * the bit on a recent scan but the scan must reach
+			 * the end of the pageblock.
+			 */
 			if (!skip_updated && valid_page) {
 				skip_updated = true;
-				if (test_and_set_skip(cc, valid_page))
+				if (test_and_set_skip(cc, valid_page) && start_aligned)
 					goto isolate_abort;
 			}
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-05-29 10:33     ` Mel Gorman
@ 2023-05-29 12:43       ` Vlastimil Babka
  2023-06-02 11:16         ` Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2023-05-29 12:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 5/29/23 12:33, Mel Gorman wrote:
> On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
>> On 5/15/23 13:33, Mel Gorman wrote:
>> > isolate_migratepages_block should mark a pageblock as skip if scanning
>> > started on an aligned pageblock boundary but it only updates the skip
>> > flag if the first migration candidate is also aligned. Tracing during
>> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
>> > showed that many pageblocks are not marked skip, causing excessive scanning of
>> > blocks that had been recently checked. Update pageblock skip based on
>> > "valid_page" which is set if scanning started on a pageblock boundary.
>> > 
>> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> 
>> I wonder if this has an unintended side-effect that if we resume
>> isolate_migratepages_block() of a partially compacted pageblock to finish
>> it, test_and_set_skip() will now tell us to abort, because we already set
>> the skip bit in the previous call. This would include the
>> cc->finish_pageblock rescan cases.
>> 
>> So unless I miss something that already prevents that, I agree we should not
>> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
>> pageblock aligned, we should ignore the already-set skip bit, as it was most
>> likely being set by us in the previous iteration and should not prevent us
>> from finishing the pageblock?
>> 
> 
> Hmm, I think you're right. While it should not hit the original bug,
> migration candidates are missed until the next compaction scan which
> could be tricky to detect. Something like this as a separate patch?
> Build tested only but the intent is for an unaligned start to set the skip
> bit if already unset but otherwise complete the scan. Like earlier fixes,
> this might overscan some pageblocks in a given context but we are probably
> hitting the limits on how compaction can run efficiently in the current
> scheme without causing other side-effects :(

Yeah that should work! I think it should be even folded to 3/4 but if you
want separate, fine too.

> diff --git a/mm/compaction.c b/mm/compaction.c
> index 91af6a8b7a98..761a2dd7d78a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -792,6 +792,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	bool skip_on_failure = false;
>  	unsigned long next_skip_pfn = 0;
>  	bool skip_updated = false;
> +	bool start_aligned;
>  	int ret = 0;
>  
>  	cc->migrate_pfn = low_pfn;
> @@ -824,6 +825,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	}
>  
>  	/* Time to isolate some pages for migration */
> +	start_aligned = pageblock_aligned(start_pfn);
>  	for (; low_pfn < end_pfn; low_pfn++) {
>  
>  		if (skip_on_failure && low_pfn >= next_skip_pfn) {
> @@ -1069,10 +1071,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  			lruvec_memcg_debug(lruvec, page_folio(page));
>  
> -			/* Try get exclusive access under lock */
> +			/* Try get exclusive access under lock. Isolation is
> +			 * only aborted if the start was pageblock aligned
> +			 * as this may be a partial resumed scan that set
> +			 * the bit on a recent scan but the scan must reach
> +			 * the end of the pageblock.
> +			 */
>  			if (!skip_updated && valid_page) {
>  				skip_updated = true;
> -				if (test_and_set_skip(cc, valid_page))
> +				if (test_and_set_skip(cc, valid_page) && start_aligned)
>  					goto isolate_abort;
>  			}
>  


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-05-29 12:43       ` Vlastimil Babka
@ 2023-06-02 11:16         ` Mel Gorman
  2023-06-02 12:19           ` Vlastimil Babka
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2023-06-02 11:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
> On 5/29/23 12:33, Mel Gorman wrote:
> > On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
> >> On 5/15/23 13:33, Mel Gorman wrote:
> >> > isolate_migratepages_block should mark a pageblock as skip if scanning
> >> > started on an aligned pageblock boundary but it only updates the skip
> >> > flag if the first migration candidate is also aligned. Tracing during
> >> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
> >> > showed that many pageblocks are not marked skip, causing excessive scanning of
> >> > blocks that had been recently checked. Update pageblock skip based on
> >> > "valid_page" which is set if scanning started on a pageblock boundary.
> >> > 
> >> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> >> 
> >> I wonder if this has an unintended side-effect that if we resume
> >> isolate_migratepages_block() of a partially compacted pageblock to finish
> >> it, test_and_set_skip() will now tell us to abort, because we already set
> >> the skip bit in the previous call. This would include the
> >> cc->finish_pageblock rescan cases.
> >> 
> >> So unless I miss something that already prevents that, I agree we should not
> >> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
> >> pageblock aligned, we should ignore the already-set skip bit, as it was most
> >> likely being set by us in the previous iteration and should not prevent us
> >> from finishing the pageblock?
> >> 
> > 
> > Hmm, I think you're right. While it should not hit the original bug,
> > migration candidates are missed until the next compaction scan which
> > could be tricky to detect. Something like this as a separate patch?
> > Build tested only but the intent is for an unaligned start to set the skip
> > bit if already unset but otherwise complete the scan. Like earlier fixes,
> > this might overscan some pageblocks in a given context but we are probably
> > hitting the limits on how compaction can run efficiently in the current
> > scheme without causing other side-effects :(
> 
> Yeah that should work! I think it should be even folded to 3/4 but if you
> want separate, fine too.
> 

I was not happy with the test results, so I limited the scope of the patch,
which performed much better both in terms of absolute performance and
compaction activity. Are you still ok with this version?

Thanks

--8<--
mm: compaction: Update pageblock skip when first migration candidate is not at the start -fix

Vlastimil Babka pointed out the following problem with
mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch

	I wonder if this has an unintended side-effect that if we resume
	isolate_migratepages_block() of a partially compacted pageblock
	to finish it, test_and_set_skip() will now tell us to abort,
	because we already set the skip bit in the previous call. This
	would include the cc->finish_pageblock rescan cases.

He is correct and a partial rescan as implemented in "mm, compaction:
finish pageblocks on complete migration failure" would abort
prematurely.

Test and set the skip bit when acquiring "exclusive access" to a pageblock
for migration but only abort if the calling context is rescanning to
finish a pageblock.

This is a fix for the mmotm patch
mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 91af6a8b7a98..300aa968a0cf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1069,11 +1069,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 			lruvec_memcg_debug(lruvec, page_folio(page));
 
-			/* Try get exclusive access under lock */
+			/*
+			 * Try get exclusive access under lock. If marked for
+			 * skip, the scan is aborted unless the current context
+			 * is a rescan to reach the end of the pageblock.
+			 */
 			if (!skip_updated && valid_page) {
 				skip_updated = true;
-				if (test_and_set_skip(cc, valid_page))
+				if (test_and_set_skip(cc, valid_page) &&
+				    !cc->finish_pageblock) {
 					goto isolate_abort;
+				}
 			}
 
 			/*

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-06-02 11:16         ` Mel Gorman
@ 2023-06-02 12:19           ` Vlastimil Babka
  2023-06-02 12:48             ` Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2023-06-02 12:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 6/2/23 13:16, Mel Gorman wrote:
> On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
>> On 5/29/23 12:33, Mel Gorman wrote:
>> > On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
>> >> On 5/15/23 13:33, Mel Gorman wrote:
>> >> > isolate_migratepages_block should mark a pageblock as skip if scanning
>> >> > started on an aligned pageblock boundary but it only updates the skip
>> >> > flag if the first migration candidate is also aligned. Tracing during
>> >> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
>> >> > showed that many pageblocks are not marked skip, causing excessive scanning of
>> >> > blocks that had been recently checked. Update pageblock skip based on
>> >> > "valid_page" which is set if scanning started on a pageblock boundary.
>> >> > 
>> >> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> >> 
>> >> I wonder if this has an unintended side-effect that if we resume
>> >> isolate_migratepages_block() of a partially compacted pageblock to finish
>> >> it, test_and_set_skip() will now tell us to abort, because we already set
>> >> the skip bit in the previous call. This would include the
>> >> cc->finish_pageblock rescan cases.
>> >> 
>> >> So unless I miss something that already prevents that, I agree we should not
>> >> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
>> >> pageblock aligned, we should ignore the already-set skip bit, as it was most
>> >> likely being set by us in the previous iteration and should not prevent us
>> >> from finishing the pageblock?
>> >> 
>> > 
>> > Hmm, I think you're right. While it should not hit the original bug,
>> > migration candidates are missed until the next compaction scan which
>> > could be tricky to detect. Something like this as a separate patch?
>> > Build tested only but the intent is for an unaligned start to set the skip
>> > bit if already unset but otherwise complete the scan. Like earlier fixes,
>> > this might overscan some pageblocks in a given context but we are probably
>> > hitting the limits on how compaction can run efficiently in the current
>> > scheme without causing other side-effects :(
>> 
>> Yeah that should work! I think it should be even folded to 3/4 but if you
>> want separate, fine too.
>> 
> 
> I was not happy with the test results, so I limited the scope of the patch,
> which performed much better both in terms of absolute performance and
> compaction activity.

That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
pages, migrate them without failing, but it's not enough to succeed (i.e.
there are more pages we need to migrate to free up a whole pageblock), it's
better to give up on the rest of the pageblock rather than continue? As
that's AFAIU the scenario where cc->finish_pageblock is false when we
revisit an unfinished pageblock.

> Are you still ok with this version?

It's better than previously in that cc->finish_pageblock == true case is not
sabotaged anymore. But the result as described above seems to be a weird
non-intuitive and non-obvious heuristic. What did the test differences look like?

> Thanks
> 
> --8<--
> mm: compaction: Update pageblock skip when first migration candidate is not at the start -fix
> 
> Vlastimil Babka pointed out the following problem with
> mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch
> 
> 	I wonder if this has an unintended side-effect that if we resume
> 	isolate_migratepages_block() of a partially compacted pageblock
> 	to finish it, test_and_set_skip() will now tell us to abort,
> 	because we already set the skip bit in the previous call. This
> 	would include the cc->finish_pageblock rescan cases.
> 
> He is correct and a partial rescan as implemented in "mm, compaction:
> finish pageblocks on complete migration failure" would abort
> prematurely.
> 
> Test and set the skip bit when acquiring "exclusive access" to a pageblock
> for migration but only abort if the calling context is rescanning to
> finish a pageblock.

Should it say NOT rescanning to finish a pageblock?

> This is a fix for the mmotm patch
> mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  mm/compaction.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 91af6a8b7a98..300aa968a0cf 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1069,11 +1069,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  			lruvec_memcg_debug(lruvec, page_folio(page));
>  
> -			/* Try get exclusive access under lock */
> +			/*
> +			 * Try get exclusive access under lock. If marked for
> +			 * skip, the scan is aborted unless the current context
> +			 * is a rescan to reach the end of the pageblock.
> +			 */
>  			if (!skip_updated && valid_page) {
>  				skip_updated = true;
> -				if (test_and_set_skip(cc, valid_page))
> +				if (test_and_set_skip(cc, valid_page) &&
> +				    !cc->finish_pageblock) {
>  					goto isolate_abort;
> +				}
>  			}
>  
>  			/*


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-06-02 12:19           ` Vlastimil Babka
@ 2023-06-02 12:48             ` Mel Gorman
  2023-06-06 13:11               ` Vlastimil Babka
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2023-06-02 12:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote:
> On 6/2/23 13:16, Mel Gorman wrote:
> > On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
> >> On 5/29/23 12:33, Mel Gorman wrote:
> >> > On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
> >> >> On 5/15/23 13:33, Mel Gorman wrote:
> >> >> > isolate_migratepages_block should mark a pageblock as skip if scanning
> >> >> > started on an aligned pageblock boundary but it only updates the skip
> >> >> > flag if the first migration candidate is also aligned. Tracing during
> >> >> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
> >> >> > showed that many pageblocks are not marked skip, causing excessive scanning of
> >> >> > blocks that had been recently checked. Update pageblock skip based on
> >> >> > "valid_page" which is set if scanning started on a pageblock boundary.
> >> >> > 
> >> >> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> >> >> 
> >> >> I wonder if this has an unintended side-effect that if we resume
> >> >> isolate_migratepages_block() of a partially compacted pageblock to finish
> >> >> it, test_and_set_skip() will now tell us to abort, because we already set
> >> >> the skip bit in the previous call. This would include the
> >> >> cc->finish_pageblock rescan cases.
> >> >> 
> >> >> So unless I miss something that already prevents that, I agree we should not
> >> >> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
> >> >> pageblock aligned, we should ignore the already-set skip bit, as it was most
> >> >> likely being set by us in the previous iteration and should not prevent us
> >> >> from finishing the pageblock?
> >> >> 
> >> > 
> >> > Hmm, I think you're right. While it should not hit the original bug,
> >> > migration candidates are missed until the next compaction scan which
> >> > could be tricky to detect. Something like this as a separate patch?
> >> > Build tested only but the intent is for an unaligned start to set the skip
> >> > bit if already unset but otherwise complete the scan. Like earlier fixes,
> >> > this might overscan some pageblocks in a given context but we are probably
> >> > hitting the limits on how compaction can run efficiently in the current
> >> > scheme without causing other side-effects :(
> >> 
> >> Yeah that should work! I think it should be even folded to 3/4 but if you
> >> want separate, fine too.
> >> 
> > 
> > I was not happy with the test results, so I limited the scope of the patch,
> > which performed much better both in terms of absolute performance and
> > compaction activity.
> 
> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
> pages, migrate them without failing, but it's not enough to succeed (i.e.
> there are more pages we need to migrate to free up a whole pageblock), it's
> better to give up on the rest of the pageblock rather than continue?

I don't have precise enough data to answer that with certainty but probably
yes, at least in terms of scan activity. The first version had spikes of
pages scanned for migration that are not always reproducible and not on
all machines.

> As
> that's AFAIU the scenario where cc->finish_pageblock is false when we
> revisit an unfinished pageblock.
> 
> > Are you still ok with this version?
> 
> It's better than previously in that cc->finish_pageblock == true case is not
> sabotaged anymore. But the result as described above seems to be a weird
>> non-intuitive and non-obvious heuristic. What did the test differences look like?
> 

5 machines were tested in all (different Intel and AMD generations),
2 showed unexpected spikes in scan activity. 1 showed high migration
scan counts for workloads workload-thpchallenge-kernel-build-xfs
and workload-usemem-stress-numa-compact. They are both basically
compaction stressors with the first one using a kernel build
in the background to generate noise. For the second machine, only
workload-usemem-stress-numa-compact was affected. In all other test cases
and machines, the patches were equivalent in terms of behaviour but the
extreme counter-examples led me to conclude the fix should be as constrained
as possible unless there is a good reason to do otherwise.

> > Vlastimil Babka pointed out the following problem with
> > mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch
> > 
> > 	I wonder if this has an unintended side-effect that if we resume
> > 	isolate_migratepages_block() of a partially compacted pageblock
> > 	to finish it, test_and_set_skip() will now tell us to abort,
> > 	because we already set the skip bit in the previous call. This
> > 	would include the cc->finish_pageblock rescan cases.
> > 
> > He is correct and a partial rescan as implemented in "mm, compaction:
> > finish pageblocks on complete migration failure" would abort
> > prematurely.
> > 
> > Test and set the skip bit when acquiring "exclusive access" to a pageblock
> > for migration but only abort if the calling context is rescanning to
> > finish a pageblock.
> 
> Should it say NOT rescanning to finish a pageblock?
> 

Yep, it should. The sentence was a last minute update before pushing
send :(

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-06-02 12:48             ` Mel Gorman
@ 2023-06-06 13:11               ` Vlastimil Babka
  2023-06-07  3:38                 ` Baolin Wang
  2023-06-07 12:24                 ` Mel Gorman
  0 siblings, 2 replies; 20+ messages in thread
From: Vlastimil Babka @ 2023-06-06 13:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On 6/2/23 14:48, Mel Gorman wrote:
> On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote:
>> On 6/2/23 13:16, Mel Gorman wrote:
>> > On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
>> >> On 5/29/23 12:33, Mel Gorman wrote:
>> >> > On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
>> >> >> On 5/15/23 13:33, Mel Gorman wrote:
>> >> >> > isolate_migratepages_block should mark a pageblock as skip if scanning
>> >> >> > started on an aligned pageblock boundary but it only updates the skip
>> >> >> > flag if the first migration candidate is also aligned. Tracing during
>> >> >> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
>> >> >> > showed that many pageblocks are not marked skip, causing excessive scanning of
>> >> >> > blocks that had been recently checked. Update pageblock skip based on
>> >> >> > "valid_page" which is set if scanning started on a pageblock boundary.
>> >> >> > 
>> >> >> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> >> >> 
>> >> >> I wonder if this has an unintended side-effect that if we resume
>> >> >> isolate_migratepages_block() of a partially compacted pageblock to finish
>> >> >> it, test_and_set_skip() will now tell us to abort, because we already set
>> >> >> the skip bit in the previous call. This would include the
>> >> >> cc->finish_pageblock rescan cases.
>> >> >> 
>> >> >> So unless I miss something that already prevents that, I agree we should not
>> >> >> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
>> >> >> pageblock aligned, we should ignore the already-set skip bit, as it was most
>> >> >> likely being set by us in the previous iteration and should not prevent us
>> >> >> from finishing the pageblock?
>> >> >> 
>> >> > 
>> >> > Hmm, I think you're right. While it should not hit the original bug,
>> >> > migration candidates are missed until the next compaction scan which
>> >> > could be tricky to detect. Something like this as a separate patch?
>> >> > Build tested only but the intent is for an unaligned start to set the skip
>> >> > bit if already unset but otherwise complete the scan. Like earlier fixes,
>> >> > this might overscan some pageblocks in a given context but we are probably
>> >> > hitting the limits on how compaction can run efficiently in the current
>> >> > scheme without causing other side-effects :(
>> >> 
>> >> Yeah that should work! I think it should be even folded to 3/4 but if you
>> >> want separate, fine too.
>> >> 
>> > 
>> > I was not happy with the test results, so I limited the scope of the patch,
>> > which performed much better both in terms of absolute performance and
>> > compaction activity.
>> 
>> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
>> pages, migrate them without failing, but it's not enough to succeed (i.e.
>> there are more pages we need to migrate to free up a whole pageblock), it's
>> better to give up on the rest of the pageblock rather than continue?
> 
> I don't have precise enough data to answer that with certainty but probably
> yes, at least in terms of scan activity. The first version had spikes of
> pages scanned for migration that are not always reproducible and not on
> all machines.

Well, that kinda sucks. So for the patch (with adding the missing NOT below).

Acked-by: Vlastimil Babka <vbabka@suse.cz>

But it raises the question of whether we should terminate compaction under the
right conditions immediately after a successful migration, rather than
invoke another iteration of isolate_migratepages_block() where we could skip
over some pages uselessly only to abort at the first valid page due to the skip bit.
It would save some cycles and be much more obvious than now, where anyone
trying to understand how it works in detail might conclude it's an oversight?

>> As
>> that's AFAIU the scenario where cc->finish_pageblock is false when we
>> revisit an unfinished pageblock.
>> 
>> > Are you still ok with this version?
>> 
>> It's better than previously in that cc->finish_pageblock == true case is not
>> sabotaged anymore. But the result as described above seems to be a weird
>> non-intuitive and non-obvious heuristic. What did the test differences look like?
>> 
> 
> 5 machines were tested in all (different Intel and AMD generations),
> 2 showed unexpected spikes in scan activity. 1 showed high migration
> scan counts for workloads workload-thpchallenge-kernel-build-xfs
> and workload-usemem-stress-numa-compact. They are both basically
> compaction stressors with the first one using a kernel build
> in the background to generate noise. For the second machine, only
> workload-usemem-stress-numa-compact was affected. In all other test cases
> and machines, the patches were equivalent in terms of behaviour but the
> extreme counter-examples led me to conclude the fix should be as constrained
> as possible unless there is a good reason to do otherwise.
> 
>> > Vlastimil Babka pointed out the following problem with
>> > mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch
>> > 
>> > 	I wonder if this has an unintended side-effect that if we resume
>> > 	isolate_migratepages_block() of a partially compacted pageblock
>> > 	to finish it, test_and_set_skip() will now tell us to abort,
>> > 	because we already set the skip bit in the previous call. This
>> > 	would include the cc->finish_pageblock rescan cases.
>> > 
>> > He is correct and a partial rescan as implemented in "mm, compaction:
>> > finish pageblocks on complete migration failure" would abort
>> > prematurely.
>> > 
>> > Test and set the skip bit when acquiring "exclusive access" to a pageblock
>> > for migration but only abort if the calling context is rescanning to
>> > finish a pageblock.
>> 
>> Should it say NOT rescanning to finish a pageblock?
>> 
> 
> Yep, it should. The sentence was a last minute update before pushing
> send :(
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-06-06 13:11               ` Vlastimil Babka
@ 2023-06-07  3:38                 ` Baolin Wang
  2023-06-07 12:24                 ` Mel Gorman
  1 sibling, 0 replies; 20+ messages in thread
From: Baolin Wang @ 2023-06-07  3:38 UTC (permalink / raw)
  To: Vlastimil Babka, Mel Gorman
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML



On 6/6/2023 9:11 PM, Vlastimil Babka wrote:
> On 6/2/23 14:48, Mel Gorman wrote:
>> On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote:
>>> On 6/2/23 13:16, Mel Gorman wrote:
>>>> On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
>>>>> On 5/29/23 12:33, Mel Gorman wrote:
>>>>>> On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
>>>>>>> On 5/15/23 13:33, Mel Gorman wrote:
>>>>>>>> isolate_migratepages_block should mark a pageblock as skip if scanning
>>>>>>>> started on an aligned pageblock boundary but it only updates the skip
>>>>>>>> flag if the first migration candidate is also aligned. Tracing during
>>>>>>>> a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
>>>>>>>> showed that many pageblocks are not marked skip, causing excessive scanning of
>>>>>>>> blocks that had been recently checked. Update pageblock skip based on
>>>>>>>> "valid_page" which is set if scanning started on a pageblock boundary.
>>>>>>>>
>>>>>>>> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>>>>>>>
>>>>>>> I wonder if this has an unintended side-effect that if we resume
>>>>>>> isolate_migratepages_block() of a partially compacted pageblock to finish
>>>>>>> it, test_and_set_skip() will now tell us to abort, because we already set
>>>>>>> the skip bit in the previous call. This would include the
>>>>>>> cc->finish_pageblock rescan cases.
>>>>>>>
>>>>>>> So unless I miss something that already prevents that, I agree we should not
>>>>>>> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
>>>>>>> pageblock aligned, we should ignore the already-set skip bit, as it was most
>>>>>>> likely being set by us in the previous iteration and should not prevent us
>>>>>>> from finishing the pageblock?
>>>>>>>
>>>>>>
>>>>>> Hmm, I think you're right. While it should not hit the original bug,
>>>>>> migration candidates are missed until the next compaction scan which
>>>>>> could be tricky to detect. Something like this as a separate patch?
>>>>>> Build tested only but the intent is for an unaligned start to set the skip
>>>>>> bit if already unset but otherwise complete the scan. Like earlier fixes,
>>>>>> this might overscan some pageblocks in a given context but we are probably
>>>>>> hitting the limits on how compaction can run efficiently in the current
>>>>>> scheme without causing other side-effects :(
>>>>>
>>>>> Yeah that should work! I think it should be even folded to 3/4 but if you
>>>>> want separate, fine too.
>>>>>
>>>>
>>>> I was not happy with the test results so limited the scope of the patch
>>>> which performed much better both in terms of absolute performance and
>>>> compaction activity.
>>>
>>> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
>>> pages, migrate them without failing, but it's not enough to succeed (i.e.
>>> there are more pages we need to migrate to free up a whole pageblock), it's
>>> better to give up on the rest of the pageblock rather than continue?
>>
>> I don't have precise enough data to answer that with certainty but probably
>> yes, at least in terms of scan activity. The first version had spikes of
>> pages scanned for migration that are not always reproducible and not on
>> all machines.
> 
> Well, that kinda sucks. So for the patch (with adding the missing NOT below).
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> But it raises a question whether we should terminate compaction under the
> right conditions after a successful migration immediately, rather than
> invoke another iteration of isolate_migratepages_block() where we could skip
> over some pages uselessly only to abort at the first valid page due to the skip bit.
> It would save some cycles and be much more obvious than now, where anyone
> trying to understand how it works in detail might conclude it's an oversight?

IIUC, for a direct compactor, we can terminate compaction if there is a 
suitable free page; that means scanning the other pages in this pageblock 
is useless, like you said, and we can skip them to save CPU cycles. So 
we could remove the COMPACT_CONTINUE validation below in __compact_finished():

	/*
	 * Always finish scanning a pageblock to reduce the possibility of
	 * fallbacks in the future. This is particularly important when
	 * migration source is unmovable/reclaimable but it's not worth
	 * special casing.
	 */
	if (!pageblock_aligned(cc->migrate_pfn))
		return COMPACT_CONTINUE;

And in this case, we should also call update_cached_migrate() to record 
the partial cc->migrate_pfn, which is the start position for the next 
scan. Please correct me if I missed something. Thanks.

>>> As
>>> that's AFAIU the scenario where cc->finish_pageblock is false when we
>>> revisit an unfinished pageblock.
>>>
>>>> Are you still ok with this version?
>>>
>>> It's better than previously in that the cc->finish_pageblock == true case is not
>>> sabotaged anymore. But the result as described above seems to be a weird,
>>> non-intuitive and non-obvious heuristic. What did the test differences look like?
>>>
>>
>> 5 machines were tested in all (different Intel and AMD generations),
>> 2 showed unexpected spikes in scan activity. 1 showed high migration
>> scan counts for workloads workload-thpchallenge-kernel-build-xfs
>> and workload-usemem-stress-numa-compact. They are both basically
>> compaction stressors with the first one using a kernel build
>> in the background to generate noise. For the second machine, only
>> workload-usemem-stress-numa-compact was affected. In all other test cases
>> and machines, the patches were equivalent in terms of behaviour but the
>> extreme counter-examples led me to conclude the fix should be as constrained
>> as possible unless there is a good reason to do otherwise.
>>
>>>> Vlastimil Babka pointed out the following problem with
>>>> mm-compaction-update-pageblock-skip-when-first-migration-candidate-is-not-at-the-start.patch
>>>>
>>>> 	I wonder if this has an unintended side-effect that if we resume
>>>> 	isolate_migratepages_block() of a partially compacted pageblock
>>>> 	to finish it, test_and_set_skip() will now tell us to abort,
>>>> 	because we already set the skip bit in the previous call. This
>>>> 	would include the cc->finish_pageblock rescan cases.
>>>>
>>>> He is correct and a partial rescan as implemented in "mm, compaction:
>>>> finish pageblocks on complete migration failure" would abort
>>>> prematurely.
>>>>
>>>> Test and set the skip bit when acquiring "exclusive access" to a pageblock
>>>> for migration but only abort if the calling context is rescanning to
>>>> finish a pageblock.
>>>
>>> Should it say NOT rescanning to finish a pageblock?
>>>
>>
>> Yep, it should. The sentence was a last minute update before pushing
>> send :(
>>
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start
  2023-06-06 13:11               ` Vlastimil Babka
  2023-06-07  3:38                 ` Baolin Wang
@ 2023-06-07 12:24                 ` Mel Gorman
  1 sibling, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2023-06-07 12:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko,
	Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML

On Tue, Jun 06, 2023 at 03:11:27PM +0200, Vlastimil Babka wrote:
> On 6/2/23 14:48, Mel Gorman wrote:
> > On Fri, Jun 02, 2023 at 02:19:00PM +0200, Vlastimil Babka wrote:
> >> On 6/2/23 13:16, Mel Gorman wrote:
> >> > On Mon, May 29, 2023 at 02:43:48PM +0200, Vlastimil Babka wrote:
> >> >> On 5/29/23 12:33, Mel Gorman wrote:
> >> >> > On Thu, May 25, 2023 at 03:37:43PM +0200, Vlastimil Babka wrote:
> >> >> >> On 5/15/23 13:33, Mel Gorman wrote:
> >> >> >> > isolate_migratepages_block should mark a pageblock as skip if scanning
> >> >> >> > started on an aligned pageblock boundary but it only updates the skip
> >> >> >> > flag if the first migration candidate is also aligned. Tracing during
> >> >> >> > a compaction stress load (mmtests: workload-usemem-stress-numa-compact)
> >> >> >> > showed that many pageblocks are not marked skip, causing excessive scanning of
> >> >> >> > blocks that had been recently checked. Update pageblock skip based on
> >> >> >> > "valid_page" which is set if scanning started on a pageblock boundary.
> >> >> >> > 
> >> >> >> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> >> >> >> 
> >> >> >> I wonder if this has an unintended side-effect that if we resume
> >> >> >> isolate_migratepages_block() of a partially compacted pageblock to finish
> >> >> >> it, test_and_set_skip() will now tell us to abort, because we already set
> >> >> >> the skip bit in the previous call. This would include the
> >> >> >> cc->finish_pageblock rescan cases.
> >> >> >> 
> >> >> >> So unless I miss something that already prevents that, I agree we should not
> >> >> >> tie setting the skip bit to pageblock_aligned(pfn), but maybe if we are not
> >> >> >> pageblock aligned, we should ignore the already-set skip bit, as it was most
> >> >> >> likely being set by us in the previous iteration and should not prevent us
> >> >> >> from finishing the pageblock?
> >> >> >> 
> >> >> > 
> >> >> > Hmm, I think you're right. While it should not hit the original bug,
> >> >> > migration candidates are missed until the next compaction scan which
> >> >> > could be tricky to detect. Something like this as a separate patch?
> >> >> > Build tested only but the intent is for an unaligned start to set the skip
> >> >> > bit if already unset but otherwise complete the scan. Like earlier fixes,
> >> >> > this might overscan some pageblocks in a given context but we are probably
> >> >> > hitting the limits on how compaction can run efficiently in the current
> >> >> > scheme without causing other side-effects :(
> >> >> 
> >> >> Yeah that should work! I think it should be even folded to 3/4 but if you
> >> >> want separate, fine too.
> >> >> 
> >> > 
> >> > I was not happy with the test results so limited the scope of the patch
> >> > which performed much better both in terms of absolute performance and
> >> > compaction activity.
> >> 
> >> That's surprising. Does that mean that if we isolate COMPACT_CLUSTER_MAX
> >> pages, migrate them without failing, but it's not enough to succeed (i.e.
> >> there are more pages we need to migrate to free up a whole pageblock), it's
> >> better to give up on the rest of the pageblock rather than continue?
> > 
> > I don't have precise enough data to answer that with certainty but probably
> > yes, at least in terms of scan activity. The first version had spikes of
> > pages scanned for migration that are not always reproducible and not on
> > all machines.
> 
> Well, that kinda sucks. So for the patch (with adding the missing NOT below).
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> But it raises a question whether we should terminate compaction under the
> right conditions after a successful migration immediately, rather than
> invoke another iteration of isolate_migratepages_block() where we could skip
> over some pages uselessly only to abort at the first valid page due to the skip bit.
> It would save some cycles and be much more obvious than now, where anyone
> trying to understand how it works in detail might conclude it's an oversight?
> 

It sounds like a solid idea and would be a good standalone patch with
the usual supporting data. At a quick glance, the check for a page
stored on compact_control happens surprisingly late, which makes me think
we probably over-compact in direct compaction in particular. It would
need supporting data because it probably means that compaction cost gets
spread over multiple tasks requiring high-order pages instead of one
unlucky task doing compaction work that unrelated tasks indirectly
benefit from. It's probably more sensible behaviour that tasks requiring
high-order pages pay the cost if kcompactd cannot keep up but supporting
data would tell us one way or the other.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-06-07 12:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-15 11:33 [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Mel Gorman
2023-05-15 11:33 ` [PATCH 1/4] mm: compaction: Ensure rescanning only happens on partially scanned pageblocks Mel Gorman
2023-05-25  9:57   ` Vlastimil Babka
2023-05-15 11:33 ` [PATCH 2/4] mm: compaction: Only force pageblock scan completion when skip hints are obeyed Mel Gorman
2023-05-25 10:01   ` Vlastimil Babka
2023-05-15 11:33 ` [PATCH 3/4] mm: compaction: Update pageblock skip when first migration candidate is not at the start Mel Gorman
2023-05-25 13:37   ` Vlastimil Babka
2023-05-29 10:33     ` Mel Gorman
2023-05-29 12:43       ` Vlastimil Babka
2023-06-02 11:16         ` Mel Gorman
2023-06-02 12:19           ` Vlastimil Babka
2023-06-02 12:48             ` Mel Gorman
2023-06-06 13:11               ` Vlastimil Babka
2023-06-07  3:38                 ` Baolin Wang
2023-06-07 12:24                 ` Mel Gorman
2023-05-15 11:33 ` [PATCH 4/4] Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock"" Mel Gorman
2023-05-25 13:42   ` Vlastimil Babka
2023-05-19  6:43 ` [PATCH 0/4] Follow-up "Fix excessive CPU usage during compaction" Raghavendra K T
2023-05-21 19:20   ` Mel Gorman
2023-05-23 13:47 ` Baolin Wang
