* [PATCH for 6.1 regression] Revert "mm/compaction: fix set skip in fast_find_migrateblock"
@ 2023-01-13 17:33 Vlastimil Babka
  2023-01-14  6:49 ` Vlastimil Babka
  0 siblings, 1 reply; 3+ messages in thread
From: Vlastimil Babka @ 2023-01-13 17:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: patches, linux-mm, linux-kernel, regressions, Jiri Slaby,
	Maxim Levitsky, Mel Gorman, Michal Hocko, Pedro Falcato,
	Paolo Bonzini, Chuyi Zhou, Vlastimil Babka, stable

This reverts commit 7efc3b7261030da79001c00d92bc3392fd6c664c.

We have received openSUSE reports (Link 1) of khugepaged stalling a CPU
for long periods of time on the 6.1 kernel. Investigation of tracepoint
data shows that compaction is stuck repeating fast_find_migrateblock()
based migrate page isolation, and then fails to migrate all isolated
pages. Commit 7efc3b726103 ("mm/compaction: fix set skip in
fast_find_migrateblock") was suspected because it was merged in 6.1 and,
in theory, can indeed remove a termination condition for
fast_find_migrateblock() under certain conditions: it removes a place
that always marks a scanned pageblock as skipped, which prevents it from
being re-scanned. There are other such places, but those can be skipped
under certain conditions, which seems to match the tracepoint data.

Testing of the revert also appears to have resolved the issue, so revert
the commit until a more robust solution for the original problem is
developed.

It's also likely this will fix the qemu stalls with the 6.1 kernel
reported in Link 2, but that is not yet confirmed.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
Cc: <stable@vger.kernel.org>
---
 mm/compaction.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index ca1603524bbe..8238e83385a7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1839,6 +1839,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 					pfn = cc->zone->zone_start_pfn;
 				cc->fast_search_fail = 0;
 				found_block = true;
+				set_pageblock_skip(freepage);
 				break;
 			}
 		}
-- 
2.39.0
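
The termination argument in the commit message can be illustrated with a
toy model. This is only a sketch under loud assumptions, not kernel code:
NR_BLOCKS, MAX_ROUNDS, toy_fast_find() and toy_compact() are hypothetical
names invented for the illustration, migration is assumed to fail every
time, and the real fast_find_migrateblock() is considerably more involved.
The only idea carried over from the patch is that marking the chosen
pageblock as skipped keeps it from being picked again.

```c
#include <stdbool.h>

#define NR_BLOCKS  8     /* hypothetical number of pageblocks in the zone */
#define MAX_ROUNDS 1000  /* safety cap so the model itself always ends */

/* Toy stand-in for the per-pageblock skip bit. */
static bool skip[NR_BLOCKS];

/* Return the first block not marked skip, or -1 if none is left. */
static int toy_fast_find(bool mark_skip)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (skip[i])
			continue;
		if (mark_skip)
			skip[i] = true;	/* models set_pageblock_skip() */
		return i;
	}
	return -1;
}

/*
 * Run a "compaction" in which migration never succeeds (an assumption
 * of this model) and return how many rounds were attempted.
 */
static int toy_compact(bool mark_skip)
{
	int rounds = 0;

	for (int i = 0; i < NR_BLOCKS; i++)
		skip[i] = false;

	while (rounds < MAX_ROUNDS) {
		if (toy_fast_find(mark_skip) < 0)
			break;	/* no candidates left: the scan terminates */
		rounds++;	/* isolate, fail to migrate, try again */
	}
	return rounds;
}
```

In this model, toy_compact(true) stops after NR_BLOCKS rounds because
every block is eventually marked, while toy_compact(false) keeps picking
the same block and only stops at the artificial MAX_ROUNDS cap.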



* Re: [PATCH for 6.1 regression] Revert "mm/compaction: fix set skip in fast_find_migrateblock"
  2023-01-13 17:33 [PATCH for 6.1 regression] Revert "mm/compaction: fix set skip in fast_find_migrateblock" Vlastimil Babka
@ 2023-01-14  6:49 ` Vlastimil Babka
  2023-01-14  8:08   ` Pedro Falcato
  0 siblings, 1 reply; 3+ messages in thread
From: Vlastimil Babka @ 2023-01-14  6:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: patches, linux-mm, linux-kernel, regressions, Jiri Slaby,
	Maxim Levitsky, Mel Gorman, Michal Hocko, Pedro Falcato,
	Paolo Bonzini, Chuyi Zhou, stable

On 1/13/23 18:33, Vlastimil Babka wrote:
> This reverts commit 7efc3b7261030da79001c00d92bc3392fd6c664c.
> 
> We have received openSUSE reports (Link 1) of khugepaged stalling a CPU
> for long periods of time on the 6.1 kernel. Investigation of tracepoint
> data shows that compaction is stuck repeating fast_find_migrateblock()
> based migrate page isolation, and then fails to migrate all isolated
> pages. Commit 7efc3b726103 ("mm/compaction: fix set skip in
> fast_find_migrateblock") was suspected because it was merged in 6.1 and,
> in theory, can indeed remove a termination condition for
> fast_find_migrateblock() under certain conditions: it removes a place
> that always marks a scanned pageblock as skipped, which prevents it from
> being re-scanned. There are other such places, but those can be skipped
> under certain conditions, which seems to match the tracepoint data.
> 
> Testing of the revert also appears to have resolved the issue, so revert
> the commit until a more robust solution for the original problem is
> developed.
> 
> It's also likely this will fix the qemu stalls with the 6.1 kernel
> reported in Link 2, but that is not yet confirmed.
> 
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
> Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
> Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> Cc: <stable@vger.kernel.org>

Oops, forgot:

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ca1603524bbe..8238e83385a7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1839,6 +1839,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
>  					pfn = cc->zone->zone_start_pfn;
>  				cc->fast_search_fail = 0;
>  				found_block = true;
> +				set_pageblock_skip(freepage);
>  				break;
>  			}
>  		}



* Re: [PATCH for 6.1 regression] Revert "mm/compaction: fix set skip in fast_find_migrateblock"
  2023-01-14  6:49 ` Vlastimil Babka
@ 2023-01-14  8:08   ` Pedro Falcato
  0 siblings, 0 replies; 3+ messages in thread
From: Pedro Falcato @ 2023-01-14  8:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, patches, linux-mm, linux-kernel, regressions,
	Jiri Slaby, Maxim Levitsky, Mel Gorman, Michal Hocko,
	Paolo Bonzini, Chuyi Zhou, stable

On Sat, Jan 14, 2023 at 6:51 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/13/23 18:33, Vlastimil Babka wrote:
> > This reverts commit 7efc3b7261030da79001c00d92bc3392fd6c664c.
> >
> > We have received openSUSE reports (Link 1) of khugepaged stalling a CPU
> > for long periods of time on the 6.1 kernel. Investigation of tracepoint
> > data shows that compaction is stuck repeating fast_find_migrateblock()
> > based migrate page isolation, and then fails to migrate all isolated
> > pages. Commit 7efc3b726103 ("mm/compaction: fix set skip in
> > fast_find_migrateblock") was suspected because it was merged in 6.1 and,
> > in theory, can indeed remove a termination condition for
> > fast_find_migrateblock() under certain conditions: it removes a place
> > that always marks a scanned pageblock as skipped, which prevents it from
> > being re-scanned. There are other such places, but those can be skipped
> > under certain conditions, which seems to match the tracepoint data.
> >
> > Testing of the revert also appears to have resolved the issue, so revert
> > the commit until a more robust solution for the original problem is
> > developed.
> >
> > It's also likely this will fix the qemu stalls with the 6.1 kernel
> > reported in Link 2, but that is not yet confirmed.
> >
> > Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
> > Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
> > Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
> > Cc: <stable@vger.kernel.org>
>
> Oops, forgot:
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> > ---
> >  mm/compaction.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index ca1603524bbe..8238e83385a7 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1839,6 +1839,7 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
> >                                       pfn = cc->zone->zone_start_pfn;
> >                               cc->fast_search_fail = 0;
> >                               found_block = true;
> > +                             set_pageblock_skip(freepage);
> >                               break;
> >                       }
> >               }
>

Vlastimil,

Thank you so much for looking into this. I've been daily driving it
for the past half day and it seems to have fixed my QEMU issues.
Of course, I don't exactly have a test suite for this, but I've tried
everything and I can't get any of the original problems to show up.

That being said,
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>

I'll report back if QEMU freezes the system again.

-- 
Pedro

