linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: howaboutsynergy@protonmail.com,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: compaction: Avoid 100% CPU usage during compaction when a task is killed
Date: Thu, 25 Jul 2019 15:02:55 +0200
Message-ID: <68fef6b3-bae8-2479-0e6e-ce13607369af@suse.cz>
In-Reply-To: <20190718085708.GE24383@techsingularity.net>

On 7/18/19 10:57 AM, Mel Gorman wrote:
> "howaboutsynergy" reported via kernel bugzilla number 204165 that
> compact_zone_order was consuming 100% CPU during a stress test for
> prolonged periods of time. Specifically the following command, which
> should exit in 10 seconds, was taking an excessive time to finish while
> the CPU was pegged at 100%.
> 
>   stress -m 220 --vm-bytes 1000000000 --timeout 10
> 
> Tracing indicated a pattern as follows
> 
>           stress-3923  [007]   519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
>           stress-3923  [007]   519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
> 
> Note that compaction is entered in rapid succession while scanning and
> isolating nothing. The problem is that when a task that is compacting
> receives a fatal signal, it retries indefinitely instead of exiting, and
> makes no progress because the fatal signal is pending.
> 
> It's not easy to trigger this condition although enabling zswap helps on
> the basis that the timing is altered. A very small window has to be hit
> for the problem to occur (signal delivered while compacting and isolating
> a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
> 
> This was reproduced locally (16G single socket system, 8G swap, 30% zswap
> configured, vm-bytes 22000000000 using Colin King's stress-ng implementation
> from github running in a loop until the problem hits). Tracing recorded the
> problem occurring almost 200K times in a short window. With this patch, the
> problem hit 4 times but the task exited normally instead of consuming CPU.
> 
> This problem has existed for some time but it was made worse by
> cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
> contention"). Before that commit, if the same condition was hit then
> locks would be quickly contended and compaction would exit that way.
> 
> I haven't included a Reported-and-tested-by as the reporter's real name
> is unknown but this was caught and repaired due to their testing and
> tracing.  If they want a tag added then hopefully they'll say so before
> this gets merged.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
> Fixes: cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> CC: stable@vger.kernel.org # v5.1+

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9e1b9acb116b..952dc2fb24e5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -842,13 +842,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  		/*
>  		 * Periodically drop the lock (if held) regardless of its
> -		 * contention, to give chance to IRQs. Abort async compaction
> -		 * if contended.
> +		 * contention, to give chance to IRQs. Abort completely if
> +		 * a fatal signal is pending.
>  		 */
>  		if (!(low_pfn % SWAP_CLUSTER_MAX)
>  		    && compact_unlock_should_abort(&pgdat->lru_lock,
> -					    flags, &locked, cc))
> -			break;
> +					    flags, &locked, cc)) {
> +			low_pfn = 0;
> +			goto fatal_pending;
> +		}
>  
>  		if (!pfn_valid_within(low_pfn))
>  			goto isolate_fail;
> @@ -1060,6 +1062,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
>  						nr_scanned, nr_isolated);
>  
> +fatal_pending:
>  	cc->total_migrate_scanned += nr_scanned;
>  	if (nr_isolated)
>  		count_compact_events(COMPACTISOLATED, nr_isolated);
> 
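
For anyone following the control flow, here is a minimal user-space sketch of
the before/after behaviour. It is not kernel code and only models the idea:
scan_block(), compact_calls() and fatal_pending_flag are hypothetical stand-ins
for isolate_migratepages_block(), its caller and fatal_signal_pending(current).
It also assumes the caller treats a returned PFN of 0 as "abort", which is what
the "low_pfn = 0" in the hunk above appears to rely on.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel constant; repeated here only for the sketch. */
#define SWAP_CLUSTER_MAX 32UL

/* Hypothetical stand-in for fatal_signal_pending(current). */
static bool fatal_pending_flag;

/*
 * Simplified model of the scan loop: walk [low_pfn, end_pfn) and return the
 * PFN where scanning stopped. On an aligned PFN with a fatal signal pending,
 * the unpatched variant just breaks (returning an unchanged PFN), while the
 * patched variant returns 0 to tell the caller to give up.
 */
static unsigned long scan_block(unsigned long low_pfn, unsigned long end_pfn,
                                bool patched)
{
    for (; low_pfn < end_pfn; low_pfn++) {
        if (!(low_pfn % SWAP_CLUSTER_MAX) && fatal_pending_flag) {
            if (patched)
                return 0;   /* patched: explicit abort */
            break;          /* unpatched: looks like "no progress" */
        }
        /* page isolation would happen here */
    }
    return low_pfn;
}

/*
 * Hypothetical caller: retry until the range is finished, the scanner
 * reports an abort (0), or max_calls is reached. Returns the number of
 * scan_block() calls that were made.
 */
static int compact_calls(unsigned long start, unsigned long end,
                         bool patched, int max_calls)
{
    unsigned long pfn = start;
    int calls = 0;

    assert(start != 0);     /* 0 is reserved as the abort marker */

    while (pfn < end && calls < max_calls) {
        calls++;
        pfn = scan_block(pfn, end, patched);
        if (!pfn)
            break;          /* scanner asked us to stop */
    }
    return calls;
}
```

With the flag set and a start PFN of 0x70bb80 (the aligned PFN from the trace),
the unpatched variant spins until max_calls is exhausted without advancing,
whereas the patched variant stops after a single call.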


Thread overview:
2019-07-18  8:57 [PATCH] mm: compaction: Avoid 100% CPU usage during compaction when a task is killed Mel Gorman
2019-07-18 11:48 ` howaboutsynergy
2019-07-18 21:22   ` Andrew Morton
2019-07-25 13:02 ` Vlastimil Babka [this message]
