From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>, Linux-MM <linux-mm@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Zi Yan <zi.yan@cs.rutgers.edu>, Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event
Date: Thu, 22 Nov 2018 18:02:10 +0100
Message-ID: <35ea6691-e819-5581-7d32-39c1abfbe775@suse.cz>
In-Reply-To: <20181121101414.21301-5-mgorman@techsingularity.net>

On 11/21/18 11:14 AM, Mel Gorman wrote:
> An event that potentially causes external fragmentation problems has
> already been described but there are degrees of severity.  A "serious"
> event is defined as one that steals a contiguous range of pages of an order
> lower than fragment_stall_order (PAGE_ALLOC_COSTLY_ORDER by default). If a
> movable allocation request that is allowed to sleep needs to steal a small
> block then it schedules until kswapd makes progress or a timeout passes.
> The watermarks are also boosted slightly faster so that kswapd makes
> greater effort to reclaim enough pages to avoid the fragmentation event.
> 
> This stall is not guaranteed to avoid serious fragmentation events.
> If memory pressure is high enough, the pages freed by kswapd may be
> reallocated or the free pages may not be in pageblocks that contain
> only movable pages. Furthermore an allocation request that cannot stall
> (e.g. atomic allocations) or unmovable/reclaimable allocations will still
> proceed without stalling.

Isn't it kind of a disadvantage that this is not done for
unmovable/reclaimable allocations?

>  ==============================================================
>  
> +fragment_stall_order
> +
> +External fragmentation control is managed on a pageblock level where the
> +page allocator tries to avoid mixing pages of different mobility within page
> +blocks (e.g. order 9 on 64-bit x86). If external fragmentation is perfectly
> +controlled then a THP allocation will often succeed up to the number of
> +movable pageblocks in the system as reported by /proc/pagetypeinfo.
> +
> +When memory is low, the system may have to mix pageblocks and will wake
> +kswapd to try control future fragmentation. fragment_stall_order controls if
> +the allocating task will stall if possible until kswapd makes some progress
> +in preference to fragmenting the system. This incurs a small stall penalty
> +in exchange for future success at allocating huge pages. If the stalls
> +are undesirable and high-order allocations are irrelevant then this can
> +be disabled by writing 0 to the tunable. Writing the pageblock order will
> +strongly (but not perfectly) control external fragmentation.
> +
> +The default will stall for fragmenting allocations smaller than the
> +PAGE_ALLOC_COSTLY_ORDER (defined as order-3 at the time of writing).

Perhaps be more explicit that steals of orders strictly lower than the
given value will stall? For the default of order-3, the sysctl value is
therefore 4, which might confuse somebody.
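
To spell out the semantics I would expect the documentation to state, here
is a small userspace sketch. The helper name would_stall() is made up for
illustration and is not from the patch; it only models the "strictly lower
than" comparison described above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not from the patch: models "a steal of an order
 * strictly lower than fragment_stall_order stalls".
 */
static bool would_stall(unsigned int steal_order,
			unsigned int fragment_stall_order)
{
	/* Writing 0 disables stalling: no order is lower than 0. */
	return steal_order < fragment_stall_order;
}
```

With the sysctl at 4, would_stall() is true for orders 0-3 and false for
order 4 and above, which is the part I think the text should say outright.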

> +
> @@ -2130,9 +2131,10 @@ static bool can_steal_fallback(unsigned int order, int start_mt)
>  	return false;
>  }
>  
> +
> +static void stall_fragmentation(struct zone *pzone)
> +{
> +	DEFINE_WAIT(wait);
> +	long remaining = 0;
> +	long timeout = HZ/50;
> +	pg_data_t *pgdat = pzone->zone_pgdat;
> +
> +	if (current->flags & PF_MEMALLOC)
> +		return;
> +
> +	boost_watermark(pzone, true);

Should zone->lock be taken around this to make the watermark_boost
adjustment safe, similar to what balance_pgdat() does?

> +	prepare_to_wait(&pgdat->pfmemalloc_wait, &wait, TASK_INTERRUPTIBLE);
> +	if (waitqueue_active(&pgdat->kswapd_wait))
> +		wake_up_interruptible(&pgdat->kswapd_wait);
> +	remaining = schedule_timeout(timeout);
> +	finish_wait(&pgdat->pfmemalloc_wait, &wait);
> +	if (remaining != timeout) {
> +		trace_mm_fragmentation_stall(pgdat->node_id,
> +			jiffies_to_usecs(timeout - remaining));
> +		count_vm_event(FRAGMENTSTALL);
> +	}
>  }
>  
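
For anyone following along, the accounting at the end of the function works
out roughly as below. This is a userspace model only: stall_usecs() is a
hypothetical name, hz stands in for the kernel's HZ, and the conversion is
a simplified stand-in for jiffies_to_usecs().

```c
/*
 * Hypothetical userspace model of the stall accounting quoted above.
 * 'remaining' plays the role of the schedule_timeout() return value:
 * the jiffies left when the task was woken, or 0 if the full timeout
 * elapsed.
 */
static long stall_usecs(long hz, long remaining)
{
	long timeout = hz / 50;		/* HZ/50 jiffies, about 20ms */
	long elapsed = timeout - remaining;

	/* remaining == timeout means no time passed; nothing is recorded. */
	if (elapsed == 0)
		return 0;

	/* Simplified stand-in for jiffies_to_usecs(). */
	return elapsed * 1000000L / hz;
}
```

With HZ=250 a full timeout is 5 jiffies and would be recorded as 20000
usecs, while a wakeup before any jiffy has passed records no stall at all.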

> @@ -4186,6 +4234,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 */
>  	alloc_flags = gfp_to_alloc_flags(gfp_mask);
>  
> +	/*
> +	 * Consider stalling on heavy for movable allocations in preference to
> +	 * fragmenting unmovable/reclaimable pageblocks.
> +	 */
> +	if ((gfp_mask & (__GFP_MOVABLE|__GFP_DIRECT_RECLAIM)) ==
> +			(__GFP_MOVABLE|__GFP_DIRECT_RECLAIM))
> +		alloc_flags |= ALLOC_FRAGMENT_STALL;

I'm surprised that this only has an effect in the slowpath, i.e. when
watermarks are below 'low'. If that's intended (to avoid stalling too
much, I suppose), maybe explain the rationale in the changelog?
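
As an aside, the quoted check requires both bits to be set, so a movable
allocation that cannot enter direct reclaim never gets the flag. A
userspace sketch of just that test follows; the SKETCH_* bit values and
the helper name are made up for illustration and are not the kernel's
__GFP_* values.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only, NOT the kernel's. */
#define SKETCH_GFP_MOVABLE		0x1u
#define SKETCH_GFP_DIRECT_RECLAIM	0x2u

/* Hypothetical helper mirroring the quoted condition. */
static bool wants_fragment_stall(unsigned int gfp_mask)
{
	const unsigned int both = SKETCH_GFP_MOVABLE |
				  SKETCH_GFP_DIRECT_RECLAIM;

	/*
	 * (mask & both) == both holds only when every bit in 'both' is
	 * set; a plain (mask & both) would also accept either bit alone.
	 */
	return (gfp_mask & both) == both;
}
```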

Thanks for the series, Mel, hope the results are still optimistic after
some of the fixes that might unfortunately limit its impact :)

Vlastimil
