[25/25] mm, compaction: Do not direct compact remote memory
diff mbox series

Message ID 20190104125011.16071-26-mgorman@techsingularity.net
State In Next
Commit 89fb2aa1a477c033a87f715e77ae3f69e0052901
Headers show
Series
  • Increase success rates and reduce latency of compaction v2
Related show

Commit Message

Mel Gorman Jan. 4, 2019, 12:50 p.m. UTC
Remote compaction is expensive and possibly counter-productive. Locality
is expected to often have better performance characteristics than remote
high-order pages. For small allocations, it's expected that locality is
generally required or fallbacks are possible. For larger allocations such
as THP, they are forbidden at the time of writing but if __GFP_THISNODE
is ever removed, then it would still be preferable to fallback to small
local base pages over remote THP in the general case. kcompactd is still
woken via kswapd so compaction happens eventually.

While this patch potentially has both positive and negative effects,
it is best to avoid the possibility of remote compaction given the cost
relative to any potential benefit.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Vlastimil Babka Jan. 18, 2019, 1:51 p.m. UTC | #1
On 1/4/19 1:50 PM, Mel Gorman wrote:
> Remote compaction is expensive and possibly counter-productive. Locality
> is expected to often have better performance characteristics than remote
> high-order pages. For small allocations, it's expected that locality is
> generally required or fallbacks are possible. For larger allocations such
> as THP, they are forbidden at the time of writing but if __GFP_THISNODE
> is ever removed, then it would still be preferable to fallback to small
> local base pages over remote THP in the general case. kcompactd is still
> woken via kswapd so compaction happens eventually.
> 
> While this patch potentially has both positive and negative effects,
> it is best to avoid the possibility of remote compaction given the cost
> relative to any potential benefit.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Generally agree with the intent, but what if there's e.g. high-order (but not
costly) kernel allocation on behalf of user process on cpu belonging to a
movable node, where the only non-movable node is node 0. It will have to keep
reclaiming until a large enough page is formed, or wait for kcompactd?
So maybe do this only for costly orders?

Also I think compaction_zonelist_suitable() should be also updated, or we might
be promising the reclaim-compact loop e.g. that we will compact after enough
reclaim, but then we won't.

> ---
>  mm/compaction.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ae70be023b21..cc17f0c01811 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2348,6 +2348,16 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  			continue;
>  		}
>  
> +		/*
> +		 * Do not compact remote memory. It's expensive and high-order
> +		 * small allocations are expected to prefer or require local
> +		 * memory. Similarly, larger requests such as THP can fallback
> +		 * to base pages in preference to remote huge pages if
> +		 * __GFP_THISNODE is not specified
> +		 */
> +		if (zone_to_nid(zone) != zone_to_nid(ac->preferred_zoneref->zone))
> +			continue;
> +
>  		status = compact_zone_order(zone, order, gfp_mask, prio,
>  				alloc_flags, ac_classzone_idx(ac), capture);
>  		rc = max(status, rc);
>
Mel Gorman Jan. 18, 2019, 2:46 p.m. UTC | #2
On Fri, Jan 18, 2019 at 02:51:00PM +0100, Vlastimil Babka wrote:
> On 1/4/19 1:50 PM, Mel Gorman wrote:
> > Remote compaction is expensive and possibly counter-productive. Locality
> > is expected to often have better performance characteristics than remote
> > high-order pages. For small allocations, it's expected that locality is
> > generally required or fallbacks are possible. For larger allocations such
> > as THP, they are forbidden at the time of writing but if __GFP_THISNODE
> > is ever removed, then it would still be preferable to fallback to small
> > local base pages over remote THP in the general case. kcompactd is still
> > woken via kswapd so compaction happens eventually.
> > 
> > While this patch potentially has both positive and negative effects,
> > it is best to avoid the possibility of remote compaction given the cost
> > relative to any potential benefit.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Generally agree with the intent, but what if there's e.g. high-order (but not
> costly) kernel allocation on behalf of user process on cpu belonging to a
> movable node, where the only non-movable node is node 0. It will have to keep
> reclaiming until a large enough page is formed, or wait for kcompactd?

Nnnggghhh, movable nodes. Yes, in such a case it would have to wait for
reclaim or kcompactd which could be problematic. This would have to be
special cased further.

> So maybe do this only for costly orders?
> 

This was written on the basis of the __GFP_THISNODE discussion which is
THP specific so costly didn't come into my thinking. If that ever gets
resurrected properly, this patch can be revisited. It would be trivial to
check if the preferred node is a movable node and allow remote compaction
in such cases but I'm not aiming at any specific problem with this patch
so it's too hand-wavy.

> Also I think compaction_zonelist_suitable() should be also updated, or we might
> be promising the reclaim-compact loop e.g. that we will compact after enough
> reclaim, but then we won't.
> 

True. I think I'll kill this patch as __GFP_THISNODE is now used again
for THP (regardless of how one feels about the subject) and we don't have
good examples where remote compaction for lower-order kernel allocations
is a problem.

Patch
diff mbox series

diff --git a/mm/compaction.c b/mm/compaction.c
index ae70be023b21..cc17f0c01811 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2348,6 +2348,16 @@  enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
+		/*
+		 * Do not compact remote memory. It's expensive and high-order
+		 * small allocations are expected to prefer or require local
+		 * memory. Similarly, larger requests such as THP can fallback
+		 * to base pages in preference to remote huge pages if
+		 * __GFP_THISNODE is not specified
+		 */
+		if (zone_to_nid(zone) != zone_to_nid(ac->preferred_zoneref->zone))
+			continue;
+
 		status = compact_zone_order(zone, order, gfp_mask, prio,
 				alloc_flags, ac_classzone_idx(ac), capture);
 		rc = max(status, rc);