linux-kernel.vger.kernel.org archive mirror
* [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
@ 2016-06-29  1:39 David Rientjes
  2016-06-29  6:53 ` Vlastimil Babka
  0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2016-06-29  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Joonsoo Kim, Mel Gorman, linux-mm, linux-kernel

It's possible that the freeing scanner can be consistently expensive if
memory is well compacted toward the end of the zone with few free pages
available in that area.

If all zone memory is synchronously compacted, say with
/proc/sys/vm/compact_memory, and thp is faulted, it is possible to
iterate a massive amount of memory even with the per-zone cached free
position.

For example, after compacting all memory and faulting thp for heap, it
was observed that compact_free_scanned increased as much as 892518911 4KB
pages while compact_stall only increased by 171.  The freeing scanner
iterated ~20GB of memory for each compaction stall.

To address this, if too much memory is spanned on the freeing scanner's
freelist when releasing back to the system, return the low pfn rather than
the high pfn.  It's declared that the freeing scanner will become too
expensive if the high pfn is used, so use the low pfn instead.

The amount of memory declared as too expensive to iterate is subjectively
chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
pages.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -47,10 +47,16 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 #define pageblock_start_pfn(pfn)	block_start_pfn(pfn, pageblock_order)
 #define pageblock_end_pfn(pfn)		block_end_pfn(pfn, pageblock_order)
 
+/*
+ * Releases isolated free pages back to the buddy allocator.  Returns the pfn
+ * that should be cached for the next compaction of this zone, depending on how
+ * much memory the free pages span.
+ */
 static unsigned long release_freepages(struct list_head *freelist)
 {
 	struct page *page, *next;
 	unsigned long high_pfn = 0;
+	unsigned long low_pfn = -1UL;
 
 	list_for_each_entry_safe(page, next, freelist, lru) {
 		unsigned long pfn = page_to_pfn(page);
@@ -58,8 +64,18 @@ static unsigned long release_freepages(struct list_head *freelist)
 		__free_page(page);
 		if (pfn > high_pfn)
 			high_pfn = pfn;
+		if (pfn < low_pfn)
+			low_pfn = pfn;
 	}
 
+	/*
+	 * If the list of freepages spans too much memory, the cached position
+	 * should be updated to the lowest pfn to prevent the freeing scanner
+	 * from becoming too expensive.
+	 */
+	if ((high_pfn - low_pfn) > (COMPACT_CLUSTER_MAX << PAGE_SHIFT))
+		return low_pfn;
+
 	return high_pfn;
 }
 


* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
  2016-06-29  1:39 [patch] mm, compaction: make sure freeing scanner isn't persistently expensive David Rientjes
@ 2016-06-29  6:53 ` Vlastimil Babka
  2016-06-29 20:55   ` David Rientjes
  0 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2016-06-29  6:53 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: Joonsoo Kim, Mel Gorman, linux-mm, linux-kernel

On 06/29/2016 03:39 AM, David Rientjes wrote:
> It's possible that the freeing scanner can be consistently expensive if
> memory is well compacted toward the end of the zone with few free pages
> available in that area.
>
> If all zone memory is synchronously compacted, say with
> /proc/sys/vm/compact_memory, and thp is faulted, it is possible to
> iterate a massive amount of memory even with the per-zone cached free
> position.
>
> For example, after compacting all memory and faulting thp for heap, it
> was observed that compact_free_scanned increased as much as 892518911 4KB
> pages while compact_stall only increased by 171.  The freeing scanner
> iterated ~20GB of memory for each compaction stall.
>
> To address this, if too much memory is spanned on the freeing scanner's
> freelist when releasing back to the system, return the low pfn rather than
> the high pfn.  It's declared that the freeing scanner will become too
> expensive if the high pfn is used, so use the low pfn instead.
>
> The amount of memory declared as too expensive to iterate is subjectively
> chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
> pages.
>
> Signed-off-by: David Rientjes <rientjes@google.com>

Hmm, I don't know. Seems it only works around one corner case of a 
larger issue. The cost for the scanning was already paid, the patch 
prevents it from being paid again, but only until the scanners are reset.

Note also that THPs no longer do direct compaction by default in recent 
kernels.

To fully solve the freepage scanning issue, we should probably pick and 
finish one of the proposed reworks from Joonsoo or myself, or the 
approach that replaces free scanner with direct freelist allocations.

> ---
>  mm/compaction.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -47,10 +47,16 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
>  #define pageblock_start_pfn(pfn)	block_start_pfn(pfn, pageblock_order)
>  #define pageblock_end_pfn(pfn)		block_end_pfn(pfn, pageblock_order)
>
> +/*
> + * Releases isolated free pages back to the buddy allocator.  Returns the pfn
> + * that should be cached for the next compaction of this zone, depending on how
> + * much memory the free pages span.
> + */
>  static unsigned long release_freepages(struct list_head *freelist)
>  {
>  	struct page *page, *next;
>  	unsigned long high_pfn = 0;
> +	unsigned long low_pfn = -1UL;
>
>  	list_for_each_entry_safe(page, next, freelist, lru) {
>  		unsigned long pfn = page_to_pfn(page);
> @@ -58,8 +64,18 @@ static unsigned long release_freepages(struct list_head *freelist)
>  		__free_page(page);
>  		if (pfn > high_pfn)
>  			high_pfn = pfn;
> +		if (pfn < low_pfn)
> +			low_pfn = pfn;
>  	}
>
> +	/*
> +	 * If the list of freepages spans too much memory, the cached position
> +	 * should be updated to the lowest pfn to prevent the freeing scanner
> +	 * from becoming too expensive.
> +	 */
> +	if ((high_pfn - low_pfn) > (COMPACT_CLUSTER_MAX << PAGE_SHIFT))
> +		return low_pfn;
> +
>  	return high_pfn;
>  }
>
>


* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
  2016-06-29  6:53 ` Vlastimil Babka
@ 2016-06-29 20:55   ` David Rientjes
  2016-06-30  7:31     ` Joonsoo Kim
  0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2016-06-29 20:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Joonsoo Kim, Mel Gorman, linux-mm, linux-kernel

On Wed, 29 Jun 2016, Vlastimil Babka wrote:

> On 06/29/2016 03:39 AM, David Rientjes wrote:
> > It's possible that the freeing scanner can be consistently expensive if
> > memory is well compacted toward the end of the zone with few free pages
> > available in that area.
> > 
> > If all zone memory is synchronously compacted, say with
> > /proc/sys/vm/compact_memory, and thp is faulted, it is possible to
> > iterate a massive amount of memory even with the per-zone cached free
> > position.
> > 
> > For example, after compacting all memory and faulting thp for heap, it
> > was observed that compact_free_scanned increased as much as 892518911 4KB
> > pages while compact_stall only increased by 171.  The freeing scanner
> > iterated ~20GB of memory for each compaction stall.
> > 
> > To address this, if too much memory is spanned on the freeing scanner's
> > freelist when releasing back to the system, return the low pfn rather than
> > the high pfn.  It's declared that the freeing scanner will become too
> > expensive if the high pfn is used, so use the low pfn instead.
> > 
> > The amount of memory declared as too expensive to iterate is subjectively
> > chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
> > pages.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> Hmm, I don't know. Seems it only works around one corner case of a larger
> issue. The cost for the scanning was already paid, the patch prevents it from
> being paid again, but only until the scanners are reset.
> 

The only point of the per-zone cached pfn positions is to avoid doing the 
same work again unnecessarily.  Having the last 16GB of memory at the end 
of a zone completely unfree is the same as having only a single page free 
in the last pageblock.  The number of PageBuddy pages in that amount of 
memory can be irrelevant, up to COMPACT_CLUSTER_MAX.  We simply can't 
afford to scan 16GB of memory looking for free pages.

> Note also that THP's no longer do direct compaction by default in recent
> kernels.
> 
> To fully solve the freepage scanning issue, we should probably pick and finish
> one of the proposed reworks from Joonsoo or myself, or the approach that
> replaces free scanner with direct freelist allocations.
> 

Feel free to post the patches, but I believe this simple change makes 
release_freepages() exceedingly better and can better target memory for 
the freeing scanner.


* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
  2016-06-29 20:55   ` David Rientjes
@ 2016-06-30  7:31     ` Joonsoo Kim
  2016-06-30  7:42       ` Vlastimil Babka
  2016-07-11 23:01       ` David Rientjes
  0 siblings, 2 replies; 8+ messages in thread
From: Joonsoo Kim @ 2016-06-30  7:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Andrew Morton, Mel Gorman, linux-mm, linux-kernel

On Wed, Jun 29, 2016 at 01:55:55PM -0700, David Rientjes wrote:
> On Wed, 29 Jun 2016, Vlastimil Babka wrote:
> 
> > On 06/29/2016 03:39 AM, David Rientjes wrote:
> > > It's possible that the freeing scanner can be consistently expensive if
> > > memory is well compacted toward the end of the zone with few free pages
> > > available in that area.
> > > 
> > > If all zone memory is synchronously compacted, say with
> > > /proc/sys/vm/compact_memory, and thp is faulted, it is possible to
> > > iterate a massive amount of memory even with the per-zone cached free
> > > position.
> > > 
> > > For example, after compacting all memory and faulting thp for heap, it
> > > was observed that compact_free_scanned increased as much as 892518911 4KB
> > > pages while compact_stall only increased by 171.  The freeing scanner
> > > iterated ~20GB of memory for each compaction stall.
> > > 
> > > To address this, if too much memory is spanned on the freeing scanner's
> > > freelist when releasing back to the system, return the low pfn rather than
> > > the high pfn.  It's declared that the freeing scanner will become too
> > > expensive if the high pfn is used, so use the low pfn instead.
> > > 
> > > The amount of memory declared as too expensive to iterate is subjectively
> > > chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
> > > pages.
> > > 
> > > Signed-off-by: David Rientjes <rientjes@google.com>
> > 
> > Hmm, I don't know. Seems it only works around one corner case of a larger
> > issue. The cost for the scanning was already paid, the patch prevents it from
> > being paid again, but only until the scanners are reset.
> > 
> 
> The only point of the per-zone cached pfn positions is to avoid doing the 
> same work again unnecessarily.  Having the last 16GB of memory at the end 
> of a zone being completely unfree is the same as a single page in the last 
> pageblock free.  The number of PageBuddy pages in that amount of memory 
> can be irrelevant up to COMPACT_CLUSTER_MAX.  We simply can't afford to 
> scan 16GB of memory looking for free pages.

We need to find the root cause of this problem first.

I guess this problem happens when isolate_freepages_block() stops early
due to the watermark check (if your patch is applied to your kernel).
If the scanners meet, the cached pfn is reset and your patch has no
effect, so I guess the scanners don't meet.

We enter compaction with enough free memory, so stopping in
isolate_freepages_block() should be an unlikely event, but your numbers
show that it happens frequently?

Maybe, if we changed all the watermark checks in compaction.c to use
min_wmark, the problem would disappear.

Anyway, could you check how often isolate_freepages_block() is stopped,
and why?

In addition, I worry that your previous patch, which makes
isolate_freepages_block() stop when the watermark isn't met, could
cause compaction to make no progress. The amount of free memory can
fluctuate, so a watermark failure could be temporary. Do we need to
break out of compaction in this case? It would decrease the compaction
success rate if there is a memory hog running in parallel. Any idea?

Thanks.

> 
> > Note also that THP's no longer do direct compaction by default in recent
> > kernels.
> > 
> > To fully solve the freepage scanning issue, we should probably pick and finish
> > one of the proposed reworks from Joonsoo or myself, or the approach that
> > replaces free scanner with direct freelist allocations.
> > 
> 
> Feel free to post the patches, but I believe this simple change makes 
> release_freepages() exceedingly better and can better target memory for 
> the freeing scanner.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <dont@kvack.org>


* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
  2016-06-30  7:31     ` Joonsoo Kim
@ 2016-06-30  7:42       ` Vlastimil Babka
  2016-06-30  8:16         ` Joonsoo Kim
  2016-07-11 23:01       ` David Rientjes
  1 sibling, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2016-06-30  7:42 UTC (permalink / raw)
  To: Joonsoo Kim, David Rientjes
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel

On 06/30/2016 09:31 AM, Joonsoo Kim wrote:
> On Wed, Jun 29, 2016 at 01:55:55PM -0700, David Rientjes wrote:
>> On Wed, 29 Jun 2016, Vlastimil Babka wrote:
>>
>>> On 06/29/2016 03:39 AM, David Rientjes wrote:
>>>> It's possible that the freeing scanner can be consistently expensive if
>>>> memory is well compacted toward the end of the zone with few free pages
>>>> available in that area.
>>>>
>>>> If all zone memory is synchronously compacted, say with
>>>> /proc/sys/vm/compact_memory, and thp is faulted, it is possible to
>>>> iterate a massive amount of memory even with the per-zone cached free
>>>> position.
>>>>
>>>> For example, after compacting all memory and faulting thp for heap, it
>>>> was observed that compact_free_scanned increased as much as 892518911 4KB
>>>> pages while compact_stall only increased by 171.  The freeing scanner
>>>> iterated ~20GB of memory for each compaction stall.
>>>>
>>>> To address this, if too much memory is spanned on the freeing scanner's
>>>> freelist when releasing back to the system, return the low pfn rather than
>>>> the high pfn.  It's declared that the freeing scanner will become too
>>>> expensive if the high pfn is used, so use the low pfn instead.
>>>>
>>>> The amount of memory declared as too expensive to iterate is subjectively
>>>> chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
>>>> pages.
>>>>
>>>> Signed-off-by: David Rientjes <rientjes@google.com>
>>>
>>> Hmm, I don't know. Seems it only works around one corner case of a larger
>>> issue. The cost for the scanning was already paid, the patch prevents it from
>>> being paid again, but only until the scanners are reset.
>>>
>>
>> The only point of the per-zone cached pfn positions is to avoid doing the
>> same work again unnecessarily.  Having the last 16GB of memory at the end
>> of a zone being completely unfree is the same as a single page in the last
>> pageblock free.  The number of PageBuddy pages in that amount of memory
>> can be irrelevant up to COMPACT_CLUSTER_MAX.  We simply can't afford to
>> scan 16GB of memory looking for free pages.
>
> We need to find a root cause of this problem, first.
>
> I guess that this problem would happen when isolate_freepages_block()
> early stop due to watermark check (if your patch is applied to your
> kernel). If scanner meets, cached pfn will be reset and your patch
> doesn't have any effect. So, I guess that scanner doesn't meet.
>
> We enter the compaction with enough free memory so stop in
> isolate_freepages_block() should be unlikely event but your number
> shows that it happens frequently?

If it's THP faults, it could also be due to need_resched() or lock 
contention?

> Maybe, if we change all watermark check on compaction.c to use
> min_wmark, problem would be disappeared.

Basically patches 13 and 16 in https://lkml.org/lkml/2016/6/24/222

> Anyway, could you check how often isolate_freepages_block() is stopped
> and why?
>
> In addition, I worry that your previous patch that makes
> isolate_freepages_block() stop when watermark doesn't meet would cause
> compaction non-progress. Amount of free memory can be flutuated so
> watermark fail would be temporaral. We need to break compaction in
> this case? It would decrease compaction success rate if there is a
> memory hogger in parallel. Any idea?

I think it's better to stop and possibly switch to reclaim (or give up 
for THPs) than to continue hoping that somebody will free the memory 
for us. As I explained in the other thread, even if we removed the 
watermark check completely and migration succeeded and formed a 
high-order page, compact_finished() would see the failed high-order 
watermark and return COMPACT_CONTINUE, even if the problem is actually 
the order-0 watermarks. So maybe the success rate would be higher, but 
at an enormous cost. IIRC you even proposed once to add an order-0 
check (maybe even with some gap, like compaction_suitable()?) to 
compact_finished() that would terminate compaction. That shouldn't be 
necessary if we terminate due to split_free_page() failing.

> Thanks.
>
>>
>>> Note also that THP's no longer do direct compaction by default in recent
>>> kernels.
>>>
>>> To fully solve the freepage scanning issue, we should probably pick and finish
>>> one of the proposed reworks from Joonsoo or myself, or the approach that
>>> replaces free scanner with direct freelist allocations.
>>>
>>
>> Feel free to post the patches, but I believe this simple change makes
>> release_freepages() exceedingly better and can better target memory for
>> the freeing scanner.
>>


* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
  2016-06-30  7:42       ` Vlastimil Babka
@ 2016-06-30  8:16         ` Joonsoo Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Joonsoo Kim @ 2016-06-30  8:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Andrew Morton, Mel Gorman, linux-mm, linux-kernel

On Thu, Jun 30, 2016 at 09:42:36AM +0200, Vlastimil Babka wrote:
> On 06/30/2016 09:31 AM, Joonsoo Kim wrote:
> >On Wed, Jun 29, 2016 at 01:55:55PM -0700, David Rientjes wrote:
> >>On Wed, 29 Jun 2016, Vlastimil Babka wrote:
> >>
> >>>On 06/29/2016 03:39 AM, David Rientjes wrote:
> >>>>It's possible that the freeing scanner can be consistently expensive if
> >>>>memory is well compacted toward the end of the zone with few free pages
> >>>>available in that area.
> >>>>
> >>>>If all zone memory is synchronously compacted, say with
> >>>>/proc/sys/vm/compact_memory, and thp is faulted, it is possible to
> >>>>iterate a massive amount of memory even with the per-zone cached free
> >>>>position.
> >>>>
> >>>>For example, after compacting all memory and faulting thp for heap, it
> >>>>was observed that compact_free_scanned increased as much as 892518911 4KB
> >>>>pages while compact_stall only increased by 171.  The freeing scanner
> >>>>iterated ~20GB of memory for each compaction stall.
> >>>>
> >>>>To address this, if too much memory is spanned on the freeing scanner's
> >>>>freelist when releasing back to the system, return the low pfn rather than
> >>>>the high pfn.  It's declared that the freeing scanner will become too
> >>>>expensive if the high pfn is used, so use the low pfn instead.
> >>>>
> >>>>The amount of memory declared as too expensive to iterate is subjectively
> >>>>chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
> >>>>pages.
> >>>>
> >>>>Signed-off-by: David Rientjes <rientjes@google.com>
> >>>
> >>>Hmm, I don't know. Seems it only works around one corner case of a larger
> >>>issue. The cost for the scanning was already paid, the patch prevents it from
> >>>being paid again, but only until the scanners are reset.
> >>>
> >>
> >>The only point of the per-zone cached pfn positions is to avoid doing the
> >>same work again unnecessarily.  Having the last 16GB of memory at the end
> >>of a zone being completely unfree is the same as a single page in the last
> >>pageblock free.  The number of PageBuddy pages in that amount of memory
> >>can be irrelevant up to COMPACT_CLUSTER_MAX.  We simply can't afford to
> >>scan 16GB of memory looking for free pages.
> >
> >We need to find a root cause of this problem, first.
> >
> >I guess that this problem would happen when isolate_freepages_block()
> >early stop due to watermark check (if your patch is applied to your
> >kernel). If scanner meets, cached pfn will be reset and your patch
> >doesn't have any effect. So, I guess that scanner doesn't meet.
> >
> >We enter the compaction with enough free memory so stop in
> >isolate_freepages_block() should be unlikely event but your number
> >shows that it happens frequently?
> 
> If it's THP faults, it could be also due to need_resched() or lock
> contention?

Okay. I missed that.

> 
> >Maybe, if we change all watermark check on compaction.c to use
> >min_wmark, problem would be disappeared.
> 
> Basically patches 13 and 16 in https://lkml.org/lkml/2016/6/24/222

Okay. I haven't looked at it yet, but I'd like to change it to use min_wmark.

> >Anyway, could you check how often isolate_freepages_block() is stopped
> >and why?
> >
> >In addition, I worry that your previous patch that makes
> >isolate_freepages_block() stop when watermark doesn't meet would cause
> >compaction non-progress. Amount of free memory can be flutuated so
> >watermark fail would be temporaral. We need to break compaction in
> >this case? It would decrease compaction success rate if there is a
> >memory hogger in parallel. Any idea?
> 
> I think it's better to stop and possibly switch to reclaim (or give
> up for THP's) than to continue hoping that somebody would free the
> memory for us. As I explained in the other thread, even if we
> removed watermark check completely and migration succeeded and
> formed high-order page, compact_finished() would see failed
> high-order watermark and return COMPACT_CONTINUE, even if the
> problem is actually order-0 watermarks. So maybe success rate would
> be bigger, but at enormous cost. IIRC you even proposed once to add

I understand your point. I'm not insisting on removing the watermark
check in split_free_page(). However, my worry still remains. If we use
min_wmark, there would be no problem, since a memory hog cannot easily
consume memory below min_wmark. But if we use low_wmark, a memory hog
can repeatedly consume all free memory down to min_wmark, and
compaction will fail repeatedly. This is a problem for the robustness
and correctness of the system, so we should prohibit such a case even
if we pay more for it. Once we make a high-order page, it would not be
broken up easily, so we can still get it when the next reclaim brings
order-0 free memory back up to the watermark. But if we stop making
the high-order page when the watermark check fails, we need to run
compaction one more time after the next reclaim, and there is a chance
that the memory hog could consume all of the reclaimed free memory.

> order-0 check (maybe even with some gap like compaction_suitable()?)
> to compact_finished() that would terminate compaction. Which
> shouldn't be necessary if we terminate due to split_free_page()
> failing.

I can't remember if I did it or not. :)

Thanks.


* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
  2016-06-30  7:31     ` Joonsoo Kim
  2016-06-30  7:42       ` Vlastimil Babka
@ 2016-07-11 23:01       ` David Rientjes
  2016-07-18  5:44         ` Joonsoo Kim
  1 sibling, 1 reply; 8+ messages in thread
From: David Rientjes @ 2016-07-11 23:01 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, Andrew Morton, Mel Gorman, linux-mm, linux-kernel

On Thu, 30 Jun 2016, Joonsoo Kim wrote:

> We need to find a root cause of this problem, first.
> 
> I guess that this problem would happen when isolate_freepages_block()
> early stop due to watermark check (if your patch is applied to your
> kernel). If scanner meets, cached pfn will be reset and your patch
> doesn't have any effect. So, I guess that scanner doesn't meet.
> 

If the scanners meet, we should rely on deferred compaction to suppress 
further attempts in the near future.  This is outside the scope of this 
fix.

> We enter the compaction with enough free memory so stop in
> isolate_freepages_block() should be unlikely event but your number
> shows that it happens frequently?
> 

It's not the only reason why freepages will be returned to the buddy 
allocator: if locks become contended because we are spending too much time 
compacting memory, we can persistently get free pages returned to the end 
of the zone and then repeatedly iterate >100GB of memory on every call to 
isolate_freepages(), which makes its own contended checks fire more often.  
This patch is only an attempt to prevent lengthy iterations when we have 
recently scanned the memory and found its freepages not to be isolatable.

> In addition, I worry that your previous patch that makes
> isolate_freepages_block() stop when watermark doesn't meet would cause
> compaction non-progress. Amount of free memory can be flutuated so
> watermark fail would be temporaral. We need to break compaction in
> this case? It would decrease compaction success rate if there is a
> memory hogger in parallel. Any idea?
> 

In my opinion, which I think is quite well known by now, the compaction 
freeing scanner shouldn't be checking _any_ watermark.  The end result is 
that we're migrating memory, not allocating additional memory; determining 
if compaction should be done is best left lower on the stack.


* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive
  2016-07-11 23:01       ` David Rientjes
@ 2016-07-18  5:44         ` Joonsoo Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Joonsoo Kim @ 2016-07-18  5:44 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Andrew Morton, Mel Gorman, linux-mm, linux-kernel

On Mon, Jul 11, 2016 at 04:01:52PM -0700, David Rientjes wrote:
> On Thu, 30 Jun 2016, Joonsoo Kim wrote:
> 
> > We need to find a root cause of this problem, first.
> > 
> > I guess that this problem would happen when isolate_freepages_block()
> > early stop due to watermark check (if your patch is applied to your
> > kernel). If scanner meets, cached pfn will be reset and your patch
> > doesn't have any effect. So, I guess that scanner doesn't meet.
> > 
> 
> If the scanners meet, we should rely on deferred compaction to suppress 
> further attempts in the near future.  This is outside the scope of this 
> fix.
> 
> > We enter the compaction with enough free memory so stop in
> > isolate_freepages_block() should be unlikely event but your number
> > shows that it happens frequently?
> > 
> 
> It's not the only reason why freepages will be returned to the buddy 
> allocator: if locks become contended because we are spending too much time 
> compacting memory, we can persistently get free pages returned to the end 
> of the zone and then repeatedly iterate >100GB of memory on every call to 
> isolate_freepages(), which makes its own contended checks fire more often.  
> This patch is only an attempt to prevent lenghty iterations when we have 
> recently scanned the memory and found freepages to not be isolatable.

Hmm... I can't understand how the freepage scanner is persistently
expensive. After the freepage scanner gets freepages, migration isn't
stopped until either the migratable pages or the freepages run out.

If there are no freepages, the above problem doesn't happen, so I
assume that there are no migratable pages left after calling
migrate_pages().

If there are no migratable pages, it means the freepages were used by
migration. Some time later, the freepages in that pageblock are
exhausted by migration and the freepage scanner moves to the next
pageblock. So I cannot understand how it is persistently expensive.

Am I missing something?

If it is caused by the fact that too many freepages are isolated at
once (up to the number of migratable pages), we can modify the logic
to stop isolating freepages when the pageblock changes and the
freepage scanner already has one or more freepages.

> 
> > In addition, I worry that your previous patch that makes
> > isolate_freepages_block() stop when watermark doesn't meet would cause
> > compaction non-progress. Amount of free memory can be flutuated so
> > watermark fail would be temporaral. We need to break compaction in
> > this case? It would decrease compaction success rate if there is a
> > memory hogger in parallel. Any idea?
> > 
> 
> In my opinion, which I think is quite well known by now, the compaction 
> freeing scanner shouldn't be checking _any_ watermark.  The end result is 
> that we're migrating memory, not allocating additional memory; determining 
> if compaction should be done is best left lower on the stack.

Hmm... if there are many parallel compactors and we have no watermark
check, they could consume all of the emergency memory. It can be
mitigated by isolating just one freepage in this case, but the
potential risk would not disappear.

Thanks.

> 

