* [patch] mm, compaction: make sure freeing scanner isn't persistently expensive @ 2016-06-29 1:39 David Rientjes 2016-06-29 6:53 ` Vlastimil Babka 0 siblings, 1 reply; 8+ messages in thread From: David Rientjes @ 2016-06-29 1:39 UTC (permalink / raw) To: Andrew Morton Cc: Vlastimil Babka, Joonsoo Kim, Mel Gorman, linux-mm, linux-kernel It's possible that the freeing scanner can be consistently expensive if memory is well compacted toward the end of the zone with few free pages available in that area. If all zone memory is synchronously compacted, say with /proc/sys/vm/compact_memory, and thp is faulted, it is possible to iterate a massive amount of memory even with the per-zone cached free position. For example, after compacting all memory and faulting thp for heap, it was observed that compact_free_scanned increased as much as 892518911 4KB pages while compact_stall only increased by 171. The freeing scanner iterated ~20GB of memory for each compaction stall. To address this, if too much memory is spanned on the freeing scanner's freelist when releasing back to the system, return the low pfn rather than the high pfn. It's declared that the freeing scanner will become too expensive if the high pfn is used, so use the low pfn instead. The amount of memory declared as too expensive to iterate is subjectively chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB pages. Signed-off-by: David Rientjes <rientjes@google.com> --- mm/compaction.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c --- a/mm/compaction.c +++ b/mm/compaction.c @@ -47,10 +47,16 @@ static inline void count_compact_events(enum vm_event_item item, long delta) #define pageblock_start_pfn(pfn) block_start_pfn(pfn, pageblock_order) #define pageblock_end_pfn(pfn) block_end_pfn(pfn, pageblock_order) +/* + * Releases isolated free pages back to the buddy allocator. 
Returns the pfn + * that should be cached for the next compaction of this zone, depending on how + * much memory the free pages span. + */ static unsigned long release_freepages(struct list_head *freelist) { struct page *page, *next; unsigned long high_pfn = 0; + unsigned long low_pfn = -1UL; list_for_each_entry_safe(page, next, freelist, lru) { unsigned long pfn = page_to_pfn(page); @@ -58,8 +64,18 @@ static unsigned long release_freepages(struct list_head *freelist) __free_page(page); if (pfn > high_pfn) high_pfn = pfn; + if (pfn < low_pfn) + low_pfn = pfn; } + /* + * If the list of freepages spans too much memory, the cached position + * should be updated to the lowest pfn to prevent the freeing scanner + * from becoming too expensive. + */ + if ((high_pfn - low_pfn) > (COMPACT_CLUSTER_MAX << PAGE_SHIFT)) + return low_pfn; + return high_pfn; } ^ permalink raw reply [flat|nested] 8+ messages in thread
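The span heuristic in the patch above can be sketched as plain userspace C (the constants mirror the patch; the pfn values and the function name `next_cached_pfn` are hypothetical stand-ins for the kernel's `release_freepages()` bookkeeping):

```c
#include <assert.h>

/* Stand-ins for the kernel constants used by the patch above. */
#define COMPACT_CLUSTER_MAX 32UL
#define PAGE_SHIFT 12

/*
 * Decide which pfn to cache for the next free scan of this zone.
 * If the released freepages span more than COMPACT_CLUSTER_MAX <<
 * PAGE_SHIFT pfns, caching the high pfn would force the next scan to
 * walk the whole span again, so fall back to the low pfn instead.
 */
static unsigned long next_cached_pfn(unsigned long low_pfn,
                                     unsigned long high_pfn)
{
    if ((high_pfn - low_pfn) > (COMPACT_CLUSTER_MAX << PAGE_SHIFT))
        return low_pfn;
    return high_pfn;
}
```

With 4KB pages the threshold works out to 32 << 12 = 131072 pfns of span, i.e. 512MB of memory, matching the figure in the commit message.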
* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive 2016-06-29 1:39 [patch] mm, compaction: make sure freeing scanner isn't persistently expensive David Rientjes @ 2016-06-29 6:53 ` Vlastimil Babka 2016-06-29 20:55 ` David Rientjes 0 siblings, 1 reply; 8+ messages in thread From: Vlastimil Babka @ 2016-06-29 6:53 UTC (permalink / raw) To: David Rientjes, Andrew Morton Cc: Joonsoo Kim, Mel Gorman, linux-mm, linux-kernel On 06/29/2016 03:39 AM, David Rientjes wrote: > It's possible that the freeing scanner can be consistently expensive if > memory is well compacted toward the end of the zone with few free pages > available in that area. > > If all zone memory is synchronously compacted, say with > /proc/sys/vm/compact_memory, and thp is faulted, it is possible to > iterate a massive amount of memory even with the per-zone cached free > position. > > For example, after compacting all memory and faulting thp for heap, it > was observed that compact_free_scanned increased as much as 892518911 4KB > pages while compact_stall only increased by 171. The freeing scanner > iterated ~20GB of memory for each compaction stall. > > To address this, if too much memory is spanned on the freeing scanner's > freelist when releasing back to the system, return the low pfn rather than > the high pfn. It's declared that the freeing scanner will become too > expensive if the high pfn is used, so use the low pfn instead. > > The amount of memory declared as too expensive to iterate is subjectively > chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB > pages. > > Signed-off-by: David Rientjes <rientjes@google.com> Hmm, I don't know. Seems it only works around one corner case of a larger issue. The cost for the scanning was already paid, the patch prevents it from being paid again, but only until the scanners are reset. Note also that THP's no longer do direct compaction by default in recent kernels. 
To fully solve the freepage scanning issue, we should probably pick and finish one of the proposed reworks from Joonsoo or myself, or the approach that replaces free scanner with direct freelist allocations. > --- > mm/compaction.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/mm/compaction.c b/mm/compaction.c > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -47,10 +47,16 @@ static inline void count_compact_events(enum vm_event_item item, long delta) > #define pageblock_start_pfn(pfn) block_start_pfn(pfn, pageblock_order) > #define pageblock_end_pfn(pfn) block_end_pfn(pfn, pageblock_order) > > +/* > + * Releases isolated free pages back to the buddy allocator. Returns the pfn > + * that should be cached for the next compaction of this zone, depending on how > + * much memory the free pages span. > + */ > static unsigned long release_freepages(struct list_head *freelist) > { > struct page *page, *next; > unsigned long high_pfn = 0; > + unsigned long low_pfn = -1UL; > > list_for_each_entry_safe(page, next, freelist, lru) { > unsigned long pfn = page_to_pfn(page); > @@ -58,8 +64,18 @@ static unsigned long release_freepages(struct list_head *freelist) > __free_page(page); > if (pfn > high_pfn) > high_pfn = pfn; > + if (pfn < low_pfn) > + low_pfn = pfn; > } > > + /* > + * If the list of freepages spans too much memory, the cached position > + * should be updated to the lowest pfn to prevent the freeing scanner > + * from becoming too expensive. > + */ > + if ((high_pfn - low_pfn) > (COMPACT_CLUSTER_MAX << PAGE_SHIFT)) > + return low_pfn; > + > return high_pfn; > } > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive 2016-06-29 6:53 ` Vlastimil Babka @ 2016-06-29 20:55 ` David Rientjes 2016-06-30 7:31 ` Joonsoo Kim 0 siblings, 1 reply; 8+ messages in thread From: David Rientjes @ 2016-06-29 20:55 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Joonsoo Kim, Mel Gorman, linux-mm, linux-kernel On Wed, 29 Jun 2016, Vlastimil Babka wrote: > On 06/29/2016 03:39 AM, David Rientjes wrote: > > It's possible that the freeing scanner can be consistently expensive if > > memory is well compacted toward the end of the zone with few free pages > > available in that area. > > > > If all zone memory is synchronously compacted, say with > > /proc/sys/vm/compact_memory, and thp is faulted, it is possible to > > iterate a massive amount of memory even with the per-zone cached free > > position. > > > > For example, after compacting all memory and faulting thp for heap, it > > was observed that compact_free_scanned increased as much as 892518911 4KB > > pages while compact_stall only increased by 171. The freeing scanner > > iterated ~20GB of memory for each compaction stall. > > > > To address this, if too much memory is spanned on the freeing scanner's > > freelist when releasing back to the system, return the low pfn rather than > > the high pfn. It's declared that the freeing scanner will become too > > expensive if the high pfn is used, so use the low pfn instead. > > > > The amount of memory declared as too expensive to iterate is subjectively > > chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB > > pages. > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > Hmm, I don't know. Seems it only works around one corner case of a larger > issue. The cost for the scanning was already paid, the patch prevents it from > being paid again, but only until the scanners are reset. 
> The only point of the per-zone cached pfn positions is to avoid doing the same work again unnecessarily. Having the last 16GB of memory at the end of a zone completely unfree is the same as having a single free page in the last pageblock: the number of PageBuddy pages in that amount of memory can be irrelevant at anything up to COMPACT_CLUSTER_MAX. We simply can't afford to scan 16GB of memory looking for free pages. > Note also that THP's no longer do direct compaction by default in recent > kernels. > > To fully solve the freepage scanning issue, we should probably pick and finish > one of the proposed reworks from Joonsoo or myself, or the approach that > replaces free scanner with direct freelist allocations. > Feel free to post the patches, but I believe this simple change makes release_freepages() considerably better and can better target memory for the freeing scanner. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive 2016-06-29 20:55 ` David Rientjes @ 2016-06-30 7:31 ` Joonsoo Kim 2016-06-30 7:42 ` Vlastimil Babka 2016-07-11 23:01 ` David Rientjes 0 siblings, 2 replies; 8+ messages in thread From: Joonsoo Kim @ 2016-06-30 7:31 UTC (permalink / raw) To: David Rientjes Cc: Vlastimil Babka, Andrew Morton, Mel Gorman, linux-mm, linux-kernel On Wed, Jun 29, 2016 at 01:55:55PM -0700, David Rientjes wrote: > On Wed, 29 Jun 2016, Vlastimil Babka wrote: > > > On 06/29/2016 03:39 AM, David Rientjes wrote: > > > It's possible that the freeing scanner can be consistently expensive if > > > memory is well compacted toward the end of the zone with few free pages > > > available in that area. > > > > > > If all zone memory is synchronously compacted, say with > > > /proc/sys/vm/compact_memory, and thp is faulted, it is possible to > > > iterate a massive amount of memory even with the per-zone cached free > > > position. > > > > > > For example, after compacting all memory and faulting thp for heap, it > > > was observed that compact_free_scanned increased as much as 892518911 4KB > > > pages while compact_stall only increased by 171. The freeing scanner > > > iterated ~20GB of memory for each compaction stall. > > > > > > To address this, if too much memory is spanned on the freeing scanner's > > > freelist when releasing back to the system, return the low pfn rather than > > > the high pfn. It's declared that the freeing scanner will become too > > > expensive if the high pfn is used, so use the low pfn instead. > > > > > > The amount of memory declared as too expensive to iterate is subjectively > > > chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB > > > pages. > > > > > > Signed-off-by: David Rientjes <rientjes@google.com> > > > > Hmm, I don't know. Seems it only works around one corner case of a larger > > issue. 
The cost for the scanning was already paid, the patch prevents it from > > being paid again, but only until the scanners are reset. > > > > The only point of the per-zone cached pfn positions is to avoid doing the > same work again unnecessarily. Having the last 16GB of memory at the end > of a zone being completely unfree is the same as a single page in the last > pageblock free. The number of PageBuddy pages in that amount of memory > can be irrelevant up to COMPACT_CLUSTER_MAX. We simply can't afford to > scan 16GB of memory looking for free pages. We need to find the root cause of this problem first. I guess that this problem happens when isolate_freepages_block() stops early due to the watermark check (if your earlier patch is applied to your kernel). If the scanners meet, the cached pfn is reset and your patch has no effect, so I guess the scanners don't meet. We enter compaction with enough free memory, so stopping in isolate_freepages_block() should be an unlikely event, but your numbers show that it happens frequently? Maybe, if we changed all the watermark checks in compaction.c to use min_wmark, the problem would disappear. Anyway, could you check how often isolate_freepages_block() is stopped, and why? In addition, I worry that your previous patch, which makes isolate_freepages_block() stop when the watermark isn't met, would cause compaction to make no progress. The amount of free memory can fluctuate, so a watermark failure may be temporary. Do we need to break out of compaction in this case? It would decrease the compaction success rate if there is a memory hogger running in parallel. Any idea? Thanks. > > > Note also that THP's no longer do direct compaction by default in recent > > kernels. > > > > To fully solve the freepage scanning issue, we should probably pick and finish > > one of the proposed reworks from Joonsoo or myself, or the approach that > > replaces free scanner with direct freelist allocations.
> > > > Feel free to post the patches, but I believe this simple change makes > release_freepages() exceedingly better and can better target memory for > the freeing scanner. > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive 2016-06-30 7:31 ` Joonsoo Kim @ 2016-06-30 7:42 ` Vlastimil Babka 2016-06-30 8:16 ` Joonsoo Kim 2016-07-11 23:01 ` David Rientjes 1 sibling, 1 reply; 8+ messages in thread From: Vlastimil Babka @ 2016-06-30 7:42 UTC (permalink / raw) To: Joonsoo Kim, David Rientjes Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel On 06/30/2016 09:31 AM, Joonsoo Kim wrote: > On Wed, Jun 29, 2016 at 01:55:55PM -0700, David Rientjes wrote: >> On Wed, 29 Jun 2016, Vlastimil Babka wrote: >> >>> On 06/29/2016 03:39 AM, David Rientjes wrote: >>>> It's possible that the freeing scanner can be consistently expensive if >>>> memory is well compacted toward the end of the zone with few free pages >>>> available in that area. >>>> >>>> If all zone memory is synchronously compacted, say with >>>> /proc/sys/vm/compact_memory, and thp is faulted, it is possible to >>>> iterate a massive amount of memory even with the per-zone cached free >>>> position. >>>> >>>> For example, after compacting all memory and faulting thp for heap, it >>>> was observed that compact_free_scanned increased as much as 892518911 4KB >>>> pages while compact_stall only increased by 171. The freeing scanner >>>> iterated ~20GB of memory for each compaction stall. >>>> >>>> To address this, if too much memory is spanned on the freeing scanner's >>>> freelist when releasing back to the system, return the low pfn rather than >>>> the high pfn. It's declared that the freeing scanner will become too >>>> expensive if the high pfn is used, so use the low pfn instead. >>>> >>>> The amount of memory declared as too expensive to iterate is subjectively >>>> chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB >>>> pages. >>>> >>>> Signed-off-by: David Rientjes <rientjes@google.com> >>> >>> Hmm, I don't know. Seems it only works around one corner case of a larger >>> issue. 
The cost for the scanning was already paid, the patch prevents it from >>> being paid again, but only until the scanners are reset. >>> >> >> The only point of the per-zone cached pfn positions is to avoid doing the >> same work again unnecessarily. Having the last 16GB of memory at the end >> of a zone being completely unfree is the same as a single page in the last >> pageblock free. The number of PageBuddy pages in that amount of memory >> can be irrelevant up to COMPACT_CLUSTER_MAX. We simply can't afford to >> scan 16GB of memory looking for free pages. > > We need to find a root cause of this problem, first. > > I guess that this problem would happen when isolate_freepages_block() > early stop due to watermark check (if your patch is applied to your > kernel). If scanner meets, cached pfn will be reset and your patch > doesn't have any effect. So, I guess that scanner doesn't meet. > > We enter the compaction with enough free memory so stop in > isolate_freepages_block() should be unlikely event but your number > shows that it happens frequently? If it's THP faults, it could be also due to need_resched() or lock contention? > Maybe, if we change all watermark check on compaction.c to use > min_wmark, problem would be disappeared. Basically patches 13 and 16 in https://lkml.org/lkml/2016/6/24/222 > Anyway, could you check how often isolate_freepages_block() is stopped > and why? > > In addition, I worry that your previous patch that makes > isolate_freepages_block() stop when watermark doesn't meet would cause > compaction non-progress. Amount of free memory can be flutuated so > watermark fail would be temporaral. We need to break compaction in > this case? It would decrease compaction success rate if there is a > memory hogger in parallel. Any idea? I think it's better to stop and possibly switch to reclaim (or give up for THP's) than to continue hoping that somebody would free the memory for us. 
As I explained in the other thread, even if we removed the watermark check completely and migration succeeded and formed a high-order page, compact_finished() would see the failed high-order watermark and return COMPACT_CONTINUE, even if the problem is actually the order-0 watermarks. So maybe the success rate would be higher, but at enormous cost. IIRC you even once proposed adding an order-0 check (maybe even with some gap, like compaction_suitable()?) to compact_finished() that would terminate compaction. That shouldn't be necessary if we terminate due to split_free_page() failing. > Thanks. > >> >>> Note also that THP's no longer do direct compaction by default in recent >>> kernels. >>> >>> To fully solve the freepage scanning issue, we should probably pick and finish >>> one of the proposed reworks from Joonsoo or myself, or the approach that >>> replaces free scanner with direct freelist allocations. >>> >> >> Feel free to post the patches, but I believe this simple change makes >> release_freepages() exceedingly better and can better target memory for >> the freeing scanner. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive 2016-06-30 7:42 ` Vlastimil Babka @ 2016-06-30 8:16 ` Joonsoo Kim 0 siblings, 0 replies; 8+ messages in thread From: Joonsoo Kim @ 2016-06-30 8:16 UTC (permalink / raw) To: Vlastimil Babka Cc: David Rientjes, Andrew Morton, Mel Gorman, linux-mm, linux-kernel On Thu, Jun 30, 2016 at 09:42:36AM +0200, Vlastimil Babka wrote: > On 06/30/2016 09:31 AM, Joonsoo Kim wrote: > >On Wed, Jun 29, 2016 at 01:55:55PM -0700, David Rientjes wrote: > >>On Wed, 29 Jun 2016, Vlastimil Babka wrote: > >> > >>>On 06/29/2016 03:39 AM, David Rientjes wrote: > >>>>It's possible that the freeing scanner can be consistently expensive if > >>>>memory is well compacted toward the end of the zone with few free pages > >>>>available in that area. > >>>> > >>>>If all zone memory is synchronously compacted, say with > >>>>/proc/sys/vm/compact_memory, and thp is faulted, it is possible to > >>>>iterate a massive amount of memory even with the per-zone cached free > >>>>position. > >>>> > >>>>For example, after compacting all memory and faulting thp for heap, it > >>>>was observed that compact_free_scanned increased as much as 892518911 4KB > >>>>pages while compact_stall only increased by 171. The freeing scanner > >>>>iterated ~20GB of memory for each compaction stall. > >>>> > >>>>To address this, if too much memory is spanned on the freeing scanner's > >>>>freelist when releasing back to the system, return the low pfn rather than > >>>>the high pfn. It's declared that the freeing scanner will become too > >>>>expensive if the high pfn is used, so use the low pfn instead. > >>>> > >>>>The amount of memory declared as too expensive to iterate is subjectively > >>>>chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB > >>>>pages. > >>>> > >>>>Signed-off-by: David Rientjes <rientjes@google.com> > >>> > >>>Hmm, I don't know. Seems it only works around one corner case of a larger > >>>issue. 
The cost for the scanning was already paid, the patch prevents it from > >>>being paid again, but only until the scanners are reset. > >>> > >> > >>The only point of the per-zone cached pfn positions is to avoid doing the > >>same work again unnecessarily. Having the last 16GB of memory at the end > >>of a zone being completely unfree is the same as a single page in the last > >>pageblock free. The number of PageBuddy pages in that amount of memory > >>can be irrelevant up to COMPACT_CLUSTER_MAX. We simply can't afford to > >>scan 16GB of memory looking for free pages. > > > >We need to find a root cause of this problem, first. > > > >I guess that this problem would happen when isolate_freepages_block() > >early stop due to watermark check (if your patch is applied to your > >kernel). If scanner meets, cached pfn will be reset and your patch > >doesn't have any effect. So, I guess that scanner doesn't meet. > > > >We enter the compaction with enough free memory so stop in > >isolate_freepages_block() should be unlikely event but your number > >shows that it happens frequently? > > If it's THP faults, it could be also due to need_resched() or lock > contention? Okay. I missed that. > > >Maybe, if we change all watermark check on compaction.c to use > >min_wmark, problem would be disappeared. > > Basically patches 13 and 16 in https://lkml.org/lkml/2016/6/24/222 Okay. I don't look at it but I like to change to use min_wmark. > >Anyway, could you check how often isolate_freepages_block() is stopped > >and why? > > > >In addition, I worry that your previous patch that makes > >isolate_freepages_block() stop when watermark doesn't meet would cause > >compaction non-progress. Amount of free memory can be flutuated so > >watermark fail would be temporaral. We need to break compaction in > >this case? It would decrease compaction success rate if there is a > >memory hogger in parallel. Any idea? 
> > I think it's better to stop and possibly switch to reclaim (or give > up for THP's) than to continue hoping that somebody would free the > memory for us. As I explained in the other thread, even if we > removed watermark check completely and migration succeeded and > formed high-order page, compact_finished() would see failed > high-order watermark and return COMPACT_CONTINUE, even if the > problem is actually order-0 watermarks. So maybe success rate would > be bigger, but at enormous cost. IIRC you even proposed once to add I understand your point. I'm not insisting on removing the watermark check in split_free_page(). However, my worry still remains. If we use min_wmark, there would be no problem, since a memory hogger cannot easily consume memory below min_wmark. But if we use low_wmark, a memory hogger can repeatedly consume all free memory down to min_wmark, and compaction will fail repeatedly. This is a problem of robustness and correctness of the system, so even if we pay more, we should prohibit such a case. Once we make a high-order page, it would not be broken up easily, so we can get it when the next reclaim brings order-0 free memory back up to the watermark. But if we stop making the high-order page when the watermark check fails, we need to run compaction one more time after the next reclaim, and there is a chance that the memory hogger could consume all the reclaimed free memory. > order-0 check (maybe even with some gap like compaction_suitable()?) > to compact_finished() that would terminate compaction. Which > shouldn't be necessary if we terminate due to split_free_page() > failing. I can't remember if I did it or not. :) Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
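Joonsoo's min_wmark vs. low_wmark concern can be illustrated with a toy userspace model (the struct and the page counts below are hypothetical, not the kernel's real zone accounting):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-zone numbers, in pages; purely illustrative. */
struct zone_model {
    unsigned long free_pages;
    unsigned long min_wmark;
    unsigned long low_wmark;
};

/* Gate compaction on a chosen watermark: continue only while above it. */
static bool compaction_may_continue(const struct zone_model *z,
                                    unsigned long wmark)
{
    return z->free_pages > wmark;
}
```

If a memory hogger keeps free_pages pinned between min_wmark and low_wmark, a low_wmark gate fails on every attempt while a min_wmark gate still lets compaction proceed, which is the robustness argument for checking min_wmark.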
* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive 2016-06-30 7:31 ` Joonsoo Kim 2016-06-30 7:42 ` Vlastimil Babka @ 2016-07-11 23:01 ` David Rientjes 2016-07-18 5:44 ` Joonsoo Kim 1 sibling, 1 reply; 8+ messages in thread From: David Rientjes @ 2016-07-11 23:01 UTC (permalink / raw) To: Joonsoo Kim Cc: Vlastimil Babka, Andrew Morton, Mel Gorman, linux-mm, linux-kernel On Thu, 30 Jun 2016, Joonsoo Kim wrote: > We need to find a root cause of this problem, first. > > I guess that this problem would happen when isolate_freepages_block() > early stop due to watermark check (if your patch is applied to your > kernel). If scanner meets, cached pfn will be reset and your patch > doesn't have any effect. So, I guess that scanner doesn't meet. > If the scanners meet, we should rely on deferred compaction to suppress further attempts in the near future. This is outside the scope of this fix. > We enter the compaction with enough free memory so stop in > isolate_freepages_block() should be unlikely event but your number > shows that it happens frequently? > It's not the only reason why freepages will be returned to the buddy allocator: if locks become contended because we are spending too much time compacting memory, we can persistently get free pages returned to the end of the zone and then repeatedly iterate >100GB of memory on every call to isolate_freepages(), which makes its own contended checks fire more often. This patch is only an attempt to prevent lengthy iterations when we have recently scanned the memory and found the freepages not to be isolatable. > In addition, I worry that your previous patch that makes > isolate_freepages_block() stop when watermark doesn't meet would cause > compaction non-progress. Amount of free memory can be flutuated so > watermark fail would be temporaral. We need to break compaction in > this case? It would decrease compaction success rate if there is a > memory hogger in parallel. Any idea?
> In my opinion, which I think is quite well known by now, the compaction freeing scanner shouldn't be checking _any_ watermark. The end result is that we're migrating memory, not allocating additional memory; determining if compaction should be done is best left lower on the stack. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] mm, compaction: make sure freeing scanner isn't persistently expensive 2016-07-11 23:01 ` David Rientjes @ 2016-07-18 5:44 ` Joonsoo Kim 0 siblings, 0 replies; 8+ messages in thread From: Joonsoo Kim @ 2016-07-18 5:44 UTC (permalink / raw) To: David Rientjes Cc: Vlastimil Babka, Andrew Morton, Mel Gorman, linux-mm, linux-kernel On Mon, Jul 11, 2016 at 04:01:52PM -0700, David Rientjes wrote: > On Thu, 30 Jun 2016, Joonsoo Kim wrote: > > > We need to find a root cause of this problem, first. > > > > I guess that this problem would happen when isolate_freepages_block() > > early stop due to watermark check (if your patch is applied to your > > kernel). If scanner meets, cached pfn will be reset and your patch > > doesn't have any effect. So, I guess that scanner doesn't meet. > > > > If the scanners meet, we should rely on deferred compaction to suppress > further attempts in the near future. This is outside the scope of this > fix. > > > We enter the compaction with enough free memory so stop in > > isolate_freepages_block() should be unlikely event but your number > > shows that it happens frequently? > > > > It's not the only reason why freepages will be returned to the buddy > allocator: if locks become contended because we are spending too much time > compacting memory, we can persistently get free pages returned to the end > of the zone and then repeatedly iterate >100GB of memory on every call to > isolate_freepages(), which makes its own contended checks fire more often. > This patch is only an attempt to prevent lenghty iterations when we have > recently scanned the memory and found freepages to not be isolatable. Hmm... I can't understand how the freepage scanner is persistently expensive. After the freepage scanner gets freepages, migration isn't stopped until either the migratable pages or the freepages run out. If there are no freepages, the above problem doesn't happen, so I assume there are no migratable pages left after calling migrate_pages().
If there are no migratable pages, it means the freepages were used by migration. Some time later, the freepages in that pageblock are exhausted by migration and the freepage scanner will move to the next pageblock. So I cannot understand how it is persistently expensive. Am I missing something? If it is caused by too many freepages being isolated at once (up to the number of migratable pages), we can modify the logic to stop isolating freepages when the pageblock changes and the freepage scanner already has one or more freepages. > > > In addition, I worry that your previous patch that makes > > isolate_freepages_block() stop when watermark doesn't meet would cause > > compaction non-progress. Amount of free memory can be flutuated so > > watermark fail would be temporaral. We need to break compaction in > > this case? It would decrease compaction success rate if there is a > > memory hogger in parallel. Any idea? > > > > In my opinion, which I think is quite well known by now, the compaction > freeing scanner shouldn't be checking _any_ watermark. The end result is > that we're migrating memory, not allocating additional memory; determining > if compaction should be done is best left lower on the stack. Hmm... if there are many parallel compactors and we have no watermark check at all, they could consume all of the emergency reserves. It can be mitigated by isolating just one freepage in this case, but the potential risk would not disappear. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
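The mitigation Joonsoo floats above, stopping isolation once the scan crosses into a new pageblock while already holding at least one page, can be sketched as follows (the pfn list and the pageblock shift are hypothetical illustrations, not taken from isolate_freepages_block()):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pageblock size: 512 pages (2MB with 4KB pages). */
#define PAGEBLOCK_SHIFT 9

/*
 * Walk a sorted list of candidate free pfns and stop isolating as soon
 * as we cross into a different pageblock while already holding at least
 * one freepage.  Returns the number of pfns isolated.
 */
static size_t isolate_until_block_change(const unsigned long *pfns, size_t n)
{
    size_t taken = 0;

    for (size_t i = 0; i < n; i++) {
        if (taken > 0 &&
            (pfns[i] >> PAGEBLOCK_SHIFT) != (pfns[0] >> PAGEBLOCK_SHIFT))
            break;
        taken++;
    }
    return taken;
}
```

The design intent is that a single compaction pass never pulls freepages from more than one pageblock beyond what it has already started, bounding how much isolated memory can sit on the freelist at once.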