All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
@ 2016-03-23  9:40 ` Vlastimil Babka
  0 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2016-03-23  9:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Vlastimil Babka, Aneesh Kumar K.V,
	Hanjun Guo, Johannes Weiner, Joonsoo Kim, Kirill A. Shutemov,
	Laura Abbott, Mel Gorman, Michal Nazarewicz, Minchan Kim,
	Naoya Horiguchi, stable, Yasuaki Ishimatsu, Zhang Yanfei

Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:

> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:          195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:         6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal:       16342016 kB
> MemFree:        22367268 kB
> MemAvailable:   22370528 kB

Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:

> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> partialy isolated block exists. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up MAX_ORDER
> aligned block and I can think following scenario because pageblock
> (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.

This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
since it doesn't hold the zone->lock between pageblocks, a race window does
exist.

It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
and non-CMA pageblocks. This should be prevented in __free_one_page() since
commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
pageblock"). However, we only check the migratetype of the pageblock where
buddy merging has been initiated, not the migratetype of the buddy pageblock
(or group of pageblocks) which can be MIGRATE_ISOLATE.

Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath and
bloat-o-meter has shown significant code bloat (the function is inline).

This patch reduces the bloat at some expense of more complicated code. The
buddy-merging while-loop in __free_one_page() is initially bounded to
pageblock_border and without any migratetype checks. The checks are placed
outside, bumping the max_order if merging is allowed, and returning to the
while-loop with a statement which can't be possibly considered harmful.

This fixes the accounting bug and also removes the arguably weird state in the
original commit 3c605096d315 where buddies could be left unmerged.

Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Link: https://lkml.org/lkml/2016/3/2/280
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Debugged-by: Laura Abbott <labbott@redhat.com>
Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c46b75d14b6f..b9785af4fae2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -683,34 +683,28 @@ static inline void __free_one_page(struct page *page,
 	unsigned long combined_idx;
 	unsigned long uninitialized_var(buddy_idx);
 	struct page *buddy;
-	unsigned int max_order = MAX_ORDER;
+	unsigned int max_order;
+
+	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
 
 	VM_BUG_ON(!zone_is_initialized(zone));
 	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
 
 	VM_BUG_ON(migratetype == -1);
-	if (is_migrate_isolate(migratetype)) {
-		/*
-		 * We restrict max order of merging to prevent merge
-		 * between freepages on isolate pageblock and normal
-		 * pageblock. Without this, pageblock isolation
-		 * could cause incorrect freepage accounting.
-		 */
-		max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
-	} else {
+	if (likely(!is_migrate_isolate(migratetype)))
 		__mod_zone_freepage_state(zone, 1 << order, migratetype);
-	}
 
-	page_idx = pfn & ((1 << max_order) - 1);
+	page_idx = pfn & ((1 << MAX_ORDER) - 1);
 
 	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
+continue_merging:
 	while (order < max_order - 1) {
 		buddy_idx = __find_buddy_index(page_idx, order);
 		buddy = page + (buddy_idx - page_idx);
 		if (!page_is_buddy(page, buddy, order))
-			break;
+			goto done_merging;
 		/*
 		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
 		 * merge with it and move up one order.
@@ -727,6 +721,32 @@ static inline void __free_one_page(struct page *page,
 		page_idx = combined_idx;
 		order++;
 	}
+	if (max_order < MAX_ORDER) {
+		/* If we are here, it means order is >= pageblock_order.
+		 * We want to prevent merge between freepages on isolate
+		 * pageblock and normal pageblock. Without this, pageblock
+		 * isolation could cause incorrect freepage or CMA accounting.
+		 *
+		 * We don't want to hit this code for the more frequent
+		 * low-order merging.
+		 */
+		if (unlikely(has_isolate_pageblock(zone))) {
+			int buddy_mt;
+
+			buddy_idx = __find_buddy_index(page_idx, order);
+			buddy = page + (buddy_idx - page_idx);
+			buddy_mt = get_pageblock_migratetype(buddy);
+
+			if (migratetype != buddy_mt
+					&& (is_migrate_isolate(migratetype) ||
+						is_migrate_isolate(buddy_mt)))
+				goto done_merging;
+		}
+		max_order++;
+		goto continue_merging;
+	}
+
+done_merging:
 	set_page_order(page, order);
 
 	/*
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
@ 2016-03-23  9:40 ` Vlastimil Babka
  0 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2016-03-23  9:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Vlastimil Babka, Aneesh Kumar K.V,
	Hanjun Guo, Johannes Weiner, Joonsoo Kim, Kirill A. Shutemov,
	Laura Abbott, Mel Gorman, Michal Nazarewicz, Minchan Kim,
	Naoya Horiguchi, stable, Yasuaki Ishimatsu, Zhang Yanfei

Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:

> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:          195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:         6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal:       16342016 kB
> MemFree:        22367268 kB
> MemAvailable:   22370528 kB

Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:

> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> partialy isolated block exists. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up MAX_ORDER
> aligned block and I can think following scenario because pageblock
> (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.

This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
since it doesn't hold the zone->lock between pageblocks, a race window does
exist.

It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
and non-CMA pageblocks. This should be prevented in __free_one_page() since
commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
pageblock"). However, we only check the migratetype of the pageblock where
buddy merging has been initiated, not the migratetype of the buddy pageblock
(or group of pageblocks) which can be MIGRATE_ISOLATE.

Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath and
bloat-o-meter has shown significant code bloat (the function is inline).

This patch reduces the bloat at some expense of more complicated code. The
buddy-merging while-loop in __free_one_page() is initially bounded to
pageblock_border and without any migratetype checks. The checks are placed
outside, bumping the max_order if merging is allowed, and returning to the
while-loop with a statement which can't be possibly considered harmful.

This fixes the accounting bug and also removes the arguably weird state in the
original commit 3c605096d315 where buddies could be left unmerged.

Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Link: https://lkml.org/lkml/2016/3/2/280
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Debugged-by: Laura Abbott <labbott@redhat.com>
Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c46b75d14b6f..b9785af4fae2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -683,34 +683,28 @@ static inline void __free_one_page(struct page *page,
 	unsigned long combined_idx;
 	unsigned long uninitialized_var(buddy_idx);
 	struct page *buddy;
-	unsigned int max_order = MAX_ORDER;
+	unsigned int max_order;
+
+	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
 
 	VM_BUG_ON(!zone_is_initialized(zone));
 	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
 
 	VM_BUG_ON(migratetype == -1);
-	if (is_migrate_isolate(migratetype)) {
-		/*
-		 * We restrict max order of merging to prevent merge
-		 * between freepages on isolate pageblock and normal
-		 * pageblock. Without this, pageblock isolation
-		 * could cause incorrect freepage accounting.
-		 */
-		max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
-	} else {
+	if (likely(!is_migrate_isolate(migratetype)))
 		__mod_zone_freepage_state(zone, 1 << order, migratetype);
-	}
 
-	page_idx = pfn & ((1 << max_order) - 1);
+	page_idx = pfn & ((1 << MAX_ORDER) - 1);
 
 	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
+continue_merging:
 	while (order < max_order - 1) {
 		buddy_idx = __find_buddy_index(page_idx, order);
 		buddy = page + (buddy_idx - page_idx);
 		if (!page_is_buddy(page, buddy, order))
-			break;
+			goto done_merging;
 		/*
 		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
 		 * merge with it and move up one order.
@@ -727,6 +721,32 @@ static inline void __free_one_page(struct page *page,
 		page_idx = combined_idx;
 		order++;
 	}
+	if (max_order < MAX_ORDER) {
+		/* If we are here, it means order is >= pageblock_order.
+		 * We want to prevent merge between freepages on isolate
+		 * pageblock and normal pageblock. Without this, pageblock
+		 * isolation could cause incorrect freepage or CMA accounting.
+		 *
+		 * We don't want to hit this code for the more frequent
+		 * low-order merging.
+		 */
+		if (unlikely(has_isolate_pageblock(zone))) {
+			int buddy_mt;
+
+			buddy_idx = __find_buddy_index(page_idx, order);
+			buddy = page + (buddy_idx - page_idx);
+			buddy_mt = get_pageblock_migratetype(buddy);
+
+			if (migratetype != buddy_mt
+					&& (is_migrate_isolate(migratetype) ||
+						is_migrate_isolate(buddy_mt)))
+				goto done_merging;
+		}
+		max_order++;
+		goto continue_merging;
+	}
+
+done_merging:
 	set_page_order(page, order);
 
 	/*
-- 
2.7.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
  2016-03-23  9:40 ` Vlastimil Babka
  (?)
@ 2016-03-23 11:38   ` Hanjun Guo
  -1 siblings, 0 replies; 10+ messages in thread
From: Hanjun Guo @ 2016-03-23 11:38 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Aneesh Kumar K.V, Johannes Weiner,
	Joonsoo Kim, Kirill A. Shutemov, Laura Abbott, Mel Gorman,
	Michal Nazarewicz, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On 2016/3/23 17:40, Vlastimil Babka wrote:
> Hanjun Guo has reported that a CMA stress test causes broken accounting of
> CMA and free pages:
>
>> Before the test, I got:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:          195044 kB
>>
>>
>> After running the test:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:         6602584 kB
>>
>> So the freed CMA memory is more than total..
>>
>> Also the the MemFree is more than mem total:
>>
>> -bash-4.3# cat /proc/meminfo
>> MemTotal:       16342016 kB
>> MemFree:        22367268 kB
>> MemAvailable:   22370528 kB
> Laura Abbott has confirmed the issue and suspected the freepage accounting
> rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
> caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
> pageblocks:
>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>> partialy isolated block exists. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up MAX_ORDER
>> aligned block and I can think following scenario because pageblock
>> (un)isolation would be done one by one.
>>
>> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>> MIGRATE_ISOLATE, respectively.
>>
>> CC -> IC -> II (Isolation)
>> II -> CI -> CC (Un-isolation)
>>
>> If some pages are freed at this intermediate state such as IC or CI,
>> that page could be merged to the other page that is resident on
>> different type of pageblock and it will cause wrong freepage count.
> This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
> since it doesn't hold the zone->lock between pageblocks, a race window does
> exist.
>
> It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
> and non-CMA pageblocks. This should be prevented in __free_one_page() since
> commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
> pageblock"). However, we only check the migratetype of the pageblock where
> buddy merging has been initiated, not the migratetype of the buddy pageblock
> (or group of pageblocks) which can be MIGRATE_ISOLATE.
>
> Joonsoo has suggested checking for buddy migratetype as part of
> page_is_buddy(), but that would add extra checks in allocator hotpath and
> bloat-o-meter has shown significant code bloat (the function is inline).
>
> This patch reduces the bloat at some expense of more complicated code. The
> buddy-merging while-loop in __free_one_page() is initially bounded to
> pageblock_border and without any migratetype checks. The checks are placed
> outside, bumping the max_order if merging is allowed, and returning to the
> while-loop with a statement which can't be possibly considered harmful.
>
> This fixes the accounting bug and also removes the arguably weird state in the
> original commit 3c605096d315 where buddies could be left unmerged.
>
> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> Link: https://lkml.org/lkml/2016/3/2/280
> Reported-by: Hanjun Guo <guohanjun@huawei.com>

With the same stress test case (alloc/free cma) running for more than
one hour, the bug I reported is gone.

Tested-by: Hanjun Guo <guohanjun@huawei.com>

Thanks for debugging!
Hanjun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
@ 2016-03-23 11:38   ` Hanjun Guo
  0 siblings, 0 replies; 10+ messages in thread
From: Hanjun Guo @ 2016-03-23 11:38 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Aneesh Kumar K.V, Johannes Weiner,
	Joonsoo Kim, Kirill A. Shutemov, Laura Abbott, Mel Gorman,
	Michal Nazarewicz, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On 2016/3/23 17:40, Vlastimil Babka wrote:
> Hanjun Guo has reported that a CMA stress test causes broken accounting of
> CMA and free pages:
>
>> Before the test, I got:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:          195044 kB
>>
>>
>> After running the test:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:         6602584 kB
>>
>> So the freed CMA memory is more than total..
>>
>> Also the the MemFree is more than mem total:
>>
>> -bash-4.3# cat /proc/meminfo
>> MemTotal:       16342016 kB
>> MemFree:        22367268 kB
>> MemAvailable:   22370528 kB
> Laura Abbott has confirmed the issue and suspected the freepage accounting
> rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
> caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
> pageblocks:
>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>> partialy isolated block exists. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up MAX_ORDER
>> aligned block and I can think following scenario because pageblock
>> (un)isolation would be done one by one.
>>
>> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>> MIGRATE_ISOLATE, respectively.
>>
>> CC -> IC -> II (Isolation)
>> II -> CI -> CC (Un-isolation)
>>
>> If some pages are freed at this intermediate state such as IC or CI,
>> that page could be merged to the other page that is resident on
>> different type of pageblock and it will cause wrong freepage count.
> This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
> since it doesn't hold the zone->lock between pageblocks, a race window does
> exist.
>
> It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
> and non-CMA pageblocks. This should be prevented in __free_one_page() since
> commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
> pageblock"). However, we only check the migratetype of the pageblock where
> buddy merging has been initiated, not the migratetype of the buddy pageblock
> (or group of pageblocks) which can be MIGRATE_ISOLATE.
>
> Joonsoo has suggested checking for buddy migratetype as part of
> page_is_buddy(), but that would add extra checks in allocator hotpath and
> bloat-o-meter has shown significant code bloat (the function is inline).
>
> This patch reduces the bloat at some expense of more complicated code. The
> buddy-merging while-loop in __free_one_page() is initially bounded to
> pageblock_border and without any migratetype checks. The checks are placed
> outside, bumping the max_order if merging is allowed, and returning to the
> while-loop with a statement which can't be possibly considered harmful.
>
> This fixes the accounting bug and also removes the arguably weird state in the
> original commit 3c605096d315 where buddies could be left unmerged.
>
> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> Link: https://lkml.org/lkml/2016/3/2/280
> Reported-by: Hanjun Guo <guohanjun@huawei.com>

With the same stress test case (alloc/free cma) running for more than
one hour, the bug I reported is gone.

Tested-by: Hanjun Guo <guohanjun@huawei.com>

Thanks for debugging!
Hanjun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
@ 2016-03-23 11:38   ` Hanjun Guo
  0 siblings, 0 replies; 10+ messages in thread
From: Hanjun Guo @ 2016-03-23 11:38 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Aneesh Kumar K.V, Johannes Weiner,
	Joonsoo Kim, Kirill A. Shutemov, Laura Abbott, Mel Gorman,
	Michal Nazarewicz, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On 2016/3/23 17:40, Vlastimil Babka wrote:
> Hanjun Guo has reported that a CMA stress test causes broken accounting of
> CMA and free pages:
>
>> Before the test, I got:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:          195044 kB
>>
>>
>> After running the test:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:         6602584 kB
>>
>> So the freed CMA memory is more than total..
>>
>> Also the the MemFree is more than mem total:
>>
>> -bash-4.3# cat /proc/meminfo
>> MemTotal:       16342016 kB
>> MemFree:        22367268 kB
>> MemAvailable:   22370528 kB
> Laura Abbott has confirmed the issue and suspected the freepage accounting
> rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
> caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
> pageblocks:
>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>> partialy isolated block exists. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up MAX_ORDER
>> aligned block and I can think following scenario because pageblock
>> (un)isolation would be done one by one.
>>
>> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>> MIGRATE_ISOLATE, respectively.
>>
>> CC -> IC -> II (Isolation)
>> II -> CI -> CC (Un-isolation)
>>
>> If some pages are freed at this intermediate state such as IC or CI,
>> that page could be merged to the other page that is resident on
>> different type of pageblock and it will cause wrong freepage count.
> This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
> since it doesn't hold the zone->lock between pageblocks, a race window does
> exist.
>
> It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
> and non-CMA pageblocks. This should be prevented in __free_one_page() since
> commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
> pageblock"). However, we only check the migratetype of the pageblock where
> buddy merging has been initiated, not the migratetype of the buddy pageblock
> (or group of pageblocks) which can be MIGRATE_ISOLATE.
>
> Joonsoo has suggested checking for buddy migratetype as part of
> page_is_buddy(), but that would add extra checks in allocator hotpath and
> bloat-o-meter has shown significant code bloat (the function is inline).
>
> This patch reduces the bloat at some expense of more complicated code. The
> buddy-merging while-loop in __free_one_page() is initially bounded to
> pageblock_border and without any migratetype checks. The checks are placed
> outside, bumping the max_order if merging is allowed, and returning to the
> while-loop with a statement which can't be possibly considered harmful.
>
> This fixes the accounting bug and also removes the arguably weird state in the
> original commit 3c605096d315 where buddies could be left unmerged.
>
> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> Link: https://lkml.org/lkml/2016/3/2/280
> Reported-by: Hanjun Guo <guohanjun@huawei.com>

With the same stress test case (alloc/free cma) running for more than
one hour, the bug I reported is gone.

Tested-by: Hanjun Guo <guohanjun@huawei.com>

Thanks for debugging!
Hanjun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
  2016-03-23 11:38   ` Hanjun Guo
@ 2016-03-23 12:38     ` Vlastimil Babka
  -1 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2016-03-23 12:38 UTC (permalink / raw)
  To: Hanjun Guo, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Aneesh Kumar K.V, Johannes Weiner,
	Joonsoo Kim, Kirill A. Shutemov, Laura Abbott, Mel Gorman,
	Michal Nazarewicz, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On 03/23/2016 12:38 PM, Hanjun Guo wrote:
> On 2016/3/23 17:40, Vlastimil Babka wrote:
>> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
>> Link: https://lkml.org/lkml/2016/3/2/280
>> Reported-by: Hanjun Guo <guohanjun@huawei.com>
>
> With the same stress test case (alloc/free cma) running for more than
> one hour, the bug I reported is gone.
>
> Tested-by: Hanjun Guo <guohanjun@huawei.com>

I wanted to add that, but forgot. Thanks!
I also forgot to transfer:

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

> Thanks for debugging!
> Hanjun
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
@ 2016-03-23 12:38     ` Vlastimil Babka
  0 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2016-03-23 12:38 UTC (permalink / raw)
  To: Hanjun Guo, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Aneesh Kumar K.V, Johannes Weiner,
	Joonsoo Kim, Kirill A. Shutemov, Laura Abbott, Mel Gorman,
	Michal Nazarewicz, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On 03/23/2016 12:38 PM, Hanjun Guo wrote:
> On 2016/3/23 17:40, Vlastimil Babka wrote:
>> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
>> Link: https://lkml.org/lkml/2016/3/2/280
>> Reported-by: Hanjun Guo <guohanjun@huawei.com>
>
> With the same stress test case (alloc/free cma) running for more than
> one hour, the bug I reported is gone.
>
> Tested-by: Hanjun Guo <guohanjun@huawei.com>

I wanted to add that, but forgot. Thanks!
I also forgot to transfer:

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

> Thanks for debugging!
> Hanjun
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
  2016-03-23  9:40 ` Vlastimil Babka
  (?)
@ 2016-03-29 14:13   ` Michal Nazarewicz
  -1 siblings, 0 replies; 10+ messages in thread
From: Michal Nazarewicz @ 2016-03-29 14:13 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Vlastimil Babka, Aneesh Kumar K.V,
	Hanjun Guo, Johannes Weiner, Joonsoo Kim, Kirill A. Shutemov,
	Laura Abbott, Mel Gorman, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On Wed, Mar 23 2016, Vlastimil Babka wrote:
> Hanjun Guo has reported that a CMA stress test causes broken accounting of
> CMA and free pages:
>
>> Before the test, I got:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:          195044 kB
>>
>>
>> After running the test:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:         6602584 kB
>>
>> So the freed CMA memory is more than total..
>>
>> Also the the MemFree is more than mem total:
>>
>> -bash-4.3# cat /proc/meminfo
>> MemTotal:       16342016 kB
>> MemFree:        22367268 kB
>> MemAvailable:   22370528 kB
>
> Laura Abbott has confirmed the issue and suspected the freepage accounting
> rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
> caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
> pageblocks:
>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>> partialy isolated block exists. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up MAX_ORDER
>> aligned block and I can think following scenario because pageblock
>> (un)isolation would be done one by one.
>>
>> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>> MIGRATE_ISOLATE, respectively.
>>
>> CC -> IC -> II (Isolation)
>> II -> CI -> CC (Un-isolation)
>>
>> If some pages are freed at this intermediate state such as IC or CI,
>> that page could be merged to the other page that is resident on
>> different type of pageblock and it will cause wrong freepage count.
>
> This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
> since it doesn't hold the zone->lock between pageblocks, a race window does
> exist.
>
> It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
> and non-CMA pageblocks. This should be prevented in __free_one_page() since
> commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
> pageblock"). However, we only check the migratetype of the pageblock where
> buddy merging has been initiated, not the migratetype of the buddy pageblock
> (or group of pageblocks) which can be MIGRATE_ISOLATE.
>
> Joonsoo has suggested checking for buddy migratetype as part of
> page_is_buddy(), but that would add extra checks in allocator hotpath and
> bloat-o-meter has shown significant code bloat (the function is inline).
>
> This patch reduces the bloat at some expense of more complicated code. The
> buddy-merging while-loop in __free_one_page() is initially bounded to
> pageblock_border and without any migratetype checks. The checks are placed
> outside, bumping the max_order if merging is allowed, and returning to the
> while-loop with a statement which can't be possibly considered harmful.
>
> This fixes the accounting bug and also removes the arguably weird state in the
> original commit 3c605096d315 where buddies could be left unmerged.
>
> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> Link: https://lkml.org/lkml/2016/3/2/280
> Reported-by: Hanjun Guo <guohanjun@huawei.com>
> Debugged-by: Laura Abbott <labbott@redhat.com>
> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: <stable@vger.kernel.org> # v3.18+
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I wonder if with this change,

	ret = start_isolate_page_range(pfn_max_align_down(start),
				       pfn_max_align_up(end), migratetype,
				       false);

in alloc_contig_range could be loosen up to align to pageblocks instead
of having to use max(pageblock, max_page).  It feels that it should be
possible, but on the other hand, I’m not certain how buddy allocator
will behave if a max_order page spans MIGRATE_CMA and MIGRATE_ISOLATE
pageblocks.  I guess start_isolate_page_range would have to split such
pages?

> ---
>  mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c46b75d14b6f..b9785af4fae2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -683,34 +683,28 @@ static inline void __free_one_page(struct page *page,
>  	unsigned long combined_idx;
>  	unsigned long uninitialized_var(buddy_idx);
>  	struct page *buddy;
> -	unsigned int max_order = MAX_ORDER;
> +	unsigned int max_order;
> +
> +	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
>  
>  	VM_BUG_ON(!zone_is_initialized(zone));
>  	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>  
>  	VM_BUG_ON(migratetype == -1);
> -	if (is_migrate_isolate(migratetype)) {
> -		/*
> -		 * We restrict max order of merging to prevent merge
> -		 * between freepages on isolate pageblock and normal
> -		 * pageblock. Without this, pageblock isolation
> -		 * could cause incorrect freepage accounting.
> -		 */
> -		max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
> -	} else {
> +	if (likely(!is_migrate_isolate(migratetype)))
>  		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -	}
>  
> -	page_idx = pfn & ((1 << max_order) - 1);
> +	page_idx = pfn & ((1 << MAX_ORDER) - 1);
>  
>  	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
>  	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>  
> +continue_merging:
>  	while (order < max_order - 1) {
>  		buddy_idx = __find_buddy_index(page_idx, order);
>  		buddy = page + (buddy_idx - page_idx);
>  		if (!page_is_buddy(page, buddy, order))
> -			break;
> +			goto done_merging;
>  		/*
>  		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
>  		 * merge with it and move up one order.
> @@ -727,6 +721,32 @@ static inline void __free_one_page(struct page *page,
>  		page_idx = combined_idx;
>  		order++;
>  	}
> +	if (max_order < MAX_ORDER) {
> +		/* If we are here, it means order is >= pageblock_order.
> +		 * We want to prevent merge between freepages on isolate
> +		 * pageblock and normal pageblock. Without this, pageblock
> +		 * isolation could cause incorrect freepage or CMA accounting.
> +		 *
> +		 * We don't want to hit this code for the more frequent
> +		 * low-order merging.
> +		 */
> +		if (unlikely(has_isolate_pageblock(zone))) {
> +			int buddy_mt;
> +
> +			buddy_idx = __find_buddy_index(page_idx, order);
> +			buddy = page + (buddy_idx - page_idx);
> +			buddy_mt = get_pageblock_migratetype(buddy);
> +
> +			if (migratetype != buddy_mt
> +					&& (is_migrate_isolate(migratetype) ||
> +						is_migrate_isolate(buddy_mt)))
> +				goto done_merging;
> +		}
> +		max_order++;
> +		goto continue_merging;
> +	}
> +
> +done_merging:
>  	set_page_order(page, order);
>  
>  	/*
> -- 
> 2.7.3
>

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
@ 2016-03-29 14:13   ` Michal Nazarewicz
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Nazarewicz @ 2016-03-29 14:13 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Vlastimil Babka, Aneesh Kumar K.V,
	Hanjun Guo, Johannes Weiner, Joonsoo Kim, Kirill A. Shutemov,
	Laura Abbott, Mel Gorman, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On Wed, Mar 23 2016, Vlastimil Babka wrote:
> Hanjun Guo has reported that a CMA stress test causes broken accounting of
> CMA and free pages:
>
>> Before the test, I got:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:          195044 kB
>>
>>
>> After running the test:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:         6602584 kB
>>
>> So the freed CMA memory is more than total..
>>
>> Also the the MemFree is more than mem total:
>>
>> -bash-4.3# cat /proc/meminfo
>> MemTotal:       16342016 kB
>> MemFree:        22367268 kB
>> MemAvailable:   22370528 kB
>
> Laura Abbott has confirmed the issue and suspected the freepage accounting
> rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
> caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
> pageblocks:
>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>> partialy isolated block exists. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up MAX_ORDER
>> aligned block and I can think following scenario because pageblock
>> (un)isolation would be done one by one.
>>
>> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>> MIGRATE_ISOLATE, respectively.
>>
>> CC -> IC -> II (Isolation)
>> II -> CI -> CC (Un-isolation)
>>
>> If some pages are freed at this intermediate state such as IC or CI,
>> that page could be merged to the other page that is resident on
>> different type of pageblock and it will cause wrong freepage count.
>
> This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
> since it doesn't hold the zone->lock between pageblocks, a race window does
> exist.
>
> It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
> and non-CMA pageblocks. This should be prevented in __free_one_page() since
> commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
> pageblock"). However, we only check the migratetype of the pageblock where
> buddy merging has been initiated, not the migratetype of the buddy pageblock
> (or group of pageblocks) which can be MIGRATE_ISOLATE.
>
> Joonsoo has suggested checking for buddy migratetype as part of
> page_is_buddy(), but that would add extra checks in allocator hotpath and
> bloat-o-meter has shown significant code bloat (the function is inline).
>
> This patch reduces the bloat at some expense of more complicated code. The
> buddy-merging while-loop in __free_one_page() is initially bounded to
> pageblock_border and without any migratetype checks. The checks are placed
> outside, bumping the max_order if merging is allowed, and returning to the
> while-loop with a statement which can't be possibly considered harmful.
>
> This fixes the accounting bug and also removes the arguably weird state in the
> original commit 3c605096d315 where buddies could be left unmerged.
>
> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> Link: https://lkml.org/lkml/2016/3/2/280
> Reported-by: Hanjun Guo <guohanjun@huawei.com>
> Debugged-by: Laura Abbott <labbott@redhat.com>
> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: <stable@vger.kernel.org> # v3.18+
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I wonder if with this change,

	ret = start_isolate_page_range(pfn_max_align_down(start),
				       pfn_max_align_up(end), migratetype,
				       false);

in alloc_contig_range could be loosen up to align to pageblocks instead
of having to use max(pageblock, max_page).  It feels that it should be
possible, but on the other hand, I’m not certain how buddy allocator
will behave if a max_order page spans MIGRATE_CMA and MIGRATE_ISOLATE
pageblocks.  I guess start_isolate_page_range would have to split such
pages?

> ---
>  mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c46b75d14b6f..b9785af4fae2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -683,34 +683,28 @@ static inline void __free_one_page(struct page *page,
>  	unsigned long combined_idx;
>  	unsigned long uninitialized_var(buddy_idx);
>  	struct page *buddy;
> -	unsigned int max_order = MAX_ORDER;
> +	unsigned int max_order;
> +
> +	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
>  
>  	VM_BUG_ON(!zone_is_initialized(zone));
>  	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>  
>  	VM_BUG_ON(migratetype == -1);
> -	if (is_migrate_isolate(migratetype)) {
> -		/*
> -		 * We restrict max order of merging to prevent merge
> -		 * between freepages on isolate pageblock and normal
> -		 * pageblock. Without this, pageblock isolation
> -		 * could cause incorrect freepage accounting.
> -		 */
> -		max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
> -	} else {
> +	if (likely(!is_migrate_isolate(migratetype)))
>  		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -	}
>  
> -	page_idx = pfn & ((1 << max_order) - 1);
> +	page_idx = pfn & ((1 << MAX_ORDER) - 1);
>  
>  	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
>  	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>  
> +continue_merging:
>  	while (order < max_order - 1) {
>  		buddy_idx = __find_buddy_index(page_idx, order);
>  		buddy = page + (buddy_idx - page_idx);
>  		if (!page_is_buddy(page, buddy, order))
> -			break;
> +			goto done_merging;
>  		/*
>  		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
>  		 * merge with it and move up one order.
> @@ -727,6 +721,32 @@ static inline void __free_one_page(struct page *page,
>  		page_idx = combined_idx;
>  		order++;
>  	}
> +	if (max_order < MAX_ORDER) {
> +		/* If we are here, it means order is >= pageblock_order.
> +		 * We want to prevent merge between freepages on isolate
> +		 * pageblock and normal pageblock. Without this, pageblock
> +		 * isolation could cause incorrect freepage or CMA accounting.
> +		 *
> +		 * We don't want to hit this code for the more frequent
> +		 * low-order merging.
> +		 */
> +		if (unlikely(has_isolate_pageblock(zone))) {
> +			int buddy_mt;
> +
> +			buddy_idx = __find_buddy_index(page_idx, order);
> +			buddy = page + (buddy_idx - page_idx);
> +			buddy_mt = get_pageblock_migratetype(buddy);
> +
> +			if (migratetype != buddy_mt
> +					&& (is_migrate_isolate(migratetype) ||
> +						is_migrate_isolate(buddy_mt)))
> +				goto done_merging;
> +		}
> +		max_order++;
> +		goto continue_merging;
> +	}
> +
> +done_merging:
>  	set_page_order(page, order);
>  
>  	/*
> -- 
> 2.7.3
>

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks
@ 2016-03-29 14:13   ` Michal Nazarewicz
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Nazarewicz @ 2016-03-29 14:13 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-mm, linux-kernel, Leizhen, Sasha Levin, qiuxishi,
	Catalin Marinas, Will Deacon, Arnd Bergmann, dingtinahong,
	chenjie6, Lucas Stach, Aneesh Kumar K.V, Hanjun Guo,
	Johannes Weiner, Joonsoo Kim, Kirill A. Shutemov, Laura Abbott,
	Mel Gorman, Minchan Kim, Naoya Horiguchi, stable,
	Yasuaki Ishimatsu, Zhang Yanfei

On Wed, Mar 23 2016, Vlastimil Babka wrote:
> Hanjun Guo has reported that a CMA stress test causes broken accounting of
> CMA and free pages:
>
>> Before the test, I got:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:          195044 kB
>>
>>
>> After running the test:
>> -bash-4.3# cat /proc/meminfo | grep Cma
>> CmaTotal:         204800 kB
>> CmaFree:         6602584 kB
>>
>> So the freed CMA memory is more than total..
>>
>> Also the the MemFree is more than mem total:
>>
>> -bash-4.3# cat /proc/meminfo
>> MemTotal:       16342016 kB
>> MemFree:        22367268 kB
>> MemAvailable:   22370528 kB
>
> Laura Abbott has confirmed the issue and suspected the freepage accounting
> rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
> caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
> pageblocks:
>
>> CMA isolates MAX_ORDER aligned blocks, but, during the process,
>> partialy isolated block exists. If MAX_ORDER is 11 and
>> pageblock_order is 9, two pageblocks make up MAX_ORDER
>> aligned block and I can think following scenario because pageblock
>> (un)isolation would be done one by one.
>>
>> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
>> MIGRATE_ISOLATE, respectively.
>>
>> CC -> IC -> II (Isolation)
>> II -> CI -> CC (Un-isolation)
>>
>> If some pages are freed at this intermediate state such as IC or CI,
>> that page could be merged to the other page that is resident on
>> different type of pageblock and it will cause wrong freepage count.
>
> This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
> since it doesn't hold the zone->lock between pageblocks, a race window does
> exist.
>
> It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
> and non-CMA pageblocks. This should be prevented in __free_one_page() since
> commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated
> pageblock"). However, we only check the migratetype of the pageblock where
> buddy merging has been initiated, not the migratetype of the buddy pageblock
> (or group of pageblocks) which can be MIGRATE_ISOLATE.
>
> Joonsoo has suggested checking for buddy migratetype as part of
> page_is_buddy(), but that would add extra checks in allocator hotpath and
> bloat-o-meter has shown significant code bloat (the function is inline).
>
> This patch reduces the bloat at some expense of more complicated code. The
> buddy-merging while-loop in __free_one_page() is initially bounded to
> pageblock_border and without any migratetype checks. The checks are placed
> outside, bumping the max_order if merging is allowed, and returning to the
> while-loop with a statement which can't be possibly considered harmful.
>
> This fixes the accounting bug and also removes the arguably weird state in the
> original commit 3c605096d315 where buddies could be left unmerged.
>
> Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
> Link: https://lkml.org/lkml/2016/3/2/280
> Reported-by: Hanjun Guo <guohanjun@huawei.com>
> Debugged-by: Laura Abbott <labbott@redhat.com>
> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: <stable@vger.kernel.org> # v3.18+
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I wonder if with this change,

	ret = start_isolate_page_range(pfn_max_align_down(start),
				       pfn_max_align_up(end), migratetype,
				       false);

in alloc_contig_range could be loosen up to align to pageblocks instead
of having to use max(pageblock, max_page).  It feels that it should be
possible, but on the other hand, I’m not certain how buddy allocator
will behave if a max_order page spans MIGRATE_CMA and MIGRATE_ISOLATE
pageblocks.  I guess start_isolate_page_range would have to split such
pages?

> ---
>  mm/page_alloc.c | 46 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c46b75d14b6f..b9785af4fae2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -683,34 +683,28 @@ static inline void __free_one_page(struct page *page,
>  	unsigned long combined_idx;
>  	unsigned long uninitialized_var(buddy_idx);
>  	struct page *buddy;
> -	unsigned int max_order = MAX_ORDER;
> +	unsigned int max_order;
> +
> +	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
>  
>  	VM_BUG_ON(!zone_is_initialized(zone));
>  	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>  
>  	VM_BUG_ON(migratetype == -1);
> -	if (is_migrate_isolate(migratetype)) {
> -		/*
> -		 * We restrict max order of merging to prevent merge
> -		 * between freepages on isolate pageblock and normal
> -		 * pageblock. Without this, pageblock isolation
> -		 * could cause incorrect freepage accounting.
> -		 */
> -		max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);
> -	} else {
> +	if (likely(!is_migrate_isolate(migratetype)))
>  		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -	}
>  
> -	page_idx = pfn & ((1 << max_order) - 1);
> +	page_idx = pfn & ((1 << MAX_ORDER) - 1);
>  
>  	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
>  	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>  
> +continue_merging:
>  	while (order < max_order - 1) {
>  		buddy_idx = __find_buddy_index(page_idx, order);
>  		buddy = page + (buddy_idx - page_idx);
>  		if (!page_is_buddy(page, buddy, order))
> -			break;
> +			goto done_merging;
>  		/*
>  		 * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
>  		 * merge with it and move up one order.
> @@ -727,6 +721,32 @@ static inline void __free_one_page(struct page *page,
>  		page_idx = combined_idx;
>  		order++;
>  	}
> +	if (max_order < MAX_ORDER) {
> +		/* If we are here, it means order is >= pageblock_order.
> +		 * We want to prevent merge between freepages on isolate
> +		 * pageblock and normal pageblock. Without this, pageblock
> +		 * isolation could cause incorrect freepage or CMA accounting.
> +		 *
> +		 * We don't want to hit this code for the more frequent
> +		 * low-order merging.
> +		 */
> +		if (unlikely(has_isolate_pageblock(zone))) {
> +			int buddy_mt;
> +
> +			buddy_idx = __find_buddy_index(page_idx, order);
> +			buddy = page + (buddy_idx - page_idx);
> +			buddy_mt = get_pageblock_migratetype(buddy);
> +
> +			if (migratetype != buddy_mt
> +					&& (is_migrate_isolate(migratetype) ||
> +						is_migrate_isolate(buddy_mt)))
> +				goto done_merging;
> +		}
> +		max_order++;
> +		goto continue_merging;
> +	}
> +
> +done_merging:
>  	set_page_order(page, order);
>  
>  	/*
> -- 
> 2.7.3
>

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-03-29 14:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-23  9:40 [PATCH] mm/page_alloc: prevent merging between isolated and other pageblocks Vlastimil Babka
2016-03-23  9:40 ` Vlastimil Babka
2016-03-23 11:38 ` Hanjun Guo
2016-03-23 11:38   ` Hanjun Guo
2016-03-23 11:38   ` Hanjun Guo
2016-03-23 12:38   ` Vlastimil Babka
2016-03-23 12:38     ` Vlastimil Babka
2016-03-29 14:13 ` Michal Nazarewicz
2016-03-29 14:13   ` Michal Nazarewicz
2016-03-29 14:13   ` Michal Nazarewicz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.