* [PATCH v2 0/3] Aggressively allocate the pages on cma reserved memory
@ 2014-05-28  7:04 ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-28  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel

Hello,

This series tries to improve CMA.

CMA was introduced to provide physically contiguous pages at runtime
without permanently reserving a memory area. But the current implementation
effectively behaves like a plain reservation, because allocation from the
cma reserved region only happens as a fallback for MIGRATE_MOVABLE
allocation; we allocate from it only when no other movable pages are left.
In that situation kswapd is easily woken, since unmovable and reclaimable
allocations treat (free pages - free CMA pages) as the system's free memory
on watermark checking, and that value may already be below the high
watermark. Once kswapd starts reclaiming, the fallback allocation rarely
happens anymore.

In my experiment, on a system with 1024 MB of memory and 512 MB reserved
for CMA, kswapd is mostly invoked around the 512 MB free-memory boundary.
Once invoked, kswapd keeps reclaiming until (free pages - free CMA pages)
is above the high watermark, so the free memory reported in meminfo hovers
around the 512 MB boundary consistently.
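
To make the watermark arithmetic concrete, here is a minimal user-space
sketch of the check described above. The helper and all numbers are
hypothetical; only the (free pages - free CMA pages) comparison mirrors
what the allocator does for unmovable/reclaimable requests.

#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified illustration, not kernel code: unmovable and reclaimable
 * allocations ignore free CMA pages when checking the watermark, so
 * kswapd gets woken even though plenty of CMA pages are still free.
 */
static bool watermark_ok(long free_pages, long free_cma, long high_wmark)
{
	return (free_pages - free_cma) > high_wmark;
}

int main(void)
{
	long free_pages = 516L << 8;	/* ~516 MB free, in 4 KB pages */
	long free_cma   = 512L << 8;	/* ~512 MB of it is free CMA   */
	long high_wmark = 8L << 8;	/* hypothetical high watermark */

	if (!watermark_ok(free_pages, free_cma, high_wmark))
		printf("kswapd is woken despite ~516 MB being free\n");
	return 0;
}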

To fix this problem, we should allocate the pages on the cma reserved
memory more aggressively and intelligently. Patch 2 implements the
solution. Patch 1 is a simple optimization which removes a useless retry
loop and patch 3 removes a now-useless alloc flag, so these two are less
important. See patch 2 for a more detailed description.

This patchset is based on v3.15-rc7.

Joonsoo Kim (3):
  CMA: remove redundant retrying code in __alloc_contig_migrate_range
  CMA: aggressively allocate the pages on cma reserved memory when not
    used
  CMA: always treat free cma pages as non-free on watermark checking

 arch/powerpc/kvm/book3s_hv_cma.c |    4 ++
 drivers/base/dma-contiguous.c    |    3 +
 include/linux/gfp.h              |    1 +
 include/linux/mmzone.h           |   14 +++++
 mm/compaction.c                  |    4 --
 mm/internal.h                    |    3 +-
 mm/page_alloc.c                  |  124 +++++++++++++++++++++++++++++++-------
 7 files changed, 125 insertions(+), 28 deletions(-)

-- 
1.7.9.5


* [PATCH v2 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range
  2014-05-28  7:04 ` Joonsoo Kim
@ 2014-05-28  7:04   ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-28  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel

We already have retry logic in migrate_pages(), which retries up to
10 times. So if we keep this retry code in __alloc_contig_migrate_range(),
we would try to migrate an unmigratable page up to 50 times. There is just
one small difference in the -ENOMEM case: migrate_pages() doesn't retry
then, whereas the current __alloc_contig_migrate_range() does. But I don't
think this is a problem, because in that case we would most likely fail
again for the same reason.
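
As a toy model of the retry arithmetic above (hypothetical user-space
code, not part of the patch): before this change the outer loop tolerated
up to 5 failed passes, and each pass calls migrate_pages(), which retries
up to 10 times internally, so an unmigratable page could be attempted
about 50 times; afterwards only the internal retries remain.

#include <stdio.h>

int main(void)
{
	int outer_tries = 5;	/* loop removed by this patch */
	int inner_retries = 10;	/* retry count inside migrate_pages() */

	printf("worst-case attempts before: %d\n", outer_tries * inner_retries);
	printf("worst-case attempts after:  %d\n", inner_retries);
	return 0;
}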

Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..674ade7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6185,7 +6185,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	/* This function is based on compact_zone() from compaction.c. */
 	unsigned long nr_reclaimed;
 	unsigned long pfn = start;
-	unsigned int tries = 0;
 	int ret = 0;
 
 	migrate_prep();
@@ -6204,10 +6203,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				ret = -EINTR;
 				break;
 			}
-			tries = 0;
-		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
-			break;
 		}
 
 		nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
@@ -6216,6 +6211,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 
 		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
 				    0, MIGRATE_SYNC, MR_CMA);
+		if (ret) {
+			ret = ret < 0 ? ret : -EBUSY;
+			break;
+		}
 	}
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
-- 
1.7.9.5



* [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-28  7:04 ` Joonsoo Kim
@ 2014-05-28  7:04   ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-28  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel

CMA was introduced to provide physically contiguous pages at runtime.
For this purpose, it reserves memory at boot time. Although the memory
is reserved, it can still be used to satisfy movable memory allocation
requests. This use case benefits systems that need the CMA reserved
memory only infrequently, and it is one of the main purposes of
introducing CMA.

But there is a problem in the current implementation: it behaves like a
plain memory reservation, and the pages on the cma reserved memory are
hardly ever used for movable memory allocation. This is caused by the
combination of the allocation and reclaim policies.

The pages on the cma reserved memory are allocated only if there is no
other movable memory, that is, as a fallback allocation, so by the time
this fallback starts the system is already under heavy memory pressure.
Even under that pressure movable allocations still succeed easily, since
there are many pages left on the cma reserved memory. But that is not the
case for unmovable and reclaimable allocations, because they can't use
those pages. On watermark checking these allocations regard the system's
free memory as (free pages - free cma pages), that is, free unmovable
pages + free reclaimable pages + free movable pages. Because the movable
pages are already exhausted, the only free pages left are of the unmovable
and reclaimable types, which is a really small amount, so the watermark
check fails. This wakes kswapd to make enough free memory for unmovable
and reclaimable allocations, and kswapd does so. Thus, before we fully
utilize the pages on the cma reserved memory, kswapd starts reclaiming
and tries to push free memory over the high watermark. Its watermark
check also ignores free cma pages, so many movable pages get reclaimed.
After that we again have a lot of movable pages, so the fallback
allocation stops happening. To conclude, the amount of free memory shown
in meminfo, which includes free CMA pages, keeps hovering around 512 MB
if I reserve 512 MB of memory for CMA.

I found this problem with the following experiment.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

CMA reserve:            0 MB            512 MB
Elapsed-time:           225.2           472.5
Average-MemFree:        322490 KB       630839 KB

To solve this problem, I can think of the following two possible solutions.
1. allocate the pages on the cma reserved memory first and, once they are
   exhausted, allocate movable pages.
2. interleaved allocation: allocate a specific amount of memory from the
   cma reserved memory, then from the free movable memory, and repeat.

I tested approach #1 and found a problem. Although free memory in meminfo
can now move around the low watermark, it fluctuates heavily because too
many pages are reclaimed when kswapd is invoked. The reason for this
behaviour is that successively allocated CMA pages sit on the LRU list in
that order and kswapd reclaims them in the same order. Reclaiming them
doesn't help kswapd's watermark check, so too many pages end up being
reclaimed, I guess.

So I implemented approach #2.
One thing to note is that we should not switch the allocation target
(the movable list or cma) on every allocation attempt, since that would
prevent allocated pages from being physically contiguous, which can hurt
the performance of some I/O devices. To avoid this, I keep the same
allocation target for at least pageblock_nr_pages attempts and make the
two counts reflect the ratio of free non-cma pages to free cma pages.
With this approach the system works very smoothly and fully utilizes the
pages on the cma reserved memory.
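
As a concrete illustration of that ratio, here is a small, hypothetical
user-space sketch mirroring the calculation done in
adjust_managed_cma_page_count() below, assuming 4 KB pages and
pageblock_nr_pages = 512 (a 2 MB pageblock); all the numbers are made up.

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES	512	/* 2 MB pageblock with 4 KB pages */

int main(void)
{
	long managed	= 262144;	/* 1024 MB zone, in 4 KB pages */
	long cma	= 131072;	/* 512 MB of it is CMA         */
	long high_wmark	= 2048;		/* hypothetical high watermark */
	long movable	= managed - cma - high_wmark;
	long max_try_movable, max_try_cma;

	if (movable > cma) {
		max_try_movable = movable * PAGEBLOCK_NR_PAGES / cma;
		max_try_cma = PAGEBLOCK_NR_PAGES;
	} else {
		max_try_movable = PAGEBLOCK_NR_PAGES;
		max_try_cma = cma * PAGEBLOCK_NR_PAGES / movable;
	}

	/*
	 * Here movable (129024) < cma (131072), so the allocator would
	 * take 512 pages from the movable lists, then ~520 pages from
	 * CMA, and repeat, consuming both pools at a balanced rate.
	 */
	printf("movable burst: %ld pages, cma burst: %ld pages\n",
	       max_try_movable, max_try_cma);
	return 0;
}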

The following is the experimental result with this patch.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before>
CMA reserve:            0 MB            512 MB
Elapsed-time:           225.2           472.5
Average-MemFree:        322490 KB       630839 KB
nr_free_cma:            0               131068
pswpin:                 0               261666
pswpout:                75              1241363

<After>
CMA reserve:            0 MB            512 MB
Elapsed-time:           222.7           224
Average-MemFree:        325595 KB       393033 KB
nr_free_cma:            0               61001
pswpin:                 0               6
pswpout:                44              502

There is no difference when there is no cma reserved memory (the 0 MB
case). But with cma reserved memory (the 512 MB case), this patch lets us
fully utilize the reserved memory and the system behaves as if it didn't
reserve any memory.

With this patch we aggressively allocate the pages on the cma reserved
memory, so the latency of CMA allocation can increase. Below is the
experimental result for latency.

4 CPUs, 1024 MB, VIRTUAL MACHINE
CMA reserve: 512 MB
Background Workload: make -jN
Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval

N:                    1        4       8        16
Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2

So in general we see a latency increase, and the ratio of the increase is
rather large - up to 70%. Under heavy workload, however, it shows a
latency decrease - up to 55%. The increase may be a worst-case scenario,
but reducing CMA latency is important for some systems, so this patch has
both advantages and disadvantages in terms of latency.

Although I think this patch is the right direction for CMA, there is a
side effect in the following case. If a memory zone is small and CMA
occupies most of it, the LRU lists for this zone will hold many CMA pages.
When reclaim starts, these CMA pages are reclaimed but not counted for
watermark checking, so too many CMA pages could be reclaimed
unnecessarily. Until now this couldn't happen, because free CMA pages
weren't used easily. With this patch free CMA pages are used readily, so
the problem becomes possible. I will handle it in another patchset after
some investigation.

v2: In the fastpath, just replenish the counters. The calculation is done
    whenever the cma area changes.

Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
index d9d3d85..84a7582 100644
--- a/arch/powerpc/kvm/book3s_hv_cma.c
+++ b/arch/powerpc/kvm/book3s_hv_cma.c
@@ -132,6 +132,8 @@ struct page *kvm_alloc_cma(unsigned long nr_pages, unsigned long align_pages)
 		if (ret == 0) {
 			bitmap_set(cma->bitmap, pageno, nr_chunk);
 			page = pfn_to_page(pfn);
+			adjust_managed_cma_page_count(page_zone(page),
+								nr_pages);
 			memset(pfn_to_kaddr(pfn), 0, nr_pages << PAGE_SHIFT);
 			break;
 		} else if (ret != -EBUSY) {
@@ -180,6 +182,7 @@ bool kvm_release_cma(struct page *pages, unsigned long nr_pages)
 		     (pfn - cma->base_pfn) >> (KVM_CMA_CHUNK_ORDER - PAGE_SHIFT),
 		     nr_chunk);
 	free_contig_range(pfn, nr_pages);
+	adjust_managed_cma_page_count(page_zone(pages), nr_pages);
 	mutex_unlock(&kvm_cma_mutex);
 
 	return true;
@@ -210,6 +213,8 @@ static int __init kvm_cma_activate_area(unsigned long base_pfn,
 		}
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
 	} while (--i);
+	adjust_managed_cma_page_count(zone, count);
+
 	return 0;
 }
 
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index 165c2c2..c578d5a 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -160,6 +160,7 @@ static int __init cma_activate_area(struct cma *cma)
 		}
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
 	} while (--i);
+	adjust_managed_cma_page_count(zone, cma->count);
 
 	return 0;
 }
@@ -307,6 +308,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
 		if (ret == 0) {
 			bitmap_set(cma->bitmap, pageno, count);
 			page = pfn_to_page(pfn);
+			adjust_managed_cma_page_count(page_zone(page), count);
 			break;
 		} else if (ret != -EBUSY) {
 			break;
@@ -353,6 +355,7 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
 	mutex_lock(&cma_mutex);
 	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
 	free_contig_range(pfn, count);
+	adjust_managed_cma_page_count(page_zone(pages), count);
 	mutex_unlock(&cma_mutex);
 
 	return true;
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 39b81dc..51cffc1 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -415,6 +415,7 @@ extern int alloc_contig_range(unsigned long start, unsigned long end,
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 
 /* CMA stuff */
+extern void adjust_managed_cma_page_count(struct zone *zone, long count);
 extern void init_cma_reserved_pageblock(struct page *page);
 
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fac5509..f52cb96 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -389,6 +389,20 @@ struct zone {
 	int			compact_order_failed;
 #endif
 
+#ifdef CONFIG_CMA
+	unsigned long managed_cma_pages;
+	/*
+	 * Number of allocation attempt on each movable/cma type
+	 * without switching type. max_try(movable/cma) maintain
+	 * predefined calculated counter and replenish nr_try_(movable/cma)
+	 * with each of them whenever both of them are 0.
+	 */
+	int nr_try_movable;
+	int nr_try_cma;
+	int max_try_movable;
+	int max_try_cma;
+#endif
+
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 674ade7..ca678b6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -788,6 +788,56 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
 }
 
 #ifdef CONFIG_CMA
+void adjust_managed_cma_page_count(struct zone *zone, long count)
+{
+	unsigned long flags;
+	long total, cma, movable;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	zone->managed_cma_pages += count;
+
+	total = zone->managed_pages;
+	cma = zone->managed_cma_pages;
+	movable = total - cma - high_wmark_pages(zone);
+
+	/* No cma pages, so do only movable allocation */
+	if (cma <= 0) {
+		zone->max_try_movable = pageblock_nr_pages;
+		zone->max_try_cma = 0;
+		goto out;
+	}
+
+	/*
+	 * We want to consume cma pages with well balanced ratio so that
+	 * we have consumed enough cma pages before the reclaim. For this
+	 * purpose, we can use the ratio movable : cma. And we don't
+	 * want to switch too frequently, because that prevents allocated
+	 * pages from being contiguous, which is bad for some sorts of devices.
+	 * I choose pageblock_nr_pages for the minimum amount of successive
+	 * allocation because it is the size of a huge page and fragmentation
+	 * avoidance is implemented based on this size.
+	 *
+	 * To meet above criteria, I derive following equation.
+	 *
+	 * if (movable > cma) then; movable : cma = X : pageblock_nr_pages
+	 * else (movable <= cma) then; movable : cma = pageblock_nr_pages : X
+	 */
+	if (movable > cma) {
+		zone->max_try_movable =
+			(movable * pageblock_nr_pages) / cma;
+		zone->max_try_cma = pageblock_nr_pages;
+	} else {
+		zone->max_try_movable = pageblock_nr_pages;
+		zone->max_try_cma = cma * pageblock_nr_pages / movable;
+	}
+
+out:
+	zone->nr_try_movable = zone->max_try_movable;
+	zone->nr_try_cma = zone->max_try_cma;
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
@@ -1136,6 +1186,36 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	return NULL;
 }
 
+#ifdef CONFIG_CMA
+static struct page *__rmqueue_cma(struct zone *zone, unsigned int order)
+{
+	struct page *page;
+
+	if (zone->nr_try_movable > 0)
+		goto alloc_movable;
+
+	if (zone->nr_try_cma > 0) {
+		/* Okay. Now, we can try to allocate the page from cma region */
+		zone->nr_try_cma -= 1 << order;
+		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
+
+		/* CMA pages can vanish through CMA allocation */
+		if (unlikely(!page && order == 0))
+			zone->nr_try_cma = 0;
+
+		return page;
+	}
+
+	/* Reset counter */
+	zone->nr_try_movable = zone->max_try_movable;
+	zone->nr_try_cma = zone->max_try_cma;
+
+alloc_movable:
+	zone->nr_try_movable -= 1 << order;
+	return NULL;
+}
+#endif
+
 /*
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 static struct page *__rmqueue(struct zone *zone, unsigned int order,
 						int migratetype)
 {
-	struct page *page;
+	struct page *page = NULL;
+
+	if (IS_ENABLED(CONFIG_CMA) &&
+		migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
+		page = __rmqueue_cma(zone, order);
 
 retry_reserve:
-	page = __rmqueue_smallest(zone, order, migratetype);
+	if (!page)
+		page = __rmqueue_smallest(zone, order, migratetype);
 
 	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
 		page = __rmqueue_fallback(zone, order, migratetype);
@@ -4849,6 +4934,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone_seqlock_init(zone);
 		zone->zone_pgdat = pgdat;
 		zone_pcp_init(zone);
+		if (IS_ENABLED(CONFIG_CMA))
+			zone->managed_cma_pages = 0;
 
 		/* For bootup, initialized properly in watermark setup */
 		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
-- 
1.7.9.5



* [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-05-28  7:04 ` Joonsoo Kim
@ 2014-05-28  7:04   ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-28  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel

Commit d95ea5d1 ('cma: fix watermark checking') introduced the ALLOC_CMA
alloc flag and treats free cma pages as free pages on watermark checking
when this flag is passed. The intention of that patch was to let movable
page allocations be served from the cma reserved region without waking
kswapd. The previous patch changes the allocator so that movable
allocations use the cma reserved region aggressively, so this watermark
hack isn't needed anymore. Therefore remove it.
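
For reference, a minimal sketch of how the watermark condition changes
(simplified from the hunk below; the helper is hypothetical and omits the
order adjustment and the per-order loop): the ALLOC_CMA handling
disappears and free CMA pages are always excluded.

#include <stdbool.h>

/*
 * Simplified sketch, not the kernel function: after this patch free CMA
 * pages are always subtracted, because they can vanish at any time
 * through a CMA allocation and can't serve unmovable/reclaimable
 * requests anyway.
 */
static bool zone_watermark_ok_sketch(long free_pages, long free_cma,
				     long min, long lowmem_reserve)
{
	return (free_pages - free_cma) > (min + lowmem_reserve);
}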

Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/compaction.c b/mm/compaction.c
index 627dc2e..36e2fcd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 
 	count_compact_event(COMPACTSTALL);
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 								nodemask) {
diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..a121762 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
-#define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR		0x100 /* fair zone allocation */
+#define ALLOC_FAIR		0x80 /* fair zone allocation */
 
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca678b6..83a8021 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 	long min = mark;
 	long lowmem_reserve = z->lowmem_reserve[classzone_idx];
 	int o;
-	long free_cma = 0;
 
 	free_pages -= (1 << order) - 1;
 	if (alloc_flags & ALLOC_HIGH)
 		min -= min / 2;
 	if (alloc_flags & ALLOC_HARDER)
 		min -= min / 4;
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
+	/*
+	 * We don't want to regard the pages on CMA region as free
+	 * on watermark checking, since they cannot be used for
+	 * unmovable/reclaimable allocation and they can suddenly
+	 * vanish through CMA allocation
+	 */
+	if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
+		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
 
-	if (free_pages - free_cma <= min + lowmem_reserve)
+	if (free_pages <= min + lowmem_reserve)
 		return false;
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
@@ -2545,10 +2547,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 				 unlikely(test_thread_flag(TIF_MEMDIE))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	return alloc_flags;
 }
 
@@ -2818,10 +2816,6 @@ retry_cpuset:
 	if (!preferred_zone)
 		goto out;
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 retry:
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-- 
1.7.9.5



* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-28  7:04   ` Joonsoo Kim
@ 2014-05-29  7:24     ` Gioh Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Gioh Kim @ 2014-05-29  7:24 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel

I haven't fully understood your code yet. Please let me ask some silly questions.

On 2014-05-28 4:04 PM, Joonsoo Kim wrote:
> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although it reserve
> memory, this reserved memory can be used for movable memory allocation
> request. This usecase is beneficial to the system that needs this CMA
> reserved memory infrequently and it is one of main purpose of
> introducing CMA.
> 
> But, there is a problem in current implementation. The problem is that
> it works like as just reserved memory approach. The pages on cma reserved
> memory are hardly used for movable memory allocation. This is caused by
> combination of allocation and reclaim policy.
> 
> The pages on cma reserved memory are allocated if there is no movable
> memory, that is, as fallback allocation. So the time this fallback
> allocation is started is under heavy memory pressure. Although it is under
> memory pressure, movable allocation easily succeed, since there would be
> many pages on cma reserved memory. But this is not the case for unmovable
> and reclaimable allocation, because they can't use the pages on cma
> reserved memory. These allocations regard system's free memory as
> (free pages - free cma pages) on watermark checking, that is, free
> unmovable pages + free reclaimable pages + free movable pages. Because
> we already exhausted movable pages, only free pages we have are unmovable
> and reclaimable types and this would be really small amount. So watermark
> checking would be failed. It will wake up kswapd to make enough free
> memory for unmovable and reclaimable allocation and kswapd will do.
> So before we fully utilize pages on cma reserved memory, kswapd start to
> reclaim memory and try to make free memory over the high watermark. This
> watermark checking by kswapd doesn't take care free cma pages so many
> movable pages would be reclaimed. After then, we have a lot of movable
> pages again, so fallback allocation doesn't happen again. To conclude,
> amount of free memory on meminfo which includes free CMA pages is moving
> around 512 MB if I reserve 512 MB memory for CMA.
> 
> I found this problem on following experiment.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           225.2           472.5
> Average-MemFree:        322490 KB       630839 KB
> 
> To solve this problem, I can think following 2 possible solutions.
> 1. allocate the pages on cma reserved memory first, and if they are
>     exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate specific amounts of memory
>     from cma reserved memory and then allocate from free movable memory.
> 
> I tested #1 approach and found the problem. Although free memory on
> meminfo can move around low watermark, there is large fluctuation on free
> memory, because too many pages are reclaimed when kswapd is invoked.
> Reason for this behaviour is that successive allocated CMA pages are
> on the LRU list in that order and kswapd reclaim them in same order.
> These memory doesn't help watermark checking from kwapd, so too many
> pages are reclaimed, I guess.
> 
> So, I implement #2 approach.
> One thing I should note is that we should not change allocation target
> (movable list or cma) on each allocation attempt, since this prevent
> allocated pages to be in physically succession, so some I/O devices can
> be hurt their performance. To solve this, I keep allocation target
> in at least pageblock_nr_pages attempts and make this number reflect
> ratio, free pages without free cma pages to free cma pages. With this
> approach, system works very smoothly and fully utilize the pages on
> cma reserved memory.
> 
> Following is the experimental result of this patch.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> <Before>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           225.2           472.5
> Average-MemFree:        322490 KB       630839 KB
> nr_free_cma:            0               131068
> pswpin:                 0               261666
> pswpout:                75              1241363
> 
> <After>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           222.7           224
> Average-MemFree:        325595 KB       393033 KB
> nr_free_cma:            0               61001
> pswpin:                 0               6
> pswpout:                44              502
> 
> There is no difference if we don't have cma reserved memory (0 MB case).
> But, with cma reserved memory (512 MB case), we fully utilize these
> reserved memory through this patch and the system behaves like as
> it doesn't reserve any memory.
> 
> With this patch, we aggressively allocate the pages on cma reserved memory
> so latency of CMA can arise. Below is the experimental result about
> latency.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> CMA reserve: 512 MB
> Backgound Workload: make -jN
> Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> 
> N:                    1        4       8        16
> Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> 
> So generally we can see latency increase. Ratio of this increase
> is rather big - up to 70%. But, under the heavy workload, it shows
> latency decrease - up to 55%. This may be worst-case scenario, but
> reducing it would be important for some system, so, I can say that
> this patch have advantages and disadvantages in terms of latency.
> 
> Although I think that this patch is right direction for CMA, there is
> side-effect in following case. If there is small memory zone and CMA
> occupys most of them, LRU for this zone would have many CMA pages. When
> reclaim is started, these CMA pages would be reclaimed, but not counted
> for watermark checking, so too many CMA pages could be reclaimed
> unnecessarily. Until now, this can't happen because free CMA pages aren't
> used easily. But, with this patch, free CMA pages are used easily so
> this problem can be possible. I will handle it on another patchset
> after some investigating.
> 
> v2: In fastpath, just replenish counters. Calculation is done whenver
>      cma area is varied
> 
> Acked-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
> index d9d3d85..84a7582 100644
> --- a/arch/powerpc/kvm/book3s_hv_cma.c
> +++ b/arch/powerpc/kvm/book3s_hv_cma.c
> @@ -132,6 +132,8 @@ struct page *kvm_alloc_cma(unsigned long nr_pages, unsigned long align_pages)
>   		if (ret == 0) {
>   			bitmap_set(cma->bitmap, pageno, nr_chunk);
>   			page = pfn_to_page(pfn);
> +			adjust_managed_cma_page_count(page_zone(page),
> +								nr_pages);

I think it should be -nr_pages to decrease the managed_cma_pages
variable, but it is not, so I suppose there is a reason.
Why is managed_cma_pages increased on allocation?

>   			memset(pfn_to_kaddr(pfn), 0, nr_pages << PAGE_SHIFT);
>   			break;
>   		} else if (ret != -EBUSY) {
> @@ -180,6 +182,7 @@ bool kvm_release_cma(struct page *pages, unsigned long nr_pages)
>   		     (pfn - cma->base_pfn) >> (KVM_CMA_CHUNK_ORDER - PAGE_SHIFT),
>   		     nr_chunk);
>   	free_contig_range(pfn, nr_pages);
> +	adjust_managed_cma_page_count(page_zone(pages), nr_pages);
>   	mutex_unlock(&kvm_cma_mutex);
>   
>   	return true;
> @@ -210,6 +213,8 @@ static int __init kvm_cma_activate_area(unsigned long base_pfn,
>   		}
>   		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
>   	} while (--i);
> +	adjust_managed_cma_page_count(zone, count);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index 165c2c2..c578d5a 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -160,6 +160,7 @@ static int __init cma_activate_area(struct cma *cma)
>   		}
>   		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
>   	} while (--i);
> +	adjust_managed_cma_page_count(zone, cma->count);
>   
>   	return 0;
>   }
> @@ -307,6 +308,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
>   		if (ret == 0) {
>   			bitmap_set(cma->bitmap, pageno, count);
>   			page = pfn_to_page(pfn);
> +			adjust_managed_cma_page_count(page_zone(page), count);

I think this should also be -count.

>   			break;
>   		} else if (ret != -EBUSY) {
>   			break;
> @@ -353,6 +355,7 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
>   	mutex_lock(&cma_mutex);
>   	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
>   	free_contig_range(pfn, count);
> +	adjust_managed_cma_page_count(page_zone(pages), count);
>   	mutex_unlock(&cma_mutex);
>   
>   	return true;
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 39b81dc..51cffc1 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -415,6 +415,7 @@ extern int alloc_contig_range(unsigned long start, unsigned long end,
>   extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
>   
>   /* CMA stuff */
> +extern void adjust_managed_cma_page_count(struct zone *zone, long count);
>   extern void init_cma_reserved_pageblock(struct page *page);
>   
>   #endif
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..f52cb96 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -389,6 +389,20 @@ struct zone {
>   	int			compact_order_failed;
>   #endif
>   
> +#ifdef CONFIG_CMA
> +	unsigned long managed_cma_pages;
> +	/*
> +	 * Number of allocation attempt on each movable/cma type
> +	 * without switching type. max_try(movable/cma) maintain
> +	 * predefined calculated counter and replenish nr_try_(movable/cma)
> +	 * with each of them whenever both of them are 0.
> +	 */
> +	int nr_try_movable;
> +	int nr_try_cma;
> +	int max_try_movable;
> +	int max_try_cma;
> +#endif
> +
>   	ZONE_PADDING(_pad1_)
>   
>   	/* Fields commonly accessed by the page reclaim scanner */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 674ade7..ca678b6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -788,6 +788,56 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
>   }
>   
>   #ifdef CONFIG_CMA
> +void adjust_managed_cma_page_count(struct zone *zone, long count)
> +{
> +	unsigned long flags;
> +	long total, cma, movable;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +	zone->managed_cma_pages += count;
> +
> +	total = zone->managed_pages;
> +	cma = zone->managed_cma_pages;
> +	movable = total - cma - high_wmark_pages(zone);

If cma can be a negative value, the above calculation increases movable, because -cma becomes a positive value.
Does it need a sign check?
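
For illustration only (hypothetical numbers): with total = 1000, high watermark = 100 and a cma that somehow reached -50, movable would become 1000 - (-50) - 100 = 950, i.e. larger than the 900 we would get with cma = 0.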



> +
> +	/* No cma pages, so do only movable allocation */
> +	if (cma <= 0) {
> +		zone->max_try_movable = pageblock_nr_pages;
> +		zone->max_try_cma = 0;
> +		goto out;
> +	}
> +
> +	/*
> +	 * We want to consume cma pages with well balanced ratio so that
> +	 * we have consumed enough cma pages before the reclaim. For this
> +	 * purpose, we can use the ratio, movable : cma. And we don't
> +	 * want to switch too frequently, because it prevents allocated pages
> +	 * from being successive and it is bad for some sorts of devices.
> +	 * I choose pageblock_nr_pages for the minimum amount of successive
> +	 * allocation because it is the size of a huge page and fragmentation
> +	 * avoidance is implemented based on this size.
> +	 *
> +	 * To meet above criteria, I derive following equation.
> +	 *
> +	 * if (movable > cma) then; movable : cma = X : pageblock_nr_pages
> +	 * else (movable <= cma) then; movable : cma = pageblock_nr_pages : X
> +	 */
> +	if (movable > cma) {
> +		zone->max_try_movable =
> +			(movable * pageblock_nr_pages) / cma;

I think you assume that the cma value cannot be negative. If cma can be negative, the result of dividing by cma becomes negative. Right?
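
Again, purely hypothetical numbers: movable = 1000, cma = -4 and pageblock_nr_pages = 512 would give max_try_movable = (1000 * 512) / -4 = -128000, so the counter would be replenished with a negative value.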


> +		zone->max_try_cma = pageblock_nr_pages;
> +	} else {
> +		zone->max_try_movable = pageblock_nr_pages;
> +		zone->max_try_cma = cma * pageblock_nr_pages / movable;
> +	}
> +
> +out:
> +	zone->nr_try_movable = zone->max_try_movable;
> +	zone->nr_try_cma = zone->max_try_cma;
> +
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
>   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>   void __init init_cma_reserved_pageblock(struct page *page)
>   {
> @@ -1136,6 +1186,36 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>   	return NULL;
>   }
>   
> +#ifdef CONFIG_CMA
> +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order)
> +{
> +	struct page *page;
> +
> +	if (zone->nr_try_movable > 0)
> +		goto alloc_movable;
> +
> +	if (zone->nr_try_cma > 0) {
> +		/* Okay. Now, we can try to allocate the page from cma region */
> +		zone->nr_try_cma -= 1 << order;
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +		/* CMA pages can vanish through CMA allocation */
> +		if (unlikely(!page && order == 0))
> +			zone->nr_try_cma = 0;
> +
> +		return page;
> +	}
> +
> +	/* Reset counter */
> +	zone->nr_try_movable = zone->max_try_movable;
> +	zone->nr_try_cma = zone->max_try_cma;
> +
> +alloc_movable:
> +	zone->nr_try_movable -= 1 << order;
> +	return NULL;
> +}
> +#endif
> +
>   /*
>    * Do the hard work of removing an element from the buddy allocator.
>    * Call me with the zone->lock already held.
> @@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>   static struct page *__rmqueue(struct zone *zone, unsigned int order,
>   						int migratetype)
>   {
> -	struct page *page;
> +	struct page *page = NULL;
> +
> +	if (IS_ENABLED(CONFIG_CMA) &&

As you may know, CONFIG_CMA can be enabled while there is no CMA memory, because CONFIG_CMA_SIZE_MBYTES can be zero.
Is IS_ENABLED(CONFIG_CMA) alright in that case?

> +		migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
> +		page = __rmqueue_cma(zone, order);
>   
>   retry_reserve:
> -	page = __rmqueue_smallest(zone, order, migratetype);
> +	if (!page)
> +		page = __rmqueue_smallest(zone, order, migratetype);
>   
>   	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>   		page = __rmqueue_fallback(zone, order, migratetype);
> @@ -4849,6 +4934,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>   		zone_seqlock_init(zone);
>   		zone->zone_pgdat = pgdat;
>   		zone_pcp_init(zone);
> +		if (IS_ENABLED(CONFIG_CMA))
> +			zone->managed_cma_pages = 0;
>   
>   		/* For bootup, initialized properly in watermark setup */
>   		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-29  7:24     ` Gioh Kim
@ 2014-05-29  7:48       ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-29  7:48 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel

On Thu, May 29, 2014 at 04:24:58PM +0900, Gioh Kim wrote:
> I haven't fully understood your code yet. Please let me ask some silly questions.
> 
> On 2014-05-28 4:04 PM, Joonsoo Kim wrote:
> > CMA is introduced to provide physically contiguous pages at runtime.
> > For this purpose, it reserves memory at boot time. Although it reserves
> > memory, this reserved memory can be used for movable memory allocation
> > request. This usecase is beneficial to the system that needs this CMA
> > reserved memory infrequently and it is one of the main purposes of
> > introducing CMA.
> > 
> > But, there is a problem in current implementation. The problem is that
> > it works just like a reserved memory approach. The pages on cma reserved
> > memory are hardly used for movable memory allocation. This is caused by
> > combination of allocation and reclaim policy.
> > 
> > The pages on cma reserved memory are allocated if there is no movable
> > memory, that is, as fallback allocation. So the time this fallback
> > allocation is started is under heavy memory pressure. Although it is under
> > memory pressure, movable allocation easily succeeds, since there would be
> > many pages on cma reserved memory. But this is not the case for unmovable
> > and reclaimable allocation, because they can't use the pages on cma
> > reserved memory. These allocations regard system's free memory as
> > (free pages - free cma pages) on watermark checking, that is, free
> > unmovable pages + free reclaimable pages + free movable pages. Because
> > we already exhausted movable pages, only free pages we have are unmovable
> > and reclaimable types and this would be a really small amount. So watermark
> > checking would fail. It will wake up kswapd to make enough free
> > memory for unmovable and reclaimable allocation and kswapd will do so.
> > So before we fully utilize pages on cma reserved memory, kswapd starts to
> > reclaim memory and tries to make free memory over the high watermark. This
> > watermark checking by kswapd doesn't take care of free cma pages, so many
> > movable pages would be reclaimed. After that, we have a lot of movable
> > pages again, so fallback allocation doesn't happen again. To conclude,
> > amount of free memory on meminfo which includes free CMA pages is moving
> > around 512 MB if I reserve 512 MB memory for CMA.
> > 
> > I found this problem in the following experiment.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j16
> > 
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           225.2           472.5
> > Average-MemFree:        322490 KB       630839 KB
> > 
> > To solve this problem, I can think of the following 2 possible solutions.
> > 1. allocate the pages on cma reserved memory first, and if they are
> >     exhausted, allocate movable pages.
> > 2. interleaved allocation: try to allocate specific amounts of memory
> >     from cma reserved memory and then allocate from free movable memory.
> > 
> > I tested the #1 approach and found a problem. Although free memory on
> > meminfo can move around the low watermark, there is large fluctuation in free
> > memory, because too many pages are reclaimed when kswapd is invoked.
> > The reason for this behaviour is that successively allocated CMA pages are
> > on the LRU list in that order and kswapd reclaims them in the same order.
> > This memory doesn't help watermark checking from kswapd, so too many
> > pages are reclaimed, I guess.
> > 
> > So, I implemented the #2 approach.
> > One thing I should note is that we should not change the allocation target
> > (movable list or cma) on each allocation attempt, since this prevents
> > allocated pages from being physically successive, so the performance of
> > some I/O devices can be hurt. To solve this, I keep the allocation target
> > in at least pageblock_nr_pages attempts and make this number reflect the
> > ratio of free pages without free cma pages to free cma pages. With this
> > approach, the system works very smoothly and fully utilizes the pages on
> > cma reserved memory.
> > 
> > Following is the experimental result of this patch.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j16
> > 
> > <Before>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           225.2           472.5
> > Average-MemFree:        322490 KB       630839 KB
> > nr_free_cma:            0               131068
> > pswpin:                 0               261666
> > pswpout:                75              1241363
> > 
> > <After>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           222.7           224
> > Average-MemFree:        325595 KB       393033 KB
> > nr_free_cma:            0               61001
> > pswpin:                 0               6
> > pswpout:                44              502
> > 
> > There is no difference if we don't have cma reserved memory (0 MB case).
> > But, with cma reserved memory (512 MB case), we fully utilize this
> > reserved memory through this patch and the system behaves as if
> > it doesn't reserve any memory.
> > 
> > With this patch, we aggressively allocate the pages on cma reserved memory
> > so the latency of CMA allocation can increase. Below is the experimental result about
> > latency.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > CMA reserve: 512 MB
> > Background Workload: make -jN
> > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> > 
> > N:                    1        4       8        16
> > Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> > Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> > 
> > So generally we can see a latency increase. The ratio of this increase
> > is rather big - up to 70%. But, under the heavy workload, it shows a
> > latency decrease - up to 55%. This may be a worst-case scenario, but
> > reducing it would be important for some systems, so I can say that
> > this patch has advantages and disadvantages in terms of latency.
> > 
> > Although I think that this patch is the right direction for CMA, there is
> > a side-effect in the following case. If there is a small memory zone and CMA
> > occupies most of it, the LRU for this zone would have many CMA pages. When
> > reclaim is started, these CMA pages would be reclaimed, but not counted
> > for watermark checking, so too many CMA pages could be reclaimed
> > unnecessarily. Until now, this can't happen because free CMA pages aren't
> > used easily. But, with this patch, free CMA pages are used easily so
> > this problem becomes possible. I will handle it in another patchset
> > after some investigation.
> > 
> > v2: In fastpath, just replenish counters. Calculation is done whenever
> >      the cma area is varied
> > 
> > Acked-by: Michal Nazarewicz <mina86@mina86.com>
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
> > index d9d3d85..84a7582 100644
> > --- a/arch/powerpc/kvm/book3s_hv_cma.c
> > +++ b/arch/powerpc/kvm/book3s_hv_cma.c
> > @@ -132,6 +132,8 @@ struct page *kvm_alloc_cma(unsigned long nr_pages, unsigned long align_pages)
> >   		if (ret == 0) {
> >   			bitmap_set(cma->bitmap, pageno, nr_chunk);
> >   			page = pfn_to_page(pfn);
> > +			adjust_managed_cma_page_count(page_zone(page),
> > +								nr_pages);
> 
> I think it should be -nr_pages to decrease the managed_cma_pages variable.
> But it is not. I think there is a reason.
> Why the managed_cma_pages is increased by allocation?

Hello, Gioh.

It's my mistake. It should be -nr_pages.
Thanks for pointing out.
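For the record, the fixed hunk would presumably look like this (untested sketch):

			page = pfn_to_page(pfn);
			adjust_managed_cma_page_count(page_zone(page),
								-nr_pages);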

> 
> >   			memset(pfn_to_kaddr(pfn), 0, nr_pages << PAGE_SHIFT);
> >   			break;
> >   		} else if (ret != -EBUSY) {
> > @@ -180,6 +182,7 @@ bool kvm_release_cma(struct page *pages, unsigned long nr_pages)
> >   		     (pfn - cma->base_pfn) >> (KVM_CMA_CHUNK_ORDER - PAGE_SHIFT),
> >   		     nr_chunk);
> >   	free_contig_range(pfn, nr_pages);
> > +	adjust_managed_cma_page_count(page_zone(pages), nr_pages);
> >   	mutex_unlock(&kvm_cma_mutex);
> >   
> >   	return true;
> > @@ -210,6 +213,8 @@ static int __init kvm_cma_activate_area(unsigned long base_pfn,
> >   		}
> >   		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> >   	} while (--i);
> > +	adjust_managed_cma_page_count(zone, count);
> > +
> >   	return 0;
> >   }
> >   
> > diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> > index 165c2c2..c578d5a 100644
> > --- a/drivers/base/dma-contiguous.c
> > +++ b/drivers/base/dma-contiguous.c
> > @@ -160,6 +160,7 @@ static int __init cma_activate_area(struct cma *cma)
> >   		}
> >   		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
> >   	} while (--i);
> > +	adjust_managed_cma_page_count(zone, cma->count);
> >   
> >   	return 0;
> >   }
> > @@ -307,6 +308,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
> >   		if (ret == 0) {
> >   			bitmap_set(cma->bitmap, pageno, count);
> >   			page = pfn_to_page(pfn);
> > +			adjust_managed_cma_page_count(page_zone(page), count);
> 
> I think this also should be -count.

Ditto.
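Likewise, the dma-contiguous hunk would presumably become (untested sketch):

			page = pfn_to_page(pfn);
			adjust_managed_cma_page_count(page_zone(page), -count);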

> 
> >   			break;
> >   		} else if (ret != -EBUSY) {
> >   			break;
> > @@ -353,6 +355,7 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
> >   	mutex_lock(&cma_mutex);
> >   	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
> >   	free_contig_range(pfn, count);
> > +	adjust_managed_cma_page_count(page_zone(pages), count);
> >   	mutex_unlock(&cma_mutex);
> >   
> >   	return true;
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 39b81dc..51cffc1 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -415,6 +415,7 @@ extern int alloc_contig_range(unsigned long start, unsigned long end,
> >   extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
> >   
> >   /* CMA stuff */
> > +extern void adjust_managed_cma_page_count(struct zone *zone, long count);
> >   extern void init_cma_reserved_pageblock(struct page *page);
> >   
> >   #endif
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index fac5509..f52cb96 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -389,6 +389,20 @@ struct zone {
> >   	int			compact_order_failed;
> >   #endif
> >   
> > +#ifdef CONFIG_CMA
> > +	unsigned long managed_cma_pages;
> > +	/*
> > +	 * Number of allocation attempt on each movable/cma type
> > +	 * without switching type. max_try(movable/cma) maintain
> > +	 * predefined calculated counter and replenish nr_try_(movable/cma)
> > +	 * with each of them whenever both of them are 0.
> > +	 */
> > +	int nr_try_movable;
> > +	int nr_try_cma;
> > +	int max_try_movable;
> > +	int max_try_cma;
> > +#endif
> > +
> >   	ZONE_PADDING(_pad1_)
> >   
> >   	/* Fields commonly accessed by the page reclaim scanner */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 674ade7..ca678b6 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -788,6 +788,56 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
> >   }
> >   
> >   #ifdef CONFIG_CMA
> > +void adjust_managed_cma_page_count(struct zone *zone, long count)
> > +{
> > +	unsigned long flags;
> > +	long total, cma, movable;
> > +
> > +	spin_lock_irqsave(&zone->lock, flags);
> > +	zone->managed_cma_pages += count;
> > +
> > +	total = zone->managed_pages;
> > +	cma = zone->managed_cma_pages;
> > +	movable = total - cma - high_wmark_pages(zone);
> 
> If cma can be a negative value, the above calculation increases movable, because -cma becomes a positive value.
> Does it need a sign check?

This is leftover from version 1. They (total, cma, movable)
cannot be negative in this v2. I will fix it in v3.

> 
> 
> 
> > +
> > +	/* No cma pages, so do only movable allocation */
> > +	if (cma <= 0) {
> > +		zone->max_try_movable = pageblock_nr_pages;
> > +		zone->max_try_cma = 0;
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * We want to consume cma pages with well balanced ratio so that
> > +	 * we have consumed enough cma pages before the reclaim. For this
> > +	 * purpose, we can use the ratio, movable : cma. And we don't
> > +	 * want to switch too frequently, because it prevents allocated pages
> > +	 * from being successive and it is bad for some sorts of devices.
> > +	 * I choose pageblock_nr_pages for the minimum amount of successive
> > +	 * allocation because it is the size of a huge page and fragmentation
> > +	 * avoidance is implemented based on this size.
> > +	 *
> > +	 * To meet above criteria, I derive following equation.
> > +	 *
> > +	 * if (movable > cma) then; movable : cma = X : pageblock_nr_pages
> > +	 * else (movable <= cma) then; movable : cma = pageblock_nr_pages : X
> > +	 */
> > +	if (movable > cma) {
> > +		zone->max_try_movable =
> > +			(movable * pageblock_nr_pages) / cma;
> 
> I think you assume that the cma value cannot be negative. If cma can be negative, the result of dividing by cma becomes negative. Right?

It cannot be negative.
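
To make the balancing concrete, here is a rough example with made-up numbers,
assuming pageblock_nr_pages = 512: managed_pages = 245760 (960 MB),
managed_cma_pages = 131072 (512 MB), high watermark = 3000. Then
movable = 245760 - 131072 - 3000 = 111688. Since movable <= cma,
max_try_movable = 512 and max_try_cma = 131072 * 512 / 111688 = 600 (roughly),
so the allocator hands out about 512 movable pages, then about 600 cma pages,
and repeats, matching the cma : movable ratio of about 1.17 : 1.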

> 
> > +		zone->max_try_cma = pageblock_nr_pages;
> > +	} else {
> > +		zone->max_try_movable = pageblock_nr_pages;
> > +		zone->max_try_cma = cma * pageblock_nr_pages / movable;
> > +	}
> > +
> > +out:
> > +	zone->nr_try_movable = zone->max_try_movable;
> > +	zone->nr_try_cma = zone->max_try_cma;
> > +
> > +	spin_unlock_irqrestore(&zone->lock, flags);
> > +}
> > +
> >   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> >   void __init init_cma_reserved_pageblock(struct page *page)
> >   {
> > @@ -1136,6 +1186,36 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >   	return NULL;
> >   }
> >   
> > +#ifdef CONFIG_CMA
> > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order)
> > +{
> > +	struct page *page;
> > +
> > +	if (zone->nr_try_movable > 0)
> > +		goto alloc_movable;
> > +
> > +	if (zone->nr_try_cma > 0) {
> > +		/* Okay. Now, we can try to allocate the page from cma region */
> > +		zone->nr_try_cma -= 1 << order;
> > +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> > +
> > +		/* CMA pages can vanish through CMA allocation */
> > +		if (unlikely(!page && order == 0))
> > +			zone->nr_try_cma = 0;
> > +
> > +		return page;
> > +	}
> > +
> > +	/* Reset counter */
> > +	zone->nr_try_movable = zone->max_try_movable;
> > +	zone->nr_try_cma = zone->max_try_cma;
> > +
> > +alloc_movable:
> > +	zone->nr_try_movable -= 1 << order;
> > +	return NULL;
> > +}
> > +#endif
> > +
> >   /*
> >    * Do the hard work of removing an element from the buddy allocator.
> >    * Call me with the zone->lock already held.
> > @@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >   static struct page *__rmqueue(struct zone *zone, unsigned int order,
> >   						int migratetype)
> >   {
> > -	struct page *page;
> > +	struct page *page = NULL;
> > +
> > +	if (IS_ENABLED(CONFIG_CMA) &&
> 
> As you may know, CONFIG_CMA can be enabled while there is no CMA memory, because CONFIG_CMA_SIZE_MBYTES can be zero.
> Is IS_ENABLED(CONFIG_CMA) alright in that case?

The next line checks whether zone->managed_cma_pages is positive or not.
If there is no CMA memory, zone->managed_cma_pages will be zero and
we will skip calling __rmqueue_cma().

Thanks for review!!!

Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-29  7:48       ` Joonsoo Kim
@ 2014-05-29  8:09         ` Gioh Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Gioh Kim @ 2014-05-29  8:09 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel


>>> +
>>>    /*
>>>     * Do the hard work of removing an element from the buddy allocator.
>>>     * Call me with the zone->lock already held.
>>> @@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>>>    static struct page *__rmqueue(struct zone *zone, unsigned int order,
>>>    						int migratetype)
>>>    {
>>> -	struct page *page;
>>> +	struct page *page = NULL;
>>> +
>>> +	if (IS_ENABLED(CONFIG_CMA) &&
>>
>> As you may know, CONFIG_CMA can be enabled while there is no CMA memory, because CONFIG_CMA_SIZE_MBYTES can be zero.
>> Is IS_ENABLED(CONFIG_CMA) alright in that case?
>
> The next line checks whether zone->managed_cma_pages is positive or not.
> If there is no CMA memory, zone->managed_cma_pages will be zero and
> we will skip calling __rmqueue_cma().

Is IS_ENABLED(CONFIG_CMA) necessary?
What about if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages) ?

>
> Thanks for review!!!
>
> Thanks.
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-29  8:09         ` Gioh Kim
@ 2014-05-30  0:45           ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-30  0:45 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel

On Thu, May 29, 2014 at 05:09:34PM +0900, Gioh Kim wrote:
> 
> >>>+
> >>>   /*
> >>>    * Do the hard work of removing an element from the buddy allocator.
> >>>    * Call me with the zone->lock already held.
> >>>@@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >>>   static struct page *__rmqueue(struct zone *zone, unsigned int order,
> >>>   						int migratetype)
> >>>   {
> >>>-	struct page *page;
> >>>+	struct page *page = NULL;
> >>>+
> >>>+	if (IS_ENABLED(CONFIG_CMA) &&
> >>
> >>As you may know, CONFIG_CMA can be enabled while there is no CMA memory, because CONFIG_CMA_SIZE_MBYTES can be zero.
> >>Is IS_ENABLED(CONFIG_CMA) alright in that case?
> >
> >The next line checks whether zone->managed_cma_pages is positive or not.
> >If there is no CMA memory, zone->managed_cma_pages will be zero and
> >we will skip calling __rmqueue_cma().
> 
> Is IS_ENABLED(CONFIG_CMA) necessary?
> What about if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages) ?

Yes. The field managed_cma_pages exists only if CONFIG_CMA is enabled, so
removing IS_ENABLED(CONFIG_CMA) would break the build.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-28  7:04   ` Joonsoo Kim
@ 2014-05-30  7:53     ` Gioh Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Gioh Kim @ 2014-05-30  7:53 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel

Joonsoo,

I'm attaching a patch that combines __rmqueue and __rmqueue_cma.
I haven't tested it fully, but my board boots and works well as long as there are no frequent memory allocations.

I'm sorry to send untested code.
I just wanted to report this during your working hours ;-)

I'm testing it this evening and will report back next week.
Have a nice weekend!

-------------------------------------- 8< -----------------------------------------
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7f97767..9ced736 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -964,7 +964,7 @@ static int fallbacks[MIGRATE_TYPES][4] = {
        [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_R
 #ifdef CONFIG_CMA
        [MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_U
-       [MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
+       [MIGRATE_CMA]         = { MIGRATE_MOVABLE,     MIGRATE_RECLAIMABLE, MIGRATE_U
 #else
        [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_R
 #endif
@@ -1170,9 +1170,22 @@ static struct page *__rmqueue(struct zone *zone, unsigned int
                                                int migratetype)
 {
        struct page *page;
+       long free, free_cma, free_wmark;

 retry_reserve:
-       page = __rmqueue_smallest(zone, order, migratetype);
+       if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE) {
+               if (zone->nr_try_movable) {
+                       zone->nr_try_movable -= 1 << order;
+               } else if (zone->nr_try_cma) {
+                       zone->nr_try_cma -= 1 << order;
+                       migratetype = MIGRATE_CMA;
+               } else {
+                       zone->nr_try_movable = zone->max_try_movable;
+                       zone->nr_try_movable -= 1 << order;
+                       zone->nr_try_cma = zone->max_try_cma;
+               }
+       }
+       page = __rmqueue_smallest(zone, order, migratetype);

        if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
                page = __rmqueue_fallback(zone, order, migratetype);
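
A quick way to sanity-check the counter flow in this sketch (and in __rmqueue_cma() from the patch quoted below) is to model it in isolation. The following standalone program is only an illustration: the budget values and the pick_cma() helper are invented here, not taken from either patch.

#include <stdio.h>

/* Invented budgets: roughly one pageblock of movable allocations for every
 * pageblock of CMA allocations. */
static int max_try_movable = 1024, max_try_cma = 1024;
static int nr_try_movable = 1024, nr_try_cma = 1024;

/* Returns 1 if this order-`order` movable request should be served from
 * MIGRATE_CMA, 0 if it should stay on the movable free list. Mirrors the
 * decrement/replenish flow of the combined __rmqueue() sketch above. */
static int pick_cma(int order)
{
	if (nr_try_movable > 0) {
		nr_try_movable -= 1 << order;
		return 0;
	}
	if (nr_try_cma > 0) {
		nr_try_cma -= 1 << order;
		return 1;
	}
	/* Both budgets exhausted: replenish and start a new movable run. */
	nr_try_movable = max_try_movable - (1 << order);
	nr_try_cma = max_try_cma;
	return 0;
}

int main(void)
{
	int cma_hits = 0;

	/* 4096 order-0 movable requests: expect alternating runs of about
	 * 1024 movable pages and 1024 CMA pages, i.e. roughly half and half. */
	for (int i = 0; i < 4096; i++)
		cma_hits += pick_cma(0);
	printf("served from CMA: %d of 4096\n", cma_hits);
	return 0;
}

With equal budgets the interleave redirects every other pageblock-sized run to MIGRATE_CMA, which is the behaviour the patch description aims for.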


On 2014-05-28 4:04 PM, Joonsoo Kim wrote:
> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although the memory
> is reserved, it can still be used for movable memory allocation
> requests. This use case is beneficial to systems that need the CMA
> reserved memory only infrequently, and it is one of the main purposes of
> introducing CMA.
> 
> But there is a problem in the current implementation: it works like a
> plain reserved-memory approach. The pages in the cma reserved memory are
> hardly ever used for movable memory allocation. This is caused by the
> combination of allocation and reclaim policy.
> 
> The pages in the cma reserved memory are allocated only if there is no
> movable memory left, that is, as a fallback allocation. So by the time this
> fallback allocation starts, the system is already under heavy memory
> pressure. Even under that pressure, movable allocations succeed easily,
> since there are many pages in the cma reserved memory. But this is not the
> case for unmovable and reclaimable allocations, because they can't use the
> pages in the cma reserved memory. These allocations regard the system's
> free memory as (free pages - free cma pages) on watermark checking, that
> is, free unmovable pages + free reclaimable pages + free movable pages.
> Because we have already exhausted the movable pages, the only free pages
> left are of the unmovable and reclaimable types, and this is a really
> small amount, so the watermark check fails. It wakes up kswapd to make
> enough free memory for unmovable and reclaimable allocations, and kswapd
> does so. So before we fully utilize the pages in the cma reserved memory,
> kswapd starts to reclaim memory and tries to push free memory over the
> high watermark. This watermark check by kswapd doesn't take free cma pages
> into account, so many movable pages are reclaimed. After that, we have a
> lot of movable pages again, so the fallback allocation doesn't happen
> anymore. To conclude, the amount of free memory in meminfo, which includes
> free CMA pages, hovers around 512 MB if I reserve 512 MB of memory for CMA.
> 
> I found this problem with the following experiment.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           225.2           472.5
> Average-MemFree:        322490 KB       630839 KB
> 
> To solve this problem, I can think of the following 2 possible solutions.
> 1. allocate the pages in the cma reserved memory first, and once they are
>     exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate a specific amount of memory
>     from the cma reserved memory and then allocate from free movable memory.
> 
> I tested approach #1 and found a problem. Although the free memory in
> meminfo can stay around the low watermark, there is a large fluctuation in
> free memory, because too many pages are reclaimed when kswapd is invoked.
> The reason for this behaviour is that successively allocated CMA pages are
> on the LRU list in that order and kswapd reclaims them in the same order.
> This memory doesn't help the watermark check done by kswapd, so too many
> pages are reclaimed, I guess.
> 
> So, I implemented approach #2.
> One thing I should note is that we should not change the allocation target
> (movable list or cma) on each allocation attempt, since this prevents the
> allocated pages from being physically contiguous, which can hurt the
> performance of some I/O devices. To solve this, I keep the allocation
> target for at least pageblock_nr_pages attempts and make this number
> reflect the ratio of free pages (excluding free cma pages) to free cma
> pages. With this approach, the system works very smoothly and fully
> utilizes the pages in the cma reserved memory.
> 
> Following is the experimental result of this patch.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> <Before>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           225.2           472.5
> Average-MemFree:        322490 KB       630839 KB
> nr_free_cma:            0               131068
> pswpin:                 0               261666
> pswpout:                75              1241363
> 
> <After>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           222.7           224
> Average-MemFree:        325595 KB       393033 KB
> nr_free_cma:            0               61001
> pswpin:                 0               6
> pswpout:                44              502
> 
> There is no difference if we don't have cma reserved memory (0 MB case).
> But with cma reserved memory (512 MB case), we fully utilize this
> reserved memory through this patch and the system behaves as if
> it doesn't reserve any memory.
> 
> With this patch, we aggressively allocate the pages in the cma reserved
> memory, so CMA allocation latency can increase. Below are the experimental
> results for latency.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> CMA reserve: 512 MB
> Background workload: make -jN
> Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> 
> N:                    1        4       8        16
> Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> 
> So generally we see a latency increase, and the ratio of this increase
> is rather big - up to 70%. But under heavy workload, it shows a
> latency decrease - up to 55%. This may be a worst-case scenario, but
> reducing it would be important for some systems, so I can say that
> this patch has both advantages and disadvantages in terms of latency.
> 
> Although I think this patch is the right direction for CMA, there is a
> side-effect in the following case. If there is a small memory zone and CMA
> occupies most of it, the LRU for this zone would hold many CMA pages. When
> reclaim starts, these CMA pages would be reclaimed, but not counted
> for watermark checking, so too many CMA pages could be reclaimed
> unnecessarily. Until now, this couldn't happen because free CMA pages were
> hardly used. But with this patch, free CMA pages are used readily, so
> this problem becomes possible. I will handle it in another patchset
> after some investigation.
> 
> v2: In the fastpath, just replenish counters. Calculation is done whenever
>      the cma area is varied
> 
> Acked-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
> index d9d3d85..84a7582 100644
> --- a/arch/powerpc/kvm/book3s_hv_cma.c
> +++ b/arch/powerpc/kvm/book3s_hv_cma.c
> @@ -132,6 +132,8 @@ struct page *kvm_alloc_cma(unsigned long nr_pages, unsigned long align_pages)
>   		if (ret == 0) {
>   			bitmap_set(cma->bitmap, pageno, nr_chunk);
>   			page = pfn_to_page(pfn);
> +			adjust_managed_cma_page_count(page_zone(page),
> +								nr_pages);
>   			memset(pfn_to_kaddr(pfn), 0, nr_pages << PAGE_SHIFT);
>   			break;
>   		} else if (ret != -EBUSY) {
> @@ -180,6 +182,7 @@ bool kvm_release_cma(struct page *pages, unsigned long nr_pages)
>   		     (pfn - cma->base_pfn) >> (KVM_CMA_CHUNK_ORDER - PAGE_SHIFT),
>   		     nr_chunk);
>   	free_contig_range(pfn, nr_pages);
> +	adjust_managed_cma_page_count(page_zone(pages), nr_pages);
>   	mutex_unlock(&kvm_cma_mutex);
>   
>   	return true;
> @@ -210,6 +213,8 @@ static int __init kvm_cma_activate_area(unsigned long base_pfn,
>   		}
>   		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
>   	} while (--i);
> +	adjust_managed_cma_page_count(zone, count);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index 165c2c2..c578d5a 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -160,6 +160,7 @@ static int __init cma_activate_area(struct cma *cma)
>   		}
>   		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
>   	} while (--i);
> +	adjust_managed_cma_page_count(zone, cma->count);
>   
>   	return 0;
>   }
> @@ -307,6 +308,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
>   		if (ret == 0) {
>   			bitmap_set(cma->bitmap, pageno, count);
>   			page = pfn_to_page(pfn);
> +			adjust_managed_cma_page_count(page_zone(page), count);
>   			break;
>   		} else if (ret != -EBUSY) {
>   			break;
> @@ -353,6 +355,7 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
>   	mutex_lock(&cma_mutex);
>   	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
>   	free_contig_range(pfn, count);
> +	adjust_managed_cma_page_count(page_zone(pages), count);
>   	mutex_unlock(&cma_mutex);
>   
>   	return true;
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 39b81dc..51cffc1 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -415,6 +415,7 @@ extern int alloc_contig_range(unsigned long start, unsigned long end,
>   extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
>   
>   /* CMA stuff */
> +extern void adjust_managed_cma_page_count(struct zone *zone, long count);
>   extern void init_cma_reserved_pageblock(struct page *page);
>   
>   #endif
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..f52cb96 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -389,6 +389,20 @@ struct zone {
>   	int			compact_order_failed;
>   #endif
>   
> +#ifdef CONFIG_CMA
> +	unsigned long managed_cma_pages;
> +	/*
> +	 * Number of allocation attempts on each movable/cma type
> +	 * without switching type. max_try_(movable/cma) hold the
> +	 * precalculated limits, and nr_try_(movable/cma) are replenished
> +	 * from them whenever both reach 0.
> +	 */
> +	int nr_try_movable;
> +	int nr_try_cma;
> +	int max_try_movable;
> +	int max_try_cma;
> +#endif
> +
>   	ZONE_PADDING(_pad1_)
>   
>   	/* Fields commonly accessed by the page reclaim scanner */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 674ade7..ca678b6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -788,6 +788,56 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
>   }
>   
>   #ifdef CONFIG_CMA
> +void adjust_managed_cma_page_count(struct zone *zone, long count)
> +{
> +	unsigned long flags;
> +	long total, cma, movable;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +	zone->managed_cma_pages += count;
> +
> +	total = zone->managed_pages;
> +	cma = zone->managed_cma_pages;
> +	movable = total - cma - high_wmark_pages(zone);
> +
> +	/* No cma pages, so do only movable allocation */
> +	if (cma <= 0) {
> +		zone->max_try_movable = pageblock_nr_pages;
> +		zone->max_try_cma = 0;
> +		goto out;
> +	}
> +
> +	/*
> +	 * We want to consume cma pages with well balanced ratio so that
> +	 * we have consumed enough cma pages before the reclaim. For this
> +	 * purpose, we can use the ratio, movable : cma. And we don't
> +	 * want to switch too frequently, because it prevents allocated pages
> +	 * from being successive and it is bad for some sorts of devices.
> +	 * I choose pageblock_nr_pages for the minimum amount of successive
> +	 * allocation because it is the size of a huge page and fragmentation
> +	 * avoidance is implemented based on this size.
> +	 *
> +	 * To meet above criteria, I derive following equation.
> +	 *
> +	 * if (movable > cma) then; movable : cma = X : pageblock_nr_pages
> +	 * else (movable <= cma) then; movable : cma = pageblock_nr_pages : X
> +	 */
> +	if (movable > cma) {
> +		zone->max_try_movable =
> +			(movable * pageblock_nr_pages) / cma;
> +		zone->max_try_cma = pageblock_nr_pages;
> +	} else {
> +		zone->max_try_movable = pageblock_nr_pages;
> +		zone->max_try_cma = cma * pageblock_nr_pages / movable;
> +	}
> +
> +out:
> +	zone->nr_try_movable = zone->max_try_movable;
> +	zone->nr_try_cma = zone->max_try_cma;
> +
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
>   /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>   void __init init_cma_reserved_pageblock(struct page *page)
>   {
> @@ -1136,6 +1186,36 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>   	return NULL;
>   }
>   
> +#ifdef CONFIG_CMA
> +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order)
> +{
> +	struct page *page;
> +
> +	if (zone->nr_try_movable > 0)
> +		goto alloc_movable;
> +
> +	if (zone->nr_try_cma > 0) {
> +		/* Okay. Now, we can try to allocate the page from cma region */
> +		zone->nr_try_cma -= 1 << order;
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +		/* CMA pages can vanish through CMA allocation */
> +		if (unlikely(!page && order == 0))
> +			zone->nr_try_cma = 0;
> +
> +		return page;
> +	}
> +
> +	/* Reset counter */
> +	zone->nr_try_movable = zone->max_try_movable;
> +	zone->nr_try_cma = zone->max_try_cma;
> +
> +alloc_movable:
> +	zone->nr_try_movable -= 1 << order;
> +	return NULL;
> +}
> +#endif
> +
>   /*
>    * Do the hard work of removing an element from the buddy allocator.
>    * Call me with the zone->lock already held.
> @@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>   static struct page *__rmqueue(struct zone *zone, unsigned int order,
>   						int migratetype)
>   {
> -	struct page *page;
> +	struct page *page = NULL;
> +
> +	if (IS_ENABLED(CONFIG_CMA) &&
> +		migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
> +		page = __rmqueue_cma(zone, order);
>   
>   retry_reserve:
> -	page = __rmqueue_smallest(zone, order, migratetype);
> +	if (!page)
> +		page = __rmqueue_smallest(zone, order, migratetype);
>   
>   	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>   		page = __rmqueue_fallback(zone, order, migratetype);
> @@ -4849,6 +4934,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>   		zone_seqlock_init(zone);
>   		zone->zone_pgdat = pgdat;
>   		zone_pcp_init(zone);
> +		if (IS_ENABLED(CONFIG_CMA))
> +			zone->managed_cma_pages = 0;
>   
>   		/* For bootup, initialized properly in watermark setup */
>   		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> 

^ permalink raw reply related	[flat|nested] 48+ messages in thread
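
To make the interleaving ratio in the quoted adjust_managed_cma_page_count() concrete, here is a small standalone rerun of its arithmetic with illustrative numbers (4 KB pages, a 1024 MB zone with 512 MB of CMA, pageblock_nr_pages = 1024, and a made-up high watermark). Only the if/else branch mirrors the quoted patch; everything else is scaffolding.

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES	1024L	/* 4 MB pageblocks with 4 KB pages */

int main(void)
{
	long managed_pages = 262144;		/* 1024 MB of 4 KB pages */
	long managed_cma_pages = 131072;	/* 512 MB reserved for CMA */
	long high_wmark = 4096;			/* placeholder watermark */
	long movable = managed_pages - managed_cma_pages - high_wmark;
	long cma = managed_cma_pages;
	long max_try_movable, max_try_cma;

	/* Same branch structure as the quoted patch: the smaller side of the
	 * movable:cma ratio is pinned to one pageblock, the larger side is
	 * scaled up to preserve the ratio. */
	if (movable > cma) {
		max_try_movable = movable * PAGEBLOCK_NR_PAGES / cma;
		max_try_cma = PAGEBLOCK_NR_PAGES;
	} else {
		max_try_movable = PAGEBLOCK_NR_PAGES;
		max_try_cma = cma * PAGEBLOCK_NR_PAGES / movable;
	}

	/* Here movable = 126976 and cma = 131072, so the movable budget is one
	 * pageblock (1024 pages) and the cma budget is about 1057 pages. */
	printf("max_try_movable=%ld max_try_cma=%ld\n",
	       max_try_movable, max_try_cma);
	return 0;
}

With half of the zone given to CMA the two budgets come out nearly equal, so the allocator alternates roughly pageblock-sized runs between the movable and CMA free lists.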

* Re: [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-05-28  7:04   ` Joonsoo Kim
@ 2014-05-30 10:40     ` Ritesh Harjani
  -1 siblings, 0 replies; 48+ messages in thread
From: Ritesh Harjani @ 2014-05-30 10:40 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel,
	Ritesh Harjani, Nagachandra P

Hi Joonsoo,

I think you will be losing the benefit of the patch below with your changes.
I am no expert here, so please bear with me. I tried explaining in the
inline comments; let me know if I am wrong.

commit 026b08147923142e925a7d0aaa39038055ae0156
Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
Date:   Wed Jun 12 14:05:02 2013 -0700


On Wed, May 28, 2014 at 12:34 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> commit d95ea5d1 ('cma: fix watermark checking') introduces the ALLOC_CMA
> alloc flag and treats free cma pages as free pages if this flag is
> passed to the watermark check. The intention of that patch is that movable
> page allocation can be handled from the cma reserved region without
> starting kswapd. Now the previous patch changes the allocator's behaviour
> so that movable allocation uses the pages in the cma reserved region
> aggressively, so this watermark hack isn't needed anymore. Therefore remove it.
>
> Acked-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 627dc2e..36e2fcd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>
>         count_compact_event(COMPACTSTALL);
>
> -#ifdef CONFIG_CMA
> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> -               alloc_flags |= ALLOC_CMA;
> -#endif
>         /* Compact each zone in the list */
>         for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
>                                                                 nodemask) {
> diff --git a/mm/internal.h b/mm/internal.h
> index 07b6736..a121762 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>  #define ALLOC_HARDER           0x10 /* try to alloc harder */
>  #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
>  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
> -#define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
> -#define ALLOC_FAIR             0x100 /* fair zone allocation */
> +#define ALLOC_FAIR             0x80 /* fair zone allocation */
>
>  #endif /* __MM_INTERNAL_H */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ca678b6..83a8021 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
>         long min = mark;
>         long lowmem_reserve = z->lowmem_reserve[classzone_idx];
>         int o;
> -       long free_cma = 0;
>
>         free_pages -= (1 << order) - 1;
>         if (alloc_flags & ALLOC_HIGH)
>                 min -= min / 2;
>         if (alloc_flags & ALLOC_HARDER)
>                 min -= min / 4;
> -#ifdef CONFIG_CMA
> -       /* If allocation can't use CMA areas don't use free CMA pages */
> -       if (!(alloc_flags & ALLOC_CMA))
> -               free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
> -#endif
> +       /*
> +        * We don't want to regard the pages in the CMA region as free
> +        * for watermark checking, since they cannot be used for
> +        * unmovable/reclaimable allocation and they can suddenly
> +        * vanish through CMA allocation
> +        */
> +       if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
> +               free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

Make this subtraction go into free_cma instead of free_pages.

>
> -       if (free_pages - free_cma <= min + lowmem_reserve)
> +       if (free_pages <= min + lowmem_reserve)
free_pages - free_cma <= min + lowmem_reserve

Because in the for loop you subtract nr_free, which includes the CMA pages.
So if you have subtracted NR_FREE_CMA_PAGES
from free_pages above, you will be subtracting the cma pages again via
nr_free (below in the for loop).

>                 return false;
>         for (o = 0; o < order; o++) {
>                 /* At the next order, this order's pages become unavailable */
> @@ -2545,10 +2547,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>                                  unlikely(test_thread_flag(TIF_MEMDIE))))
>                         alloc_flags |= ALLOC_NO_WATERMARKS;
>         }
> -#ifdef CONFIG_CMA
> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> -               alloc_flags |= ALLOC_CMA;
> -#endif
>         return alloc_flags;
>  }
>
> @@ -2818,10 +2816,6 @@ retry_cpuset:
>         if (!preferred_zone)
>                 goto out;
>
> -#ifdef CONFIG_CMA
> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> -               alloc_flags |= ALLOC_CMA;
> -#endif
>  retry:
>         /* First allocation attempt */
>         page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> --
> 1.7.9.5
>


Thanks
Ritesh

^ permalink raw reply	[flat|nested] 48+ messages in thread
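
A standalone sketch of the arithmetic behind Ritesh's comment above. The numbers are invented, and the loop only mirrors the shape of __zone_watermark_ok(), not its exact code.

#include <stdio.h>

int main(void)
{
	/* Hypothetical zone snapshot: 1000 free pages, 300 of them CMA.
	 * The per-order block counts include the CMA blocks, just as
	 * free_area[o].nr_free does. */
	long free_pages = 1000;
	long free_cma = 300;
	long nr_free[3] = { 600, 150, 25 };	/* orders 0,1,2 -> 600/300/100 pages */
	int order = 2;

	/* Patched variant: fold the CMA pages into free_pages up front. */
	long folded = free_pages - free_cma;	/* 700 */

	/* The per-order loop then subtracts nr_free, which still contains the
	 * CMA blocks, so low-order CMA pages are removed a second time. */
	for (int o = 0; o < order; o++)
		folded -= nr_free[o] << o;	/* 700 - 600 - 300 = -200 */

	/* The pre-patch form keeps free_cma separate for the first check only,
	 * so the loop works on the full free_pages: 1000 - 600 - 300 = 100. */
	printf("folded loop result: %ld (vs. %ld with a separate free_cma)\n",
	       folded, free_pages - 600 - 300);
	return 0;
}

Whether the stricter or the looser behaviour is preferable for high-order allocations is exactly what Joonsoo weighs in his reply below.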

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-30  7:53     ` Gioh Kim
@ 2014-05-30 14:23       ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-30 14:23 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Joonsoo Kim, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin,
	Marek Szyprowski, Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML

2014-05-30 16:53 GMT+09:00 Gioh Kim <gioh.kim@lge.com>:
> Joonsoo,
>
> I'm attaching a patch that combines __rmqueue and __rmqueue_cma.
> I haven't tested it fully, but my board boots and works well as long as there are no frequent memory allocations.
>
> I'm sorry to send untested code.
> I just wanted to report this during your working hours ;-)
>
> I'm testing it this evening and will report back next week.
> Have a nice weekend!

Thanks Gioh. :)

> -------------------------------------- 8< -----------------------------------------
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7f97767..9ced736 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -964,7 +964,7 @@ static int fallbacks[MIGRATE_TYPES][4] = {
>         [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_R
>  #ifdef CONFIG_CMA
>         [MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_U
> -       [MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
> +       [MIGRATE_CMA]         = { MIGRATE_MOVABLE,     MIGRATE_RECLAIMABLE, MIGRATE_U

I don't want to use __rmqueue_fallback() for CMA.
__rmqueue_fallback() takes a big-order page rather than a small-order one
in order to steal a large amount of pages and continue to use them in
the next allocation attempts.
We can use CMA pages only in limited cases, so stealing pages from other
migrate types to the CMA type isn't a good idea to me.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread
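
For readers unfamiliar with the two helpers, the point about order selection can be sketched as a toy model of the search directions in the 3.15-era allocator; this is not the kernel code itself, and the free-list layout below is invented.

#include <stdio.h>

#define MAX_ORDER 11

/* Toy free-list occupancy: nonzero means "a block of this order is free".
 * The real allocator keeps per-migratetype lists of struct page. */
static int free_blocks[MAX_ORDER] = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0 };

/* Shape of __rmqueue_smallest(): scan from the requested order upward and
 * take the smallest block that satisfies the request. */
static int pick_smallest(int order)
{
	for (int o = order; o < MAX_ORDER; o++)
		if (free_blocks[o])
			return o;
	return -1;
}

/* Shape of __rmqueue_fallback(): scan from the largest order downward so a
 * steal moves as many pages as possible to the requesting migratetype --
 * the behaviour Joonsoo does not want applied toward MIGRATE_CMA. */
static int pick_fallback(int order)
{
	for (int o = MAX_ORDER - 1; o >= order; o--)
		if (free_blocks[o])
			return o;
	return -1;
}

int main(void)
{
	printf("smallest-first picks order %d, largest-first picks order %d\n",
	       pick_smallest(1), pick_fallback(1));	/* prints 2 and 9 */
	return 0;
}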

* Re: [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-05-30 10:40     ` Ritesh Harjani
@ 2014-05-30 14:46       ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-05-30 14:46 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Joonsoo Kim, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin,
	Marek Szyprowski, Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, Ritesh Harjani,
	Nagachandra P

2014-05-30 19:40 GMT+09:00 Ritesh Harjani <ritesh.list@gmail.com>:
> Hi Joonsoo,
>
> I think you will be losing the benefit of the patch below with your changes.
> I am no expert here, so please bear with me. I tried explaining in the
> inline comments; let me know if I am wrong.
>
> commit 026b08147923142e925a7d0aaa39038055ae0156
> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
> Date:   Wed Jun 12 14:05:02 2013 -0700

Hello, Ritesh.

Thanks for pointing that out.

>
> On Wed, May 28, 2014 at 12:34 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> commit d95ea5d1 ('cma: fix watermark checking') introduces the ALLOC_CMA
>> alloc flag and treats free cma pages as free pages if this flag is
>> passed to the watermark check. The intention of that patch is that movable
>> page allocation can be handled from the cma reserved region without
>> starting kswapd. Now the previous patch changes the allocator's behaviour
>> so that movable allocation uses the pages in the cma reserved region
>> aggressively, so this watermark hack isn't needed anymore. Therefore remove it.
>>
>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 627dc2e..36e2fcd 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>>
>>         count_compact_event(COMPACTSTALL);
>>
>> -#ifdef CONFIG_CMA
>> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>> -               alloc_flags |= ALLOC_CMA;
>> -#endif
>>         /* Compact each zone in the list */
>>         for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
>>                                                                 nodemask) {
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 07b6736..a121762 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>  #define ALLOC_HARDER           0x10 /* try to alloc harder */
>>  #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
>>  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
>> -#define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
>> -#define ALLOC_FAIR             0x100 /* fair zone allocation */
>> +#define ALLOC_FAIR             0x80 /* fair zone allocation */
>>
>>  #endif /* __MM_INTERNAL_H */
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ca678b6..83a8021 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
>>         long min = mark;
>>         long lowmem_reserve = z->lowmem_reserve[classzone_idx];
>>         int o;
>> -       long free_cma = 0;
>>
>>         free_pages -= (1 << order) - 1;
>>         if (alloc_flags & ALLOC_HIGH)
>>                 min -= min / 2;
>>         if (alloc_flags & ALLOC_HARDER)
>>                 min -= min / 4;
>> -#ifdef CONFIG_CMA
>> -       /* If allocation can't use CMA areas don't use free CMA pages */
>> -       if (!(alloc_flags & ALLOC_CMA))
>> -               free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
>> -#endif
>> +       /*
>> +        * We don't want to regard the pages in the CMA region as free
>> +        * for watermark checking, since they cannot be used for
>> +        * unmovable/reclaimable allocation and they can suddenly
>> +        * vanish through CMA allocation
>> +        */
>> +       if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
>> +               free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
>
> Make this subtraction go into free_cma instead of free_pages.
>
>>
>> -       if (free_pages - free_cma <= min + lowmem_reserve)
>> +       if (free_pages <= min + lowmem_reserve)
> free_pages - free_cma <= min + lowmem_reserve
>
> Because in the for loop you subtract nr_free, which includes the CMA pages.
> So if you have subtracted NR_FREE_CMA_PAGES
> from free_pages above, you will be subtracting the cma pages again via
> nr_free (below in the for loop).

Yes, I understand the problem you mentioned.

I think this is a complicated issue.

Commit '026b081' you mentioned makes watermark_ok() looser for high-order
allocation compared to a kernel where CMA isn't enabled, since free_pages
includes free_cma pages and most high-order allocations, except THP, would
be non-movable. Non-movable allocations can't use cma pages, so we
shouldn't include free_cma pages.

If most of the free cma pages are order-0, that commit works correctly: we
subtract the number of free cma pages in the first loop iteration, so there
is no problem. But if the system has some free high-order cma pages, the
watermark check allows high-order allocation more easily.

I think that loosening the watermark check is the right solution, so I will
take your comment for v2. But I want to hear other developers' opinions.
If needed, I can implement tracking of free_area[o].nr_cma_free and use it
for a precise free-page calculation in the watermark check.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread
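
A rough sketch of the per-order bookkeeping Joonsoo floats at the end of his reply (free_area[o].nr_cma_free). Everything here is hypothetical, including the structure and field names; it only shows how such a counter could make the per-order watermark loop precise for allocations that cannot use CMA pages.

#include <stdio.h>

#define MAX_ORDER 11

/* Hypothetical per-order accounting: alongside the existing nr_free, keep a
 * count of how many of those free blocks are MIGRATE_CMA. */
struct toy_free_area {
	unsigned long nr_free;		/* all free blocks of this order */
	unsigned long nr_cma_free;	/* the subset that is CMA */
};

/* Per-order watermark loop for an allocation that cannot use CMA: subtract
 * only the non-CMA blocks at each order, instead of removing a zone-wide
 * NR_FREE_CMA_PAGES lump sum up front. free_pages is the zone's free count
 * with all CMA pages already excluded. */
static int watermark_ok_sketch(struct toy_free_area *area, int order,
			       long free_pages, long min)
{
	for (int o = 0; o < order; o++) {
		free_pages -= (area[o].nr_free - area[o].nr_cma_free) << o;
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}

int main(void)
{
	/* Same hypothetical snapshot as the earlier sketch: 700 non-CMA free
	 * pages; 200 of the order-0 and 50 of the order-1 blocks are CMA. */
	struct toy_free_area area[MAX_ORDER] = {
		[0] = { .nr_free = 600, .nr_cma_free = 200 },
		[1] = { .nr_free = 150, .nr_cma_free = 50 },
	};

	printf("order-2 check passes: %d\n",
	       watermark_ok_sketch(area, 2, 700, 64));
	return 0;
}

Compared with subtracting a zone-wide lump sum, this counts, at every order, only the pages an unmovable or reclaimable request could actually use.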

>
> make this free_cma instead of free_pages.
>
>>
>> -       if (free_pages - free_cma <= min + lowmem_reserve)
>> +       if (free_pages <= min + lowmem_reserve)
> free_pages - free_cma <= min + lowmem_reserve
>
> Because in for loop you subtract nr_free which includes the CMA pages.
> So if you have subtracted NR_FREE_CMA_PAGES
> from free_pages above then you will be subtracting cma pages again in
> nr_free (below in for loop).

Yes, I understand the problem you mentioned.

I think this is a complicated issue.

Commit '026b081', which you mentioned, makes watermark_ok() looser for
high-order allocation compared to a kernel without CMA, since free_pages
includes free cma pages and most high-order allocations other than THP are
non-movable. A non-movable allocation can't use cma pages, so we shouldn't
count free cma pages for it.

If most free cma pages are order-0, that commit works correctly: the number
of free cma pages is subtracted in the first iteration of the loop, so there
is no problem. But if the system has some free high-order cma pages, the
watermark check allows high-order allocation too easily.

I think that making the watermark check looser here is the right solution,
so I will take your comment in v2. But I'd like to hear other developers'
opinions. If needed, I can track free_area[o].nr_cma_free and use it for a
precise freepage calculation in the watermark check.

Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-30  0:45           ` Joonsoo Kim
@ 2014-05-31  0:02             ` Michal Nazarewicz
  -1 siblings, 0 replies; 48+ messages in thread
From: Michal Nazarewicz @ 2014-05-31  0:02 UTC (permalink / raw)
  To: Joonsoo Kim, Gioh Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Aneesh Kumar K.V, linux-mm, linux-kernel

> On Thu, May 29, 2014 at 05:09:34PM +0900, Gioh Kim wrote:
>> Is IS_ENABLED(CONFIG_CMA) necessary?
>> What about if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages) ?

On Fri, May 30 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> Yes, field, managed_cma_pages exists only if CONFIG_CMA is enabled, so
> removing IS_ENABLE(CONFIG_CMA) would break the build.

That statement makes no sense.  If zone->managed_cma_pages not being
defined is the problem, what you need is:

+#ifdef CONFIG_CMA
+	if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
+		page = __rmqueue_cma(zone, order);
+#endif

If you use IS_ENABLED, zone->managed_cma_pages has to be defined
regardless of the state of CONFIG_CMA.
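A minimal illustration of the difference, reusing the identifiers from the
quoted hunk:

	/*
	 * IS_ENABLED(CONFIG_CMA) is an ordinary constant expression, so the
	 * compiler still parses the whole branch; zone->managed_cma_pages and
	 * __rmqueue_cma() must therefore exist even when CONFIG_CMA=n.
	 */
	if (IS_ENABLED(CONFIG_CMA) &&
	    migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
		page = __rmqueue_cma(zone, order);

	/*
	 * With #ifdef the preprocessor drops the block before compilation, so
	 * the field and the helper only need to exist when CONFIG_CMA is set.
	 */
#ifdef CONFIG_CMA
	if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
		page = __rmqueue_cma(zone, order);
#endif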

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
@ 2014-05-31  0:02             ` Michal Nazarewicz
  0 siblings, 0 replies; 48+ messages in thread
From: Michal Nazarewicz @ 2014-05-31  0:02 UTC (permalink / raw)
  To: Joonsoo Kim, Gioh Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Aneesh Kumar K.V, linux-mm, linux-kernel

> On Thu, May 29, 2014 at 05:09:34PM +0900, Gioh Kim wrote:
>> Is IS_ENABLED(CONFIG_CMA) necessary?
>> What about if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages) ?

On Fri, May 30 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> Yes, field, managed_cma_pages exists only if CONFIG_CMA is enabled, so
> removing IS_ENABLE(CONFIG_CMA) would break the build.

That statement makes no sense.  If zone->managed_cma_pages not being
defined is the problem, what you need is:

+#ifdef CONFIG_CMA
+	if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
+		page = __rmqueue_cma(zone, order);
+#endif

If you use IS_ENABLED, zone->managed_cma_pages has to be defined
regardless of the state of CONFIG_CMA.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-28  7:04   ` Joonsoo Kim
@ 2014-05-31  0:11     ` Michal Nazarewicz
  -1 siblings, 0 replies; 48+ messages in thread
From: Michal Nazarewicz @ 2014-05-31  0:11 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Aneesh Kumar K.V, linux-mm, linux-kernel

On Wed, May 28 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> @@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  						int migratetype)
>  {
> -	struct page *page;
> +	struct page *page = NULL;
> +
> +	if (IS_ENABLED(CONFIG_CMA) &&
> +		migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
> +		page = __rmqueue_cma(zone, order);

Come to think of it, I would consider:

	if (…) {
		page = __rmqueue_cma(zone, order);
		if (page)
			goto done;
	}

	…

done:
	trace_mm_page_alloc_zone_locked(page, order, migratetype);
	return page;

>  
>  retry_reserve:
> -	page = __rmqueue_smallest(zone, order, migratetype);
> +	if (!page)
> +		page = __rmqueue_smallest(zone, order, migratetype);
>  

The above would allow this if statement to go away.

>  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>  		page = __rmqueue_fallback(zone, order, migratetype);
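Putting the suggestion together, here is a sketch of __rmqueue() with that
restructuring. The retry_reserve/MIGRATE_RESERVE path is reproduced from
memory from the page allocator of that time, and the CMA branch is guarded
with #ifdef as discussed in the other subthread, so treat the details as
illustrative rather than as the exact patch:

static struct page *__rmqueue(struct zone *zone, unsigned int order,
						int migratetype)
{
	struct page *page;

#ifdef CONFIG_CMA
	/* Patch 2: try the CMA region first for movable allocations */
	if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages) {
		page = __rmqueue_cma(zone, order);
		if (page)
			goto done;
	}
#endif

retry_reserve:
	page = __rmqueue_smallest(zone, order, migratetype);

	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
		page = __rmqueue_fallback(zone, order, migratetype);

		/* Use MIGRATE_RESERVE rather than failing the allocation */
		if (!page) {
			migratetype = MIGRATE_RESERVE;
			goto retry_reserve;
		}
	}

#ifdef CONFIG_CMA
done:
#endif
	trace_mm_page_alloc_zone_locked(page, order, migratetype);
	return page;
}

With the early goto, page no longer needs a NULL initialization and the added
'!page' test in front of __rmqueue_smallest() disappears.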

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
@ 2014-05-31  0:11     ` Michal Nazarewicz
  0 siblings, 0 replies; 48+ messages in thread
From: Michal Nazarewicz @ 2014-05-31  0:11 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Aneesh Kumar K.V,
	linux-mm, linux-kernel

On Wed, May 28 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> @@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  						int migratetype)
>  {
> -	struct page *page;
> +	struct page *page = NULL;
> +
> +	if (IS_ENABLED(CONFIG_CMA) &&
> +		migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
> +		page = __rmqueue_cma(zone, order);

Come to think of it, I would consider:

	if (…) {
		page = __rmqueue_cma(zone, order);
		if (page)
			goto done;
	}

	…

done:
	trace_mm_page_alloc_zone_locked(page, order, migratetype);
	return page;

>  
>  retry_reserve:
> -	page = __rmqueue_smallest(zone, order, migratetype);
> +	if (!page)
> +		page = __rmqueue_smallest(zone, order, migratetype);
>  

The above would allow this if statement to go away.

>  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>  		page = __rmqueue_fallback(zone, order, migratetype);

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-05-30 14:46       ` Joonsoo Kim
@ 2014-06-02  4:07         ` Ritesh Harjani
  -1 siblings, 0 replies; 48+ messages in thread
From: Ritesh Harjani @ 2014-06-02  4:07 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Joonsoo Kim, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin,
	Marek Szyprowski, Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, Nagachandra P, Vinayak Menon,
	Ritesh Harjani, t.stanislaws

Hi Joonsoo,

CC'ing the developer of the patch (Tomasz Stanislawski)


On Fri, May 30, 2014 at 8:16 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> 2014-05-30 19:40 GMT+09:00 Ritesh Harjani <ritesh.list@gmail.com>:
>> Hi Joonsoo,
>>
>> I think you will be loosing the benefit of below patch with your changes.
>> I am no expert here so please bear with me. I tried explaining in the
>> inline comments, let me know if I am wrong.
>>
>> commit 026b08147923142e925a7d0aaa39038055ae0156
>> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
>> Date:   Wed Jun 12 14:05:02 2013 -0700
>
> Hello, Ritesh.
>
> Thanks for notifying that.
>
>>
>> On Wed, May 28, 2014 at 12:34 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>>> commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag
>>> for alloc flag and treats free cma pages as free pages if this flag is
>>> passed to watermark checking. Intention of that patch is that movable page
>>> allocation can be be handled from cma reserved region without starting
>>> kswapd. Now, previous patch changes the behaviour of allocator that
>>> movable allocation uses the page on cma reserved region aggressively,
>>> so this watermark hack isn't needed anymore. Therefore remove it.
>>>
>>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
>>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 627dc2e..36e2fcd 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>>>
>>>         count_compact_event(COMPACTSTALL);
>>>
>>> -#ifdef CONFIG_CMA
>>> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>>> -               alloc_flags |= ALLOC_CMA;
>>> -#endif
>>>         /* Compact each zone in the list */
>>>         for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
>>>                                                                 nodemask) {
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 07b6736..a121762 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>  #define ALLOC_HARDER           0x10 /* try to alloc harder */
>>>  #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
>>>  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
>>> -#define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
>>> -#define ALLOC_FAIR             0x100 /* fair zone allocation */
>>> +#define ALLOC_FAIR             0x80 /* fair zone allocation */
>>>
>>>  #endif /* __MM_INTERNAL_H */
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ca678b6..83a8021 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
>>>         long min = mark;
>>>         long lowmem_reserve = z->lowmem_reserve[classzone_idx];
>>>         int o;
>>> -       long free_cma = 0;
>>>
>>>         free_pages -= (1 << order) - 1;
>>>         if (alloc_flags & ALLOC_HIGH)
>>>                 min -= min / 2;
>>>         if (alloc_flags & ALLOC_HARDER)
>>>                 min -= min / 4;
>>> -#ifdef CONFIG_CMA
>>> -       /* If allocation can't use CMA areas don't use free CMA pages */
>>> -       if (!(alloc_flags & ALLOC_CMA))
>>> -               free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
>>> -#endif
>>> +       /*
>>> +        * We don't want to regard the pages on CMA region as free
>>> +        * on watermark checking, since they cannot be used for
>>> +        * unmovable/reclaimable allocation and they can suddenly
>>> +        * vanish through CMA allocation
>>> +        */
>>> +       if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
>>> +               free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
>>
>> make this free_cma instead of free_pages.
>>
>>>
>>> -       if (free_pages - free_cma <= min + lowmem_reserve)
>>> +       if (free_pages <= min + lowmem_reserve)
>> free_pages - free_cma <= min + lowmem_reserve
>>
>> Because in for loop you subtract nr_free which includes the CMA pages.
>> So if you have subtracted NR_FREE_CMA_PAGES
>> from free_pages above then you will be subtracting cma pages again in
>> nr_free (below in for loop).
>
> Yes, I understand the problem you mentioned.
>
> I think that this is complicated issue.
>
> Comit '026b081' you mentioned makes watermark_ok() loose for high order
> allocation compared to kernel that CMA isn't enabled, since free_pages includes
> free_cma pages and most of high order allocation except THP would be
> non-movable allocation. This non-movable allocation can't use cma pages,
> so we shouldn't include free_cma pages.
>
> If most of free cma pages are 0 order, that commit works correctly. We subtract
> nr of free cma pages at the first loop, so there is no problem. But,
> if the system
> have some free high-order cma pages, watermark checking allow high-order
> allocation more easily.
>
> I think that loosing the watermark check is right solution so will takes your
> comment on v2. But I want to know other developer's opinion.

Thanks for giving this a thought for your v2 patch.


> If needed, I can implement to track free_area[o].nr_cma_free and use it for
> precise freepage calculation in watermark check.

I guess implementing nr_cma_free would be the correct solution, because for
allocations above order 0 we currently still count high-order free_cma pages
as free pages in the for loop, which looks incorrect from the code.

This can lead to a situation where we have plenty of high-order free CMA
pages but very few unmovable pages, yet zone_watermark returns ok for an
unmovable request, leading to allocation failure every time instead of
recovering from this situation.

But it's better if experts comment on this.
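As a rough illustration with made-up numbers, keeping the first check as
free_pages - free_cma and the per-order loop as it is today: assume min = 128,
lowmem_reserve = 0 and an order-4 unmovable request in a zone holding 512
non-CMA free pages (all at order 0) plus 256 free CMA pages sitting in
order-4 blocks. After the (1 << 4) - 1 adjustment free_pages is 753; the
first check passes since 753 - 256 = 497 > 128, and the loop then subtracts
only the 512 order-0 pages from the full 753, leaving 241 against min values
of 64, 32, 16 and 8, so zone_watermark reports success. Yet every order-4
block is CMA, so the unmovable request still fails, and because the watermark
looked fine neither kswapd nor compaction gets kicked. With a per-order
nr_cma_free counter, the o = 0 step would instead compute 497 - 512 = -15 and
the check would fail as it should.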


>
> Thanks.


Thanks
Ritesh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
@ 2014-06-02  4:07         ` Ritesh Harjani
  0 siblings, 0 replies; 48+ messages in thread
From: Ritesh Harjani @ 2014-06-02  4:07 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Joonsoo Kim, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin,
	Marek Szyprowski, Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, Nagachandra P, Vinayak Menon,
	Ritesh Harjani, t.stanislaws

Hi Joonsoo,

CC'ing the developer of the patch (Tomasz Stanislawski)


On Fri, May 30, 2014 at 8:16 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> 2014-05-30 19:40 GMT+09:00 Ritesh Harjani <ritesh.list@gmail.com>:
>> Hi Joonsoo,
>>
>> I think you will be loosing the benefit of below patch with your changes.
>> I am no expert here so please bear with me. I tried explaining in the
>> inline comments, let me know if I am wrong.
>>
>> commit 026b08147923142e925a7d0aaa39038055ae0156
>> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
>> Date:   Wed Jun 12 14:05:02 2013 -0700
>
> Hello, Ritesh.
>
> Thanks for notifying that.
>
>>
>> On Wed, May 28, 2014 at 12:34 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>>> commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag
>>> for alloc flag and treats free cma pages as free pages if this flag is
>>> passed to watermark checking. Intention of that patch is that movable page
>>> allocation can be be handled from cma reserved region without starting
>>> kswapd. Now, previous patch changes the behaviour of allocator that
>>> movable allocation uses the page on cma reserved region aggressively,
>>> so this watermark hack isn't needed anymore. Therefore remove it.
>>>
>>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
>>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 627dc2e..36e2fcd 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>>>
>>>         count_compact_event(COMPACTSTALL);
>>>
>>> -#ifdef CONFIG_CMA
>>> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>>> -               alloc_flags |= ALLOC_CMA;
>>> -#endif
>>>         /* Compact each zone in the list */
>>>         for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
>>>                                                                 nodemask) {
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 07b6736..a121762 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>  #define ALLOC_HARDER           0x10 /* try to alloc harder */
>>>  #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
>>>  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
>>> -#define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
>>> -#define ALLOC_FAIR             0x100 /* fair zone allocation */
>>> +#define ALLOC_FAIR             0x80 /* fair zone allocation */
>>>
>>>  #endif /* __MM_INTERNAL_H */
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ca678b6..83a8021 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
>>>         long min = mark;
>>>         long lowmem_reserve = z->lowmem_reserve[classzone_idx];
>>>         int o;
>>> -       long free_cma = 0;
>>>
>>>         free_pages -= (1 << order) - 1;
>>>         if (alloc_flags & ALLOC_HIGH)
>>>                 min -= min / 2;
>>>         if (alloc_flags & ALLOC_HARDER)
>>>                 min -= min / 4;
>>> -#ifdef CONFIG_CMA
>>> -       /* If allocation can't use CMA areas don't use free CMA pages */
>>> -       if (!(alloc_flags & ALLOC_CMA))
>>> -               free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
>>> -#endif
>>> +       /*
>>> +        * We don't want to regard the pages on CMA region as free
>>> +        * on watermark checking, since they cannot be used for
>>> +        * unmovable/reclaimable allocation and they can suddenly
>>> +        * vanish through CMA allocation
>>> +        */
>>> +       if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
>>> +               free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
>>
>> make this free_cma instead of free_pages.
>>
>>>
>>> -       if (free_pages - free_cma <= min + lowmem_reserve)
>>> +       if (free_pages <= min + lowmem_reserve)
>> free_pages - free_cma <= min + lowmem_reserve
>>
>> Because in for loop you subtract nr_free which includes the CMA pages.
>> So if you have subtracted NR_FREE_CMA_PAGES
>> from free_pages above then you will be subtracting cma pages again in
>> nr_free (below in for loop).
>
> Yes, I understand the problem you mentioned.
>
> I think that this is complicated issue.
>
> Comit '026b081' you mentioned makes watermark_ok() loose for high order
> allocation compared to kernel that CMA isn't enabled, since free_pages includes
> free_cma pages and most of high order allocation except THP would be
> non-movable allocation. This non-movable allocation can't use cma pages,
> so we shouldn't include free_cma pages.
>
> If most of free cma pages are 0 order, that commit works correctly. We subtract
> nr of free cma pages at the first loop, so there is no problem. But,
> if the system
> have some free high-order cma pages, watermark checking allow high-order
> allocation more easily.
>
> I think that loosing the watermark check is right solution so will takes your
> comment on v2. But I want to know other developer's opinion.

Thanks for giving this a thought for your v2 patch.


> If needed, I can implement to track free_area[o].nr_cma_free and use it for
> precise freepage calculation in watermark check.

I guess implementing nr_cma_free would be the correct solution, because for
allocations above order 0 we currently still count high-order free_cma pages
as free pages in the for loop, which looks incorrect from the code.

This can lead to a situation where we have plenty of high-order free CMA
pages but very few unmovable pages, yet zone_watermark returns ok for an
unmovable request, leading to allocation failure every time instead of
recovering from this situation.

But it's better if experts comment on this.


>
> Thanks.


Thanks
Ritesh


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-30 14:23       ` Joonsoo Kim
@ 2014-06-02  5:54         ` Gioh Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Gioh Kim @ 2014-06-02  5:54 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, 이건호

I found 2 problems on my platform.

The 1st occurs when I set the CMA size to 528MB with 960MB of total memory.
I printed some values in adjust_managed_cma_page_count():
the total value becomes 105439 and the cma value 131072,
so the movable value ends up negative.

The total value 105439 corresponds to 411MB.
Is zone->managed_pages the number of pages excluding CMA?
I think zone->managed_pages should include the CMA size, but its value looks strange.

The 2nd is a kernel panic in __netdev_alloc_skb().
I'm not sure it is caused by CMA.
I'm checking it again and will send you another report with detailed call stacks.



On 2014-05-30 11:23 PM, Joonsoo Kim wrote:
> 2014-05-30 16:53 GMT+09:00 Gioh Kim <gioh.kim@lge.com>:
>> Joonsoo,
>>
>> I'm attaching a patch for combination of __rmqueue and __rmqueue_cma.
>> I didn't test fully but my board is turned on and working well if no frequent memory allocations.
>>
>> I'm sorry to send not-tested code.
>> I just want to report this during your working hour ;-)
>>
>> I'm testing this this evening and reporting next week.
>> Have a nice weekend!
>
> Thanks Gioh. :)
>
>> -------------------------------------- 8< -----------------------------------------
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7f97767..9ced736 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -964,7 +964,7 @@ static int fallbacks[MIGRATE_TYPES][4] = {
>>          [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_R
>>   #ifdef CONFIG_CMA
>>          [MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_U
>> -       [MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
>> +       [MIGRATE_CMA]         = { MIGRATE_MOVABLE,     MIGRATE_RECLAIMABLE, MIGRATE_U
>
> I don't want to use __rmqueue_fallback() for CMA.
> __rmqueue_fallback() takes big order page rather than small order page
> in order to steal large amount of pages and continue to use them in
> next allocation attempts.
> We can use CMA pages on limited cases, so stealing some pages from
> other migrate type
> to CMA type isn't good idea to me.
>
> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
@ 2014-06-02  5:54         ` Gioh Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Gioh Kim @ 2014-06-02  5:54 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, 이건호

I found 2 problems on my platform.

The 1st occurs when I set the CMA size to 528MB with 960MB of total memory.
I printed some values in adjust_managed_cma_page_count():
the total value becomes 105439 and the cma value 131072,
so the movable value ends up negative.

The total value 105439 corresponds to 411MB.
Is zone->managed_pages the number of pages excluding CMA?
I think zone->managed_pages should include the CMA size, but its value looks strange.

The 2nd is a kernel panic in __netdev_alloc_skb().
I'm not sure it is caused by CMA.
I'm checking it again and will send you another report with detailed call stacks.



On 2014-05-30 11:23 PM, Joonsoo Kim wrote:
> 2014-05-30 16:53 GMT+09:00 Gioh Kim <gioh.kim@lge.com>:
>> Joonsoo,
>>
>> I'm attaching a patch for combination of __rmqueue and __rmqueue_cma.
>> I didn't test fully but my board is turned on and working well if no frequent memory allocations.
>>
>> I'm sorry to send not-tested code.
>> I just want to report this during your working hour ;-)
>>
>> I'm testing this this evening and reporting next week.
>> Have a nice weekend!
>
> Thanks Gioh. :)
>
>> -------------------------------------- 8< -----------------------------------------
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7f97767..9ced736 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -964,7 +964,7 @@ static int fallbacks[MIGRATE_TYPES][4] = {
>>          [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_R
>>   #ifdef CONFIG_CMA
>>          [MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_U
>> -       [MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
>> +       [MIGRATE_CMA]         = { MIGRATE_MOVABLE,     MIGRATE_RECLAIMABLE, MIGRATE_U
>
> I don't want to use __rmqueue_fallback() for CMA.
> __rmqueue_fallback() takes big order page rather than small order page
> in order to steal large amount of pages and continue to use them in
> next allocation attempts.
> We can use CMA pages on limited cases, so stealing some pages from
> other migrate type
> to CMA type isn't good idea to me.
>
> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-31  0:02             ` Michal Nazarewicz
@ 2014-06-02  6:17               ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-06-02  6:17 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Gioh Kim, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin,
	Marek Szyprowski, Aneesh Kumar K.V, linux-mm, linux-kernel

On Sat, May 31, 2014 at 09:02:51AM +0900, Michal Nazarewicz wrote:
> > On Thu, May 29, 2014 at 05:09:34PM +0900, Gioh Kim wrote:
> >> Is IS_ENABLED(CONFIG_CMA) necessary?
> >> What about if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages) ?
> 
> On Fri, May 30 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > Yes, field, managed_cma_pages exists only if CONFIG_CMA is enabled, so
> > removing IS_ENABLE(CONFIG_CMA) would break the build.
> 
> That statement makes no sense.  If zone->managed_cma_pages not being
> defined is the problem, what you need is:
> 
> +#ifdef CONFIG_CMA
> +	if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
> +		page = __rmqueue_cma(zone, order);
> +#endif
> 
> If you use IS_ENABLED, zone-managed_cma_pages has to be defined
> regardless of result of state of CONFIG_CMA.


Hello,

Oops. I totally misunderstood how IS_ENABLED works.
Thanks for spotting this.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
@ 2014-06-02  6:17               ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-06-02  6:17 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Gioh Kim, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin,
	Marek Szyprowski, Aneesh Kumar K.V, linux-mm, linux-kernel

On Sat, May 31, 2014 at 09:02:51AM +0900, Michal Nazarewicz wrote:
> > On Thu, May 29, 2014 at 05:09:34PM +0900, Gioh Kim wrote:
> >> Is IS_ENABLED(CONFIG_CMA) necessary?
> >> What about if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages) ?
> 
> On Fri, May 30 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > Yes, field, managed_cma_pages exists only if CONFIG_CMA is enabled, so
> > removing IS_ENABLE(CONFIG_CMA) would break the build.
> 
> That statement makes no sense.  If zone->managed_cma_pages not being
> defined is the problem, what you need is:
> 
> +#ifdef CONFIG_CMA
> +	if (migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
> +		page = __rmqueue_cma(zone, order);
> +#endif
> 
> If you use IS_ENABLED, zone-managed_cma_pages has to be defined
> regardless of result of state of CONFIG_CMA.


Hello,

Oops. I totally misunderstood how IS_ENABLED works.
Thanks for spotting this.

Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-06-02  5:54         ` Gioh Kim
@ 2014-06-02  6:23           ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-06-02  6:23 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, 이건호

On Mon, Jun 02, 2014 at 02:54:30PM +0900, Gioh Kim wrote:
> I found 2 problems at my platform.
> 
> 1st is occured when I set CMA size 528MB and total memory is 960MB.
> I print some values in adjust_managed_cma_page_count(),
> the total value becomes 105439 and cma value 131072.
> Finally movable value becomes negative value.
> 
> The total value 105439 means 411MB.
> Is the zone->managed_pages value pages amount except the CMA?
> I think zone->managed_pages value is including CMA size but it's value is strange.

Hmm...
zone->managed_pages includes the number of CMA pages.
Is there any mistake in your printk?

> 
> 2nd is a kernel panic at __netdev_alloc_skb().
> I'm not sure it is caused by the CMA.
> I'm checking it again and going to send you another report with detail call-stacks.

Okay.

Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
@ 2014-06-02  6:23           ` Joonsoo Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-06-02  6:23 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, 이건호

On Mon, Jun 02, 2014 at 02:54:30PM +0900, Gioh Kim wrote:
> I found 2 problems at my platform.
> 
> 1st is occured when I set CMA size 528MB and total memory is 960MB.
> I print some values in adjust_managed_cma_page_count(),
> the total value becomes 105439 and cma value 131072.
> Finally movable value becomes negative value.
> 
> The total value 105439 means 411MB.
> Is the zone->managed_pages value pages amount except the CMA?
> I think zone->managed_pages value is including CMA size but it's value is strange.

Hmm...
zone->managed_pages includes the number of CMA pages.
Is there any mistake in your printk?

> 
> 2nd is a kernel panic at __netdev_alloc_skb().
> I'm not sure it is caused by the CMA.
> I'm checking it again and going to send you another report with detail call-stacks.

Okay.

Thanks.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-06-02  6:23           ` Joonsoo Kim
@ 2014-06-02  7:13             ` Gioh Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Gioh Kim @ 2014-06-02  7:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, 이건호

I'm not sure what I'm doing wrong.
Here is my code:

  770 #ifdef CONFIG_CMA
  771 void adjust_managed_cma_page_count(struct zone *zone, long count)
  772 {
  773         unsigned long flags;
  774         long total, cma, movable;
  775
  776         spin_lock_irqsave(&zone->lock, flags);
  777
  778         zone->managed_cma_pages += count;
  779
  780         total = zone->managed_pages;
  781         cma = zone->managed_cma_pages;
  782         movable = total - cma - high_wmark_pages(zone);
  783
  784         printk("count=%ld total=%ld cma=%ld movable=%ld\n",
  785                count, total, cma, movable);
  786


On 2014-06-02 3:23 PM, Joonsoo Kim wrote:
> On Mon, Jun 02, 2014 at 02:54:30PM +0900, Gioh Kim wrote:
>> I found 2 problems at my platform.
>>
>> 1st is occured when I set CMA size 528MB and total memory is 960MB.
>> I print some values in adjust_managed_cma_page_count(),
>> the total value becomes 105439 and cma value 131072.
>> Finally movable value becomes negative value.
>>
>> The total value 105439 means 411MB.
>> Is the zone->managed_pages value pages amount except the CMA?
>> I think zone->managed_pages value is including CMA size but it's value is strange.
>
> Hmm...
> zone->managed_pages includes nr of CMA pages.
> Is there any mistake about your printk?
>
>>
>> 2nd is a kernel panic at __netdev_alloc_skb().
>> I'm not sure it is caused by the CMA.
>> I'm checking it again and going to send you another report with detail call-stacks.
>
> Okay.
>
> Thanks.
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
@ 2014-06-02  7:13             ` Gioh Kim
  0 siblings, 0 replies; 48+ messages in thread
From: Gioh Kim @ 2014-06-02  7:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V,
	Linux Memory Management List, LKML, 이건호

I'm not sure what I'm doing wrong.
Here is my code:

  770 #ifdef CONFIG_CMA
  771 void adjust_managed_cma_page_count(struct zone *zone, long count)
  772 {
  773         unsigned long flags;
  774         long total, cma, movable;
  775
  776         spin_lock_irqsave(&zone->lock, flags);
  777
  778         zone->managed_cma_pages += count;
  779
  780         total = zone->managed_pages;
  781         cma = zone->managed_cma_pages;
  782         movable = total - cma - high_wmark_pages(zone);
  783
  784         printk("count=%ld total=%ld cma=%ld movable=%ld\n",
  785                count, total, cma, movable);
  786


On 2014-06-02 3:23 PM, Joonsoo Kim wrote:
> On Mon, Jun 02, 2014 at 02:54:30PM +0900, Gioh Kim wrote:
>> I found 2 problems at my platform.
>>
>> 1st is occured when I set CMA size 528MB and total memory is 960MB.
>> I print some values in adjust_managed_cma_page_count(),
>> the total value becomes 105439 and cma value 131072.
>> Finally movable value becomes negative value.
>>
>> The total value 105439 means 411MB.
>> Is the zone->managed_pages value pages amount except the CMA?
>> I think zone->managed_pages value is including CMA size but it's value is strange.
>
> Hmm...
> zone->managed_pages includes nr of CMA pages.
> Is there any mistake about your printk?
>
>>
>> 2nd is a kernel panic at __netdev_alloc_skb().
>> I'm not sure it is caused by the CMA.
>> I'm checking it again and going to send you another report with detail call-stacks.
>
> Okay.
>
> Thanks.
>
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-06-02  4:07         ` Ritesh Harjani
@ 2014-06-02 10:47           ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 48+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2014-06-02 10:47 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Joonsoo Kim, Joonsoo Kim, Andrew Morton, Rik van Riel,
	Johannes Weiner, Mel Gorman, Laura Abbott, Minchan Kim,
	Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Linux Memory Management List, LKML,
	Nagachandra P, Vinayak Menon, Ritesh Harjani, t.stanislaws


Hi,

On Monday, June 02, 2014 09:37:49 AM Ritesh Harjani wrote:
> Hi Joonsoo,
> 
> CC'ing the developer of the patch (Tomasz Stanislawski)
> 
> 
> On Fri, May 30, 2014 at 8:16 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> > 2014-05-30 19:40 GMT+09:00 Ritesh Harjani <ritesh.list@gmail.com>:
> >> Hi Joonsoo,
> >>
> >> I think you will be loosing the benefit of below patch with your changes.
> >> I am no expert here so please bear with me. I tried explaining in the
> >> inline comments, let me know if I am wrong.
> >>
> >> commit 026b08147923142e925a7d0aaa39038055ae0156
> >> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
> >> Date:   Wed Jun 12 14:05:02 2013 -0700
> >
> > Hello, Ritesh.
> >
> > Thanks for notifying that.
> >
> >>
> >> On Wed, May 28, 2014 at 12:34 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >>> commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag

It is a bit of a shame that the author of commit d95ea5d1 (who happens to be me :)
was not on cc:.

> >>> for alloc flag and treats free cma pages as free pages if this flag is
> >>> passed to watermark checking. Intention of that patch is that movable page
> >>> allocation can be be handled from cma reserved region without starting
> >>> kswapd. Now, previous patch changes the behaviour of allocator that
> >>> movable allocation uses the page on cma reserved region aggressively,
> >>> so this watermark hack isn't needed anymore. Therefore remove it.
> >>>
> >>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
> >>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >>>
> >>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>> index 627dc2e..36e2fcd 100644
> >>> --- a/mm/compaction.c
> >>> +++ b/mm/compaction.c
> >>> @@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
> >>>
> >>>         count_compact_event(COMPACTSTALL);
> >>>
> >>> -#ifdef CONFIG_CMA
> >>> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> >>> -               alloc_flags |= ALLOC_CMA;
> >>> -#endif
> >>>         /* Compact each zone in the list */
> >>>         for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
> >>>                                                                 nodemask) {
> >>> diff --git a/mm/internal.h b/mm/internal.h
> >>> index 07b6736..a121762 100644
> >>> --- a/mm/internal.h
> >>> +++ b/mm/internal.h
> >>> @@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
> >>>  #define ALLOC_HARDER           0x10 /* try to alloc harder */
> >>>  #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
> >>>  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
> >>> -#define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
> >>> -#define ALLOC_FAIR             0x100 /* fair zone allocation */
> >>> +#define ALLOC_FAIR             0x80 /* fair zone allocation */
> >>>
> >>>  #endif /* __MM_INTERNAL_H */
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index ca678b6..83a8021 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
> >>>         long min = mark;
> >>>         long lowmem_reserve = z->lowmem_reserve[classzone_idx];
> >>>         int o;
> >>> -       long free_cma = 0;
> >>>
> >>>         free_pages -= (1 << order) - 1;
> >>>         if (alloc_flags & ALLOC_HIGH)
> >>>                 min -= min / 2;
> >>>         if (alloc_flags & ALLOC_HARDER)
> >>>                 min -= min / 4;
> >>> -#ifdef CONFIG_CMA
> >>> -       /* If allocation can't use CMA areas don't use free CMA pages */
> >>> -       if (!(alloc_flags & ALLOC_CMA))
> >>> -               free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
> >>> -#endif
> >>> +       /*
> >>> +        * We don't want to regard the pages on CMA region as free
> >>> +        * on watermark checking, since they cannot be used for
> >>> +        * unmovable/reclaimable allocation and they can suddenly
> >>> +        * vanish through CMA allocation
> >>> +        */
> >>> +       if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
> >>> +               free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
> >>
> >> make this free_cma instead of free_pages.
> >>
> >>>
> >>> -       if (free_pages - free_cma <= min + lowmem_reserve)
> >>> +       if (free_pages <= min + lowmem_reserve)
> >> free_pages - free_cma <= min + lowmem_reserve
> >>
> >> Because in for loop you subtract nr_free which includes the CMA pages.
> >> So if you have subtracted NR_FREE_CMA_PAGES
> >> from free_pages above then you will be subtracting cma pages again in
> >> nr_free (below in for loop).
> >
> > Yes, I understand the problem you mentioned.
> >
> > I think that this is complicated issue.
> >
> > Comit '026b081' you mentioned makes watermark_ok() loose for high order
> > allocation compared to kernel that CMA isn't enabled, since free_pages includes
> > free_cma pages and most of high order allocation except THP would be
> > non-movable allocation. This non-movable allocation can't use cma pages,
> > so we shouldn't include free_cma pages.
> >
> > If most of free cma pages are 0 order, that commit works correctly. We subtract
> > nr of free cma pages at the first loop, so there is no problem. But,
> > if the system
> > have some free high-order cma pages, watermark checking allow high-order
> > allocation more easily.
> > 
> > I think that loosing the watermark check is right solution so will takes your
> > comment on v2. But I want to know other developer's opinion.
> 
> Thanks for giving this a thought for your v2 patch.
> 
> 
> > If needed, I can implement to track free_area[o].nr_cma_free and use it for
> > precise freepage calculation in watermark check.
> >
> I guess implementing nr_cma_free would be the correct solution.
> Because currently for other than 0 order allocation
> we still consider high order free_cma pages as free pages in the for
> loop which from the code looks incorrect.
> 
> This can lead to situation when we have more high order free CMA pages
> but very less unmovable pages, but zone_watermark returns
> ok for unmovable page, thus leading to allocation failure every time
> instead of recovering from this situation.
> 
> But its better if experts comment on this.

I think that implementing free_area[].nr_cma_free is a correct long-term
solution and it should be done before the current patch gets applied.

[ Tomasz is on holiday currently but he should be back tomorrow so he can
  also take a look at the issue. ]

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
@ 2014-06-02 10:47           ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 48+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2014-06-02 10:47 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Joonsoo Kim, Joonsoo Kim, Andrew Morton, Rik van Riel,
	Johannes Weiner, Mel Gorman, Laura Abbott, Minchan Kim,
	Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Linux Memory Management List, LKML,
	Nagachandra P, Vinayak Menon, Ritesh Harjani, t.stanislaws


Hi,

On Monday, June 02, 2014 09:37:49 AM Ritesh Harjani wrote:
> Hi Joonsoo,
> 
> CC'ing the developer of the patch (Tomasz Stanislawski)
> 
> 
> On Fri, May 30, 2014 at 8:16 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> > 2014-05-30 19:40 GMT+09:00 Ritesh Harjani <ritesh.list@gmail.com>:
> >> Hi Joonsoo,
> >>
> >> I think you will be loosing the benefit of below patch with your changes.
> >> I am no expert here so please bear with me. I tried explaining in the
> >> inline comments, let me know if I am wrong.
> >>
> >> commit 026b08147923142e925a7d0aaa39038055ae0156
> >> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
> >> Date:   Wed Jun 12 14:05:02 2013 -0700
> >
> > Hello, Ritesh.
> >
> > Thanks for notifying that.
> >
> >>
> >> On Wed, May 28, 2014 at 12:34 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >>> commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag

It is a bit of a shame that the author of commit d95ea5d1 (who happens to be me :)
was not on cc:.

> >>> for alloc flag and treats free cma pages as free pages if this flag is
> >>> passed to watermark checking. Intention of that patch is that movable page
> >>> allocation can be be handled from cma reserved region without starting
> >>> kswapd. Now, previous patch changes the behaviour of allocator that
> >>> movable allocation uses the page on cma reserved region aggressively,
> >>> so this watermark hack isn't needed anymore. Therefore remove it.
> >>>
> >>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
> >>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >>>
> >>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>> index 627dc2e..36e2fcd 100644
> >>> --- a/mm/compaction.c
> >>> +++ b/mm/compaction.c
> >>> @@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
> >>>
> >>>         count_compact_event(COMPACTSTALL);
> >>>
> >>> -#ifdef CONFIG_CMA
> >>> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> >>> -               alloc_flags |= ALLOC_CMA;
> >>> -#endif
> >>>         /* Compact each zone in the list */
> >>>         for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
> >>>                                                                 nodemask) {
> >>> diff --git a/mm/internal.h b/mm/internal.h
> >>> index 07b6736..a121762 100644
> >>> --- a/mm/internal.h
> >>> +++ b/mm/internal.h
> >>> @@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
> >>>  #define ALLOC_HARDER           0x10 /* try to alloc harder */
> >>>  #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
> >>>  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
> >>> -#define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
> >>> -#define ALLOC_FAIR             0x100 /* fair zone allocation */
> >>> +#define ALLOC_FAIR             0x80 /* fair zone allocation */
> >>>
> >>>  #endif /* __MM_INTERNAL_H */
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index ca678b6..83a8021 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
> >>>         long min = mark;
> >>>         long lowmem_reserve = z->lowmem_reserve[classzone_idx];
> >>>         int o;
> >>> -       long free_cma = 0;
> >>>
> >>>         free_pages -= (1 << order) - 1;
> >>>         if (alloc_flags & ALLOC_HIGH)
> >>>                 min -= min / 2;
> >>>         if (alloc_flags & ALLOC_HARDER)
> >>>                 min -= min / 4;
> >>> -#ifdef CONFIG_CMA
> >>> -       /* If allocation can't use CMA areas don't use free CMA pages */
> >>> -       if (!(alloc_flags & ALLOC_CMA))
> >>> -               free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
> >>> -#endif
> >>> +       /*
> >>> +        * We don't want to regard the pages on CMA region as free
> >>> +        * on watermark checking, since they cannot be used for
> >>> +        * unmovable/reclaimable allocation and they can suddenly
> >>> +        * vanish through CMA allocation
> >>> +        */
> >>> +       if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
> >>> +               free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
> >>
> >> make this free_cma instead of free_pages.
> >>
> >>>
> >>> -       if (free_pages - free_cma <= min + lowmem_reserve)
> >>> +       if (free_pages <= min + lowmem_reserve)
> >> free_pages - free_cma <= min + lowmem_reserve
> >>
> >> Because in for loop you subtract nr_free which includes the CMA pages.
> >> So if you have subtracted NR_FREE_CMA_PAGES
> >> from free_pages above then you will be subtracting cma pages again in
> >> nr_free (below in for loop).
> >
> > Yes, I understand the problem you mentioned.
> >
> > I think that this is complicated issue.
> >
> > Comit '026b081' you mentioned makes watermark_ok() loose for high order
> > allocation compared to kernel that CMA isn't enabled, since free_pages includes
> > free_cma pages and most of high order allocation except THP would be
> > non-movable allocation. This non-movable allocation can't use cma pages,
> > so we shouldn't include free_cma pages.
> >
> > If most of free cma pages are 0 order, that commit works correctly. We subtract
> > nr of free cma pages at the first loop, so there is no problem. But,
> > if the system
> > have some free high-order cma pages, watermark checking allow high-order
> > allocation more easily.
> > 
> > I think that loosing the watermark check is right solution so will takes your
> > comment on v2. But I want to know other developer's opinion.
> 
> Thanks for giving this a thought for your v2 patch.
> 
> 
> > If needed, I can implement to track free_area[o].nr_cma_free and use it for
> > precise freepage calculation in watermark check.
> >
> I guess implementing nr_cma_free would be the correct solution.
> Because currently for other than 0 order allocation
> we still consider high order free_cma pages as free pages in the for
> loop which from the code looks incorrect.
> 
> This can lead to situation when we have more high order free CMA pages
> but very less unmovable pages, but zone_watermark returns
> ok for unmovable page, thus leading to allocation failure every time
> instead of recovering from this situation.
> 
> But its better if experts comment on this.

I think that implementing free_area[].nr_cma_free is a correct long-term
solution and it should be done before the current patch gets applied.

[ Tomasz is on holiday currently but he should be back tomorrow so he can
  also take a look at the issue. ]

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-06-02 10:47           ` Bartlomiej Zolnierkiewicz
@ 2014-06-02 14:05             ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2014-06-02 14:05 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Ritesh Harjani, Joonsoo Kim, Andrew Morton, Rik van Riel,
	Johannes Weiner, Mel Gorman, Laura Abbott, Minchan Kim,
	Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Linux Memory Management List, LKML,
	Nagachandra P, Vinayak Menon, Ritesh Harjani, t.stanislaws

2014-06-02 19:47 GMT+09:00 Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>:
>
> Hi,
>
> On Monday, June 02, 2014 09:37:49 AM Ritesh Harjani wrote:
>> Hi Joonsoo,
>>
>> CC'ing the developer of the patch (Tomasz Stanislawski)
>>
>>
>> On Fri, May 30, 2014 at 8:16 PM, Joonsoo Kim <js1304@gmail.com> wrote:
>> > 2014-05-30 19:40 GMT+09:00 Ritesh Harjani <ritesh.list@gmail.com>:
>> >> Hi Joonsoo,
>> >>
>> >> I think you will be losing the benefit of the patch below with your changes.
>> >> I am no expert here, so please bear with me. I tried explaining in the
>> >> inline comments; let me know if I am wrong.
>> >>
>> >> commit 026b08147923142e925a7d0aaa39038055ae0156
>> >> Author: Tomasz Stanislawski <t.stanislaws@samsung.com>
>> >> Date:   Wed Jun 12 14:05:02 2013 -0700
>> >
>> > Hello, Ritesh.
>> >
>> > Thanks for notifying that.
>> >
>> >>
>> >> On Wed, May 28, 2014 at 12:34 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> >>> commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag
>
> It is a bit of a shame that the author of commit d95ea5d1 (who happens to be
> me :) was not on cc.

Sorry about that.
I will add you on cc in the next spin. :)

>> >>> for alloc flags and treats free cma pages as free pages if this flag is
>> >>> passed to the watermark check. The intention of that patch is that movable
>> >>> page allocation can be handled from the cma reserved region without starting
>> >>> kswapd. Now, the previous patch changes the behaviour of the allocator so
>> >>> that movable allocation uses the pages on the cma reserved region
>> >>> aggressively, so this watermark hack isn't needed anymore. Therefore remove it.
>> >>>
>> >>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
>> >>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >>>
>> >>> diff --git a/mm/compaction.c b/mm/compaction.c
>> >>> index 627dc2e..36e2fcd 100644
>> >>> --- a/mm/compaction.c
>> >>> +++ b/mm/compaction.c
>> >>> @@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>> >>>
>> >>>         count_compact_event(COMPACTSTALL);
>> >>>
>> >>> -#ifdef CONFIG_CMA
>> >>> -       if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
>> >>> -               alloc_flags |= ALLOC_CMA;
>> >>> -#endif
>> >>>         /* Compact each zone in the list */
>> >>>         for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
>> >>>                                                                 nodemask) {
>> >>> diff --git a/mm/internal.h b/mm/internal.h
>> >>> index 07b6736..a121762 100644
>> >>> --- a/mm/internal.h
>> >>> +++ b/mm/internal.h
>> >>> @@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>> >>>  #define ALLOC_HARDER           0x10 /* try to alloc harder */
>> >>>  #define ALLOC_HIGH             0x20 /* __GFP_HIGH set */
>> >>>  #define ALLOC_CPUSET           0x40 /* check for correct cpuset */
>> >>> -#define ALLOC_CMA              0x80 /* allow allocations from CMA areas */
>> >>> -#define ALLOC_FAIR             0x100 /* fair zone allocation */
>> >>> +#define ALLOC_FAIR             0x80 /* fair zone allocation */
>> >>>
>> >>>  #endif /* __MM_INTERNAL_H */
>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> >>> index ca678b6..83a8021 100644
>> >>> --- a/mm/page_alloc.c
>> >>> +++ b/mm/page_alloc.c
>> >>> @@ -1764,20 +1764,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
>> >>>         long min = mark;
>> >>>         long lowmem_reserve = z->lowmem_reserve[classzone_idx];
>> >>>         int o;
>> >>> -       long free_cma = 0;
>> >>>
>> >>>         free_pages -= (1 << order) - 1;
>> >>>         if (alloc_flags & ALLOC_HIGH)
>> >>>                 min -= min / 2;
>> >>>         if (alloc_flags & ALLOC_HARDER)
>> >>>                 min -= min / 4;
>> >>> -#ifdef CONFIG_CMA
>> >>> -       /* If allocation can't use CMA areas don't use free CMA pages */
>> >>> -       if (!(alloc_flags & ALLOC_CMA))
>> >>> -               free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
>> >>> -#endif
>> >>> +       /*
>> >>> +        * We don't want to regard the pages on CMA region as free
>> >>> +        * on watermark checking, since they cannot be used for
>> >>> +        * unmovable/reclaimable allocation and they can suddenly
>> >>> +        * vanish through CMA allocation
>> >>> +        */
>> >>> +       if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
>> >>> +               free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
>> >>
>> >> make this free_cma instead of free_pages.
>> >>
>> >>>
>> >>> -       if (free_pages - free_cma <= min + lowmem_reserve)
>> >>> +       if (free_pages <= min + lowmem_reserve)
>> >> free_pages - free_cma <= min + lowmem_reserve
>> >>
>> >> Because in the for loop you subtract nr_free, which includes the CMA pages.
>> >> So if you have subtracted NR_FREE_CMA_PAGES from free_pages above, then you
>> >> will be subtracting cma pages again in nr_free (below in the for loop).
>> >
>> > Yes, I understand the problem you mentioned.
>> >
>> > I think that this is a complicated issue.
>> >
>> > Commit '026b081' you mentioned makes watermark_ok() looser for high-order
>> > allocation compared to a kernel where CMA isn't enabled, since free_pages
>> > includes free_cma pages and most high-order allocations except THP would be
>> > non-movable allocations. These non-movable allocations can't use cma pages,
>> > so we shouldn't include free_cma pages.
>> >
>> > If most free cma pages are order-0, that commit works correctly. We subtract
>> > the number of free cma pages at the first loop iteration, so there is no
>> > problem. But if the system has some free high-order cma pages, the watermark
>> > check allows high-order allocation too easily.
>> >
>> > I think that loosening the watermark check is the right solution, so I will
>> > take your comment on v2. But I want to know other developers' opinions.
>>
>> Thanks for giving this a thought for your v2 patch.
>>
>>
>> > If needed, I can implement tracking of free_area[o].nr_cma_free and use it
>> > for a precise free-page calculation in the watermark check.
>> >
>> I guess implementing nr_cma_free would be the correct solution, because
>> currently, for allocations of order higher than 0, we still consider
>> high-order free_cma pages as free pages in the for loop, which from the
>> code looks incorrect.
>>
>> This can lead to a situation where we have many high-order free CMA pages
>> but very few unmovable pages, yet zone_watermark returns ok for an
>> unmovable allocation, leading to allocation failure every time instead of
>> recovering from this situation.
>>
>> But it's better if experts comment on this.
>
> I think that implementing free_area[].nr_cma_free is a correct long-term
> solution and it should be done before the current patch gets applied.

Okay.

> [ Tomasz is on holiday currently but he should be back tomorrow so he can
>   also take a look at the issue. ]

Okay.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-28  7:04   ` Joonsoo Kim
@ 2014-10-30 10:37     ` Hui Zhu
  -1 siblings, 0 replies; 48+ messages in thread
From: Hui Zhu @ 2014-10-30 10:37 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, Aneesh Kumar K.V, linux-mm, linux-kernel

On Wed, May 28, 2014 at 3:04 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although it reserves
> memory, this reserved memory can be used for movable memory allocation
> requests. This use case is beneficial to a system that needs this CMA
> reserved memory infrequently, and it is one of the main purposes of
> introducing CMA.
>
> But there is a problem in the current implementation. The problem is that
> it works just like the reserved memory approach. The pages on cma reserved
> memory are hardly used for movable memory allocation. This is caused by
> the combination of allocation and reclaim policy.
>
> The pages on cma reserved memory are allocated only if there is no movable
> memory, that is, as a fallback allocation. So by the time this fallback
> allocation starts, the system is under heavy memory pressure. Although it is
> under memory pressure, movable allocation easily succeeds, since there would
> be many pages on cma reserved memory. But this is not the case for unmovable
> and reclaimable allocations, because they can't use the pages on cma
> reserved memory. These allocations regard the system's free memory as
> (free pages - free cma pages) on watermark checking, that is, free
> unmovable pages + free reclaimable pages + free movable pages. Because
> we have already exhausted movable pages, the only free pages we have are of
> unmovable and reclaimable types, and this would be a really small amount. So
> the watermark check would fail. It will wake up kswapd to make enough free
> memory for unmovable and reclaimable allocation, and kswapd will do so.
> So before we fully utilize the pages on cma reserved memory, kswapd starts to
> reclaim memory and tries to push free memory over the high watermark. This
> watermark check by kswapd doesn't take free cma pages into account, so many
> movable pages would be reclaimed. After that, we have a lot of movable
> pages again, so fallback allocation doesn't happen again. To conclude, the
> amount of free memory in meminfo, which includes free CMA pages, moves
> around 512 MB if I reserve 512 MB memory for CMA.
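
To make the arithmetic concrete, here is a rough illustration with made-up
numbers for the 1024 MB machine with a 512 MB CMA reserve described above
(4 KB pages; the figures are only an assumption for the example):

    free_pages = 131072;    /* ~512 MB free, almost all of it in the CMA area */
    free_cma   = 130000;    /* free pages that sit in the cma reserved region */

    /* what the unmovable/reclaimable watermark check effectively sees */
    effective_free = free_pages - free_cma;    /* ~1000 pages, i.e. ~4 MB */

    /*
     * ~4 MB is typically below the watermarks of a zone this size, so
     * kswapd is woken and keeps reclaiming even though roughly half of
     * the zone is still free (but usable only for movable allocations).
     */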
>
> I found this problem in the following experiment.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           225.2           472.5
> Average-MemFree:        322490 KB       630839 KB
>
> To solve this problem, I can think of the following two possible solutions.
> 1. allocate the pages on cma reserved memory first, and if they are
>    exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate specific amounts of memory
>    from cma reserved memory and then allocate from free movable memory.
>
> I tested approach #1 and found a problem. Although free memory on
> meminfo can move around the low watermark, there is a large fluctuation in
> free memory, because too many pages are reclaimed when kswapd is invoked.
> The reason for this behaviour is that successively allocated CMA pages are
> on the LRU list in that order and kswapd reclaims them in the same order.
> This memory doesn't help the watermark check done by kswapd, so too many
> pages are reclaimed, I guess.

Could you send more information about this part? I want to do some
testing around it.
I use this approach in my patch.

Thanks,
Hui

>
> So, I implemented approach #2.
> One thing I should note is that we should not change the allocation target
> (movable list or cma) on each allocation attempt, since this prevents
> allocated pages from being physically successive, and some I/O devices can
> have their performance hurt. To solve this, I keep the allocation target
> for at least pageblock_nr_pages attempts and make this number reflect the
> ratio of free pages without free cma pages to free cma pages. With this
> approach, the system works very smoothly and fully utilizes the pages on
> cma reserved memory.
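
As a concrete illustration of the ratio described above (the numbers are
hypothetical, and pageblock_nr_pages is assumed to be 512, i.e. a 2 MB
pageblock with 4 KB pages):

    /* a zone with ~384 MB of plain movable memory and ~128 MB of CMA */
    movable = 98304;    /* pages */
    cma     = 32768;    /* pages */

    /* movable > cma, so following the patch below: */
    max_try_movable = movable * pageblock_nr_pages / cma;    /* = 1536 */
    max_try_cma     = pageblock_nr_pages;                    /* =  512 */

    /*
     * The allocator then serves ~1536 pages from the movable lists,
     * ~512 pages from the CMA lists, and repeats: a 3:1 interleave that
     * matches the 3:1 size ratio, while each run stays at least one
     * pageblock long.
     */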
>
> Following is the experimental result of this patch.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
>
> <Before>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           225.2           472.5
> Average-MemFree:        322490 KB       630839 KB
> nr_free_cma:            0               131068
> pswpin:                 0               261666
> pswpout:                75              1241363
>
> <After>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           222.7           224
> Average-MemFree:        325595 KB       393033 KB
> nr_free_cma:            0               61001
> pswpin:                 0               6
> pswpout:                44              502
>
> There is no difference if we don't have cma reserved memory (0 MB case).
> But with cma reserved memory (512 MB case), we fully utilize this
> reserved memory through this patch and the system behaves as if
> it doesn't reserve any memory.
>
> With this patch, we aggressively allocate the pages on cma reserved memory,
> so the latency of CMA allocation can increase. Below is the experimental
> result for latency.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> CMA reserve: 512 MB
> Background Workload: make -jN
> Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
>
> N:                    1        4       8        16
> Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
>
> So generally we can see a latency increase. The ratio of this increase
> is rather big - up to 70%. But under heavy workload, it shows a
> latency decrease - up to 55%. This may be a worst-case scenario, but
> reducing it would be important for some systems, so I can say that
> this patch has advantages and disadvantages in terms of latency.
>
> Although I think that this patch is the right direction for CMA, there is
> a side effect in the following case. If there is a small memory zone and
> CMA occupies most of it, the LRU for this zone would have many CMA pages.
> When reclaim is started, these CMA pages would be reclaimed, but not counted
> for watermark checking, so too many CMA pages could be reclaimed
> unnecessarily. Until now, this couldn't happen because free CMA pages aren't
> used easily. But with this patch, free CMA pages are used easily, so
> this problem becomes possible. I will handle it in another patchset
> after some investigation.
>
> v2: In the fastpath, just replenish counters. Calculation is done whenever
>     the cma area changes.
>
> Acked-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> diff --git a/arch/powerpc/kvm/book3s_hv_cma.c b/arch/powerpc/kvm/book3s_hv_cma.c
> index d9d3d85..84a7582 100644
> --- a/arch/powerpc/kvm/book3s_hv_cma.c
> +++ b/arch/powerpc/kvm/book3s_hv_cma.c
> @@ -132,6 +132,8 @@ struct page *kvm_alloc_cma(unsigned long nr_pages, unsigned long align_pages)
>                 if (ret == 0) {
>                         bitmap_set(cma->bitmap, pageno, nr_chunk);
>                         page = pfn_to_page(pfn);
> +                       adjust_managed_cma_page_count(page_zone(page),
> +                                                               nr_pages);
>                         memset(pfn_to_kaddr(pfn), 0, nr_pages << PAGE_SHIFT);
>                         break;
>                 } else if (ret != -EBUSY) {
> @@ -180,6 +182,7 @@ bool kvm_release_cma(struct page *pages, unsigned long nr_pages)
>                      (pfn - cma->base_pfn) >> (KVM_CMA_CHUNK_ORDER - PAGE_SHIFT),
>                      nr_chunk);
>         free_contig_range(pfn, nr_pages);
> +       adjust_managed_cma_page_count(page_zone(pages), nr_pages);
>         mutex_unlock(&kvm_cma_mutex);
>
>         return true;
> @@ -210,6 +213,8 @@ static int __init kvm_cma_activate_area(unsigned long base_pfn,
>                 }
>                 init_cma_reserved_pageblock(pfn_to_page(base_pfn));
>         } while (--i);
> +       adjust_managed_cma_page_count(zone, count);
> +
>         return 0;
>  }
>
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index 165c2c2..c578d5a 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -160,6 +160,7 @@ static int __init cma_activate_area(struct cma *cma)
>                 }
>                 init_cma_reserved_pageblock(pfn_to_page(base_pfn));
>         } while (--i);
> +       adjust_managed_cma_page_count(zone, cma->count);
>
>         return 0;
>  }
> @@ -307,6 +308,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, int count,
>                 if (ret == 0) {
>                         bitmap_set(cma->bitmap, pageno, count);
>                         page = pfn_to_page(pfn);
> +                       adjust_managed_cma_page_count(page_zone(page), count);
>                         break;
>                 } else if (ret != -EBUSY) {
>                         break;
> @@ -353,6 +355,7 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages,
>         mutex_lock(&cma_mutex);
>         bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
>         free_contig_range(pfn, count);
> +       adjust_managed_cma_page_count(page_zone(pages), count);
>         mutex_unlock(&cma_mutex);
>
>         return true;
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 39b81dc..51cffc1 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -415,6 +415,7 @@ extern int alloc_contig_range(unsigned long start, unsigned long end,
>  extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
>
>  /* CMA stuff */
> +extern void adjust_managed_cma_page_count(struct zone *zone, long count);
>  extern void init_cma_reserved_pageblock(struct page *page);
>
>  #endif
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..f52cb96 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -389,6 +389,20 @@ struct zone {
>         int                     compact_order_failed;
>  #endif
>
> +#ifdef CONFIG_CMA
> +       unsigned long managed_cma_pages;
> +       /*
> +        * Number of allocation attempt on each movable/cma type
> +        * without switching type. max_try(movable/cma) maintain
> +        * predefined calculated counter and replenish nr_try_(movable/cma)
> +        * with each of them whenever both of them are 0.
> +        */
> +       int nr_try_movable;
> +       int nr_try_cma;
> +       int max_try_movable;
> +       int max_try_cma;
> +#endif
> +
>         ZONE_PADDING(_pad1_)
>
>         /* Fields commonly accessed by the page reclaim scanner */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 674ade7..ca678b6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -788,6 +788,56 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
>  }
>
>  #ifdef CONFIG_CMA
> +void adjust_managed_cma_page_count(struct zone *zone, long count)
> +{
> +       unsigned long flags;
> +       long total, cma, movable;
> +
> +       spin_lock_irqsave(&zone->lock, flags);
> +       zone->managed_cma_pages += count;
> +
> +       total = zone->managed_pages;
> +       cma = zone->managed_cma_pages;
> +       movable = total - cma - high_wmark_pages(zone);
> +
> +       /* No cma pages, so do only movable allocation */
> +       if (cma <= 0) {
> +               zone->max_try_movable = pageblock_nr_pages;
> +               zone->max_try_cma = 0;
> +               goto out;
> +       }
> +
> +       /*
> +        * We want to consume cma pages with well balanced ratio so that
> +        * we have consumed enough cma pages before the reclaim. For this
> +        * purpose, we can use the ratio, movable : cma. And we doesn't
> +        * want to switch too frequently, because it prevent allocated pages
> +        * from beging successive and it is bad for some sorts of devices.
> +        * I choose pageblock_nr_pages for the minimum amount of successive
> +        * allocation because it is the size of a huge page and fragmentation
> +        * avoidance is implemented based on this size.
> +        *
> +        * To meet above criteria, I derive following equation.
> +        *
> +        * if (movable > cma) then; movable : cma = X : pageblock_nr_pages
> +        * else (movable <= cma) then; movable : cma = pageblock_nr_pages : X
> +        */
> +       if (movable > cma) {
> +               zone->max_try_movable =
> +                       (movable * pageblock_nr_pages) / cma;
> +               zone->max_try_cma = pageblock_nr_pages;
> +       } else {
> +               zone->max_try_movable = pageblock_nr_pages;
> +               zone->max_try_cma = cma * pageblock_nr_pages / movable;
> +       }
> +
> +out:
> +       zone->nr_try_movable = zone->max_try_movable;
> +       zone->nr_try_cma = zone->max_try_cma;
> +
> +       spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
> @@ -1136,6 +1186,36 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>         return NULL;
>  }
>
> +#ifdef CONFIG_CMA
> +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order)
> +{
> +       struct page *page;
> +
> +       if (zone->nr_try_movable > 0)
> +               goto alloc_movable;
> +
> +       if (zone->nr_try_cma > 0) {
> +               /* Okay. Now, we can try to allocate the page from cma region */
> +               zone->nr_try_cma -= 1 << order;
> +               page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +               /* CMA pages can vanish through CMA allocation */
> +               if (unlikely(!page && order == 0))
> +                       zone->nr_try_cma = 0;
> +
> +               return page;
> +       }
> +
> +       /* Reset counter */
> +       zone->nr_try_movable = zone->max_try_movable;
> +       zone->nr_try_cma = zone->max_try_cma;
> +
> +alloc_movable:
> +       zone->nr_try_movable -= 1 << order;
> +       return NULL;
> +}
> +#endif
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
> @@ -1143,10 +1223,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  static struct page *__rmqueue(struct zone *zone, unsigned int order,
>                                                 int migratetype)
>  {
> -       struct page *page;
> +       struct page *page = NULL;
> +
> +       if (IS_ENABLED(CONFIG_CMA) &&
> +               migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
> +               page = __rmqueue_cma(zone, order);
>
>  retry_reserve:
> -       page = __rmqueue_smallest(zone, order, migratetype);
> +       if (!page)
> +               page = __rmqueue_smallest(zone, order, migratetype);
>
>         if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>                 page = __rmqueue_fallback(zone, order, migratetype);
> @@ -4849,6 +4934,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>                 zone_seqlock_init(zone);
>                 zone->zone_pgdat = pgdat;
>                 zone_pcp_init(zone);
> +               if (IS_ENABLED(CONFIG_CMA))
> +                       zone->managed_cma_pages = 0;
>
>                 /* For bootup, initialized properly in watermark setup */
>                 mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
       [not found]   ` <CADtm3G5Cb2vzVo61qDJ7-1ZNzQ2zOisfjb7GiFXvZR0ocKZy0A@mail.gmail.com>
@ 2015-01-06  4:01       ` Gregory Fong
  0 siblings, 0 replies; 48+ messages in thread
From: Gregory Fong @ 2015-01-06  4:01 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek, linux-mm,
	linux-kernel

+linux-mm and linux-kernel (not sure how those got removed from cc,
sorry about that)

On Mon, Jan 5, 2015 at 7:58 PM, Gregory Fong <gregory.0xf0@gmail.com> wrote:
> Hi Joonsoo,
>
> On Wed, May 28, 2014 at 12:04 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 674ade7..ca678b6 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -788,6 +788,56 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
>>  }
>>
>>  #ifdef CONFIG_CMA
>> +void adjust_managed_cma_page_count(struct zone *zone, long count)
>> +{
>> +       unsigned long flags;
>> +       long total, cma, movable;
>> +
>> +       spin_lock_irqsave(&zone->lock, flags);
>> +       zone->managed_cma_pages += count;
>> +
>> +       total = zone->managed_pages;
>> +       cma = zone->managed_cma_pages;
>> +       movable = total - cma - high_wmark_pages(zone);
>> +
>> +       /* No cma pages, so do only movable allocation */
>> +       if (cma <= 0) {
>> +               zone->max_try_movable = pageblock_nr_pages;
>> +               zone->max_try_cma = 0;
>> +               goto out;
>> +       }
>> +
>> +       /*
>> +        * We want to consume cma pages with well balanced ratio so that
>> +        * we have consumed enough cma pages before the reclaim. For this
>> +        * purpose, we can use the ratio, movable : cma. And we doesn't
>> +        * want to switch too frequently, because it prevent allocated pages
>> +        * from beging successive and it is bad for some sorts of devices.
>> +        * I choose pageblock_nr_pages for the minimum amount of successive
>> +        * allocation because it is the size of a huge page and fragmentation
>> +        * avoidance is implemented based on this size.
>> +        *
>> +        * To meet above criteria, I derive following equation.
>> +        *
>> +        * if (movable > cma) then; movable : cma = X : pageblock_nr_pages
>> +        * else (movable <= cma) then; movable : cma = pageblock_nr_pages : X
>> +        */
>> +       if (movable > cma) {
>> +               zone->max_try_movable =
>> +                       (movable * pageblock_nr_pages) / cma;
>> +               zone->max_try_cma = pageblock_nr_pages;
>> +       } else {
>> +               zone->max_try_movable = pageblock_nr_pages;
>> +               zone->max_try_cma = cma * pageblock_nr_pages / movable;
>
> I don't know if anyone's already pointed this out (didn't see anything
> when searching lkml), but while testing this, I noticed this can
> result in a div by zero under memory pressure (movable becomes 0).
> This is not unlikely when the majority of pages are in CMA regions
> (this may seem pathological but we do actually do this right now).
>
> [    0.249674] Division by zero in kernel.
> [    0.249682] CPU: 2 PID: 1 Comm: swapper/0 Not tainted
> 3.14.13-1.3pre-00368-g4d90957-dirty #10
> [    0.249710] [<c001619c>] (unwind_backtrace) from [<c0011fa4>]
> (show_stack+0x10/0x14)
> [    0.249725] [<c0011fa4>] (show_stack) from [<c0538d6c>]
> (dump_stack+0x80/0x90)
> [    0.249740] [<c0538d6c>] (dump_stack) from [<c025e9d0>] (Ldiv0+0x8/0x10)
> [    0.249751] [<c025e9d0>] (Ldiv0) from [<c0094ba4>]
> (adjust_managed_cma_page_count+0x64/0xd8)
> [    0.249762] [<c0094ba4>] (adjust_managed_cma_page_count) from
> [<c00cb2f4>] (cma_release+0xa8/0xe0)
> [    0.249776] [<c00cb2f4>] (cma_release) from [<c0721698>]
> (cma_drvr_probe+0x378/0x470)
> [    0.249787] [<c0721698>] (cma_drvr_probe) from [<c02ce9cc>]
> (platform_drv_probe+0x18/0x48)
> [    0.249799] [<c02ce9cc>] (platform_drv_probe) from [<c02ccfb0>]
> (driver_probe_device+0xac/0x3a4)
> [    0.249808] [<c02ccfb0>] (driver_probe_device) from [<c02cd378>]
> (__driver_attach+0x8c/0x90)
> [    0.249817] [<c02cd378>] (__driver_attach) from [<c02cb390>]
> (bus_for_each_dev+0x60/0x94)
> [    0.249825] [<c02cb390>] (bus_for_each_dev) from [<c02cc674>]
> (bus_add_driver+0x15c/0x218)
> [    0.249834] [<c02cc674>] (bus_add_driver) from [<c02cd9a0>]
> (driver_register+0x78/0xf8)
> [    0.249841] [<c02cd9a0>] (driver_register) from [<c02cea24>]
> (platform_driver_probe+0x20/0xa4)
> [    0.249849] [<c02cea24>] (platform_driver_probe) from [<c0008958>]
> (do_one_initcall+0xd4/0x17c)
> [    0.249857] [<c0008958>] (do_one_initcall) from [<c0719d00>]
> (kernel_init_freeable+0x13c/0x1dc)
> [    0.249864] [<c0719d00>] (kernel_init_freeable) from [<c0534578>]
> (kernel_init+0x8/0xe8)
> [    0.249873] [<c0534578>] (kernel_init) from [<c000ed78>]
> (ret_from_fork+0x14/0x3c)
>
> Could probably just add something above similar to the "no cma pages" case, like
>
> /* No movable pages, so only do CMA allocation */
> if (movable <= 0) {
>         zone->max_try_cma = pageblock_nr_pages;
>         zone->max_try_movable = 0;
>         goto out;
> }
>
>> +       }
>> +
>> +out:
>> +       zone->nr_try_movable = zone->max_try_movable;
>> +       zone->nr_try_cma = zone->max_try_cma;
>> +
>> +       spin_unlock_irqrestore(&zone->lock, flags);
>> +}
>> +
>
> Best regards,
> Gregory
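
A minimal sketch of how the suggested guard could sit next to the existing
"no cma pages" check in adjust_managed_cma_page_count() (ordering only, based
on the quoted patch and the suggestion above; untested):

    movable = total - cma - high_wmark_pages(zone);

    /* No cma pages, so do only movable allocation */
    if (cma <= 0) {
        zone->max_try_movable = pageblock_nr_pages;
        zone->max_try_cma = 0;
        goto out;
    }

    /* No movable pages left, so only do CMA allocation */
    if (movable <= 0) {
        zone->max_try_movable = 0;
        zone->max_try_cma = pageblock_nr_pages;
        goto out;
    }

    /* both positive: safe to compute the movable : cma ratio below */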

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2015-01-06  4:01       ` Gregory Fong
@ 2015-01-06  8:23         ` Joonsoo Kim
  -1 siblings, 0 replies; 48+ messages in thread
From: Joonsoo Kim @ 2015-01-06  8:23 UTC (permalink / raw)
  To: Gregory Fong
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek, linux-mm,
	linux-kernel

On Mon, Jan 05, 2015 at 08:01:45PM -0800, Gregory Fong wrote:
> +linux-mm and linux-kernel (not sure how those got removed from cc,
> sorry about that)
> 
> On Mon, Jan 5, 2015 at 7:58 PM, Gregory Fong <gregory.0xf0@gmail.com> wrote:
> > Hi Joonsoo,
> >
> > On Wed, May 28, 2014 at 12:04 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 674ade7..ca678b6 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -788,6 +788,56 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
> >>  }
> >>
> >>  #ifdef CONFIG_CMA
> >> +void adjust_managed_cma_page_count(struct zone *zone, long count)
> >> +{
> >> +       unsigned long flags;
> >> +       long total, cma, movable;
> >> +
> >> +       spin_lock_irqsave(&zone->lock, flags);
> >> +       zone->managed_cma_pages += count;
> >> +
> >> +       total = zone->managed_pages;
> >> +       cma = zone->managed_cma_pages;
> >> +       movable = total - cma - high_wmark_pages(zone);
> >> +
> >> +       /* No cma pages, so do only movable allocation */
> >> +       if (cma <= 0) {
> >> +               zone->max_try_movable = pageblock_nr_pages;
> >> +               zone->max_try_cma = 0;
> >> +               goto out;
> >> +       }
> >> +
> >> +       /*
> >> +        * We want to consume cma pages at a well-balanced ratio so that
> >> +        * enough cma pages have been consumed before reclaim kicks in.
> >> +        * For this purpose, we can use the ratio movable : cma. And we
> >> +        * don't want to switch too frequently, because that prevents
> >> +        * allocated pages from being successive, which is bad for some
> >> +        * sorts of devices. I choose pageblock_nr_pages as the minimum
> >> +        * amount of successive allocation because it is the size of a
> >> +        * huge page and fragmentation avoidance is implemented based on
> >> +        * this size.
> >> +        *
> >> +        * To meet the above criteria, I derive the following equation:
> >> +        *
> >> +        * if (movable > cma) then movable : cma = X : pageblock_nr_pages
> >> +        * else (movable <= cma) then movable : cma = pageblock_nr_pages : X
> >> +        */
> >> +       if (movable > cma) {
> >> +               zone->max_try_movable =
> >> +                       (movable * pageblock_nr_pages) / cma;
> >> +               zone->max_try_cma = pageblock_nr_pages;
> >> +       } else {
> >> +               zone->max_try_movable = pageblock_nr_pages;
> >> +               zone->max_try_cma = cma * pageblock_nr_pages / movable;
> >
> > I don't know if anyone's already pointed this out (didn't see anything
> > when searching lkml), but while testing this, I noticed this can
> > result in a div by zero under memory pressure (movable becomes 0).
> > This is not unlikely when the majority of pages are in CMA regions
> > (this may seem pathological but we do actually do this right now).

Hello,

Yes, you are right. Thanks for pointing this out.
I will fix it in the next version.
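
For reference, the whole function with that guard folded in would look
roughly like the below. This is only a sketch against the v2 field names in
this thread (managed_cma_pages, max_try_movable/max_try_cma,
nr_try_movable/nr_try_cma), with the movable <= 0 branch taken from
Gregory's suggestion; it is not the final patch.

/*
 * Rough sketch only: same structure as the v2 patch quoted above, with the
 * movable <= 0 case handled as Gregory suggests, so neither ratio
 * computation can divide by zero.
 */
void adjust_managed_cma_page_count(struct zone *zone, long count)
{
        unsigned long flags;
        long total, cma, movable;

        spin_lock_irqsave(&zone->lock, flags);
        zone->managed_cma_pages += count;

        total = zone->managed_pages;
        cma = zone->managed_cma_pages;
        movable = total - cma - high_wmark_pages(zone);

        /* No cma pages, so do only movable allocation */
        if (cma <= 0) {
                zone->max_try_movable = pageblock_nr_pages;
                zone->max_try_cma = 0;
                goto out;
        }

        /* No movable pages, so do only cma allocation */
        if (movable <= 0) {
                zone->max_try_cma = pageblock_nr_pages;
                zone->max_try_movable = 0;
                goto out;
        }

        /*
         * Balance consumption between movable and cma pages, switching
         * at least one pageblock at a time. Both cma and movable are
         * strictly positive here, so the divisions are safe.
         */
        if (movable > cma) {
                zone->max_try_movable =
                        (movable * pageblock_nr_pages) / cma;
                zone->max_try_cma = pageblock_nr_pages;
        } else {
                zone->max_try_movable = pageblock_nr_pages;
                zone->max_try_cma = cma * pageblock_nr_pages / movable;
        }

out:
        zone->nr_try_movable = zone->max_try_movable;
        zone->nr_try_cma = zone->max_try_cma;

        spin_unlock_irqrestore(&zone->lock, flags);
}

As a rough worked example of the ratio (ignoring the high-watermark term,
and assuming 4 KB pages and pageblock_nr_pages == 512): a zone with about
768 MB movable and 256 MB cma gives max_try_cma = 512 and
max_try_movable = 196608 * 512 / 65536 = 1536, i.e. three pageblocks of
movable allocation per pageblock of cma allocation, matching the 3:1 size
ratio of the two pools.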

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2015-01-06  8:23 UTC | newest]

Thread overview: 48+ messages
-- links below jump to the message on this page --
2014-05-28  7:04 [PATCH v2 0/3] Aggressively allocate the pages on cma reserved memory Joonsoo Kim
2014-05-28  7:04 ` Joonsoo Kim
2014-05-28  7:04 ` [PATCH v2 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range Joonsoo Kim
2014-05-28  7:04   ` Joonsoo Kim
2014-05-28  7:04 ` [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
2014-05-28  7:04   ` Joonsoo Kim
2014-05-29  7:24   ` Gioh Kim
2014-05-29  7:24     ` Gioh Kim
2014-05-29  7:48     ` Joonsoo Kim
2014-05-29  7:48       ` Joonsoo Kim
2014-05-29  8:09       ` Gioh Kim
2014-05-29  8:09         ` Gioh Kim
2014-05-30  0:45         ` Joonsoo Kim
2014-05-30  0:45           ` Joonsoo Kim
2014-05-31  0:02           ` Michal Nazarewicz
2014-05-31  0:02             ` Michal Nazarewicz
2014-06-02  6:17             ` Joonsoo Kim
2014-06-02  6:17               ` Joonsoo Kim
2014-05-30  7:53   ` Gioh Kim
2014-05-30  7:53     ` Gioh Kim
2014-05-30 14:23     ` Joonsoo Kim
2014-05-30 14:23       ` Joonsoo Kim
2014-06-02  5:54       ` Gioh Kim
2014-06-02  5:54         ` Gioh Kim
2014-06-02  6:23         ` Joonsoo Kim
2014-06-02  6:23           ` Joonsoo Kim
2014-06-02  7:13           ` Gioh Kim
2014-06-02  7:13             ` Gioh Kim
2014-05-31  0:11   ` Michal Nazarewicz
2014-05-31  0:11     ` Michal Nazarewicz
2014-10-30 10:37   ` Hui Zhu
2014-10-30 10:37     ` Hui Zhu
     [not found]   ` <CADtm3G5Cb2vzVo61qDJ7-1ZNzQ2zOisfjb7GiFXvZR0ocKZy0A@mail.gmail.com>
2015-01-06  4:01     ` Gregory Fong
2015-01-06  4:01       ` Gregory Fong
2015-01-06  8:23       ` Joonsoo Kim
2015-01-06  8:23         ` Joonsoo Kim
2014-05-28  7:04 ` [PATCH v2 3/3] CMA: always treat free cma pages as non-free on watermark checking Joonsoo Kim
2014-05-28  7:04   ` Joonsoo Kim
2014-05-30 10:40   ` Ritesh Harjani
2014-05-30 10:40     ` Ritesh Harjani
2014-05-30 14:46     ` Joonsoo Kim
2014-05-30 14:46       ` Joonsoo Kim
2014-06-02  4:07       ` Ritesh Harjani
2014-06-02  4:07         ` Ritesh Harjani
2014-06-02 10:47         ` Bartlomiej Zolnierkiewicz
2014-06-02 10:47           ` Bartlomiej Zolnierkiewicz
2014-06-02 14:05           ` Joonsoo Kim
2014-06-02 14:05             ` Joonsoo Kim
