linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
@ 2014-05-08  0:32 Joonsoo Kim
  2014-05-08  0:32 ` [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range Joonsoo Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-08  0:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

Hello,

This series tries to improve CMA.

CMA was introduced to provide physically contiguous pages at runtime
without permanently reserving a memory area. But the current implementation
behaves like a plain reserved-memory approach, because allocation from the
cma reserved region only happens as a fallback of MIGRATE_MOVABLE
allocation, that is, only when there are no other movable pages left. In
that situation kswapd is easily woken up, since unmovable and reclaimable
allocations treat (free pages - free CMA pages) as the system's free memory,
and that value may already be below the high watermark. Once kswapd starts
to reclaim memory, the fallback allocation rarely happens.
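
For reference, the watermark check in mainline looks roughly like this
(simplified from the __zone_watermark_ok() hunk that patch 3 touches;
per-order checks and other details trimmed):

	long free_cma = 0;
#ifdef CONFIG_CMA
	/* Only callers passing ALLOC_CMA may count free CMA pages as free */
	if (!(alloc_flags & ALLOC_CMA))
		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
	if (free_pages - free_cma <= min + lowmem_reserve)
		return false;	/* below watermark -> kswapd gets woken */

Unmovable and reclaimable allocations never pass ALLOC_CMA, so for them
every free CMA page is treated as if it were already used.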

In my experiment, on a system with 1024 MB of memory and 512 MB reserved
for CMA, kswapd is mostly invoked around the 512 MB free memory boundary.
Once invoked, kswapd reclaims until (free pages - free CMA pages) is above
the high watermark, so free memory in meminfo consistently hovers around
the 512 MB boundary.

To fix this problem, we should allocate the pages on cma reserved memory
more aggressively and intelligently. Patch 2 implements the solution.
Patch 1 is a simple optimization which removes a useless retry, and patch 3
removes a now-useless alloc flag, so these two are less important.
See patch 2 for a more detailed description.

This patchset is based on v3.15-rc4.

Thanks.
Joonsoo Kim (3):
  CMA: remove redundant retrying code in __alloc_contig_migrate_range
  CMA: aggressively allocate the pages on cma reserved memory when not
    used
  CMA: always treat free cma pages as non-free on watermark checking

 include/linux/mmzone.h |    6 +++
 mm/compaction.c        |    4 --
 mm/internal.h          |    3 +-
 mm/page_alloc.c        |  117 +++++++++++++++++++++++++++++++++++++++---------
 4 files changed, 102 insertions(+), 28 deletions(-)

-- 
1.7.9.5



* [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range
  2014-05-08  0:32 [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Joonsoo Kim
@ 2014-05-08  0:32 ` Joonsoo Kim
  2014-05-09 15:44   ` Michal Nazarewicz
  2014-05-08  0:32 ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-08  0:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

We already have retry logic in migrate_pages(), which retries up to
10 times. So if we keep this retrying code in
__alloc_contig_migrate_range(), we would try to migrate an unmigratable
page up to 50 times. There is just one small difference in the -ENOMEM
case: migrate_pages() doesn't retry there, whereas the current
__alloc_contig_migrate_range() does. But I don't think this is a problem,
because in that case we would just fail again for the same reason.
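
For reference, migrate_pages() already loops over the page list roughly
like this (paraphrased sketch from mm/migrate.c of this era, with variable
declarations and bookkeeping omitted; details may differ slightly):

	for (pass = 0; pass < 10 && retry; pass++) {
		retry = 0;
		list_for_each_entry_safe(page, page2, from, lru) {
			rc = unmap_and_move(get_new_page, private,
					    page, pass > 2, mode);
			switch (rc) {
			case -ENOMEM:
				goto out;	/* no retry on -ENOMEM */
			case -EAGAIN:
				retry++;	/* retried on the next pass */
				break;
			case MIGRATEPAGE_SUCCESS:
				nr_succeeded++;
				break;
			default:
				nr_failed++;	/* permanent failure */
				break;
			}
		}
	}

Combined with the 5 tries removed below, an unmigratable page could
previously be attempted up to 10 * 5 = 50 times.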

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..674ade7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6185,7 +6185,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	/* This function is based on compact_zone() from compaction.c. */
 	unsigned long nr_reclaimed;
 	unsigned long pfn = start;
-	unsigned int tries = 0;
 	int ret = 0;
 
 	migrate_prep();
@@ -6204,10 +6203,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				ret = -EINTR;
 				break;
 			}
-			tries = 0;
-		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
-			break;
 		}
 
 		nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
@@ -6216,6 +6211,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 
 		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
 				    0, MIGRATE_SYNC, MR_CMA);
+		if (ret) {
+			ret = ret < 0 ? ret : -EBUSY;
+			break;
+		}
 	}
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
-- 
1.7.9.5



* [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-08  0:32 [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Joonsoo Kim
  2014-05-08  0:32 ` [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range Joonsoo Kim
@ 2014-05-08  0:32 ` Joonsoo Kim
  2014-05-09 15:45   ` Michal Nazarewicz
                     ` (3 more replies)
  2014-05-08  0:32 ` [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking Joonsoo Kim
  2014-05-09 12:39 ` [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Marek Szyprowski
  3 siblings, 4 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-08  0:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

CMA was introduced to provide physically contiguous pages at runtime.
For this purpose, it reserves memory at boot time. Although the memory is
reserved, it can still be used to satisfy movable memory allocation
requests. This is beneficial to systems that need the CMA reserved memory
only infrequently, and it is one of the main purposes of introducing CMA.

But there is a problem in the current implementation: it works like a
plain reserved-memory approach, and the pages on cma reserved memory are
hardly ever used for movable memory allocation. This is caused by the
combination of the allocation and reclaim policies.

The pages on cma reserved memory are only allocated when there is no other
movable memory, that is, as a fallback allocation. So by the time this
fallback starts, the system is already under heavy memory pressure. Even
under that pressure, movable allocations succeed easily, since there are
still many free pages on cma reserved memory. But this is not the case for
unmovable and reclaimable allocations, because they can't use the pages on
cma reserved memory. For watermark checking, these allocations regard the
system's free memory as (free pages - free cma pages), that is, free
unmovable pages + free reclaimable pages + free movable pages. Because the
movable pages are already exhausted, the only free pages left are of the
unmovable and reclaimable types, which is a really small amount, so the
watermark check fails. This wakes up kswapd to make enough free memory for
unmovable and reclaimable allocations, and kswapd does just that. So before
we fully utilize the pages on cma reserved memory, kswapd starts to reclaim
memory and tries to push free memory over the high watermark. The watermark
check done by kswapd doesn't count free cma pages either, so many movable
pages get reclaimed. After that, we have a lot of movable pages again, so
the fallback allocation doesn't happen again. To conclude, the amount of
free memory in meminfo, which includes free CMA pages, hovers around
512 MB if I reserve 512 MB of memory for CMA.
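
Concretely, only movable requests are currently allowed to count free CMA
pages, via a snippet like the one below (this is exactly the code that
patch 3 later removes):

#ifdef CONFIG_CMA
	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
		alloc_flags |= ALLOC_CMA;
#endif
	/*
	 * Without ALLOC_CMA, __zone_watermark_ok() subtracts
	 * NR_FREE_CMA_PAGES from the free page count, so unmovable and
	 * reclaimable requests effectively see (free pages - free cma pages).
	 */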

I found this problem with the following experiment.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j24

CMA reserve:		0 MB		512 MB
Elapsed-time:		234.8		361.8
Average-MemFree:	283880 KB	530851 KB

To solve this problem, I can think of the following two possible solutions.
1. allocate the pages on cma reserved memory first, and if they are
   exhausted, allocate movable pages.
2. interleaved allocation: try to allocate a specific amount of memory
   from cma reserved memory and then allocate from free movable memory.

I tested approach #1 and found a problem. Although free memory in meminfo
can stay near the low watermark, it fluctuates heavily, because too many
pages are reclaimed whenever kswapd is invoked. The reason for this
behaviour is that successively allocated CMA pages sit on the LRU list in
that order and kswapd reclaims them in the same order. Reclaiming them
doesn't help kswapd's watermark check, so too many pages end up being
reclaimed, I guess.

So, I implemented approach #2.
One thing I should note is that we should not switch the allocation target
(movable list or cma) on every allocation attempt, since this prevents the
allocated pages from being physically contiguous and can hurt the
performance of some I/O devices. To solve this, I keep the same allocation
target for at least pageblock_nr_pages attempts and make this number
reflect the ratio of free pages (excluding free cma pages) to free cma
pages. With this approach, the system works very smoothly and fully
utilizes the pages on cma reserved memory.
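
As a worked example of the recharge logic in the patch below (the numbers
are made up for illustration; pageblock_nr_pages is 512 with 4 KB pages
and 2 MB pageblocks):

	free       = 102400 pages	(400 MB free in the zone)
	free_cma   =  25600 pages	(100 MB of that is free CMA)
	high wmark =   2048 pages	(  8 MB)

	free_wmark = free - free_cma - high_wmark = 74752

	free_wmark > free_cma, so:
	nr_try_movable = free_wmark * pageblock_nr_pages / free_cma
	               = 74752 * 512 / 25600 = 1495	(~3 pageblocks)
	nr_try_cma     = pageblock_nr_pages = 512	(1 pageblock)

So roughly three pageblocks worth of requests are served from the normal
movable free lists for every one pageblock served from CMA, matching the
~3:1 ratio of non-CMA free memory to free CMA memory.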

The following is the experimental result with this patch.

4 CPUs, 1024 MB, VIRTUAL MACHINE
make -j24

<Before>
CMA reserve:            0 MB            512 MB
Elapsed-time:           234.8           361.8
Average-MemFree:        283880 KB       530851 KB
pswpin:                 7               110064
pswpout:                452             767502

<After>
CMA reserve:            0 MB            512 MB
Elapsed-time:           234.2           235.6
Average-MemFree:        281651 KB       290227 KB
pswpin:                 8               8
pswpout:                430             510

There is no difference if we don't have cma reserved memory (the 0 MB
case). But with cma reserved memory (the 512 MB case), this patch lets us
fully utilize the reserved memory and the system behaves as if it didn't
reserve any memory at all.

With this patch, we allocate the pages on cma reserved memory aggressively,
so the latency of CMA allocation can increase. Below is the experimental
result for latency.

4 CPUs, 1024 MB, VIRTUAL MACHINE
CMA reserve: 512 MB
Background Workload: make -jN
Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval

N:                    1        4       8        16
Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2

So in general we see a latency increase, and the ratio of this increase
is rather big - up to 70%. But under the heaviest workload it shows a
latency decrease - up to 55%. This may be a worst-case scenario, but
reducing it would be important for some systems, so I can say that this
patch has both advantages and disadvantages in terms of latency.
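
For clarity, those ratios come straight from the table above:

	N=1 :  5391.69 / 4309.75 = 1.25  ->  ~25% slower
	N=4 : 16114.1  / 9511.09 = 1.69  ->  ~69% slower ("up to 70%")
	N=8 : 19380.3  / 12276.1 = 1.58  ->  ~58% slower
	N=16: 34879.2  / 77103.5 = 0.45  ->  ~55% faster ("up to 55%")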

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fac5509..3ff24d4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -389,6 +389,12 @@ struct zone {
 	int			compact_order_failed;
 #endif
 
+#ifdef CONFIG_CMA
+	int has_cma;
+	int nr_try_cma;
+	int nr_try_movable;
+#endif
+
 	ZONE_PADDING(_pad1_)
 
 	/* Fields commonly accessed by the page reclaim scanner */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 674ade7..6f2b27b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
 }
 
 #ifdef CONFIG_CMA
+void __init init_alloc_ratio_counter(struct zone *zone)
+{
+	if (zone->has_cma)
+		return;
+
+	zone->has_cma = 1;
+	zone->nr_try_movable = 0;
+	zone->nr_try_cma = 0;
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
@@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 	set_pageblock_migratetype(page, MIGRATE_CMA);
 	__free_pages(page, pageblock_order);
 	adjust_managed_page_count(page, pageblock_nr_pages);
+	init_alloc_ratio_counter(page_zone(page));
 }
 #endif
 
@@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	return NULL;
 }
 
+#ifdef CONFIG_CMA
+static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
+						int migratetype)
+{
+	long free, free_cma, free_wmark;
+	struct page *page;
+
+	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
+		return NULL;
+
+	if (zone->nr_try_movable)
+		goto alloc_movable;
+
+alloc_cma:
+	if (zone->nr_try_cma) {
+		/* Okay. Now, we can try to allocate the page from cma region */
+		zone->nr_try_cma--;
+		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
+
+		/* CMA pages can vanish through CMA allocation */
+		if (unlikely(!page && order == 0))
+			zone->nr_try_cma = 0;
+
+		return page;
+	}
+
+	/* Reset ratio counter */
+	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
+
+	/* No cma free pages, so recharge only movable allocation */
+	if (free_cma <= 0) {
+		zone->nr_try_movable = pageblock_nr_pages;
+		goto alloc_movable;
+	}
+
+	free = zone_page_state(zone, NR_FREE_PAGES);
+	free_wmark = free - free_cma - high_wmark_pages(zone);
+
+	/*
+	 * free_wmark is below than 0, and it means that normal pages
+	 * are under the pressure, so we recharge only cma allocation.
+	 */
+	if (free_wmark <= 0) {
+		zone->nr_try_cma = pageblock_nr_pages;
+		goto alloc_cma;
+	}
+
+	if (free_wmark > free_cma) {
+		zone->nr_try_movable =
+			(free_wmark * pageblock_nr_pages) / free_cma;
+		zone->nr_try_cma = pageblock_nr_pages;
+	} else {
+		zone->nr_try_movable = pageblock_nr_pages;
+		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
+	}
+
+	/* Reset complete, start on movable first */
+alloc_movable:
+	zone->nr_try_movable--;
+	return NULL;
+}
+#endif
+
 /*
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
@@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 static struct page *__rmqueue(struct zone *zone, unsigned int order,
 						int migratetype)
 {
-	struct page *page;
+	struct page *page = NULL;
+
+	if (IS_ENABLED(CONFIG_CMA))
+		page = __rmqueue_cma(zone, order, migratetype);
 
 retry_reserve:
-	page = __rmqueue_smallest(zone, order, migratetype);
+	if (!page)
+		page = __rmqueue_smallest(zone, order, migratetype);
 
 	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
 		page = __rmqueue_fallback(zone, order, migratetype);
@@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		zone_seqlock_init(zone);
 		zone->zone_pgdat = pgdat;
 		zone_pcp_init(zone);
+		if (IS_ENABLED(CONFIG_CMA))
+			zone->has_cma = 0;
 
 		/* For bootup, initialized properly in watermark setup */
 		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
-- 
1.7.9.5



* [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-05-08  0:32 [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Joonsoo Kim
  2014-05-08  0:32 ` [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range Joonsoo Kim
  2014-05-08  0:32 ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
@ 2014-05-08  0:32 ` Joonsoo Kim
  2014-05-09 15:46   ` Michal Nazarewicz
  2014-05-09 12:39 ` [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Marek Szyprowski
  3 siblings, 1 reply; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-08  0:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

commit d95ea5d1 ('cma: fix watermark checking') introduces the ALLOC_CMA
alloc flag and treats free cma pages as free pages if this flag is passed
to the watermark check. The intention of that patch is that movable page
allocations can be handled from the cma reserved region without waking
kswapd. Now that the previous patch changes the allocator to use the pages
on the cma reserved region aggressively for movable allocations, this
watermark hack isn't needed anymore. Therefore remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/compaction.c b/mm/compaction.c
index 627dc2e..36e2fcd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1117,10 +1117,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 
 	count_compact_event(COMPACTSTALL);
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 								nodemask) {
diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..a121762 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -384,7 +384,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
-#define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR		0x100 /* fair zone allocation */
+#define ALLOC_FAIR		0x80 /* fair zone allocation */
 
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f2b27b..6af2fa1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1757,20 +1757,22 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 	long min = mark;
 	long lowmem_reserve = z->lowmem_reserve[classzone_idx];
 	int o;
-	long free_cma = 0;
 
 	free_pages -= (1 << order) - 1;
 	if (alloc_flags & ALLOC_HIGH)
 		min -= min / 2;
 	if (alloc_flags & ALLOC_HARDER)
 		min -= min / 4;
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
+	/*
+	 * We don't want to regard the pages on CMA region as free
+	 * on watermark checking, since they cannot be used for
+	 * unmovable/reclaimable allocation and they can suddenly
+	 * vanish through CMA allocation
+	 */
+	if (IS_ENABLED(CONFIG_CMA) && z->has_cma)
+		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
 
-	if (free_pages - free_cma <= min + lowmem_reserve)
+	if (free_pages <= min + lowmem_reserve)
 		return false;
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
@@ -2538,10 +2540,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 				 unlikely(test_thread_flag(TIF_MEMDIE))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	return alloc_flags;
 }
 
@@ -2811,10 +2809,6 @@ retry_cpuset:
 	if (!preferred_zone)
 		goto out;
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 retry:
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-- 
1.7.9.5



* Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
  2014-05-08  0:32 [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Joonsoo Kim
                   ` (2 preceding siblings ...)
  2014-05-08  0:32 ` [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking Joonsoo Kim
@ 2014-05-09 12:39 ` Marek Szyprowski
  2014-05-13  2:26   ` Joonsoo Kim
  3 siblings, 1 reply; 45+ messages in thread
From: Marek Szyprowski @ 2014-05-09 12:39 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Laura Abbott,
	Minchan Kim, Heesub Shin, Michal Nazarewicz, linux-mm,
	linux-kernel, Kyungmin Park, Bartlomiej Zolnierkiewicz,
	'Tomasz Stanislawski'

Hello,

On 2014-05-08 02:32, Joonsoo Kim wrote:
> This series tries to improve CMA.
>
> CMA is introduced to provide physically contiguous pages at runtime
> without reserving memory area. But, current implementation works like as
> reserving memory approach, because allocation on cma reserved region only
> occurs as fallback of migrate_movable allocation. We can allocate from it
> when there is no movable page. In that situation, kswapd would be invoked
> easily since unmovable and reclaimable allocation consider
> (free pages - free CMA pages) as free memory on the system and free memory
> may be lower than high watermark in that case. If kswapd start to reclaim
> memory, then fallback allocation doesn't occur much.
>
> In my experiment, I found that if system memory has 1024 MB memory and
> has 512 MB reserved memory for CMA, kswapd is mostly invoked around
> the 512MB free memory boundary. And invoked kswapd tries to make free
> memory until (free pages - free CMA pages) is higher than high watermark,
> so free memory on meminfo is moving around 512MB boundary consistently.
>
> To fix this problem, we should allocate the pages on cma reserved memory
> more aggressively and intelligently. Patch 2 implements the solution.
> Patch 1 is the simple optimization which remove useless re-trial and patch 3
> is for removing useless alloc flag, so these are not important.
> See patch 2 for more detailed description.
>
> This patchset is based on v3.15-rc4.

Thanks for posting those patches. It basically reminds me of the following
discussion:
http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524

Your approach is basically the same. I hope that your patches can be
improved in such a way that they will be accepted by the mm maintainers.
I only wonder if the third patch is really necessary. Without it, kswapd
wakeup might still be avoided in some cases.

> Thanks.
> Joonsoo Kim (3):
>    CMA: remove redundant retrying code in __alloc_contig_migrate_range
>    CMA: aggressively allocate the pages on cma reserved memory when not
>      used
>    CMA: always treat free cma pages as non-free on watermark checking
>
>   include/linux/mmzone.h |    6 +++
>   mm/compaction.c        |    4 --
>   mm/internal.h          |    3 +-
>   mm/page_alloc.c        |  117 +++++++++++++++++++++++++++++++++++++++---------
>   4 files changed, 102 insertions(+), 28 deletions(-)
>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



* Re: [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range
  2014-05-08  0:32 ` [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range Joonsoo Kim
@ 2014-05-09 15:44   ` Michal Nazarewicz
  0 siblings, 0 replies; 45+ messages in thread
From: Michal Nazarewicz @ 2014-05-09 15:44 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	linux-mm, linux-kernel


On Wed, May 07 2014, Joonsoo Kim wrote:
> We already have retry logic in migrate_pages(). It does retry 10 times.
> So if we keep this retrying code in __alloc_contig_migrate_range(), we
> would try to migrate some unmigratable page in 50 times. There is just one
> small difference in -ENOMEM case. migrate_pages() don't do retry
> in this case, however, current __alloc_contig_migrate_range() does. But,
> I think that this isn't problem, because in this case, we may fail again
> with same reason.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

I think there was a reason for the retries in
__alloc_contig_migrate_range, but perhaps those reasons are no longer valid.

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..674ade7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6185,7 +6185,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  	/* This function is based on compact_zone() from compaction.c. */
>  	unsigned long nr_reclaimed;
>  	unsigned long pfn = start;
> -	unsigned int tries = 0;
>  	int ret = 0;
>  
>  	migrate_prep();
> @@ -6204,10 +6203,6 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  				ret = -EINTR;
>  				break;
>  			}
> -			tries = 0;
> -		} else if (++tries == 5) {
> -			ret = ret < 0 ? ret : -EBUSY;
> -			break;
>  		}
>  
>  		nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
> @@ -6216,6 +6211,10 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  
>  		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
>  				    0, MIGRATE_SYNC, MR_CMA);
> +		if (ret) {
> +			ret = ret < 0 ? ret : -EBUSY;
> +			break;
> +		}
>  	}
>  	if (ret < 0) {
>  		putback_movable_pages(&cc->migratepages);

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--



* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-08  0:32 ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
@ 2014-05-09 15:45   ` Michal Nazarewicz
  2014-05-12 17:04   ` Laura Abbott
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Michal Nazarewicz @ 2014-05-09 15:45 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	linux-mm, linux-kernel


On Wed, May 07 2014, Joonsoo Kim wrote:
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

The code looks good to me, but I don't feel competent to judge whether
the approach is beneficial or not.  Still:

Acked-by: Michal Nazarewicz <mina86@mina86.com>


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--



* Re: [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking
  2014-05-08  0:32 ` [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking Joonsoo Kim
@ 2014-05-09 15:46   ` Michal Nazarewicz
  0 siblings, 0 replies; 45+ messages in thread
From: Michal Nazarewicz @ 2014-05-09 15:46 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	linux-mm, linux-kernel


On Wed, May 07 2014, Joonsoo Kim wrote:
> commit d95ea5d1('cma: fix watermark checking') introduces ALLOC_CMA flag
> for alloc flag and treats free cma pages as free pages if this flag is
> passed to watermark checking. Intention of that patch is that movable page
> allocation can be handled from cma reserved region without starting
> kswapd. Now, previous patch changes the behaviour of allocator that
> movable allocation uses the page on cma reserved region aggressively,
> so this watermark hack isn't needed anymore. Therefore remove it.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--



* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-08  0:32 ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
  2014-05-09 15:45   ` Michal Nazarewicz
@ 2014-05-12 17:04   ` Laura Abbott
  2014-05-13  1:14     ` Joonsoo Kim
                       ` (2 more replies)
  2014-05-13  3:00   ` Minchan Kim
  2014-05-14  8:42   ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Aneesh Kumar K.V
  3 siblings, 3 replies; 45+ messages in thread
From: Laura Abbott @ 2014-05-12 17:04 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Minchan Kim,
	Heesub Shin, Marek Szyprowski, Michal Nazarewicz, linux-mm,
	linux-kernel

Hi,

On 5/7/2014 5:32 PM, Joonsoo Kim wrote:
> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although it reserve
> memory, this reserved memory can be used for movable memory allocation
> request. This usecase is beneficial to the system that needs this CMA
> reserved memory infrequently and it is one of main purpose of
> introducing CMA.
> 
> But, there is a problem in current implementation. The problem is that
> it works like as just reserved memory approach. The pages on cma reserved
> memory are hardly used for movable memory allocation. This is caused by
> combination of allocation and reclaim policy.
> 
> The pages on cma reserved memory are allocated if there is no movable
> memory, that is, as fallback allocation. So the time this fallback
> allocation is started is under heavy memory pressure. Although it is under
> memory pressure, movable allocation easily succeed, since there would be
> many pages on cma reserved memory. But this is not the case for unmovable
> and reclaimable allocation, because they can't use the pages on cma
> reserved memory. These allocations regard system's free memory as
> (free pages - free cma pages) on watermark checking, that is, free
> unmovable pages + free reclaimable pages + free movable pages. Because
> we already exhausted movable pages, only free pages we have are unmovable
> and reclaimable types and this would be really small amount. So watermark
> checking would be failed. It will wake up kswapd to make enough free
> memory for unmovable and reclaimable allocation and kswapd will do.
> So before we fully utilize pages on cma reserved memory, kswapd start to
> reclaim memory and try to make free memory over the high watermark. This
> watermark checking by kswapd doesn't take care free cma pages so many
> movable pages would be reclaimed. After then, we have a lot of movable
> pages again, so fallback allocation doesn't happen again. To conclude,
> amount of free memory on meminfo which includes free CMA pages is moving
> around 512 MB if I reserve 512 MB memory for CMA.
> 
> I found this problem on following experiment.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
> 
> CMA reserve:		0 MB		512 MB
> Elapsed-time:		234.8		361.8
> Average-MemFree:	283880 KB	530851 KB
> 
> To solve this problem, I can think following 2 possible solutions.
> 1. allocate the pages on cma reserved memory first, and if they are
>    exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate specific amounts of memory
>    from cma reserved memory and then allocate from free movable memory.
> 
> I tested #1 approach and found the problem. Although free memory on
> meminfo can move around low watermark, there is large fluctuation on free
> memory, because too many pages are reclaimed when kswapd is invoked.
> Reason for this behaviour is that successive allocated CMA pages are
> on the LRU list in that order and kswapd reclaim them in same order.
> These memory doesn't help watermark checking from kswapd, so too many
> pages are reclaimed, I guess.
> 

We have an out-of-tree implementation of #1 and so far it has worked for
us, although we weren't looking at the same metrics. I don't completely
understand the issue you pointed out with #1. It sounds like the issue is
that the CMA pages are already in use by other processes and sitting on
LRU lists, and because they are on LRU lists they aren't counted towards
the watermark by kswapd. Is my understanding correct?

> So, I implement #2 approach.
> One thing I should note is that we should not change allocation target
> (movable list or cma) on each allocation attempt, since this prevent
> allocated pages to be in physically succession, so some I/O devices can
> be hurt their performance. To solve this, I keep allocation target
> in at least pageblock_nr_pages attempts and make this number reflect
> ratio, free pages without free cma pages to free cma pages. With this
> approach, system works very smoothly and fully utilize the pages on
> cma reserved memory.
> 
> Following is the experimental result of this patch.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
> 
> <Before>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           234.8           361.8
> Average-MemFree:        283880 KB       530851 KB
> pswpin:                 7               110064
> pswpout:                452             767502
> 
> <After>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           234.2           235.6
> Average-MemFree:        281651 KB       290227 KB
> pswpin:                 8               8
> pswpout:                430             510
> 
> There is no difference if we don't have cma reserved memory (0 MB case).
> But, with cma reserved memory (512 MB case), we fully utilize these
> reserved memory through this patch and the system behaves like as
> it doesn't reserve any memory.

What metric are you using to determine that all the CMA memory was fully
used? We've been checking /proc/pagetypeinfo.

> 
> With this patch, we aggressively allocate the pages on cma reserved memory
> so latency of CMA can arise. Below is the experimental result about
> latency.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> CMA reserve: 512 MB
> Background Workload: make -jN
> Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> 
> N:                    1        4       8        16
> Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> 
> So generally we can see latency increase. Ratio of this increase
> is rather big - up to 70%. But, under the heavy workload, it shows
> latency decrease - up to 55%. This may be worst-case scenario, but
> reducing it would be important for some system, so, I can say that
> this patch have advantages and disadvantages in terms of latency.
> 

Do you have any statistics on failed migrations from this? Latency and
utilization are issues, but so is migration success. In the past we've
found that an increase in CMA utilization was related to an increase in
CMA migration failures because pages were unmigratable. The current
workaround for this is limiting CMA pages to user processes only and not
the file cache. Both of these have their own problems.

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..3ff24d4 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -389,6 +389,12 @@ struct zone {
>  	int			compact_order_failed;
>  #endif
>  
> +#ifdef CONFIG_CMA
> +	int has_cma;
> +	int nr_try_cma;
> +	int nr_try_movable;
> +#endif
> +
>  	ZONE_PADDING(_pad1_)
>  
>  	/* Fields commonly accessed by the page reclaim scanner */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 674ade7..6f2b27b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
>  }
>  
>  #ifdef CONFIG_CMA
> +void __init init_alloc_ratio_counter(struct zone *zone)
> +{
> +	if (zone->has_cma)
> +		return;
> +
> +	zone->has_cma = 1;
> +	zone->nr_try_movable = 0;
> +	zone->nr_try_cma = 0;
> +}
> +
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
> @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
>  	__free_pages(page, pageblock_order);
>  	adjust_managed_page_count(page, pageblock_nr_pages);
> +	init_alloc_ratio_counter(page_zone(page));
>  }
>  #endif
>  
> @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  	return NULL;
>  }
>  
> +#ifdef CONFIG_CMA
> +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
> +						int migratetype)
> +{
> +	long free, free_cma, free_wmark;
> +	struct page *page;
> +
> +	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
> +		return NULL;
> +
> +	if (zone->nr_try_movable)
> +		goto alloc_movable;
> +
> +alloc_cma:
> +	if (zone->nr_try_cma) {
> +		/* Okay. Now, we can try to allocate the page from cma region */
> +		zone->nr_try_cma--;
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +		/* CMA pages can vanish through CMA allocation */
> +		if (unlikely(!page && order == 0))
> +			zone->nr_try_cma = 0;
> +
> +		return page;
> +	}
> +
> +	/* Reset ratio counter */
> +	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
> +
> +	/* No cma free pages, so recharge only movable allocation */
> +	if (free_cma <= 0) {
> +		zone->nr_try_movable = pageblock_nr_pages;
> +		goto alloc_movable;
> +	}
> +
> +	free = zone_page_state(zone, NR_FREE_PAGES);
> +	free_wmark = free - free_cma - high_wmark_pages(zone);
> +
> +	/*
> +	 * free_wmark is below than 0, and it means that normal pages
> +	 * are under the pressure, so we recharge only cma allocation.
> +	 */
> +	if (free_wmark <= 0) {
> +		zone->nr_try_cma = pageblock_nr_pages;
> +		goto alloc_cma;
> +	}
> +
> +	if (free_wmark > free_cma) {
> +		zone->nr_try_movable =
> +			(free_wmark * pageblock_nr_pages) / free_cma;
> +		zone->nr_try_cma = pageblock_nr_pages;
> +	} else {
> +		zone->nr_try_movable = pageblock_nr_pages;
> +		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
> +	}
> +
> +	/* Reset complete, start on movable first */
> +alloc_movable:
> +	zone->nr_try_movable--;
> +	return NULL;
> +}
> +#endif
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
> @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  						int migratetype)
>  {
> -	struct page *page;
> +	struct page *page = NULL;
> +
> +	if (IS_ENABLED(CONFIG_CMA))
> +		page = __rmqueue_cma(zone, order, migratetype);
>  
>  retry_reserve:
> -	page = __rmqueue_smallest(zone, order, migratetype);
> +	if (!page)
> +		page = __rmqueue_smallest(zone, order, migratetype);
>  
>  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>  		page = __rmqueue_fallback(zone, order, migratetype);
> @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  		zone_seqlock_init(zone);
>  		zone->zone_pgdat = pgdat;
>  		zone_pcp_init(zone);
> +		if (IS_ENABLED(CONFIG_CMA))
> +			zone->has_cma = 0;
>  
>  		/* For bootup, initialized properly in watermark setup */
>  		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> 

I'm going to see about running this through tests internally for comparison.
Hopefully I'll get useful results in a day or so.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-12 17:04   ` Laura Abbott
@ 2014-05-13  1:14     ` Joonsoo Kim
  2014-05-13  3:05     ` Minchan Kim
  2014-05-24  0:57     ` Laura Abbott
  2 siblings, 0 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-13  1:14 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	linux-mm, linux-kernel

On Mon, May 12, 2014 at 10:04:29AM -0700, Laura Abbott wrote:
> Hi,
> 
> On 5/7/2014 5:32 PM, Joonsoo Kim wrote:
> > CMA is introduced to provide physically contiguous pages at runtime.
> > For this purpose, it reserves memory at boot time. Although it reserve
> > memory, this reserved memory can be used for movable memory allocation
> > request. This usecase is beneficial to the system that needs this CMA
> > reserved memory infrequently and it is one of main purpose of
> > introducing CMA.
> > 
> > But, there is a problem in current implementation. The problem is that
> > it works like as just reserved memory approach. The pages on cma reserved
> > memory are hardly used for movable memory allocation. This is caused by
> > combination of allocation and reclaim policy.
> > 
> > The pages on cma reserved memory are allocated if there is no movable
> > memory, that is, as fallback allocation. So the time this fallback
> > allocation is started is under heavy memory pressure. Although it is under
> > memory pressure, movable allocation easily succeed, since there would be
> > many pages on cma reserved memory. But this is not the case for unmovable
> > and reclaimable allocation, because they can't use the pages on cma
> > reserved memory. These allocations regard system's free memory as
> > (free pages - free cma pages) on watermark checking, that is, free
> > unmovable pages + free reclaimable pages + free movable pages. Because
> > we already exhausted movable pages, only free pages we have are unmovable
> > and reclaimable types and this would be really small amount. So watermark
> > checking would be failed. It will wake up kswapd to make enough free
> > memory for unmovable and reclaimable allocation and kswapd will do.
> > So before we fully utilize pages on cma reserved memory, kswapd start to
> > reclaim memory and try to make free memory over the high watermark. This
> > watermark checking by kswapd doesn't take care free cma pages so many
> > movable pages would be reclaimed. After then, we have a lot of movable
> > pages again, so fallback allocation doesn't happen again. To conclude,
> > amount of free memory on meminfo which includes free CMA pages is moving
> > around 512 MB if I reserve 512 MB memory for CMA.
> > 
> > I found this problem on following experiment.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> > 
> > CMA reserve:		0 MB		512 MB
> > Elapsed-time:		234.8		361.8
> > Average-MemFree:	283880 KB	530851 KB
> > 
> > To solve this problem, I can think following 2 possible solutions.
> > 1. allocate the pages on cma reserved memory first, and if they are
> >    exhausted, allocate movable pages.
> > 2. interleaved allocation: try to allocate specific amounts of memory
> >    from cma reserved memory and then allocate from free movable memory.
> > 
> > I tested #1 approach and found the problem. Although free memory on
> > meminfo can move around low watermark, there is large fluctuation on free
> > memory, because too many pages are reclaimed when kswapd is invoked.
> > Reason for this behaviour is that successive allocated CMA pages are
> > on the LRU list in that order and kswapd reclaim them in same order.
> > These memory doesn't help watermark checking from kswapd, so too many
> > pages are reclaimed, I guess.
> > 
> 
> We have an out of tree implementation of #1 and so far it's worked for us
> although we weren't looking at the same metrics. I don't completely
> understand the issue you pointed out with #1. It sounds like the issue is
> that CMA pages are already in use by other processes and on LRU lists and
> because the pages are on LRU lists these aren't counted towards the
> watermark by kswapd. Is my understanding correct?

Hello,

Yes, your understanding is correct.
kswapd wants to reclaim normal (non-CMA) pages, but with approach #1 the
LRU lists can contain long runs of CMA pages, so the watermark isn't
restored easily.
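
To be specific, kswapd's per-zone balance test passes alloc_flags == 0, so
it also subtracts free CMA pages. Roughly (paraphrased from mm/vmscan.c of
this era; the compaction check and other details are trimmed):

	static bool zone_balanced(struct zone *zone, int order,
				  unsigned long balance_gap, int classzone_idx)
	{
		/*
		 * alloc_flags == 0: ALLOC_CMA is not set, so free CMA
		 * pages are not counted as free here either.
		 */
		if (!zone_watermark_ok_safe(zone, order,
				high_wmark_pages(zone) + balance_gap,
				classzone_idx, 0))
			return false;

		return true;
	}

So even if reclaim frees a lot of memory, when much of the LRU consists of
CMA pages the check above still fails and kswapd keeps reclaiming.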


> 
> > So, I implement #2 approach.
> > One thing I should note is that we should not change allocation target
> > (movable list or cma) on each allocation attempt, since this prevent
> > allocated pages to be in physically succession, so some I/O devices can
> > be hurt their performance. To solve this, I keep allocation target
> > in at least pageblock_nr_pages attempts and make this number reflect
> > ratio, free pages without free cma pages to free cma pages. With this
> > approach, system works very smoothly and fully utilize the pages on
> > cma reserved memory.
> > 
> > Following is the experimental result of this patch.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> > 
> > <Before>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.8           361.8
> > Average-MemFree:        283880 KB       530851 KB
> > pswpin:                 7               110064
> > pswpout:                452             767502
> > 
> > <After>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.2           235.6
> > Average-MemFree:        281651 KB       290227 KB
> > pswpin:                 8               8
> > pswpout:                430             510
> > 
> > There is no difference if we don't have cma reserved memory (0 MB case).
> > But, with cma reserved memory (512 MB case), we fully utilize these
> > reserved memory through this patch and the system behaves like as
> > it doesn't reserve any memory.
> 
> What metric are you using to determine all CMA memory was fully used?
> We've been checking /proc/pagetypeinfo

In these results, the MemFree stat shows whether more CMA memory was used
or not. I used /proc/zoneinfo to get more insight.

> > 
> > With this patch, we aggressively allocate the pages on cma reserved memory
> > so latency of CMA can arise. Below is the experimental result about
> > latency.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > CMA reserve: 512 MB
> > Background Workload: make -jN
> > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> > 
> > N:                    1        4       8        16
> > Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> > Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> > 
> > So generally we can see latency increase. Ratio of this increase
> > is rather big - up to 70%. But, under the heavy workload, it shows
> > latency decrease - up to 55%. This may be worst-case scenario, but
> > reducing it would be important for some system, so, I can say that
> > this patch have advantages and disadvantages in terms of latency.
> > 
> 
> Do you have any statistics related to failed migration from this? Latency
> and utilization are issues but so is migration success. In the past we've
> found that an increase in CMA utilization was related to increase in CMA
> migration failures because pages were unmigratable. The current
> workaround for this is limiting CMA pages to be used for user processes
> only and not the file cache. Both of these have their own problems.

Here are the retry counts when doing the 8 MB CMA allocation 20 times.
These numbers are averages of 5 runs.

N:                    1        4       8        16
Retrying(Before):     0        0       0.6      12.2 
Retrying(After):      1.4      1.8     3        3.6

If you know of any permanent failure cases with file cache pages, please
let me know.

The CMA migration failure with file cache pages that I already know about
is the problem related to the buffer_head LRU, which you mentioned before.

> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index fac5509..3ff24d4 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -389,6 +389,12 @@ struct zone {
> >  	int			compact_order_failed;
> >  #endif
> >  
> > +#ifdef CONFIG_CMA
> > +	int has_cma;
> > +	int nr_try_cma;
> > +	int nr_try_movable;
> > +#endif
> > +
> >  	ZONE_PADDING(_pad1_)
> >  
> >  	/* Fields commonly accessed by the page reclaim scanner */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 674ade7..6f2b27b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
> >  }
> >  
> >  #ifdef CONFIG_CMA
> > +void __init init_alloc_ratio_counter(struct zone *zone)
> > +{
> > +	if (zone->has_cma)
> > +		return;
> > +
> > +	zone->has_cma = 1;
> > +	zone->nr_try_movable = 0;
> > +	zone->nr_try_cma = 0;
> > +}
> > +
> >  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> >  void __init init_cma_reserved_pageblock(struct page *page)
> >  {
> > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
> >  	set_pageblock_migratetype(page, MIGRATE_CMA);
> >  	__free_pages(page, pageblock_order);
> >  	adjust_managed_page_count(page, pageblock_nr_pages);
> > +	init_alloc_ratio_counter(page_zone(page));
> >  }
> >  #endif
> >  
> > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >  	return NULL;
> >  }
> >  
> > +#ifdef CONFIG_CMA
> > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
> > +						int migratetype)
> > +{
> > +	long free, free_cma, free_wmark;
> > +	struct page *page;
> > +
> > +	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
> > +		return NULL;
> > +
> > +	if (zone->nr_try_movable)
> > +		goto alloc_movable;
> > +
> > +alloc_cma:
> > +	if (zone->nr_try_cma) {
> > +		/* Okay. Now, we can try to allocate the page from cma region */
> > +		zone->nr_try_cma--;
> > +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> > +
> > +		/* CMA pages can vanish through CMA allocation */
> > +		if (unlikely(!page && order == 0))
> > +			zone->nr_try_cma = 0;
> > +
> > +		return page;
> > +	}
> > +
> > +	/* Reset ratio counter */
> > +	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
> > +
> > +	/* No cma free pages, so recharge only movable allocation */
> > +	if (free_cma <= 0) {
> > +		zone->nr_try_movable = pageblock_nr_pages;
> > +		goto alloc_movable;
> > +	}
> > +
> > +	free = zone_page_state(zone, NR_FREE_PAGES);
> > +	free_wmark = free - free_cma - high_wmark_pages(zone);
> > +
> > +	/*
> > +	 * free_wmark is below than 0, and it means that normal pages
> > +	 * are under the pressure, so we recharge only cma allocation.
> > +	 */
> > +	if (free_wmark <= 0) {
> > +		zone->nr_try_cma = pageblock_nr_pages;
> > +		goto alloc_cma;
> > +	}
> > +
> > +	if (free_wmark > free_cma) {
> > +		zone->nr_try_movable =
> > +			(free_wmark * pageblock_nr_pages) / free_cma;
> > +		zone->nr_try_cma = pageblock_nr_pages;
> > +	} else {
> > +		zone->nr_try_movable = pageblock_nr_pages;
> > +		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
> > +	}
> > +
> > +	/* Reset complete, start on movable first */
> > +alloc_movable:
> > +	zone->nr_try_movable--;
> > +	return NULL;
> > +}
> > +#endif
> > +
> >  /*
> >   * Do the hard work of removing an element from the buddy allocator.
> >   * Call me with the zone->lock already held.
> > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >  static struct page *__rmqueue(struct zone *zone, unsigned int order,
> >  						int migratetype)
> >  {
> > -	struct page *page;
> > +	struct page *page = NULL;
> > +
> > +	if (IS_ENABLED(CONFIG_CMA))
> > +		page = __rmqueue_cma(zone, order, migratetype);
> >  
> >  retry_reserve:
> > -	page = __rmqueue_smallest(zone, order, migratetype);
> > +	if (!page)
> > +		page = __rmqueue_smallest(zone, order, migratetype);
> >  
> >  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
> >  		page = __rmqueue_fallback(zone, order, migratetype);
> > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
> >  		zone_seqlock_init(zone);
> >  		zone->zone_pgdat = pgdat;
> >  		zone_pcp_init(zone);
> > +		if (IS_ENABLED(CONFIG_CMA))
> > +			zone->has_cma = 0;
> >  
> >  		/* For bootup, initialized properly in watermark setup */
> >  		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> > 
> 
> I'm going to see about running this through tests internally for comparison.
> Hopefully I'll get useful results in a day or so.

Okay.
I really hope to see your result. :)
Thanks for your interest.

Thanks.


* Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
  2014-05-09 12:39 ` [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Marek Szyprowski
@ 2014-05-13  2:26   ` Joonsoo Kim
  2014-05-14  9:44     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-13  2:26 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Michal Nazarewicz,
	linux-mm, linux-kernel, Kyungmin Park, Bartlomiej Zolnierkiewicz,
	'Tomasz Stanislawski'

On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote:
> Hello,
> 
> On 2014-05-08 02:32, Joonsoo Kim wrote:
> >This series tries to improve CMA.
> >
> >CMA is introduced to provide physically contiguous pages at runtime
> >without reserving memory area. But, current implementation works like as
> >reserving memory approach, because allocation on cma reserved region only
> >occurs as fallback of migrate_movable allocation. We can allocate from it
> >when there is no movable page. In that situation, kswapd would be invoked
> >easily since unmovable and reclaimable allocation consider
> >(free pages - free CMA pages) as free memory on the system and free memory
> >may be lower than high watermark in that case. If kswapd start to reclaim
> >memory, then fallback allocation doesn't occur much.
> >
> >In my experiment, I found that if system memory has 1024 MB memory and
> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around
> >the 512MB free memory boundary. And invoked kswapd tries to make free
> >memory until (free pages - free CMA pages) is higher than high watermark,
> >so free memory on meminfo is moving around 512MB boundary consistently.
> >
> >To fix this problem, we should allocate the pages on cma reserved memory
> >more aggressively and intelligently. Patch 2 implements the solution.
> >Patch 1 is the simple optimization which remove useless re-trial and patch 3
> >is for removing useless alloc flag, so these are not important.
> >See patch 2 for more detailed description.
> >
> >This patchset is based on v3.15-rc4.
> 
> Thanks for posting those patches. It basically reminds me of the
> following discussion:
> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524
> 
> Your approach is basically the same. I hope that your patches can be
> improved in such a way that they will be accepted by the mm maintainers.
> I only wonder if the third patch is really necessary. Without it, kswapd
> wakeup might still be avoided in some cases.

Hello,

Oh... I didn't know about that patch and discussion, because I had no
interest in CMA at that time. Your approach looks similar to my approach
#1 and could have the same problem I mentioned for approach #1 in
patch 2/3. Please refer to that patch description. :)
Also, this patch and yours have different purposes. This patch is intended
to make better use of CMA pages and so get maximum performance. Just to
avoid triggering the OOM killer, it would be possible to put this logic in
the reclaim path, but that is sub-optimal for performance, because it
needs migration in some cases.

If the second patch works as intended, there are only a few free cma pages
left by the time we approach the watermark. So the benefit of the third
patch would be marginal and we can remove ALLOC_CMA.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-08  0:32 ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
  2014-05-09 15:45   ` Michal Nazarewicz
  2014-05-12 17:04   ` Laura Abbott
@ 2014-05-13  3:00   ` Minchan Kim
  2014-05-15  1:53     ` Joonsoo Kim
  2014-05-14  8:42   ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Aneesh Kumar K.V
  3 siblings, 1 reply; 45+ messages in thread
From: Minchan Kim @ 2014-05-13  3:00 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

Hey Joonsoo,

On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although it reserve
> memory, this reserved memory can be used for movable memory allocation
> request. This usecase is beneficial to the system that needs this CMA
> reserved memory infrequently and it is one of main purpose of
> introducing CMA.
> 
> But, there is a problem in current implementation. The problem is that
> it works like as just reserved memory approach. The pages on cma reserved
> memory are hardly used for movable memory allocation. This is caused by
> combination of allocation and reclaim policy.
> 
> The pages on cma reserved memory are allocated if there is no movable
> memory, that is, as fallback allocation. So the time this fallback
> allocation is started is under heavy memory pressure. Although it is under
> memory pressure, movable allocation easily succeed, since there would be
> many pages on cma reserved memory. But this is not the case for unmovable
> and reclaimable allocation, because they can't use the pages on cma
> reserved memory. These allocations regard system's free memory as
> (free pages - free cma pages) on watermark checking, that is, free
> unmovable pages + free reclaimable pages + free movable pages. Because
> we already exhausted movable pages, only free pages we have are unmovable
> and reclaimable types and this would be really small amount. So watermark
> checking would be failed. It will wake up kswapd to make enough free
> memory for unmovable and reclaimable allocation and kswapd will do.
> So before we fully utilize pages on cma reserved memory, kswapd start to
> reclaim memory and try to make free memory over the high watermark. This
> watermark checking by kswapd doesn't take care free cma pages so many
> movable pages would be reclaimed. After then, we have a lot of movable
> pages again, so fallback allocation doesn't happen again. To conclude,
> amount of free memory on meminfo which includes free CMA pages is moving
> around 512 MB if I reserve 512 MB memory for CMA.
> 
> I found this problem on following experiment.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
> 
> CMA reserve:		0 MB		512 MB
> Elapsed-time:		234.8		361.8
> Average-MemFree:	283880 KB	530851 KB
> 
> To solve this problem, I can think following 2 possible solutions.
> 1. allocate the pages on cma reserved memory first, and if they are
>    exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate specific amounts of memory
>    from cma reserved memory and then allocate from free movable memory.

I love this idea, but when I look at the code, I don't like it.
In the allocation path, just try to allocate pages round-robin; that is the
allocator's role. If one of the migratetypes is exhausted, just pass the job
to the reclaimer with a hint (i.e., "Hey reclaimer, this is a non-movable
allocation failure, so reclaiming MIGRATE_CMA pages is pointless") so that
the reclaimer can filter them out during page scanning.
We already have a tool to achieve that (i.e., isolate_mode_t).

And couldn't we do it in zone_watermark_ok() by setting/resetting ALLOC_CMA?
If possible, that would be better because it is the generic function that
checks free pages and triggers the reclaim/compaction logic.
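
For readers following the thread, here is a minimal sketch of the kind of
hint Minchan seems to be describing. ISOLATE_SKIP_CMA is a hypothetical
isolate_mode_t flag (it does not exist in the tree), and calling such a
helper from __isolate_lru_page() is only an assumption about where the
filter could live:

	/* Hypothetical flag, shown for illustration only */
	#define ISOLATE_SKIP_CMA	((__force isolate_mode_t)0x40)

	/*
	 * Skip MIGRATE_CMA pages while scanning the LRU when the
	 * allocation that woke the reclaimer cannot use them anyway.
	 */
	static bool skip_cma_page(struct page *page, isolate_mode_t mode)
	{
		return (mode & ISOLATE_SKIP_CMA) &&
			is_migrate_cma(get_pageblock_migratetype(page));
	}

__isolate_lru_page() would then return -EBUSY for such pages so that they
stay on the LRU instead of being reclaimed for no benefit.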

> 
> I tested #1 approach and found the problem. Although free memory on
> meminfo can move around low watermark, there is large fluctuation on free
> memory, because too many pages are reclaimed when kswapd is invoked.
> Reason for this behaviour is that successive allocated CMA pages are
> on the LRU list in that order and kswapd reclaim them in same order.
> These memory doesn't help watermark checking from kwapd, so too many
> pages are reclaimed, I guess.
> 
> So, I implement #2 approach.
> One thing I should note is that we should not change allocation target
> (movable list or cma) on each allocation attempt, since this prevent
> allocated pages to be in physically succession, so some I/O devices can
> be hurt their performance. To solve this, I keep allocation target
> in at least pageblock_nr_pages attempts and make this number reflect
> ratio, free pages without free cma pages to free cma pages. With this
> approach, system works very smoothly and fully utilize the pages on
> cma reserved memory.
> 
> Following is the experimental result of this patch.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
> 
> <Before>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           234.8           361.8
> Average-MemFree:        283880 KB       530851 KB
> pswpin:                 7               110064
> pswpout:                452             767502
> 
> <After>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           234.2           235.6
> Average-MemFree:        281651 KB       290227 KB
> pswpin:                 8               8
> pswpout:                430             510
> 
> There is no difference if we don't have cma reserved memory (0 MB case).
> But, with cma reserved memory (512 MB case), we fully utilize these
> reserved memory through this patch and the system behaves like as
> it doesn't reserve any memory.
> 
> With this patch, we aggressively allocate the pages on cma reserved memory
> so latency of CMA can arise. Below is the experimental result about
> latency.
> 
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> CMA reserve: 512 MB
> Backgound Workload: make -jN
> Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> 
> N:                    1        4       8        16
> Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> 
> So generally we can see latency increase. Ratio of this increase
> is rather big - up to 70%. But, under the heavy workload, it shows
> latency decrease - up to 55%. This may be worst-case scenario, but
> reducing it would be important for some system, so, I can say that
> this patch have advantages and disadvantages in terms of latency.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..3ff24d4 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -389,6 +389,12 @@ struct zone {
>  	int			compact_order_failed;
>  #endif
>  
> +#ifdef CONFIG_CMA
> +	int has_cma;
> +	int nr_try_cma;
> +	int nr_try_movable;
> +#endif
> +
>  	ZONE_PADDING(_pad1_)
>  
>  	/* Fields commonly accessed by the page reclaim scanner */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 674ade7..6f2b27b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
>  }
>  
>  #ifdef CONFIG_CMA
> +void __init init_alloc_ratio_counter(struct zone *zone)
> +{
> +	if (zone->has_cma)
> +		return;
> +
> +	zone->has_cma = 1;
> +	zone->nr_try_movable = 0;
> +	zone->nr_try_cma = 0;
> +}
> +
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
> @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
>  	__free_pages(page, pageblock_order);
>  	adjust_managed_page_count(page, pageblock_nr_pages);
> +	init_alloc_ratio_counter(page_zone(page));
>  }
>  #endif
>  
> @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  	return NULL;
>  }
>  
> +#ifdef CONFIG_CMA
> +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
> +						int migratetype)
> +{
> +	long free, free_cma, free_wmark;
> +	struct page *page;
> +
> +	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
> +		return NULL;
> +
> +	if (zone->nr_try_movable)
> +		goto alloc_movable;
> +
> +alloc_cma:
> +	if (zone->nr_try_cma) {
> +		/* Okay. Now, we can try to allocate the page from cma region */
> +		zone->nr_try_cma--;
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +		/* CMA pages can vanish through CMA allocation */
> +		if (unlikely(!page && order == 0))
> +			zone->nr_try_cma = 0;
> +
> +		return page;
> +	}
> +
> +	/* Reset ratio counter */
> +	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
> +
> +	/* No cma free pages, so recharge only movable allocation */
> +	if (free_cma <= 0) {
> +		zone->nr_try_movable = pageblock_nr_pages;
> +		goto alloc_movable;
> +	}
> +
> +	free = zone_page_state(zone, NR_FREE_PAGES);
> +	free_wmark = free - free_cma - high_wmark_pages(zone);
> +
> +	/*
> +	 * free_wmark is below than 0, and it means that normal pages
> +	 * are under the pressure, so we recharge only cma allocation.
> +	 */
> +	if (free_wmark <= 0) {
> +		zone->nr_try_cma = pageblock_nr_pages;
> +		goto alloc_cma;
> +	}
> +
> +	if (free_wmark > free_cma) {
> +		zone->nr_try_movable =
> +			(free_wmark * pageblock_nr_pages) / free_cma;
> +		zone->nr_try_cma = pageblock_nr_pages;
> +	} else {
> +		zone->nr_try_movable = pageblock_nr_pages;
> +		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
> +	}
> +
> +	/* Reset complete, start on movable first */
> +alloc_movable:
> +	zone->nr_try_movable--;
> +	return NULL;
> +}
> +#endif
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
> @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  						int migratetype)
>  {
> -	struct page *page;
> +	struct page *page = NULL;
> +
> +	if (IS_ENABLED(CONFIG_CMA))
> +		page = __rmqueue_cma(zone, order, migratetype);
>  
>  retry_reserve:
> -	page = __rmqueue_smallest(zone, order, migratetype);
> +	if (!page)
> +		page = __rmqueue_smallest(zone, order, migratetype);
>  
>  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>  		page = __rmqueue_fallback(zone, order, migratetype);
> @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  		zone_seqlock_init(zone);
>  		zone->zone_pgdat = pgdat;
>  		zone_pcp_init(zone);
> +		if (IS_ENABLED(CONFIG_CMA))
> +			zone->has_cma = 0;
>  
>  		/* For bootup, initialized properly in watermark setup */
>  		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> -- 
> 1.7.9.5
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-12 17:04   ` Laura Abbott
  2014-05-13  1:14     ` Joonsoo Kim
@ 2014-05-13  3:05     ` Minchan Kim
  2014-05-24  0:57     ` Laura Abbott
  2 siblings, 0 replies; 45+ messages in thread
From: Minchan Kim @ 2014-05-13  3:05 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Joonsoo Kim, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	linux-mm, linux-kernel

On Mon, May 12, 2014 at 10:04:29AM -0700, Laura Abbott wrote:
> Hi,
> 
> On 5/7/2014 5:32 PM, Joonsoo Kim wrote:
> > CMA is introduced to provide physically contiguous pages at runtime.
> > For this purpose, it reserves memory at boot time. Although it reserve
> > memory, this reserved memory can be used for movable memory allocation
> > request. This usecase is beneficial to the system that needs this CMA
> > reserved memory infrequently and it is one of main purpose of
> > introducing CMA.
> > 
> > But, there is a problem in current implementation. The problem is that
> > it works like as just reserved memory approach. The pages on cma reserved
> > memory are hardly used for movable memory allocation. This is caused by
> > combination of allocation and reclaim policy.
> > 
> > The pages on cma reserved memory are allocated if there is no movable
> > memory, that is, as fallback allocation. So the time this fallback
> > allocation is started is under heavy memory pressure. Although it is under
> > memory pressure, movable allocation easily succeed, since there would be
> > many pages on cma reserved memory. But this is not the case for unmovable
> > and reclaimable allocation, because they can't use the pages on cma
> > reserved memory. These allocations regard system's free memory as
> > (free pages - free cma pages) on watermark checking, that is, free
> > unmovable pages + free reclaimable pages + free movable pages. Because
> > we already exhausted movable pages, only free pages we have are unmovable
> > and reclaimable types and this would be really small amount. So watermark
> > checking would be failed. It will wake up kswapd to make enough free
> > memory for unmovable and reclaimable allocation and kswapd will do.
> > So before we fully utilize pages on cma reserved memory, kswapd start to
> > reclaim memory and try to make free memory over the high watermark. This
> > watermark checking by kswapd doesn't take care free cma pages so many
> > movable pages would be reclaimed. After then, we have a lot of movable
> > pages again, so fallback allocation doesn't happen again. To conclude,
> > amount of free memory on meminfo which includes free CMA pages is moving
> > around 512 MB if I reserve 512 MB memory for CMA.
> > 
> > I found this problem on following experiment.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> > 
> > CMA reserve:		0 MB		512 MB
> > Elapsed-time:		234.8		361.8
> > Average-MemFree:	283880 KB	530851 KB
> > 
> > To solve this problem, I can think following 2 possible solutions.
> > 1. allocate the pages on cma reserved memory first, and if they are
> >    exhausted, allocate movable pages.
> > 2. interleaved allocation: try to allocate specific amounts of memory
> >    from cma reserved memory and then allocate from free movable memory.
> > 
> > I tested #1 approach and found the problem. Although free memory on
> > meminfo can move around low watermark, there is large fluctuation on free
> > memory, because too many pages are reclaimed when kswapd is invoked.
> > Reason for this behaviour is that successive allocated CMA pages are
> > on the LRU list in that order and kswapd reclaim them in same order.
> > These memory doesn't help watermark checking from kwapd, so too many
> > pages are reclaimed, I guess.
> > 
> 
> We have an out-of-tree implementation of #1 and so far it has worked for
> us, although we weren't looking at the same metrics. I don't completely
> understand the issue you pointed out with #1. It sounds like the issue is
> that CMA pages are already in use by other processes and on LRU lists, and
> because the pages are on LRU lists they aren't counted towards the
> watermark by kswapd. Is my understanding correct?

Kswapd can reclaim MIGRATE_CMA pages unconditionally even though the
allocation that failed was a non-movable one. That is pointless and should
be fixed.

> 
> > So, I implement #2 approach.
> > One thing I should note is that we should not change allocation target
> > (movable list or cma) on each allocation attempt, since this prevent
> > allocated pages to be in physically succession, so some I/O devices can
> > be hurt their performance. To solve this, I keep allocation target
> > in at least pageblock_nr_pages attempts and make this number reflect
> > ratio, free pages without free cma pages to free cma pages. With this
> > approach, system works very smoothly and fully utilize the pages on
> > cma reserved memory.
> > 
> > Following is the experimental result of this patch.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> > 
> > <Before>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.8           361.8
> > Average-MemFree:        283880 KB       530851 KB
> > pswpin:                 7               110064
> > pswpout:                452             767502
> > 
> > <After>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.2           235.6
> > Average-MemFree:        281651 KB       290227 KB
> > pswpin:                 8               8
> > pswpout:                430             510
> > 
> > There is no difference if we don't have cma reserved memory (0 MB case).
> > But, with cma reserved memory (512 MB case), we fully utilize these
> > reserved memory through this patch and the system behaves like as
> > it doesn't reserve any memory.
> 
> What metric are you using to determine all CMA memory was fully used?
> We've been checking /proc/pagetypeinfo
> 
> > 
> > With this patch, we aggressively allocate the pages on cma reserved memory
> > so latency of CMA can arise. Below is the experimental result about
> > latency.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > CMA reserve: 512 MB
> > Backgound Workload: make -jN
> > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> > 
> > N:                    1        4       8        16
> > Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> > Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> > 
> > So generally we can see latency increase. Ratio of this increase
> > is rather big - up to 70%. But, under the heavy workload, it shows
> > latency decrease - up to 55%. This may be worst-case scenario, but
> > reducing it would be important for some system, so, I can say that
> > this patch have advantages and disadvantages in terms of latency.
> > 
> 
> Do you have any statistics on migration failures from this? Latency
> and utilization are issues, but so is migration success. In the past we've
> found that an increase in CMA utilization was related to an increase in CMA
> migration failures because pages were unmigratable. Our current
> workaround for this is limiting CMA pages to user processes
> only and not the file cache. Both of these have their own problems.

If Joonsoo's patch makes the failure ratio higher, that would be okay with
me, because we would get more reports about it and have a chance to fix it.
That is better than hiding CMA's problems with some hack.

> 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index fac5509..3ff24d4 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -389,6 +389,12 @@ struct zone {
> >  	int			compact_order_failed;
> >  #endif
> >  
> > +#ifdef CONFIG_CMA
> > +	int has_cma;
> > +	int nr_try_cma;
> > +	int nr_try_movable;
> > +#endif
> > +
> >  	ZONE_PADDING(_pad1_)
> >  
> >  	/* Fields commonly accessed by the page reclaim scanner */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 674ade7..6f2b27b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
> >  }
> >  
> >  #ifdef CONFIG_CMA
> > +void __init init_alloc_ratio_counter(struct zone *zone)
> > +{
> > +	if (zone->has_cma)
> > +		return;
> > +
> > +	zone->has_cma = 1;
> > +	zone->nr_try_movable = 0;
> > +	zone->nr_try_cma = 0;
> > +}
> > +
> >  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> >  void __init init_cma_reserved_pageblock(struct page *page)
> >  {
> > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
> >  	set_pageblock_migratetype(page, MIGRATE_CMA);
> >  	__free_pages(page, pageblock_order);
> >  	adjust_managed_page_count(page, pageblock_nr_pages);
> > +	init_alloc_ratio_counter(page_zone(page));
> >  }
> >  #endif
> >  
> > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >  	return NULL;
> >  }
> >  
> > +#ifdef CONFIG_CMA
> > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
> > +						int migratetype)
> > +{
> > +	long free, free_cma, free_wmark;
> > +	struct page *page;
> > +
> > +	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
> > +		return NULL;
> > +
> > +	if (zone->nr_try_movable)
> > +		goto alloc_movable;
> > +
> > +alloc_cma:
> > +	if (zone->nr_try_cma) {
> > +		/* Okay. Now, we can try to allocate the page from cma region */
> > +		zone->nr_try_cma--;
> > +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> > +
> > +		/* CMA pages can vanish through CMA allocation */
> > +		if (unlikely(!page && order == 0))
> > +			zone->nr_try_cma = 0;
> > +
> > +		return page;
> > +	}
> > +
> > +	/* Reset ratio counter */
> > +	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
> > +
> > +	/* No cma free pages, so recharge only movable allocation */
> > +	if (free_cma <= 0) {
> > +		zone->nr_try_movable = pageblock_nr_pages;
> > +		goto alloc_movable;
> > +	}
> > +
> > +	free = zone_page_state(zone, NR_FREE_PAGES);
> > +	free_wmark = free - free_cma - high_wmark_pages(zone);
> > +
> > +	/*
> > +	 * free_wmark is below than 0, and it means that normal pages
> > +	 * are under the pressure, so we recharge only cma allocation.
> > +	 */
> > +	if (free_wmark <= 0) {
> > +		zone->nr_try_cma = pageblock_nr_pages;
> > +		goto alloc_cma;
> > +	}
> > +
> > +	if (free_wmark > free_cma) {
> > +		zone->nr_try_movable =
> > +			(free_wmark * pageblock_nr_pages) / free_cma;
> > +		zone->nr_try_cma = pageblock_nr_pages;
> > +	} else {
> > +		zone->nr_try_movable = pageblock_nr_pages;
> > +		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
> > +	}
> > +
> > +	/* Reset complete, start on movable first */
> > +alloc_movable:
> > +	zone->nr_try_movable--;
> > +	return NULL;
> > +}
> > +#endif
> > +
> >  /*
> >   * Do the hard work of removing an element from the buddy allocator.
> >   * Call me with the zone->lock already held.
> > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >  static struct page *__rmqueue(struct zone *zone, unsigned int order,
> >  						int migratetype)
> >  {
> > -	struct page *page;
> > +	struct page *page = NULL;
> > +
> > +	if (IS_ENABLED(CONFIG_CMA))
> > +		page = __rmqueue_cma(zone, order, migratetype);
> >  
> >  retry_reserve:
> > -	page = __rmqueue_smallest(zone, order, migratetype);
> > +	if (!page)
> > +		page = __rmqueue_smallest(zone, order, migratetype);
> >  
> >  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
> >  		page = __rmqueue_fallback(zone, order, migratetype);
> > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
> >  		zone_seqlock_init(zone);
> >  		zone->zone_pgdat = pgdat;
> >  		zone_pcp_init(zone);
> > +		if (IS_ENABLED(CONFIG_CMA))
> > +			zone->has_cma = 0;
> >  
> >  		/* For bootup, initialized properly in watermark setup */
> >  		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> > 
> 
> I'm going to see about running this through tests internally for comparison.
> Hopefully I'll get useful results in a day or so.
> 
> Thanks,
> Laura
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-08  0:32 ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
                     ` (2 preceding siblings ...)
  2014-05-13  3:00   ` Minchan Kim
@ 2014-05-14  8:42   ` Aneesh Kumar K.V
  2014-05-15  1:58     ` Joonsoo Kim
  3 siblings, 1 reply; 45+ messages in thread
From: Aneesh Kumar K.V @ 2014-05-14  8:42 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Joonsoo Kim,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although it reserve
> memory, this reserved memory can be used for movable memory allocation
> request. This usecase is beneficial to the system that needs this CMA
> reserved memory infrequently and it is one of main purpose of
> introducing CMA.
>
> But, there is a problem in current implementation. The problem is that
> it works like as just reserved memory approach. The pages on cma reserved
> memory are hardly used for movable memory allocation. This is caused by
> combination of allocation and reclaim policy.
>
> The pages on cma reserved memory are allocated if there is no movable
> memory, that is, as fallback allocation. So the time this fallback
> allocation is started is under heavy memory pressure. Although it is under
> memory pressure, movable allocation easily succeed, since there would be
> many pages on cma reserved memory. But this is not the case for unmovable
> and reclaimable allocation, because they can't use the pages on cma
> reserved memory. These allocations regard system's free memory as
> (free pages - free cma pages) on watermark checking, that is, free
> unmovable pages + free reclaimable pages + free movable pages. Because
> we already exhausted movable pages, only free pages we have are unmovable
> and reclaimable types and this would be really small amount. So watermark
> checking would be failed. It will wake up kswapd to make enough free
> memory for unmovable and reclaimable allocation and kswapd will do.
> So before we fully utilize pages on cma reserved memory, kswapd start to
> reclaim memory and try to make free memory over the high watermark. This
> watermark checking by kswapd doesn't take care free cma pages so many
> movable pages would be reclaimed. After then, we have a lot of movable
> pages again, so fallback allocation doesn't happen again. To conclude,
> amount of free memory on meminfo which includes free CMA pages is moving
> around 512 MB if I reserve 512 MB memory for CMA.


Another issue I am facing with the current code is atomic allocations
failing even with a large number of free CMA pages around. In my case we
never reclaim because a large part of memory is consumed by the page cache,
and for those allocations the free memory check doesn't count the free CMA
pages. I will test with this patchset and update here once I have the
results.
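
As a rough illustration of this failure mode (the numbers are made up, and
the snippet is only a from-memory sketch of the existing check in
__zone_watermark_ok() for callers without ALLOC_CMA):

	/* callers that cannot use CMA ignore free CMA pages entirely */
	free_pages = zone_page_state(z, NR_FREE_PAGES);	/* say 540 MB worth */
	if (!(alloc_flags & ALLOC_CMA))
		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES); /* -512 MB */

That leaves roughly 28 MB to compare against the watermark, so a GFP_ATOMIC
(non-movable) request can fail immediately, since it cannot enter direct
reclaim, even though most of the 512 MB CMA reservation is actually free.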

>
> I found this problem on following experiment.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
>
> CMA reserve:		0 MB		512 MB
> Elapsed-time:		234.8		361.8
> Average-MemFree:	283880 KB	530851 KB
>
> To solve this problem, I can think following 2 possible solutions.
> 1. allocate the pages on cma reserved memory first, and if they are
>    exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate specific amounts of memory
>    from cma reserved memory and then allocate from free movable memory.
>
> I tested #1 approach and found the problem. Although free memory on
> meminfo can move around low watermark, there is large fluctuation on free
> memory, because too many pages are reclaimed when kswapd is invoked.
> Reason for this behaviour is that successive allocated CMA pages are
> on the LRU list in that order and kswapd reclaim them in same order.
> These memory doesn't help watermark checking from kwapd, so too many
> pages are reclaimed, I guess.
>
> So, I implement #2 approach.
> One thing I should note is that we should not change allocation target
> (movable list or cma) on each allocation attempt, since this prevent
> allocated pages to be in physically succession, so some I/O devices can
> be hurt their performance. To solve this, I keep allocation target
> in at least pageblock_nr_pages attempts and make this number reflect
> ratio, free pages without free cma pages to free cma pages. With this
> approach, system works very smoothly and fully utilize the pages on
> cma reserved memory.
>
> Following is the experimental result of this patch.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j24
>
> <Before>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           234.8           361.8
> Average-MemFree:        283880 KB       530851 KB
> pswpin:                 7               110064
> pswpout:                452             767502
>
> <After>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           234.2           235.6
> Average-MemFree:        281651 KB       290227 KB
> pswpin:                 8               8
> pswpout:                430             510
>
> There is no difference if we don't have cma reserved memory (0 MB case).
> But, with cma reserved memory (512 MB case), we fully utilize these
> reserved memory through this patch and the system behaves like as
> it doesn't reserve any memory.
>
> With this patch, we aggressively allocate the pages on cma reserved memory
> so latency of CMA can arise. Below is the experimental result about
> latency.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> CMA reserve: 512 MB
> Backgound Workload: make -jN
> Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
>
> N:                    1        4       8        16
> Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
>
> So generally we can see latency increase. Ratio of this increase
> is rather big - up to 70%. But, under the heavy workload, it shows
> latency decrease - up to 55%. This may be worst-case scenario, but
> reducing it would be important for some system, so, I can say that
> this patch have advantages and disadvantages in terms of latency.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..3ff24d4 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -389,6 +389,12 @@ struct zone {
>  	int			compact_order_failed;
>  #endif
>  
> +#ifdef CONFIG_CMA
> +	int has_cma;
> +	int nr_try_cma;
> +	int nr_try_movable;
> +#endif


Can you write documentation for these fields?

> +
>  	ZONE_PADDING(_pad1_)
>  
>  	/* Fields commonly accessed by the page reclaim scanner */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 674ade7..6f2b27b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
>  }
>  
>  #ifdef CONFIG_CMA
> +void __init init_alloc_ratio_counter(struct zone *zone)
> +{
> +	if (zone->has_cma)
> +		return;
> +
> +	zone->has_cma = 1;
> +	zone->nr_try_movable = 0;
> +	zone->nr_try_cma = 0;
> +}
> +
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
> @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
>  	__free_pages(page, pageblock_order);
>  	adjust_managed_page_count(page, pageblock_nr_pages);
> +	init_alloc_ratio_counter(page_zone(page));
>  }
>  #endif
>  
> @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  	return NULL;
>  }
>  
> +#ifdef CONFIG_CMA
> +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
> +						int migratetype)
> +{
> +	long free, free_cma, free_wmark;
> +	struct page *page;
> +
> +	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
> +		return NULL;
> +
> +	if (zone->nr_try_movable)
> +		goto alloc_movable;
> +
> +alloc_cma:
> +	if (zone->nr_try_cma) {
> +		/* Okay. Now, we can try to allocate the page from cma region */
> +		zone->nr_try_cma--;
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +		/* CMA pages can vanish through CMA allocation */
> +		if (unlikely(!page && order == 0))
> +			zone->nr_try_cma = 0;
> +
> +		return page;
> +	}
> +
> +	/* Reset ratio counter */
> +	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
> +
> +	/* No cma free pages, so recharge only movable allocation */
> +	if (free_cma <= 0) {
> +		zone->nr_try_movable = pageblock_nr_pages;
> +		goto alloc_movable;
> +	}
> +
> +	free = zone_page_state(zone, NR_FREE_PAGES);
> +	free_wmark = free - free_cma - high_wmark_pages(zone);
> +
> +	/*
> +	 * free_wmark is below than 0, and it means that normal pages
> +	 * are under the pressure, so we recharge only cma allocation.
> +	 */
> +	if (free_wmark <= 0) {
> +		zone->nr_try_cma = pageblock_nr_pages;
> +		goto alloc_cma;
> +	}
> +
> +	if (free_wmark > free_cma) {
> +		zone->nr_try_movable =
> +			(free_wmark * pageblock_nr_pages) / free_cma;
> +		zone->nr_try_cma = pageblock_nr_pages;
> +	} else {
> +		zone->nr_try_movable = pageblock_nr_pages;
> +		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
> +	}

Can you add the commit message documentation here?

> +
> +	/* Reset complete, start on movable first */
> +alloc_movable:
> +	zone->nr_try_movable--;
> +	return NULL;
> +}
> +#endif
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
> @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  						int migratetype)
>  {
> -	struct page *page;
> +	struct page *page = NULL;
> +
> +	if (IS_ENABLED(CONFIG_CMA))
> +		page = __rmqueue_cma(zone, order, migratetype);

It would be better to move the migratetype check here, so that it becomes:

         /* For migrate movable allocation try cma area first */
	if (IS_ENABLED(CONFIG_CMA) && (migratetype == MIGRATE_MOVABLE))



>  
>  retry_reserve:
> -	page = __rmqueue_smallest(zone, order, migratetype);
> +	if (!page)
> +		page = __rmqueue_smallest(zone, order, migratetype);
>  
>  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>  		page = __rmqueue_fallback(zone, order, migratetype);
> @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  		zone_seqlock_init(zone);
>  		zone->zone_pgdat = pgdat;
>  		zone_pcp_init(zone);
> +		if (IS_ENABLED(CONFIG_CMA))
> +			zone->has_cma = 0;
>  
>  		/* For bootup, initialized properly in watermark setup */
>  		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> -- 
> 1.7.9.5
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
  2014-05-13  2:26   ` Joonsoo Kim
@ 2014-05-14  9:44     ` Aneesh Kumar K.V
  2014-05-15  2:10       ` Joonsoo Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Aneesh Kumar K.V @ 2014-05-14  9:44 UTC (permalink / raw)
  To: Joonsoo Kim, Marek Szyprowski
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Michal Nazarewicz,
	linux-mm, linux-kernel, Kyungmin Park, Bartlomiej Zolnierkiewicz,
	'Tomasz Stanislawski'

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote:
>> Hello,
>> 
>> On 2014-05-08 02:32, Joonsoo Kim wrote:
>> >This series tries to improve CMA.
>> >
>> >CMA is introduced to provide physically contiguous pages at runtime
>> >without reserving memory area. But, current implementation works like as
>> >reserving memory approach, because allocation on cma reserved region only
>> >occurs as fallback of migrate_movable allocation. We can allocate from it
>> >when there is no movable page. In that situation, kswapd would be invoked
>> >easily since unmovable and reclaimable allocation consider
>> >(free pages - free CMA pages) as free memory on the system and free memory
>> >may be lower than high watermark in that case. If kswapd start to reclaim
>> >memory, then fallback allocation doesn't occur much.
>> >
>> >In my experiment, I found that if system memory has 1024 MB memory and
>> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around
>> >the 512MB free memory boundary. And invoked kswapd tries to make free
>> >memory until (free pages - free CMA pages) is higher than high watermark,
>> >so free memory on meminfo is moving around 512MB boundary consistently.
>> >
>> >To fix this problem, we should allocate the pages on cma reserved memory
>> >more aggressively and intelligenetly. Patch 2 implements the solution.
>> >Patch 1 is the simple optimization which remove useless re-trial and patch 3
>> >is for removing useless alloc flag, so these are not important.
>> >See patch 2 for more detailed description.
>> >
>> >This patchset is based on v3.15-rc4.
>> 
>> Thanks for posting those patches. It basically reminds me the
>> following discussion:
>> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524
>> 
>> Your approach is basically the same. I hope that your patches can be
>> improved
>> in such a way that they will be accepted by mm maintainers. I only
>> wonder if the
>> third patch is really necessary. Without it kswapd wakeup might be
>> still avoided
>> in some cases.
>
> Hello,
>
> Oh... I didn't know that patch and discussion, because I have no interest
> on CMA at that time. Your approach looks similar to #1
> approach of mine and could have same problem of #1 approach which I mentioned
> in patch 2/3. Please refer that patch description. :)

IIUC that patch also interleaves, right?

+#ifdef CONFIG_CMA
+	unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
+	unsigned long nr_cma_free = zone_page_state(zone, NR_FREE_CMA_PAGES);
+
+	if (migratetype == MIGRATE_MOVABLE && nr_cma_free &&
+	    nr_free - nr_cma_free < 2 * low_wmark_pages(zone))
+		migratetype = MIGRATE_CMA;
+#endif /* CONFIG_CMA */

That doesn't always prefer the CMA region. It would be nice to
understand why grouping allocations in pageblock_nr_pages chunks is
beneficial. Also, in your patch you decrement nr_try_cma only by one for
every allocation, whatever its order. Why?

+	if (zone->nr_try_cma) {
+		/* Okay. Now, we can try to allocate the page from cma region */
+		zone->nr_try_cma--;
+		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
+
+		/* CMA pages can vanish through CMA allocation */
+		if (unlikely(!page && order == 0))
+			zone->nr_try_cma = 0;
+
+		return page;
+	}


If the MIGRATE_CMA allocation above fails, should we return failure? Why
not try a MOVABLE allocation on failure (i.e., fall through the code path)?
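
For reference, here is how the recharge logic in the quoted hunk plays out
with made-up numbers, assuming 4 KB pages and pageblock_order = 9, i.e.
pageblock_nr_pages = 512:

	free       = 300000	/* NR_FREE_PAGES              */
	free_cma   = 100000	/* NR_FREE_CMA_PAGES          */
	high_wmark =   4000
	free_wmark = 300000 - 100000 - 4000 = 196000

	/* free_wmark > free_cma, so: */
	nr_try_movable = 196000 * 512 / 100000 = 1003	/* ~2 pageblocks */
	nr_try_cma     = 512				/* 1 pageblock   */

So movable requests are served from normal memory roughly 1003 times for
every 512 served from CMA, matching the ~2:1 ratio of normal to CMA free
pages while keeping each run of allocations physically grouped at pageblock
granularity, which is the stated reason for not switching the target on
every request.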

> And, there is different purpose between this and yours. This patch is
> intended to better use of CMA pages and so get maximum performance.
> Just to not trigger oom, it can be possible to put this logic on reclaim path.
> But that is sub-optimal to get higher performance, because it needs
> migration in some cases.
>
> If second patch works as intended, there are just a few of cma free pages
> when we are toward on the watermark. So benefit of third patch would
> be marginal and we can remove ALLOC_CMA.
>
> Thanks.
>

-aneesh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-13  3:00   ` Minchan Kim
@ 2014-05-15  1:53     ` Joonsoo Kim
  2014-05-15  2:43       ` Minchan Kim
                         ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-15  1:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> Hey Joonsoo,
> 
> On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> > CMA is introduced to provide physically contiguous pages at runtime.
> > For this purpose, it reserves memory at boot time. Although it reserve
> > memory, this reserved memory can be used for movable memory allocation
> > request. This usecase is beneficial to the system that needs this CMA
> > reserved memory infrequently and it is one of main purpose of
> > introducing CMA.
> > 
> > But, there is a problem in current implementation. The problem is that
> > it works like as just reserved memory approach. The pages on cma reserved
> > memory are hardly used for movable memory allocation. This is caused by
> > combination of allocation and reclaim policy.
> > 
> > The pages on cma reserved memory are allocated if there is no movable
> > memory, that is, as fallback allocation. So the time this fallback
> > allocation is started is under heavy memory pressure. Although it is under
> > memory pressure, movable allocation easily succeed, since there would be
> > many pages on cma reserved memory. But this is not the case for unmovable
> > and reclaimable allocation, because they can't use the pages on cma
> > reserved memory. These allocations regard system's free memory as
> > (free pages - free cma pages) on watermark checking, that is, free
> > unmovable pages + free reclaimable pages + free movable pages. Because
> > we already exhausted movable pages, only free pages we have are unmovable
> > and reclaimable types and this would be really small amount. So watermark
> > checking would be failed. It will wake up kswapd to make enough free
> > memory for unmovable and reclaimable allocation and kswapd will do.
> > So before we fully utilize pages on cma reserved memory, kswapd start to
> > reclaim memory and try to make free memory over the high watermark. This
> > watermark checking by kswapd doesn't take care free cma pages so many
> > movable pages would be reclaimed. After then, we have a lot of movable
> > pages again, so fallback allocation doesn't happen again. To conclude,
> > amount of free memory on meminfo which includes free CMA pages is moving
> > around 512 MB if I reserve 512 MB memory for CMA.
> > 
> > I found this problem on following experiment.
> > 
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> > 
> > CMA reserve:		0 MB		512 MB
> > Elapsed-time:		234.8		361.8
> > Average-MemFree:	283880 KB	530851 KB
> > 
> > To solve this problem, I can think following 2 possible solutions.
> > 1. allocate the pages on cma reserved memory first, and if they are
> >    exhausted, allocate movable pages.
> > 2. interleaved allocation: try to allocate specific amounts of memory
> >    from cma reserved memory and then allocate from free movable memory.
> 
> I love this idea but when I see the code, I don't like that.
> In allocation path, just try to allocate pages by round-robin so it's role
> of allocator. If one of migratetype is full, just pass mission to reclaimer
> with hint(ie, Hey reclaimer, it's non-movable allocation fail
> so there is pointless if you reclaim MIGRATE_CMA pages) so that
> reclaimer can filter it out during page scanning.
> We already have an tool to achieve it(ie, isolate_mode_t).

Hello,

I agree with keeping the fast allocation path as simple as possible.
I will remove the runtime ratio computation from __rmqueue_cma() and,
instead, use a pre-computed value calculated on another path.
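
A minimal sketch of what that split could look like, purely as an
illustration (the helper name and where it would be called from are my
assumptions, not part of the posted patch; the formula itself is taken from
the quoted hunk):

	/* Slow path: occasionally recompute the interleaving budget */
	static void recharge_cma_ratio(struct zone *zone)
	{
		long free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
		long free = zone_page_state(zone, NR_FREE_PAGES);
		long normal = free - free_cma - high_wmark_pages(zone);

		if (free_cma <= 0 || normal <= 0)
			return;

		if (normal > free_cma) {
			zone->nr_try_movable =
				normal * pageblock_nr_pages / free_cma;
			zone->nr_try_cma = pageblock_nr_pages;
		} else {
			zone->nr_try_movable = pageblock_nr_pages;
			zone->nr_try_cma =
				free_cma * pageblock_nr_pages / normal;
		}
	}

__rmqueue_cma() would then only decrement the two counters on the fast path.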

I am not sure whether your second suggestion (the "Hey reclaimer" part)
is good or not. My quick thought is that it could be helpful in the
situation where many free CMA pages remain, but it would not help
when there are neither free movable nor free CMA pages. In general, most
workloads mainly use movable pages, for the page cache or anonymous mappings.
Although reclaim is triggered by a non-movable allocation failure, the
reclaimed pages are mostly used for movable allocations. We can handle those
allocation requests even if we reclaim pages just in LRU order. If we rotate
the LRU list to find movable pages, it could cause more useful
pages to be evicted.

This is just my quick thought, so please correct me if I am wrong.

> 
> And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> If possible, it would be better becauser it's generic function to check
> free pages and cause trigger reclaim/compaction logic.

I guess your *it* means the ratio computation, right?
I don't like putting it in zone_watermark_ok(). Although it needs the
free CMA page count, which is also used in zone_watermark_ok(),
this computation is for determining the ratio, not for triggering
reclaim/compaction. And zone_watermark_ok() is on an even hotter path, so
putting this logic into zone_watermark_ok() does not look better to me.

I will think about a better place to do it.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-14  8:42   ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Aneesh Kumar K.V
@ 2014-05-15  1:58     ` Joonsoo Kim
  2014-05-18 17:36       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-15  1:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> 
> > CMA is introduced to provide physically contiguous pages at runtime.
> > For this purpose, it reserves memory at boot time. Although it reserve
> > memory, this reserved memory can be used for movable memory allocation
> > request. This usecase is beneficial to the system that needs this CMA
> > reserved memory infrequently and it is one of main purpose of
> > introducing CMA.
> >
> > But, there is a problem in current implementation. The problem is that
> > it works like as just reserved memory approach. The pages on cma reserved
> > memory are hardly used for movable memory allocation. This is caused by
> > combination of allocation and reclaim policy.
> >
> > The pages on cma reserved memory are allocated if there is no movable
> > memory, that is, as fallback allocation. So the time this fallback
> > allocation is started is under heavy memory pressure. Although it is under
> > memory pressure, movable allocation easily succeed, since there would be
> > many pages on cma reserved memory. But this is not the case for unmovable
> > and reclaimable allocation, because they can't use the pages on cma
> > reserved memory. These allocations regard system's free memory as
> > (free pages - free cma pages) on watermark checking, that is, free
> > unmovable pages + free reclaimable pages + free movable pages. Because
> > we already exhausted movable pages, only free pages we have are unmovable
> > and reclaimable types and this would be really small amount. So watermark
> > checking would be failed. It will wake up kswapd to make enough free
> > memory for unmovable and reclaimable allocation and kswapd will do.
> > So before we fully utilize pages on cma reserved memory, kswapd start to
> > reclaim memory and try to make free memory over the high watermark. This
> > watermark checking by kswapd doesn't take care free cma pages so many
> > movable pages would be reclaimed. After then, we have a lot of movable
> > pages again, so fallback allocation doesn't happen again. To conclude,
> > amount of free memory on meminfo which includes free CMA pages is moving
> > around 512 MB if I reserve 512 MB memory for CMA.
> 
> 
> Another issue i am facing with the current code is the atomic allocation
> failing even with large number of CMA pages around. In my case we never
> reclaimed because large part of the memory is consumed by the page cache and
> for that, free memory check doesn't include at free_cma. I will test
> with this patchset and update here once i have the results.
> 

Hello,

Could you elaborate more on your issue?
I can't completely understand your problem.
Is your atomic allocation movable? And does that request fail even though
there are many free CMA pages?


> >
> > I found this problem on following experiment.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> >
> > CMA reserve:		0 MB		512 MB
> > Elapsed-time:		234.8		361.8
> > Average-MemFree:	283880 KB	530851 KB
> >
> > To solve this problem, I can think following 2 possible solutions.
> > 1. allocate the pages on cma reserved memory first, and if they are
> >    exhausted, allocate movable pages.
> > 2. interleaved allocation: try to allocate specific amounts of memory
> >    from cma reserved memory and then allocate from free movable memory.
> >
> > I tested #1 approach and found the problem. Although free memory on
> > meminfo can move around low watermark, there is large fluctuation on free
> > memory, because too many pages are reclaimed when kswapd is invoked.
> > Reason for this behaviour is that successive allocated CMA pages are
> > on the LRU list in that order and kswapd reclaim them in same order.
> > These memory doesn't help watermark checking from kwapd, so too many
> > pages are reclaimed, I guess.
> >
> > So, I implement #2 approach.
> > One thing I should note is that we should not change allocation target
> > (movable list or cma) on each allocation attempt, since this prevent
> > allocated pages to be in physically succession, so some I/O devices can
> > be hurt their performance. To solve this, I keep allocation target
> > in at least pageblock_nr_pages attempts and make this number reflect
> > ratio, free pages without free cma pages to free cma pages. With this
> > approach, system works very smoothly and fully utilize the pages on
> > cma reserved memory.
> >
> > Following is the experimental result of this patch.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> >
> > <Before>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.8           361.8
> > Average-MemFree:        283880 KB       530851 KB
> > pswpin:                 7               110064
> > pswpout:                452             767502
> >
> > <After>
> > CMA reserve:            0 MB            512 MB
> > Elapsed-time:           234.2           235.6
> > Average-MemFree:        281651 KB       290227 KB
> > pswpin:                 8               8
> > pswpout:                430             510
> >
> > There is no difference if we don't have cma reserved memory (0 MB case).
> > But, with cma reserved memory (512 MB case), we fully utilize these
> > reserved memory through this patch and the system behaves like as
> > it doesn't reserve any memory.
> >
> > With this patch, we aggressively allocate the pages on cma reserved memory
> > so latency of CMA can arise. Below is the experimental result about
> > latency.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > CMA reserve: 512 MB
> > Backgound Workload: make -jN
> > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> >
> > N:                    1        4       8        16
> > Elapsed-time(Before): 4309.75  9511.09 12276.1  77103.5
> > Elapsed-time(After):  5391.69 16114.1  19380.3  34879.2
> >
> > So generally we can see latency increase. Ratio of this increase
> > is rather big - up to 70%. But, under the heavy workload, it shows
> > latency decrease - up to 55%. This may be worst-case scenario, but
> > reducing it would be important for some system, so, I can say that
> > this patch have advantages and disadvantages in terms of latency.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index fac5509..3ff24d4 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -389,6 +389,12 @@ struct zone {
> >  	int			compact_order_failed;
> >  #endif
> >  
> > +#ifdef CONFIG_CMA
> > +	int has_cma;
> > +	int nr_try_cma;
> > +	int nr_try_movable;
> > +#endif
> 
> 
> Can you write documentation around this ?
> 

Okay.

> > +
> >  	ZONE_PADDING(_pad1_)
> >  
> >  	/* Fields commonly accessed by the page reclaim scanner */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 674ade7..6f2b27b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
> >  }
> >  
> >  #ifdef CONFIG_CMA
> > +void __init init_alloc_ratio_counter(struct zone *zone)
> > +{
> > +	if (zone->has_cma)
> > +		return;
> > +
> > +	zone->has_cma = 1;
> > +	zone->nr_try_movable = 0;
> > +	zone->nr_try_cma = 0;
> > +}
> > +
> >  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> >  void __init init_cma_reserved_pageblock(struct page *page)
> >  {
> > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
> >  	set_pageblock_migratetype(page, MIGRATE_CMA);
> >  	__free_pages(page, pageblock_order);
> >  	adjust_managed_page_count(page, pageblock_nr_pages);
> > +	init_alloc_ratio_counter(page_zone(page));
> >  }
> >  #endif
> >  
> > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >  	return NULL;
> >  }
> >  
> > +#ifdef CONFIG_CMA
> > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
> > +						int migratetype)
> > +{
> > +	long free, free_cma, free_wmark;
> > +	struct page *page;
> > +
> > +	if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
> > +		return NULL;
> > +
> > +	if (zone->nr_try_movable)
> > +		goto alloc_movable;
> > +
> > +alloc_cma:
> > +	if (zone->nr_try_cma) {
> > +		/* Okay. Now, we can try to allocate the page from cma region */
> > +		zone->nr_try_cma--;
> > +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> > +
> > +		/* CMA pages can vanish through CMA allocation */
> > +		if (unlikely(!page && order == 0))
> > +			zone->nr_try_cma = 0;
> > +
> > +		return page;
> > +	}
> > +
> > +	/* Reset ratio counter */
> > +	free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
> > +
> > +	/* No cma free pages, so recharge only movable allocation */
> > +	if (free_cma <= 0) {
> > +		zone->nr_try_movable = pageblock_nr_pages;
> > +		goto alloc_movable;
> > +	}
> > +
> > +	free = zone_page_state(zone, NR_FREE_PAGES);
> > +	free_wmark = free - free_cma - high_wmark_pages(zone);
> > +
> > +	/*
> > +	 * free_wmark is below than 0, and it means that normal pages
> > +	 * are under the pressure, so we recharge only cma allocation.
> > +	 */
> > +	if (free_wmark <= 0) {
> > +		zone->nr_try_cma = pageblock_nr_pages;
> > +		goto alloc_cma;
> > +	}
> > +
> > +	if (free_wmark > free_cma) {
> > +		zone->nr_try_movable =
> > +			(free_wmark * pageblock_nr_pages) / free_cma;
> > +		zone->nr_try_cma = pageblock_nr_pages;
> > +	} else {
> > +		zone->nr_try_movable = pageblock_nr_pages;
> > +		zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
> > +	}
> 
> Can you add the commit message documentation here. 
> 

Okay.

> > +
> > +	/* Reset complete, start on movable first */
> > +alloc_movable:
> > +	zone->nr_try_movable--;
> > +	return NULL;
> > +}
> > +#endif
> > +
> >  /*
> >   * Do the hard work of removing an element from the buddy allocator.
> >   * Call me with the zone->lock already held.
> > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >  static struct page *__rmqueue(struct zone *zone, unsigned int order,
> >  						int migratetype)
> >  {
> > -	struct page *page;
> > +	struct page *page = NULL;
> > +
> > +	if (IS_ENABLED(CONFIG_CMA))
> > +		page = __rmqueue_cma(zone, order, migratetype);
> 
> It would be better to move the migrate check here, So that it becomes 
> 
>          /* For migrate movable allocation try cma area first */
> 	if (IS_ENABLED(CONFIG_CMA) && (migratetype == MIGRATE_MOVABLE))
> 
> 

Okay. But there is no practical difference between the current code and
your suggestion, because __rmqueue_cma() would be inlined by the compiler
anyway.
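
For reference, a sketch of the suggested shape (illustrative only; locking,
the retry/reserve handling and the rest of __rmqueue() are elided and
unchanged):

	static struct page *__rmqueue(struct zone *zone, unsigned int order,
						int migratetype)
	{
		struct page *page = NULL;

		/* For movable allocations, try the CMA interleaving path first */
		if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE)
			page = __rmqueue_cma(zone, order, migratetype);

		/* NULL from __rmqueue_cma() falls through to the usual paths */
		if (!page)
			page = __rmqueue_smallest(zone, order, migratetype);

		/* ... existing fallback handling continues here ... */
	}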

Thanks.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
  2014-05-14  9:44     ` Aneesh Kumar K.V
@ 2014-05-15  2:10       ` Joonsoo Kim
  2014-05-15  9:47         ` Mel Gorman
  0 siblings, 1 reply; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-15  2:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Marek Szyprowski, Andrew Morton, Rik van Riel, Johannes Weiner,
	Mel Gorman, Laura Abbott, Minchan Kim, Heesub Shin,
	Michal Nazarewicz, linux-mm, linux-kernel, Kyungmin Park,
	Bartlomiej Zolnierkiewicz, 'Tomasz Stanislawski'

On Wed, May 14, 2014 at 03:14:30PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> 
> > On Fri, May 09, 2014 at 02:39:20PM +0200, Marek Szyprowski wrote:
> >> Hello,
> >> 
> >> On 2014-05-08 02:32, Joonsoo Kim wrote:
> >> >This series tries to improve CMA.
> >> >
> >> >CMA is introduced to provide physically contiguous pages at runtime
> >> >without reserving memory area. But, current implementation works like as
> >> >reserving memory approach, because allocation on cma reserved region only
> >> >occurs as fallback of migrate_movable allocation. We can allocate from it
> >> >when there is no movable page. In that situation, kswapd would be invoked
> >> >easily since unmovable and reclaimable allocation consider
> >> >(free pages - free CMA pages) as free memory on the system and free memory
> >> >may be lower than high watermark in that case. If kswapd start to reclaim
> >> >memory, then fallback allocation doesn't occur much.
> >> >
> >> >In my experiment, I found that if system memory has 1024 MB memory and
> >> >has 512 MB reserved memory for CMA, kswapd is mostly invoked around
> >> >the 512MB free memory boundary. And invoked kswapd tries to make free
> >> >memory until (free pages - free CMA pages) is higher than high watermark,
> >> >so free memory on meminfo is moving around 512MB boundary consistently.
> >> >
> >> >To fix this problem, we should allocate the pages on cma reserved memory
> >> >more aggressively and intelligenetly. Patch 2 implements the solution.
> >> >Patch 1 is the simple optimization which remove useless re-trial and patch 3
> >> >is for removing useless alloc flag, so these are not important.
> >> >See patch 2 for more detailed description.
> >> >
> >> >This patchset is based on v3.15-rc4.
> >> 
> >> Thanks for posting those patches. It basically reminds me the
> >> following discussion:
> >> http://thread.gmane.org/gmane.linux.kernel/1391989/focus=1399524
> >> 
> >> Your approach is basically the same. I hope that your patches can be
> >> improved
> >> in such a way that they will be accepted by mm maintainers. I only
> >> wonder if the
> >> third patch is really necessary. Without it kswapd wakeup might be
> >> still avoided
> >> in some cases.
> >
> > Hello,
> >
> > Oh... I didn't know that patch and discussion, because I have no interest
> > on CMA at that time. Your approach looks similar to #1
> > approach of mine and could have same problem of #1 approach which I mentioned
> > in patch 2/3. Please refer that patch description. :)
> 
> IIUC that patch also interleave right ?
> 
> +#ifdef CONFIG_CMA
> +	unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
> +	unsigned long nr_cma_free = zone_page_state(zone, NR_FREE_CMA_PAGES);
> +
> +	if (migratetype == MIGRATE_MOVABLE && nr_cma_free &&
> +	    nr_free - nr_cma_free < 2 * low_wmark_pages(zone))
> +		migratetype = MIGRATE_CMA;
> +#endif /* CONFIG_CMA */

Hello,

This is not interleaving from my point of view. That logic allocates
free movable pages until free memory drops to 2 * low_wmark, and only
then allocates free cma pages. By interleaving I mean something like a
round-robin policy without such a constraint.
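
To make the distinction concrete, a 1:1 round-robin would look roughly like
the sketch below (illustrative pseudocode; zone->use_cma and
zone->nr_batch_left are made-up fields, and the actual patch weights the
two batches by the free-space ratio instead of keeping them equal):

	if (zone->nr_batch_left == 0) {
		zone->use_cma = !zone->use_cma;		/* flip every batch */
		zone->nr_batch_left = pageblock_nr_pages;
	}
	zone->nr_batch_left--;

	if (zone->use_cma)
		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
	else
		page = __rmqueue_smallest(zone, order, MIGRATE_MOVABLE);

There is no watermark-based precondition here; the two free lists are
simply drained in alternating batches.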

> 
> That doesn't always prefer CMA region. It would be nice to
> understand why grouping in pageblock_nr_pages is beneficial. Also in
> your patch you decrement nr_try_cma for every 'order' allocation. Why ?

pageblock_nr_pages is just a magic value with no rationale. :)
But we do need grouping, because without it we can't get physically
contiguous pages. When we allocate pages for the page cache, the readahead
logic tries to allocate 32 pages at once. Without grouping, the disk
I/O for these pages can't be handled by one I/O request on some devices.
I'm not familiar with I/O devices, so please correct me if I'm wrong.
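
To put rough numbers on it (assuming 4 KB pages and a 2 MB huge page size,
which is only an example):

	32 readahead pages * 4 KB       = 128 KB window
	pageblock_nr_pages (512) * 4 KB = 2 MB batch

With page-by-page interleaving, the 32 pages of one 128 KB readahead window
would come alternately from the CMA area and from ordinary movable memory,
so they could not be physically contiguous; batching in pageblock-sized
chunks keeps each window inside one contiguous region.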

And, yes, I will take the allocation 'order' into account when
incrementing/decrementing nr_try_cma.

> 
> +	if (zone->nr_try_cma) {
> +		/* Okay. Now, we can try to allocate the page from cma region */
> +		zone->nr_try_cma--;
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +
> +		/* CMA pages can vanish through CMA allocation */
> +		if (unlikely(!page && order == 0))
> +			zone->nr_try_cma = 0;
> +
> +		return page;
> +	}
> 
> 
> If we fail above MIGRATE_CMA alloc should we return failure ? Why
> not try MOVABLE allocation on failure (ie fallthrough the code path) ?

This patch uses fall-through logic. If __rmqueue_cma() fails, allocation
falls through to the usual __rmqueue() path.

Thanks.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-15  1:53     ` Joonsoo Kim
@ 2014-05-15  2:43       ` Minchan Kim
  2014-05-19  2:11         ` Joonsoo Kim
  2014-05-15  2:45       ` Heesub Shin
  2014-05-16  8:02       ` [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value Gioh Kim
  2 siblings, 1 reply; 45+ messages in thread
From: Minchan Kim @ 2014-05-15  2:43 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote:
> On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> > Hey Joonsoo,
> > 
> > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> > > CMA is introduced to provide physically contiguous pages at runtime.
> > > For this purpose, it reserves memory at boot time. Although it reserve
> > > memory, this reserved memory can be used for movable memory allocation
> > > request. This usecase is beneficial to the system that needs this CMA
> > > reserved memory infrequently and it is one of main purpose of
> > > introducing CMA.
> > > 
> > > But, there is a problem in current implementation. The problem is that
> > > it works like as just reserved memory approach. The pages on cma reserved
> > > memory are hardly used for movable memory allocation. This is caused by
> > > combination of allocation and reclaim policy.
> > > 
> > > The pages on cma reserved memory are allocated if there is no movable
> > > memory, that is, as fallback allocation. So the time this fallback
> > > allocation is started is under heavy memory pressure. Although it is under
> > > memory pressure, movable allocation easily succeed, since there would be
> > > many pages on cma reserved memory. But this is not the case for unmovable
> > > and reclaimable allocation, because they can't use the pages on cma
> > > reserved memory. These allocations regard system's free memory as
> > > (free pages - free cma pages) on watermark checking, that is, free
> > > unmovable pages + free reclaimable pages + free movable pages. Because
> > > we already exhausted movable pages, only free pages we have are unmovable
> > > and reclaimable types and this would be really small amount. So watermark
> > > checking would be failed. It will wake up kswapd to make enough free
> > > memory for unmovable and reclaimable allocation and kswapd will do.
> > > So before we fully utilize pages on cma reserved memory, kswapd start to
> > > reclaim memory and try to make free memory over the high watermark. This
> > > watermark checking by kswapd doesn't take care free cma pages so many
> > > movable pages would be reclaimed. After then, we have a lot of movable
> > > pages again, so fallback allocation doesn't happen again. To conclude,
> > > amount of free memory on meminfo which includes free CMA pages is moving
> > > around 512 MB if I reserve 512 MB memory for CMA.
> > > 
> > > I found this problem on following experiment.
> > > 
> > > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > > make -j24
> > > 
> > > CMA reserve:		0 MB		512 MB
> > > Elapsed-time:		234.8		361.8
> > > Average-MemFree:	283880 KB	530851 KB
> > > 
> > > To solve this problem, I can think following 2 possible solutions.
> > > 1. allocate the pages on cma reserved memory first, and if they are
> > >    exhausted, allocate movable pages.
> > > 2. interleaved allocation: try to allocate specific amounts of memory
> > >    from cma reserved memory and then allocate from free movable memory.
> > 
> > I love this idea but when I see the code, I don't like that.
> > In allocation path, just try to allocate pages by round-robin so it's role
> > of allocator. If one of migratetype is full, just pass mission to reclaimer
> > with hint(ie, Hey reclaimer, it's non-movable allocation fail
> > so there is pointless if you reclaim MIGRATE_CMA pages) so that
> > reclaimer can filter it out during page scanning.
> > We already have an tool to achieve it(ie, isolate_mode_t).
> 
> Hello,
> 
> I agree with leaving fast allocation path as simple as possible.
> I will remove runtime computation for determining ratio in
> __rmqueue_cma() and, instead, will use pre-computed value calculated
> on the other path.

Sounds good.

> 
> I am not sure that whether your second suggestion(Hey relaimer part)
> is good or not. In my quick thought, that could be helpful in the
> situation that many free cma pages remained. But, it would be not helpful
> when there are neither free movable and cma pages. In generally, most
> workloads mainly uses movable pages for page cache or anonymous mapping.
> Although reclaim is triggered by non-movable allocation failure, reclaimed
> pages are used mostly by movable allocation. We can handle these allocation
> request even if we reclaim the pages just in lru order. If we rotate
> the lru list for finding movable pages, it could cause more useful
> pages to be evicted.
> 
> This is just my quick thought, so please let me correct if I am wrong.

Why should the reclaimer reclaim unnecessary pages?
Your answer seems to be that it would be better because upcoming
allocations could then be satisfied easily, without interruption. But it
could reclaim too many pages before the watermark for unmovable
allocation is satisfied. Sometimes you might even see an OOM.

Moreover, how would you handle the current trouble?
For example, suppose there is an atomic allocation and the only thing that
can save the world is kswapd, since that is one of kswapd's roles, but
kswapd is spending most of its time reclaiming CMA pages, which is
pointless, so the allocation would easily fail.

> 
> > 
> > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > If possible, it would be better becauser it's generic function to check
> > free pages and cause trigger reclaim/compaction logic.
> 
> I guess, your *it* means ratio computation. Right?

I meant doing it in get_page_from_freelist(), like the fair zone allocation
policy, for consistency, but as we discussed offline, I'm not against you
if that is not the right place.


> I don't like putting it on zone_watermark_ok(). Although it need to
> refer to free cma pages value which are also referred in zone_watermark_ok(),
> this computation is for determining ratio, not for triggering
> reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so
> putting this logic into zone_watermark_ok() looks not better to me.
> 
> I will think better place to do it.

Yep, thanks!

> 
> Thanks.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-15  1:53     ` Joonsoo Kim
  2014-05-15  2:43       ` Minchan Kim
@ 2014-05-15  2:45       ` Heesub Shin
  2014-05-15  5:06         ` Minchan Kim
  2014-05-19 23:22         ` Minchan Kim
  2014-05-16  8:02       ` [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value Gioh Kim
  2 siblings, 2 replies; 45+ messages in thread
From: Heesub Shin @ 2014-05-15  2:45 UTC (permalink / raw)
  To: Joonsoo Kim, Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Mel Gorman, Johannes Weiner,
	Marek Szyprowski

Hello,

On 05/15/2014 10:53 AM, Joonsoo Kim wrote:
> On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
>> Hey Joonsoo,
>>
>> On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
>>> CMA is introduced to provide physically contiguous pages at runtime.
>>> For this purpose, it reserves memory at boot time. Although it reserve
>>> memory, this reserved memory can be used for movable memory allocation
>>> request. This usecase is beneficial to the system that needs this CMA
>>> reserved memory infrequently and it is one of main purpose of
>>> introducing CMA.
>>>
>>> But, there is a problem in current implementation. The problem is that
>>> it works like as just reserved memory approach. The pages on cma reserved
>>> memory are hardly used for movable memory allocation. This is caused by
>>> combination of allocation and reclaim policy.
>>>
>>> The pages on cma reserved memory are allocated if there is no movable
>>> memory, that is, as fallback allocation. So the time this fallback
>>> allocation is started is under heavy memory pressure. Although it is under
>>> memory pressure, movable allocation easily succeed, since there would be
>>> many pages on cma reserved memory. But this is not the case for unmovable
>>> and reclaimable allocation, because they can't use the pages on cma
>>> reserved memory. These allocations regard system's free memory as
>>> (free pages - free cma pages) on watermark checking, that is, free
>>> unmovable pages + free reclaimable pages + free movable pages. Because
>>> we already exhausted movable pages, only free pages we have are unmovable
>>> and reclaimable types and this would be really small amount. So watermark
>>> checking would be failed. It will wake up kswapd to make enough free
>>> memory for unmovable and reclaimable allocation and kswapd will do.
>>> So before we fully utilize pages on cma reserved memory, kswapd start to
>>> reclaim memory and try to make free memory over the high watermark. This
>>> watermark checking by kswapd doesn't take care free cma pages so many
>>> movable pages would be reclaimed. After then, we have a lot of movable
>>> pages again, so fallback allocation doesn't happen again. To conclude,
>>> amount of free memory on meminfo which includes free CMA pages is moving
>>> around 512 MB if I reserve 512 MB memory for CMA.
>>>
>>> I found this problem on following experiment.
>>>
>>> 4 CPUs, 1024 MB, VIRTUAL MACHINE
>>> make -j24
>>>
>>> CMA reserve:		0 MB		512 MB
>>> Elapsed-time:		234.8		361.8
>>> Average-MemFree:	283880 KB	530851 KB
>>>
>>> To solve this problem, I can think following 2 possible solutions.
>>> 1. allocate the pages on cma reserved memory first, and if they are
>>>     exhausted, allocate movable pages.
>>> 2. interleaved allocation: try to allocate specific amounts of memory
>>>     from cma reserved memory and then allocate from free movable memory.
>>
>> I love this idea but when I see the code, I don't like that.
>> In allocation path, just try to allocate pages by round-robin so it's role
>> of allocator. If one of migratetype is full, just pass mission to reclaimer
>> with hint(ie, Hey reclaimer, it's non-movable allocation fail
>> so there is pointless if you reclaim MIGRATE_CMA pages) so that
>> reclaimer can filter it out during page scanning.
>> We already have an tool to achieve it(ie, isolate_mode_t).
>
> Hello,
>
> I agree with leaving fast allocation path as simple as possible.
> I will remove runtime computation for determining ratio in
> __rmqueue_cma() and, instead, will use pre-computed value calculated
> on the other path.
>
> I am not sure that whether your second suggestion(Hey relaimer part)
> is good or not. In my quick thought, that could be helpful in the
> situation that many free cma pages remained. But, it would be not helpful
> when there are neither free movable and cma pages. In generally, most
> workloads mainly uses movable pages for page cache or anonymous mapping.
> Although reclaim is triggered by non-movable allocation failure, reclaimed
> pages are used mostly by movable allocation. We can handle these allocation
> request even if we reclaim the pages just in lru order. If we rotate
> the lru list for finding movable pages, it could cause more useful
> pages to be evicted.
>
> This is just my quick thought, so please let me correct if I am wrong.

We have an out-of-tree implementation that takes exactly the approach
Minchan described, and it works, but it definitely has some side effects,
as you pointed out: distorting the LRU and evicting hot pages. I am not
attaching code fragments in this thread for various reasons, but it should
be easy to reproduce yourself. I am wondering whether it could also help
in your case.

Thanks,
Heesub

>
>>
>> And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
>> If possible, it would be better becauser it's generic function to check
>> free pages and cause trigger reclaim/compaction logic.
>
> I guess, your *it* means ratio computation. Right?
> I don't like putting it on zone_watermark_ok(). Although it need to
> refer to free cma pages value which are also referred in zone_watermark_ok(),
> this computation is for determining ratio, not for triggering
> reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so
> putting this logic into zone_watermark_ok() looks not better to me.
>
> I will think better place to do it.
>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-15  2:45       ` Heesub Shin
@ 2014-05-15  5:06         ` Minchan Kim
  2014-05-19 23:22         ` Minchan Kim
  1 sibling, 0 replies; 45+ messages in thread
From: Minchan Kim @ 2014-05-15  5:06 UTC (permalink / raw)
  To: Heesub Shin
  Cc: Joonsoo Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Mel Gorman, Johannes Weiner,
	Marek Szyprowski

Hello Heesub,

On Thu, May 15, 2014 at 11:45:31AM +0900, Heesub Shin wrote:
> Hello,
> 
> On 05/15/2014 10:53 AM, Joonsoo Kim wrote:
> >On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> >>Hey Joonsoo,
> >>
> >>On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> >>>CMA is introduced to provide physically contiguous pages at runtime.
> >>>For this purpose, it reserves memory at boot time. Although it reserve
> >>>memory, this reserved memory can be used for movable memory allocation
> >>>request. This usecase is beneficial to the system that needs this CMA
> >>>reserved memory infrequently and it is one of main purpose of
> >>>introducing CMA.
> >>>
> >>>But, there is a problem in current implementation. The problem is that
> >>>it works like as just reserved memory approach. The pages on cma reserved
> >>>memory are hardly used for movable memory allocation. This is caused by
> >>>combination of allocation and reclaim policy.
> >>>
> >>>The pages on cma reserved memory are allocated if there is no movable
> >>>memory, that is, as fallback allocation. So the time this fallback
> >>>allocation is started is under heavy memory pressure. Although it is under
> >>>memory pressure, movable allocation easily succeed, since there would be
> >>>many pages on cma reserved memory. But this is not the case for unmovable
> >>>and reclaimable allocation, because they can't use the pages on cma
> >>>reserved memory. These allocations regard system's free memory as
> >>>(free pages - free cma pages) on watermark checking, that is, free
> >>>unmovable pages + free reclaimable pages + free movable pages. Because
> >>>we already exhausted movable pages, only free pages we have are unmovable
> >>>and reclaimable types and this would be really small amount. So watermark
> >>>checking would be failed. It will wake up kswapd to make enough free
> >>>memory for unmovable and reclaimable allocation and kswapd will do.
> >>>So before we fully utilize pages on cma reserved memory, kswapd start to
> >>>reclaim memory and try to make free memory over the high watermark. This
> >>>watermark checking by kswapd doesn't take care free cma pages so many
> >>>movable pages would be reclaimed. After then, we have a lot of movable
> >>>pages again, so fallback allocation doesn't happen again. To conclude,
> >>>amount of free memory on meminfo which includes free CMA pages is moving
> >>>around 512 MB if I reserve 512 MB memory for CMA.
> >>>
> >>>I found this problem on following experiment.
> >>>
> >>>4 CPUs, 1024 MB, VIRTUAL MACHINE
> >>>make -j24
> >>>
> >>>CMA reserve:		0 MB		512 MB
> >>>Elapsed-time:		234.8		361.8
> >>>Average-MemFree:	283880 KB	530851 KB
> >>>
> >>>To solve this problem, I can think following 2 possible solutions.
> >>>1. allocate the pages on cma reserved memory first, and if they are
> >>>    exhausted, allocate movable pages.
> >>>2. interleaved allocation: try to allocate specific amounts of memory
> >>>    from cma reserved memory and then allocate from free movable memory.
> >>
> >>I love this idea but when I see the code, I don't like that.
> >>In allocation path, just try to allocate pages by round-robin so it's role
> >>of allocator. If one of migratetype is full, just pass mission to reclaimer
> >>with hint(ie, Hey reclaimer, it's non-movable allocation fail
> >>so there is pointless if you reclaim MIGRATE_CMA pages) so that
> >>reclaimer can filter it out during page scanning.
> >>We already have an tool to achieve it(ie, isolate_mode_t).
> >
> >Hello,
> >
> >I agree with leaving fast allocation path as simple as possible.
> >I will remove runtime computation for determining ratio in
> >__rmqueue_cma() and, instead, will use pre-computed value calculated
> >on the other path.
> >
> >I am not sure that whether your second suggestion(Hey relaimer part)
> >is good or not. In my quick thought, that could be helpful in the
> >situation that many free cma pages remained. But, it would be not helpful
> >when there are neither free movable and cma pages. In generally, most
> >workloads mainly uses movable pages for page cache or anonymous mapping.
> >Although reclaim is triggered by non-movable allocation failure, reclaimed
> >pages are used mostly by movable allocation. We can handle these allocation
> >request even if we reclaim the pages just in lru order. If we rotate
> >the lru list for finding movable pages, it could cause more useful
> >pages to be evicted.
> >
> >This is just my quick thought, so please let me correct if I am wrong.
> 
> We have an out of tree implementation that is completely the same
> with the approach Minchan said and it works, but it has definitely
> some side-effects as you pointed, distorting the LRU and evicting
> hot pages. I do not attach code fragments in this thread for some

Actually, Joonsoo and I had agreed to solve such corner cases in the future
if someone reported them, and you just did. Thanks!

LRU churning is a general problem, not CMA specific, although CMA makes it
worse more aggressively, so I'd like to handle it as a separate topic
(i.e., patchset).

The reason we rotated those pages back to the LRU head was just to avoid
re-scanning them within one reclaim cycle. One idea I can think of is to
place a reclaim cursor at the LRU tail right before a reclaim cycle, start
scanning from the cursor, and update the cursor position on every scanning
step. Of course, we should then rotate filtered-out pages back to the LRU's
tail, not its head, but with the cursor we can skip pointless pages that
were already scanned in this reclaim cycle.

The cursor should be removed when the reclaim cycle is done, so if the next
reclaim happens, the cursor starts from the beginning again. That could
cause unnecessary scanning until the proper victim page is reached, so CPU
usage would be higher, but it's better than evicting the working set.
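
Very roughly, the cursor idea could look like the pseudocode below. None of
these helpers or fields exist today; they are made-up names just to pin the
proposal down:

	/* right before a reclaim cycle starts */
	lruvec->reclaim_cursor = lru_tail(lruvec);

	/* each scanning step starts from the cursor instead of the tail */
	page = lruvec->reclaim_cursor;
	lruvec->reclaim_cursor = lru_prev(lruvec, page);
	if (page_filtered_out(page, isolate_mode)) {
		/* e.g. a MIGRATE_CMA page for a non-movable allocation */
		rotate_to_lru_tail(lruvec, page);	/* tail, not head */
		/* the cursor has already moved past it, so no re-scan */
	}

	/* when the reclaim cycle finishes */
	lruvec->reclaim_cursor = NULL;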

Another idea?

> reasons, but it must be easy for yourself. I am wondering if it
> could help also in your case.
> 
> Thanks,
> Heesub
> 
> >
> >>
> >>And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> >>If possible, it would be better becauser it's generic function to check
> >>free pages and cause trigger reclaim/compaction logic.
> >
> >I guess, your *it* means ratio computation. Right?
> >I don't like putting it on zone_watermark_ok(). Although it need to
> >refer to free cma pages value which are also referred in zone_watermark_ok(),
> >this computation is for determining ratio, not for triggering
> >reclaim/compaction. And this zone_watermark_ok() is on more hot-path, so
> >putting this logic into zone_watermark_ok() looks not better to me.
> >
> >I will think better place to do it.
> >
> >Thanks.
> >
> >--
> >To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >the body to majordomo@kvack.org.  For more info on Linux MM,
> >see: http://www.linux-mm.org/ .
> >Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
  2014-05-15  2:10       ` Joonsoo Kim
@ 2014-05-15  9:47         ` Mel Gorman
  2014-05-19  2:12           ` Joonsoo Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Mel Gorman @ 2014-05-15  9:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Aneesh Kumar K.V, Marek Szyprowski, Andrew Morton, Rik van Riel,
	Johannes Weiner, Laura Abbott, Minchan Kim, Heesub Shin,
	Michal Nazarewicz, linux-mm, linux-kernel, Kyungmin Park,
	Bartlomiej Zolnierkiewicz, 'Tomasz Stanislawski'

On Thu, May 15, 2014 at 11:10:55AM +0900, Joonsoo Kim wrote:
> > That doesn't always prefer CMA region. It would be nice to
> > understand why grouping in pageblock_nr_pages is beneficial. Also in
> > your patch you decrement nr_try_cma for every 'order' allocation. Why ?
> 
> pageblock_nr_pages is just magic value with no rationale. :)

I'm not following this discussion closely, but there is a rationale for
that value -- it's the size of a huge page for that architecture. At the
time fragmentation avoidance was implemented, this was the largest
allocation size of interest.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-15  1:53     ` Joonsoo Kim
  2014-05-15  2:43       ` Minchan Kim
  2014-05-15  2:45       ` Heesub Shin
@ 2014-05-16  8:02       ` Gioh Kim
  2014-05-16 17:45         ` Michal Nazarewicz
  2 siblings, 1 reply; 45+ messages in thread
From: Gioh Kim @ 2014-05-16  8:02 UTC (permalink / raw)
  To: Joonsoo Kim, Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski, 이건호,
	gurugio

Hi,

I've been trying to apply CMA to my platform.
The USB host driver failed with the kernel warnings below when a USB mouse
was connected, because I turned CMA on but set CMA_SIZE_MBYTES to zero by
mistake.

I think the failure is caused by the atomic_pool in arch/arm/mm/dma-mapping.c.
A zero CMA_SIZE_MBYTES value skips CMA initialization, and then the
atomic_pool is not initialized either, because __alloc_from_contiguous()
fails in atomic_pool_init().

If CMA_SIZE_MBYTES is allowed to be zero, there should be defensive code
that checks whether CMA was initialized correctly, and atomic_pool
initialization should fall back to __alloc_remap_buffer() when
__alloc_from_contiguous() fails.

IMHO, it is simpler and more robust to restrict the CMA_SIZE_MBYTES_MAX
configuration to be larger than zero.


[    1.474523] ------------[ cut here ]------------
[    1.479150] WARNING: at arch/arm/mm/dma-mapping.c:496 __dma_alloc.isra.19+0x1b8/0x1e0()
[    1.487160] coherent pool not initialised!
[    1.491249] Modules linked in:
[    1.494317] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.10.19+ #55
[    1.500521] [<80013e20>] (unwind_backtrace+0x0/0xf8) from [<80011c60>] (show_stack+0x10/0x14)
[    1.509064] [<80011c60>] (show_stack+0x10/0x14) from [<8001eedc>] (warn_slowpath_common+0x4c/0x6c)
[    1.518038] [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) from [<8001ef90>] (warn_slowpath_fmt+0x30/0x40)
[    1.527616] [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) from [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0)
[    1.537282] [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) from [<80017d7c>] (arm_dma_alloc+0x90/0x98)
[    1.546608] [<80017d7c>] (arm_dma_alloc+0x90/0x98) from [<8034a860>] (ohci_init+0x1b0/0x278)
[    1.555062] [<8034a860>] (ohci_init+0x1b0/0x278) from [<80332b0c>] (usb_add_hcd+0x184/0x5b8)
[    1.563500] [<80332b0c>] (usb_add_hcd+0x184/0x5b8) from [<8034b5e0>] (ohci_platform_probe+0xd0/0x174)
[    1.572729] [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) from [<802f196c>] (platform_drv_probe+0x14/0x18)
[    1.582401] [<802f196c>] (platform_drv_probe+0x14/0x18) from [<802f0714>] (driver_probe_device+0x6c/0x1f8)
[    1.592064] [<802f0714>] (driver_probe_device+0x6c/0x1f8) from [<802f092c>] (__driver_attach+0x8c/0x90)
[    1.601465] [<802f092c>] (__driver_attach+0x8c/0x90) from [<802eeec8>] (bus_for_each_dev+0x54/0x88)
[    1.610518] [<802eeec8>] (bus_for_each_dev+0x54/0x88) from [<802efef0>] (bus_add_driver+0xd8/0x230)
[    1.619572] [<802efef0>] (bus_add_driver+0xd8/0x230) from [<802f0de4>] (driver_register+0x78/0x14c)
[    1.628632] [<802f0de4>] (driver_register+0x78/0x14c) from [<806ff018>] (ohci_hcd_mod_init+0x34/0x64)
[    1.637859] [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) from [<8000879c>] (do_one_initcall+0xec/0x14c)
[    1.647088] [<8000879c>] (do_one_initcall+0xec/0x14c) from [<806dab30>] (kernel_init_freeable+0x150/0x220)
[    1.656754] [<806dab30>] (kernel_init_freeable+0x150/0x220) from [<80509f54>] (kernel_init+0x8/0xf8)
[    1.665895] [<80509f54>] (kernel_init+0x8/0xf8) from [<8000e398>] (ret_from_fork+0x14/0x3c)
[    1.674264] ---[ end trace 6f1857db5ef45cb9 ]---
[    1.678880] ohci-platform ohci-platform.0: can't setup
[    1.684027] ohci-platform ohci-platform.0: USB bus 1 deregistered
[    1.690362] ohci-platform: probe of ohci-platform.0 failed with error -12
[    1.697188] ohci-platform ohci-platform.1: Generic Platform OHCI Controller
[    1.704365] ohci-platform ohci-platform.1: new USB bus registered, assigned bus number 1
[    1.712457] ------------[ cut here ]------------
[    1.717096] WARNING: at arch/arm/mm/dma-mapping.c:496 __dma_alloc.isra.19+0x1b8/0x1e0()
[    1.725105] coherent pool not initialised!
[    1.729194] Modules linked in:
[    1.732247] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W    3.10.19+ #55
[    1.739404] [<80013e20>] (unwind_backtrace+0x0/0xf8) from [<80011c60>] (show_stack+0x10/0x14)
[    1.747949] [<80011c60>] (show_stack+0x10/0x14) from [<8001eedc>] (warn_slowpath_common+0x4c/0x6c)
[    1.756923] [<8001eedc>] (warn_slowpath_common+0x4c/0x6c) from [<8001ef90>] (warn_slowpath_fmt+0x30/0x40)
[    1.766502] [<8001ef90>] (warn_slowpath_fmt+0x30/0x40) from [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0)
[    1.776168] [<80017c28>] (__dma_alloc.isra.19+0x1b8/0x1e0) from [<80017d7c>] (arm_dma_alloc+0x90/0x98)
[    1.785484] [<80017d7c>] (arm_dma_alloc+0x90/0x98) from [<8034a860>] (ohci_init+0x1b0/0x278)
[    1.793933] [<8034a860>] (ohci_init+0x1b0/0x278) from [<80332b0c>] (usb_add_hcd+0x184/0x5b8)
[    1.802370] [<80332b0c>] (usb_add_hcd+0x184/0x5b8) from [<8034b5e0>] (ohci_platform_probe+0xd0/0x174)
[    1.811597] [<8034b5e0>] (ohci_platform_probe+0xd0/0x174) from [<802f196c>] (platform_drv_probe+0x14/0x18)
[    1.821263] [<802f196c>] (platform_drv_probe+0x14/0x18) from [<802f0714>] (driver_probe_device+0x6c/0x1f8)
[    1.830926] [<802f0714>] (driver_probe_device+0x6c/0x1f8) from [<802f092c>] (__driver_attach+0x8c/0x90)
[    1.840326] [<802f092c>] (__driver_attach+0x8c/0x90) from [<802eeec8>] (bus_for_each_dev+0x54/0x88)
[    1.849379] [<802eeec8>] (bus_for_each_dev+0x54/0x88) from [<802efef0>] (bus_add_driver+0xd8/0x230)
[    1.858432] [<802efef0>] (bus_add_driver+0xd8/0x230) from [<802f0de4>] (driver_register+0x78/0x14c)
[    1.867488] [<802f0de4>] (driver_register+0x78/0x14c) from [<806ff018>] (ohci_hcd_mod_init+0x34/0x64)
[    1.876714] [<806ff018>] (ohci_hcd_mod_init+0x34/0x64) from [<8000879c>] (do_one_initcall+0xec/0x14c)
[    1.885940] [<8000879c>] (do_one_initcall+0xec/0x14c) from [<806dab30>] (kernel_init_freeable+0x150/0x220)
[    1.895601] [<806dab30>] (kernel_init_freeable+0x150/0x220) from [<80509f54>] (kernel_init+0x8/0xf8)
[    1.904741] [<80509f54>] (kernel_init+0x8/0xf8) from [<8000e398>] (ret_from_fork+0x14/0x3c)
[    1.913085] ---[ end trace 6f1857db5ef45cba ]---


I'm adding my patch to restrict CMA_SIZE_MBYTES.
This patch is based on 3.15.0-rc5

-------------------------------- 8< --------------------------------------
 From 9f8e6d3c1f4bdeeeb7af3df7898b773a612c62e8 Mon Sep 17 00:00:00 2001
From: Gioh Kim <gioh.kim@lge.com>
Date: Fri, 16 May 2014 16:15:43 +0900
Subject: [PATCH] drivers/base/Kconfig: restrict CMA size to non-zero value

The size of the CMA area must be larger than zero.
If the size is zero, CMA cannot be initialized.

Signed-off-by: Gioh Kim <gioh.kim@lge.c>
---
  drivers/base/Kconfig |    7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 4b7b452..19b3578 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -222,13 +222,18 @@ config DMA_CMA
  if  DMA_CMA
  comment "Default contiguous memory area size:"

+config CMA_SIZE_MBYTES_MAX
+       int
+       default 1024
+
  config CMA_SIZE_MBYTES
         int "Size in Mega Bytes"
         depends on !CMA_SIZE_SEL_PERCENTAGE
+       range 1 CMA_SIZE_MBYTES_MAX
         default 16
         help
           Defines the size (in MiB) of the default memory area for Contiguous
-         Memory Allocator.
+         Memory Allocator. This value must be larger than zero.

  config CMA_SIZE_PERCENTAGE
         int "Percentage of total memory"
--
1.7.9.5



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-16  8:02       ` [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value Gioh Kim
@ 2014-05-16 17:45         ` Michal Nazarewicz
  2014-05-19  1:47           ` Gioh Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Michal Nazarewicz @ 2014-05-16 17:45 UTC (permalink / raw)
  To: Gioh Kim, Joonsoo Kim, Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	Marek Szyprowski, 이건호,
	gurugio

On Fri, May 16 2014, Gioh Kim wrote:
> If CMA_SIZE_MBYTES is allowed to be zero, there should be defense code
> to check CMA is initlaized correctly. And atomic_pool initialization
> should be done by __alloc_remap_buffer instead of
> __alloc_from_contiguous if __alloc_from_contiguous is failed.

Agreed, and this is the correct fix.

> IMPO, it is more simple and powerful to restrict CMA_SIZE_MBYTES_MAX
> configuration to be larger than zero.

No, because it makes it impossible to have CMA disabled by default and
only enabled when a command line argument is given.

Furthermore, your patch does *not* guarantee that the CMA region is always
allocated: for instance, if CMA_SIZE_SEL_PERCENTAGE is selected, or if the
user explicitly passes 0 on the command line.
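
In other words, the intended configuration is something like the following
(the 64M size is only an example):

	# Kconfig: CMA support built in, but no memory reserved by default
	CONFIG_DMA_CMA=y
	CONFIG_CMA_SIZE_MBYTES=0

	# kernel command line, only on boards that actually need a region
	cma=64M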

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-15  1:58     ` Joonsoo Kim
@ 2014-05-18 17:36       ` Aneesh Kumar K.V
  2014-05-19  2:29         ` Joonsoo Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Aneesh Kumar K.V @ 2014-05-18 17:36 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote:
>> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
>> 
>> 
>> 
>> Another issue i am facing with the current code is the atomic allocation
>> failing even with large number of CMA pages around. In my case we never
>> reclaimed because large part of the memory is consumed by the page cache and
>> for that, free memory check doesn't include at free_cma. I will test
>> with this patchset and update here once i have the results.
>> 
>
> Hello,
>
> Could you elaborate more on your issue?
> I can't completely understand your problem.
> So your atomic allocation is movable? And although there are many free
> cma pages, that request is fail?
>

Non-movable atomic allocations are failing because we don't have
anything other than CMA pages left and kswapd has yet to catch up?


  swapper/0: page allocation failure: order:0, mode:0x20
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.23-1500.pkvm2_1.5.ppc64 #1
  Call Trace:
 [c000000ffffcb610] [c000000000017330] .show_stack+0x130/0x200 (unreliable)
 [c000000ffffcb6e0] [c00000000087a8c8] .dump_stack+0x28/0x3c
 [c000000ffffcb750] [c0000000001e06f0] .warn_alloc_failed+0x110/0x160
 [c000000ffffcb800] [c0000000001e5984] .__alloc_pages_nodemask+0x9d4/0xbf0
 [c000000ffffcb9e0] [c00000000023775c] .alloc_pages_current+0xcc/0x1b0
 [c000000ffffcba80] [c0000000007098d4] .__netdev_alloc_frag+0x1a4/0x1d0
 [c000000ffffcbb20] [c00000000070d750] .__netdev_alloc_skb+0xc0/0x130
 [c000000ffffcbbb0] [d000000009639b40] .tg3_poll_work+0x900/0x1110 [tg3]
 [c000000ffffcbd10] [d00000000963a3a4] .tg3_poll_msix+0x54/0x200 [tg3]
 [c000000ffffcbdb0] [c00000000071fcec] .net_rx_action+0x1dc/0x310
 [c000000ffffcbe90] [c0000000000c1b08] .__do_softirq+0x158/0x330
 [c000000ffffcbf90] [c000000000025744] .call_do_softirq+0x14/0x24
 [c000000ffffc7e00] [c000000000011684] .do_softirq+0xf4/0x130
 [c000000ffffc7e90] [c0000000000c1f18] .irq_exit+0xc8/0x110
 [c000000ffffc7f10] [c000000000011258] .__do_irq+0xc8/0x1f0
 [c000000ffffc7f90] [c000000000025768] .call_do_irq+0x14/0x24
 [c00000000137b750] [c00000000001142c] .do_IRQ+0xac/0x130
 [c00000000137b800] [c000000000002a64]
 hardware_interrupt_common+0x164/0x180

....


 Node 0 DMA: 408*64kB (C) 408*128kB (C) 408*256kB (C) 408*512kB (C) 408*1024kB (C) 406*2048kB (C) 199*4096kB (C) 97*8192kB (C) 6*16384kB (C) =
 3348992kB
 Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16384kB
 Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB

meminfo details:

 MemTotal:       65875584 kB
 MemFree:         8001856 kB
 Buffers:        49330368 kB
 Cached:           178752 kB
 SwapCached:            0 kB
 Active:         28550464 kB
 Inactive:       25476416 kB
 Active(anon):    3771008 kB
 Inactive(anon):   767360 kB
 Active(file):   24779456 kB
 Inactive(file): 24709056 kB
 Unevictable:       15104 kB
 Mlocked:           15104 kB
 SwapTotal:       8384448 kB
 SwapFree:        8384448 kB
 Dirty:                 0 kB

-aneesh


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-16 17:45         ` Michal Nazarewicz
@ 2014-05-19  1:47           ` Gioh Kim
  2014-05-19  5:55             ` Joonsoo Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Gioh Kim @ 2014-05-19  1:47 UTC (permalink / raw)
  To: Michal Nazarewicz, Joonsoo Kim, Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	Marek Szyprowski, 이건호,
	gurugio

Thank you for your advice. I hadn't noticed that.

I'm adding the following according to your advice:

- a range restriction for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*
  I think this can prevent a wrong kernel option value.

- changing size_cmdline to the default value SZ_16M
  I am not sure this helps if the cma=0 cmdline option is given together
  with base and limit options.

I don't know how to send a second version of the patch, so please pardon
me for just copying it here.

--------------------------------- 8< -------------------------------------
 From c283eaac41b044a2abb11cfd32a60fff034633c3 Mon Sep 17 00:00:00 2001
From: Gioh Kim <gioh.kim@lge.com>
Date: Fri, 16 May 2014 16:15:43 +0900
Subject: [PATCH] drivers/base/Kconfig: restrict CMA size to non-zero value

The size of the CMA area must be larger than zero.
If the size is zero, all physically contiguous allocations
can fail.

Signed-off-by: Gioh Kim <gioh.kim@lge.co.kr>
---
  drivers/base/Kconfig          |   14 ++++++++++++--
  drivers/base/dma-contiguous.c |    3 ++-
  2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 4b7b452..a7292ac 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -222,17 +222,27 @@ config DMA_CMA
  if  DMA_CMA
  comment "Default contiguous memory area size:"

+config CMA_SIZE_MBYTES_DEFAULT
+       int
+       default 16
+
+config CMA_SIZE_MBYTES_MAX
+       int
+       default 1024
+
  config CMA_SIZE_MBYTES
         int "Size in Mega Bytes"
         depends on !CMA_SIZE_SEL_PERCENTAGE
-       default 16
+       range 1 CMA_SIZE_MBYTES_MAX
+       default CMA_SIZE_MBYTES_DEFAULT
         help
           Defines the size (in MiB) of the default memory area for Contiguous
-         Memory Allocator.
+         Memory Allocator. This value must be larger than zero.

  config CMA_SIZE_PERCENTAGE
         int "Percentage of total memory"
         depends on !CMA_SIZE_SEL_MBYTES
+       range 1 100
         default 10
         help
           Defines the size of the default memory area for Contiguous Memory
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index b056661..5b70442 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -125,7 +125,8 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
         pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);

         if (size_cmdline != -1) {
-               selected_size = size_cmdline;
+               selected_size = ((size_cmdline == 0) ?
+                                CONFIG_CMA_SIZE_MBYTES_DEFAULT : size_cmdline);
                 selected_base = base_cmdline;
                 selected_limit = min_not_zero(limit_cmdline, limit);
                 if (base_cmdline + size_cmdline == limit_cmdline)
--
1.7.9.5


2014-05-17 오전 2:45, Michal Nazarewicz 쓴 글:
> On Fri, May 16 2014, Gioh Kim wrote:
>> If CMA_SIZE_MBYTES is allowed to be zero, there should be defense code
>> to check CMA is initlaized correctly. And atomic_pool initialization
>> should be done by __alloc_remap_buffer instead of
>> __alloc_from_contiguous if __alloc_from_contiguous is failed.
>
> Agreed, and this is the correct fix.
>
>> IMPO, it is more simple and powerful to restrict CMA_SIZE_MBYTES_MAX
>> configuration to be larger than zero.
>
> No, because it makes it impossible to have CMA disabled by default and
> only enabled if command line argument is given.
>
> Furthermore, your patch does *not* guarantee CMA region to always be
> allocated.  If CMA_SIZE_SEL_PERCENTAGE is selected for instance.  Or if
> user explicitly passes 0 on command line.
>
>
>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-15  2:43       ` Minchan Kim
@ 2014-05-19  2:11         ` Joonsoo Kim
  2014-05-19  2:53           ` Minchan Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-19  2:11 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote:
> On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote:
> > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> > > Hey Joonsoo,
> > > 
> > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> > > > CMA is introduced to provide physically contiguous pages at runtime.
> > > > For this purpose, it reserves memory at boot time. Although it reserve
> > > > memory, this reserved memory can be used for movable memory allocation
> > > > request. This usecase is beneficial to the system that needs this CMA
> > > > reserved memory infrequently and it is one of main purpose of
> > > > introducing CMA.
> > > > 
> > > > But, there is a problem in current implementation. The problem is that
> > > > it works like as just reserved memory approach. The pages on cma reserved
> > > > memory are hardly used for movable memory allocation. This is caused by
> > > > combination of allocation and reclaim policy.
> > > > 
> > > > The pages on cma reserved memory are allocated if there is no movable
> > > > memory, that is, as fallback allocation. So the time this fallback
> > > > allocation is started is under heavy memory pressure. Although it is under
> > > > memory pressure, movable allocation easily succeed, since there would be
> > > > many pages on cma reserved memory. But this is not the case for unmovable
> > > > and reclaimable allocation, because they can't use the pages on cma
> > > > reserved memory. These allocations regard system's free memory as
> > > > (free pages - free cma pages) on watermark checking, that is, free
> > > > unmovable pages + free reclaimable pages + free movable pages. Because
> > > > we already exhausted movable pages, only free pages we have are unmovable
> > > > and reclaimable types and this would be really small amount. So watermark
> > > > checking would be failed. It will wake up kswapd to make enough free
> > > > memory for unmovable and reclaimable allocation and kswapd will do.
> > > > So before we fully utilize pages on cma reserved memory, kswapd start to
> > > > reclaim memory and try to make free memory over the high watermark. This
> > > > watermark checking by kswapd doesn't take care free cma pages so many
> > > > movable pages would be reclaimed. After then, we have a lot of movable
> > > > pages again, so fallback allocation doesn't happen again. To conclude,
> > > > amount of free memory on meminfo which includes free CMA pages is moving
> > > > around 512 MB if I reserve 512 MB memory for CMA.
> > > > 
> > > > I found this problem on following experiment.
> > > > 
> > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > > > make -j24
> > > > 
> > > > CMA reserve:		0 MB		512 MB
> > > > Elapsed-time:		234.8		361.8
> > > > Average-MemFree:	283880 KB	530851 KB
> > > > 
> > > > To solve this problem, I can think following 2 possible solutions.
> > > > 1. allocate the pages on cma reserved memory first, and if they are
> > > >    exhausted, allocate movable pages.
> > > > 2. interleaved allocation: try to allocate specific amounts of memory
> > > >    from cma reserved memory and then allocate from free movable memory.
> > > 
> > > I love this idea but when I see the code, I don't like that.
> > > In allocation path, just try to allocate pages by round-robin so it's role
> > > of allocator. If one of migratetype is full, just pass mission to reclaimer
> > > with hint(ie, Hey reclaimer, it's non-movable allocation fail
> > > so there is pointless if you reclaim MIGRATE_CMA pages) so that
> > > reclaimer can filter it out during page scanning.
> > > We already have an tool to achieve it(ie, isolate_mode_t).
> > 
> > Hello,
> > 
> > I agree with leaving fast allocation path as simple as possible.
> > I will remove runtime computation for determining ratio in
> > __rmqueue_cma() and, instead, will use pre-computed value calculated
> > on the other path.
> 
> Sounds good.
> 
> > 
> > I am not sure that whether your second suggestion(Hey relaimer part)
> > is good or not. In my quick thought, that could be helpful in the
> > situation that many free cma pages remained. But, it would be not helpful
> > when there are neither free movable and cma pages. In generally, most
> > workloads mainly uses movable pages for page cache or anonymous mapping.
> > Although reclaim is triggered by non-movable allocation failure, reclaimed
> > pages are used mostly by movable allocation. We can handle these allocation
> > request even if we reclaim the pages just in lru order. If we rotate
> > the lru list for finding movable pages, it could cause more useful
> > pages to be evicted.
> > 
> > This is just my quick thought, so please let me correct if I am wrong.
> 
> Why should reclaimer reclaim unnecessary pages?
> So, your answer is that it would be better because upcoming newly allocated
> pages would be allocated easily without interrupt. But it could reclaim
> too much pages until watermark for unmovable allocation is okay.
> Even, sometime, you might see OOM.
> 
> Moreover, how could you handle current trobule?
> For example, there is atomic allocation and the only thing to save the world
> is kswapd because it's one of kswapd role but kswapd is spending many time to
> reclaim CMA pages, which is pointless so the allocation would be easily failed.

Hello,

I guess that isn't a problem. In the LRU, movable pages and CMA pages
would be interleaved, so it shouldn't take too long to find a page for a
non-movable allocation.

IMHO, in general, memory shortage is caused by movable allocations, so
distinguishing allocation types and handling them differently has only a
marginal effect.

Anyway, I will think about it more deeply.

> 
> > 
> > > 
> > > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > > If possible, it would be better becauser it's generic function to check
> > > free pages and cause trigger reclaim/compaction logic.
> > 
> > I guess, your *it* means ratio computation. Right?
> 
> I meant just get_page_from_freelist like fair zone allocation for consistency
> but as we discussed offline, i'm not against with you if it's not right place.

Okay :)

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory
  2014-05-15  9:47         ` Mel Gorman
@ 2014-05-19  2:12           ` Joonsoo Kim
  0 siblings, 0 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-19  2:12 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Aneesh Kumar K.V, Marek Szyprowski, Andrew Morton, Rik van Riel,
	Johannes Weiner, Laura Abbott, Minchan Kim, Heesub Shin,
	Michal Nazarewicz, linux-mm, linux-kernel, Kyungmin Park,
	Bartlomiej Zolnierkiewicz, 'Tomasz Stanislawski'

On Thu, May 15, 2014 at 10:47:18AM +0100, Mel Gorman wrote:
> On Thu, May 15, 2014 at 11:10:55AM +0900, Joonsoo Kim wrote:
> > > That doesn't always prefer CMA region. It would be nice to
> > > understand why grouping in pageblock_nr_pages is beneficial. Also in
> > > your patch you decrement nr_try_cma for every 'order' allocation. Why ?
> > 
> > pageblock_nr_pages is just magic value with no rationale. :)
> 
> I'm not following this discussions closely but there is rational to that
> value -- it's the size of a huge page for that architecture.  At the time
> the fragmentation avoidance was implemented this was the largest allocation
> size of interest.

Hello,

Indeed, there is a good rationale after all.
Thanks for pointing it out.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-18 17:36       ` Aneesh Kumar K.V
@ 2014-05-19  2:29         ` Joonsoo Kim
  0 siblings, 0 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-19  2:29 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Heesub Shin, Marek Szyprowski,
	Michal Nazarewicz, linux-mm, linux-kernel

On Sun, May 18, 2014 at 11:06:08PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> 
> > On Wed, May 14, 2014 at 02:12:19PM +0530, Aneesh Kumar K.V wrote:
> >> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> >> 
> >> 
> >> 
> >> Another issue i am facing with the current code is the atomic allocation
> >> failing even with large number of CMA pages around. In my case we never
> >> reclaimed because large part of the memory is consumed by the page cache and
> >> for that, free memory check doesn't include at free_cma. I will test
> >> with this patchset and update here once i have the results.
> >> 
> >
> > Hello,
> >
> > Could you elaborate more on your issue?
> > I can't completely understand your problem.
> > So your atomic allocation is movable? And although there are many free
> > cma pages, that request is fail?
> >
> 
> non movable atomic allocations are failing because we don't have
> anything other than CMA pages left and kswapd is yet to catchup ?
> 
> 
>   swapper/0: page allocation failure: order:0, mode:0x20
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.23-1500.pkvm2_1.5.ppc64 #1
>   Call Trace:
>  [c000000ffffcb610] [c000000000017330] .show_stack+0x130/0x200 (unreliable)
>  [c000000ffffcb6e0] [c00000000087a8c8] .dump_stack+0x28/0x3c
>  [c000000ffffcb750] [c0000000001e06f0] .warn_alloc_failed+0x110/0x160
>  [c000000ffffcb800] [c0000000001e5984] .__alloc_pages_nodemask+0x9d4/0xbf0
>  [c000000ffffcb9e0] [c00000000023775c] .alloc_pages_current+0xcc/0x1b0
>  [c000000ffffcba80] [c0000000007098d4] .__netdev_alloc_frag+0x1a4/0x1d0
>  [c000000ffffcbb20] [c00000000070d750] .__netdev_alloc_skb+0xc0/0x130
>  [c000000ffffcbbb0] [d000000009639b40] .tg3_poll_work+0x900/0x1110 [tg3]
>  [c000000ffffcbd10] [d00000000963a3a4] .tg3_poll_msix+0x54/0x200 [tg3]
>  [c000000ffffcbdb0] [c00000000071fcec] .net_rx_action+0x1dc/0x310
>  [c000000ffffcbe90] [c0000000000c1b08] .__do_softirq+0x158/0x330
>  [c000000ffffcbf90] [c000000000025744] .call_do_softirq+0x14/0x24
>  [c000000ffffc7e00] [c000000000011684] .do_softirq+0xf4/0x130
>  [c000000ffffc7e90] [c0000000000c1f18] .irq_exit+0xc8/0x110
>  [c000000ffffc7f10] [c000000000011258] .__do_irq+0xc8/0x1f0
>  [c000000ffffc7f90] [c000000000025768] .call_do_irq+0x14/0x24
>  [c00000000137b750] [c00000000001142c] .do_IRQ+0xac/0x130
>  [c00000000137b800] [c000000000002a64]
>  hardware_interrupt_common+0x164/0x180
> 
> ....
> 
> 
>  Node 0 DMA: 408*64kB (C) 408*128kB (C) 408*256kB (C) 408*512kB (C) 408*1024kB (C) 406*2048kB (C) 199*4096kB (C) 97*8192kB (C) 6*16384kB (C) =
>  3348992kB
>  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16384kB
>  Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
> 
> meminfo details:
> 
>  MemTotal:       65875584 kB
>  MemFree:         8001856 kB
>  Buffers:        49330368 kB
>  Cached:           178752 kB
>  SwapCached:            0 kB
>  Active:         28550464 kB
>  Inactive:       25476416 kB
>  Active(anon):    3771008 kB
>  Inactive(anon):   767360 kB
>  Active(file):   24779456 kB
>  Inactive(file): 24709056 kB
>  Unevictable:       15104 kB
>  Mlocked:           15104 kB
>  SwapTotal:       8384448 kB
>  SwapFree:        8384448 kB
>  Dirty:                 0 kB
> 
> -aneesh
> 

Hello,

I think the third patch in this patchset would solve this problem.
Your problem may occur in the following scenario.

1. Unmovable and reclaimable free pages are nearly exhausted.
2. There are still some movable pages, so the watermark check passes.
3. A lot of movable allocations are requested.
4. Most of the movable pages get allocated.
5. But the watermark check still passes, because we have a lot of
   free cma pages and these allocations are of movable type.
   kswapd is never woken up.
6. A non-movable atomic allocation request arrives => it fails.

So the problem is in step #5. Although we have enough pages for the
movable type, we should also prepare for allocation requests of the other
types. With my third patch, kswapd can be woken by movable allocations,
so your problem should disappear (see the sketch below).
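
To make step #5 concrete, here is a rough sketch of the idea behind the
third patch. This is an illustration under simplifying assumptions, not
the posted diff: if the watermark check always discounts free CMA pages,
the check in step #5 fails for movable allocations too and kswapd is
woken before the non-CMA free memory runs out.

/*
 * Illustration only, not the posted patch: a simplified watermark
 * check that never counts free CMA pages as free, no matter which
 * migratetype is being allocated.
 */
static bool watermark_ok_sketch(struct zone *z, int order,
                                unsigned long mark, int classzone_idx)
{
        long free_pages = zone_page_state(z, NR_FREE_PAGES);

        /* always treat free CMA pages as non-free */
        free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

        /* simplified form of the __zone_watermark_ok() comparison */
        return free_pages - ((1 << order) - 1) >
               (long)(mark + z->lowmem_reserve[classzone_idx]);
}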

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-19  2:11         ` Joonsoo Kim
@ 2014-05-19  2:53           ` Minchan Kim
  2014-05-19  4:50             ` Joonsoo Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Minchan Kim @ 2014-05-19  2:53 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote:
> On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote:
> > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote:
> > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> > > > Hey Joonsoo,
> > > > 
> > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> > > > > CMA is introduced to provide physically contiguous pages at runtime.
> > > > > For this purpose, it reserves memory at boot time. Although it reserve
> > > > > memory, this reserved memory can be used for movable memory allocation
> > > > > request. This usecase is beneficial to the system that needs this CMA
> > > > > reserved memory infrequently and it is one of main purpose of
> > > > > introducing CMA.
> > > > > 
> > > > > But, there is a problem in current implementation. The problem is that
> > > > > it works like as just reserved memory approach. The pages on cma reserved
> > > > > memory are hardly used for movable memory allocation. This is caused by
> > > > > combination of allocation and reclaim policy.
> > > > > 
> > > > > The pages on cma reserved memory are allocated if there is no movable
> > > > > memory, that is, as fallback allocation. So the time this fallback
> > > > > allocation is started is under heavy memory pressure. Although it is under
> > > > > memory pressure, movable allocation easily succeed, since there would be
> > > > > many pages on cma reserved memory. But this is not the case for unmovable
> > > > > and reclaimable allocation, because they can't use the pages on cma
> > > > > reserved memory. These allocations regard system's free memory as
> > > > > (free pages - free cma pages) on watermark checking, that is, free
> > > > > unmovable pages + free reclaimable pages + free movable pages. Because
> > > > > we already exhausted movable pages, only free pages we have are unmovable
> > > > > and reclaimable types and this would be really small amount. So watermark
> > > > > checking would be failed. It will wake up kswapd to make enough free
> > > > > memory for unmovable and reclaimable allocation and kswapd will do.
> > > > > So before we fully utilize pages on cma reserved memory, kswapd start to
> > > > > reclaim memory and try to make free memory over the high watermark. This
> > > > > watermark checking by kswapd doesn't take care free cma pages so many
> > > > > movable pages would be reclaimed. After then, we have a lot of movable
> > > > > pages again, so fallback allocation doesn't happen again. To conclude,
> > > > > amount of free memory on meminfo which includes free CMA pages is moving
> > > > > around 512 MB if I reserve 512 MB memory for CMA.
> > > > > 
> > > > > I found this problem on following experiment.
> > > > > 
> > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > > > > make -j24
> > > > > 
> > > > > CMA reserve:		0 MB		512 MB
> > > > > Elapsed-time:		234.8		361.8
> > > > > Average-MemFree:	283880 KB	530851 KB
> > > > > 
> > > > > To solve this problem, I can think following 2 possible solutions.
> > > > > 1. allocate the pages on cma reserved memory first, and if they are
> > > > >    exhausted, allocate movable pages.
> > > > > 2. interleaved allocation: try to allocate specific amounts of memory
> > > > >    from cma reserved memory and then allocate from free movable memory.
> > > > 
> > > > I love this idea but when I see the code, I don't like that.
> > > > In allocation path, just try to allocate pages by round-robin so it's role
> > > > of allocator. If one of migratetype is full, just pass mission to reclaimer
> > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail
> > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that
> > > > reclaimer can filter it out during page scanning.
> > > > We already have an tool to achieve it(ie, isolate_mode_t).
> > > 
> > > Hello,
> > > 
> > > I agree with leaving fast allocation path as simple as possible.
> > > I will remove runtime computation for determining ratio in
> > > __rmqueue_cma() and, instead, will use pre-computed value calculated
> > > on the other path.
> > 
> > Sounds good.
> > 
> > > 
> > > I am not sure that whether your second suggestion(Hey relaimer part)
> > > is good or not. In my quick thought, that could be helpful in the
> > > situation that many free cma pages remained. But, it would be not helpful
> > > when there are neither free movable and cma pages. In generally, most
> > > workloads mainly uses movable pages for page cache or anonymous mapping.
> > > Although reclaim is triggered by non-movable allocation failure, reclaimed
> > > pages are used mostly by movable allocation. We can handle these allocation
> > > request even if we reclaim the pages just in lru order. If we rotate
> > > the lru list for finding movable pages, it could cause more useful
> > > pages to be evicted.
> > > 
> > > This is just my quick thought, so please let me correct if I am wrong.
> > 
> > Why should reclaimer reclaim unnecessary pages?
> > So, your answer is that it would be better because upcoming newly allocated
> > pages would be allocated easily without interrupt. But it could reclaim
> > too much pages until watermark for unmovable allocation is okay.
> > Even, sometime, you might see OOM.
> > 
> > Moreover, how could you handle current trobule?
> > For example, there is atomic allocation and the only thing to save the world
> > is kswapd because it's one of kswapd role but kswapd is spending many time to
> > reclaim CMA pages, which is pointless so the allocation would be easily failed.
> 
> Hello,
> 
> I guess that it isn't the problem. In lru, movable pages and cma pages
> would be interleaved. So it doesn't takes too long time to get the
> page for non-movable allocation.

Please don't assume an ideal LRU ordering.
A page newly allocated under the fair allocation policy sits at the head
of the LRU while old pages approach the tail, so there is a huge time gap
between them. During that time old pages can be dropped or promoted, so
one side of the list can end up filled with a single type rather than the
interleaving of both types that you expect.

Additionally, if you use a synchronous backing device like ramdisk or
zram, pageout is synchronized with the page I/O. In that case the
reclaim time is far from trivial compared to async I/O. For example, in
the zram-swap case it needs a page copy plus compression, and the speed
depends on your CPU.

> 
> IMHO, in generally, memory shortage is made by movable allocation, so
> to distinguish allocation type and to handle them differently has
> marginal effect.

Again, please don't consider only the workloads you know; keep the
design open to other possibilities, as long as that consideration
doesn't make the code ugly.

> 
> Anyway, I will think more deeply.

Yes, please.

> 
> > 
> > > 
> > > > 
> > > > And we couldn't do it in zone_watermark_ok with set/reset ALLOC_CMA?
> > > > If possible, it would be better becauser it's generic function to check
> > > > free pages and cause trigger reclaim/compaction logic.
> > > 
> > > I guess, your *it* means ratio computation. Right?
> > 
> > I meant just get_page_from_freelist like fair zone allocation for consistency
> > but as we discussed offline, i'm not against with you if it's not right place.
> 
> Okay :)
> 
> Thanks.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-19  2:53           ` Minchan Kim
@ 2014-05-19  4:50             ` Joonsoo Kim
  2014-05-19 23:18               ` Minchan Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-19  4:50 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote:
> On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote:
> > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote:
> > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote:
> > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> > > > > Hey Joonsoo,
> > > > > 
> > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> > > > > > CMA is introduced to provide physically contiguous pages at runtime.
> > > > > > For this purpose, it reserves memory at boot time. Although it reserve
> > > > > > memory, this reserved memory can be used for movable memory allocation
> > > > > > request. This usecase is beneficial to the system that needs this CMA
> > > > > > reserved memory infrequently and it is one of main purpose of
> > > > > > introducing CMA.
> > > > > > 
> > > > > > But, there is a problem in current implementation. The problem is that
> > > > > > it works like as just reserved memory approach. The pages on cma reserved
> > > > > > memory are hardly used for movable memory allocation. This is caused by
> > > > > > combination of allocation and reclaim policy.
> > > > > > 
> > > > > > The pages on cma reserved memory are allocated if there is no movable
> > > > > > memory, that is, as fallback allocation. So the time this fallback
> > > > > > allocation is started is under heavy memory pressure. Although it is under
> > > > > > memory pressure, movable allocation easily succeed, since there would be
> > > > > > many pages on cma reserved memory. But this is not the case for unmovable
> > > > > > and reclaimable allocation, because they can't use the pages on cma
> > > > > > reserved memory. These allocations regard system's free memory as
> > > > > > (free pages - free cma pages) on watermark checking, that is, free
> > > > > > unmovable pages + free reclaimable pages + free movable pages. Because
> > > > > > we already exhausted movable pages, only free pages we have are unmovable
> > > > > > and reclaimable types and this would be really small amount. So watermark
> > > > > > checking would be failed. It will wake up kswapd to make enough free
> > > > > > memory for unmovable and reclaimable allocation and kswapd will do.
> > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to
> > > > > > reclaim memory and try to make free memory over the high watermark. This
> > > > > > watermark checking by kswapd doesn't take care free cma pages so many
> > > > > > movable pages would be reclaimed. After then, we have a lot of movable
> > > > > > pages again, so fallback allocation doesn't happen again. To conclude,
> > > > > > amount of free memory on meminfo which includes free CMA pages is moving
> > > > > > around 512 MB if I reserve 512 MB memory for CMA.
> > > > > > 
> > > > > > I found this problem on following experiment.
> > > > > > 
> > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > > > > > make -j24
> > > > > > 
> > > > > > CMA reserve:		0 MB		512 MB
> > > > > > Elapsed-time:		234.8		361.8
> > > > > > Average-MemFree:	283880 KB	530851 KB
> > > > > > 
> > > > > > To solve this problem, I can think following 2 possible solutions.
> > > > > > 1. allocate the pages on cma reserved memory first, and if they are
> > > > > >    exhausted, allocate movable pages.
> > > > > > 2. interleaved allocation: try to allocate specific amounts of memory
> > > > > >    from cma reserved memory and then allocate from free movable memory.
> > > > > 
> > > > > I love this idea but when I see the code, I don't like that.
> > > > > In allocation path, just try to allocate pages by round-robin so it's role
> > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer
> > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail
> > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that
> > > > > reclaimer can filter it out during page scanning.
> > > > > We already have an tool to achieve it(ie, isolate_mode_t).
> > > > 
> > > > Hello,
> > > > 
> > > > I agree with leaving fast allocation path as simple as possible.
> > > > I will remove runtime computation for determining ratio in
> > > > __rmqueue_cma() and, instead, will use pre-computed value calculated
> > > > on the other path.
> > > 
> > > Sounds good.
> > > 
> > > > 
> > > > I am not sure that whether your second suggestion(Hey relaimer part)
> > > > is good or not. In my quick thought, that could be helpful in the
> > > > situation that many free cma pages remained. But, it would be not helpful
> > > > when there are neither free movable and cma pages. In generally, most
> > > > workloads mainly uses movable pages for page cache or anonymous mapping.
> > > > Although reclaim is triggered by non-movable allocation failure, reclaimed
> > > > pages are used mostly by movable allocation. We can handle these allocation
> > > > request even if we reclaim the pages just in lru order. If we rotate
> > > > the lru list for finding movable pages, it could cause more useful
> > > > pages to be evicted.
> > > > 
> > > > This is just my quick thought, so please let me correct if I am wrong.
> > > 
> > > Why should reclaimer reclaim unnecessary pages?
> > > So, your answer is that it would be better because upcoming newly allocated
> > > pages would be allocated easily without interrupt. But it could reclaim
> > > too much pages until watermark for unmovable allocation is okay.
> > > Even, sometime, you might see OOM.
> > > 
> > > Moreover, how could you handle current trobule?
> > > For example, there is atomic allocation and the only thing to save the world
> > > is kswapd because it's one of kswapd role but kswapd is spending many time to
> > > reclaim CMA pages, which is pointless so the allocation would be easily failed.
> > 
> > Hello,
> > 
> > I guess that it isn't the problem. In lru, movable pages and cma pages
> > would be interleaved. So it doesn't takes too long time to get the
> > page for non-movable allocation.
> 
> Please, don't assume there are ideal LRU ordering.
> Newly allocated page by fairness allocation is located by head of LRU
> while old pages are approaching the tail so there is huge time gab.
> During the time, old pages could be dropped/promoting so one of side
> could be filled with one type rather than interleaving both types pages
> you expected.

I assumed the general case, not an ideal one.
Your example is possible, but it would be a corner case.

> 
> Additionally, if you uses syncable backed device like ramdisk/zram
> or something, pageout can be synchronized with page I/O.
> In this case, reclaim time wouldn't be trivial than async I/O.
> For exmaple, zram-swap case, it needs page copy + comperssion and
> the speed depends on your CPU speed.

This is a general problem that zram-swap has, although reclaiming CMA
pages makes the situation worse.

Thanks.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-19  1:47           ` Gioh Kim
@ 2014-05-19  5:55             ` Joonsoo Kim
  2014-05-19  9:14               ` Gioh Kim
  2014-05-19 19:59               ` Michal Nazarewicz
  0 siblings, 2 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-19  5:55 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Michal Nazarewicz, Minchan Kim, Andrew Morton, Rik van Riel,
	Laura Abbott, linux-mm, linux-kernel, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski, 이건호,
	gurugio

On Mon, May 19, 2014 at 10:47:12AM +0900, Gioh Kim wrote:
> Thank you for your advice. I didn't notice it.
> 
> I'm adding followings according to your advice:
> 
> - range restrict for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*
> I think this can prevent the wrong kernel option.
> 
> - change size_cmdline into default value SZ_16M
> I am not sure this can prevent if cma=0 cmdline option is also with base and limit options.

Hello,

I think this problem originates from atomic_pool_init().
If the configured coherent_pool size is larger than the default CMA size,
it can still fail even if this patch is applied.

How about the patch below?
It falls back to the remap allocator if the CMA allocation fails.

Thanks.

-----------------8<---------------------
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b00be1..2909ab9 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -379,7 +379,7 @@ static int __init atomic_pool_init(void)
        unsigned long *bitmap;
        struct page *page;
        struct page **pages;
-       void *ptr;
+       void *ptr = NULL;
        int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);
 
        bitmap = kzalloc(bitmap_size, GFP_KERNEL);
@@ -393,7 +393,7 @@ static int __init atomic_pool_init(void)
        if (IS_ENABLED(CONFIG_DMA_CMA))
                ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
                                              atomic_pool_init);
-       else
+       if (!ptr)
                ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
                                           atomic_pool_init);
        if (ptr) {


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-19  5:55             ` Joonsoo Kim
@ 2014-05-19  9:14               ` Gioh Kim
  2014-05-19 19:59               ` Michal Nazarewicz
  1 sibling, 0 replies; 45+ messages in thread
From: Gioh Kim @ 2014-05-19  9:14 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Michal Nazarewicz, Minchan Kim, Andrew Morton, Rik van Riel,
	Laura Abbott, linux-mm, linux-kernel, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski, 이건호,
	gurugio

In the __dma_alloc() function, your patch lets __alloc_from_pool() work,
but __alloc_from_contiguous() still doesn't. Therefore __dma_alloc()
sometimes works and sometimes doesn't, depending on the gfp flag
(__GFP_WAIT). Do I understand correctly?

I think __dma_alloc() should behave consistently: either both
__alloc_from_contiguous() and __alloc_from_pool() work, or neither does.
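
For reference, the dispatch in question looks roughly like this
(paraphrased from the 3.15-era arch/arm/mm/dma-mapping.c, so treat it as
an approximation rather than an exact quote):

        /* approximate shape of the allocation dispatch in __dma_alloc() */
        if (is_coherent || nommu())
                addr = __alloc_simple_buffer(dev, size, gfp, &page);
        else if (!(gfp & __GFP_WAIT))
                /* atomic request: served from the atomic pool */
                addr = __alloc_from_pool(size, &page);
        else if (!IS_ENABLED(CONFIG_DMA_CMA))
                addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
        else
                /* blocking request with DMA_CMA enabled: CMA only, no fallback */
                addr = __alloc_from_contiguous(dev, size, prot, &page, caller);

So an atomic request can succeed from the (fallback-filled) pool while a
blocking request still fails once the CMA area is exhausted or sized zero.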


2014-05-19 2:55 PM, Joonsoo Kim wrote:
> On Mon, May 19, 2014 at 10:47:12AM +0900, Gioh Kim wrote:
>> Thank you for your advice. I didn't notice it.
>>
>> I'm adding followings according to your advice:
>>
>> - range restrict for CMA_SIZE_MBYTES and *CMA_SIZE_PERCENTAGE*
>> I think this can prevent the wrong kernel option.
>>
>> - change size_cmdline into default value SZ_16M
>> I am not sure this can prevent if cma=0 cmdline option is also with base and limit options.
>
> Hello,
>
> I think that this problem is originated from atomic_pool_init().
> If configured coherent_pool size is larger than default cma size,
> it can be failed even if this patch is applied.
>
> How about below patch?
> It uses fallback allocation if CMA is failed.
>
> Thanks.
>
> -----------------8<---------------------
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b00be1..2909ab9 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void)
>          unsigned long *bitmap;
>          struct page *page;
>          struct page **pages;
> -       void *ptr;
> +       void *ptr = NULL;
>          int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);
>
>          bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void)
>          if (IS_ENABLED(CONFIG_DMA_CMA))
>                  ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
>                                                atomic_pool_init);
> -       else
> +       if (!ptr)
>                  ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
>                                             atomic_pool_init);
>          if (ptr) {
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-19  5:55             ` Joonsoo Kim
  2014-05-19  9:14               ` Gioh Kim
@ 2014-05-19 19:59               ` Michal Nazarewicz
  2014-05-20  0:50                 ` Gioh Kim
  1 sibling, 1 reply; 45+ messages in thread
From: Michal Nazarewicz @ 2014-05-19 19:59 UTC (permalink / raw)
  To: Joonsoo Kim, Gioh Kim
  Cc: Minchan Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	Marek Szyprowski, 이건호,
	gurugio

On Sun, May 18 2014, Joonsoo Kim wrote:
> I think that this problem is originated from atomic_pool_init().
> If configured coherent_pool size is larger than default cma size,
> it can be failed even if this patch is applied.
>
> How about below patch?
> It uses fallback allocation if CMA is failed.

Yes, I thought about it, but __dma_alloc uses similar code:

	else if (!IS_ENABLED(CONFIG_DMA_CMA))
		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
	else
		addr = __alloc_from_contiguous(dev, size, prot, &page, caller);

so it probably needs to be changed as well.
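
One possible shape of that change, sketched here as an assumption rather
than a tested patch, is to mirror Joonsoo's atomic_pool_init() hunk
quoted below and fall back to the remap allocator when the CMA
allocation fails:

        else if (!IS_ENABLED(CONFIG_DMA_CMA))
                addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
        else {
                addr = __alloc_from_contiguous(dev, size, prot, &page, caller);
                if (!addr)
                        addr = __alloc_remap_buffer(dev, size, gfp, prot,
                                                    &page, caller);
        }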

> -----------------8<---------------------
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b00be1..2909ab9 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void)
>         unsigned long *bitmap;
>         struct page *page;
>         struct page **pages;
> -       void *ptr;
> +       void *ptr = NULL;
>         int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);
>  
>         bitmap = kzalloc(bitmap_size, GFP_KERNEL);
> @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void)
>         if (IS_ENABLED(CONFIG_DMA_CMA))
>                 ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
>                                               atomic_pool_init);
> -       else
> +       if (!ptr)
>                 ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
>                                            atomic_pool_init);
>         if (ptr) {
>

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-19  4:50             ` Joonsoo Kim
@ 2014-05-19 23:18               ` Minchan Kim
  2014-05-20  6:33                 ` Joonsoo Kim
  0 siblings, 1 reply; 45+ messages in thread
From: Minchan Kim @ 2014-05-19 23:18 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

On Mon, May 19, 2014 at 01:50:01PM +0900, Joonsoo Kim wrote:
> On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote:
> > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote:
> > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote:
> > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote:
> > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> > > > > > Hey Joonsoo,
> > > > > > 
> > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> > > > > > > CMA is introduced to provide physically contiguous pages at runtime.
> > > > > > > For this purpose, it reserves memory at boot time. Although it reserve
> > > > > > > memory, this reserved memory can be used for movable memory allocation
> > > > > > > request. This usecase is beneficial to the system that needs this CMA
> > > > > > > reserved memory infrequently and it is one of main purpose of
> > > > > > > introducing CMA.
> > > > > > > 
> > > > > > > But, there is a problem in current implementation. The problem is that
> > > > > > > it works like as just reserved memory approach. The pages on cma reserved
> > > > > > > memory are hardly used for movable memory allocation. This is caused by
> > > > > > > combination of allocation and reclaim policy.
> > > > > > > 
> > > > > > > The pages on cma reserved memory are allocated if there is no movable
> > > > > > > memory, that is, as fallback allocation. So the time this fallback
> > > > > > > allocation is started is under heavy memory pressure. Although it is under
> > > > > > > memory pressure, movable allocation easily succeed, since there would be
> > > > > > > many pages on cma reserved memory. But this is not the case for unmovable
> > > > > > > and reclaimable allocation, because they can't use the pages on cma
> > > > > > > reserved memory. These allocations regard system's free memory as
> > > > > > > (free pages - free cma pages) on watermark checking, that is, free
> > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because
> > > > > > > we already exhausted movable pages, only free pages we have are unmovable
> > > > > > > and reclaimable types and this would be really small amount. So watermark
> > > > > > > checking would be failed. It will wake up kswapd to make enough free
> > > > > > > memory for unmovable and reclaimable allocation and kswapd will do.
> > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to
> > > > > > > reclaim memory and try to make free memory over the high watermark. This
> > > > > > > watermark checking by kswapd doesn't take care free cma pages so many
> > > > > > > movable pages would be reclaimed. After then, we have a lot of movable
> > > > > > > pages again, so fallback allocation doesn't happen again. To conclude,
> > > > > > > amount of free memory on meminfo which includes free CMA pages is moving
> > > > > > > around 512 MB if I reserve 512 MB memory for CMA.
> > > > > > > 
> > > > > > > I found this problem on following experiment.
> > > > > > > 
> > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > > > > > > make -j24
> > > > > > > 
> > > > > > > CMA reserve:		0 MB		512 MB
> > > > > > > Elapsed-time:		234.8		361.8
> > > > > > > Average-MemFree:	283880 KB	530851 KB
> > > > > > > 
> > > > > > > To solve this problem, I can think following 2 possible solutions.
> > > > > > > 1. allocate the pages on cma reserved memory first, and if they are
> > > > > > >    exhausted, allocate movable pages.
> > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory
> > > > > > >    from cma reserved memory and then allocate from free movable memory.
> > > > > > 
> > > > > > I love this idea but when I see the code, I don't like that.
> > > > > > In allocation path, just try to allocate pages by round-robin so it's role
> > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer
> > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail
> > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that
> > > > > > reclaimer can filter it out during page scanning.
> > > > > > We already have an tool to achieve it(ie, isolate_mode_t).
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > I agree with leaving fast allocation path as simple as possible.
> > > > > I will remove runtime computation for determining ratio in
> > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated
> > > > > on the other path.
> > > > 
> > > > Sounds good.
> > > > 
> > > > > 
> > > > > I am not sure that whether your second suggestion(Hey relaimer part)
> > > > > is good or not. In my quick thought, that could be helpful in the
> > > > > situation that many free cma pages remained. But, it would be not helpful
> > > > > when there are neither free movable and cma pages. In generally, most
> > > > > workloads mainly uses movable pages for page cache or anonymous mapping.
> > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed
> > > > > pages are used mostly by movable allocation. We can handle these allocation
> > > > > request even if we reclaim the pages just in lru order. If we rotate
> > > > > the lru list for finding movable pages, it could cause more useful
> > > > > pages to be evicted.
> > > > > 
> > > > > This is just my quick thought, so please let me correct if I am wrong.
> > > > 
> > > > Why should reclaimer reclaim unnecessary pages?
> > > > So, your answer is that it would be better because upcoming newly allocated
> > > > pages would be allocated easily without interrupt. But it could reclaim
> > > > too much pages until watermark for unmovable allocation is okay.
> > > > Even, sometime, you might see OOM.
> > > > 
> > > > Moreover, how could you handle current trobule?
> > > > For example, there is atomic allocation and the only thing to save the world
> > > > is kswapd because it's one of kswapd role but kswapd is spending many time to
> > > > reclaim CMA pages, which is pointless so the allocation would be easily failed.
> > > 
> > > Hello,
> > > 
> > > I guess that it isn't the problem. In lru, movable pages and cma pages
> > > would be interleaved. So it doesn't takes too long time to get the
> > > page for non-movable allocation.
> > 
> > Please, don't assume there are ideal LRU ordering.
> > Newly allocated page by fairness allocation is located by head of LRU
> > while old pages are approaching the tail so there is huge time gab.
> > During the time, old pages could be dropped/promoting so one of side
> > could be filled with one type rather than interleaving both types pages
> > you expected.
> 
> I assumed general case, not ideal case.
> Your example can be possible, but would be corner case.

I talked with Joonsoo yesterday and should post our conclusion
for the other reviewers/maintainers.

It's not a corner case; it can happen depending on the zone and CMA
configuration. For example, take a 330M high zone in which CMA consumes
300M of the space while the normal movable area is only 30M.
In that case an unmovable allocation can cause far too much unnecessary
reclaim in the zone, so the conclusion we reached is that we need
targeted reclaim (e.g. via isolate_mode_t; a rough sketch follows below).

But I'm not sure it should be part of this patchset. This patchset is
surely an enhancement (i.e. before, it was hard to allocate pages from
the CMA area, and this patchset makes that work), but it could introduce
the problem mentioned above as a side effect, so I think we could solve
that issue (i.e. too much reclaim in an unbalanced zone) in another
patchset.

Joonsoo, please mention this problem in the description when you respin,
so that other MM folks notice it and can give ideas, which would help
a lot.
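
As a rough sketch of what such targeted reclaim could look like
(hypothetical code; ISOLATE_SKIP_CMA and the helper below do not exist
upstream), reclaim triggered by an unmovable allocation would pass a
hint so that the LRU scanner skips MIGRATE_CMA pages, which cannot
satisfy that allocation anyway:

/* hypothetical flag, following the existing ISOLATE_* definitions */
#define ISOLATE_SKIP_CMA        ((__force isolate_mode_t)0x10)

/* sketch of an extra per-page check the LRU scanner could apply */
static int isolate_check_sketch(struct page *page, isolate_mode_t mode)
{
        if ((mode & ISOLATE_SKIP_CMA) &&
            is_migrate_cma(get_pageblock_migratetype(page)))
                return -EBUSY;  /* reclaiming this page would not help */

        return 0;       /* fall through to the usual __isolate_lru_page() checks */
}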

> 
> > 
> > Additionally, if you uses syncable backed device like ramdisk/zram
> > or something, pageout can be synchronized with page I/O.
> > In this case, reclaim time wouldn't be trivial than async I/O.
> > For exmaple, zram-swap case, it needs page copy + comperssion and
> > the speed depends on your CPU speed.
> 
> This is a general problem what zram-swap have,
> although reclaiming cma pages worse the situation.
> 
> Thanks.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-15  2:45       ` Heesub Shin
  2014-05-15  5:06         ` Minchan Kim
@ 2014-05-19 23:22         ` Minchan Kim
  1 sibling, 0 replies; 45+ messages in thread
From: Minchan Kim @ 2014-05-19 23:22 UTC (permalink / raw)
  To: Heesub Shin
  Cc: Joonsoo Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Mel Gorman, Johannes Weiner,
	Marek Szyprowski

On Thu, May 15, 2014 at 11:45:31AM +0900, Heesub Shin wrote:
> Hello,
> 
> On 05/15/2014 10:53 AM, Joonsoo Kim wrote:
> >On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> >>Hey Joonsoo,
> >>
> >>On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> >>>CMA is introduced to provide physically contiguous pages at runtime.
> >>>For this purpose, it reserves memory at boot time. Although it reserve
> >>>memory, this reserved memory can be used for movable memory allocation
> >>>request. This usecase is beneficial to the system that needs this CMA
> >>>reserved memory infrequently and it is one of main purpose of
> >>>introducing CMA.
> >>>
> >>>But, there is a problem in current implementation. The problem is that
> >>>it works like as just reserved memory approach. The pages on cma reserved
> >>>memory are hardly used for movable memory allocation. This is caused by
> >>>combination of allocation and reclaim policy.
> >>>
> >>>The pages on cma reserved memory are allocated if there is no movable
> >>>memory, that is, as fallback allocation. So the time this fallback
> >>>allocation is started is under heavy memory pressure. Although it is under
> >>>memory pressure, movable allocation easily succeed, since there would be
> >>>many pages on cma reserved memory. But this is not the case for unmovable
> >>>and reclaimable allocation, because they can't use the pages on cma
> >>>reserved memory. These allocations regard system's free memory as
> >>>(free pages - free cma pages) on watermark checking, that is, free
> >>>unmovable pages + free reclaimable pages + free movable pages. Because
> >>>we already exhausted movable pages, only free pages we have are unmovable
> >>>and reclaimable types and this would be really small amount. So watermark
> >>>checking would be failed. It will wake up kswapd to make enough free
> >>>memory for unmovable and reclaimable allocation and kswapd will do.
> >>>So before we fully utilize pages on cma reserved memory, kswapd start to
> >>>reclaim memory and try to make free memory over the high watermark. This
> >>>watermark checking by kswapd doesn't take care free cma pages so many
> >>>movable pages would be reclaimed. After then, we have a lot of movable
> >>>pages again, so fallback allocation doesn't happen again. To conclude,
> >>>amount of free memory on meminfo which includes free CMA pages is moving
> >>>around 512 MB if I reserve 512 MB memory for CMA.
> >>>
> >>>I found this problem on following experiment.
> >>>
> >>>4 CPUs, 1024 MB, VIRTUAL MACHINE
> >>>make -j24
> >>>
> >>>CMA reserve:		0 MB		512 MB
> >>>Elapsed-time:		234.8		361.8
> >>>Average-MemFree:	283880 KB	530851 KB
> >>>
> >>>To solve this problem, I can think following 2 possible solutions.
> >>>1. allocate the pages on cma reserved memory first, and if they are
> >>>    exhausted, allocate movable pages.
> >>>2. interleaved allocation: try to allocate specific amounts of memory
> >>>    from cma reserved memory and then allocate from free movable memory.
> >>
> >>I love this idea but when I see the code, I don't like that.
> >>In allocation path, just try to allocate pages by round-robin so it's role
> >>of allocator. If one of migratetype is full, just pass mission to reclaimer
> >>with hint(ie, Hey reclaimer, it's non-movable allocation fail
> >>so there is pointless if you reclaim MIGRATE_CMA pages) so that
> >>reclaimer can filter it out during page scanning.
> >>We already have an tool to achieve it(ie, isolate_mode_t).
> >
> >Hello,
> >
> >I agree with leaving fast allocation path as simple as possible.
> >I will remove runtime computation for determining ratio in
> >__rmqueue_cma() and, instead, will use pre-computed value calculated
> >on the other path.
> >
> >I am not sure that whether your second suggestion(Hey relaimer part)
> >is good or not. In my quick thought, that could be helpful in the
> >situation that many free cma pages remained. But, it would be not helpful
> >when there are neither free movable and cma pages. In generally, most
> >workloads mainly uses movable pages for page cache or anonymous mapping.
> >Although reclaim is triggered by non-movable allocation failure, reclaimed
> >pages are used mostly by movable allocation. We can handle these allocation
> >request even if we reclaim the pages just in lru order. If we rotate
> >the lru list for finding movable pages, it could cause more useful
> >pages to be evicted.
> >
> >This is just my quick thought, so please let me correct if I am wrong.
> 
> We have an out of tree implementation that is completely the same
> with the approach Minchan said and it works, but it has definitely
> some side-effects as you pointed, distorting the LRU and evicting
> hot pages. I do not attach code fragments in this thread for some
> reasons, but it must be easy for yourself. I am wondering if it
> could help also in your case.
> 
> Thanks,
> Heesub

Heesub, just to be sure: did you try round-robin allocation like
Joonsoo's approach, and did you hit such LRU churning problems with it?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-19 19:59               ` Michal Nazarewicz
@ 2014-05-20  0:50                 ` Gioh Kim
  2014-05-20  1:28                   ` Michal Nazarewicz
  2014-05-20 11:38                   ` Marek Szyprowski
  0 siblings, 2 replies; 45+ messages in thread
From: Gioh Kim @ 2014-05-20  0:50 UTC (permalink / raw)
  To: Michal Nazarewicz, Joonsoo Kim
  Cc: Minchan Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	Marek Szyprowski, 이건호,
	gurugio



2014-05-20 4:59 AM, Michal Nazarewicz wrote:
> On Sun, May 18 2014, Joonsoo Kim wrote:
>> I think that this problem is originated from atomic_pool_init().
>> If configured coherent_pool size is larger than default cma size,
>> it can be failed even if this patch is applied.

The coherent_pool size (atomic_pool.size) should be restricted to be smaller than the CMA size.

This is another issue, but I think the default atomic pool size is too small.
A single USB host port needs at most 256 KB of coherent memory (according to the USB host spec).
If a platform has several ports, it can need 1 MB or more.
Therefore the default atomic pool size should be at least 1 MB.

>>
>> How about below patch?
>> It uses fallback allocation if CMA is failed.
>
> Yes, I thought about it, but __dma_alloc uses similar code:
>
> 	else if (!IS_ENABLED(CONFIG_DMA_CMA))
> 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
> 	else
> 		addr = __alloc_from_contiguous(dev, size, prot, &page, caller);
>
> so it probably needs to be changed as well.

If the CMA option is not selected, __alloc_from_contiguous() is not called,
so we don't need the fallback allocation.

And if the CMA option is selected and initialized correctly, the CMA
allocation can still fail when no CMA memory is left. I think we don't
need the fallback allocation in that case either, because that is a
normal case.

Therefore I think restricting the CMA size option so that CMA always works
covers every case.

I also think the patch below is a good choice.
If the two of you, Michal and Joonsoo, do not agree with me, please let me know;
I will make a patch including both the option restriction and the fallback allocation.

>
>> -----------------8<---------------------
>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> index 6b00be1..2909ab9 100644
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -379,7 +379,7 @@ static int __init atomic_pool_init(void)
>>          unsigned long *bitmap;
>>          struct page *page;
>>          struct page **pages;
>> -       void *ptr;
>> +       void *ptr = NULL;
>>          int bitmap_size = BITS_TO_LONGS(nr_pages) * sizeof(long);
>>
>>          bitmap = kzalloc(bitmap_size, GFP_KERNEL);
>> @@ -393,7 +393,7 @@ static int __init atomic_pool_init(void)
>>          if (IS_ENABLED(CONFIG_DMA_CMA))
>>                  ptr = __alloc_from_contiguous(NULL, pool->size, prot, &page,
>>                                                atomic_pool_init);
>> -       else
>> +       if (!ptr)
>>                  ptr = __alloc_remap_buffer(NULL, pool->size, gfp, prot, &page,
>>                                             atomic_pool_init);
>>          if (ptr) {
>>
>
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-20  0:50                 ` Gioh Kim
@ 2014-05-20  1:28                   ` Michal Nazarewicz
  2014-05-20  2:26                     ` Gioh Kim
  2014-05-20 11:38                   ` Marek Szyprowski
  1 sibling, 1 reply; 45+ messages in thread
From: Michal Nazarewicz @ 2014-05-20  1:28 UTC (permalink / raw)
  To: Gioh Kim, Joonsoo Kim
  Cc: Minchan Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	Marek Szyprowski, 이건호,
	gurugio

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]

On Mon, May 19 2014, Gioh Kim wrote:
> If CMA option is not selected, __alloc_from_contiguous would not be
> called.  We don't need to the fallback allocation.
>
> And if CMA option is selected and initialized correctly,
> the cma allocation can fail in case of no-CMA-memory situation.
> I thinks in that case we don't need to the fallback allocation also,
> because it is normal case.
>
> Therefore I think the restriction of CMA size option and make CMA work
> can cover every cases.

Wait, you just wrote that if CMA is not initialised correctly, it's fine
for atomic pool initialisation to fail, but if the CMA size is initialised
correctly yet too small, that is somehow a worse situation?  I'm a bit
confused, to be honest.

IMO, the cma=0 command line argument should be supported, as should
having a default CMA size of zero.  If the CMA size is set to zero, the
kernel should behave as if CMA were not enabled at compile time (rough
sketch below).
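
A rough sketch of that behaviour, purely as an assumption of how it
could look rather than the current drivers/base/dma-contiguous.c code
(cma_default_bytes() below is a stand-in for the
CONFIG_CMA_SIZE_MBYTES/PERCENTAGE selection logic):

static phys_addr_t size_cmdline = -1;

static int __init early_cma(char *p)
{
        size_cmdline = memparse(p, &p); /* "cma=0" parses to zero */
        return 0;
}
early_param("cma", early_cma);

void __init dma_contiguous_reserve(phys_addr_t limit)
{
        phys_addr_t size = (size_cmdline != -1) ? size_cmdline
                                                : cma_default_bytes();

        if (!size)      /* zero size: behave as if CMA were compiled out */
                return;

        dma_contiguous_reserve_area(size, 0, limit,
                                    &dma_contiguous_default_area);
}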

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-20  1:28                   ` Michal Nazarewicz
@ 2014-05-20  2:26                     ` Gioh Kim
  2014-05-20 18:15                       ` Michal Nazarewicz
  0 siblings, 1 reply; 45+ messages in thread
From: Gioh Kim @ 2014-05-20  2:26 UTC (permalink / raw)
  To: Michal Nazarewicz, Joonsoo Kim
  Cc: Minchan Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	Marek Szyprowski, 이건호,
	gurugio


2014-05-20 10:28 AM, Michal Nazarewicz wrote:
> On Mon, May 19 2014, Gioh Kim wrote:
>> If CMA option is not selected, __alloc_from_contiguous would not be
>> called.  We don't need to the fallback allocation.
>>
>> And if CMA option is selected and initialized correctly,
>> the cma allocation can fail in case of no-CMA-memory situation.
>> I thinks in that case we don't need to the fallback allocation also,
>> because it is normal case.
>>
>> Therefore I think the restriction of CMA size option and make CMA work
>> can cover every cases.
>
> Wait, you just wrote that if CMA is not initialised correctly, it's fine
> for atomic pool initialisation to fail, but if CMA size is initialised
> correctly but too small, this is somehow worse situation?  I'm a bit
> confused to be honest.

I'm sorry for confusing you; please forgive my poor English.
My point is that atomic_pool should be able to work both with and without CMA.


>
> IMO, cma=0 command line argument should be supported, as should having
> the default CMA size zero.  If CMA size is set to zero, kernel should
> behave as if CMA was not enabled at compile time.

It would also be good if atomic_pool could work well with a zero CMA size.

I can give up my patch, but Joonsoo's patch should be applied.

Joonsoo, can you please send the full patch to the maintainers?

>
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-19 23:18               ` Minchan Kim
@ 2014-05-20  6:33                 ` Joonsoo Kim
  0 siblings, 0 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-20  6:33 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Michal Nazarewicz, Heesub Shin, Mel Gorman,
	Johannes Weiner, Marek Szyprowski

On Tue, May 20, 2014 at 08:18:59AM +0900, Minchan Kim wrote:
> On Mon, May 19, 2014 at 01:50:01PM +0900, Joonsoo Kim wrote:
> > On Mon, May 19, 2014 at 11:53:05AM +0900, Minchan Kim wrote:
> > > On Mon, May 19, 2014 at 11:11:21AM +0900, Joonsoo Kim wrote:
> > > > On Thu, May 15, 2014 at 11:43:53AM +0900, Minchan Kim wrote:
> > > > > On Thu, May 15, 2014 at 10:53:01AM +0900, Joonsoo Kim wrote:
> > > > > > On Tue, May 13, 2014 at 12:00:57PM +0900, Minchan Kim wrote:
> > > > > > > Hey Joonsoo,
> > > > > > > 
> > > > > > > On Thu, May 08, 2014 at 09:32:23AM +0900, Joonsoo Kim wrote:
> > > > > > > > CMA is introduced to provide physically contiguous pages at runtime.
> > > > > > > > For this purpose, it reserves memory at boot time. Although it reserve
> > > > > > > > memory, this reserved memory can be used for movable memory allocation
> > > > > > > > request. This usecase is beneficial to the system that needs this CMA
> > > > > > > > reserved memory infrequently and it is one of main purpose of
> > > > > > > > introducing CMA.
> > > > > > > > 
> > > > > > > > But, there is a problem in current implementation. The problem is that
> > > > > > > > it works like as just reserved memory approach. The pages on cma reserved
> > > > > > > > memory are hardly used for movable memory allocation. This is caused by
> > > > > > > > combination of allocation and reclaim policy.
> > > > > > > > 
> > > > > > > > The pages on cma reserved memory are allocated if there is no movable
> > > > > > > > memory, that is, as fallback allocation. So the time this fallback
> > > > > > > > allocation is started is under heavy memory pressure. Although it is under
> > > > > > > > memory pressure, movable allocation easily succeed, since there would be
> > > > > > > > many pages on cma reserved memory. But this is not the case for unmovable
> > > > > > > > and reclaimable allocation, because they can't use the pages on cma
> > > > > > > > reserved memory. These allocations regard system's free memory as
> > > > > > > > (free pages - free cma pages) on watermark checking, that is, free
> > > > > > > > unmovable pages + free reclaimable pages + free movable pages. Because
> > > > > > > > we already exhausted movable pages, only free pages we have are unmovable
> > > > > > > > and reclaimable types and this would be really small amount. So watermark
> > > > > > > > checking would be failed. It will wake up kswapd to make enough free
> > > > > > > > memory for unmovable and reclaimable allocation and kswapd will do.
> > > > > > > > So before we fully utilize pages on cma reserved memory, kswapd start to
> > > > > > > > reclaim memory and try to make free memory over the high watermark. This
> > > > > > > > watermark checking by kswapd doesn't take care free cma pages so many
> > > > > > > > movable pages would be reclaimed. After then, we have a lot of movable
> > > > > > > > pages again, so fallback allocation doesn't happen again. To conclude,
> > > > > > > > amount of free memory on meminfo which includes free CMA pages is moving
> > > > > > > > around 512 MB if I reserve 512 MB memory for CMA.
> > > > > > > > 
> > > > > > > > I found this problem on following experiment.
> > > > > > > > 
> > > > > > > > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > > > > > > > make -j24
> > > > > > > > 
> > > > > > > > CMA reserve:		0 MB		512 MB
> > > > > > > > Elapsed-time:		234.8		361.8
> > > > > > > > Average-MemFree:	283880 KB	530851 KB
> > > > > > > > 
> > > > > > > > To solve this problem, I can think following 2 possible solutions.
> > > > > > > > 1. allocate the pages on cma reserved memory first, and if they are
> > > > > > > >    exhausted, allocate movable pages.
> > > > > > > > 2. interleaved allocation: try to allocate specific amounts of memory
> > > > > > > >    from cma reserved memory and then allocate from free movable memory.
> > > > > > > 
> > > > > > > I love this idea but when I see the code, I don't like that.
> > > > > > > In allocation path, just try to allocate pages by round-robin so it's role
> > > > > > > of allocator. If one of migratetype is full, just pass mission to reclaimer
> > > > > > > with hint(ie, Hey reclaimer, it's non-movable allocation fail
> > > > > > > so there is pointless if you reclaim MIGRATE_CMA pages) so that
> > > > > > > reclaimer can filter it out during page scanning.
> > > > > > > We already have an tool to achieve it(ie, isolate_mode_t).
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > I agree with leaving fast allocation path as simple as possible.
> > > > > > I will remove runtime computation for determining ratio in
> > > > > > __rmqueue_cma() and, instead, will use pre-computed value calculated
> > > > > > on the other path.
> > > > > 
> > > > > Sounds good.
> > > > > 
> > > > > > 
> > > > > > I am not sure that whether your second suggestion(Hey relaimer part)
> > > > > > is good or not. In my quick thought, that could be helpful in the
> > > > > > situation that many free cma pages remained. But, it would be not helpful
> > > > > > when there are neither free movable and cma pages. In generally, most
> > > > > > workloads mainly uses movable pages for page cache or anonymous mapping.
> > > > > > Although reclaim is triggered by non-movable allocation failure, reclaimed
> > > > > > pages are used mostly by movable allocation. We can handle these allocation
> > > > > > request even if we reclaim the pages just in lru order. If we rotate
> > > > > > the lru list for finding movable pages, it could cause more useful
> > > > > > pages to be evicted.
> > > > > > 
> > > > > > This is just my quick thought, so please let me correct if I am wrong.
> > > > > 
> > > > > Why should reclaimer reclaim unnecessary pages?
> > > > > So, your answer is that it would be better because upcoming newly allocated
> > > > > pages would be allocated easily without interrupt. But it could reclaim
> > > > > too much pages until watermark for unmovable allocation is okay.
> > > > > Even, sometime, you might see OOM.
> > > > > 
> > > > > Moreover, how could you handle current trobule?
> > > > > For example, there is atomic allocation and the only thing to save the world
> > > > > is kswapd because it's one of kswapd role but kswapd is spending many time to
> > > > > reclaim CMA pages, which is pointless so the allocation would be easily failed.
> > > > 
> > > > Hello,
> > > > 
> > > > I guess that it isn't the problem. In lru, movable pages and cma pages
> > > > would be interleaved. So it doesn't takes too long time to get the
> > > > page for non-movable allocation.
> > > 
> > > Please, don't assume there are ideal LRU ordering.
> > > Newly allocated page by fairness allocation is located by head of LRU
> > > while old pages are approaching the tail so there is huge time gab.
> > > During the time, old pages could be dropped/promoting so one of side
> > > could be filled with one type rather than interleaving both types pages
> > > you expected.
> > 
> > I assumed general case, not ideal case.
> > Your example can be possible, but would be corner case.
> 
> I talked with Joonsoo yesterday and should post our conclusion
> for other reviewers/maintainers.
> 
> It's not a corner case and it could happen depending on zone and CMA
> configuration. For example, there is 330M high zone and CMA consumes
> 300M in the space while normal movable area consumes just 30M.
> In the case, unmovable allocation could make too many unnecessary
> reclaiming of the zone so the conclusion we reached is to need target
> reclaiming(ex, isolate_mode_t).
> 
> But not sure it should be part of this patchset because this patchset
> is surely enhance(ie, before, it was hard to allocate page from CMA area
> but this patchset makes it works) but this patchset could make mentioned
> problem as side-effect so I think we could solve the issue(ie, too many
> reclaiming in unbalanced zone) in another patchset.
> 
> Joonsoo, please mention this problem in the description when you respin
> so other MM guys can notice that and give ideas, which would be helpful
> a lot.

Okay. Will do :)

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-20  0:50                 ` Gioh Kim
  2014-05-20  1:28                   ` Michal Nazarewicz
@ 2014-05-20 11:38                   ` Marek Szyprowski
  2014-05-21  0:15                     ` Gioh Kim
  1 sibling, 1 reply; 45+ messages in thread
From: Marek Szyprowski @ 2014-05-20 11:38 UTC (permalink / raw)
  To: Gioh Kim, Michal Nazarewicz, Joonsoo Kim
  Cc: Minchan Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	이건호,
	gurugio

Hello,

On 2014-05-20 02:50, Gioh Kim wrote:
>
>
> 2014-05-20 4:59 AM, Michal Nazarewicz wrote:
>> On Sun, May 18 2014, Joonsoo Kim wrote:
>>> I think that this problem is originated from atomic_pool_init().
>>> If configured coherent_pool size is larger than default cma size,
>>> it can be failed even if this patch is applied.
>
> The coherent_pool size (atomic_pool.size) should be restricted smaller 
> than cma size.
>
> This is another issue, however I think the default atomic pool size is 
> too small.
> Only one port of USB host needs at most 256Kbytes coherent memory 
> (according to the USB host spec).

This pool is used only for allocations done in atomic context (allocations
made with the GFP_ATOMIC flag); otherwise the standard allocation path is
used. Are you sure that each USB host port really needs that much memory
allocated in atomic context?
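
To illustrate which requests end up in this pool (hypothetical driver
code, not taken from any real driver): only allocations that cannot
sleep, for example in interrupt context, are served from it on this path.

static irqreturn_t demo_irq_handler(int irq, void *data)
{
        struct device *dev = data;
        dma_addr_t dma;
        void *buf;

        /* GFP_ATOMIC: may not sleep, so this comes from the atomic pool */
        buf = dma_alloc_coherent(dev, SZ_4K, &dma, GFP_ATOMIC);
        if (!buf)
                return IRQ_HANDLED;     /* pool exhausted or sized too small */

        /* ... fill buf, hand it to the hardware, free it later ... */

        return IRQ_HANDLED;
}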

> If a platform has several ports, it needs more than 1MB.
> Therefore the default atomic pool size should be at least 1MB.
>
>>>
>>> How about below patch?
>>> It uses fallback allocation if CMA is failed.
>>
>> Yes, I thought about it, but __dma_alloc uses similar code:
>>
>>     else if (!IS_ENABLED(CONFIG_DMA_CMA))
>>         addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, 
>> caller);
>>     else
>>         addr = __alloc_from_contiguous(dev, size, prot, &page, caller);
>>
>> so it probably needs to be changed as well.
>
> If CMA option is not selected, __alloc_from_contiguous would not be 
> called.
> We don't need to the fallback allocation.
>
> And if the CMA option is selected and initialized correctly, the CMA
> allocation can still fail when there is no CMA memory left. I think in
> that case we don't need the fallback allocation either, because that is a
> normal case.
>
> Therefore I think restricting the CMA size option so that CMA always
> works can cover every case.
>
> I think the patch below is also a good choice.
> If both of you, Michal and Joonsoo, do not agree with me, please let me
> know. I will make a patch including both the option restriction and the
> fallback allocation.

I'm not sure if we need a fallback for a failed CMA allocation. The only
issue that has been mentioned here and needs to be resolved is support for
disabling CMA from the kernel command line. Right now that fails completely.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-20  2:26                     ` Gioh Kim
@ 2014-05-20 18:15                       ` Michal Nazarewicz
  0 siblings, 0 replies; 45+ messages in thread
From: Michal Nazarewicz @ 2014-05-20 18:15 UTC (permalink / raw)
  To: Gioh Kim, Joonsoo Kim
  Cc: Minchan Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	Marek Szyprowski, 이건호,
	gurugio


On Mon, May 19 2014, Gioh Kim wrote:
> My point is that atomic_pool should be able to work with or without CMA.

Agreed.

>> IMO, the cma=0 command line argument should be supported, as should having
>> the default CMA size be zero.  If the CMA size is set to zero, the kernel
>> should behave as if CMA was not enabled at compile time.

> It would also be good if atomic_pool could work well with a zero CMA size.

Exactly.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value
  2014-05-20 11:38                   ` Marek Szyprowski
@ 2014-05-21  0:15                     ` Gioh Kim
  0 siblings, 0 replies; 45+ messages in thread
From: Gioh Kim @ 2014-05-21  0:15 UTC (permalink / raw)
  To: Marek Szyprowski, Michal Nazarewicz, Joonsoo Kim
  Cc: Minchan Kim, Andrew Morton, Rik van Riel, Laura Abbott, linux-mm,
	linux-kernel, Heesub Shin, Mel Gorman, Johannes Weiner,
	이건호,
	gurugio



On 2014-05-20 8:38 PM, Marek Szyprowski wrote:
> Hello,
>
> On 2014-05-20 02:50, Gioh Kim wrote:
>>
>>
>> On 2014-05-20 4:59 AM, Michal Nazarewicz wrote:
>>> On Sun, May 18 2014, Joonsoo Kim wrote:
>>>> I think this problem originates from atomic_pool_init().
>>>> If the configured coherent_pool size is larger than the default CMA size,
>>>> it can still fail even if this patch is applied.
>>
>> The coherent_pool size (atomic_pool.size) should be restricted to be smaller than the CMA size.
>>
>> This is another issue; however, I think the default atomic pool size is too small.
>> A single USB host port needs at most 256 KB of coherent memory (according to the USB host spec).
>
> This pool is used only for allocations done in atomic context (allocations
> done with the GFP_ATOMIC flag); otherwise the standard allocation path is
> used. Are you sure that each USB host port really needs so much memory
> allocated in atomic context?

I don't know why, but drivers/usb/host/ohci-hcd.c:ohci_init() calls dma_alloc_coherent with a zero gfp.
Therefore the allocation fails (with the warning below) if CMA is turned on and CONFIG_CMA_SIZE_MBYTES is zero:
the pointer pool->vaddr is NULL in __alloc_from_pool.
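
The "coherent pool not initialised!" warning in the log below comes from a
check along these lines (a sketch assuming the arm __alloc_from_pool() of that
kernel, which the trace shows inlined into __dma_alloc):

	struct dma_pool *pool = &atomic_pool;

	if (!pool->vaddr) {	/* pool was never set up: CMA size was 0 */
		WARN(1, "coherent pool not initialised!\n");
		return NULL;	/* ohci_init() then fails with -ENOMEM */
	}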

Below is my kernel message.
[ 24.339858] -----------[ cut here ]-----------
[ 24.344535] WARNING: at arch/arm/mm/dma-mapping.c:492 __dma_alloc.isra.19+0x25c/0x2a4()
[ 24.352554] coherent pool not initialised!
[ 24.356644] Modules linked in:
[ 24.359701] CPU: 1 PID: 711 Comm: sh Not tainted 3.10.19+ #42
[ 24.365488] [<800140e0>] (unwind_backtrace+0x0/0xf8) from [<80011f20>] (show_stack+0x10/0x14)
[ 24.374045] [<80011f20>] (show_stack+0x10/0x14) from [<8001f21c>] (warn_slowpath_common+0x4c/0x6c)
[ 24.383022] [<8001f21c>] (warn_slowpath_common+0x4c/0x6c) from [<8001f2d0>] (warn_slowpath_fmt+0x30/0x40)
[ 24.392602] [<8001f2d0>] (warn_slowpath_fmt+0x30/0x40) from [<80017f5c>] (__dma_alloc.isra.19+0x25c/0x2a4)
[ 24.402270] [<80017f5c>] (__dma_alloc.isra.19+0x25c/0x2a4) from [<800180d0>] (arm_dma_alloc+0x90/0x98)
[ 24.411580] [<800180d0>] (arm_dma_alloc+0x90/0x98) from [<8034ab54>] (ohci_init+0x1b0/0x278)
[ 24.420035] [<8034ab54>] (ohci_init+0x1b0/0x278) from [<80332e00>] (usb_add_hcd+0x184/0x5b8)
[ 24.428484] [<80332e00>] (usb_add_hcd+0x184/0x5b8) from [<8034b8d4>] (ohci_platform_probe+0xd0/0x174)
[ 24.437713] [<8034b8d4>] (ohci_platform_probe+0xd0/0x174) from [<802f1cac>] (platform_drv_probe+0x14/0x18)
[ 24.447385] [<802f1cac>] (platform_drv_probe+0x14/0x18) from [<802f0a54>] (driver_probe_device+0x6c/0x1f8)
[ 24.457049] [<802f0a54>] (driver_probe_device+0x6c/0x1f8) from [<802ef16c>] (bus_for_each_drv+0x44/0x8c)
[ 24.466537] [<802ef16c>] (bus_for_each_drv+0x44/0x8c) from [<802f09bc>] (device_attach+0x74/0x80)
[ 24.475416] [<802f09bc>] (device_attach+0x74/0x80) from [<802f0050>] (bus_probe_device+0x84/0xa8)
[ 24.484295] [<802f0050>] (bus_probe_device+0x84/0xa8) from [<802ee89c>] (device_add+0x4c0/0x58c)
[ 24.493088] [<802ee89c>] (device_add+0x4c0/0x58c) from [<802f21b8>] (platform_device_add+0xac/0x214)
[ 24.502227] [<802f21b8>] (platform_device_add+0xac/0x214) from [<8001bf3c>] (lg115x_init_usb+0xbc/0xe4)
[ 24.511618] [<8001bf3c>] (lg115x_init_usb+0xbc/0xe4) from [<80008734>] (do_user_initcalls+0x98/0x128)
[ 24.520843] [<80008734>] (do_user_initcalls+0x98/0x128) from [<80008870>] (proc_write_usercalls+0xac/0xd0)
[ 24.530512] [<80008870>] (proc_write_usercalls+0xac/0xd0) from [<80138f48>] (proc_reg_write+0x58/0x80)
[ 24.539830] [<80138f48>] (proc_reg_write+0x58/0x80) from [<800f0084>] (vfs_write+0xb0/0x1bc)
[ 24.548275] [<800f0084>] (vfs_write+0xb0/0x1bc) from [<800f04d0>] (SyS_write+0x3c/0x70)
[ 24.556287] [<800f04d0>] (SyS_write+0x3c/0x70) from [<8000e5c0>] (ret_fast_syscall+0x0/0x30)
[ 24.564726] --[ end trace c092568e2a263d21 ]--
[ 24.569345] ohci-platform ohci-platform.0: can't setup
[ 24.574498] ohci-platform ohci-platform.0: USB bus 1 deregistered
[ 24.582241] ohci-platform: probe of ohci-platform.0 failed with error -12
[ 24.590496] ohci-platform ohci-platform.1: Generic Platform OHCI Controller
[ 24.598984] ohci-platform ohci-platform.1: new USB bus registered, assigned bus number 1

>
>> If a platform has several ports, it needs more than 1MB.
>> Therefore the default atomic pool size should be at least 1MB.
>>
>>>>
>>>> How about the patch below?
>>>> It uses a fallback allocation if CMA fails.
>>>
>>> Yes, I thought about it, but __dma_alloc uses similar code:
>>>
>>>     else if (!IS_ENABLED(CONFIG_DMA_CMA))
>>>         addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
>>>     else
>>>         addr = __alloc_from_contiguous(dev, size, prot, &page, caller);
>>>
>>> so it probably needs to be changed as well.
>>
>> If the CMA option is not selected, __alloc_from_contiguous would not be called.
>> We don't need the fallback allocation.
>>
>> And if the CMA option is selected and initialized correctly, the CMA
>> allocation can still fail when there is no CMA memory left. I think in
>> that case we don't need the fallback allocation either, because that is a
>> normal case.
>>
>> Therefore I think restricting the CMA size option so that CMA always works can cover every case.
>>
>> I think the patch below is also a good choice.
>> If both of you, Michal and Joonsoo, do not agree with me, please let me know.
>> I will make a patch including both the option restriction and the fallback allocation.
>
> I'm not sure if we need a fallback for a failed CMA allocation. The only issue that
> has been mentioned here and needs to be resolved is support for disabling CMA from
> the kernel command line. Right now that fails completely.

Both cma=0 on the kernel command line and CONFIG_CMA_SIZE_MBYTES=0 set selected_size to zero in dma_contiguous_reserve.
As a result, dma_contiguous_reserve_area is never called and atomic_pool is not initialized.

After that, dma_alloc_coherent tries to allocate via atomic_pool (__alloc_from_pool) or CMA (__alloc_from_contiguous).
Allocation via atomic_pool fails because atomic_pool->vaddr is NULL.
And the CMA allocation shouldn't be attempted, because cma=0 or CONFIG_CMA_SIZE_MBYTES=0 is the same as disabling CMA.
If cma=0 or CONFIG_CMA_SIZE_MBYTES is 0, __alloc_remap_buffer should be called instead of __alloc_from_contiguous even if CMA is turned on.

I'm poor at English, so I'll describe the problem in pseudo code:

if (CMA is turned on) and ((cma=0 in command line) or (CONFIG_CMA_SIZE_MBYTES=0))
         try to allocate from CMA but CMA is not initialized
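
A hypothetical way to express that fallback in __dma_alloc (a sketch only; it
assumes dev_get_cma_area() returns NULL when no CMA area was reserved, and the
surrounding code is the branch quoted earlier in this thread):

	if (!IS_ENABLED(CONFIG_DMA_CMA) || !dev_get_cma_area(dev))
		/* CMA disabled at build time, or built in but left with no
		 * area (cma=0 / CONFIG_CMA_SIZE_MBYTES=0): use the remap path */
		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
	else
		addr = __alloc_from_contiguous(dev, size, prot, &page, caller);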



>
> Best regards

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-12 17:04   ` Laura Abbott
  2014-05-13  1:14     ` Joonsoo Kim
  2014-05-13  3:05     ` Minchan Kim
@ 2014-05-24  0:57     ` Laura Abbott
  2014-05-26  2:44       ` Joonsoo Kim
  2 siblings, 1 reply; 45+ messages in thread
From: Laura Abbott @ 2014-05-24  0:57 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, Mel Gorman, Minchan Kim,
	Heesub Shin, Marek Szyprowski, Michal Nazarewicz, linux-mm,
	linux-kernel, lmark

On 5/12/2014 10:04 AM, Laura Abbott wrote:
> 
> I'm going to see about running this through tests internally for comparison.
> Hopefully I'll get useful results in a day or so.
> 
> Thanks,
> Laura
> 

We ran some tests internally and found that for our purposes these patches made
the benchmarks worse vs. the existing implementation of using CMA first for some
pages. These are mostly androidisms but androidisms that we care about for
having a device be useful.

The foreground memory headroom on the device was on average about 40 MB smaller
 when using these patches vs our existing implementation of something like
solution #1. By foreground memory headroom we simply mean the amount of memory
that the foreground application can allocate before it is killed by the Android
 Low Memory killer.

We also found that when running a sequence of app launches these patches had
more high-priority app kills by the LMK and more alloc stalls. The test did a
total of 500 app launches (using 9 separate applications). The CMA
memory in our system is rarely used by its client and is therefore available
to the system most of the time.

Test device
- 4 CPUs
- Android 4.4.2
- 512MB of RAM
- 68 MB of CMA


Results:

Existing solution:
Foreground headroom: 200MB
Number of higher priority LMK kills (oom_score_adj < 529): 332
Number of alloc stalls: 607


Test patches:
Foreground headroom: 160MB
Number of higher priority LMK kills (oom_score_adj < 529): 459
Number of alloc stalls: 29538

We believe that the issues seen with these patches are the result of the LMK
being more aggressive. The LMK will be more aggressive because it will ignore
free CMA pages for unmovable allocations, and since most calls to the LMK are
made by kswapd (which uses GFP_KERNEL) the LMK will mostly ignore free CMA
pages. Because the LMK thresholds are higher than the zone watermarks, there
will often be a lot of free CMA pages in the system when the LMK is called,
which the LMK will usually ignore.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
  2014-05-24  0:57     ` Laura Abbott
@ 2014-05-26  2:44       ` Joonsoo Kim
  0 siblings, 0 replies; 45+ messages in thread
From: Joonsoo Kim @ 2014-05-26  2:44 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Minchan Kim, Heesub Shin, Marek Szyprowski, Michal Nazarewicz,
	linux-mm, linux-kernel, lmark

On Fri, May 23, 2014 at 05:57:58PM -0700, Laura Abbott wrote:
> On 5/12/2014 10:04 AM, Laura Abbott wrote:
> > 
> > I'm going to see about running this through tests internally for comparison.
> > Hopefully I'll get useful results in a day or so.
> > 
> > Thanks,
> > Laura
> > 
> 
> We ran some tests internally and found that for our purposes these patches made
> the benchmarks worse vs. the existing implementation of using CMA first for some
> pages. These are mostly androidisms but androidisms that we care about for
> having a device be useful.
> 
> The foreground memory headroom on the device was on average about 40 MB smaller
>  when using these patches vs our existing implementation of something like
> solution #1. By foreground memory headroom we simply mean the amount of memory
> that the foreground application can allocate before it is killed by the Android
>  Low Memory killer.
> 
> We also found that when running a sequence of app launches these patches had
> more high-priority app kills by the LMK and more alloc stalls. The test did a
> total of 500 app launches (using 9 separate applications). The CMA
> memory in our system is rarely used by its client and is therefore available
> to the system most of the time.
> 
> Test device
> - 4 CPUs
> - Android 4.4.2
> - 512MB of RAM
> - 68 MB of CMA
> 
> 
> Results:
> 
> Existing solution:
> Foreground headroom: 200MB
> Number of higher priority LMK kills (oom_score_adj < 529): 332
> Number of alloc stalls: 607
> 
> 
> Test patches:
> Foreground headroom: 160MB
> Number of higher priority LMK kills (oom_score_adj < 529): 459
> Number of alloc stalls: 29538
> 
> We believe that the issues seen with these patches are the result of the LMK
> being more aggressive. The LMK will be more aggressive because it will ignore
> free CMA pages for unmovable allocations, and since most calls to the LMK are
> made by kswapd (which uses GFP_KERNEL) the LMK will mostly ignore free CMA
> pages. Because the LMK thresholds are higher than the zone watermarks, there
> will often be a lot of free CMA pages in the system when the LMK is called,
> which the LMK will usually ignore.

Hello,

Really thanks for testing!!!
If possible, please let me know the nr_free_cma values for both these patches
and your in-house implementation before the test.

I can guess the following scenario for your test.

On boot-up, CMA memory is mostly used by native processes, because
your implementation uses CMA first for some pages. kswapd
is woken up later, since non-CMA free memory is larger than with my
implementation. And, on reclaim, the LMK, which reclaims memory by
killing app processes, would reclaim movable memory with high probability,
since CMA memory is mostly used by native processes and app processes
have only movable memory.

This is just my guess. But, if it is true, this is not a fair test for
this patchset. If possible, could you make nr_free_cma the same for both
implementations before testing?

Moreover, in the mainline implementation, the LMK doesn't consider whether
the memory type is CMA or not. Your overall system is probably highly
optimized for your implementation, so I'm not sure whether your test is
appropriate for this patchset.

Anyway, I would like to optimize this for Android. :)
Please let me know more about your system.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2014-05-26  2:41 UTC | newest]

Thread overview: 45+ messages
2014-05-08  0:32 [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Joonsoo Kim
2014-05-08  0:32 ` [RFC PATCH 1/3] CMA: remove redundant retrying code in __alloc_contig_migrate_range Joonsoo Kim
2014-05-09 15:44   ` Michal Nazarewicz
2014-05-08  0:32 ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Joonsoo Kim
2014-05-09 15:45   ` Michal Nazarewicz
2014-05-12 17:04   ` Laura Abbott
2014-05-13  1:14     ` Joonsoo Kim
2014-05-13  3:05     ` Minchan Kim
2014-05-24  0:57     ` Laura Abbott
2014-05-26  2:44       ` Joonsoo Kim
2014-05-13  3:00   ` Minchan Kim
2014-05-15  1:53     ` Joonsoo Kim
2014-05-15  2:43       ` Minchan Kim
2014-05-19  2:11         ` Joonsoo Kim
2014-05-19  2:53           ` Minchan Kim
2014-05-19  4:50             ` Joonsoo Kim
2014-05-19 23:18               ` Minchan Kim
2014-05-20  6:33                 ` Joonsoo Kim
2014-05-15  2:45       ` Heesub Shin
2014-05-15  5:06         ` Minchan Kim
2014-05-19 23:22         ` Minchan Kim
2014-05-16  8:02       ` [RFC][PATCH] CMA: drivers/base/Kconfig: restrict CMA size to non-zero value Gioh Kim
2014-05-16 17:45         ` Michal Nazarewicz
2014-05-19  1:47           ` Gioh Kim
2014-05-19  5:55             ` Joonsoo Kim
2014-05-19  9:14               ` Gioh Kim
2014-05-19 19:59               ` Michal Nazarewicz
2014-05-20  0:50                 ` Gioh Kim
2014-05-20  1:28                   ` Michal Nazarewicz
2014-05-20  2:26                     ` Gioh Kim
2014-05-20 18:15                       ` Michal Nazarewicz
2014-05-20 11:38                   ` Marek Szyprowski
2014-05-21  0:15                     ` Gioh Kim
2014-05-14  8:42   ` [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used Aneesh Kumar K.V
2014-05-15  1:58     ` Joonsoo Kim
2014-05-18 17:36       ` Aneesh Kumar K.V
2014-05-19  2:29         ` Joonsoo Kim
2014-05-08  0:32 ` [RFC PATCH 3/3] CMA: always treat free cma pages as non-free on watermark checking Joonsoo Kim
2014-05-09 15:46   ` Michal Nazarewicz
2014-05-09 12:39 ` [RFC PATCH 0/3] Aggressively allocate the pages on cma reserved memory Marek Szyprowski
2014-05-13  2:26   ` Joonsoo Kim
2014-05-14  9:44     ` Aneesh Kumar K.V
2014-05-15  2:10       ` Joonsoo Kim
2014-05-15  9:47         ` Mel Gorman
2014-05-19  2:12           ` Joonsoo Kim
