linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/6] Introduce ZONE_CMA
@ 2016-05-26  6:22 js1304
  2016-05-26  6:22 ` [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory js1304
                   ` (7 more replies)
  0 siblings, 8 replies; 34+ messages in thread
From: js1304 @ 2016-05-26  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Hello,

Changes from v2
o Rebase on next-20160525
o No other changes except the following description

There was a discussion with Mel [1] after LSF/MM 2016. I could summarise
it to help the merge decision, but it is better to read it yourself,
since any summary of mine would be biased toward my position. But if
anyone wants a summary, I will write one. :)

Anyway, Mel's position on this patchset seems to be neutral. He said:
"I'm not going to outright NAK your series but I won't ACK it either"

We can fix the problems with either approach, but I hope to go with the
new zone approach because it is less error-prone. It reduces the
corner-case handling needed today and removes the need for further
corner-case handling to fix the remaining problems.

Note that our company has already been using ZONE_CMA for years
without any problem.

If anyone has a different opinion, please let me know and let's discuss
it together.

Andrew, if there is anything I need to do for the merge, please let me
know.

[1] https://lkml.kernel.org/r/20160425053653.GA25662@js1304-P5Q-DELUXE

Changes from v1
o Separate some patches which deserve to be submitted independently
o Modify description to reflect current kernel state
(e.g. high-order watermark problem disappeared by Mel's work)
o Don't increase SECTION_SIZE_BITS to make room in page flags
(detailed reason is in the patch that adds ZONE_CMA)
o Adjust ZONE_CMA population code

This series tries to solve problems of the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. But the current
implementation works like the previous reserved-memory approach, because
freepages in the CMA region are used only if there is no movable
freepage. In other words, freepages in the CMA region are used only as
a fallback. By the time they are used as a fallback, kswapd is easily
woken up, since there are no unmovable or reclaimable freepages left
either. Once kswapd starts to reclaim memory, fallback allocation to
MIGRATE_CMA no longer occurs, because kswapd has already refilled the
movable freepages, and most of the freepages in the CMA region stay
free. This situation looks like the exclusively reserved memory case.

In my experiment, I found that if the system has 1024 MB of memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
of free memory is left. The detailed reason is that, to keep enough free
memory for unmovable and reclaimable allocations, kswapd uses the
equation below when calculating free memory, and the result easily goes
under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property that CMA freepages can't be used for
unmovable and reclaimable allocations.

Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA)
is lower than the low watermark and tries to free memory until
(FreeTotal - FreeCMA) is higher than the high watermark. As a result,
FreeTotal consistently hovers around the 512 MB boundary, which means
that we can't utilize the full memory capacity.
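
To make the watermark arithmetic above concrete, here is a minimal
user-space sketch. It is not kernel code: struct zone_model and the
helper names are hypothetical and exist only for illustration, and all
values are page counts chosen by hand.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of one zone. These are not kernel
 * structures; every field is a page count picked for illustration.
 */
struct zone_model {
	unsigned long free_total;	/* all free pages in the zone */
	unsigned long free_cma;		/* free pages inside the CMA region */
	unsigned long wmark_low;	/* kswapd is woken below this */
	unsigned long wmark_high;	/* kswapd reclaims until above this */
};

/* Free memory usable for unmovable and reclaimable allocations. */
static inline unsigned long usable_free(const struct zone_model *z)
{
	return z->free_total > z->free_cma ?
		z->free_total - z->free_cma : 0;
}

/* kswapd is woken when (FreeTotal - FreeCMA) drops below the low mark. */
static inline bool kswapd_should_wake(const struct zone_model *z)
{
	return usable_free(z) < z->wmark_low;
}

/* kswapd keeps reclaiming until (FreeTotal - FreeCMA) exceeds the high mark. */
static inline bool kswapd_can_stop(const struct zone_model *z)
{
	return usable_free(z) > z->wmark_high;
}
```

With 4 KB pages, a 512 MB CMA reserve is 131072 pages. If nearly all
of those are free, a zone with slightly more than 131072 free pages in
total still looks almost empty to this check, so kswapd is woken even
though about half of the memory is unused.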

To fix this problem, I submitted some patches [1] about 10 months ago,
but found more problems that had to be fixed before solving this one.
That approach requires many hooks in the allocator hotpath, so some
developers don't like it. Instead, some of them suggested a different
approach [2] to fix all the problems related to CMA, that is,
introducing a new zone to deal with free CMA pages. I agree that it is
the best way to go, so I implement it here. Although the properties of
ZONE_MOVABLE and ZONE_CMA are similar, I decided to add a new zone
rather than piggyback on ZONE_MOVABLE, since they have some differences.

First, reserved CMA pages should not be offlined. If freepages for CMA
were managed by ZONE_MOVABLE, we would need to keep the MIGRATE_CMA
migratetype and insert many hooks into the memory hotplug code to
distinguish hotpluggable memory from memory reserved for CMA in the same
zone. That would make the already complicated memory hotplug code even
more complicated.

Second, cma_alloc() can be called more frequently than memory hotplug
operations, and we may need to control the allocation rate of ZONE_CMA
to optimize latency in the future. With a separate zone, that is easy
to modify.

Third, I'd like to see statistics for CMA separately. Sometimes we need
to debug why cma_alloc() failed, and separate statistics would be more
helpful in that situation.

Anyway, this patchset solves four problems related to the CMA
implementation.

1) Utilization problem
As mentioned above, we can't utilize the full memory capacity due to the
limitation of CMA freepages and the fallback policy. This patchset
implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
requests. This type of allocation is used for page cache and anonymous
pages, which account for most memory usage in the normal case, so we can
utilize the full memory capacity. Below is the experiment result for
this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           92.4            186.5
pswpin:                 82              18647
pswpout:                160             69839

<After this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           93.1            93.4
pswpin:                 84              46
pswpout:                183             92

FYI, there is another attempt [3] on lkml to solve this problem.
And, as far as I know, Qualcomm also has an out-of-tree solution for
it.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in the reclaim
path. If reclaim is initiated for an unmovable or reclaimable
allocation, reclaiming CMA pages doesn't help to satisfy the request and
is just a waste. By managing CMA pages in the new zone, we can skip
reclaiming ZONE_CMA completely when it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't woken up to reclaim pages when the allocation request is of
movable type and there are enough free pages in the CMA region. After a
bunch of consecutive movable allocation requests, free pages in the
ordinary region (not the CMA region) would be exhausted without waking
up kswapd. If an atomic unmovable allocation comes at that time, it
can't succeed, since there are not enough pages in the ordinary region.
This problem was reported by Aneesh [4] and can be solved by this
patchset.

4) Inefficient compaction
The usual high-order allocation request is of unmovable type and cannot
be serviced from the CMA area. In compaction, the migration scanner
doesn't distinguish migratable pages in the CMA area and migrates them
anyway. In this case, even if we make a high-order page in that region,
it cannot be used due to the type mismatch. This patchset solves this
problem by separating CMA pages from ordinary zones.

I passed boot tests on x86_64, x86_32, arm and arm64. I did some stress
tests on x86_64 and x86_32 and found no problems. Feel free to try it
and please give me feedback. :)

This patchset is based on next-20160525.

Thanks.

[1] https://lkml.org/lkml/2014/5/28/64
[2] https://lkml.org/lkml/2014/11/4/55
[3] https://lkml.org/lkml/2014/10/15/623
[4] http://www.spinics.net/lists/linux-mm/msg100562.html

Joonsoo Kim (6):
  mm/page_alloc: recalculate some of zone threshold when on/offline
    memory
  mm/cma: introduce new zone, ZONE_CMA
  mm/cma: populate ZONE_CMA
  mm/cma: remove ALLOC_CMA
  mm/cma: remove MIGRATE_CMA
  mm/cma: remove per zone CMA stat

 arch/x86/mm/highmem_32.c          |   8 ++
 fs/proc/meminfo.c                 |   2 +-
 include/linux/cma.h               |   6 +
 include/linux/gfp.h               |  32 +++--
 include/linux/memory_hotplug.h    |   3 -
 include/linux/mempolicy.h         |   2 +-
 include/linux/mmzone.h            |  54 +++++----
 include/linux/vm_event_item.h     |  10 +-
 include/linux/vmstat.h            |   8 --
 include/trace/events/compaction.h |  10 +-
 kernel/power/snapshot.c           |   8 ++
 mm/cma.c                          |  58 ++++++++-
 mm/compaction.c                   |  10 +-
 mm/hugetlb.c                      |   2 +-
 mm/internal.h                     |   6 +-
 mm/memory_hotplug.c               |   3 +
 mm/page_alloc.c                   | 242 +++++++++++++++++++++-----------------
 mm/page_isolation.c               |   5 +-
 mm/vmstat.c                       |  15 ++-
 19 files changed, 303 insertions(+), 181 deletions(-)

-- 
1.9.1

* [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
@ 2016-05-26  6:22 ` js1304
  2016-06-24 13:20   ` Vlastimil Babka
  2016-05-26  6:22 ` [PATCH v3 2/6] mm/cma: introduce new zone, ZONE_CMA js1304
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: js1304 @ 2016-05-26  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Some of the zone thresholds depend on the number of managed pages in
the zone. When memory goes online or offline, that number can change,
and we need to adjust the thresholds.

This patch adds the recalculation in the appropriate places and cleans
up the related functions for better maintenance.
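
As a rough illustration of why the recalculation matters, here is a
user-space sketch (not the kernel code itself). struct zone_model and
setup_min_unmapped() are hypothetical stand-ins for struct zone and the
helper added by this patch, and the ratio of 1 percent is only an
example value.

```c
/*
 * Hypothetical stand-in for the two struct zone fields involved.
 * Both are page counts.
 */
struct zone_model {
	unsigned long managed_pages;
	unsigned long min_unmapped_pages;
};

/* Same formula as the helper in the patch: a percentage of managed pages. */
static inline void setup_min_unmapped(struct zone_model *z,
				      unsigned long ratio_percent)
{
	z->min_unmapped_pages = z->managed_pages * ratio_percent / 100;
}

/*
 * Model of memory going offline: managed_pages shrinks, but the
 * threshold stays stale until setup_min_unmapped() runs again.
 */
static inline void offline_pages_model(struct zone_model *z,
				       unsigned long nr_pages)
{
	z->managed_pages -= nr_pages;
}
```

In this model, a zone with 262144 managed pages and a 1 percent ratio
gets a threshold of 2621 pages. After half of the zone is offlined the
threshold still reads 2621 until it is recomputed, at which point it
becomes 1310.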

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d27e8b9..90e5a82 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4874,6 +4874,8 @@ int local_memory_node(int node)
 }
 #endif
 
+static void setup_min_unmapped_ratio(struct zone *zone);
+static void setup_min_slab_ratio(struct zone *zone);
 #else	/* CONFIG_NUMA */
 
 static void set_zonelist_order(void)
@@ -5988,9 +5990,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
 #ifdef CONFIG_NUMA
 		zone->node = nid;
-		zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
-						/ 100;
-		zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
+		setup_min_unmapped_ratio(zone);
+		setup_min_slab_ratio(zone);
 #endif
 		zone->name = zone_names[j];
 		spin_lock_init(&zone->lock);
@@ -6896,6 +6897,7 @@ int __meminit init_per_zone_wmark_min(void)
 {
 	unsigned long lowmem_kbytes;
 	int new_min_free_kbytes;
+	struct zone *zone;
 
 	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
 	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
@@ -6913,6 +6915,14 @@ int __meminit init_per_zone_wmark_min(void)
 	setup_per_zone_wmarks();
 	refresh_zone_stat_thresholds();
 	setup_per_zone_lowmem_reserve();
+
+	for_each_zone(zone) {
+#ifdef CONFIG_NUMA
+		setup_min_unmapped_ratio(zone);
+		setup_min_slab_ratio(zone);
+#endif
+	}
+
 	return 0;
 }
 core_initcall(init_per_zone_wmark_min)
@@ -6954,6 +6964,12 @@ int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
 }
 
 #ifdef CONFIG_NUMA
+static void setup_min_unmapped_ratio(struct zone *zone)
+{
+	zone->min_unmapped_pages = (zone->managed_pages *
+			sysctl_min_unmapped_ratio) / 100;
+}
+
 int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -6965,11 +6981,17 @@ int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
 		return rc;
 
 	for_each_zone(zone)
-		zone->min_unmapped_pages = (zone->managed_pages *
-				sysctl_min_unmapped_ratio) / 100;
+		setup_min_unmapped_ratio(zone);
+
 	return 0;
 }
 
+static void setup_min_slab_ratio(struct zone *zone)
+{
+	zone->min_slab_pages = (zone->managed_pages *
+			sysctl_min_slab_ratio) / 100;
+}
+
 int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -6981,8 +7003,8 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
 		return rc;
 
 	for_each_zone(zone)
-		zone->min_slab_pages = (zone->managed_pages *
-				sysctl_min_slab_ratio) / 100;
+		setup_min_slab_ratio(zone);
+
 	return 0;
 }
 #endif
-- 
1.9.1

* [PATCH v3 2/6] mm/cma: introduce new zone, ZONE_CMA
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
  2016-05-26  6:22 ` [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory js1304
@ 2016-05-26  6:22 ` js1304
  2016-05-26  6:22 ` [PATCH v3 3/6] mm/cma: populate ZONE_CMA js1304
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 34+ messages in thread
From: js1304 @ 2016-05-26  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Currently, reserved pages for CMA are managed together with normal pages.
To distinguish them, we use a migratetype, MIGRATE_CMA, and handle it
specially. But it turns out that there are too many problems with this
approach, and fixing all of them needs many more hooks in the page
allocation and reclaim paths, so some developers have expressed their
discomfort and the problems with CMA have gone unfixed for a long time.

To end this situation and fix the CMA problems, this patch implements
ZONE_CMA. Reserved pages for CMA will be managed in this new zone. This
approach will remove all existing hooks for MIGRATE_CMA and solve many
problems related to the CMA implementation.

This patch only adds the basic infrastructure of ZONE_CMA. In the
following patch, ZONE_CMA is actually populated and used.

Adding a new zone could cause two possible problems. One is the overflow
of page flags and the other is the GFP_ZONE_TABLE issue.

The following is the page-flags layout described in page-flags-layout.h.

1. No sparsemem or sparsemem vmemmap: |       NODE     | ZONE |             ... | FLAGS |
2.      " plus space for last_cpupid: |       NODE     | ZONE | LAST_CPUPID ... | FLAGS |
3. classic sparse with space for node:| SECTION | NODE | ZONE |             ... | FLAGS |
4.      " plus space for last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS |
5. classic sparse no space for node:  | SECTION |     ZONE    | ... | FLAGS |

There is no problem with configurations #1 and #2 on 64-bit systems.
There is enough room even for extremely large x86_64 systems. 32-bit
systems would not have many nodes, so they would have no problem either.
Systems with configurations #3, #4 and #5 could be affected by this zone
addition, but thanks to the recent THP rework, which freed one page
flag, the problem surface would be small. In some configurations a
problem is still possible, but it highly depends on the individual
configuration, so the impact cannot be easily estimated. I guess that a
usual system with CONFIG_CMA would not be affected. If there is a
problem, we can adjust the section width or node width for that
architecture.

Currently, GFP_ZONE_TABLE is a 32-bit value so that 32-bit systems can
use 32-bit bit operations. If we add one more zone, it becomes 48 bits
wide and 32-bit bit operations are no longer possible. Although this
causes slight overhead, there is no other way, so this patch relaxes
GFP_ZONE_TABLE's 32-bit limitation. 32-bit systems with CONFIG_CMA will
be affected by this change, but the impact would be marginal.
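
The 32-bit versus 48-bit arithmetic can be checked with a small
user-space sketch. The helper names are hypothetical, and the mapping
from zone count to shift width is my assumption of how ZONES_SHIFT is
chosen in page-flags-layout.h (0, 1, 2 or 3 bits for up to 1, 2, 4 or
8 zones). The table holds 16 entries, one per combination of the four
low GFP zone bits.

```c
/*
 * Bits needed to encode a zone index. Assumed to mirror the
 * ZONES_SHIFT selection in page-flags-layout.h.
 */
static inline int zones_shift(int max_nr_zones)
{
	if (max_nr_zones < 2)
		return 0;
	if (max_nr_zones <= 2)
		return 1;
	if (max_nr_zones <= 4)
		return 2;
	return 3;	/* up to 8 zones */
}

/* Width of a table with 16 entries of zones_shift() bits each. */
static inline int gfp_zone_table_bits(int max_nr_zones)
{
	return 16 * zones_shift(max_nr_zones);
}

/* Does the table fit into one machine word of word_bits bits? */
static inline int table_fits(int max_nr_zones, int word_bits)
{
	return gfp_zone_table_bits(max_nr_zones) <= word_bits;
}
```

For an example 32-bit configuration with DMA, Normal, HighMem and
Movable (4 zones), the table is 32 bits wide and fits in an unsigned
long. Adding ZONE_CMA makes it 5 zones, so each entry needs 3 bits and
the table grows to 48 bits, which needs a 64-bit type.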

Note that there are many checkpatch warnings, but I think the current
code is better for readability than it would be with them fixed.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 arch/x86/mm/highmem_32.c          |  8 +++++
 include/linux/gfp.h               | 29 +++++++++++-------
 include/linux/mempolicy.h         |  2 +-
 include/linux/mmzone.h            | 31 ++++++++++++++++++-
 include/linux/vm_event_item.h     | 10 ++++++-
 include/trace/events/compaction.h | 10 ++++++-
 kernel/power/snapshot.c           |  8 +++++
 mm/memory_hotplug.c               |  3 ++
 mm/page_alloc.c                   | 63 +++++++++++++++++++++++++++++++++------
 mm/vmstat.c                       |  9 +++++-
 10 files changed, 148 insertions(+), 25 deletions(-)

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index a6d7392..a7fcb12 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -120,6 +120,14 @@ void __init set_highmem_pages_init(void)
 		if (!is_highmem(zone))
 			continue;
 
+		/*
+		 * ZONE_CMA is a special zone that should not take part
+		 * in this initialization because its pages are
+		 * initialized when the other zones are initialized.
+		 */
+		if (is_zone_cma(zone))
+			continue;
+
 		zone_start_pfn = zone->zone_start_pfn;
 		zone_end_pfn = zone_start_pfn + zone->spanned_pages;
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a..4d6c008 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -301,6 +301,12 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define OPT_ZONE_DMA32 ZONE_NORMAL
 #endif
 
+#ifdef CONFIG_CMA
+#define OPT_ZONE_CMA ZONE_CMA
+#else
+#define OPT_ZONE_CMA ZONE_MOVABLE
+#endif
+
 /*
  * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
  * zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
@@ -331,7 +337,6 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
  *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
  *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
  *
- * GFP_ZONES_SHIFT must be <= 2 on 32 bit platforms.
  */
 
 #if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4
@@ -341,19 +346,21 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define GFP_ZONES_SHIFT ZONES_SHIFT
 #endif
 
-#if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
-#error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
+#if !defined(CONFIG_64BIT) && GFP_ZONES_SHIFT > 2
+#define GFP_ZONE_TABLE_CAST unsigned long long
+#else
+#define GFP_ZONE_TABLE_CAST unsigned long
 #endif
 
 #define GFP_ZONE_TABLE ( \
-	(ZONE_NORMAL << 0 * GFP_ZONES_SHIFT)				       \
-	| (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT)		       \
-	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT)	       \
-	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT)		       \
-	| (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT)		       \
-	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT)    \
-	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)\
-	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
+	((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << 0 * GFP_ZONES_SHIFT)					\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT)				\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT)	\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_CMA << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)	\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)	\
 )
 
 /*
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 4429d25..c4cc86e 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -157,7 +157,7 @@ extern enum zone_type policy_zone;
 
 static inline void check_highest_zone(enum zone_type k)
 {
-	if (k > policy_zone && k != ZONE_MOVABLE)
+	if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
 		policy_zone = k;
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 02069c2..54c92a6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -312,6 +312,9 @@ enum zone_type {
 	ZONE_HIGHMEM,
 #endif
 	ZONE_MOVABLE,
+#ifdef CONFIG_CMA
+	ZONE_CMA,
+#endif
 #ifdef CONFIG_ZONE_DEVICE
 	ZONE_DEVICE,
 #endif
@@ -806,11 +809,37 @@ static inline int zone_movable_is_highmem(void)
 }
 #endif
 
+static inline int is_zone_cma_idx(enum zone_type idx)
+{
+#ifdef CONFIG_CMA
+	return idx == ZONE_CMA;
+#else
+	return 0;
+#endif
+}
+
+static inline int is_zone_cma(struct zone *zone)
+{
+	int zone_idx = zone_idx(zone);
+
+	return is_zone_cma_idx(zone_idx);
+}
+
+static inline int zone_cma_is_highmem(void)
+{
+#ifdef CONFIG_HIGHMEM
+	return 1;
+#else
+	return 0;
+#endif
+}
+
 static inline int is_highmem_idx(enum zone_type idx)
 {
 #ifdef CONFIG_HIGHMEM
 	return (idx == ZONE_HIGHMEM ||
-		(idx == ZONE_MOVABLE && zone_movable_is_highmem()));
+		(idx == ZONE_MOVABLE && zone_movable_is_highmem()) ||
+		(is_zone_cma_idx(idx) && zone_cma_is_highmem()));
 #else
 	return 0;
 #endif
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index ec08432..bbe16d4 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -19,7 +19,15 @@
 #define HIGHMEM_ZONE(xx)
 #endif
 
-#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) xx##_MOVABLE
+#ifdef CONFIG_CMA
+#define MOVABLE_ZONE(xx) xx##_MOVABLE,
+#define CMA_ZONE(xx) xx##_CMA
+#else
+#define MOVABLE_ZONE(xx) xx##_MOVABLE
+#define CMA_ZONE(xx)
+#endif
+
+#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) MOVABLE_ZONE(xx) CMA_ZONE(xx)
 
 enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		FOR_ALL_ZONES(PGALLOC),
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index 36e2d6f..9d3b254 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -38,12 +38,20 @@
 #define IFDEF_ZONE_HIGHMEM(X)
 #endif
 
+#ifdef CONFIG_CMA
+#define IFDEF_ZONE_CMA(X, Y, Z) X Z
+#else
+#define IFDEF_ZONE_CMA(X, Y, Z) Y
+#endif
+
 #define ZONE_TYPE						\
 	IFDEF_ZONE_DMA(		EM (ZONE_DMA,	 "DMA"))	\
 	IFDEF_ZONE_DMA32(	EM (ZONE_DMA32,	 "DMA32"))	\
 				EM (ZONE_NORMAL, "Normal")	\
 	IFDEF_ZONE_HIGHMEM(	EM (ZONE_HIGHMEM,"HighMem"))	\
-				EMe(ZONE_MOVABLE,"Movable")
+	IFDEF_ZONE_CMA(		EM (ZONE_MOVABLE,"Movable"),	\
+				EMe(ZONE_MOVABLE,"Movable"),	\
+				EMe(ZONE_CMA,    "CMA"))
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 3a97060..e8a7d8f 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1042,6 +1042,14 @@ unsigned int snapshot_additional_pages(struct zone *zone)
 {
 	unsigned int rtree, nodes;
 
+	/*
+	 * The pages needed for ZONE_CMA are already accounted for
+	 * when estimating other zones, since the span of ZONE_CMA is
+	 * a subset of the other zones.
+	 */
+	if (is_zone_cma(zone))
+		return 0;
+
 	rtree = nodes = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK);
 	rtree += DIV_ROUND_UP(rtree * sizeof(struct rtree_node),
 			      LINKED_PAGE_DATA_SIZE);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 522e3ef..a3a2875 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1854,6 +1854,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	if (zone_idx(zone) <= ZONE_NORMAL && !can_offline_normal(zone, nr_pages))
 		return -EINVAL;
 
+	if (is_zone_cma(zone))
+		return -EINVAL;
+
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE, true);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 90e5a82..0197d5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -202,6 +202,9 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
 	 32,
 #endif
 	 32,
+#ifdef CONFIG_CMA
+	 32,
+#endif
 };
 
 EXPORT_SYMBOL(totalram_pages);
@@ -218,6 +221,9 @@ static char * const zone_names[MAX_NR_ZONES] = {
 	 "HighMem",
 #endif
 	 "Movable",
+#ifdef CONFIG_CMA
+	 "CMA",
+#endif
 #ifdef CONFIG_ZONE_DEVICE
 	 "Device",
 #endif
@@ -5137,6 +5143,15 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	struct memblock_region *r = NULL, *tmp;
 #endif
 
+	/*
+	 * Physical pages for ZONE_CMA belong to other zones for now. They
+	 * are initialized when the corresponding zone is initialized and
+	 * will be moved to ZONE_CMA later. Zone information will also be
+	 * adjusted later.
+	 */
+	if (is_zone_cma_idx(zone))
+		return;
+
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
@@ -5573,7 +5588,7 @@ static void __init find_usable_zone_for_movable(void)
 {
 	int zone_index;
 	for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) {
-		if (zone_index == ZONE_MOVABLE)
+		if (zone_index == ZONE_MOVABLE || is_zone_cma_idx(zone_index))
 			continue;
 
 		if (arch_zone_highest_possible_pfn[zone_index] >
@@ -5782,6 +5797,8 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 						unsigned long *zholes_size)
 {
 	unsigned long realtotalpages = 0, totalpages = 0;
+	unsigned long zone_cma_start_pfn = ULONG_MAX;
+	unsigned long zone_cma_end_pfn = 0;
 	enum zone_type i;
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
@@ -5789,6 +5806,13 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 		unsigned long zone_start_pfn, zone_end_pfn;
 		unsigned long size, real_size;
 
+		if (is_zone_cma_idx(i)) {
+			zone->zone_start_pfn = zone_cma_start_pfn;
+			size = zone_cma_end_pfn - zone_cma_start_pfn;
+			real_size = 0;
+			goto init_zone;
+		}
+
 		size = zone_spanned_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn,
 						  node_end_pfn,
@@ -5798,13 +5822,23 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 		real_size = size - zone_absent_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn, node_end_pfn,
 						  zholes_size);
-		if (size)
+		if (size) {
 			zone->zone_start_pfn = zone_start_pfn;
-		else
+			if (zone_cma_start_pfn > zone_start_pfn)
+				zone_cma_start_pfn = zone_start_pfn;
+			if (zone_cma_end_pfn < zone_start_pfn + size)
+				zone_cma_end_pfn = zone_start_pfn + size;
+		} else
 			zone->zone_start_pfn = 0;
+
+init_zone:
 		zone->spanned_pages = size;
 		zone->present_pages = real_size;
 
+		/* Avoid over-counting the node span */
+		if (is_zone_cma_idx(i))
+			size = 0;
+
 		totalpages += size;
 		realtotalpages += real_size;
 	}
@@ -5946,6 +5980,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, freesize, memmap_pages;
 		unsigned long zone_start_pfn = zone->zone_start_pfn;
+		bool zone_kernel = !is_highmem_idx(j) && !is_zone_cma_idx(j);
 
 		size = zone->spanned_pages;
 		realsize = freesize = zone->present_pages;
@@ -5956,7 +5991,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * and per-cpu initialisations
 		 */
 		memmap_pages = calc_memmap_size(size, realsize);
-		if (!is_highmem_idx(j)) {
+		if (zone_kernel) {
 			if (freesize >= memmap_pages) {
 				freesize -= memmap_pages;
 				if (memmap_pages)
@@ -5975,7 +6010,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 					zone_names[0], dma_reserve);
 		}
 
-		if (!is_highmem_idx(j))
+		if (zone_kernel)
 			nr_kernel_pages += freesize;
 		/* Charge for highmem memmap if there are enough kernel pages */
 		else if (nr_kernel_pages > memmap_pages * 2)
@@ -5987,7 +6022,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * when the bootmem allocator frees pages into the buddy system.
 		 * And all highmem pages will be managed by the buddy system.
 		 */
-		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
+		zone->managed_pages = zone_kernel ? freesize : realsize;
 #ifdef CONFIG_NUMA
 		zone->node = nid;
 		setup_min_unmapped_ratio(zone);
@@ -6004,7 +6039,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
 
 		lruvec_init(&zone->lruvec);
-		if (!size)
+
+		/*
+		 * ZONE_CMA should be initialized even if it has no present
+		 * pages now, since pages will be moved to the zone later.
+		 */
+		if (!size && !is_zone_cma_idx(j))
 			continue;
 
 		set_pageblock_order();
@@ -6458,7 +6498,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	arch_zone_lowest_possible_pfn[0] = find_min_pfn_with_active_regions();
 	arch_zone_highest_possible_pfn[0] = max_zone_pfn[0];
 	for (i = 1; i < MAX_NR_ZONES; i++) {
-		if (i == ZONE_MOVABLE)
+		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
 			continue;
 		arch_zone_lowest_possible_pfn[i] =
 			arch_zone_highest_possible_pfn[i-1];
@@ -6475,7 +6515,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	/* Print out the zone ranges */
 	pr_info("Zone ranges:\n");
 	for (i = 0; i < MAX_NR_ZONES; i++) {
-		if (i == ZONE_MOVABLE)
+		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
 			continue;
 		pr_info("  %-8s ", zone_names[i]);
 		if (arch_zone_lowest_possible_pfn[i] ==
@@ -7197,6 +7237,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
+
+	/* ZONE_CMA never contains unmovable pages */
+	if (is_zone_cma(zone))
+		return false;
+
 	mt = get_pageblock_migratetype(page);
 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
 		return false;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1b585f8..48c4942 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -697,8 +697,15 @@ int fragmentation_index(struct zone *zone, unsigned int order)
 #define TEXT_FOR_HIGHMEM(xx)
 #endif
 
+#ifdef CONFIG_CMA
+#define TEXT_FOR_CMA(xx) xx "_cma",
+#else
+#define TEXT_FOR_CMA(xx)
+#endif
+
 #define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
-					TEXT_FOR_HIGHMEM(xx) xx "_movable",
+					TEXT_FOR_HIGHMEM(xx) xx "_movable", \
+					TEXT_FOR_CMA(xx)
 
 const char * const vmstat_text[] = {
 	/* enum zone_stat_item countes */
-- 
1.9.1

* [PATCH v3 3/6] mm/cma: populate ZONE_CMA
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
  2016-05-26  6:22 ` [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory js1304
  2016-05-26  6:22 ` [PATCH v3 2/6] mm/cma: introduce new zone, ZONE_CMA js1304
@ 2016-05-26  6:22 ` js1304
  2016-06-22  9:23   ` Chen Feng
  2016-06-27  8:24   ` Vlastimil Babka
  2016-05-26  6:22 ` [PATCH v3 4/6] mm/cma: remove ALLOC_CMA js1304
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 34+ messages in thread
From: js1304 @ 2016-05-26  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, pages reserved for CMA have been managed in the ordinary
zones that their pfns belong to. This approach has numerous problems
and fixing them isn't easy. (They are described in the previous patch.)
To fix this situation, ZONE_CMA was introduced in the previous patch
but not yet populated. This patch implements the population of ZONE_CMA
by stealing reserved pages from the ordinary zones.

Unlike the previous implementation, where kernel allocation requests
with __GFP_MOVABLE could be serviced from the CMA region, only
allocation requests with GFP_HIGHUSER_MOVABLE can be serviced from the
CMA region in the new approach. This is an inevitable consequence of
using a zone, because ZONE_CMA could contain highmem. Due to this
decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.

I don't think this will be a problem because most file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
supported by the fact that many systems with ZONE_HIGHMEM work fine.
A notable disadvantage is that we cannot use these pages for blockdev
file cache pages, because they usually have __GFP_MOVABLE but not
__GFP_HIGHMEM and __GFP_USER. But in this case there are pros and
cons. In my experience, blockdev file cache pages are one of the top
reasons that cma_alloc() fails temporarily. So, we get a stronger
guarantee of cma_alloc() success by excluding that case.
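
As a rough illustration of the eligibility change described above, the
following toy model compares the two rules. It is not kernel code: the
TOY_GFP_* names and bit values are hypothetical, and only the two zone
modifier bits that matter for zone selection are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's values. */
#define TOY_GFP_HIGHMEM 0x1u
#define TOY_GFP_MOVABLE 0x2u

/* Old rule: any movable request could be served from CMA pageblocks. */
bool toy_may_use_cma_old(unsigned int gfp)
{
	return (gfp & TOY_GFP_MOVABLE) != 0;
}

/*
 * New rule: the request must be able to reach a highmem-like zone,
 * so both the highmem and the movable bit have to be set.
 */
bool toy_may_use_cma_new(unsigned int gfp)
{
	const unsigned int need = TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE;

	return (gfp & need) == need;
}
```

A request carrying only the movable bit, which is how the blockdev page
cache case is described above, passes the old check and fails the new
one.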

The implementation itself is very easy to understand. Steal the pages
when a CMA area is initialized and recalculate the various per-zone
stats and thresholds.
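
The stealing step can be sketched with a toy model, under the
assumption that all pages of one pageblock start out in the same
source zone. The toy_* types, the zone count and the CMA zone index
are hypothetical stand-ins, not the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_ZONES 3
#define TOY_ZONE_CMA 2	/* hypothetical index of the CMA zone */

struct toy_zone { long present_pages; };
struct toy_page { int zone_idx; };

/*
 * Move n pages starting at 'first' into the CMA zone. All n pages are
 * assumed to sit in a single source zone, as one pageblock would.
 */
void toy_steal_pageblock(struct toy_zone zones[TOY_NR_ZONES],
			 struct toy_page *first, size_t n)
{
	size_t i;

	assert(n > 0 && first->zone_idx != TOY_ZONE_CMA);

	/* Charge the old zone while the page still points at it. */
	zones[first->zone_idx].present_pages -= (long)n;

	/* Relink every page; this is what changes the zone lookup. */
	for (i = 0; i < n; i++)
		first[i].zone_idx = TOY_ZONE_CMA;

	/* The same lookup now lands on the CMA zone, so credit it. */
	zones[first->zone_idx].present_pages += (long)n;
}
```

The ordering matters: the source zone has to be debited before the
pages are relinked, because afterwards the lookup no longer finds it.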

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/memory_hotplug.h |  3 ---
 mm/cma.c                       | 41 +++++++++++++++++++++++++++++++++++++++++
 mm/internal.h                  |  3 +++
 mm/page_alloc.c                | 26 ++++++++++++++++++++++++--
 4 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index a864d79..6fde69b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/mm/cma.c b/mm/cma.c
index ea506eb..8684f50 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -38,6 +38,7 @@
 #include <trace/events/cma.h>
 
 #include "cma.h"
+#include "internal.h"
 
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
@@ -145,6 +146,11 @@ err:
 static int __init cma_init_reserved_areas(void)
 {
 	int i;
+	struct zone *zone;
+	unsigned long start_pfn = ULONG_MAX, end_pfn = 0;
+
+	if (!cma_area_count)
+		return 0;
 
 	for (i = 0; i < cma_area_count; i++) {
 		int ret = cma_activate_area(&cma_areas[i]);
@@ -153,6 +159,41 @@ static int __init cma_init_reserved_areas(void)
 			return ret;
 	}
 
+	for (i = 0; i < cma_area_count; i++) {
+		if (start_pfn > cma_areas[i].base_pfn)
+			start_pfn = cma_areas[i].base_pfn;
+		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
+			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
+	}
+
+	for_each_populated_zone(zone) {
+		if (!is_zone_cma(zone))
+			continue;
+
+		/* ZONE_CMA doesn't need to span beyond the CMA region */
+		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
+		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
+					zone->zone_start_pfn;
+	}
+
+	/*
+	 * Reserved pages for ZONE_CMA are now activated and this would change
+	 * ZONE_CMA's managed page counter and other zone's present counter.
+	 * We need to re-calculate various zone information that depends on
+	 * this initialization.
+	 */
+	build_all_zonelists(NULL, NULL);
+	for_each_populated_zone(zone) {
+		zone_pcp_update(zone);
+		set_zone_contiguous(zone);
+	}
+
+	/*
+	 * We need to re-init the per-zone wmark by calling
+	 * init_per_zone_wmark_min(), but we don't call it here because it
+	 * is registered via module_init and will be called later than us.
+	 */
+
 	return 0;
 }
 core_initcall(cma_init_reserved_areas);
diff --git a/mm/internal.h b/mm/internal.h
index b6ead95..4c37234 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -155,6 +155,9 @@ extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern int user_min_free_kbytes;
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0197d5d..796b271 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1572,16 +1572,38 @@ void __init page_alloc_init_late(void)
 }
 
 #ifdef CONFIG_CMA
+static void __init adjust_present_page_count(struct page *page, long count)
+{
+	struct zone *zone = page_zone(page);
+
+	/* We don't need to hold a lock since this runs during boot-up */
+	zone->present_pages += count;
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
 	unsigned i = pageblock_nr_pages;
+	unsigned long pfn = page_to_pfn(page);
 	struct page *p = page;
+	int nid = page_to_nid(page);
+
+	/*
+	 * ZONE_CMA will steal present pages from other zones by changing
+	 * page links so page_zone() is changed. Before that,
+	 * we need to adjust previous zone's page count first.
+	 */
+	adjust_present_page_count(page, -pageblock_nr_pages);
 
 	do {
 		__ClearPageReserved(p);
 		set_page_count(p, 0);
-	} while (++p, --i);
+
+		/* Steal pages from other zones */
+		set_page_links(p, ZONE_CMA, nid, pfn);
+	} while (++p, ++pfn, --i);
+
+	adjust_present_page_count(page, pageblock_nr_pages);
 
 	set_pageblock_migratetype(page, MIGRATE_CMA);
 
@@ -7545,7 +7567,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 }
 #endif
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
  * page high values need to be recalulated.
-- 
1.9.1

* [PATCH v3 4/6] mm/cma: remove ALLOC_CMA
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
                   ` (2 preceding siblings ...)
  2016-05-26  6:22 ` [PATCH v3 3/6] mm/cma: populate ZONE_CMA js1304
@ 2016-05-26  6:22 ` js1304
  2016-06-27  9:30   ` Vlastimil Babka
  2016-05-26  6:22 ` [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA js1304
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: js1304 @ 2016-05-26  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now, all pages reserved for the CMA region belong to ZONE_CMA,
and it only serves GFP_HIGHUSER_MOVABLE requests. Therefore, we don't
need to consider ALLOC_CMA at all.
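
A simplified model of the order-0 watermark check before and after
this change is sketched below. It is not the kernel's code: the toy_*
functions and their parameters are hypothetical, and the high-order
path is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old check: the zone mixes CMA and ordinary free pages, so requests
 * that may not use CMA must discount the free CMA pages first.
 */
bool toy_watermark_ok_old(long free_pages, long free_cma_pages,
			  long mark, long reserve, bool alloc_cma)
{
	if (!alloc_cma)
		free_pages -= free_cma_pages;

	return free_pages > mark + reserve;
}

/*
 * New check: CMA pages live in their own zone, so an ordinary zone's
 * free count contains no CMA pages and nothing has to be discounted.
 */
bool toy_watermark_ok_new(long free_pages, long mark, long reserve)
{
	return free_pages > mark + reserve;
}
```

With a separate zone, the decision about who may use CMA pages is made
once, when the zonelist is chosen, instead of on every watermark check.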

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/internal.h   |  3 +--
 mm/page_alloc.c | 27 +++------------------------
 2 files changed, 4 insertions(+), 26 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 4c37234..04b75d6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -468,8 +468,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
-#define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
-#define ALLOC_FAIR		0x100 /* fair zone allocation */
+#define ALLOC_FAIR		0x80 /* fair zone allocation */
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 796b271..bab3698 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2798,12 +2798,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	else
 		min -= min / 4;
 
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
-
 	/*
 	 * Check watermarks for an order-0 allocation request. If these
 	 * are not met, then a high-order request also cannot go ahead
@@ -2833,10 +2827,8 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		}
 
 #ifdef CONFIG_CMA
-		if ((alloc_flags & ALLOC_CMA) &&
-		    !list_empty(&area->free_list[MIGRATE_CMA])) {
+		if (!list_empty(&area->free_list[MIGRATE_CMA]))
 			return true;
-		}
 #endif
 	}
 	return false;
@@ -2853,13 +2845,6 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 		unsigned long mark, int classzone_idx, unsigned int alloc_flags)
 {
 	long free_pages = zone_page_state(z, NR_FREE_PAGES);
-	long cma_pages = 0;
-
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
 
 	/*
 	 * Fast check for order-0 only. If this fails then the reserves
@@ -2868,7 +2853,7 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 	 * the caller is !atomic then it'll uselessly search the free
 	 * list. That corner case is then slower but it is harmless.
 	 */
-	if (!order && (free_pages - cma_pages) > mark + z->lowmem_reserve[classzone_idx])
+	if (!order && free_pages > mark + z->lowmem_reserve[classzone_idx])
 		return true;
 
 	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
@@ -3475,10 +3460,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 				 unlikely(test_thread_flag(TIF_MEMDIE))))
 			alloc_flags |= ALLOC_NO_WATERMARKS;
 	}
-#ifdef CONFIG_CMA
-	if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
+
 	return alloc_flags;
 }
 
@@ -3833,9 +3815,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (unlikely(!zonelist->_zonerefs->zone))
 		return NULL;
 
-	if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
-- 
1.9.1

* [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
                   ` (3 preceding siblings ...)
  2016-05-26  6:22 ` [PATCH v3 4/6] mm/cma: remove ALLOC_CMA js1304
@ 2016-05-26  6:22 ` js1304
  2016-05-27  1:42   ` Chen Feng
  2016-06-27  9:46   ` Vlastimil Babka
  2016-05-26  6:22 ` [PATCH v3 6/6] mm/cma: remove per zone CMA stat js1304
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 34+ messages in thread
From: js1304 @ 2016-05-26  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now, all pages reserved for the CMA region belong to ZONE_CMA
and there is no other type of page in it. Therefore, we don't need
MIGRATE_CMA to distinguish CMA pages from ordinary pages and handle
them differently. Remove MIGRATE_CMA.

Unfortunately, this patch makes the free CMA counter incorrect because
we count it when pages are on the MIGRATE_CMA free list. It will be
fixed by the next patch. I could squash the next patch into this one,
but that would make the changes complicated and hard to review, so I
keep them separate.
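
Why the free CMA counter goes stale can be seen in a toy model of the
accounting helper. This is only a sketch: the toy_* names are
hypothetical and the counters are plain longs rather than the kernel's
per-zone vmstat machinery.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_stat { TOY_NR_FREE_PAGES, TOY_NR_FREE_CMA_PAGES, TOY_NR_STATS };

struct toy_zone_stat { long count[TOY_NR_STATS]; };

/* Old helper: the CMA counter follows the pageblock's migratetype. */
void toy_mod_freepage_state_old(struct toy_zone_stat *z, long nr,
				bool is_cma_block)
{
	z->count[TOY_NR_FREE_PAGES] += nr;
	if (is_cma_block)
		z->count[TOY_NR_FREE_CMA_PAGES] += nr;
}

/*
 * After this patch: there is no CMA migratetype left to test, so only
 * the generic counter is updated and the CMA counter never moves.
 */
void toy_mod_freepage_state_new(struct toy_zone_stat *z, long nr)
{
	z->count[TOY_NR_FREE_PAGES] += nr;
}
```

Freeing eight pages of a CMA pageblock bumps both counters in the old
model and only the generic one in the new model, which is the gap the
message above says the next patch closes.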

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/gfp.h    |  3 +-
 include/linux/mmzone.h | 22 -------------
 include/linux/vmstat.h |  8 -----
 mm/cma.c               |  2 +-
 mm/compaction.c        | 10 ++----
 mm/hugetlb.c           |  2 +-
 mm/page_alloc.c        | 87 +++++++++++++-------------------------------------
 mm/page_isolation.c    |  5 ++-
 mm/vmstat.c            |  5 +--
 9 files changed, 31 insertions(+), 113 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4d6c008..1a3b869 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -559,8 +559,7 @@ static inline bool pm_suspended_storage(void)
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      unsigned migratetype);
+extern int alloc_contig_range(unsigned long start, unsigned long end);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 #endif
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 54c92a6..236d0bd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,22 +41,6 @@ enum {
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
-#ifdef CONFIG_CMA
-	/*
-	 * MIGRATE_CMA migration type is designed to mimic the way
-	 * ZONE_MOVABLE works.  Only movable pages can be allocated
-	 * from MIGRATE_CMA pageblocks and page allocator never
-	 * implicitly change migration type of MIGRATE_CMA pageblock.
-	 *
-	 * The way to use it is to change migratetype of a range of
-	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
-	 * a single pageblock.
-	 */
-	MIGRATE_CMA,
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	MIGRATE_ISOLATE,	/* can't allocate from here */
 #endif
@@ -66,12 +50,6 @@ enum {
 /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
 extern char * const migratetype_names[MIGRATE_TYPES];
 
-#ifdef CONFIG_CMA
-#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-#else
-#  define is_migrate_cma(migratetype) false
-#endif
-
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 0aa613d..e0eb3e5 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -264,14 +264,6 @@ static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
 #endif		/* CONFIG_SMP */
 
-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
-					     int migratetype)
-{
-	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
-	if (is_migrate_cma(migratetype))
-		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
-}
-
 extern const char * const vmstat_text[];
 
 #endif /* _LINUX_VMSTAT_H */
diff --git a/mm/cma.c b/mm/cma.c
index 8684f50..bd436e4 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -444,7 +444,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align)
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+		ret = alloc_contig_range(pfn, pfn + count);
 		mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
diff --git a/mm/compaction.c b/mm/compaction.c
index 1427366..acb1d1a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -76,7 +76,7 @@ static void map_pages(struct list_head *list)
 
 static inline bool migrate_async_suitable(int migratetype)
 {
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+	return migratetype == MIGRATE_MOVABLE;
 }
 
 #ifdef CONFIG_COMPACTION
@@ -953,7 +953,7 @@ static bool suitable_migration_target(struct page *page)
 			return false;
 	}
 
-	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
+	/* If the block is MIGRATE_MOVABLE, allow migration */
 	if (migrate_async_suitable(get_pageblock_migratetype(page)))
 		return true;
 
@@ -1277,12 +1277,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 		if (!list_empty(&area->free_list[migratetype]))
 			return COMPACT_PARTIAL;
 
-#ifdef CONFIG_CMA
-		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
-		if (migratetype == MIGRATE_MOVABLE &&
-			!list_empty(&area->free_list[MIGRATE_CMA]))
-			return COMPACT_PARTIAL;
-#endif
 		/*
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d26162e..a081f15 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1029,7 +1029,7 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
 				unsigned long nr_pages)
 {
 	unsigned long end_pfn = start_pfn + nr_pages;
-	return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	return alloc_contig_range(start_pfn, end_pfn);
 }
 
 static bool pfn_range_valid_gigantic(struct zone *z,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bab3698..e1c17d15 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -124,8 +124,8 @@ gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
  * put on a pcplist. Used to avoid the pageblock migratetype lookup when
  * freeing from pcplists in most cases, at the cost of possibly becoming stale.
  * Also the migratetype set in the page does not necessarily match the pcplist
- * index, e.g. page might have MIGRATE_CMA set but be on a pcplist with any
- * other index - this ensures that it will be put on the correct CMA freelist.
+ * index, e.g. page might have MIGRATE_MOVABLE set but be on a pcplist with any
+ * other index - this ensures that it will be put on the correct freelist.
  */
 static inline int get_pcppage_migratetype(struct page *page)
 {
@@ -234,9 +234,6 @@ char * const migratetype_names[MIGRATE_TYPES] = {
 	"Movable",
 	"Reclaimable",
 	"HighAtomic",
-#ifdef CONFIG_CMA
-	"CMA",
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	"Isolate",
 #endif
@@ -670,7 +667,7 @@ static inline void set_page_guard(struct zone *zone, struct page *page,
 	INIT_LIST_HEAD(&page->lru);
 	set_page_private(page, order);
 	/* Guard pages are not available for any usage */
-	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
+	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
 }
 
 static inline void clear_page_guard(struct zone *zone, struct page *page,
@@ -689,7 +686,7 @@ static inline void clear_page_guard(struct zone *zone, struct page *page,
 
 	set_page_private(page, 0);
 	if (!is_migrate_isolate(migratetype))
-		__mod_zone_freepage_state(zone, (1 << order), migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, (1 << order));
 }
 #else
 struct page_ext_operations debug_guardpage_ops = { NULL, };
@@ -800,7 +797,7 @@ static inline void __free_one_page(struct page *page,
 
 	VM_BUG_ON(migratetype == -1);
 	if (likely(!is_migrate_isolate(migratetype)))
-		__mod_zone_freepage_state(zone, 1 << order, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
 
 	page_idx = pfn & ((1 << MAX_ORDER) - 1);
 
@@ -1580,7 +1577,7 @@ static void __init adjust_present_page_count(struct page *page, long count)
 	zone->present_pages += count;
 }
 
-/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
+/* Free whole pageblock and set its migration type to MIGRATE_MOVABLE. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
 	unsigned i = pageblock_nr_pages;
@@ -1605,7 +1602,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 
 	adjust_present_page_count(page, pageblock_nr_pages);
 
-	set_pageblock_migratetype(page, MIGRATE_CMA);
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 
 	if (pageblock_order >= MAX_ORDER) {
 		i = pageblock_nr_pages;
@@ -1830,25 +1827,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
-#ifdef CONFIG_CMA
-	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	[MIGRATE_ISOLATE]     = { MIGRATE_TYPES }, /* Never used */
 #endif
 };
 
-#ifdef CONFIG_CMA
-static struct page *__rmqueue_cma_fallback(struct zone *zone,
-					unsigned int order)
-{
-	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
-}
-#else
-static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
-					unsigned int order) { return NULL; }
-#endif
-
 /*
  * Move the free pages in a range to the free lists of the requested type.
  * Note that start_page and end_pages are not aligned on a pageblock
@@ -2053,7 +2036,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	/* Yoink! */
 	mt = get_pageblock_migratetype(page);
 	if (mt != MIGRATE_HIGHATOMIC &&
-			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
+			!is_migrate_isolate(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
 		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
@@ -2156,9 +2139,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 		/*
 		 * The pcppage_migratetype may differ from pageblock's
 		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
+		 * find_suitable_fallback(). This is OK.
 		 */
 		set_pcppage_migratetype(page, start_migratetype);
 
@@ -2181,13 +2162,8 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 	struct page *page;
 
 	page = __rmqueue_smallest(zone, order, migratetype);
-	if (unlikely(!page)) {
-		if (migratetype == MIGRATE_MOVABLE)
-			page = __rmqueue_cma_fallback(zone, order);
-
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype);
-	}
+	if (unlikely(!page))
+		page = __rmqueue_fallback(zone, order, migratetype);
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
 	return page;
@@ -2227,9 +2203,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		else
 			list_add_tail(&page->lru, list);
 		list = &page->lru;
-		if (is_migrate_cma(get_pcppage_migratetype(page)))
-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-					      -(1 << order));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock(&zone->lock);
@@ -2527,7 +2500,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
 			return 0;
 
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
 	}
 
 	/* Remove page from free list */
@@ -2542,7 +2515,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
+			if (!is_migrate_isolate(mt))
 				set_pageblock_migratetype(page,
 							  MIGRATE_MOVABLE);
 		}
@@ -2670,8 +2643,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 		if (!page)
 			goto failed;
 		__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
-		__mod_zone_freepage_state(zone, -(1 << order),
-					  get_pcppage_migratetype(page));
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
 	}
 
 	if (atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]) <= 0 &&
@@ -2825,11 +2797,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			if (!list_empty(&area->free_list[mt]))
 				return true;
 		}
-
-#ifdef CONFIG_CMA
-		if (!list_empty(&area->free_list[MIGRATE_CMA]))
-			return true;
-#endif
 	}
 	return false;
 }
@@ -4320,9 +4287,6 @@ static void show_migration_types(unsigned char type)
 		[MIGRATE_MOVABLE]	= 'M',
 		[MIGRATE_RECLAIMABLE]	= 'E',
 		[MIGRATE_HIGHATOMIC]	= 'H',
-#ifdef CONFIG_CMA
-		[MIGRATE_CMA]		= 'C',
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 		[MIGRATE_ISOLATE]	= 'I',
 #endif
@@ -7244,7 +7208,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		return false;
 
 	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
+	if (mt == MIGRATE_MOVABLE)
 		return false;
 
 	pfn = page_to_pfn(page);
@@ -7392,15 +7356,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
- * @migratetype:	migratetype of the underlaying pageblocks (either
- *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
- *			in range must have the same migratetype and it must
- *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, however it's the caller's responsibility to guarantee that
  * we are the only thread that changes migrate type of pageblocks the
- * pages fall in.
+ * pages fall in and it should be MIGRATE_MOVABLE.
  *
  * The PFN range must belong to a single zone.
  *
@@ -7408,8 +7368,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  * pages which PFN is in [start, end) are allocated for the caller and
  * need to be freed with free_contig_range().
  */
-int alloc_contig_range(unsigned long start, unsigned long end,
-		       unsigned migratetype)
+int alloc_contig_range(unsigned long start, unsigned long end)
 {
 	unsigned long outer_start, outer_end;
 	unsigned int order;
@@ -7442,14 +7401,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * allocator removing them from the buddy system.  This way
 	 * page allocator will never consider using them.
 	 *
-	 * This lets us mark the pageblocks back as
-	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
-	 * aligned range but not in the unaligned, original range are
-	 * put back to page allocator so that buddy can use them.
+	 * This lets us mark the pageblocks back as MIGRATE_MOVABLE
+	 * so that free pages in the aligned range but not in the
+	 * unaligned, original range are put back to page allocator
+	 * so that buddy can use them.
 	 */
 
 	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype,
+				       pfn_max_align_up(end), MIGRATE_MOVABLE,
 				       false);
 	if (ret)
 		return ret;
@@ -7528,7 +7487,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 done:
 	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+				pfn_max_align_up(end), MIGRATE_MOVABLE);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 612122b..5708649 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -61,13 +61,12 @@ static int set_migratetype_isolate(struct page *page,
 out:
 	if (!ret) {
 		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -nr_pages);
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -122,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 */
 	if (!isolated_page) {
 		nr_pages = move_freepages_block(zone, page, migratetype);
-		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
 	}
 	set_pageblock_migratetype(page, migratetype);
 	zone->nr_isolate_pageblock--;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 48c4942..8d18d1e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1088,10 +1088,7 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 
 			page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
 			if (pageblock_mt != page_mt) {
-				if (is_migrate_cma(pageblock_mt))
-					count[MIGRATE_MOVABLE]++;
-				else
-					count[pageblock_mt]++;
+				count[pageblock_mt]++;
 
 				pfn = block_end_pfn;
 				break;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v3 6/6] mm/cma: remove per zone CMA stat
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
                   ` (4 preceding siblings ...)
  2016-05-26  6:22 ` [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA js1304
@ 2016-05-26  6:22 ` js1304
  2016-06-27  9:54   ` Vlastimil Babka
  2016-05-26  8:04 ` [PATCH v3 0/6] Introduce ZONE_CMA Feng Tang
  2016-06-27 11:25 ` Balbir Singh
  7 siblings, 1 reply; 34+ messages in thread
From: js1304 @ 2016-05-26  6:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now that all reserved pages for the CMA region belong to ZONE_CMA,
we no longer need to maintain a CMA stat in other zones. Remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 fs/proc/meminfo.c      |  2 +-
 include/linux/cma.h    |  6 ++++++
 include/linux/mmzone.h |  1 -
 mm/cma.c               | 15 +++++++++++++++
 mm/page_alloc.c        |  5 ++---
 mm/vmstat.c            |  1 -
 6 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8372046..bd853725c 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -167,7 +167,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 #ifdef CONFIG_CMA
 		, K(totalcma_pages)
-		, K(global_page_state(NR_FREE_CMA_PAGES))
+		, K(cma_get_free())
 #endif
 		);
 
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..816290c 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,10 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
+
+#ifdef CONFIG_CMA
+extern unsigned long cma_get_free(void);
+#else
+static inline unsigned long cma_get_free(void) { return 0; }
+#endif
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 236d0bd..59f2181 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -130,7 +130,6 @@ enum zone_stat_item {
 	WORKINGSET_ACTIVATE,
 	WORKINGSET_NODERECLAIM,
 	NR_ANON_TRANSPARENT_HUGEPAGES,
-	NR_FREE_CMA_PAGES,
 	NR_VM_ZONE_STAT_ITEMS };
 
 /*
diff --git a/mm/cma.c b/mm/cma.c
index bd436e4..6dbddf2 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,6 +54,21 @@ unsigned long cma_get_size(const struct cma *cma)
 	return cma->count << PAGE_SHIFT;
 }
 
+unsigned long cma_get_free(void)
+{
+	struct zone *zone;
+	unsigned long freecma = 0;
+
+	for_each_populated_zone(zone) {
+		if (!is_zone_cma(zone))
+			continue;
+
+		freecma += zone_page_state(zone, NR_FREE_PAGES);
+	}
+
+	return freecma;
+}
+
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
 					     int align_order)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e1c17d15..da6e6cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -63,6 +63,7 @@
 #include <linux/sched/rt.h>
 #include <linux/page_owner.h>
 #include <linux/kthread.h>
+#include <linux/cma.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -4351,7 +4352,7 @@ void show_free_areas(unsigned int filter)
 		global_page_state(NR_BOUNCE),
 		global_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_page_state(NR_FREE_CMA_PAGES));
+		cma_get_free());
 
 	for_each_populated_zone(zone) {
 		int i;
@@ -4391,7 +4392,6 @@ void show_free_areas(unsigned int filter)
 			" bounce:%lukB"
 			" free_pcp:%lukB"
 			" local_pcp:%ukB"
-			" free_cma:%lukB"
 			" writeback_tmp:%lukB"
 			" pages_scanned:%lu"
 			" all_unreclaimable? %s"
@@ -4424,7 +4424,6 @@ void show_free_areas(unsigned int filter)
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
 			K(this_cpu_read(zone->pageset->pcp.count)),
-			K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
 			K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
 			K(zone_page_state(zone, NR_PAGES_SCANNED)),
 			(!zone_reclaimable(zone) ? "yes" : "no")
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8d18d1e..9607f99 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -750,7 +750,6 @@ const char * const vmstat_text[] = {
 	"workingset_activate",
 	"workingset_nodereclaim",
 	"nr_anon_transparent_hugepages",
-	"nr_free_cma",
 
 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
                   ` (5 preceding siblings ...)
  2016-05-26  6:22 ` [PATCH v3 6/6] mm/cma: remove per zone CMA stat js1304
@ 2016-05-26  8:04 ` Feng Tang
  2016-05-27  5:28   ` Joonsoo Kim
  2016-06-27 11:25 ` Balbir Singh
  7 siblings, 1 reply; 34+ messages in thread
From: Feng Tang @ 2016-05-26  8:04 UTC (permalink / raw)
  To: js1304
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim

On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Hi Joonsoo,

Nice work!

> 
> Hello,
> 
> Changes from v2
> o Rebase on next-20160525
> o No other changes except following description
> 
> There was a discussion with Mel [1] after LSF/MM 2016. I could summarise
> it to help merge decision but it's better to read by yourself since
> if I summarise it, it would be biased for me. But, if anyone hope
> the summary, I will do it. :)
> 
> Anyway, Mel's position on this patchset seems to be neutral. He said:
> "I'm not going to outright NAK your series but I won't ACK it either"
> 
> We can fix the problems with any approach but I hope to go a new zone
> approach because it is less error-prone. It reduces some corner case
> handling for now and remove need for potential corner case handling to fix
> problems.
> 
> Note that our company is already using ZONE_CMA for a years and
> there is no problem.
> 
> If anyone has a different opinion, please let me know and let's discuss
> together.
> 
> Andrew, if there is something to do for merge, please let me know.
> 
> [1] https://lkml.kernel.org/r/20160425053653.GA25662@js1304-P5Q-DELUXE
> 
> Changes from v1
> o Separate some patches which deserve to submit independently
> o Modify description to reflect current kernel state
> (e.g. high-order watermark problem disappeared by Mel's work)
> o Don't increase SECTION_SIZE_BITS to make a room in page flags
> (detailed reason is on the patch that adds ZONE_CMA)
> o Adjust ZONE_CMA population code
> 
> This series try to solve problems of current CMA implementation.
> 
> CMA is introduced to provide physically contiguous pages at runtime
> without exclusive reserved memory area. But, current implementation
> works like as previous reserved memory approach, because freepages
> on CMA region are used only if there is no movable freepage. In other
> words, freepages on CMA region are only used as fallback. In that
> situation where freepages on CMA region are used as fallback, kswapd
> would be woken up easily since there is no unmovable and reclaimable
> freepage, too. If kswapd starts to reclaim memory, fallback allocation
> to MIGRATE_CMA doesn't occur any more since movable freepages are
> already refilled by kswapd and then most of freepage on CMA are left
> to be in free. This situation looks like exclusive reserved memory case.
> 
> In my experiment, I found that if system memory has 1024 MB memory and
> 512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
> free memory is left. Detailed reason is that for keeping enough free
> memory for unmovable and reclaimable allocation, kswapd uses below
> equation when calculating free memory and it easily go under the watermark.
> 
> Free memory for unmovable and reclaimable = Free total - Free CMA pages
> 
> This is derived from the property of CMA freepage that CMA freepage
> can't be used for unmovable and reclaimable allocation.
> 
> Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
> is lower than low watermark and tries to make free memory until
> (FreeTotal - FreeCMA) is higher than high watermark. That results
> in that FreeTotal is moving around 512MB boundary consistently. It
> then means that we can't utilize full memory capacity.
> 
> To fix this problem, I submitted some patches [1] about 10 months ago,
> but, found some more problems to be fixed before solving this problem.
> It requires many hooks in allocator hotpath so some developers don't
> like it. Instead, some of them suggested a different approach [2] to fix
> all the problems related to CMA, that is, introducing a new zone to deal
> with free CMA pages. I agree that it is the best way to go so implement
> here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
> decide to add a new zone rather than piggyback on ZONE_MOVABLE since
> they have some differences. First, reserved CMA pages should not be
> offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
> MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
> to distinguish hotpluggable memory and reserved memory for CMA in the same
> zone. It would make memory hotplug code which is already complicated
> more complicated. Second, cma_alloc() can be called more frequently
> than memory hotplug operation and possibly we need to control
> allocation rate of ZONE_CMA to optimize latency in the future.
> In this case, separate zone approach is easy to modify. Third, I'd
> like to see statistics for CMA, separately. Sometimes, we need to debug
> why cma_alloc() failed and separate statistics would be more helpful
> in this situation.
> 
> Anyway, this patchset solves four problems related to CMA implementation.
> 
> 1) Utilization problem
> As mentioned above, we can't utilize full memory capacity due to the
> limitation of CMA freepage and fallback policy. This patchset implements
> a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This
> typed allocation is used for page cache and anonymous pages which
> occupies most of memory usage in normal case so we can utilize full
> memory capacity. Below is the experiment result about this problem.
> 
> 8 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
> 
> <Before this series>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           92.4		186.5
> pswpin:                 82		18647
> pswpout:                160		69839
> 
> <After this series>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           93.1		93.4
> pswpin:                 84		46
> pswpout:                183		92
> 
> FYI, there is another attempt [3] trying to solve this problem in lkml.
> And, as far as I know, Qualcomm also has out-of-tree solution for this
> problem.

This may be a little off-topic :) Actually, we have used another way in
our products: we disable the fallback from MIGRATE_MOVABLE to
MIGRATE_CMA completely, and only allow free CMA memory to be used
by file page cache (which is easy to reclaim by its nature).
We did it by adding a GFP_PAGE_CACHE flag to every allocation request for
page cache, so the MM tries to pick an available free CMA page
first and falls back to the normal path when that fails.
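
The allocation order described above can be sketched roughly as follows.
This is a userspace model only, not the out-of-tree kernel change: the
names sketch_mem, sketch_alloc_page() and SKETCH_GFP_PAGE_CACHE are
hypothetical stand-ins (the last one for the GFP_PAGE_CACHE flag), and
the two counters merely model "free CMA pages" and "free normal pages".

```c
#include <assert.h>

/* Hypothetical flag standing in for the out-of-tree GFP_PAGE_CACHE. */
#define SKETCH_GFP_PAGE_CACHE 0x1u

/* Which pool a request was served from (NONE means allocation failed). */
enum sketch_pool { SKETCH_POOL_NONE, SKETCH_POOL_NORMAL, SKETCH_POOL_CMA };

/* Toy model of free memory: just two page counters. */
struct sketch_mem {
	unsigned long free_normal;
	unsigned long free_cma;
};

/*
 * Page cache requests try a free CMA page first and fall back to the
 * normal pool. All other requests use only the normal pool, which models
 * the disabled MIGRATE_MOVABLE -> MIGRATE_CMA fallback.
 */
static enum sketch_pool sketch_alloc_page(struct sketch_mem *m,
					  unsigned int flags)
{
	if ((flags & SKETCH_GFP_PAGE_CACHE) && m->free_cma > 0) {
		m->free_cma--;
		return SKETCH_POOL_CMA;
	}
	if (m->free_normal > 0) {
		m->free_normal--;
		return SKETCH_POOL_NORMAL;
	}
	return SKETCH_POOL_NONE;
}
```

In this model a request without the flag fails once the normal pool is
empty, even if CMA pages are still free, which is the trade-off of
disabling the fallback completely.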

It works fine on our products, though we still see some cases where
some pages can't be reclaimed.

Our product has a special use case for CMA: sometimes it needs
to use the whole CMA memory (say 256MB on a phone), so all
shared-out CMA pages need to be reclaimed at once. I don't know if
this new ZONE_CMA approach could meet this requirement? (Our page cache
solution can't guarantee to meet it all the time.)

Thanks,
Feng

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA
  2016-05-26  6:22 ` [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA js1304
@ 2016-05-27  1:42   ` Chen Feng
  2016-05-27  5:32     ` Joonsoo Kim
  2016-06-27  9:46   ` Vlastimil Babka
  1 sibling, 1 reply; 34+ messages in thread
From: Chen Feng @ 2016-05-27  1:42 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim, qijiwen, Zhuangluan Su, Dan Zhao

Hi Joonsoo,

On 2016/5/26 14:22, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Now, all reserved pages for CMA region are belong to the ZONE_CMA
> and there is no other type of pages. Therefore, we don't need to
> use MIGRATE_CMA to distinguish and handle differently for CMA pages
> and ordinary pages. Remove MIGRATE_CMA.
> 
> Unfortunately, this patch make free CMA counter incorrect because
> we count it when pages are on the MIGRATE_CMA. It will be fixed
> by next patch. I can squash next patch here but it makes changes
> complicated and hard to review so I separate that.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/gfp.h    |  3 +-
>  include/linux/mmzone.h | 22 -------------
>  include/linux/vmstat.h |  8 -----
>  mm/cma.c               |  2 +-
>  mm/compaction.c        | 10 ++----
>  mm/hugetlb.c           |  2 +-
>  mm/page_alloc.c        | 87 +++++++++++++-------------------------------------
>  mm/page_isolation.c    |  5 ++-
>  mm/vmstat.c            |  5 +--
>  9 files changed, 31 insertions(+), 113 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 4d6c008..1a3b869 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -559,8 +559,7 @@ static inline bool pm_suspended_storage(void)
>  
>  #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
>  /* The below functions must be run on a range from a single zone. */
> -extern int alloc_contig_range(unsigned long start, unsigned long end,
> -			      unsigned migratetype);
> +extern int alloc_contig_range(unsigned long start, unsigned long end);
>  extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
>  #endif
>  
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 54c92a6..236d0bd 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -41,22 +41,6 @@ enum {
>  	MIGRATE_RECLAIMABLE,
>  	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
>  	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
> -#ifdef CONFIG_CMA
> -	/*
> -	 * MIGRATE_CMA migration type is designed to mimic the way
> -	 * ZONE_MOVABLE works.  Only movable pages can be allocated
> -	 * from MIGRATE_CMA pageblocks and page allocator never
> -	 * implicitly change migration type of MIGRATE_CMA pageblock.
> -	 *
> -	 * The way to use it is to change migratetype of a range of
> -	 * pageblocks to MIGRATE_CMA which can be done by
> -	 * __free_pageblock_cma() function.  What is important though
> -	 * is that a range of pageblocks must be aligned to
> -	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
> -	 * a single pageblock.
> -	 */
> -	MIGRATE_CMA,
> -#endif
>  #ifdef CONFIG_MEMORY_ISOLATION
>  	MIGRATE_ISOLATE,	/* can't allocate from here */
>  #endif
> @@ -66,12 +50,6 @@ enum {
>  /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
>  extern char * const migratetype_names[MIGRATE_TYPES];
>  
> -#ifdef CONFIG_CMA
> -#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> -#else
> -#  define is_migrate_cma(migratetype) false
> -#endif
> -
>  #define for_each_migratetype_order(order, type) \
>  	for (order = 0; order < MAX_ORDER; order++) \
>  		for (type = 0; type < MIGRATE_TYPES; type++)
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 0aa613d..e0eb3e5 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -264,14 +264,6 @@ static inline void drain_zonestat(struct zone *zone,
>  			struct per_cpu_pageset *pset) { }
>  #endif		/* CONFIG_SMP */
>  
> -static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
> -					     int migratetype)
> -{
> -	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
> -	if (is_migrate_cma(migratetype))
> -		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
> -}
> -
>  extern const char * const vmstat_text[];
>  
>  #endif /* _LINUX_VMSTAT_H */
> diff --git a/mm/cma.c b/mm/cma.c
> index 8684f50..bd436e4 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -444,7 +444,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align)
>  
>  		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>  		mutex_lock(&cma_mutex);
> -		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
> +		ret = alloc_contig_range(pfn, pfn + count);
>  		mutex_unlock(&cma_mutex);
>  		if (ret == 0) {
>  			page = pfn_to_page(pfn);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 1427366..acb1d1a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -76,7 +76,7 @@ static void map_pages(struct list_head *list)
>  
>  static inline bool migrate_async_suitable(int migratetype)
>  {
> -	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
> +	return migratetype == MIGRATE_MOVABLE;
>  }
>  
>  #ifdef CONFIG_COMPACTION
> @@ -953,7 +953,7 @@ static bool suitable_migration_target(struct page *page)
>  			return false;
>  	}
>  
> -	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
> +	/* If the block is MIGRATE_MOVABLE, allow migration */
>  	if (migrate_async_suitable(get_pageblock_migratetype(page)))
>  		return true;
>  
> @@ -1277,12 +1277,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
>  		if (!list_empty(&area->free_list[migratetype]))
>  			return COMPACT_PARTIAL;
>  
> -#ifdef CONFIG_CMA
> -		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
> -		if (migratetype == MIGRATE_MOVABLE &&
> -			!list_empty(&area->free_list[MIGRATE_CMA]))
> -			return COMPACT_PARTIAL;
> -#endif
>  		/*
>  		 * Job done if allocation would steal freepages from
>  		 * other migratetype buddy lists.
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d26162e..a081f15 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1029,7 +1029,7 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
>  				unsigned long nr_pages)
>  {
>  	unsigned long end_pfn = start_pfn + nr_pages;
> -	return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
> +	return alloc_contig_range(start_pfn, end_pfn);
>  }
>  
>  static bool pfn_range_valid_gigantic(struct zone *z,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bab3698..e1c17d15 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -124,8 +124,8 @@ gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
>   * put on a pcplist. Used to avoid the pageblock migratetype lookup when
>   * freeing from pcplists in most cases, at the cost of possibly becoming stale.
>   * Also the migratetype set in the page does not necessarily match the pcplist
> - * index, e.g. page might have MIGRATE_CMA set but be on a pcplist with any
> - * other index - this ensures that it will be put on the correct CMA freelist.
> + * index, e.g. page might have MIGRATE_MOVABLE set but be on a pcplist with any
> + * other index - this ensures that it will be put on the correct freelist.
>   */
>  static inline int get_pcppage_migratetype(struct page *page)
>  {
> @@ -234,9 +234,6 @@ char * const migratetype_names[MIGRATE_TYPES] = {
>  	"Movable",
>  	"Reclaimable",
>  	"HighAtomic",
> -#ifdef CONFIG_CMA
> -	"CMA",
> -#endif
>  #ifdef CONFIG_MEMORY_ISOLATION
>  	"Isolate",
>  #endif
> @@ -670,7 +667,7 @@ static inline void set_page_guard(struct zone *zone, struct page *page,
>  	INIT_LIST_HEAD(&page->lru);
>  	set_page_private(page, order);
>  	/* Guard pages are not available for any usage */
> -	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
> +	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
>  }
>  
>  static inline void clear_page_guard(struct zone *zone, struct page *page,
> @@ -689,7 +686,7 @@ static inline void clear_page_guard(struct zone *zone, struct page *page,
>  
>  	set_page_private(page, 0);
>  	if (!is_migrate_isolate(migratetype))
> -		__mod_zone_freepage_state(zone, (1 << order), migratetype);
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, (1 << order));
>  }
>  #else
>  struct page_ext_operations debug_guardpage_ops = { NULL, };
> @@ -800,7 +797,7 @@ static inline void __free_one_page(struct page *page,
>  
>  	VM_BUG_ON(migratetype == -1);
>  	if (likely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
>  
>  	page_idx = pfn & ((1 << MAX_ORDER) - 1);
>  
> @@ -1580,7 +1577,7 @@ static void __init adjust_present_page_count(struct page *page, long count)
>  	zone->present_pages += count;
>  }
>  
> -/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> +/* Free whole pageblock and set its migration type to MIGRATE_MOVABLE. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
>  	unsigned i = pageblock_nr_pages;
> @@ -1605,7 +1602,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  
>  	adjust_present_page_count(page, pageblock_nr_pages);
>  
> -	set_pageblock_migratetype(page, MIGRATE_CMA);
> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);

I have a question here. If the ZONE_CMA pages are all MIGRATE_MOVABLE,

then unmovable allocations will also use CMA memory. Is this right?

How can the CMA memory be migrated?

>  
>  	if (pageblock_order >= MAX_ORDER) {
>  		i = pageblock_nr_pages;
> @@ -1830,25 +1827,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
> -#ifdef CONFIG_CMA
> -	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
> -#endif
>  #ifdef CONFIG_MEMORY_ISOLATION
>  	[MIGRATE_ISOLATE]     = { MIGRATE_TYPES }, /* Never used */
>  #endif
>  };
>  
> -#ifdef CONFIG_CMA
> -static struct page *__rmqueue_cma_fallback(struct zone *zone,
> -					unsigned int order)
> -{
> -	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
> -}
> -#else
> -static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
> -					unsigned int order) { return NULL; }
> -#endif
> -
>  /*
>   * Move the free pages in a range to the free lists of the requested type.
>   * Note that start_page and end_pages are not aligned on a pageblock
> @@ -2053,7 +2036,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
>  	/* Yoink! */
>  	mt = get_pageblock_migratetype(page);
>  	if (mt != MIGRATE_HIGHATOMIC &&
> -			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
> +			!is_migrate_isolate(mt)) {
>  		zone->nr_reserved_highatomic += pageblock_nr_pages;
>  		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
>  		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
> @@ -2156,9 +2139,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
>  		/*
>  		 * The pcppage_migratetype may differ from pageblock's
>  		 * migratetype depending on the decisions in
> -		 * find_suitable_fallback(). This is OK as long as it does not
> -		 * differ for MIGRATE_CMA pageblocks. Those can be used as
> -		 * fallback only via special __rmqueue_cma_fallback() function
> +		 * find_suitable_fallback(). This is OK.
>  		 */
>  		set_pcppage_migratetype(page, start_migratetype);
>  
> @@ -2181,13 +2162,8 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  	struct page *page;
>  
>  	page = __rmqueue_smallest(zone, order, migratetype);
> -	if (unlikely(!page)) {
> -		if (migratetype == MIGRATE_MOVABLE)
> -			page = __rmqueue_cma_fallback(zone, order);
> -
> -		if (!page)
> -			page = __rmqueue_fallback(zone, order, migratetype);
> -	}
> +	if (unlikely(!page))
> +		page = __rmqueue_fallback(zone, order, migratetype);
>  
>  	trace_mm_page_alloc_zone_locked(page, order, migratetype);
>  	return page;
> @@ -2227,9 +2203,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  		else
>  			list_add_tail(&page->lru, list);
>  		list = &page->lru;
> -		if (is_migrate_cma(get_pcppage_migratetype(page)))
> -			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> -					      -(1 << order));
>  	}
>  	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
>  	spin_unlock(&zone->lock);
> @@ -2527,7 +2500,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
>  			return 0;
>  
> -		__mod_zone_freepage_state(zone, -(1UL << order), mt);
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
>  	}
>  
>  	/* Remove page from free list */
> @@ -2542,7 +2515,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  		struct page *endpage = page + (1 << order) - 1;
>  		for (; page < endpage; page += pageblock_nr_pages) {
>  			int mt = get_pageblock_migratetype(page);
> -			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
> +			if (!is_migrate_isolate(mt))
>  				set_pageblock_migratetype(page,
>  							  MIGRATE_MOVABLE);
>  		}
> @@ -2670,8 +2643,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
>  		if (!page)
>  			goto failed;
>  		__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
> -		__mod_zone_freepage_state(zone, -(1 << order),
> -					  get_pcppage_migratetype(page));
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
>  	}
>  
>  	if (atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]) <= 0 &&
> @@ -2825,11 +2797,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  			if (!list_empty(&area->free_list[mt]))
>  				return true;
>  		}
> -
> -#ifdef CONFIG_CMA
> -		if (!list_empty(&area->free_list[MIGRATE_CMA]))
> -			return true;
> -#endif
>  	}
>  	return false;
>  }
> @@ -4320,9 +4287,6 @@ static void show_migration_types(unsigned char type)
>  		[MIGRATE_MOVABLE]	= 'M',
>  		[MIGRATE_RECLAIMABLE]	= 'E',
>  		[MIGRATE_HIGHATOMIC]	= 'H',
> -#ifdef CONFIG_CMA
> -		[MIGRATE_CMA]		= 'C',
> -#endif
>  #ifdef CONFIG_MEMORY_ISOLATION
>  		[MIGRATE_ISOLATE]	= 'I',
>  #endif
> @@ -7244,7 +7208,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  		return false;
>  
>  	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> +	if (mt == MIGRATE_MOVABLE)
>  		return false;
>  
>  	pfn = page_to_pfn(page);
> @@ -7392,15 +7356,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   * alloc_contig_range() -- tries to allocate given range of pages
>   * @start:	start PFN to allocate
>   * @end:	one-past-the-last PFN to allocate
> - * @migratetype:	migratetype of the underlaying pageblocks (either
> - *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
> - *			in range must have the same migratetype and it must
> - *			be either of the two.
>   *
>   * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
>   * aligned, however it's the caller's responsibility to guarantee that
>   * we are the only thread that changes migrate type of pageblocks the
> - * pages fall in.
> + * pages fall in and it should be MIGRATE_MOVABLE.
>   *
>   * The PFN range must belong to a single zone.
>   *
> @@ -7408,8 +7368,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   * pages which PFN is in [start, end) are allocated for the caller and
>   * need to be freed with free_contig_range().
>   */
> -int alloc_contig_range(unsigned long start, unsigned long end,
> -		       unsigned migratetype)
> +int alloc_contig_range(unsigned long start, unsigned long end)
>  {
>  	unsigned long outer_start, outer_end;
>  	unsigned int order;
> @@ -7442,14 +7401,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	 * allocator removing them from the buddy system.  This way
>  	 * page allocator will never consider using them.
>  	 *
> -	 * This lets us mark the pageblocks back as
> -	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
> -	 * aligned range but not in the unaligned, original range are
> -	 * put back to page allocator so that buddy can use them.
> +	 * This lets us mark the pageblocks back as MIGRATE_MOVABLE
> +	 * so that free pages in the aligned range but not in the
> +	 * unaligned, original range are put back to page allocator
> +	 * so that buddy can use them.
>  	 */
>  
>  	ret = start_isolate_page_range(pfn_max_align_down(start),
> -				       pfn_max_align_up(end), migratetype,
> +				       pfn_max_align_up(end), MIGRATE_MOVABLE,
>  				       false);
>  	if (ret)
>  		return ret;
> @@ -7528,7 +7487,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  
>  done:
>  	undo_isolate_page_range(pfn_max_align_down(start),
> -				pfn_max_align_up(end), migratetype);
> +				pfn_max_align_up(end), MIGRATE_MOVABLE);
>  	return ret;
>  }
>  
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 612122b..5708649 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -61,13 +61,12 @@ static int set_migratetype_isolate(struct page *page,
>  out:
>  	if (!ret) {
>  		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
>  
>  		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>  		zone->nr_isolate_pageblock++;
>  		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
>  
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, -nr_pages);
>  	}
>  
>  	spin_unlock_irqrestore(&zone->lock, flags);
> @@ -122,7 +121,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  	 */
>  	if (!isolated_page) {
>  		nr_pages = move_freepages_block(zone, page, migratetype);
> -		__mod_zone_freepage_state(zone, nr_pages, migratetype);
> +		__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
>  	}
>  	set_pageblock_migratetype(page, migratetype);
>  	zone->nr_isolate_pageblock--;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 48c4942..8d18d1e 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1088,10 +1088,7 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m,
>  
>  			page_mt = gfpflags_to_migratetype(page_ext->gfp_mask);
>  			if (pageblock_mt != page_mt) {
> -				if (is_migrate_cma(pageblock_mt))
> -					count[MIGRATE_MOVABLE]++;
> -				else
> -					count[pageblock_mt]++;
> +				count[pageblock_mt]++;
>  
>  				pfn = block_end_pfn;
>  				break;
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-26  8:04 ` [PATCH v3 0/6] Introduce ZONE_CMA Feng Tang
@ 2016-05-27  5:28   ` Joonsoo Kim
  2016-05-27  6:25     ` Feng Tang
  0 siblings, 1 reply; 34+ messages in thread
From: Joonsoo Kim @ 2016-05-27  5:28 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel

On Thu, May 26, 2016 at 04:04:54PM +0800, Feng Tang wrote:
> On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Hi Joonsoo,
> 
> Nice work!

Thanks!

> > FYI, there is another attempt [3] trying to solve this problem in lkml.
> > And, as far as I know, Qualcomm also has out-of-tree solution for this
> > problem.
> 
> This may be a little off-topic :) Actually, we have used another way in
> our products, that we disable the fallback from MIGRATETYE_MOVABLE to
> MIGRATETYPE_CMA completely, and only allow free CMA memory to be used
> by file page cache (which is easy to be reclaimed by its nature). 
> We did it by adding a GFP_PAGE_CACHE to every allocation request for
> page cache, and the MM will try to pick up an available free CMA page
> first, and goes to normal path when fail. 

Just wondering: why do you allow CMA memory for file page cache rather
than for anonymous pages? I guess that anonymous pages would be more easily
migrated/reclaimed than file page cache. In fact, some of our products
use an anonymous-page adaptation to satisfy a similar requirement by
introducing GFP_CMA. AFAIK, some chip vendors also use an "anonymous
page first" adaptation to get a better success rate.

> It works fine on our products, though we still see some cases that
> some page can't be reclaimed. 
> 
> Our product has a special user case of CMA, that sometimes it will
> need to use the whole CMA memory (say 256MB on a phone), then all

I don't think this use case is so special. Our product has a similar
use case, and I already know of another one.

> share out CMA pages need to be reclaimed all at once. Don't know if
> this new ZONE_CMA approach could meet this request? (our page cache
> solution can't ganrantee to meet this request all the time).

This ZONE_CMA approach would be better than before, since CMA memory
is not used for blockdev page cache. Blockdev page cache is one of
the frequent failure points in my experience.

I'm not sure that ZONE_CMA works better than your GFP_PAGE_CACHE
adaptation for your system. With ZONE_CMA, CMA memory is used for both
file page cache and anonymous pages. If my assumption that anonymous
pages are easier to migrate/reclaim is correct, ZONE_CMA would work
better than your adaptation, since there are fewer file page cache pages
in CMA memory.

Anyway, it doesn't guarantee success all the time either. There is a
different kind of problem that prevents CMA allocation from succeeding,
and we need to solve it. I will work on it after the problems that this
patchset tries to fix are solved.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA
  2016-05-27  1:42   ` Chen Feng
@ 2016-05-27  5:32     ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-05-27  5:32 UTC (permalink / raw)
  To: Chen Feng
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, qijiwen, Zhuangluan Su, Dan Zhao

On Fri, May 27, 2016 at 09:42:24AM +0800, Chen Feng wrote:
> Hi Joonsoo,
> > -/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> > +/* Free whole pageblock and set its migration type to MIGRATE_MOVABLE. */
> >  void __init init_cma_reserved_pageblock(struct page *page)
> >  {
> >  	unsigned i = pageblock_nr_pages;
> > @@ -1605,7 +1602,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
> >  
> >  	adjust_present_page_count(page, pageblock_nr_pages);
> >  
> > -	set_pageblock_migratetype(page, MIGRATE_CMA);
> > +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> 
> I have a question here, if the ZONE_CMA pages are all movable.
> 
> Then the unmovable alloc will also use CMA memory. Is this right?

No. A previous patch in this series puts CMA memory in a separate zone,
ZONE_CMA. That zone is allowed only when the gfp mask is
GFP_HIGHUSER_MOVABLE, so unmovable allocations cannot land on CMA memory.
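
To illustrate the rule above, here is a toy model; it is not code from the
patchset. The flag values and the names toy_zone_cma_allowed() and TOY_GFP_*
are hypothetical. In the real kernel the highest usable zone is derived from
the gfp mask by gfp_zone(). The sketch only shows that a request must carry
both the highmem and the movable bits before it may be served from ZONE_CMA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only; not the kernel's values. */
#define TOY_GFP_HIGHMEM          0x1u
#define TOY_GFP_MOVABLE          0x2u
#define TOY_GFP_HIGHUSER_MOVABLE (TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE)

/*
 * Return true only if the request carries both bits that make up
 * TOY_GFP_HIGHUSER_MOVABLE. Unmovable requests lack the movable bit
 * and are therefore never directed to the CMA zone in this model.
 */
bool toy_zone_cma_allowed(unsigned int gfp)
{
	return (gfp & TOY_GFP_HIGHUSER_MOVABLE) == TOY_GFP_HIGHUSER_MOVABLE;
}
```

In this model a plain kernel allocation (no movable bit) is rejected, while
a user page allocated with both bits set is accepted.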

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-27  5:28   ` Joonsoo Kim
@ 2016-05-27  6:25     ` Feng Tang
  2016-05-27  6:42       ` Joonsoo Kim
  0 siblings, 1 reply; 34+ messages in thread
From: Feng Tang @ 2016-05-27  6:25 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel

On Fri, May 27, 2016 at 01:28:20PM +0800, Joonsoo Kim wrote:
> On Thu, May 26, 2016 at 04:04:54PM +0800, Feng Tang wrote:
> > On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@gmail.com wrote:
> > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
 
> > > FYI, there is another attempt [3] trying to solve this problem in lkml.
> > > And, as far as I know, Qualcomm also has out-of-tree solution for this
> > > problem.
> > 
> > This may be a little off-topic :) Actually, we have used another way in
> > our products, that we disable the fallback from MIGRATETYE_MOVABLE to
> > MIGRATETYPE_CMA completely, and only allow free CMA memory to be used
> > by file page cache (which is easy to be reclaimed by its nature). 
> > We did it by adding a GFP_PAGE_CACHE to every allocation request for
> > page cache, and the MM will try to pick up an available free CMA page
> > first, and goes to normal path when fail. 
> 
> Just wonder, why do you allow CMA memory to file page cache rather
> than anonymous page? I guess that anonymous pages would be more easily
> migrated/reclaimed than file page cache. In fact, some of our product
> uses anonymous page adaptation to satisfy similar requirement by
> introducing GFP_CMA. AFAIK, some of chip vendor also uses "anonymous
> page first adaptation" to get better success rate.

The biggest problem we faced is allocating big chunks of CMA memory,
say 256MB as a whole or 9 buffers of 20MB each, so speed is not the
biggest concern; what matters is whether all the CMA pages can be reclaimed.

With the MOVABLE fallback, there may be many kinds of bad guys in device
drivers, the kernel, or other subsystems that refuse to return the borrowed
CMA pages, so I took the lazy way and allowed only page cache to use free
CMA pages. We see good results: it passes most of the tests for
allocating big chunks.

One of our customers used to use a CMA sharing patch from another vendor
on our SoCs. It couldn't pass these tests, and they finally took our page
cache approach.

> 
> > It works fine on our products, though we still see some cases that
> > some page can't be reclaimed. 
> > 
> > Our product has a special user case of CMA, that sometimes it will
> > need to use the whole CMA memory (say 256MB on a phone), then all
> 
> I don't think this usecase is so special. Our product also has similar
> usecase. And, I already knows one another.

:) I first touched CMA in 2014 and have only worked on Sofia platforms.

> 
> > share out CMA pages need to be reclaimed all at once. Don't know if
> > this new ZONE_CMA approach could meet this request? (our page cache
> > solution can't ganrantee to meet this request all the time).
> 
> This ZONE_CMA approach would be better than before, since CMA memory
> is not be used for blockdev page cache. Blockdev page cache is one of
> the frequent failure points in my experience.

Indeed! I also explicitly disabled CMA sharing for blkdev FS page cache.

> 
> I'm not sure that ZONE_CMA works better than your GFP_PAGE_CACHE
> adaptation for your system. In ZONE_CMA, CMA memory is used for file
> page cache or anonymous pages. If my assumption that anonymous pages
> are easier to be migrated/reclaimed is correct, ZONE_CMA would work
> better than your adaptation since there is less file page cache pages
> in CMA memory.
> 
> Anyway, it also doesn't guarantee to succeed all the time. There is
> different kind of problem that prevents CMA allocation success and we
> need to solve it. I will try it after problems that this patchset try
> to fix is solved.

ZONE_CMA should be cleaner, while our page cache solution needs to
adjust some policy in the lowmemorykiller and page scan/reclaim code.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-27  6:25     ` Feng Tang
@ 2016-05-27  6:42       ` Joonsoo Kim
  2016-05-27  7:27         ` Feng Tang
  0 siblings, 1 reply; 34+ messages in thread
From: Joonsoo Kim @ 2016-05-27  6:42 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel

On Fri, May 27, 2016 at 02:25:27PM +0800, Feng Tang wrote:
> On Fri, May 27, 2016 at 01:28:20PM +0800, Joonsoo Kim wrote:
> > On Thu, May 26, 2016 at 04:04:54PM +0800, Feng Tang wrote:
> > > On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@gmail.com wrote:
> > > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > 
>  
> > > > FYI, there is another attempt [3] trying to solve this problem in lkml.
> > > > And, as far as I know, Qualcomm also has out-of-tree solution for this
> > > > problem.
> > > 
> > > This may be a little off-topic :) Actually, we have used another way in
> > > our products, that we disable the fallback from MIGRATETYE_MOVABLE to
> > > MIGRATETYPE_CMA completely, and only allow free CMA memory to be used
> > > by file page cache (which is easy to be reclaimed by its nature). 
> > > We did it by adding a GFP_PAGE_CACHE to every allocation request for
> > > page cache, and the MM will try to pick up an available free CMA page
> > > first, and goes to normal path when fail. 
> > 
> > Just wonder, why do you allow CMA memory to file page cache rather
> > than anonymous page? I guess that anonymous pages would be more easily
> > migrated/reclaimed than file page cache. In fact, some of our product
> > uses anonymous page adaptation to satisfy similar requirement by
> > introducing GFP_CMA. AFAIK, some of chip vendor also uses "anonymous
> > page first adaptation" to get better success rate.
> 
> The biggest problem we faced is to allocate big chunk of CMA memory,
> say 256MB in a whole, or 9 pieces of 20MB buffers, so the speed
> is not the biggest concern, but whether all the cma pages be reclaimed.

Okay. Our products have a similar workload.

> With the MOVABLE fallback, there may be many types of bad guys from device
> drivers/kernel or different subsystems, who refuse to return the borrowed
> cma pages, so I took a lazy way by only allowing page cache to use free
> cma pages, and we see good results which could pass most of the test for
> allocating big chunks. 

Could you explain more about why file page cache rather than anonymous pages?
If there is a reason, I'd like to test it myself.

> One of the customer used to use a CMA sharing patch from another vendor
> on our Socs, which can't pass these tests and finally took our page cache
> approach.

CMA has too many problems, so each vendor uses its own adaptation. I'd
like to solve this code fragmentation by fixing the problems in the upstream
kernel, and ZONE_CMA is one of those efforts. If you can share a pointer
to your adaptation, it would be very helpful to me.

Thanks.

> > 
> > > It works fine on our products, though we still see some cases that
> > > some page can't be reclaimed. 
> > > 
> > > Our product has a special user case of CMA, that sometimes it will
> > > need to use the whole CMA memory (say 256MB on a phone), then all
> > 
> > I don't think this usecase is so special. Our product also has similar
> > usecase. And, I already knows one another.
> 
> :) I first touch CMA in 2014 and have only worked on Sofia platforms.
> 
> > 
> > > share out CMA pages need to be reclaimed all at once. Don't know if
> > > this new ZONE_CMA approach could meet this request? (our page cache
> > > solution can't ganrantee to meet this request all the time).
> > 
> > This ZONE_CMA approach would be better than before, since CMA memory
> > is not be used for blockdev page cache. Blockdev page cache is one of
> > the frequent failure points in my experience.
> 
> Indeed! I also explicitely disabled cma sharing for blkdev FS page cache.
> 
> > 
> > I'm not sure that ZONE_CMA works better than your GFP_PAGE_CACHE
> > adaptation for your system. In ZONE_CMA, CMA memory is used for file
> > page cache or anonymous pages. If my assumption that anonymous pages
> > are easier to be migrated/reclaimed is correct, ZONE_CMA would work
> > better than your adaptation since there is less file page cache pages
> > in CMA memory.
> > 
> > Anyway, it also doesn't guarantee to succeed all the time. There is
> > different kind of problem that prevents CMA allocation success and we
> > need to solve it. I will try it after problems that this patchset try
> > to fix is solved.
> 
> ZONE_CMA should be cleaner, while our page cache solution needs to
> adjust some policy for lowmemorykiller and page scan/reclaim code.   
> 
> Thanks,
> Feng

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-27  6:42       ` Joonsoo Kim
@ 2016-05-27  7:27         ` Feng Tang
  2016-05-30  5:45           ` Joonsoo Kim
  2016-06-17  7:38           ` Chen Feng
  0 siblings, 2 replies; 34+ messages in thread
From: Feng Tang @ 2016-05-27  7:27 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel

On Fri, May 27, 2016 at 02:42:18PM +0800, Joonsoo Kim wrote:
> On Fri, May 27, 2016 at 02:25:27PM +0800, Feng Tang wrote:
> > On Fri, May 27, 2016 at 01:28:20PM +0800, Joonsoo Kim wrote:
> > > On Thu, May 26, 2016 at 04:04:54PM +0800, Feng Tang wrote:
> > > > On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@gmail.com wrote:
> > > > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > 
> >  
> > > > > FYI, there is another attempt [3] trying to solve this problem in lkml.
> > > > > And, as far as I know, Qualcomm also has out-of-tree solution for this
> > > > > problem.
> > > > 
> > > > This may be a little off-topic :) Actually, we have used another way in
> > > > our products, that we disable the fallback from MIGRATETYE_MOVABLE to
> > > > MIGRATETYPE_CMA completely, and only allow free CMA memory to be used
> > > > by file page cache (which is easy to be reclaimed by its nature). 
> > > > We did it by adding a GFP_PAGE_CACHE to every allocation request for
> > > > page cache, and the MM will try to pick up an available free CMA page
> > > > first, and goes to normal path when fail. 
> > > 
> > > Just wonder, why do you allow CMA memory to file page cache rather
> > > than anonymous page? I guess that anonymous pages would be more easily
> > > migrated/reclaimed than file page cache. In fact, some of our product
> > > uses anonymous page adaptation to satisfy similar requirement by
> > > introducing GFP_CMA. AFAIK, some of chip vendor also uses "anonymous
> > > page first adaptation" to get better success rate.
> > 
> > The biggest problem we faced is to allocate big chunk of CMA memory,
> > say 256MB in a whole, or 9 pieces of 20MB buffers, so the speed
> > is not the biggest concern, but whether all the cma pages be reclaimed.
> 
> Okay. Our product have similar workload.
> 
> > With the MOVABLE fallback, there may be many types of bad guys from device
> > drivers/kernel or different subsystems, who refuse to return the borrowed
> > cma pages, so I took a lazy way by only allowing page cache to use free
> > cma pages, and we see good results which could pass most of the test for
> > allocating big chunks. 
> 
> Could you explain more about why file page cache rather than anonymous page?
> If there is a reason, I'd like to test it by myself.

I didn't make it clear. This is not about anonymous pages, but about MIGRATE_MOVABLE.

The following is the patch that disables the kernel's default sharing (kernel 3.14):
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1b5f20e..a5e698f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -974,7 +974,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
 #ifdef CONFIG_CMA
-	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
 	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
 	[MIGRATE_CMA_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */
 #else
@@ -1414,6 +1418,18 @@ void free_hot_cold_page(struct page *page, int cold)
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
 
+#ifndef CONFIG_USE_CMA_FALLBACK
+	if (migratetype == MIGRATE_CMA) {
+		free_one_page(zone, page, 0, MIGRATE_CMA);
+		local_irq_restore(flags);
+		return;
+	}
+#endif
+
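
As an illustration of what the fallback-table change above does, here is a
self-contained toy model. It is only a sketch: the enum, the two tables, and
toy_falls_back_to_cma() are hypothetical names made up for this example, and
the real allocator walks fallbacks[] inside mm/page_alloc.c with more logic
than is shown here. The model only checks whether MIGRATE_CMA appears in the
fallback list of a given migratetype before the terminating reserve entry.

```c
#include <assert.h>

/* Toy migratetypes mirroring the names used in the 3.14-era table. */
enum toy_mt {
	TOY_UNMOVABLE, TOY_RECLAIMABLE, TOY_MOVABLE,
	TOY_CMA, TOY_RESERVE, TOY_NR_TYPES
};

#define TOY_MAX_FALLBACKS 4

/* Default table: movable allocations fall back to CMA first. */
const enum toy_mt toy_fallbacks_default[TOY_NR_TYPES][TOY_MAX_FALLBACKS] = {
	[TOY_UNMOVABLE]   = { TOY_RECLAIMABLE, TOY_MOVABLE, TOY_RESERVE },
	[TOY_RECLAIMABLE] = { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RESERVE },
	[TOY_MOVABLE]     = { TOY_CMA, TOY_RECLAIMABLE, TOY_UNMOVABLE, TOY_RESERVE },
	[TOY_CMA]         = { TOY_RESERVE },
	[TOY_RESERVE]     = { TOY_RESERVE },
};

/* Patched table: CMA is dropped from the movable fallback list. */
const enum toy_mt toy_fallbacks_patched[TOY_NR_TYPES][TOY_MAX_FALLBACKS] = {
	[TOY_UNMOVABLE]   = { TOY_RECLAIMABLE, TOY_MOVABLE, TOY_RESERVE },
	[TOY_RECLAIMABLE] = { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RESERVE },
	[TOY_MOVABLE]     = { TOY_RECLAIMABLE, TOY_UNMOVABLE, TOY_RESERVE },
	[TOY_CMA]         = { TOY_RESERVE },
	[TOY_RESERVE]     = { TOY_RESERVE },
};

/* Return 1 if CMA is a direct fallback of 'start', 0 otherwise. */
int toy_falls_back_to_cma(const enum toy_mt (*table)[TOY_MAX_FALLBACKS],
			  enum toy_mt start)
{
	assert((int)start < TOY_NR_TYPES);
	for (int i = 0; i < TOY_MAX_FALLBACKS; i++) {
		enum toy_mt mt = table[start][i];

		if (mt == TOY_RESERVE)	/* end of this fallback list */
			break;
		if (mt == TOY_CMA)
			return 1;
	}
	return 0;
}
```

With the default table a movable request can spill into CMA pageblocks; with
the patched table it cannot, so CMA pages are handed out only through an
explicit path such as the page cache flag described in this thread.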

> 
> > One of the customer used to use a CMA sharing patch from another vendor
> > on our Socs, which can't pass these tests and finally took our page cache
> > approach.
> 
> CMA has too many problems so each vendor uses their own adaptation. I'd
> like to solve this code fragmentation by fixing problems on upstream
> kernel and this ZONE_CMA is one of that effort. If you can share the
> pointer for your adaptation, it would be very helpful to me.

As I said, I started working on the CMA problem back in 2014 and faced many
of these reclamation failures. I didn't have the time or the capability
to track and analyze each and every failure, so I decided to go another way
and only allow the page cache to use CMA. Frankly speaking, I don't have
detailed performance measurements, only rough ones, which showed that it
did improve CMA page reclaiming and the usage rate.

Our patches were based on 3.14 (the Android Marshmallow kernel). Earlier this
year I finally got some free time and worked on cleaning them up for submission
to LKML, but found your CMA improvement patches merged in 4.1 or 4.2, so I gave
up, as my patches are more hacky :)

The sharing patch is here FYI:
------
commit fb28d4db6278df42ab2ef4996bdfd44e613ace99
Author: Feng Tang <feng.tang@intel.com>
Date:   Wed Jul 15 13:39:50 2015 +0800

    cma, page-cache: use cma as page cache
    
    This frees a lot of cma memory for the system to use as page
    cache. Previously, cma memory was mostly reserved and difficult
    for others to share, and thus largely wasted.
    
    Using it as page cache improves memory usage while keeping the
    flexibility of fast reclaim when a big cma memory request
    comes.
    
    Some of the threshold values should be adjustable for
    different platforms with different amounts of cma reserved
    memory. Common cma usage scenarios and the CTS tests should be
    carefully verified for those adjustments.
    
    Signed-off-by: Feng Tang <feng.tang@intel.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5dc12b7..3c3ab2b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -36,6 +36,7 @@ struct vm_area_struct;
 #define ___GFP_NO_KSWAPD	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
+#define ___GFP_CMA_PAGE_CACHE	0x2000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -123,6 +124,9 @@ struct vm_area_struct;
 			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
 			 __GFP_NO_KSWAPD)
 
+/* Allocate for page cache use */
+#define GFP_PAGE_CACHE	((__force gfp_t)___GFP_CMA_PAGE_CACHE)
+
 /*
  * GFP_THISNODE does not perform any reclaim, you most likely want to
  * use __GFP_THISNODE to allocate from a given node without fallback!
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1710d1b..a2452f6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -221,7 +221,7 @@ extern struct page *__page_cache_alloc(gfp_t gfp);
 #else
 static inline struct page *__page_cache_alloc(gfp_t gfp)
 {
-	return alloc_pages(gfp, 0);
+	return alloc_pages(gfp | GFP_PAGE_CACHE, 0);
 }
 #endif
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 532ee0d..1b5f20e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1568,7 +1568,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 	int cold = !!(gfp_flags & __GFP_COLD);
 
 again:
-	if (likely(order == 0)) {
+	if (likely(order == 0) && !(gfp_flags & GFP_PAGE_CACHE)) {
 		struct per_cpu_pages *pcp;
 		struct list_head *list;
 
@@ -2744,6 +2744,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET;
 	struct mem_cgroup *memcg = NULL;
 
+	gfp_allowed_mask |= GFP_PAGE_CACHE;
+
 	gfp_mask &= gfp_allowed_mask;
 
 	lockdep_trace_alloc(gfp_mask);
@@ -2753,6 +2755,25 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (should_fail_alloc_page(gfp_mask, order))
 		return NULL;
 
+#ifdef CONFIG_CMA
+	if (gfp_mask & GFP_PAGE_CACHE) {
+		int nr_free = global_page_state(NR_FREE_PAGES)
+				- totalreserve_pages;
+		int free_cma = global_page_state(NR_FREE_CMA_PAGES);
+
+		/*
+		 * Use CMA memory as page cache iff system is under memory
+		 * pressure and free cma is big enough (>= 48M).  And these
+		 * value should be adjustable for different platforms with
+		 * different cma reserved memory
+		 */
+		if ((nr_free - free_cma) <= (48 * 1024 * 1024 / PAGE_SIZE)
+			&& free_cma >= (48 * 1024 * 1024 / PAGE_SIZE)) {
+			migratetype = MIGRATE_CMA;
+		}
+	}
+#endif
+
 	/*
 	 * Check the zones suitable for the gfp_mask contain at least one
 	 * valid zone. It's possible to have an empty zonelist as a result
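
The threshold test in the __alloc_pages_nodemask() hunk above can be worked
through in isolation. The following is a sketch under stated assumptions: a
4 KiB page size, and hypothetical names (TOY_PAGE_SIZE,
TOY_CMA_THRESHOLD_PAGES, toy_use_cma_for_page_cache()) that do not exist in
the kernel. Here nr_free stands for NR_FREE_PAGES minus totalreserve_pages,
as in the patch. With 4 KiB pages, 48 MiB corresponds to 12288 pages.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096L	/* assumption: 4 KiB pages */
#define TOY_CMA_THRESHOLD_PAGES (48L * 1024 * 1024 / TOY_PAGE_SIZE)

/*
 * Mirror the two conditions in the patch: non-CMA free memory is at or
 * below 48 MiB (memory pressure), and free CMA memory is at least 48 MiB.
 * Both arguments are page counts.
 */
bool toy_use_cma_for_page_cache(long nr_free, long free_cma)
{
	assert(free_cma >= 0);
	return (nr_free - free_cma) <= TOY_CMA_THRESHOLD_PAGES &&
	       free_cma >= TOY_CMA_THRESHOLD_PAGES;
}
```

For example, with 20000 free pages of which 15000 are CMA, only 5000 non-CMA
pages remain and the CMA pool is large enough, so the page cache request
would be steered to MIGRATE_CMA. With 40000 free pages and the same CMA
pool, there is no pressure and the normal path is used.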



 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-27  7:27         ` Feng Tang
@ 2016-05-30  5:45           ` Joonsoo Kim
  2016-06-17  7:38           ` Chen Feng
  1 sibling, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-05-30  5:45 UTC (permalink / raw)
  To: Feng Tang
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel

On Fri, May 27, 2016 at 03:27:02PM +0800, Feng Tang wrote:
> On Fri, May 27, 2016 at 02:42:18PM +0800, Joonsoo Kim wrote:
> > On Fri, May 27, 2016 at 02:25:27PM +0800, Feng Tang wrote:
> > > On Fri, May 27, 2016 at 01:28:20PM +0800, Joonsoo Kim wrote:
> > > > On Thu, May 26, 2016 at 04:04:54PM +0800, Feng Tang wrote:
> > > > > On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@gmail.com wrote:
> > > > > > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > > 
> > >  
> > > > > > FYI, there is another attempt [3] trying to solve this problem in lkml.
> > > > > > And, as far as I know, Qualcomm also has out-of-tree solution for this
> > > > > > problem.
> > > > > 
> > > > > This may be a little off-topic :) Actually, we have used another way in
> > > > > our products, that we disable the fallback from MIGRATETYE_MOVABLE to
> > > > > MIGRATETYPE_CMA completely, and only allow free CMA memory to be used
> > > > > by file page cache (which is easy to be reclaimed by its nature). 
> > > > > We did it by adding a GFP_PAGE_CACHE to every allocation request for
> > > > > page cache, and the MM will try to pick up an available free CMA page
> > > > > first, and goes to normal path when fail. 
> > > > 
> > > > Just wonder, why do you allow CMA memory to file page cache rather
> > > > than anonymous page? I guess that anonymous pages would be more easily
> > > > migrated/reclaimed than file page cache. In fact, some of our product
> > > > uses anonymous page adaptation to satisfy similar requirement by
> > > > introducing GFP_CMA. AFAIK, some of chip vendor also uses "anonymous
> > > > page first adaptation" to get better success rate.
> > > 
> > > The biggest problem we faced is to allocate big chunk of CMA memory,
> > > say 256MB in a whole, or 9 pieces of 20MB buffers, so the speed
> > > is not the biggest concern, but whether all the cma pages be reclaimed.
> > 
> > Okay. Our product have similar workload.
> > 
> > > With the MOVABLE fallback, there may be many types of bad guys from device
> > > drivers/kernel or different subsystems, who refuse to return the borrowed
> > > cma pages, so I took a lazy way by only allowing page cache to use free
> > > cma pages, and we see good results which could pass most of the test for
> > > allocating big chunks. 
> > 
> > Could you explain more about why file page cache rather than anonymous page?
> > If there is a reason, I'd like to test it by myself.
> 
> I didn't make it clear. This is not for anonymous page, but for MIGRATETYPE_MOVABLE.

Anonymous pages are one kind of MIGRATE_MOVABLE page. So, you could
also restrict CMA memory to anonymous pages only, just as you did
for file page cache. Some of our products used this workaround, so I'd
like to know if there is a reason.

> 
> following is the patch to disable the kernel default sharing (kernel 3.14)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1b5f20e..a5e698f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -974,7 +974,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
>  #ifdef CONFIG_CMA
> -	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
>  	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
>  	[MIGRATE_CMA_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */
>  #else
> @@ -1414,6 +1418,18 @@ void free_hot_cold_page(struct page *page, int cold)
>  	local_irq_save(flags);
>  	__count_vm_event(PGFREE);
>  
> +#ifndef CONFIG_USE_CMA_FALLBACK
> +	if (migratetype == MIGRATE_CMA) {
> +		free_one_page(zone, page, 0, MIGRATE_CMA);
> +		local_irq_restore(flags);
> +		return;
> +	}
> +#endif
> +
> 
> > 
> > > One of the customer used to use a CMA sharing patch from another vendor
> > > on our Socs, which can't pass these tests and finally took our page cache
> > > approach.
> > 
> > CMA has too many problems so each vendor uses their own adaptation. I'd
> > like to solve this code fragmentation by fixing problems on upstream
> > kernel and this ZONE_CMA is one of that effort. If you can share the
> > pointer for your adaptation, it would be very helpful to me.
> 
> As I said, I started to work on CMA problem back in 2014, and faced many
> of these failure in reclamation problems. I didn't have time and capability
> to track/analyze each and every failure, but decided to go another way by
> only allowing the page cache to use CMA.  And frankly speaking, I don't have
> detailed data for performance measurement, but some rough one, that it
> did improve the cma page reclaiming and the usage rate.

Okay!

> Our patches was based on 3.14 (the Android Mashmallow kenrel). Earlier this
> year I finally got some free time, and worked on cleaning them for submission
> to LKML, and found your cma improving patches merged in 4.1 or 4.2, so I gave
> up as my patches is more hacky :)
> 
> The sharing patch is here FYI:

Thanks for sharing!! It will be helpful.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-27  7:27         ` Feng Tang
  2016-05-30  5:45           ` Joonsoo Kim
@ 2016-06-17  7:38           ` Chen Feng
  2016-06-20  6:48             ` Joonsoo Kim
  1 sibling, 1 reply; 34+ messages in thread
From: Chen Feng @ 2016-06-17  7:38 UTC (permalink / raw)
  To: Feng Tang, Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Yiping Xu, fujun (F),
	Zhuangluan Su, Dan Zhao, saberlily.xia

Hi Kim & feng,

Thanks for sharing. Our platform has the same use case.

We only let allocations with GFP_HIGHUSER_MOVABLE in memory.c use CMA memory.

If we add ZONE_CMA, it seems it can resolve the CMA migration issue.

But in free_hot_cold_page we need to let CMA pages go directly back to the
system (buddy) rather than to the pcp lists. Otherwise cma_alloc can fail
after cma_release when we allocate the whole CMA area that was declared earlier.

On 2016/5/27 15:27, Feng Tang wrote:
> On Fri, May 27, 2016 at 02:42:18PM +0800, Joonsoo Kim wrote:
>> On Fri, May 27, 2016 at 02:25:27PM +0800, Feng Tang wrote:
>>> On Fri, May 27, 2016 at 01:28:20PM +0800, Joonsoo Kim wrote:
>>>> On Thu, May 26, 2016 at 04:04:54PM +0800, Feng Tang wrote:
>>>>> On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@gmail.com wrote:
>>>>>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>>>
>>>  
>>>>>> FYI, there is another attempt [3] trying to solve this problem in lkml.
>>>>>> And, as far as I know, Qualcomm also has out-of-tree solution for this
>>>>>> problem.
>>>>>
>>>>> This may be a little off-topic :) Actually, we have used another way in
>>>>> our products, that we disable the fallback from MIGRATETYE_MOVABLE to
>>>>> MIGRATETYPE_CMA completely, and only allow free CMA memory to be used
>>>>> by file page cache (which is easy to be reclaimed by its nature). 
>>>>> We did it by adding a GFP_PAGE_CACHE to every allocation request for
>>>>> page cache, and the MM will try to pick up an available free CMA page
>>>>> first, and goes to normal path when fail. 
>>>>
>>>> Just wonder, why do you allow CMA memory to file page cache rather
>>>> than anonymous page? I guess that anonymous pages would be more easily
>>>> migrated/reclaimed than file page cache. In fact, some of our product
>>>> uses anonymous page adaptation to satisfy similar requirement by
>>>> introducing GFP_CMA. AFAIK, some of chip vendor also uses "anonymous
>>>> page first adaptation" to get better success rate.
>>>
>>> The biggest problem we faced is to allocate big chunk of CMA memory,
>>> say 256MB in a whole, or 9 pieces of 20MB buffers, so the speed
>>> is not the biggest concern, but whether all the cma pages be reclaimed.
>>
>> Okay. Our product have similar workload.
>>
>>> With the MOVABLE fallback, there may be many types of bad guys from device
>>> drivers/kernel or different subsystems, who refuse to return the borrowed
>>> cma pages, so I took a lazy way by only allowing page cache to use free
>>> cma pages, and we see good results which could pass most of the test for
>>> allocating big chunks. 
>>
>> Could you explain more about why file page cache rather than anonymous page?
>> If there is a reason, I'd like to test it by myself.
> 
> I didn't make it clear. This is not for anonymous page, but for MIGRATETYPE_MOVABLE.
> 
> following is the patch to disable the kernel default sharing (kernel 3.14)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1b5f20e..a5e698f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -974,7 +974,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
>  #ifdef CONFIG_CMA
> -	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
>  	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
>  	[MIGRATE_CMA_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */
>  #else
> @@ -1414,6 +1418,18 @@ void free_hot_cold_page(struct page *page, int cold)
>  	local_irq_save(flags);
>  	__count_vm_event(PGFREE);
>  
> +#ifndef CONFIG_USE_CMA_FALLBACK
> +	if (migratetype == MIGRATE_CMA) {
> +		free_one_page(zone, page, 0, MIGRATE_CMA);
> +		local_irq_restore(flags);
> +		return;
> +	}
> +#endif
> +
> 
>>
>>> One of the customer used to use a CMA sharing patch from another vendor
>>> on our Socs, which can't pass these tests and finally took our page cache
>>> approach.
>>
>> CMA has too many problems so each vendor uses their own adaptation. I'd
>> like to solve this code fragmentation by fixing problems on upstream
>> kernel and this ZONE_CMA is one of that effort. If you can share the
>> pointer for your adaptation, it would be very helpful to me.
> 
> As I said, I started to work on CMA problem back in 2014, and faced many
> of these failure in reclamation problems. I didn't have time and capability
> to track/analyze each and every failure, but decided to go another way by
> only allowing the page cache to use CMA.  And frankly speaking, I don't have
> detailed data for performance measurement, but some rough one, that it
> did improve the cma page reclaiming and the usage rate.
> 
> Our patches was based on 3.14 (the Android Mashmallow kenrel). Earlier this
> year I finally got some free time, and worked on cleaning them for submission
> to LKML, and found your cma improving patches merged in 4.1 or 4.2, so I gave
> up as my patches is more hacky :)
> 
> The sharing patch is here FYI:
> ------
> commit fb28d4db6278df42ab2ef4996bdfd44e613ace99
> Author: Feng Tang <feng.tang@intel.com>
> Date:   Wed Jul 15 13:39:50 2015 +0800
> 
>     cma, page-cache: use cma as page cache
>     
>     This will free a lot of cma memory for system to use them
>     as page cache. Previously, cma memory is mostly preserved
>     and difficult to be shared by others, thus a big waste.
>     
>     Using them as page cache will improve the meory usage, while
>     keeping the flexibility of fast reclaiming when big cma memory
>     request comes.
>     
>     And some of the threshold values should be adjustable for
>     different platforms with different cma reserved memory, common
>     cma usage scenario and CTS test should be carefully verified
>     for those adjustment.
>     
>     Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 5dc12b7..3c3ab2b 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -36,6 +36,7 @@ struct vm_area_struct;
>  #define ___GFP_NO_KSWAPD	0x400000u
>  #define ___GFP_OTHER_NODE	0x800000u
>  #define ___GFP_WRITE		0x1000000u
> +#define ___GFP_CMA_PAGE_CACHE	0x2000000u
>  /* If the above are modified, __GFP_BITS_SHIFT may need updating */
>  
>  /*
> @@ -123,6 +124,9 @@ struct vm_area_struct;
>  			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
>  			 __GFP_NO_KSWAPD)
>  
> +/* Allocat for page cache use */
> +#define GFP_PAGE_CACHE	((__force gfp_t)___GFP_CMA_PAGE_CACHE)
> +
>  /*
>   * GFP_THISNODE does not perform any reclaim, you most likely want to
>   * use __GFP_THISNODE to allocate from a given node without fallback!
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 1710d1b..a2452f6 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -221,7 +221,7 @@ extern struct page *__page_cache_alloc(gfp_t gfp);
>  #else
>  static inline struct page *__page_cache_alloc(gfp_t gfp)
>  {
> -	return alloc_pages(gfp, 0);
> +	return alloc_pages(gfp | GFP_PAGE_CACHE, 0);
>  }
>  #endif
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 532ee0d..1b5f20e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1568,7 +1568,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
>  	int cold = !!(gfp_flags & __GFP_COLD);
>  
>  again:
> -	if (likely(order == 0)) {
> +	if (likely(order == 0) && !(gfp_flags & GFP_PAGE_CACHE)) {
>  		struct per_cpu_pages *pcp;
>  		struct list_head *list;
>  
> @@ -2744,6 +2744,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET;
>  	struct mem_cgroup *memcg = NULL;
>  
> +	gfp_allowed_mask |= GFP_PAGE_CACHE;
> +
>  	gfp_mask &= gfp_allowed_mask;
>  
>  	lockdep_trace_alloc(gfp_mask);
> @@ -2753,6 +2755,25 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  	if (should_fail_alloc_page(gfp_mask, order))
>  		return NULL;
>  
> +#ifdef CONFIG_CMA
> +	if (gfp_mask & GFP_PAGE_CACHE) {
> +		int nr_free = global_page_state(NR_FREE_PAGES)
> +				- totalreserve_pages;
> +		int free_cma = global_page_state(NR_FREE_CMA_PAGES);
> +
> +		/*
> +		 * Use CMA memory as page cache iff system is under memory
> +		 * pressure and free cma is big enough (>= 48M).  And these
> +		 * value should be adjustable for different platforms with
> +		 * different cma reserved memory
> +		 */
> +		if ((nr_free - free_cma) <= (48 * 1024 * 1024 / PAGE_SIZE)
> +			&& free_cma >= (48 * 1024 * 1024 / PAGE_SIZE)) {
> +			migratetype = MIGRATE_CMA;
> +		}
> +	}
> +#endif
> +
>  	/*
>  	 * Check the zones suitable for the gfp_mask contain at least one
>  	 * valid zone. It's possible to have an empty zonelist as a result
> 
> 
> 
>  
> 
> 
> .
> 


* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-06-17  7:38           ` Chen Feng
@ 2016-06-20  6:48             ` Joonsoo Kim
  2016-06-21  2:08               ` Chen Feng
  0 siblings, 1 reply; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-20  6:48 UTC (permalink / raw)
  To: Chen Feng
  Cc: Feng Tang, Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Yiping Xu, fujun (F),
	Zhuangluan Su, Dan Zhao, saberlily.xia

On Fri, Jun 17, 2016 at 03:38:49PM +0800, Chen Feng wrote:
> Hi Kim & feng,
> 
> Thanks for the share. In our platform also has the same use case.
> 
> We only let the alloc with GFP_HIGHUSER_MOVABLE in memory.c to use cma memory.
> 
> If we add zone_cma, It seems can resolve the cma migrate issue.
> 
> But when free_hot_cold_page, we need let the cma page goto system directly not the pcp.
> It can be fail while cma_alloc and cma_release. If we alloc the whole cma pages which
> declared before.

Hmm... I'm not sure I understand your explanation, so if I'm missing
something, please let me know. We call drain_all_pages() when
isolating a pageblock, and alloc_contig_range() also has one
drain_all_pages() call to drain pcp pages. And, after pageblock isolation,
freed pages belonging to a MIGRATE_ISOLATE pageblock go to the
buddy allocator directly, so the problem you mentioned should not occur. Is that right?

Thanks.


* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-06-20  6:48             ` Joonsoo Kim
@ 2016-06-21  2:08               ` Chen Feng
  2016-06-21  6:56                 ` Joonsoo Kim
  0 siblings, 1 reply; 34+ messages in thread
From: Chen Feng @ 2016-06-21  2:08 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Feng Tang, Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Yiping Xu, fujun (F),
	Zhuangluan Su, Dan Zhao, saberlily.xia



On 2016/6/20 14:48, Joonsoo Kim wrote:
> On Fri, Jun 17, 2016 at 03:38:49PM +0800, Chen Feng wrote:
>> Hi Kim & feng,
>>
>> Thanks for the share. In our platform also has the same use case.
>>
>> We only let the alloc with GFP_HIGHUSER_MOVABLE in memory.c to use cma memory.
>>
>> If we add zone_cma, It seems can resolve the cma migrate issue.
>>
>> But when free_hot_cold_page, we need let the cma page goto system directly not the pcp.
>> It can be fail while cma_alloc and cma_release. If we alloc the whole cma pages which
>> declared before.
> 
> Hmm...I'm not sure I understand your explanation. So, if I miss
> something, please let me know. We calls drain_all_pages() when
> isolating pageblock and alloc_contig_range() also has one
> drain_all_pages() calls to drain pcp pages. And, after pageblock isolation,
> freed pages belonging to MIGRATE_ISOLATE pageblock will go to the
> buddy directly so there would be no problem you mentioned. Isn't it?
> 
Yes, you are right.

I mean that if we free a CMA page to the pcp list, it goes onto the MIGRATE_MOVABLE list.

Then any allocation with the movable flag can take that CMA memory from the list in buffered_rmqueue().

But that's not what we want. Migration will fail if every movable allocation can use CMA memory.
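
The pcp behaviour discussed above can be sketched in user space. This is a toy model only: the toy_* names and the enum values are made up for illustration and are not the kernel's definitions. It assumes three per-cpu lists (unmovable, reclaimable, movable), that pages from an isolated pageblock bypass the pcp lists, and that any other migratetype without its own pcp list is filed under movable.

```c
/* Toy migratetypes; illustrative values, not the kernel's. */
enum toy_mt {
	TOY_MT_UNMOVABLE,
	TOY_MT_RECLAIMABLE,
	TOY_MT_MOVABLE,
	TOY_MT_CMA,
	TOY_MT_ISOLATE
};

/* Only the first three types have their own per-cpu list in this model. */
#define TOY_NR_PCP_LISTS 3

/*
 * Return the pcp list index a freed order-0 page would land on,
 * or -1 if the page bypasses the pcp lists and goes straight to buddy.
 */
int toy_pcp_list_for(enum toy_mt mt)
{
	if (mt == TOY_MT_ISOLATE)
		return -1;			/* isolated pageblock: straight to buddy */
	if ((int)mt >= TOY_NR_PCP_LISTS)
		return TOY_MT_MOVABLE;		/* CMA page is filed as movable */
	return (int)mt;
}
```

In this model toy_pcp_list_for(TOY_MT_CMA) and toy_pcp_list_for(TOY_MT_MOVABLE) give the same list, which is the leak being described: once a CMA page sits on that list, any movable allocation may pick it up.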

If I am wrong, please correct me.

Thanks.

> Thanks.
> 
> .
> 


* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-06-21  2:08               ` Chen Feng
@ 2016-06-21  6:56                 ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-21  6:56 UTC (permalink / raw)
  To: Chen Feng
  Cc: Feng Tang, Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Yiping Xu, fujun (F),
	Zhuangluan Su, Dan Zhao, saberlily.xia

On Tue, Jun 21, 2016 at 10:08:24AM +0800, Chen Feng wrote:
> 
> 
> On 2016/6/20 14:48, Joonsoo Kim wrote:
> > On Fri, Jun 17, 2016 at 03:38:49PM +0800, Chen Feng wrote:
> >> Hi Kim & feng,
> >>
> >> Thanks for the share. In our platform also has the same use case.
> >>
> >> We only let the alloc with GFP_HIGHUSER_MOVABLE in memory.c to use cma memory.
> >>
> >> If we add zone_cma, It seems can resolve the cma migrate issue.
> >>
> >> But when free_hot_cold_page, we need let the cma page goto system directly not the pcp.
> >> It can be fail while cma_alloc and cma_release. If we alloc the whole cma pages which
> >> declared before.
> > 
> > Hmm...I'm not sure I understand your explanation. So, if I miss
> > something, please let me know. We calls drain_all_pages() when
> > isolating pageblock and alloc_contig_range() also has one
> > drain_all_pages() calls to drain pcp pages. And, after pageblock isolation,
> > freed pages belonging to MIGRATE_ISOLATE pageblock will go to the
> > buddy directly so there would be no problem you mentioned. Isn't it?
> > 
> Yes, you are right.
> 
> I mean if the we free cma page to pcp-list, it will goto the migrate_movable list.
> 
> Then the alloc with movable flag can use the cma memory from the list with buffered_rmqueue.
> 
> But that's not what we want. It will cause the migrate fail if all movable alloc can use cma memory.

Yes, if you modify the current kernel code to allow CMA pages only for
GFP_HIGHUSER_MOVABLE in memory.c, there are some corner cases where some CMA
pages would be allocated for !GFP_HIGHUSER_MOVABLE. One possible site is the
pcp list, as you mentioned, and the other is compaction.

If we use ZONE_CMA, there is no such problem, because free pages on the
pcp list of ZONE_CMA are allocated only when a GFP_HIGHUSER_MOVABLE request
comes.
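
As a rough illustration of that rule, here is a toy eligibility check. The TOY_GFP_* bit values and toy_may_use_zone_cma() are hypothetical and only mirror the idea that a request must carry all of the highmem, movable and user bits; the real decision in the kernel is made through zone selection from the gfp mask, not by a helper like this.

```c
/* Toy flag bits; the values are illustrative, not the kernel's. */
#define TOY_GFP_HIGHMEM		0x1u
#define TOY_GFP_MOVABLE		0x2u
#define TOY_GFP_USER		0x4u
#define TOY_GFP_HIGHUSER_MOVABLE \
	(TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE | TOY_GFP_USER)

/* Return 1 if a request may be served from the toy ZONE_CMA, else 0. */
int toy_may_use_zone_cma(unsigned int gfp)
{
	/* All three bits must be set, not just the movable one. */
	return (gfp & TOY_GFP_HIGHUSER_MOVABLE) == TOY_GFP_HIGHUSER_MOVABLE;
}
```

A request with only TOY_GFP_MOVABLE set is rejected here, which corresponds to the blockdev page cache case mentioned in the patch description.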

Thanks.


* Re: [PATCH v3 3/6] mm/cma: populate ZONE_CMA
  2016-05-26  6:22 ` [PATCH v3 3/6] mm/cma: populate ZONE_CMA js1304
@ 2016-06-22  9:23   ` Chen Feng
  2016-06-23  2:52     ` Joonsoo Kim
  2016-06-27  8:24   ` Vlastimil Babka
  1 sibling, 1 reply; 34+ messages in thread
From: Chen Feng @ 2016-06-22  9:23 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim, fujun (F),
	Zhuangluan Su, Yiping Xu, Dan Zhao

Hello,

On 2016/5/26 14:22, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Until now, reserved pages for CMA are managed in the ordinary zones
> where page's pfn are belong to. This approach has numorous problems
> and fixing them isn't easy. (It is mentioned on previous patch.)
> To fix this situation, ZONE_CMA is introduced in previous patch, but,
> not yet populated. This patch implement population of ZONE_CMA
> by stealing reserved pages from the ordinary zones.
> 
> Unlike previous implementation that kernel allocation request with
> __GFP_MOVABLE could be serviced from CMA region, allocation request only
> with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
> approach. This is an inevitable design decision to use the zone
> implementation because ZONE_CMA could contain highmem. Due to this
> decision, ZONE_CMA will work like as ZONE_HIGHMEM or ZONE_MOVABLE.
> 
> I don't think it would be a problem because most of file cache pages
> and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
> be proved by the fact that there are many systems with ZONE_HIGHMEM and
> they work fine. Notable disadvantage is that we cannot use these pages
> for blockdev file cache page, because it usually has __GFP_MOVABLE but
> not __GFP_HIGHMEM and __GFP_USER. But, in this case, there is pros and
> cons. In my experience, blockdev file cache pages are one of the top
> reason that causes cma_alloc() to fail temporarily. So, we can get more
> guarantee of cma_alloc() success by discarding that case.
> 
> Implementation itself is very easy to understand. Steal when cma area is
> initialized and recalculate various per zone stat/threshold.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/memory_hotplug.h |  3 ---
>  mm/cma.c                       | 41 +++++++++++++++++++++++++++++++++++++++++
>  mm/internal.h                  |  3 +++
>  mm/page_alloc.c                | 26 ++++++++++++++++++++++++--
>  4 files changed, 68 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index a864d79..6fde69b 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -198,9 +198,6 @@ void put_online_mems(void);
>  void mem_hotplug_begin(void);
>  void mem_hotplug_done(void);
>  
> -extern void set_zone_contiguous(struct zone *zone);
> -extern void clear_zone_contiguous(struct zone *zone);
> -
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off
> diff --git a/mm/cma.c b/mm/cma.c
> index ea506eb..8684f50 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -38,6 +38,7 @@
>  #include <trace/events/cma.h>
>  
>  #include "cma.h"
> +#include "internal.h"
>  
>  struct cma cma_areas[MAX_CMA_AREAS];
>  unsigned cma_area_count;
> @@ -145,6 +146,11 @@ err:
>  static int __init cma_init_reserved_areas(void)
>  {
>  	int i;
> +	struct zone *zone;
> +	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> +
> +	if (!cma_area_count)
> +		return 0;
>  
>  	for (i = 0; i < cma_area_count; i++) {
>  		int ret = cma_activate_area(&cma_areas[i]);
> @@ -153,6 +159,41 @@ static int __init cma_init_reserved_areas(void)
>  			return ret;
>  	}
>  
> +	for (i = 0; i < cma_area_count; i++) {
> +		if (start_pfn > cma_areas[i].base_pfn)
> +			start_pfn = cma_areas[i].base_pfn;
> +		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> +			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> +	}
> +
> +	for_each_populated_zone(zone) {
> +		if (!is_zone_cma(zone))
> +			continue;
> +
> +		/* ZONE_CMA doesn't need to exceed CMA region */
> +		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> +		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> +					zone->zone_start_pfn;
> +	}
> +
> +	/*
> +	 * Reserved pages for ZONE_CMA are now activated and this would change
> +	 * ZONE_CMA's managed page counter and other zone's present counter.
> +	 * We need to re-calculate various zone information that depends on
> +	 * this initialization.
> +	 */
> +	build_all_zonelists(NULL, NULL);
> +	for_each_populated_zone(zone) {
> +		zone_pcp_update(zone);
> +		set_zone_contiguous(zone);
> +	}
> +
> +	/*
> +	 * We need to re-init per zone wmark by calling
> +	 * init_per_zone_wmark_min() but doesn't call here because it is
> +	 * registered on module_init and it will be called later than us.
> +	 */
> +
>  	return 0;
>  }
>  core_initcall(cma_init_reserved_areas);
> diff --git a/mm/internal.h b/mm/internal.h
> index b6ead95..4c37234 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -155,6 +155,9 @@ extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
>  extern void prep_compound_page(struct page *page, unsigned int order);
>  extern int user_min_free_kbytes;
>  
> +extern void set_zone_contiguous(struct zone *zone);
> +extern void clear_zone_contiguous(struct zone *zone);
> +
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>  
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0197d5d..796b271 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1572,16 +1572,38 @@ void __init page_alloc_init_late(void)
>  }
>  
>  #ifdef CONFIG_CMA
> +static void __init adjust_present_page_count(struct page *page, long count)
> +{
> +	struct zone *zone = page_zone(page);
> +
> +	/* We don't need to hold a lock since it is boot-up process */
> +	zone->present_pages += count;
> +}
> +
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
>  	unsigned i = pageblock_nr_pages;
> +	unsigned long pfn = page_to_pfn(page);
>  	struct page *p = page;
> +	int nid = page_to_nid(page);
> +
> +	/*
> +	 * ZONE_CMA will steal present pages from other zones by changing
> +	 * page links so page_zone() is changed. Before that,
> +	 * we need to adjust previous zone's page count first.
> +	 */
> +	adjust_present_page_count(page, -pageblock_nr_pages);
>  
>  	do {
>  		__ClearPageReserved(p);
>  		set_page_count(p, 0);
> -	} while (++p, --i);
> +
> +		/* Steal pages from other zones */
> +		set_page_links(p, ZONE_CMA, nid, pfn);
> +	} while (++p, ++pfn, --i);
> +
> +	adjust_present_page_count(page, pageblock_nr_pages);
>  
>  	set_pageblock_migratetype(page, MIGRATE_CMA);

ZONE_CMA should depend on SPARSEMEM.

Because the zone size is not fixed when the buddy core is initialized,
pageblock_flags will be NULL at setup_usemap() time.
>  
> @@ -7545,7 +7567,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>  }
>  #endif
>  
> -#ifdef CONFIG_MEMORY_HOTPLUG
> +#if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
>  /*
>   * The zone indicated has a new number of managed_pages; batch sizes and percpu
>   * page high values need to be recalulated.
> 


* Re: [PATCH v3 3/6] mm/cma: populate ZONE_CMA
  2016-06-22  9:23   ` Chen Feng
@ 2016-06-23  2:52     ` Joonsoo Kim
  2016-06-28 11:23       ` Chen Feng
  0 siblings, 1 reply; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-23  2:52 UTC (permalink / raw)
  To: Chen Feng
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, fujun (F),
	Zhuangluan Su, Yiping Xu, Dan Zhao

On Wed, Jun 22, 2016 at 05:23:06PM +0800, Chen Feng wrote:
> Hello,
> 
> On 2016/5/26 14:22, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > Until now, reserved pages for CMA are managed in the ordinary zones
> > where page's pfn are belong to. This approach has numorous problems
> > and fixing them isn't easy. (It is mentioned on previous patch.)
> > To fix this situation, ZONE_CMA is introduced in previous patch, but,
> > not yet populated. This patch implement population of ZONE_CMA
> > by stealing reserved pages from the ordinary zones.
> > 
> > Unlike previous implementation that kernel allocation request with
> > __GFP_MOVABLE could be serviced from CMA region, allocation request only
> > with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
> > approach. This is an inevitable design decision to use the zone
> > implementation because ZONE_CMA could contain highmem. Due to this
> > decision, ZONE_CMA will work like as ZONE_HIGHMEM or ZONE_MOVABLE.
> > 
> > I don't think it would be a problem because most of file cache pages
> > and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
> > be proved by the fact that there are many systems with ZONE_HIGHMEM and
> > they work fine. Notable disadvantage is that we cannot use these pages
> > for blockdev file cache page, because it usually has __GFP_MOVABLE but
> > not __GFP_HIGHMEM and __GFP_USER. But, in this case, there is pros and
> > cons. In my experience, blockdev file cache pages are one of the top
> > reason that causes cma_alloc() to fail temporarily. So, we can get more
> > guarantee of cma_alloc() success by discarding that case.
> > 
> > Implementation itself is very easy to understand. Steal when cma area is
> > initialized and recalculate various per zone stat/threshold.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  include/linux/memory_hotplug.h |  3 ---
> >  mm/cma.c                       | 41 +++++++++++++++++++++++++++++++++++++++++
> >  mm/internal.h                  |  3 +++
> >  mm/page_alloc.c                | 26 ++++++++++++++++++++++++--
> >  4 files changed, 68 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> > index a864d79..6fde69b 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -198,9 +198,6 @@ void put_online_mems(void);
> >  void mem_hotplug_begin(void);
> >  void mem_hotplug_done(void);
> >  
> > -extern void set_zone_contiguous(struct zone *zone);
> > -extern void clear_zone_contiguous(struct zone *zone);
> > -
> >  #else /* ! CONFIG_MEMORY_HOTPLUG */
> >  /*
> >   * Stub functions for when hotplug is off
> > diff --git a/mm/cma.c b/mm/cma.c
> > index ea506eb..8684f50 100644
> > --- a/mm/cma.c
> > +++ b/mm/cma.c
> > @@ -38,6 +38,7 @@
> >  #include <trace/events/cma.h>
> >  
> >  #include "cma.h"
> > +#include "internal.h"
> >  
> >  struct cma cma_areas[MAX_CMA_AREAS];
> >  unsigned cma_area_count;
> > @@ -145,6 +146,11 @@ err:
> >  static int __init cma_init_reserved_areas(void)
> >  {
> >  	int i;
> > +	struct zone *zone;
> > +	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> > +
> > +	if (!cma_area_count)
> > +		return 0;
> >  
> >  	for (i = 0; i < cma_area_count; i++) {
> >  		int ret = cma_activate_area(&cma_areas[i]);
> > @@ -153,6 +159,41 @@ static int __init cma_init_reserved_areas(void)
> >  			return ret;
> >  	}
> >  
> > +	for (i = 0; i < cma_area_count; i++) {
> > +		if (start_pfn > cma_areas[i].base_pfn)
> > +			start_pfn = cma_areas[i].base_pfn;
> > +		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> > +			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> > +	}
> > +
> > +	for_each_populated_zone(zone) {
> > +		if (!is_zone_cma(zone))
> > +			continue;
> > +
> > +		/* ZONE_CMA doesn't need to exceed CMA region */
> > +		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> > +		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> > +					zone->zone_start_pfn;
> > +	}
> > +
> > +	/*
> > +	 * Reserved pages for ZONE_CMA are now activated and this would change
> > +	 * ZONE_CMA's managed page counter and other zone's present counter.
> > +	 * We need to re-calculate various zone information that depends on
> > +	 * this initialization.
> > +	 */
> > +	build_all_zonelists(NULL, NULL);
> > +	for_each_populated_zone(zone) {
> > +		zone_pcp_update(zone);
> > +		set_zone_contiguous(zone);
> > +	}
> > +
> > +	/*
> > +	 * We need to re-init per zone wmark by calling
> > +	 * init_per_zone_wmark_min() but doesn't call here because it is
> > +	 * registered on module_init and it will be called later than us.
> > +	 */
> > +
> >  	return 0;
> >  }
> >  core_initcall(cma_init_reserved_areas);
> > diff --git a/mm/internal.h b/mm/internal.h
> > index b6ead95..4c37234 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -155,6 +155,9 @@ extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
> >  extern void prep_compound_page(struct page *page, unsigned int order);
> >  extern int user_min_free_kbytes;
> >  
> > +extern void set_zone_contiguous(struct zone *zone);
> > +extern void clear_zone_contiguous(struct zone *zone);
> > +
> >  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> >  
> >  /*
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 0197d5d..796b271 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1572,16 +1572,38 @@ void __init page_alloc_init_late(void)
> >  }
> >  
> >  #ifdef CONFIG_CMA
> > +static void __init adjust_present_page_count(struct page *page, long count)
> > +{
> > +	struct zone *zone = page_zone(page);
> > +
> > +	/* We don't need to hold a lock since it is boot-up process */
> > +	zone->present_pages += count;
> > +}
> > +
> >  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> >  void __init init_cma_reserved_pageblock(struct page *page)
> >  {
> >  	unsigned i = pageblock_nr_pages;
> > +	unsigned long pfn = page_to_pfn(page);
> >  	struct page *p = page;
> > +	int nid = page_to_nid(page);
> > +
> > +	/*
> > +	 * ZONE_CMA will steal present pages from other zones by changing
> > +	 * page links so page_zone() is changed. Before that,
> > +	 * we need to adjust previous zone's page count first.
> > +	 */
> > +	adjust_present_page_count(page, -pageblock_nr_pages);
> >  
> >  	do {
> >  		__ClearPageReserved(p);
> >  		set_page_count(p, 0);
> > -	} while (++p, --i);
> > +
> > +		/* Steal pages from other zones */
> > +		set_page_links(p, ZONE_CMA, nid, pfn);
> > +	} while (++p, ++pfn, --i);
> > +
> > +	adjust_present_page_count(page, pageblock_nr_pages);
> >  
> >  	set_pageblock_migratetype(page, MIGRATE_CMA);
> 
> The ZONE_CMA should depends on sparse_mem.
> 
> Because the zone size is not fixed when init the buddy core.
> The pageblock_flags will be NULL when setup_usemap.

Before setup_usemap(), the range of ZONE_CMA is set conservatively, from
min_start_pfn of the node to max_end_pfn of the node. So,
pageblock_flags will be allocated and assigned properly.

Unfortunately, I found a bug on FLATMEM systems. If you'd like to
test ZONE_CMA on a FLATMEM system, please apply the patch below.
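
The "start wide, shrink later" idea can be sketched in user space as follows. This is a simplified model, not the patch itself: struct toy_zone, struct toy_cma_area and toy_shrink_zone_to_cma() are hypothetical names, and the sketch assumes the zone initially spans the whole node so that it always covers the CMA areas.

```c
#include <limits.h>

/* Toy stand-ins for the kernel structures; only the fields used here. */
struct toy_cma_area {
	unsigned long base_pfn;
	unsigned long count;
};

struct toy_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* Shrink a conservatively sized zone to the span covered by the CMA areas. */
void toy_shrink_zone_to_cma(struct toy_zone *zone,
			    const struct toy_cma_area *areas, int nr)
{
	unsigned long start = ULONG_MAX, end = 0, zone_end;
	int i;

	if (nr <= 0)
		return;

	/* Lowest base and highest end over all areas. */
	for (i = 0; i < nr; i++) {
		if (start > areas[i].base_pfn)
			start = areas[i].base_pfn;
		if (end < areas[i].base_pfn + areas[i].count)
			end = areas[i].base_pfn + areas[i].count;
	}

	/* Take the old zone end before the start is moved. */
	zone_end = zone->zone_start_pfn + zone->spanned_pages;

	if (zone->zone_start_pfn < start)
		zone->zone_start_pfn = start;
	if (zone_end > end)
		zone_end = end;

	/* No overlap at all: leave an empty span instead of underflowing. */
	zone->spanned_pages = zone_end > zone->zone_start_pfn ?
			      zone_end - zone->zone_start_pfn : 0;
}
```

For example, a zone spanning pfns 100 to 1100 with CMA areas at 300 (50 pages) and 500 (100 pages) ends up starting at pfn 300 with 300 spanned pages. The usemap sized for the wide range is larger than strictly needed, but it is never too small.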

Thanks.
------------>8------------
diff --git a/mm/cma.c b/mm/cma.c
index 0c1a72f..6cd2973 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -168,13 +168,6 @@ static int __init cma_init_reserved_areas(void)
                return 0;
 
        for (i = 0; i < cma_area_count; i++) {
-               int ret = cma_activate_area(&cma_areas[i]);
-
-               if (ret)
-                       return ret;
-       }
-
-       for (i = 0; i < cma_area_count; i++) {
                if (start_pfn > cma_areas[i].base_pfn)
                        start_pfn = cma_areas[i].base_pfn;
                if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
@@ -191,6 +184,13 @@ static int __init cma_init_reserved_areas(void)
                                        zone->zone_start_pfn;
        }
 
+       for (i = 0; i < cma_area_count; i++) {
+               int ret = cma_activate_area(&cma_areas[i]);
+
+               if (ret)
+                       return ret;
+       }
+
        /*
         * Reserved pages for ZONE_CMA are now activated and this would change
         * ZONE_CMA's managed page counter and other zone's present counter.
-- 
1.9.1


* Re: [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory
  2016-05-26  6:22 ` [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory js1304
@ 2016-06-24 13:20   ` Vlastimil Babka
  2016-06-28  8:12     ` Joonsoo Kim
  0 siblings, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2016-06-24 13:20 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel, Joonsoo Kim

On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Some of zone threshold depends on number of managed pages in the zone.
> When memory is going on/offline, it can be changed and we need to
> adjust them.
>
> This patch add recalculation to appropriate places and clean-up
> related function for better maintanance.

Can you be more specific about the user visible effect? Presumably it's 
not affecting just ZONE_CMA?
I assume it's fixing the thresholds where only part of node is onlined 
or offlined? Or are they currently wrong even when whole node is 
onlined/offlined?

(Sorry but I can't really orient myself in the maze of memory hotplug :(

Thanks,
Vlastimil

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c | 36 +++++++++++++++++++++++++++++-------
>  1 file changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d27e8b9..90e5a82 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4874,6 +4874,8 @@ int local_memory_node(int node)
>  }
>  #endif
>
> +static void setup_min_unmapped_ratio(struct zone *zone);
> +static void setup_min_slab_ratio(struct zone *zone);
>  #else	/* CONFIG_NUMA */
>
>  static void set_zonelist_order(void)
> @@ -5988,9 +5990,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
>  		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
>  #ifdef CONFIG_NUMA
>  		zone->node = nid;
> -		zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
> -						/ 100;
> -		zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
> +		setup_min_unmapped_ratio(zone);
> +		setup_min_slab_ratio(zone);
>  #endif
>  		zone->name = zone_names[j];
>  		spin_lock_init(&zone->lock);
> @@ -6896,6 +6897,7 @@ int __meminit init_per_zone_wmark_min(void)
>  {
>  	unsigned long lowmem_kbytes;
>  	int new_min_free_kbytes;
> +	struct zone *zone;
>
>  	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
>  	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
> @@ -6913,6 +6915,14 @@ int __meminit init_per_zone_wmark_min(void)
>  	setup_per_zone_wmarks();
>  	refresh_zone_stat_thresholds();
>  	setup_per_zone_lowmem_reserve();
> +
> +	for_each_zone(zone) {
> +#ifdef CONFIG_NUMA
> +		setup_min_unmapped_ratio(zone);
> +		setup_min_slab_ratio(zone);
> +#endif
> +	}
> +
>  	return 0;
>  }
>  core_initcall(init_per_zone_wmark_min)
> @@ -6954,6 +6964,12 @@ int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
>  }
>
>  #ifdef CONFIG_NUMA
> +static void setup_min_unmapped_ratio(struct zone *zone)
> +{
> +	zone->min_unmapped_pages = (zone->managed_pages *
> +			sysctl_min_unmapped_ratio) / 100;
> +}
> +
>  int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
>  	void __user *buffer, size_t *length, loff_t *ppos)
>  {
> @@ -6965,11 +6981,17 @@ int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
>  		return rc;
>
>  	for_each_zone(zone)
> -		zone->min_unmapped_pages = (zone->managed_pages *
> -				sysctl_min_unmapped_ratio) / 100;
> +		setup_min_unmapped_ratio(zone);
> +
>  	return 0;
>  }
>
> +static void setup_min_slab_ratio(struct zone *zone)
> +{
> +	zone->min_slab_pages = (zone->managed_pages *
> +			sysctl_min_slab_ratio) / 100;
> +}
> +
>  int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
>  	void __user *buffer, size_t *length, loff_t *ppos)
>  {
> @@ -6981,8 +7003,8 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
>  		return rc;
>
>  	for_each_zone(zone)
> -		zone->min_slab_pages = (zone->managed_pages *
> -				sysctl_min_slab_ratio) / 100;
> +		setup_min_slab_ratio(zone);
> +
>  	return 0;
>  }
>  #endif
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/6] mm/cma: populate ZONE_CMA
  2016-05-26  6:22 ` [PATCH v3 3/6] mm/cma: populate ZONE_CMA js1304
  2016-06-22  9:23   ` Chen Feng
@ 2016-06-27  8:24   ` Vlastimil Babka
  2016-06-28  8:31     ` Joonsoo Kim
  1 sibling, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2016-06-27  8:24 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel, Joonsoo Kim

On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Until now, reserved pages for CMA are managed in the ordinary zones
> where page's pfn are belong to. This approach has numorous problems
> and fixing them isn't easy. (It is mentioned on previous patch.)
> To fix this situation, ZONE_CMA is introduced in previous patch, but,
> not yet populated. This patch implement population of ZONE_CMA
> by stealing reserved pages from the ordinary zones.
>
> Unlike previous implementation that kernel allocation request with
> __GFP_MOVABLE could be serviced from CMA region, allocation request only
> with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
> approach. This is an inevitable design decision to use the zone
> implementation because ZONE_CMA could contain highmem. Due to this
> decision, ZONE_CMA will work like as ZONE_HIGHMEM or ZONE_MOVABLE.
>
> I don't think it would be a problem because most of file cache pages
> and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
> be proved by the fact that there are many systems with ZONE_HIGHMEM and
> they work fine. Notable disadvantage is that we cannot use these pages
> for blockdev file cache page, because it usually has __GFP_MOVABLE but
> not __GFP_HIGHMEM and __GFP_USER. But, in this case, there is pros and
> cons. In my experience, blockdev file cache pages are one of the top
> reason that causes cma_alloc() to fail temporarily. So, we can get more
> guarantee of cma_alloc() success by discarding that case.
>
> Implementation itself is very easy to understand. Steal when cma area is
> initialized and recalculate various per zone stat/threshold.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

I realize I differ here from much more experienced mm guys, and will
probably deservingly regret it later on, but I think that the ZONE_CMA
approach could indeed work better than the current MIGRATE_CMA pageblocks.

My main worry is (naturally :) the effect on compaction overhead, due to 
potentially sparsely populated zones with holes that have to be scanned 
over - either ZONE_CMA itself, or the zones where pages were stolen from.

> ---
>  include/linux/memory_hotplug.h |  3 ---
>  mm/cma.c                       | 41 +++++++++++++++++++++++++++++++++++++++++
>  mm/internal.h                  |  3 +++
>  mm/page_alloc.c                | 26 ++++++++++++++++++++++++--
>  4 files changed, 68 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index a864d79..6fde69b 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -198,9 +198,6 @@ void put_online_mems(void);
>  void mem_hotplug_begin(void);
>  void mem_hotplug_done(void);
>
> -extern void set_zone_contiguous(struct zone *zone);
> -extern void clear_zone_contiguous(struct zone *zone);
> -
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off
> diff --git a/mm/cma.c b/mm/cma.c
> index ea506eb..8684f50 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -38,6 +38,7 @@
>  #include <trace/events/cma.h>
>
>  #include "cma.h"
> +#include "internal.h"
>
>  struct cma cma_areas[MAX_CMA_AREAS];
>  unsigned cma_area_count;
> @@ -145,6 +146,11 @@ err:
>  static int __init cma_init_reserved_areas(void)
>  {
>  	int i;
> +	struct zone *zone;
> +	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> +
> +	if (!cma_area_count)
> +		return 0;
>
>  	for (i = 0; i < cma_area_count; i++) {
>  		int ret = cma_activate_area(&cma_areas[i]);
> @@ -153,6 +159,41 @@ static int __init cma_init_reserved_areas(void)
>  			return ret;
>  	}
>
> +	for (i = 0; i < cma_area_count; i++) {
> +		if (start_pfn > cma_areas[i].base_pfn)
> +			start_pfn = cma_areas[i].base_pfn;
> +		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> +			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> +	}
> +
> +	for_each_populated_zone(zone) {
> +		if (!is_zone_cma(zone))
> +			continue;
> +
> +		/* ZONE_CMA doesn't need to exceed CMA region */
> +		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> +		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> +					zone->zone_start_pfn;

So what's the typical spanned vs present pages here? Should there 
perhaps be some pr_warns about large holes?
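
To make the question concrete, here is a rough userspace sketch of how
spanned and present pages relate when the span is taken from the lowest
base pfn to the highest end pfn of all CMA areas, as the hunk above
does. It is only an illustration: struct cma_area_model, struct
span_stats and both functions are hypothetical names, the areas are
assumed not to overlap, and the ratio used to call a hole "large" is an
arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct cma fields used above. */
struct cma_area_model {
	unsigned long base_pfn;
	unsigned long count;
};

struct span_stats {
	unsigned long spanned_pages;	/* lowest base to highest end */
	unsigned long present_pages;	/* pages that really belong to CMA */
};

/* Compute the span covering all areas and the pages actually present. */
static struct span_stats cma_span_stats(const struct cma_area_model *areas,
					size_t nr_areas)
{
	struct span_stats s = { 0, 0 };
	unsigned long start_pfn = ~0UL, end_pfn = 0;

	assert(areas != NULL || nr_areas == 0);

	for (size_t i = 0; i < nr_areas; i++) {
		unsigned long area_end = areas[i].base_pfn + areas[i].count;

		if (areas[i].base_pfn < start_pfn)
			start_pfn = areas[i].base_pfn;
		if (area_end > end_pfn)
			end_pfn = area_end;
		s.present_pages += areas[i].count;
	}

	if (nr_areas)
		s.spanned_pages = end_pfn - start_pfn;
	return s;
}

/* Nonzero when the span is at least max_ratio times the present pages. */
static int span_has_large_holes(struct span_stats s, unsigned long max_ratio)
{
	return s.present_pages &&
	       s.spanned_pages / s.present_pages >= max_ratio;
}
```

With two 50-page areas at pfn 100 and pfn 1000, the span is 950 pages
for 100 present pages, which is the kind of case where a warning might
be worth having.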

> +	}
> +
> +	/*
> +	 * Reserved pages for ZONE_CMA are now activated and this would change
> +	 * ZONE_CMA's managed page counter and other zone's present counter.
> +	 * We need to re-calculate various zone information that depends on
> +	 * this initialization.
> +	 */
> +	build_all_zonelists(NULL, NULL);
> +	for_each_populated_zone(zone) {
> +		zone_pcp_update(zone);
> +		set_zone_contiguous(zone);
> +	}
> +
> +	/*
> +	 * We need to re-init per zone wmark by calling
> +	 * init_per_zone_wmark_min() but doesn't call here because it is
> +	 * registered on module_init and it will be called later than us.
> +	 */
> +
>  	return 0;
>  }
>  core_initcall(cma_init_reserved_areas);
> diff --git a/mm/internal.h b/mm/internal.h
> index b6ead95..4c37234 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -155,6 +155,9 @@ extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
>  extern void prep_compound_page(struct page *page, unsigned int order);
>  extern int user_min_free_kbytes;
>
> +extern void set_zone_contiguous(struct zone *zone);
> +extern void clear_zone_contiguous(struct zone *zone);
> +
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0197d5d..796b271 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1572,16 +1572,38 @@ void __init page_alloc_init_late(void)
>  }
>
>  #ifdef CONFIG_CMA
> +static void __init adjust_present_page_count(struct page *page, long count)
> +{
> +	struct zone *zone = page_zone(page);
> +
> +	/* We don't need to hold a lock since it is boot-up process */
> +	zone->present_pages += count;
> +}
> +
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
>  	unsigned i = pageblock_nr_pages;
> +	unsigned long pfn = page_to_pfn(page);
>  	struct page *p = page;
> +	int nid = page_to_nid(page);
> +
> +	/*
> +	 * ZONE_CMA will steal present pages from other zones by changing
> +	 * page links so page_zone() is changed. Before that,
> +	 * we need to adjust previous zone's page count first.
> +	 */
> +	adjust_present_page_count(page, -pageblock_nr_pages);

Ideally, zone's start_pfn and spanned_pages should be also adjusted if 
we stole from the beginning/end (which I suppose should be quite common?).
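
A minimal sketch of what I mean, as a userspace model rather than
kernel code (struct zone_span_model and trim_span_after_steal() are
made-up names, and it assumes the pages next to the stolen range are
still present, which a real implementation would have to verify):

```c
#include <assert.h>

/* Hypothetical stand-in for zone_start_pfn and spanned_pages. */
struct zone_span_model {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/*
 * Shrink the span when [stolen_start, stolen_start + nr_pages) is taken
 * from the beginning or the end of the zone. A range stolen from the
 * middle only leaves a hole, so the span is left unchanged.
 */
static void trim_span_after_steal(struct zone_span_model *z,
				  unsigned long stolen_start,
				  unsigned long nr_pages)
{
	unsigned long zone_end = z->start_pfn + z->spanned_pages;
	unsigned long stolen_end = stolen_start + nr_pages;

	assert(nr_pages > 0);

	if (stolen_start <= z->start_pfn && stolen_end >= zone_end) {
		/* The whole span was stolen. */
		z->spanned_pages = 0;
	} else if (stolen_start <= z->start_pfn &&
		   stolen_end > z->start_pfn) {
		/* Stolen from the beginning: move the start up. */
		z->start_pfn = stolen_end;
		z->spanned_pages = zone_end - stolen_end;
	} else if (stolen_end >= zone_end && stolen_start < zone_end) {
		/* Stolen from the end: cut the tail. */
		z->spanned_pages = stolen_start - z->start_pfn;
	}
}
```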

BTW, shouldn't it be possible with ZONE_CMA to drop the requirement in 
cma_activate_area() that all pages come (originally) from the same zone, 
after your series?

>
>  	do {
>  		__ClearPageReserved(p);
>  		set_page_count(p, 0);
> -	} while (++p, --i);
> +
> +		/* Steal pages from other zones */
> +		set_page_links(p, ZONE_CMA, nid, pfn);
> +	} while (++p, ++pfn, --i);
> +
> +	adjust_present_page_count(page, pageblock_nr_pages);
>
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
>
> @@ -7545,7 +7567,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>  }
>  #endif
>
> -#ifdef CONFIG_MEMORY_HOTPLUG
> +#if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
>  /*
>   * The zone indicated has a new number of managed_pages; batch sizes and percpu
>   * page high values need to be recalulated.
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/6] mm/cma: remove ALLOC_CMA
  2016-05-26  6:22 ` [PATCH v3 4/6] mm/cma: remove ALLOC_CMA js1304
@ 2016-06-27  9:30   ` Vlastimil Babka
  2016-06-28  8:16     ` Joonsoo Kim
  0 siblings, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2016-06-27  9:30 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel, Joonsoo Kim

On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Now, all reserved pages for CMA region are belong to the ZONE_CMA
> and it only serves for GFP_HIGHUSER_MOVABLE. Therefore, we don't need to
> consider ALLOC_CMA at all.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/internal.h   |  3 +--
>  mm/page_alloc.c | 27 +++------------------------
>  2 files changed, 4 insertions(+), 26 deletions(-)
>

[...]

> @@ -2833,10 +2827,8 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
>  		}
>
>  #ifdef CONFIG_CMA
> -		if ((alloc_flags & ALLOC_CMA) &&
> -		    !list_empty(&area->free_list[MIGRATE_CMA])) {
> +		if (!list_empty(&area->free_list[MIGRATE_CMA]))
>  			return true;
> -		}
>  #endif

Nitpick: it would be more logical to remove the whole block in this 
patch, as removing ALLOC_CMA means it's effectively false? Also less churn.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA
  2016-05-26  6:22 ` [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA js1304
  2016-05-27  1:42   ` Chen Feng
@ 2016-06-27  9:46   ` Vlastimil Babka
  2016-06-28  8:17     ` Joonsoo Kim
  1 sibling, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2016-06-27  9:46 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel, Joonsoo Kim

On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Now, all reserved pages for CMA region are belong to the ZONE_CMA
> and there is no other type of pages. Therefore, we don't need to
> use MIGRATE_CMA to distinguish and handle differently for CMA pages
> and ordinary pages. Remove MIGRATE_CMA.
>
> Unfortunately, this patch make free CMA counter incorrect because
> we count it when pages are on the MIGRATE_CMA. It will be fixed
> by next patch. I can squash next patch here but it makes changes
> complicated and hard to review so I separate that.

Doesn't sound like a big deal.

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

[...]

> @@ -7442,14 +7401,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	 * allocator removing them from the buddy system.  This way
>  	 * page allocator will never consider using them.
>  	 *
> -	 * This lets us mark the pageblocks back as
> -	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
> -	 * aligned range but not in the unaligned, original range are
> -	 * put back to page allocator so that buddy can use them.
> +	 * This lets us mark the pageblocks back as MIGRATE_MOVABLE
> +	 * so that free pages in the aligned range but not in the
> +	 * unaligned, original range are put back to page allocator
> +	 * so that buddy can use them.
>  	 */
>
>  	ret = start_isolate_page_range(pfn_max_align_down(start),
> -				       pfn_max_align_up(end), migratetype,
> +				       pfn_max_align_up(end), MIGRATE_MOVABLE,
>  				       false);
>  	if (ret)
>  		return ret;
> @@ -7528,7 +7487,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>
>  done:
>  	undo_isolate_page_range(pfn_max_align_down(start),
> -				pfn_max_align_up(end), migratetype);
> +				pfn_max_align_up(end), MIGRATE_MOVABLE);
>  	return ret;
>  }

Looks like all callers of {start,undo}_isolate_page_range() now use 
MIGRATE_MOVABLE, so it could be removed.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 6/6] mm/cma: remove per zone CMA stat
  2016-05-26  6:22 ` [PATCH v3 6/6] mm/cma: remove per zone CMA stat js1304
@ 2016-06-27  9:54   ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2016-06-27  9:54 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel, Joonsoo Kim

On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Now, all reserved pages for CMA region are belong to the ZONE_CMA
> so we don't need to maintain CMA stat in other zones. Remove it.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
                   ` (6 preceding siblings ...)
  2016-05-26  8:04 ` [PATCH v3 0/6] Introduce ZONE_CMA Feng Tang
@ 2016-06-27 11:25 ` Balbir Singh
  2016-06-29  7:57   ` Joonsoo Kim
  7 siblings, 1 reply; 34+ messages in thread
From: Balbir Singh @ 2016-06-27 11:25 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, Joonsoo Kim



On 26/05/16 16:22, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Hello,
> 
> Changes from v2
> o Rebase on next-20160525
> o No other changes except following description
> 
> There was a discussion with Mel [1] after LSF/MM 2016. I could summarise
> it to help merge decision but it's better to read by yourself since
> if I summarise it, it would be biased for me. But, if anyone hope
> the summary, I will do it. :)
> 
> Anyway, Mel's position on this patchset seems to be neutral. He said:
> "I'm not going to outright NAK your series but I won't ACK it either"
> 
> We can fix the problems with any approach but I hope to go a new zone
> approach because it is less error-prone. It reduces some corner case
> handling for now and remove need for potential corner case handling to fix
> problems.
> 
> Note that our company is already using ZONE_CMA for a years and
> there is no problem.
> 
> If anyone has a different opinion, please let me know and let's discuss
> together.
> 
> Andrew, if there is something to do for merge, please let me know.
> 
> [1] https://lkml.kernel.org/r/20160425053653.GA25662@js1304-P5Q-DELUXE
> 
> Changes from v1
> o Separate some patches which deserve to submit independently
> o Modify description to reflect current kernel state
> (e.g. high-order watermark problem disappeared by Mel's work)
> o Don't increase SECTION_SIZE_BITS to make a room in page flags
> (detailed reason is on the patch that adds ZONE_CMA)
> o Adjust ZONE_CMA population code
> 
> This series try to solve problems of current CMA implementation.
> 
> CMA is introduced to provide physically contiguous pages at runtime
> without exclusive reserved memory area. But, current implementation
> works like as previous reserved memory approach, because freepages
> on CMA region are used only if there is no movable freepage. In other
> words, freepages on CMA region are only used as fallback. In that
> situation where freepages on CMA region are used as fallback, kswapd
> would be woken up easily since there is no unmovable and reclaimable
> freepage, too. If kswapd starts to reclaim memory, fallback allocation
> to MIGRATE_CMA doesn't occur any more since movable freepages are
> already refilled by kswapd and then most of freepage on CMA are left
> to be in free. This situation looks like exclusive reserved memory case.

I am afraid I don't completely understand the problem statement.
Is this the ALLOC_CMA case or the !ALLOC_CMA one? I also think one other
problem is that, in my experience and observation, all CMA allocations seem
to come from one node: the highest node on the system.

> 
> In my experiment, I found that if system memory has 1024 MB memory and
> 512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
> free memory is left. Detailed reason is that for keeping enough free
> memory for unmovable and reclaimable allocation, kswapd uses below
> equation when calculating free memory and it easily go under the watermark.
> 
> Free memory for unmovable and reclaimable = Free total - Free CMA pages
> 
> This is derivated from the property of CMA freepage that CMA freepage
> can't be used for unmovable and reclaimable allocation.
> 
> Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
> is lower than low watermark and tries to make free memory until
> (FreeTotal - FreeCMA) is higher than high watermark. That results
> in that FreeTotal is moving around 512MB boundary consistently. It
> then means that we can't utilize full memory capacity.
> 

OK.. so you are suggesting that we are under-utilizing the memory in the
CMA region?
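
To check my reading of the quoted arithmetic, here is a small sketch.
It is only a model of the formula in the cover letter, not the kernel's
watermark code; the function names and the 16 MB low watermark are
assumptions picked for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Free memory usable for unmovable and reclaimable allocations:
 * Free total - Free CMA pages (all values in MB here).
 */
static unsigned long usable_free_mb(unsigned long free_total_mb,
				    unsigned long free_cma_mb)
{
	assert(free_cma_mb <= free_total_mb);
	return free_total_mb - free_cma_mb;
}

/* kswapd is woken when the usable free memory is below the low watermark. */
static bool kswapd_should_wake(unsigned long free_total_mb,
			       unsigned long free_cma_mb,
			       unsigned long low_wmark_mb)
{
	return usable_free_mb(free_total_mb, free_cma_mb) < low_wmark_mb;
}
```

On the 1024 MB system with 512 MB of CMA, if the whole CMA region is
still free and total free memory drops to 520 MB, only 8 MB counts as
usable, so kswapd is woken even though half of the memory is free.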

> To fix this problem, I submitted some patches [1] about 10 months ago,
> but, found some more problems to be fixed before solving this problem.
> It requires many hooks in allocator hotpath so some developers doesn't
> like it. Instead, some of them suggest different approach [2] to fix
> all the problems related to CMA, that is, introducing a new zone to deal
> with free CMA pages. I agree that it is the best way to go so implement
> here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
> decide to add a new zone rather than piggyback on ZONE_MOVABLE since
> they have some differences. First, reserved CMA pages should not be
> offlined.

Why? Why are they special? Even if they are offlined by user action,
one would expect the following to occur:

1. User would mark/release the cma region associated with them
2. User would then hotplug the memory

> If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
> MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
> to distiguish hotpluggable memory and reserved memory for CMA in the same
> zone. It would make memory hotplug code which is already complicated
> more complicated.

Again, why treat it as special? One could potentially deny the hotplug based
on the knowledge of where the CMA region is allocated from.

> Second, cma_alloc() can be called more frequently
> than memory hotplug operation and possibly we need to control
> allocation rate of ZONE_CMA to optimize latency in the future.
> In this case, separate zone approach is easy to modify. Third, I'd
> like to see statistics for CMA, separately. Sometimes, we need to debug
> why cma_alloc() is failed and separate statistics would be more helpful
> in this situtaion.
> 
> Anyway, this patchset solves four problems related to CMA implementation.
>

Balbir 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory
  2016-06-24 13:20   ` Vlastimil Babka
@ 2016-06-28  8:12     ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-28  8:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel

On Fri, Jun 24, 2016 at 03:20:43PM +0200, Vlastimil Babka wrote:
> On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> >From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> >Some of zone threshold depends on number of managed pages in the zone.
> >When memory is going on/offline, it can be changed and we need to
> >adjust them.
> >
> >This patch add recalculation to appropriate places and clean-up
> >related function for better maintanance.
> 
> Can you be more specific about the user visible effect? Presumably
> it's not affecting just ZONE_CMA?

Yes, it's also affecting memory hotplug.

> I assume it's fixing the thresholds where only part of node is
> onlined or offlined? Or are they currently wrong even when whole
> node is onlined/offlined?

When memory hotplug happens, managed_pages changes and we need to
recalculate everything based on managed_pages. min_slab_pages and
min_unmapped_pages were missed, so this patch recalculates them, too.
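
For illustration, a minimal userspace sketch of the idea (this is not
the kernel code; struct zone_model and the function names are made up
for the example, and the ratios of 1 and 5 percent are just sample
values):

```c
#include <assert.h>

/* Hypothetical stand-in for the few struct zone fields involved. */
struct zone_model {
	unsigned long managed_pages;
	unsigned long min_unmapped_pages;
	unsigned long min_slab_pages;
};

/* Same arithmetic as the helpers in the patch: a percentage of managed_pages. */
static unsigned long ratio_of(unsigned long managed_pages, unsigned long ratio)
{
	assert(ratio <= 100);
	return (managed_pages * ratio) / 100;
}

static void recalc_thresholds(struct zone_model *z,
			      unsigned long unmapped_ratio,
			      unsigned long slab_ratio)
{
	z->min_unmapped_pages = ratio_of(z->managed_pages, unmapped_ratio);
	z->min_slab_pages = ratio_of(z->managed_pages, slab_ratio);
}

/* Model of onlining memory: managed_pages grows, thresholds must follow. */
static void online_pages_model(struct zone_model *z, unsigned long nr_pages,
			       unsigned long unmapped_ratio,
			       unsigned long slab_ratio)
{
	z->managed_pages += nr_pages;
	recalc_thresholds(z, unmapped_ratio, slab_ratio);
}
```

If the recalculation after the managed_pages update is left out, both
thresholds keep the values computed for the old zone size.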

Thanks.

> 
> (Sorry but I can't really orient myself in the maze of memory hotplug :(
> 
> Thanks,
> Vlastimil
> 
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> > mm/page_alloc.c | 36 +++++++++++++++++++++++++++++-------
> > 1 file changed, 29 insertions(+), 7 deletions(-)
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index d27e8b9..90e5a82 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -4874,6 +4874,8 @@ int local_memory_node(int node)
> > }
> > #endif
> >
> >+static void setup_min_unmapped_ratio(struct zone *zone);
> >+static void setup_min_slab_ratio(struct zone *zone);
> > #else	/* CONFIG_NUMA */
> >
> > static void set_zonelist_order(void)
> >@@ -5988,9 +5990,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
> > 		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
> > #ifdef CONFIG_NUMA
> > 		zone->node = nid;
> >-		zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio)
> >-						/ 100;
> >-		zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
> >+		setup_min_unmapped_ratio(zone);
> >+		setup_min_slab_ratio(zone);
> > #endif
> > 		zone->name = zone_names[j];
> > 		spin_lock_init(&zone->lock);
> >@@ -6896,6 +6897,7 @@ int __meminit init_per_zone_wmark_min(void)
> > {
> > 	unsigned long lowmem_kbytes;
> > 	int new_min_free_kbytes;
> >+	struct zone *zone;
> >
> > 	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
> > 	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
> >@@ -6913,6 +6915,14 @@ int __meminit init_per_zone_wmark_min(void)
> > 	setup_per_zone_wmarks();
> > 	refresh_zone_stat_thresholds();
> > 	setup_per_zone_lowmem_reserve();
> >+
> >+	for_each_zone(zone) {
> >+#ifdef CONFIG_NUMA
> >+		setup_min_unmapped_ratio(zone);
> >+		setup_min_slab_ratio(zone);
> >+#endif
> >+	}
> >+
> > 	return 0;
> > }
> > core_initcall(init_per_zone_wmark_min)
> >@@ -6954,6 +6964,12 @@ int watermark_scale_factor_sysctl_handler(struct ctl_table *table, int write,
> > }
> >
> > #ifdef CONFIG_NUMA
> >+static void setup_min_unmapped_ratio(struct zone *zone)
> >+{
> >+	zone->min_unmapped_pages = (zone->managed_pages *
> >+			sysctl_min_unmapped_ratio) / 100;
> >+}
> >+
> > int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
> > 	void __user *buffer, size_t *length, loff_t *ppos)
> > {
> >@@ -6965,11 +6981,17 @@ int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *table, int write,
> > 		return rc;
> >
> > 	for_each_zone(zone)
> >-		zone->min_unmapped_pages = (zone->managed_pages *
> >-				sysctl_min_unmapped_ratio) / 100;
> >+		setup_min_unmapped_ratio(zone);
> >+
> > 	return 0;
> > }
> >
> >+static void setup_min_slab_ratio(struct zone *zone)
> >+{
> >+	zone->min_slab_pages = (zone->managed_pages *
> >+			sysctl_min_slab_ratio) / 100;
> >+}
> >+
> > int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
> > 	void __user *buffer, size_t *length, loff_t *ppos)
> > {
> >@@ -6981,8 +7003,8 @@ int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
> > 		return rc;
> >
> > 	for_each_zone(zone)
> >-		zone->min_slab_pages = (zone->managed_pages *
> >-				sysctl_min_slab_ratio) / 100;
> >+		setup_min_slab_ratio(zone);
> >+
> > 	return 0;
> > }
> > #endif
> >
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 4/6] mm/cma: remove ALLOC_CMA
  2016-06-27  9:30   ` Vlastimil Babka
@ 2016-06-28  8:16     ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-28  8:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel

On Mon, Jun 27, 2016 at 11:30:52AM +0200, Vlastimil Babka wrote:
> On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> >From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> >Now, all reserved pages for CMA region are belong to the ZONE_CMA
> >and it only serves for GFP_HIGHUSER_MOVABLE. Therefore, we don't need to
> >consider ALLOC_CMA at all.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> > mm/internal.h   |  3 +--
> > mm/page_alloc.c | 27 +++------------------------
> > 2 files changed, 4 insertions(+), 26 deletions(-)
> >
> 
> [...]
> 
> >@@ -2833,10 +2827,8 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
> > 		}
> >
> > #ifdef CONFIG_CMA
> >-		if ((alloc_flags & ALLOC_CMA) &&
> >-		    !list_empty(&area->free_list[MIGRATE_CMA])) {
> >+		if (!list_empty(&area->free_list[MIGRATE_CMA]))
> > 			return true;
> >-		}
> > #endif
> 
> Nitpick: it would be more logical to remove the whole block in this
> patch, as removing ALLOC_CMA means it's effectively false? Also less
> churn.

No, all freepages on ZONE_CMA are attached to area->free_list[MIGRATE_CMA].
We need to check whether there is a freepage on it or not to pass the
watermark check for high-order allocation.
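
A simplified model of why the check has to stay (a userspace sketch
only; the types and names below are hypothetical, and the real
__zone_watermark_ok() walks more free lists than this):

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER 11

/* Two free lists per order are enough to show the point. */
enum { MODEL_MOVABLE, MODEL_CMA, MODEL_TYPES };

struct free_area_model {
	unsigned long nr_free[MODEL_TYPES];
};

/*
 * High-order part of the watermark check: succeed if any free list at
 * order >= the requested order has a page. check_cma_list models
 * whether the MIGRATE_CMA list is looked at.
 */
static bool high_order_page_available(const struct free_area_model *areas,
				      unsigned int order, bool check_cma_list)
{
	assert(order < MODEL_MAX_ORDER);

	for (unsigned int o = order; o < MODEL_MAX_ORDER; o++) {
		if (areas[o].nr_free[MODEL_MOVABLE])
			return true;
		if (check_cma_list && areas[o].nr_free[MODEL_CMA])
			return true;
	}
	return false;
}
```

In a zone where every free page sits on the MIGRATE_CMA list, dropping
the block would make the check fail although a suitable page exists.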

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA
  2016-06-27  9:46   ` Vlastimil Babka
@ 2016-06-28  8:17     ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-28  8:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel

On Mon, Jun 27, 2016 at 11:46:39AM +0200, Vlastimil Babka wrote:
> On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> >From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> >Now, all reserved pages for CMA region are belong to the ZONE_CMA
> >and there is no other type of pages. Therefore, we don't need to
> >use MIGRATE_CMA to distinguish and handle differently for CMA pages
> >and ordinary pages. Remove MIGRATE_CMA.
> >
> >Unfortunately, this patch make free CMA counter incorrect because
> >we count it when pages are on the MIGRATE_CMA. It will be fixed
> >by next patch. I can squash next patch here but it makes changes
> >complicated and hard to review so I separate that.
> 
> Doesn't sound like a big deal.

Okay.

> 
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> [...]
> 
> >@@ -7442,14 +7401,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> > 	 * allocator removing them from the buddy system.  This way
> > 	 * page allocator will never consider using them.
> > 	 *
> >-	 * This lets us mark the pageblocks back as
> >-	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
> >-	 * aligned range but not in the unaligned, original range are
> >-	 * put back to page allocator so that buddy can use them.
> >+	 * This lets us mark the pageblocks back as MIGRATE_MOVABLE
> >+	 * so that free pages in the aligned range but not in the
> >+	 * unaligned, original range are put back to page allocator
> >+	 * so that buddy can use them.
> > 	 */
> >
> > 	ret = start_isolate_page_range(pfn_max_align_down(start),
> >-				       pfn_max_align_up(end), migratetype,
> >+				       pfn_max_align_up(end), MIGRATE_MOVABLE,
> > 				       false);
> > 	if (ret)
> > 		return ret;
> >@@ -7528,7 +7487,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> >
> > done:
> > 	undo_isolate_page_range(pfn_max_align_down(start),
> >-				pfn_max_align_up(end), migratetype);
> >+				pfn_max_align_up(end), MIGRATE_MOVABLE);
> > 	return ret;
> > }
> 
> Looks like all callers of {start,undo}_isolate_page_range() now use
> MIGRATE_MOVABLE, so it could be removed.

You're right. Will do in next spin.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 3/6] mm/cma: populate ZONE_CMA
  2016-06-27  8:24   ` Vlastimil Babka
@ 2016-06-28  8:31     ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-28  8:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Rui Teng, linux-mm, linux-kernel

On Mon, Jun 27, 2016 at 10:24:05AM +0200, Vlastimil Babka wrote:
> On 05/26/2016 08:22 AM, js1304@gmail.com wrote:
> >From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> >Until now, reserved pages for CMA are managed in the ordinary zones
> >where page's pfn are belong to. This approach has numorous problems
> >and fixing them isn't easy. (It is mentioned on previous patch.)
> >To fix this situation, ZONE_CMA is introduced in previous patch, but,
> >not yet populated. This patch implement population of ZONE_CMA
> >by stealing reserved pages from the ordinary zones.
> >
> >Unlike previous implementation that kernel allocation request with
> >__GFP_MOVABLE could be serviced from CMA region, allocation request only
> >with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
> >approach. This is an inevitable design decision to use the zone
> >implementation because ZONE_CMA could contain highmem. Due to this
> >decision, ZONE_CMA will work like as ZONE_HIGHMEM or ZONE_MOVABLE.
> >
> >I don't think it would be a problem because most of file cache pages
> >and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
> >be proved by the fact that there are many systems with ZONE_HIGHMEM and
> >they work fine. Notable disadvantage is that we cannot use these pages
> >for blockdev file cache page, because it usually has __GFP_MOVABLE but
> >not __GFP_HIGHMEM and __GFP_USER. But, in this case, there is pros and
> >cons. In my experience, blockdev file cache pages are one of the top
> >reason that causes cma_alloc() to fail temporarily. So, we can get more
> >guarantee of cma_alloc() success by discarding that case.
> >
> >Implementation itself is very easy to understand. Steal when cma area is
> >initialized and recalculate various per zone stat/threshold.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> I realize I differ here from much more experienced mm guys, and will
> probably deservingly regret it later on, but I think that the
> ZONE_CMA approach could work indeed better than current MIGRATE_CMA
> pageblocks.
> 
> My main worry is (naturally :) the effect on compaction overhead,
> due to potentially sparsely populated zones with holes that have to
> be scanned over - either ZONE_CMA itself, or the zones where pages
> were stolen from.

As I said before, even without this patchset it would be helpful to skip
MIGRATE_CMA pageblocks during compaction, since they cannot be used for
non-movable allocations. If we assume that we need that skip anyway, the
overhead due to ZONE_CMA would be marginal.

> 
> >---
> > include/linux/memory_hotplug.h |  3 ---
> > mm/cma.c                       | 41 +++++++++++++++++++++++++++++++++++++++++
> > mm/internal.h                  |  3 +++
> > mm/page_alloc.c                | 26 ++++++++++++++++++++++++--
> > 4 files changed, 68 insertions(+), 5 deletions(-)
> >
> >diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> >index a864d79..6fde69b 100644
> >--- a/include/linux/memory_hotplug.h
> >+++ b/include/linux/memory_hotplug.h
> >@@ -198,9 +198,6 @@ void put_online_mems(void);
> > void mem_hotplug_begin(void);
> > void mem_hotplug_done(void);
> >
> >-extern void set_zone_contiguous(struct zone *zone);
> >-extern void clear_zone_contiguous(struct zone *zone);
> >-
> > #else /* ! CONFIG_MEMORY_HOTPLUG */
> > /*
> >  * Stub functions for when hotplug is off
> >diff --git a/mm/cma.c b/mm/cma.c
> >index ea506eb..8684f50 100644
> >--- a/mm/cma.c
> >+++ b/mm/cma.c
> >@@ -38,6 +38,7 @@
> > #include <trace/events/cma.h>
> >
> > #include "cma.h"
> >+#include "internal.h"
> >
> > struct cma cma_areas[MAX_CMA_AREAS];
> > unsigned cma_area_count;
> >@@ -145,6 +146,11 @@ err:
> > static int __init cma_init_reserved_areas(void)
> > {
> > 	int i;
> >+	struct zone *zone;
> >+	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> >+
> >+	if (!cma_area_count)
> >+		return 0;
> >
> > 	for (i = 0; i < cma_area_count; i++) {
> > 		int ret = cma_activate_area(&cma_areas[i]);
> >@@ -153,6 +159,41 @@ static int __init cma_init_reserved_areas(void)
> > 			return ret;
> > 	}
> >
> >+	for (i = 0; i < cma_area_count; i++) {
> >+		if (start_pfn > cma_areas[i].base_pfn)
> >+			start_pfn = cma_areas[i].base_pfn;
> >+		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> >+			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> >+	}
> >+
> >+	for_each_populated_zone(zone) {
> >+		if (!is_zone_cma(zone))
> >+			continue;
> >+
> >+		/* ZONE_CMA doesn't need to exceed CMA region */
> >+		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> >+		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> >+					zone->zone_start_pfn;
> 
> So what's the typical spanned vs present pages here? Should there
> perhaps be some pr_warns about large holes?

It differs completely from case to case, so I cannot give a typical
number. :) Usually the layout is a hardware requirement that cannot be
changed, so pr_warn() would not help here.
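
To put numbers on the spanned-versus-present question for a given
layout, the calculation can be sketched in userspace C. This is only a
minimal sketch, not kernel code: struct cma_range, struct zone_span,
cma_span() and cma_span_holes() are hypothetical names, the areas are
assumed not to overlap, and the clamping against the zone's initial
range that the patch does with max()/min() is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the base_pfn/count fields of a CMA area. */
struct cma_range {
	unsigned long base_pfn;	/* first page frame of the area */
	unsigned long count;	/* number of pages in the area */
};

/* Result of the span calculation (names are assumptions of this sketch). */
struct zone_span {
	unsigned long start_pfn;	/* lowest pfn covered by any area */
	unsigned long end_pfn;		/* one past the highest pfn covered */
	unsigned long spanned;		/* end_pfn - start_pfn */
	unsigned long present;		/* pages that really belong to CMA */
};

/*
 * Take the minimum of the bases and the maximum of the ends, as the
 * loop in the patch does, and sum the per-area counts to get the
 * present pages. Assumes the areas do not overlap.
 */
struct zone_span cma_span(const struct cma_range *areas, size_t nr)
{
	struct zone_span s = { ULONG_MAX, 0, 0, 0 };
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long end = areas[i].base_pfn + areas[i].count;

		if (s.start_pfn > areas[i].base_pfn)
			s.start_pfn = areas[i].base_pfn;
		if (s.end_pfn < end)
			s.end_pfn = end;
		s.present += areas[i].count;
	}
	if (nr)
		s.spanned = s.end_pfn - s.start_pfn;
	return s;
}

/* Pages inside the span that are not CMA pages, i.e. the holes. */
unsigned long cma_span_holes(const struct zone_span *s)
{
	assert(s->spanned >= s->present);
	return s->spanned - s->present;
}
```

Two small areas that sit far apart give a span that is mostly holes,
which is the case the compaction scanners would have to walk over.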

> >+	}
> >+
> >+	/*
> >+	 * Reserved pages for ZONE_CMA are now activated and this would change
> >+	 * ZONE_CMA's managed page counter and other zone's present counter.
> >+	 * We need to re-calculate various zone information that depends on
> >+	 * this initialization.
> >+	 */
> >+	build_all_zonelists(NULL, NULL);
> >+	for_each_populated_zone(zone) {
> >+		zone_pcp_update(zone);
> >+		set_zone_contiguous(zone);
> >+	}
> >+
> >+	/*
> >+	 * We need to re-init per zone wmark by calling
> >+	 * init_per_zone_wmark_min() but doesn't call here because it is
> >+	 * registered on module_init and it will be called later than us.
> >+	 */
> >+
> > 	return 0;
> > }
> > core_initcall(cma_init_reserved_areas);
> >diff --git a/mm/internal.h b/mm/internal.h
> >index b6ead95..4c37234 100644
> >--- a/mm/internal.h
> >+++ b/mm/internal.h
> >@@ -155,6 +155,9 @@ extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
> > extern void prep_compound_page(struct page *page, unsigned int order);
> > extern int user_min_free_kbytes;
> >
> >+extern void set_zone_contiguous(struct zone *zone);
> >+extern void clear_zone_contiguous(struct zone *zone);
> >+
> > #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> >
> > /*
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 0197d5d..796b271 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -1572,16 +1572,38 @@ void __init page_alloc_init_late(void)
> > }
> >
> > #ifdef CONFIG_CMA
> >+static void __init adjust_present_page_count(struct page *page, long count)
> >+{
> >+	struct zone *zone = page_zone(page);
> >+
> >+	/* We don't need to hold a lock since it is boot-up process */
> >+	zone->present_pages += count;
> >+}
> >+
> > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> > void __init init_cma_reserved_pageblock(struct page *page)
> > {
> > 	unsigned i = pageblock_nr_pages;
> >+	unsigned long pfn = page_to_pfn(page);
> > 	struct page *p = page;
> >+	int nid = page_to_nid(page);
> >+
> >+	/*
> >+	 * ZONE_CMA will steal present pages from other zones by changing
> >+	 * page links so page_zone() is changed. Before that,
> >+	 * we need to adjust previous zone's page count first.
> >+	 */
> >+	adjust_present_page_count(page, -pageblock_nr_pages);
> 
> Ideally, zone's start_pfn and spanned_pages should be also adjusted
> if we stole from the beginning/end (which I suppose should be quite
> common?).

It would be possible. Maybe there was a reason I didn't do that, but I
don't remember it. I will think about it more.
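
For readers following the accounting in init_cma_reserved_pageblock(),
the order of the two present-page adjustments can be sketched in
userspace C. This is a simplified sketch under stated assumptions, not
the kernel code: struct sketch_zone, struct sketch_page and
sketch_steal_pageblock() are hypothetical, a page records its zone as a
plain index, and all pages of the block start in the same zone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone indices for this sketch only. */
enum { SKETCH_ZONE_NORMAL, SKETCH_ZONE_CMA, SKETCH_NR_ZONES };

struct sketch_zone {
	long present_pages;	/* pages accounted to this zone */
};

struct sketch_page {
	int zone_id;		/* stands in for the zone bits in page flags */
};

/*
 * Move nr pages starting at first into the CMA zone while keeping the
 * present_pages counters of both zones consistent.
 */
void sketch_steal_pageblock(struct sketch_zone *zones,
			    struct sketch_page *first, size_t nr)
{
	size_t i;

	assert(nr > 0);

	/* Charge the old zone first: after relinking, the page no longer names it. */
	zones[first->zone_id].present_pages -= (long)nr;

	/* Relink every page of the block to the CMA zone. */
	for (i = 0; i < nr; i++)
		first[i].zone_id = SKETCH_ZONE_CMA;

	/* first->zone_id now names the CMA zone, so this credits the new zone. */
	zones[first->zone_id].present_pages += (long)nr;
}
```

The same lookup is used for both adjustments; it resolves to a
different zone the second time only because the pages were relinked in
between, which is why the decrement has to come before the loop.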

> BTW, shouldn't it be possible with ZONE_CMA to drop the requirement
> in cma_activate_area() that all pages come (originally) from the
> same zone, after your series?

Maybe! I will look at it more.

Thanks.

> >
> > 	do {
> > 		__ClearPageReserved(p);
> > 		set_page_count(p, 0);
> >-	} while (++p, --i);
> >+
> >+		/* Steal pages from other zones */
> >+		set_page_links(p, ZONE_CMA, nid, pfn);
> >+	} while (++p, ++pfn, --i);
> >+
> >+	adjust_present_page_count(page, pageblock_nr_pages);
> >
> > 	set_pageblock_migratetype(page, MIGRATE_CMA);
> >
> >@@ -7545,7 +7567,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
> > }
> > #endif
> >
> >-#ifdef CONFIG_MEMORY_HOTPLUG
> >+#if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
> > /*
> >  * The zone indicated has a new number of managed_pages; batch sizes and percpu
> >  * page high values need to be recalulated.
> >
> 

* Re: [PATCH v3 3/6] mm/cma: populate ZONE_CMA
  2016-06-23  2:52     ` Joonsoo Kim
@ 2016-06-28 11:23       ` Chen Feng
  2016-06-29  8:00         ` Joonsoo Kim
  0 siblings, 1 reply; 34+ messages in thread
From: Chen Feng @ 2016-06-28 11:23 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, fujun (F),
	Zhuangluan Su, Yiping Xu, Dan Zhao

Hello,

On 2016/6/23 10:52, Joonsoo Kim wrote:
> On Wed, Jun 22, 2016 at 05:23:06PM +0800, Chen Feng wrote:
>> Hello,
>>
>> On 2016/5/26 14:22, js1304@gmail.com wrote:
>>> [...]
>>
>> The ZONE_CMA should depends on sparse_mem.
>>
>> Because the zone size is not fixed when init the buddy core.
>> The pageblock_flags will be NULL when setup_usemap.
> 
> Before setup_usemap(), range of ZONE_CMA is set conservatively, from
> min_start_pfn of the node to max_end_pfn of the node. So,
> pageblock_flags will be allocated and assigned properly.
> 
> Unfortunately, I found a bug for FLATMEM system. If you'd like to
> test ZONE_CMA on FLATMEM system, please apply below one.
> 

The filesystem page cache (inode mapping) is also allocated with
GFP_HIGHUSER_MOVABLE.

So the SyS_write and filemap_fault paths will also use CMA memory.

This may also make CMA migration fail. What is your view on this kind
of usage?
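
For reference, the eligibility rule stated in the patch description
(only requests that carry both the highmem and the movable bit can be
served from ZONE_CMA) can be sketched as a small predicate. The SKETCH_*
names and bit values below are hypothetical and chosen for illustration
only; this is not the kernel's gfp_zone() logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values for this sketch only; not taken from kernel headers. */
#define SKETCH_GFP_HIGHMEM	0x1u
#define SKETCH_GFP_MOVABLE	0x2u
#define SKETCH_GFP_ALL		(SKETCH_GFP_HIGHMEM | SKETCH_GFP_MOVABLE)

/*
 * Return true when a request may be served from the CMA zone, i.e. when
 * it is both highmem-capable and movable.
 */
bool sketch_may_use_zone_cma(unsigned int gfp)
{
	/* Only the bits defined in this sketch are meaningful here. */
	assert((gfp & ~SKETCH_GFP_ALL) == 0);

	return (gfp & SKETCH_GFP_ALL) == SKETCH_GFP_ALL;
}
```

Page cache allocated with GFP_HIGHUSER_MOVABLE passes this check, while
a blockdev page cache request that has the movable bit but not the
highmem bit does not.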

> Thanks.
> ------------>8------------
> diff --git a/mm/cma.c b/mm/cma.c
> index 0c1a72f..6cd2973 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -168,13 +168,6 @@ static int __init cma_init_reserved_areas(void)
>                 return 0;
>  
>         for (i = 0; i < cma_area_count; i++) {
> -               int ret = cma_activate_area(&cma_areas[i]);
> -
> -               if (ret)
> -                       return ret;
> -       }
> -
> -       for (i = 0; i < cma_area_count; i++) {
>                 if (start_pfn > cma_areas[i].base_pfn)
>                         start_pfn = cma_areas[i].base_pfn;
>                 if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> @@ -191,6 +184,13 @@ static int __init cma_init_reserved_areas(void)
>                                         zone->zone_start_pfn;
>         }
>  
> +       for (i = 0; i < cma_area_count; i++) {
> +               int ret = cma_activate_area(&cma_areas[i]);
> +
> +               if (ret)
> +                       return ret;
> +       }
> +
>         /*
>          * Reserved pages for ZONE_CMA are now activated and this would change
>          * ZONE_CMA's managed page counter and other zone's present counter.
> 

* Re: [PATCH v3 0/6] Introduce ZONE_CMA
  2016-06-27 11:25 ` Balbir Singh
@ 2016-06-29  7:57   ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-29  7:57 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel

On Mon, Jun 27, 2016 at 09:25:45PM +1000, Balbir Singh wrote:
> 
> 
> On 26/05/16 16:22, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > Hello,
> > 
> > Changes from v2
> > o Rebase on next-20160525
> > o No other changes except following description
> > 
> > There was a discussion with Mel [1] after LSF/MM 2016. I could summarise
> > it to help merge decision but it's better to read by yourself since
> > if I summarise it, it would be biased for me. But, if anyone hope
> > the summary, I will do it. :)
> > 
> > Anyway, Mel's position on this patchset seems to be neutral. He said:
> > "I'm not going to outright NAK your series but I won't ACK it either"
> > 
> > We can fix the problems with any approach but I hope to go a new zone
> > approach because it is less error-prone. It reduces some corner case
> > handling for now and remove need for potential corner case handling to fix
> > problems.
> > 
> > Note that our company is already using ZONE_CMA for a years and
> > there is no problem.
> > 
> > If anyone has a different opinion, please let me know and let's discuss
> > together.
> > 
> > Andrew, if there is something to do for merge, please let me know.
> > 
> > [1] https://lkml.kernel.org/r/20160425053653.GA25662@js1304-P5Q-DELUXE
> > 
> > Changes from v1
> > o Separate some patches which deserve to submit independently
> > o Modify description to reflect current kernel state
> > (e.g. high-order watermark problem disappeared by Mel's work)
> > o Don't increase SECTION_SIZE_BITS to make a room in page flags
> > (detailed reason is on the patch that adds ZONE_CMA)
> > o Adjust ZONE_CMA population code
> > 
> > This series try to solve problems of current CMA implementation.
> > 
> > CMA is introduced to provide physically contiguous pages at runtime
> > without exclusive reserved memory area. But, current implementation
> > works like as previous reserved memory approach, because freepages
> > on CMA region are used only if there is no movable freepage. In other
> > words, freepages on CMA region are only used as fallback. In that
> > situation where freepages on CMA region are used as fallback, kswapd
> > would be woken up easily since there is no unmovable and reclaimable
> > freepage, too. If kswapd starts to reclaim memory, fallback allocation
> > to MIGRATE_CMA doesn't occur any more since movable freepages are
> > already refilled by kswapd and then most of freepage on CMA are left
> > to be in free. This situation looks like exclusive reserved memory case.
> 
> I am afraid I don't completely understand the problem statement.
> Is this the ALLOC_CMA case or the !ALLOC_CMA one? I also think one other

It's caused by the mixed usage of these flags, not by one specific
flag.

> problem is that in my experience and observation all CMA allocations seem
> to come from one node-- the highest node on the system
> 
> > 
> > In my experiment, I found that if system memory has 1024 MB memory and
> > 512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
> > free memory is left. Detailed reason is that for keeping enough free
> > memory for unmovable and reclaimable allocation, kswapd uses below
> > equation when calculating free memory and it easily go under the watermark.
> > 
> > Free memory for unmovable and reclaimable = Free total - Free CMA pages
> > 
> > This is derivated from the property of CMA freepage that CMA freepage
> > can't be used for unmovable and reclaimable allocation.
> > 
> > Anyway, in this case, kswapd are woken up when (FreeTotal - FreeCMA)
> > is lower than low watermark and tries to make free memory until
> > (FreeTotal - FreeCMA) is higher than high watermark. That results
> > in that FreeTotal is moving around 512MB boundary consistently. It
> > then means that we can't utilize full memory capacity.
> > 
> 
> OK.. so you are suggesting that we are under-utilizing the memory in the
> CMA region?

That's right.
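
The arithmetic from the cover letter can be sketched in a few lines of
userspace C. This is a minimal sketch, not the kernel's watermark code:
the function names are hypothetical, everything is counted in MB, and
the watermarks are passed in by the caller rather than derived from
zone sizes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Free memory usable by unmovable and reclaimable allocations:
 * Free total - Free CMA pages, as in the cover letter.
 */
unsigned long usable_free_mb(unsigned long free_total_mb,
			     unsigned long free_cma_mb)
{
	assert(free_cma_mb <= free_total_mb);
	return free_total_mb - free_cma_mb;
}

/* kswapd is woken when (FreeTotal - FreeCMA) drops below the low watermark. */
bool kswapd_should_wake(unsigned long free_total_mb,
			unsigned long free_cma_mb,
			unsigned long low_wmark_mb)
{
	return usable_free_mb(free_total_mb, free_cma_mb) < low_wmark_mb;
}

/* kswapd keeps reclaiming until (FreeTotal - FreeCMA) exceeds the high watermark. */
bool kswapd_can_sleep(unsigned long free_total_mb,
		      unsigned long free_cma_mb,
		      unsigned long high_wmark_mb)
{
	return usable_free_mb(free_total_mb, free_cma_mb) > high_wmark_mb;
}
```

With a made-up low watermark of 8 MB, a system that has 516 MB free of
which 512 MB is free CMA has only 4 MB usable, so kswapd is woken even
though about half of the 1024 MB is still free.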

> > To fix this problem, I submitted some patches [1] about 10 months ago,
> > but, found some more problems to be fixed before solving this problem.
> > It requires many hooks in allocator hotpath so some developers doesn't
> > like it. Instead, some of them suggest different approach [2] to fix
> > all the problems related to CMA, that is, introducing a new zone to deal
> > with free CMA pages. I agree that it is the best way to go so implement
> > here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I
> > decide to add a new zone rather than piggyback on ZONE_MOVABLE since
> > they have some differences. First, reserved CMA pages should not be
> > offlined.
> 
> Why? Why are they special? Even if they are offlined by user action,
> one would expect the following to occur
> 
> 1. User would mark/release the cma region associated with them
> 2. User would then hotplug the memory

The CMA region is reserved at boot time and used until system
shutdown. Hotplugging a CMA region isn't possible yet. We could handle
it later, but it's not a required feature for now.

> > If freepage for CMA is managed by ZONE_MOVABLE, we need to keep
> > MIGRATE_CMA migratetype and insert many hooks on memory hotplug code
> > to distiguish hotpluggable memory and reserved memory for CMA in the same
> > zone. It would make memory hotplug code which is already complicated
> > more complicated.
> 
> Again why treat it special, one could potentially deny the hotplug based
> on the knowledge of where the CMA region is allocated from

Yes. But I don't want to use ZONE_MOVABLE for the CMA region, because
there is some special handling code for ZONE_MOVABLE and I'd like to
minimize the side effects of this change. If adding a new zone is really
a problem, I will use ZONE_MOVABLE.

Thanks.

> > Second, cma_alloc() can be called more frequently
> > than memory hotplug operation and possibly we need to control
> > allocation rate of ZONE_CMA to optimize latency in the future.
> > In this case, separate zone approach is easy to modify. Third, I'd
> > like to see statistics for CMA, separately. Sometimes, we need to debug
> > why cma_alloc() is failed and separate statistics would be more helpful
> > in this situtaion.
> > 
> > Anyway, this patchset solves four problems related to CMA implementation.
> >
> 
> Balbir 

* Re: [PATCH v3 3/6] mm/cma: populate ZONE_CMA
  2016-06-28 11:23       ` Chen Feng
@ 2016-06-29  8:00         ` Joonsoo Kim
  0 siblings, 0 replies; 34+ messages in thread
From: Joonsoo Kim @ 2016-06-29  8:00 UTC (permalink / raw)
  To: Chen Feng
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, Rui Teng, linux-mm,
	linux-kernel, fujun (F),
	Zhuangluan Su, Yiping Xu, Dan Zhao

On Tue, Jun 28, 2016 at 07:23:23PM +0800, Chen Feng wrote:
> Hello,
> 
> On 2016/6/23 10:52, Joonsoo Kim wrote:
> > On Wed, Jun 22, 2016 at 05:23:06PM +0800, Chen Feng wrote:
> >> Hello,
> >>
> >> On 2016/5/26 14:22, js1304@gmail.com wrote:
> >>> [...]
> >>
> >> The ZONE_CMA should depends on sparse_mem.
> >>
> >> Because the zone size is not fixed when init the buddy core.
> >> The pageblock_flags will be NULL when setup_usemap.
> > 
> > Before setup_usemap(), range of ZONE_CMA is set conservatively, from
> > min_start_pfn of the node to max_end_pfn of the node. So,
> > pageblock_flags will be allocated and assigned properly.
> > 
> > Unfortunately, I found a bug for FLATMEM system. If you'd like to
> > test ZONE_CMA on FLATMEM system, please apply below one.
> > 
> 
> The filesystem, inode map is also GFP_HIGHUSER_MOVABLE.
> 
> SyS_write filemap_fault will also use cma memory.
> 
> This may also make cma migrate failed. What's your idea on this type?

The purpose of this patchset is not to improve the success rate of CMA
migration. That can be addressed separately if it turns out to be a real
problem. Could you share how the above usage makes CMA migration fail,
and how long it continues to fail?
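
For anyone following the quoted hunk, the present_pages bookkeeping it
performs can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions: the toy struct zone and struct page, the
zones[] array, the zone_id field and PAGEBLOCK_NR_PAGES are hypothetical
stand-ins, not kernel definitions. The point it illustrates is the ordering.
page_zone() follows the page's zone link, so the old zone has to be debited
before the link changes, and the new zone credited afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types mimic kernel names but are NOT kernel API. */
enum zone_type { ZONE_NORMAL, ZONE_CMA, MAX_NR_ZONES };

struct zone {
	long present_pages;
};

struct page {
	enum zone_type zone_id;	/* stands in for the zone bits in page->flags */
	int reserved;
	int count;
};

#define PAGEBLOCK_NR_PAGES 4	/* hypothetical, tiny pageblock for the model */

static struct zone zones[MAX_NR_ZONES];

/* Resolve the zone through the page's current link, like page_zone(). */
static struct zone *page_zone(const struct page *page)
{
	return &zones[page->zone_id];
}

static void adjust_present_page_count(const struct page *page, long count)
{
	page_zone(page)->present_pages += count;
}

/* Move one pageblock, starting at 'page', into ZONE_CMA. */
static void init_cma_reserved_pageblock(struct page *page)
{
	struct page *p = page;
	unsigned int i = PAGEBLOCK_NR_PAGES;

	assert(page != NULL);

	/* Debit the zone that owns the block now, before relinking. */
	adjust_present_page_count(page, -(long)PAGEBLOCK_NR_PAGES);

	do {
		p->reserved = 0;
		p->count = 0;
		p->zone_id = ZONE_CMA;	/* "steal" the page */
	} while (++p, --i);

	/* page_zone(page) now resolves to ZONE_CMA, so this credits it. */
	adjust_present_page_count(page, PAGEBLOCK_NR_PAGES);
}
```

If the first adjustment were done after the loop, both calls would hit
ZONE_CMA and the original zone would keep an inflated present_pages.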

Thanks.


Thread overview: 34+ messages
2016-05-26  6:22 [PATCH v3 0/6] Introduce ZONE_CMA js1304
2016-05-26  6:22 ` [PATCH v3 1/6] mm/page_alloc: recalculate some of zone threshold when on/offline memory js1304
2016-06-24 13:20   ` Vlastimil Babka
2016-06-28  8:12     ` Joonsoo Kim
2016-05-26  6:22 ` [PATCH v3 2/6] mm/cma: introduce new zone, ZONE_CMA js1304
2016-05-26  6:22 ` [PATCH v3 3/6] mm/cma: populate ZONE_CMA js1304
2016-06-22  9:23   ` Chen Feng
2016-06-23  2:52     ` Joonsoo Kim
2016-06-28 11:23       ` Chen Feng
2016-06-29  8:00         ` Joonsoo Kim
2016-06-27  8:24   ` Vlastimil Babka
2016-06-28  8:31     ` Joonsoo Kim
2016-05-26  6:22 ` [PATCH v3 4/6] mm/cma: remove ALLOC_CMA js1304
2016-06-27  9:30   ` Vlastimil Babka
2016-06-28  8:16     ` Joonsoo Kim
2016-05-26  6:22 ` [PATCH v3 5/6] mm/cma: remove MIGRATE_CMA js1304
2016-05-27  1:42   ` Chen Feng
2016-05-27  5:32     ` Joonsoo Kim
2016-06-27  9:46   ` Vlastimil Babka
2016-06-28  8:17     ` Joonsoo Kim
2016-05-26  6:22 ` [PATCH v3 6/6] mm/cma: remove per zone CMA stat js1304
2016-06-27  9:54   ` Vlastimil Babka
2016-05-26  8:04 ` [PATCH v3 0/6] Introduce ZONE_CMA Feng Tang
2016-05-27  5:28   ` Joonsoo Kim
2016-05-27  6:25     ` Feng Tang
2016-05-27  6:42       ` Joonsoo Kim
2016-05-27  7:27         ` Feng Tang
2016-05-30  5:45           ` Joonsoo Kim
2016-06-17  7:38           ` Chen Feng
2016-06-20  6:48             ` Joonsoo Kim
2016-06-21  2:08               ` Chen Feng
2016-06-21  6:56                 ` Joonsoo Kim
2016-06-27 11:25 ` Balbir Singh
2016-06-29  7:57   ` Joonsoo Kim
