* [PATCH v5 0/6] Introduce ZONE_CMA
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Hello,

Changes from v4
o Rebase on next-20160825
o Add general fix patch for lowmem reserve
o Fix lowmem reserve ratio
o Fix zone span optimization per Vlastimil
o Fix pageset initialization
o Change invocation timing on cma_init_reserved_areas()

Changes from v3
o Rebase on next-20160805
o Split first patch per Vlastimil
o Remove useless function parameter per Vlastimil
o Add code comment per Vlastimil
o Add the following description to the cover letter

This is the 5th version of the ZONE_CMA patchset. Most of the changes
are due to the rebase and some minor fixes.

CMA has many problems, which I describe at the bottom of this cover
letter. These problems come from the constraint that CMA memory must
always be migratable for device usage. I think that introducing
a new zone is the best approach to solve them. Here are the reasons.

Zones were introduced to solve issues caused by H/W addressing
limitations. The MM subsystem is implemented to work efficiently with
these zones, and the allocation/reclaim logic in MM takes this limitation
into account extensively. What I did in this patchset is introduce a new
zone and extend the zone concept slightly. The new concept is that a zone
can have not only a H/W addressing limitation but also a S/W limitation
to guarantee page migration. This concept originated with ZONE_MOVABLE
and it has worked well for a long time. So, ZONE_CMA should not be
special at this moment.

There is a major concern from Mel that ZONE_MOVABLE, which has a
S/W limitation, causes the highmem/lowmem problem. The highmem/lowmem
problem is that some memory cannot be used for kernel memory due to the
limitation of the zone. It breaks LRU ordering and makes it hard to find
kernel-usable memory under memory pressure.

However, the important point is that this problem doesn't come from the
implementation detail (ZONE_MOVABLE/MIGRATETYPE). Even if we implement it
by MIGRATETYPE instead of by ZONE_MOVABLE, we cannot use that type of
memory for kernel allocation because kernel allocations aren't migratable.
So, it will break LRU ordering, too. We cannot avoid the problem in either
case. Therefore, we should focus on which solution is better for
maintenance and less intrusive for the MM subsystem.

From this viewpoint, I think that the zone approach is better. As
mentioned earlier, the MM subsystem already has a lot of infrastructure
to deal with a zone's H/W addressing limitation. Adding a S/W limitation
to the zone concept and adding a new zone doesn't change anything. It will
work by itself. My patchset can remove many hooks related to CMA area
management in MM while solving the problems. More hooks are required to
solve the problems if we choose the MIGRATETYPE approach.

Although Mel withdrew the review, Vlastimil expressed agreement with this
new zone approach [6].

 "I realize I differ here from much more experienced mm guys, and will
 probably deservingly regret it later on, but I think that the ZONE_CMA
 approach could work indeed better than current MIGRATE_CMA pageblocks."

If anyone has a different opinion, please let me know.

Thanks.


Changes from v2
o Rebase on next-20160525
o No other changes except the following description

There was a discussion with Mel [5] after LSF/MM 2016. I could summarise
it to help the merge decision, but it's better to read it yourself since
any summary of mine would be biased toward my position. But if anyone
wants a summary, I will write one. :)

Anyway, Mel's position on this patchset seems to be neutral. He said:
"I'm not going to outright NAK your series but I won't ACK it either"

We can fix the problems with either approach, but I hope to go with the
new zone approach because it is less error-prone. It reduces some corner
case handling now and removes the need for potential corner case handling
to fix the problems.

Note that our company is already using ZONE_CMA and has seen no problems.

If anyone has a different opinion, please let me know and let's discuss
it together.

Andrew, if there is anything I need to do for the merge, please let me
know.

Changes from v1
o Separate out some patches that deserve to be submitted independently
o Modify description to reflect current kernel state
(e.g. the high-order watermark problem disappeared thanks to Mel's work)
o Don't increase SECTION_SIZE_BITS to make room in page flags
(the detailed reason is in the patch that adds ZONE_CMA)
o Adjust ZONE_CMA population code

This series tries to solve problems of the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. But the current
implementation works like the previous reserved memory approach, because
freepages on the CMA region are used only if there is no movable freepage.
In other words, freepages on the CMA region are only used as a fallback.
In the situation where freepages on the CMA region are used as a fallback,
kswapd is woken up easily since there are no unmovable or reclaimable
freepages either. Once kswapd starts to reclaim memory, fallback
allocation to MIGRATE_CMA doesn't occur any more since movable freepages
are already refilled by kswapd, and most of the freepages on CMA are left
free. This situation looks like the exclusively reserved memory case.

In my experiment, I found that if the system has 1024 MB of memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
of free memory is left. The detailed reason is that, to keep enough free
memory for unmovable and reclaimable allocations, kswapd uses the equation
below when calculating free memory, and the result easily goes under the
watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property that CMA freepages can't be used for
unmovable and reclaimable allocations.

Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA)
is lower than the low watermark and tries to free memory until
(FreeTotal - FreeCMA) is higher than the high watermark. As a result,
FreeTotal consistently hovers around the 512 MB boundary. This means
that we can't utilize the full memory capacity.
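
To make the watermark arithmetic above concrete, here is a minimal
userspace sketch. It is not kernel code: struct zone_model and the three
helper functions are hypothetical names invented for illustration, and
the page counts in the usage note assume 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one zone; all values are in pages. */
struct zone_model {
	unsigned long free_total;  /* all free pages, CMA included */
	unsigned long free_cma;    /* free pages inside the CMA region */
	unsigned long wmark_low;   /* kswapd is woken below this */
	unsigned long wmark_high;  /* kswapd stops reclaiming above this */
};

/* Free memory for unmovable and reclaimable = Free total - Free CMA pages */
static unsigned long free_for_unmovable(const struct zone_model *z)
{
	assert(z->free_cma <= z->free_total);
	return z->free_total - z->free_cma;
}

/* kswapd is woken when the CMA-excluded free count drops below low. */
static bool kswapd_should_wake(const struct zone_model *z)
{
	return free_for_unmovable(z) < z->wmark_low;
}

/* kswapd keeps reclaiming until the CMA-excluded count exceeds high. */
static bool kswapd_can_sleep(const struct zone_model *z)
{
	return free_for_unmovable(z) > z->wmark_high;
}
```

With 512 MB of free CMA memory (131072 pages) and only 1000 other free
pages, kswapd_should_wake() returns true for a low watermark of 2048
pages even though more than half of the memory is still free, which is
the behaviour described above.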

To fix this problem, I submitted some patches [1] about 10 months ago,
but found some more problems that had to be fixed first. That approach
requires many hooks in the allocator hotpath, so some developers don't
like it. Instead, some of them suggested a different approach [2] to fix
all the problems related to CMA, that is, introducing a new zone to deal
with free CMA pages. I agree that it is the best way to go, so I implement
it here. Although the properties of ZONE_MOVABLE and ZONE_CMA are similar,
I decided to add a new zone rather than piggyback on ZONE_MOVABLE since
they have some differences. First, reserved CMA pages should not be
offlined. If freepages for CMA are managed by ZONE_MOVABLE, we need to
keep the MIGRATE_CMA migratetype and insert many hooks into the memory
hotplug code to distinguish hotpluggable memory from memory reserved for
CMA in the same zone. It would make the memory hotplug code, which is
already complicated, more complicated. Second, cma_alloc() can be called
more frequently than memory hotplug operations, and we may need to control
the allocation rate of ZONE_CMA to optimize latency in the future.
In this case, the separate zone approach is easy to modify. Third, I'd
like to see statistics for CMA separately. Sometimes we need to debug
why cma_alloc() failed, and separate statistics would be more helpful
in this situation.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize the full memory capacity due to the
limitation of CMA freepages and the fallback policy. This patchset
implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
requests. This type of allocation is used for page cache and anonymous
pages, which account for most memory usage in the normal case, so we can
utilize the full memory capacity. Below is the experiment result for this
problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           92.4            186.5
pswpin:                 82              18647
pswpout:                160             69839

<After this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           93.1            93.4
pswpin:                 84              46
pswpout:                183             92

FYI, there is another attempt [3] on lkml to solve this problem.
And, as far as I know, Qualcomm also has an out-of-tree solution for this
problem.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in the reclaim path.
If reclaim is initiated for an unmovable or reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request and is just
a waste. By managing CMA pages in the new zone, we can skip reclaiming
ZONE_CMA completely when it is unnecessary.
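
The idea of skipping a whole zone can be sketched roughly as below. This
is a simplified userspace model, not the code in this series: the enum
names, the request classes and the placement of the CMA zone as the
highest zone are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified zone ordering; placing the CMA zone last is an assumption. */
enum zone_model_type {
	ZM_DMA,
	ZM_NORMAL,
	ZM_HIGHMEM,
	ZM_MOVABLE,
	ZM_CMA,
	ZM_NR_ZONES
};

/* Hypothetical request classes standing in for GFP flag combinations. */
enum request_type {
	REQ_KERNEL,           /* unmovable or reclaimable allocation */
	REQ_HIGHUSER,         /* may use highmem, but is not movable */
	REQ_HIGHUSER_MOVABLE  /* page cache and anonymous pages */
};

/* Highest zone a request may be served from. */
static enum zone_model_type highest_zone(enum request_type req)
{
	switch (req) {
	case REQ_KERNEL:
		return ZM_NORMAL;
	case REQ_HIGHUSER:
		return ZM_HIGHMEM;
	case REQ_HIGHUSER_MOVABLE:
		return ZM_CMA;
	}
	assert(0 && "unknown request type");
	return ZM_NORMAL;
}

/* Reclaiming a zone above the request's highest zone cannot help it. */
static bool zone_worth_reclaiming(enum zone_model_type zone,
				  enum request_type req)
{
	return zone <= highest_zone(req);
}
```

In this model, zone_worth_reclaiming(ZM_CMA, REQ_KERNEL) is false, so
reclaim on behalf of an unmovable request never touches CMA pages, and
no per-page CMA check is needed in the reclaim path.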

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when the allocation request is of
movable type and there are enough free pages in the CMA region. After a
bunch of consecutive movable allocation requests, free pages in the
ordinary region (not the CMA region) would be exhausted without waking up
kswapd. At that time, if an atomic unmovable allocation comes, it can't
succeed since there are not enough pages in the ordinary region. This
problem was reported by Aneesh [4] and can be solved by this patchset.
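
The failure sequence above can be reproduced with a toy model. This is
only a sketch under simplifying assumptions: struct mixed_zone and both
functions are hypothetical, a single low watermark stands in for the
real watermark checks, and each call allocates exactly one page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page counts for one zone that mixes ordinary and CMA pages. */
struct mixed_zone {
	unsigned long free_ordinary;
	unsigned long free_cma;
	unsigned long wmark_low;
	bool kswapd_woken;
};

/*
 * Movable request: the watermark check counts CMA pages as usable,
 * but CMA pages are only taken as a fallback.
 */
static bool alloc_movable(struct mixed_zone *z)
{
	assert(z);
	if (z->free_ordinary + z->free_cma <= z->wmark_low)
		z->kswapd_woken = true;
	if (z->free_ordinary > 0) {
		z->free_ordinary--;
		return true;
	}
	if (z->free_cma > 0) {
		z->free_cma--;
		return true;
	}
	return false;
}

/* Atomic unmovable request: cannot use CMA pages or wait for reclaim. */
static bool alloc_atomic_unmovable(struct mixed_zone *z)
{
	assert(z);
	if (z->free_ordinary == 0)
		return false;
	z->free_ordinary--;
	return true;
}
```

Starting from 1000 ordinary and 5000 CMA free pages with a low watermark
of 500, a burst of 1000 movable allocations drains the ordinary region
while kswapd_woken stays false, and the next alloc_atomic_unmovable()
call fails.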

4) Inefficient compaction
The usual high-order allocation request is of unmovable type and cannot
be serviced from the CMA area. In compaction, the migration scanner
doesn't distinguish migratable pages on the CMA area and migrates them
anyway. In this case, even if we make a high-order page in that region, it
cannot be used due to the type mismatch. This patchset solves this problem
by separating CMA pages from ordinary zones.

I passed boot tests on x86_64, x86_32, arm and arm64. I did some stress
tests on x86_64 and x86_32 and found no problems. Feel free to enjoy
and please give me feedback. :)

This patchset is based on linux-next-20160330.

Thanks.

[1] https://lkml.org/lkml/2014/5/28/64
[2] https://lkml.org/lkml/2014/11/4/55
[3] https://lkml.org/lkml/2014/10/15/623
[4] http://www.spinics.net/lists/linux-mm/msg100562.html
[5] https://lkml.kernel.org/r/20160425053653.GA25662@js1304-P5Q-DELUXE
[6] https://lkml.kernel.org/r/1919a85d-6e1e-374f-b8c3-1236c36b0393@suse.cz

Joonsoo Kim (6):
  mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request
  mm/cma: introduce new zone, ZONE_CMA
  mm/cma: populate ZONE_CMA
  mm/cma: remove ALLOC_CMA
  mm/cma: remove MIGRATE_CMA
  mm/cma: remove per zone CMA stat

 arch/x86/mm/highmem_32.c          |   8 ++
 fs/proc/meminfo.c                 |   2 +-
 include/linux/cma.h               |   6 ++
 include/linux/gfp.h               |  32 +++---
 include/linux/memory_hotplug.h    |   3 -
 include/linux/mempolicy.h         |   2 +-
 include/linux/mm.h                |   1 +
 include/linux/mmzone.h            |  58 +++++-----
 include/linux/page-isolation.h    |   5 +-
 include/linux/vm_event_item.h     |  10 +-
 include/linux/vmstat.h            |   8 --
 include/trace/events/compaction.h |  10 +-
 kernel/power/snapshot.c           |   8 ++
 mm/cma.c                          |  73 +++++++++++--
 mm/compaction.c                   |  14 +--
 mm/hugetlb.c                      |   2 +-
 mm/internal.h                     |   4 +-
 mm/memory_hotplug.c               |  10 +-
 mm/page_alloc.c                   | 220 +++++++++++++++++++-------------------
 mm/page_isolation.c               |  15 ++-
 mm/page_owner.c                   |   6 +-
 mm/usercopy.c                     |   4 +-
 mm/vmstat.c                       |  10 +-
 23 files changed, 303 insertions(+), 208 deletions(-)

-- 
1.9.1

* [PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Freepages on ZONE_HIGHMEM can't be used for kernel memory, so it's not
that important to reserve them. When ZONE_MOVABLE is used, this
reservation would theoretically decrease the usable memory for
GFP_HIGHUSER_MOVABLE allocation requests, which are mainly used for page
cache and anon page allocation. So, fix it.

Also, defining the sysctl_lowmem_reserve_ratio array with
MAX_NR_ZONES - 1 entries makes the code complex. For example, on a highmem
system, the following reserve ratio is activated for *NORMAL ZONE*, which
could easily mislead people.

 #ifdef CONFIG_HIGHMEM
 32
 #endif

This patch also fixes this situation by defining the
sysctl_lowmem_reserve_ratio array with MAX_NR_ZONES entries and placing
the "#ifdef" in the right place.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/mmzone.h | 2 +-
 mm/page_alloc.c        | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d572b78..e3f39af 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -877,7 +877,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
-extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
+extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
 					void __user *, size_t *, loff_t *);
 int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4f7d5d7..a8310de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -198,17 +198,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
  * TBD: should special case ZONE_DMA32 machines here - in those we normally
  * don't need any ZONE_NORMAL reservation
  */
-int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA
 	 256,
 #endif
 #ifdef CONFIG_ZONE_DMA32
 	 256,
 #endif
-#ifdef CONFIG_HIGHMEM
 	 32,
+#ifdef CONFIG_HIGHMEM
+	 INT_MAX,
 #endif
-	 32,
+	 INT_MAX,
 };
 
 EXPORT_SYMBOL(totalram_pages);
-- 
1.9.1
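
The effect of the INT_MAX entries in the patch above can be illustrated
with a small userspace sketch. It assumes that a reserve is computed by
dividing the managed pages of the higher zones by the ratio of the lower
zone, as the kernel's lowmem reserve setup does. The zone layout (32-bit
highmem without ZONE_DMA32), the names and the zone sizes in the usage
note are illustrative assumptions.

```c
#include <assert.h>
#include <limits.h>

/* Simplified 32-bit highmem layout; names are illustrative only. */
enum { Z_DMA, Z_NORMAL, Z_HIGHMEM, Z_MOVABLE, Z_NR };

/* Ratios as in the patched array: one entry per zone, indexed by lower zone. */
static const int reserve_ratio[Z_NR] = { 256, 32, INT_MAX, INT_MAX };

/*
 * reserve[i][j]: pages kept back in zone i from requests that could
 * also have been served from the higher zone j.
 */
static void compute_lowmem_reserve(const unsigned long managed[Z_NR],
				   unsigned long reserve[Z_NR][Z_NR])
{
	for (int i = 0; i < Z_NR; i++)
		for (int j = 0; j < Z_NR; j++)
			reserve[i][j] = 0;

	for (int j = 1; j < Z_NR; j++) {
		/* Pages in zone j plus every zone between i and j. */
		unsigned long pages = managed[j];

		for (int i = j - 1; i >= 0; i--) {
			assert(reserve_ratio[i] >= 1);
			reserve[i][j] = pages / (unsigned long)reserve_ratio[i];
			pages += managed[i];
		}
	}
}
```

For a 512 MB ZONE_MOVABLE (131072 pages of 4 KiB), the INT_MAX ratio
makes reserve[Z_HIGHMEM][Z_MOVABLE] zero, whereas a ratio of 32 would
have kept 131072 / 32 = 4096 highmem pages away from movable requests.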

* [PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Attached cover-letter:

This series tries to solve problems of the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. But the current
implementation works like the previous reserved memory approach, because
freepages on the CMA region are used only if there is no movable freepage.
In other words, freepages on the CMA region are only used as a fallback.
In the situation where freepages on the CMA region are used as a fallback,
kswapd is woken up easily since there are no unmovable or reclaimable
freepages either. Once kswapd starts to reclaim memory, fallback
allocation to MIGRATE_CMA doesn't occur any more since movable freepages
are already refilled by kswapd, and most of the freepages on CMA are left
free. This situation looks like the exclusively reserved memory case.

In my experiment, I found that if the system has 1024 MB of memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
of free memory is left. The detailed reason is that, to keep enough free
memory for unmovable and reclaimable allocations, kswapd uses the equation
below when calculating free memory, and the result easily goes under the
watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property that CMA free pages can't be used
for unmovable and reclaimable allocations.

Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA)
is lower than the low watermark, and it tries to free memory until
(FreeTotal - FreeCMA) is higher than the high watermark. As a result,
FreeTotal consistently hovers around the 512 MB boundary, which
means that we can't utilize the full memory capacity.
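
As a rough illustration of the check described above, here is a minimal
sketch in C. The struct, the helper names, and the watermark values used
with it are hypothetical; this is a simplified model, not the kernel's
watermark code.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the check described above. The
 * struct and helpers are illustrative only; they are not kernel code.
 * All values are in MB to match the experiment.
 */
struct free_model {
	long free_total;	/* all free memory, CMA included */
	long free_cma;		/* free memory inside the CMA region */
	long wmark_low;		/* kswapd is woken below this */
	long wmark_high;	/* kswapd stops above this */
};

/* Free memory usable for unmovable and reclaimable allocations. */
static long usable_free(const struct free_model *m)
{
	return m->free_total - m->free_cma;
}

/* kswapd is woken when the usable free memory is under the low mark. */
static bool kswapd_should_wake(const struct free_model *m)
{
	return usable_free(m) < m->wmark_low;
}

/* kswapd keeps reclaiming until usable free memory passes the high mark. */
static bool kswapd_may_stop(const struct free_model *m)
{
	return usable_free(m) > m->wmark_high;
}
```

With assumed values of free_total = 520, free_cma = 512 and
wmark_low = 16, usable_free() is 8, so kswapd_should_wake() is true
even though more than 512 MB is still free.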

To fix this problem, I submitted some patches [1] about 10 months ago,
but found more problems that had to be fixed first. That approach
requires many hooks in the allocator hotpath, so some developers don't
like it. Instead, some of them suggested a different approach [2] to fix
all the problems related to CMA: introducing a new zone to deal
with free CMA pages. I agree that it is the best way to go, so it is
implemented here. Although the properties of ZONE_MOVABLE and ZONE_CMA
are similar, I decided to add a new zone rather than piggyback on
ZONE_MOVABLE since they have some differences. First, reserved CMA pages
should not be offlined. If free pages for CMA were managed by
ZONE_MOVABLE, we would need to keep the MIGRATE_CMA migratetype and
insert many hooks into the memory hotplug code to distinguish
hotpluggable memory from memory reserved for CMA in the same zone. That
would make the already complicated memory hotplug code even more
complicated. Second, cma_alloc() can be called more frequently
than memory hotplug operations, and we may need to control the
allocation rate of ZONE_CMA to optimize latency in the future.
A separate zone is easier to modify for that purpose. Third, I'd
like to see statistics for CMA separately. Sometimes we need to debug
why cma_alloc() failed, and separate statistics would be more helpful
in that situation.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize the full memory capacity due to the
limitations of CMA free pages and the fallback policy. This patchset
implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
requests. This type of allocation is used for page cache and anonymous
pages, which account for most memory usage in the normal case, so we can
utilize the full memory capacity. Below is the experiment result for
this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           92.4		186.5
pswpin:                 82		18647
pswpout:                160		69839

<After this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           93.1		93.4
pswpin:                 84		46
pswpout:                183		92

FYI, there is another attempt [3] on lkml to solve this problem.
And, as far as I know, Qualcomm also has an out-of-tree solution for
this problem.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in the reclaim path.
If reclaim is initiated for an unmovable or reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request, so it is
just wasted work. By managing CMA pages in the new zone, we can
skip reclaiming ZONE_CMA completely when it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when the allocation request is of
movable type and there are enough free pages in the CMA region. After a
bunch of consecutive movable allocation requests, free pages in the
ordinary region (not the CMA region) would be exhausted without waking
up kswapd. If an atomic unmovable allocation comes at that time, it
can't succeed since there are not enough pages in the ordinary region.
This problem was reported by Aneesh [4] and can be solved by this patchset.

4) Inefficient compaction
A usual high-order allocation request is of unmovable type and cannot
be serviced from the CMA area. In compaction, the migration scanner
doesn't distinguish migratable pages in the CMA area and migrates them
anyway. In this case, even if we make a high-order page in that region,
it cannot be used due to the type mismatch. This patch will solve this
problem by separating CMA pages from the ordinary zones.

[1] https://lkml.org/lkml/2014/5/28/64
[2] https://lkml.org/lkml/2014/11/4/55
[3] https://lkml.org/lkml/2014/10/15/623
[4] http://www.spinics.net/lists/linux-mm/msg100562.html
[5] https://lkml.org/lkml/2014/5/30/320

For this patch:

Currently, reserved pages for CMA are managed together with normal pages.
To distinguish them, we use a migratetype, MIGRATE_CMA, and
handle this migratetype specially. However, it turns out that
there are too many problems with this approach, and fixing all of them
needs many more hooks in the page allocation and reclaim paths, so
some developers have expressed their discomfort and the CMA problems
have gone unfixed for a long time.

To end this situation and fix the CMA problems, this patch implements
ZONE_CMA. Reserved pages for CMA will be managed in this new zone. This
approach will remove all existing hooks for MIGRATE_CMA, and many
problems related to the CMA implementation will be solved.

This patch only adds the basic infrastructure of ZONE_CMA. In the
following patch, ZONE_CMA is actually populated and used.

Adding a new zone could cause two possible problems. One is the overflow
of page flags and the other is the GFP_ZONE_TABLE issue.

The following is the page-flags layout described in page-flags-layout.h.

1. No sparsemem or sparsemem vmemmap: |       NODE     | ZONE |             ... | FLAGS |
2.      " plus space for last_cpupid: |       NODE     | ZONE | LAST_CPUPID ... | FLAGS |
3. classic sparse with space for node:| SECTION | NODE | ZONE |             ... | FLAGS |
4.      " plus space for last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS |
5. classic sparse no space for node:  | SECTION |     ZONE    | ... | FLAGS |

There is no problem in configurations #1 and #2 on 64-bit systems. There
is enough room even for extremely large x86_64 systems. 32-bit systems
would not have many nodes, so they would have no problem either.
Systems with configurations #3, #4, and #5 could be affected by this zone
addition, but thanks to the recent THP rework, which freed one page flag,
the problem surface would be small. In some configurations, a problem is
still possible, but it highly depends on the individual configuration,
so the impact cannot be easily estimated. I guess that a usual system
with CONFIG_CMA would not be affected. If there is a problem,
we can adjust the section width or node width for that architecture.
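
To make the page-flags concern concrete: the ZONE field has to be wide
enough to index every zone, so an extra zone can cost one more bit. The
helper below is only an illustrative sketch with a made-up name; it
computes the bits needed for a given number of zones and is not taken
from page-flags-layout.h.

```c
/*
 * Illustrative helper, not kernel code: the number of bits needed to
 * store a zone index when there are nr_zones distinct zones.
 */
static unsigned int zone_bits(unsigned int nr_zones)
{
	unsigned int bits = 0;

	/* Grow the field until it can represent nr_zones values. */
	while ((1u << bits) < nr_zones)
		bits++;
	return bits;
}
```

For example, four zones fit in 2 bits, while a fifth zone needs 3 bits,
which is the kind of growth that could squeeze the SECTION and NODE
fields in the sparse configurations.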

Currently, GFP_ZONE_TABLE is a 32-bit value so that 32-bit systems can
use 32-bit bit operations on it. If we add one more zone, it will be
48 bits wide and 32-bit bit operations will no longer be possible.
Although this causes slight overhead, there is no other way, so this
patch relaxes GFP_ZONE_TABLE's 32-bit limitation. 32-bit systems with
CONFIG_CMA will be affected by this change, but the impact would be
marginal.
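
The 32-bit versus 48-bit arithmetic can be sketched as follows. The
table has 16 entries (one per value of the lowest 4 gfp bits), each
GFP_ZONES_SHIFT bits wide. The function names here are hypothetical and
this is not the kernel's gfp.h, just a small model of a table packed
into an integer.

```c
/*
 * Illustrative sketch, not the kernel's gfp.h. The table packs 16
 * entries (one per value of the lowest 4 gfp bits), each zones_shift
 * bits wide, into a single integer.
 */
static unsigned int zone_table_bits(unsigned int zones_shift)
{
	return 16u * zones_shift;
}

/* Extract entry idx from a table held in a 64-bit integer. */
static unsigned int zone_table_lookup(unsigned long long table,
				      unsigned int idx,
				      unsigned int zones_shift)
{
	unsigned long long mask = (1ULL << zones_shift) - 1;

	return (unsigned int)((table >> (idx * zones_shift)) & mask);
}
```

With a shift of 2 the table needs 32 bits; with a shift of 3 it needs
48 bits, so on a 32-bit system the table no longer fits in an unsigned
long and has to be built as an unsigned long long.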

Note that there are many checkpatch warnings, but I think that the
current code is more readable than it would be after fixing them.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 arch/x86/mm/highmem_32.c          |  8 +++++
 include/linux/gfp.h               | 29 +++++++++++-------
 include/linux/mempolicy.h         |  2 +-
 include/linux/mmzone.h            | 31 +++++++++++++++++++-
 include/linux/vm_event_item.h     | 10 ++++++-
 include/trace/events/compaction.h | 10 ++++++-
 kernel/power/snapshot.c           |  8 +++++
 mm/memory_hotplug.c               |  3 ++
 mm/page_alloc.c                   | 62 +++++++++++++++++++++++++++++++++------
 mm/vmstat.c                       |  9 +++++-
 10 files changed, 147 insertions(+), 25 deletions(-)

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 6d18b70..52a14da 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -120,6 +120,14 @@ void __init set_highmem_pages_init(void)
 		if (!is_highmem(zone))
 			continue;
 
+		/*
+		 * ZONE_CMA is a special zone that should not take part
+		 * in this initialization because its pages are
+		 * initialized when the other zones are initialized.
+		 */
+		if (is_zone_cma(zone))
+			continue;
+
 		zone_start_pfn = zone->zone_start_pfn;
 		zone_end_pfn = zone_start_pfn + zone->spanned_pages;
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f8041f9de..b86e0c2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -302,6 +302,12 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define OPT_ZONE_DMA32 ZONE_NORMAL
 #endif
 
+#ifdef CONFIG_CMA
+#define OPT_ZONE_CMA ZONE_CMA
+#else
+#define OPT_ZONE_CMA ZONE_MOVABLE
+#endif
+
 /*
  * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
  * zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
@@ -332,7 +338,6 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
  *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
  *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
  *
- * GFP_ZONES_SHIFT must be <= 2 on 32 bit platforms.
  */
 
 #if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4
@@ -342,19 +347,21 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define GFP_ZONES_SHIFT ZONES_SHIFT
 #endif
 
-#if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
-#error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
+#if !defined(CONFIG_64BIT) && GFP_ZONES_SHIFT > 2
+#define GFP_ZONE_TABLE_CAST unsigned long long
+#else
+#define GFP_ZONE_TABLE_CAST unsigned long
 #endif
 
 #define GFP_ZONE_TABLE ( \
-	(ZONE_NORMAL << 0 * GFP_ZONES_SHIFT)				       \
-	| (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT)		       \
-	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT)	       \
-	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT)		       \
-	| (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT)		       \
-	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT)    \
-	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)\
-	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
+	((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << 0 * GFP_ZONES_SHIFT)					\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT)				\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT)	\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_CMA << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)	\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)	\
 )
 
 /*
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 4429d25..c4cc86e 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -157,7 +157,7 @@ extern enum zone_type policy_zone;
 
 static inline void check_highest_zone(enum zone_type k)
 {
-	if (k > policy_zone && k != ZONE_MOVABLE)
+	if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
 		policy_zone = k;
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e3f39af..87b344e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -334,6 +334,9 @@ enum zone_type {
 	ZONE_HIGHMEM,
 #endif
 	ZONE_MOVABLE,
+#ifdef CONFIG_CMA
+	ZONE_CMA,
+#endif
 #ifdef CONFIG_ZONE_DEVICE
 	ZONE_DEVICE,
 #endif
@@ -846,11 +849,37 @@ static inline int zone_movable_is_highmem(void)
 }
 #endif
 
+static inline int is_zone_cma_idx(enum zone_type idx)
+{
+#ifdef CONFIG_CMA
+	return idx == ZONE_CMA;
+#else
+	return 0;
+#endif
+}
+
+static inline int is_zone_cma(struct zone *zone)
+{
+	int zone_idx = zone_idx(zone);
+
+	return is_zone_cma_idx(zone_idx);
+}
+
+static inline int zone_cma_is_highmem(void)
+{
+#ifdef CONFIG_HIGHMEM
+	return 1;
+#else
+	return 0;
+#endif
+}
+
 static inline int is_highmem_idx(enum zone_type idx)
 {
 #ifdef CONFIG_HIGHMEM
 	return (idx == ZONE_HIGHMEM ||
-		(idx == ZONE_MOVABLE && zone_movable_is_highmem()));
+		(idx == ZONE_MOVABLE && zone_movable_is_highmem()) ||
+		(is_zone_cma_idx(idx) && zone_cma_is_highmem()));
 #else
 	return 0;
 #endif
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 4d6ec58..2ff89d4 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -19,7 +19,15 @@
 #define HIGHMEM_ZONE(xx)
 #endif
 
-#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) xx##_MOVABLE
+#ifdef CONFIG_CMA
+#define MOVABLE_ZONE(xx) xx##_MOVABLE,
+#define CMA_ZONE(xx) xx##_CMA
+#else
+#define MOVABLE_ZONE(xx) xx##_MOVABLE
+#define CMA_ZONE(xx)
+#endif
+
+#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) MOVABLE_ZONE(xx) CMA_ZONE(xx)
 
 enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		FOR_ALL_ZONES(PGALLOC),
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index cbdb90b..25bb8402 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -38,12 +38,20 @@
 #define IFDEF_ZONE_HIGHMEM(X)
 #endif
 
+#ifdef CONFIG_CMA
+#define IFDEF_ZONE_CMA(X, Y, Z) X Z
+#else
+#define IFDEF_ZONE_CMA(X, Y, Z) Y
+#endif
+
 #define ZONE_TYPE						\
 	IFDEF_ZONE_DMA(		EM (ZONE_DMA,	 "DMA"))	\
 	IFDEF_ZONE_DMA32(	EM (ZONE_DMA32,	 "DMA32"))	\
 				EM (ZONE_NORMAL, "Normal")	\
 	IFDEF_ZONE_HIGHMEM(	EM (ZONE_HIGHMEM,"HighMem"))	\
-				EMe(ZONE_MOVABLE,"Movable")
+	IFDEF_ZONE_CMA(		EM (ZONE_MOVABLE,"Movable"),	\
+				EMe(ZONE_MOVABLE,"Movable"),	\
+				EMe(ZONE_CMA,    "CMA"))
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index b022284..0c94796 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1144,6 +1144,14 @@ unsigned int snapshot_additional_pages(struct zone *zone)
 {
 	unsigned int rtree, nodes;
 
+	/*
+	 * The pages needed for ZONE_CMA are already accounted for
+	 * when calculating the other zones, since the span of ZONE_CMA
+	 * is a subset of the other zones.
+	 */
+	if (is_zone_cma(zone))
+		return 0;
+
 	rtree = nodes = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK);
 	rtree += DIV_ROUND_UP(rtree * sizeof(struct rtree_node),
 			      LINKED_PAGE_DATA_SIZE);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 41266dc..6747dfe 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1885,6 +1885,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	if (zone_idx(zone) <= ZONE_NORMAL && !can_offline_normal(zone, nr_pages))
 		return -EINVAL;
 
+	if (is_zone_cma(zone))
+		return -EINVAL;
+
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE, true);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8310de..34db275 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -210,6 +210,9 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
 	 INT_MAX,
 #endif
 	 INT_MAX,
+#ifdef CONFIG_CMA
+	 INT_MAX,
+#endif
 };
 
 EXPORT_SYMBOL(totalram_pages);
@@ -226,6 +229,9 @@ static char * const zone_names[MAX_NR_ZONES] = {
 	 "HighMem",
 #endif
 	 "Movable",
+#ifdef CONFIG_CMA
+	 "CMA",
+#endif
 #ifdef CONFIG_ZONE_DEVICE
 	 "Device",
 #endif
@@ -5019,6 +5025,15 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	struct memblock_region *r = NULL, *tmp;
 #endif
 
+	/*
+	 * Physical pages for ZONE_CMA belong to other zones for now. They
+	 * are initialized when the corresponding zone is initialized and
+	 * will be moved to ZONE_CMA later. Zone information will also be
+	 * adjusted later.
+	 */
+	if (is_zone_cma_idx(zone))
+		return;
+
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
@@ -5451,7 +5466,7 @@ static void __init find_usable_zone_for_movable(void)
 {
 	int zone_index;
 	for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) {
-		if (zone_index == ZONE_MOVABLE)
+		if (zone_index == ZONE_MOVABLE || is_zone_cma_idx(zone_index))
 			continue;
 
 		if (arch_zone_highest_possible_pfn[zone_index] >
@@ -5661,6 +5676,8 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 						unsigned long *zholes_size)
 {
 	unsigned long realtotalpages = 0, totalpages = 0;
+	unsigned long zone_cma_start_pfn = ULONG_MAX;
+	unsigned long zone_cma_end_pfn = 0;
 	enum zone_type i;
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
@@ -5668,6 +5685,13 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 		unsigned long zone_start_pfn, zone_end_pfn;
 		unsigned long size, real_size;
 
+		if (is_zone_cma_idx(i)) {
+			zone->zone_start_pfn = zone_cma_start_pfn;
+			size = zone_cma_end_pfn - zone_cma_start_pfn;
+			real_size = 0;
+			goto init_zone;
+		}
+
 		size = zone_spanned_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn,
 						  node_end_pfn,
@@ -5677,13 +5701,23 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 		real_size = size - zone_absent_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn, node_end_pfn,
 						  zholes_size);
-		if (size)
+		if (size) {
 			zone->zone_start_pfn = zone_start_pfn;
-		else
+			if (zone_cma_start_pfn > zone_start_pfn)
+				zone_cma_start_pfn = zone_start_pfn;
+			if (zone_cma_end_pfn < zone_start_pfn + size)
+				zone_cma_end_pfn = zone_start_pfn + size;
+		} else
 			zone->zone_start_pfn = 0;
+
+init_zone:
 		zone->spanned_pages = size;
 		zone->present_pages = real_size;
 
+		/* Avoid over-counting the node span */
+		if (is_zone_cma_idx(i))
+			size = 0;
+
 		totalpages += size;
 		realtotalpages += real_size;
 	}
@@ -5827,6 +5861,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, freesize, memmap_pages;
 		unsigned long zone_start_pfn = zone->zone_start_pfn;
+		bool zone_kernel = !is_highmem_idx(j) && !is_zone_cma_idx(j);
 
 		size = zone->spanned_pages;
 		realsize = freesize = zone->present_pages;
@@ -5837,7 +5872,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * and per-cpu initialisations
 		 */
 		memmap_pages = calc_memmap_size(size, realsize);
-		if (!is_highmem_idx(j)) {
+		if (zone_kernel) {
 			if (freesize >= memmap_pages) {
 				freesize -= memmap_pages;
 				if (memmap_pages)
@@ -5856,7 +5891,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 					zone_names[0], nr_memory_reserve);
 		}
 
-		if (!is_highmem_idx(j))
+		if (zone_kernel)
 			nr_kernel_pages += freesize;
 		/* Charge for highmem memmap if there are enough kernel pages */
 		else if (nr_kernel_pages > memmap_pages * 2)
@@ -5868,7 +5903,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * when the bootmem allocator frees pages into the buddy system.
 		 * And all highmem pages will be managed by the buddy system.
 		 */
-		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
+		zone->managed_pages = zone_kernel ? freesize : realsize;
 #ifdef CONFIG_NUMA
 		zone->node = nid;
 #endif
@@ -5878,7 +5913,11 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		zone_seqlock_init(zone);
 		zone_pcp_init(zone);
 
-		if (!size)
+		/*
+		 * ZONE_CMA should be initialized even if it has no present
+		 * page now since pages will be moved to the zone later.
+		 */
+		if (!size && !is_zone_cma_idx(j))
 			continue;
 
 		set_pageblock_order();
@@ -6334,7 +6373,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	start_pfn = find_min_pfn_with_active_regions();
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
-		if (i == ZONE_MOVABLE)
+		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
 			continue;
 
 		end_pfn = max(max_zone_pfn[i], start_pfn);
@@ -6353,7 +6392,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	/* Print out the zone ranges */
 	pr_info("Zone ranges:\n");
 	for (i = 0; i < MAX_NR_ZONES; i++) {
-		if (i == ZONE_MOVABLE)
+		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
 			continue;
 		pr_info("  %-8s ", zone_names[i]);
 		if (arch_zone_lowest_possible_pfn[i] ==
@@ -7086,6 +7125,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
+
+	/* ZONE_CMA never contains unmovable pages */
+	if (is_zone_cma(zone))
+		return false;
+
 	mt = get_pageblock_migratetype(page);
 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
 		return false;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3b2131e..ce5838b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -915,8 +915,15 @@ int fragmentation_index(struct zone *zone, unsigned int order)
 #define TEXT_FOR_HIGHMEM(xx)
 #endif
 
+#ifdef CONFIG_CMA
+#define TEXT_FOR_CMA(xx) xx "_cma",
+#else
+#define TEXT_FOR_CMA(xx)
+#endif
+
 #define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
-					TEXT_FOR_HIGHMEM(xx) xx "_movable",
+					TEXT_FOR_HIGHMEM(xx) xx "_movable", \
+					TEXT_FOR_CMA(xx)
 
 const char * const vmstat_text[] = {
 	/* enum zone_stat_item countes */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
@ 2016-08-29  5:07   ` js1304
  0 siblings, 0 replies; 54+ messages in thread
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Attached cover-letter:

This series tries to solve problems with the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. However, the current
implementation behaves much like the previous reserved-memory approach,
because free pages in the CMA region are used only when there are no
movable free pages. In other words, free pages in the CMA region are
used only as a fallback. By the time that fallback is reached, kswapd
is easily woken up, since there are no unmovable or reclaimable
free pages left either. Once kswapd starts to reclaim memory, fallback
allocation from MIGRATE_CMA no longer occurs, because kswapd has already
refilled the movable free pages, and most free pages in the CMA region
stay free. This situation looks like the exclusively reserved memory case.

In my experiment, on a system with 1024 MB of memory, of which 512 MB
is reserved for CMA, kswapd is mostly woken up when roughly 512 MB of
free memory is left. The reason is that, to keep enough free memory for
unmovable and reclaimable allocations, kswapd uses the equation below
when calculating free memory, and the result easily drops under the
watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property that CMA free pages can't be used
for unmovable and reclaimable allocations.

Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA)
is lower than the low watermark, and it tries to free memory until
(FreeTotal - FreeCMA) is higher than the high watermark. As a result,
FreeTotal consistently hovers around the 512 MB boundary, which
means that we can't utilize the full memory capacity.

To fix this problem, I submitted some patches [1] about 10 months ago,
but found more problems that had to be fixed first. That approach
requires many hooks in the allocator hotpath, so some developers don't
like it. Instead, some of them suggested a different approach [2] to fix
all the problems related to CMA: introducing a new zone to deal
with free CMA pages. I agree that it is the best way to go, so it is
implemented here. Although the properties of ZONE_MOVABLE and ZONE_CMA
are similar, I decided to add a new zone rather than piggyback on
ZONE_MOVABLE since they have some differences. First, reserved CMA pages
should not be offlined. If free pages for CMA were managed by
ZONE_MOVABLE, we would need to keep the MIGRATE_CMA migratetype and
insert many hooks into the memory hotplug code to distinguish
hotpluggable memory from memory reserved for CMA in the same zone. That
would make the already complicated memory hotplug code even more
complicated. Second, cma_alloc() can be called more frequently
than memory hotplug operations, and we may need to control the
allocation rate of ZONE_CMA to optimize latency in the future.
A separate zone is easier to modify for that purpose. Third, I'd
like to see statistics for CMA separately. Sometimes we need to debug
why cma_alloc() failed, and separate statistics would be more helpful
in that situation.

Anyway, this patchset solves four problems related to CMA implementation.

1) Utilization problem
As mentioned above, we can't utilize the full memory capacity due to the
limitations of CMA free pages and the fallback policy. This patchset
implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
requests. This type of allocation is used for page cache and anonymous
pages, which account for most memory usage in the normal case, so we can
utilize the full memory capacity. Below is the experiment result for
this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           92.4		186.5
pswpin:                 82		18647
pswpout:                160		69839

<After this series>
CMA reserve:            0 MB            512 MB
Elapsed-time:           93.1		93.4
pswpin:                 84		46
pswpout:                183		92

FYI, there is another attempt [3] on lkml to solve this problem.
And, as far as I know, Qualcomm also has an out-of-tree solution for
this problem.

2) Reclaim problem
Currently, there is no logic to distinguish CMA pages in the reclaim path.
If reclaim is initiated for an unmovable or reclaimable allocation,
reclaiming CMA pages doesn't help to satisfy the request, so it is
just wasted work. By managing CMA pages in the new zone, we can
skip reclaiming ZONE_CMA completely when it is unnecessary.

3) Atomic allocation failure problem
Kswapd isn't started to reclaim pages when the allocation request is of
movable type and there are enough free pages in the CMA region. After a
bunch of consecutive movable allocation requests, free pages in the
ordinary region (not the CMA region) would be exhausted without waking
up kswapd. If an atomic unmovable allocation comes at that time, it
can't succeed since there are not enough pages in the ordinary region.
This problem was reported by Aneesh [4] and can be solved by this patchset.

4) Inefficient compaction
A usual high-order allocation request is of unmovable type and cannot
be serviced from the CMA area. In compaction, the migration scanner
doesn't distinguish migratable pages in the CMA area and migrates them
anyway. In this case, even if we make a high-order page in that region,
it cannot be used due to the type mismatch. This patch will solve this
problem by separating CMA pages from the ordinary zones.

[1] https://lkml.org/lkml/2014/5/28/64
[2] https://lkml.org/lkml/2014/11/4/55
[3] https://lkml.org/lkml/2014/10/15/623
[4] http://www.spinics.net/lists/linux-mm/msg100562.html
[5] https://lkml.org/lkml/2014/5/30/320

For this patch:

Currently, reserved pages for CMA are managed together with normal pages.
To distinguish them, we use a migratetype, MIGRATE_CMA, and
handle this migratetype specially. However, it turns out that
there are too many problems with this approach, and fixing all of them
needs many more hooks in the page allocation and reclaim paths, so
some developers have expressed their discomfort and the CMA problems
have gone unfixed for a long time.

To end this situation and fix the CMA problems, this patch implements
ZONE_CMA. Reserved pages for CMA will be managed in this new zone. This
approach will remove all existing hooks for MIGRATE_CMA, and many
problems related to the CMA implementation will be solved.

This patch only adds the basic infrastructure of ZONE_CMA. In the
following patch, ZONE_CMA is actually populated and used.

Adding a new zone could cause two possible problems. One is the overflow
of page flags and the other is the GFP_ZONE_TABLE issue.

The following is the page-flags layout described in page-flags-layout.h.

1. No sparsemem or sparsemem vmemmap: |       NODE     | ZONE |             ... | FLAGS |
2.      " plus space for last_cpupid: |       NODE     | ZONE | LAST_CPUPID ... | FLAGS |
3. classic sparse with space for node:| SECTION | NODE | ZONE |             ... | FLAGS |
4.      " plus space for last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS |
5. classic sparse no space for node:  | SECTION |     ZONE    | ... | FLAGS |

There is no problem in configurations #1 and #2 on 64-bit systems. There
is enough room even for extremely large x86_64 systems. 32-bit systems
would not have many nodes, so they would have no problem either.
Systems with configurations #3, #4, and #5 could be affected by this zone
addition, but thanks to the recent THP rework, which freed one page flag,
the problem surface would be small. In some configurations, a problem is
still possible, but it highly depends on the individual configuration,
so the impact cannot be easily estimated. I guess that a usual system
with CONFIG_CMA would not be affected. If there is a problem,
we can adjust the section width or node width for that architecture.

Currently, GFP_ZONE_TABLE is a 32-bit value so that 32-bit systems can
use 32-bit bit operations on it. If we add one more zone, it will be
48 bits wide and 32-bit bit operations will no longer be possible.
Although this causes slight overhead, there is no other way, so this
patch relaxes GFP_ZONE_TABLE's 32-bit limitation. 32-bit systems with
CONFIG_CMA will be affected by this change, but the impact would be
marginal.

Note that there are many checkpatch warnings, but I think that the
current code is more readable than it would be after fixing them.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 arch/x86/mm/highmem_32.c          |  8 +++++
 include/linux/gfp.h               | 29 +++++++++++-------
 include/linux/mempolicy.h         |  2 +-
 include/linux/mmzone.h            | 31 +++++++++++++++++++-
 include/linux/vm_event_item.h     | 10 ++++++-
 include/trace/events/compaction.h | 10 ++++++-
 kernel/power/snapshot.c           |  8 +++++
 mm/memory_hotplug.c               |  3 ++
 mm/page_alloc.c                   | 62 +++++++++++++++++++++++++++++++++------
 mm/vmstat.c                       |  9 +++++-
 10 files changed, 147 insertions(+), 25 deletions(-)

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 6d18b70..52a14da 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -120,6 +120,14 @@ void __init set_highmem_pages_init(void)
 		if (!is_highmem(zone))
 			continue;
 
+		/*
+		 * ZONE_CMA is a special zone that should not take part
+		 * in this initialization because its pages are
+		 * initialized when the other zones are initialized.
+		 */
+		if (is_zone_cma(zone))
+			continue;
+
 		zone_start_pfn = zone->zone_start_pfn;
 		zone_end_pfn = zone_start_pfn + zone->spanned_pages;
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f8041f9de..b86e0c2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -302,6 +302,12 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define OPT_ZONE_DMA32 ZONE_NORMAL
 #endif
 
+#ifdef CONFIG_CMA
+#define OPT_ZONE_CMA ZONE_CMA
+#else
+#define OPT_ZONE_CMA ZONE_MOVABLE
+#endif
+
 /*
  * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
  * zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
@@ -332,7 +338,6 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
  *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
  *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
  *
- * GFP_ZONES_SHIFT must be <= 2 on 32 bit platforms.
  */
 
 #if defined(CONFIG_ZONE_DEVICE) && (MAX_NR_ZONES-1) <= 4
@@ -342,19 +347,21 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
 #define GFP_ZONES_SHIFT ZONES_SHIFT
 #endif
 
-#if 16 * GFP_ZONES_SHIFT > BITS_PER_LONG
-#error GFP_ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
+#if !defined(CONFIG_64BIT) && GFP_ZONES_SHIFT > 2
+#define GFP_ZONE_TABLE_CAST unsigned long long
+#else
+#define GFP_ZONE_TABLE_CAST unsigned long
 #endif
 
 #define GFP_ZONE_TABLE ( \
-	(ZONE_NORMAL << 0 * GFP_ZONES_SHIFT)				       \
-	| (OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT)		       \
-	| (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT)	       \
-	| (OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT)		       \
-	| (ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT)		       \
-	| (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT)    \
-	| (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)\
-	| (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)\
+	((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << 0 * GFP_ZONES_SHIFT)					\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << ___GFP_DMA * GFP_ZONES_SHIFT)				\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << ___GFP_DMA32 * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) ZONE_NORMAL << ___GFP_MOVABLE * GFP_ZONES_SHIFT)			\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * GFP_ZONES_SHIFT)	\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_CMA << (___GFP_MOVABLE | ___GFP_HIGHMEM) * GFP_ZONES_SHIFT)	\
+	| ((GFP_ZONE_TABLE_CAST) OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * GFP_ZONES_SHIFT)	\
 )
 
 /*
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 4429d25..c4cc86e 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -157,7 +157,7 @@ extern enum zone_type policy_zone;
 
 static inline void check_highest_zone(enum zone_type k)
 {
-	if (k > policy_zone && k != ZONE_MOVABLE)
+	if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
 		policy_zone = k;
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e3f39af..87b344e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -334,6 +334,9 @@ enum zone_type {
 	ZONE_HIGHMEM,
 #endif
 	ZONE_MOVABLE,
+#ifdef CONFIG_CMA
+	ZONE_CMA,
+#endif
 #ifdef CONFIG_ZONE_DEVICE
 	ZONE_DEVICE,
 #endif
@@ -846,11 +849,37 @@ static inline int zone_movable_is_highmem(void)
 }
 #endif
 
+static inline int is_zone_cma_idx(enum zone_type idx)
+{
+#ifdef CONFIG_CMA
+	return idx == ZONE_CMA;
+#else
+	return 0;
+#endif
+}
+
+static inline int is_zone_cma(struct zone *zone)
+{
+	int zone_idx = zone_idx(zone);
+
+	return is_zone_cma_idx(zone_idx);
+}
+
+static inline int zone_cma_is_highmem(void)
+{
+#ifdef CONFIG_HIGHMEM
+	return 1;
+#else
+	return 0;
+#endif
+}
+
 static inline int is_highmem_idx(enum zone_type idx)
 {
 #ifdef CONFIG_HIGHMEM
 	return (idx == ZONE_HIGHMEM ||
-		(idx == ZONE_MOVABLE && zone_movable_is_highmem()));
+		(idx == ZONE_MOVABLE && zone_movable_is_highmem()) ||
+		(is_zone_cma_idx(idx) && zone_cma_is_highmem()));
 #else
 	return 0;
 #endif
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 4d6ec58..2ff89d4 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -19,7 +19,15 @@
 #define HIGHMEM_ZONE(xx)
 #endif
 
-#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) xx##_MOVABLE
+#ifdef CONFIG_CMA
+#define MOVABLE_ZONE(xx) xx##_MOVABLE,
+#define CMA_ZONE(xx) xx##_CMA
+#else
+#define MOVABLE_ZONE(xx) xx##_MOVABLE
+#define CMA_ZONE(xx)
+#endif
+
+#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) MOVABLE_ZONE(xx) CMA_ZONE(xx)
 
 enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		FOR_ALL_ZONES(PGALLOC),
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index cbdb90b..25bb8402 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -38,12 +38,20 @@
 #define IFDEF_ZONE_HIGHMEM(X)
 #endif
 
+#ifdef CONFIG_CMA
+#define IFDEF_ZONE_CMA(X, Y, Z) X Z
+#else
+#define IFDEF_ZONE_CMA(X, Y, Z) Y
+#endif
+
 #define ZONE_TYPE						\
 	IFDEF_ZONE_DMA(		EM (ZONE_DMA,	 "DMA"))	\
 	IFDEF_ZONE_DMA32(	EM (ZONE_DMA32,	 "DMA32"))	\
 				EM (ZONE_NORMAL, "Normal")	\
 	IFDEF_ZONE_HIGHMEM(	EM (ZONE_HIGHMEM,"HighMem"))	\
-				EMe(ZONE_MOVABLE,"Movable")
+	IFDEF_ZONE_CMA(		EM (ZONE_MOVABLE,"Movable"),	\
+				EMe(ZONE_MOVABLE,"Movable"),	\
+				EMe(ZONE_CMA,    "CMA"))
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index b022284..0c94796 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1144,6 +1144,14 @@ unsigned int snapshot_additional_pages(struct zone *zone)
 {
 	unsigned int rtree, nodes;
 
+	/*
+	 * Estimation of needed pages for ZONE_CMA is already considered
+	 * when calculating other zones since span of ZONE_CMA is subset
+	 * of other zones.
+	 */
+	if (is_zone_cma(zone))
+		return 0;
+
 	rtree = nodes = DIV_ROUND_UP(zone->spanned_pages, BM_BITS_PER_BLOCK);
 	rtree += DIV_ROUND_UP(rtree * sizeof(struct rtree_node),
 			      LINKED_PAGE_DATA_SIZE);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 41266dc..6747dfe 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1885,6 +1885,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	if (zone_idx(zone) <= ZONE_NORMAL && !can_offline_normal(zone, nr_pages))
 		return -EINVAL;
 
+	if (is_zone_cma(zone))
+		return -EINVAL;
+
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE, true);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8310de..34db275 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -210,6 +210,9 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
 	 INT_MAX,
 #endif
 	 INT_MAX,
+#ifdef CONFIG_CMA
+	 INT_MAX,
+#endif
 };
 
 EXPORT_SYMBOL(totalram_pages);
@@ -226,6 +229,9 @@ static char * const zone_names[MAX_NR_ZONES] = {
 	 "HighMem",
 #endif
 	 "Movable",
+#ifdef CONFIG_CMA
+	 "CMA",
+#endif
 #ifdef CONFIG_ZONE_DEVICE
 	 "Device",
 #endif
@@ -5019,6 +5025,15 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	struct memblock_region *r = NULL, *tmp;
 #endif
 
+	/*
+	 * Physical pages for ZONE_CMA currently belong to other zones. They
+	 * are initialized when the corresponding zone is initialized and
+	 * are moved to ZONE_CMA later. Zone information is also adjusted
+	 * later.
+	 */
+	if (is_zone_cma_idx(zone))
+		return;
+
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
 
@@ -5451,7 +5466,7 @@ static void __init find_usable_zone_for_movable(void)
 {
 	int zone_index;
 	for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) {
-		if (zone_index == ZONE_MOVABLE)
+		if (zone_index == ZONE_MOVABLE || is_zone_cma_idx(zone_index))
 			continue;
 
 		if (arch_zone_highest_possible_pfn[zone_index] >
@@ -5661,6 +5676,8 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 						unsigned long *zholes_size)
 {
 	unsigned long realtotalpages = 0, totalpages = 0;
+	unsigned long zone_cma_start_pfn = ULONG_MAX;
+	unsigned long zone_cma_end_pfn = 0;
 	enum zone_type i;
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
@@ -5668,6 +5685,13 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 		unsigned long zone_start_pfn, zone_end_pfn;
 		unsigned long size, real_size;
 
+		if (is_zone_cma_idx(i)) {
+			zone->zone_start_pfn = zone_cma_start_pfn;
+			size = zone_cma_end_pfn - zone_cma_start_pfn;
+			real_size = 0;
+			goto init_zone;
+		}
+
 		size = zone_spanned_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn,
 						  node_end_pfn,
@@ -5677,13 +5701,23 @@ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
 		real_size = size - zone_absent_pages_in_node(pgdat->node_id, i,
 						  node_start_pfn, node_end_pfn,
 						  zholes_size);
-		if (size)
+		if (size) {
 			zone->zone_start_pfn = zone_start_pfn;
-		else
+			if (zone_cma_start_pfn > zone_start_pfn)
+				zone_cma_start_pfn = zone_start_pfn;
+			if (zone_cma_end_pfn < zone_start_pfn + size)
+				zone_cma_end_pfn = zone_start_pfn + size;
+		} else
 			zone->zone_start_pfn = 0;
+
+init_zone:
 		zone->spanned_pages = size;
 		zone->present_pages = real_size;
 
+		/* Avoid over-counting the node span */
+		if (is_zone_cma_idx(i))
+			size = 0;
+
 		totalpages += size;
 		realtotalpages += real_size;
 	}
@@ -5827,6 +5861,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, freesize, memmap_pages;
 		unsigned long zone_start_pfn = zone->zone_start_pfn;
+		bool zone_kernel = !is_highmem_idx(j) && !is_zone_cma_idx(j);
 
 		size = zone->spanned_pages;
 		realsize = freesize = zone->present_pages;
@@ -5837,7 +5872,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * and per-cpu initialisations
 		 */
 		memmap_pages = calc_memmap_size(size, realsize);
-		if (!is_highmem_idx(j)) {
+		if (zone_kernel) {
 			if (freesize >= memmap_pages) {
 				freesize -= memmap_pages;
 				if (memmap_pages)
@@ -5856,7 +5891,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 					zone_names[0], nr_memory_reserve);
 		}
 
-		if (!is_highmem_idx(j))
+		if (zone_kernel)
 			nr_kernel_pages += freesize;
 		/* Charge for highmem memmap if there are enough kernel pages */
 		else if (nr_kernel_pages > memmap_pages * 2)
@@ -5868,7 +5903,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		 * when the bootmem allocator frees pages into the buddy system.
 		 * And all highmem pages will be managed by the buddy system.
 		 */
-		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
+		zone->managed_pages = zone_kernel ? freesize : realsize;
 #ifdef CONFIG_NUMA
 		zone->node = nid;
 #endif
@@ -5878,7 +5913,11 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 		zone_seqlock_init(zone);
 		zone_pcp_init(zone);
 
-		if (!size)
+		/*
+		 * ZONE_CMA should be initialized even if it has no present
+		 * page now since pages will be moved to the zone later.
+		 */
+		if (!size && !is_zone_cma_idx(j))
 			continue;
 
 		set_pageblock_order();
@@ -6334,7 +6373,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	start_pfn = find_min_pfn_with_active_regions();
 
 	for (i = 0; i < MAX_NR_ZONES; i++) {
-		if (i == ZONE_MOVABLE)
+		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
 			continue;
 
 		end_pfn = max(max_zone_pfn[i], start_pfn);
@@ -6353,7 +6392,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	/* Print out the zone ranges */
 	pr_info("Zone ranges:\n");
 	for (i = 0; i < MAX_NR_ZONES; i++) {
-		if (i == ZONE_MOVABLE)
+		if (i == ZONE_MOVABLE || is_zone_cma_idx(i))
 			continue;
 		pr_info("  %-8s ", zone_names[i]);
 		if (arch_zone_lowest_possible_pfn[i] ==
@@ -7086,6 +7125,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
+
+	/* ZONE_CMA never contains unmovable pages */
+	if (is_zone_cma(zone))
+		return false;
+
 	mt = get_pageblock_migratetype(page);
 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
 		return false;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3b2131e..ce5838b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -915,8 +915,15 @@ int fragmentation_index(struct zone *zone, unsigned int order)
 #define TEXT_FOR_HIGHMEM(xx)
 #endif
 
+#ifdef CONFIG_CMA
+#define TEXT_FOR_CMA(xx) xx "_cma",
+#else
+#define TEXT_FOR_CMA(xx)
+#endif
+
 #define TEXTS_FOR_ZONES(xx) TEXT_FOR_DMA(xx) TEXT_FOR_DMA32(xx) xx "_normal", \
-					TEXT_FOR_HIGHMEM(xx) xx "_movable",
+					TEXT_FOR_HIGHMEM(xx) xx "_movable", \
+					TEXT_FOR_CMA(xx)
 
 const char * const vmstat_text[] = {
 	/* enum zone_stat_item countes */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v5 3/6] mm/cma: populate ZONE_CMA
  2016-08-29  5:07 ` js1304
@ 2016-08-29  5:07   ` js1304
  -1 siblings, 0 replies; 54+ messages in thread
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Until now, pages reserved for CMA have been managed in the ordinary zones
that their pfns belong to. This approach has numerous problems, and
fixing them isn't easy. (They are described in the previous patch.)
To fix this situation, ZONE_CMA was introduced in the previous patch, but
it is not yet populated. This patch populates ZONE_CMA by stealing the
reserved pages from the ordinary zones.

In the previous implementation, a kernel allocation request with
__GFP_MOVABLE could be serviced from the CMA region. In the new approach,
only allocation requests with GFP_HIGHUSER_MOVABLE can be serviced from
the CMA region. This is an inevitable consequence of using the zone
implementation, because ZONE_CMA could contain highmem. Due to this
decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.

I don't think this will be a problem, because most file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
supported by the fact that many systems with ZONE_HIGHMEM work fine.
A notable disadvantage is that we cannot use these pages for blockdev
file cache pages, because such requests usually have __GFP_MOVABLE but
not __GFP_HIGHMEM and __GFP_USER. There are pros and cons here, though.
In my experience, blockdev file cache pages are one of the top reasons
that cma_alloc() fails temporarily. So we get a stronger guarantee of
cma_alloc() success by excluding that case.
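
To make the rule above concrete, here is a small userspace model of the
zone lookup that GFP_ZONE_TABLE performs once CONFIG_CMA is enabled. It
is a sketch under assumptions, not the kernel implementation:
gfp_highest_zone() and can_use_cma() are illustrative names, the enum
assumes a config with every zone present, and the invalid flag
combinations are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Zone modifier bits, with the values used by include/linux/gfp.h. */
#define GFP_DMA_BIT     0x01u
#define GFP_HIGHMEM_BIT 0x02u
#define GFP_DMA32_BIT   0x04u
#define GFP_MOVABLE_BIT 0x08u

/* Simplified zone ordering for a config with every zone enabled. */
enum zone_type {
	ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE, ZONE_CMA
};

/* Highest zone a request may use, following the table in this series. */
static enum zone_type gfp_highest_zone(unsigned flags)
{
	assert(flags < 16);
	switch (flags) {
	case GFP_DMA_BIT:
	case GFP_MOVABLE_BIT | GFP_DMA_BIT:
		return ZONE_DMA;
	case GFP_DMA32_BIT:
	case GFP_MOVABLE_BIT | GFP_DMA32_BIT:
		return ZONE_DMA32;
	case GFP_HIGHMEM_BIT:
		return ZONE_HIGHMEM;
	case GFP_MOVABLE_BIT | GFP_HIGHMEM_BIT:
		return ZONE_CMA;
	default:
		/* 0 and bare __GFP_MOVABLE; invalid combinations not modeled */
		return ZONE_NORMAL;
	}
}

/* Only requests whose highest zone is ZONE_CMA can be served from CMA. */
static bool can_use_cma(unsigned flags)
{
	return gfp_highest_zone(flags) == ZONE_CMA;
}
```

In this model a GFP_HIGHUSER_MOVABLE request (which carries both
__GFP_MOVABLE and __GFP_HIGHMEM) reaches ZONE_CMA, while a blockdev
page cache request with only __GFP_MOVABLE stops at ZONE_NORMAL.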

The implementation itself is easy to understand: steal the pages when
the CMA area is initialized, then recalculate the various per-zone
stats/thresholds.
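
The accounting side of the steal can be sketched in userspace as
follows. This only models what init_cma_reserved_pageblock() does to
present_pages and to the page-to-zone link; struct model_zone, struct
model_page and steal_block() are hypothetical names, and the real code
relinks pages with set_page_links() instead of writing a field.

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_ZONE_NORMAL, MODEL_ZONE_CMA, MODEL_NR_ZONES };

struct model_zone { long present_pages; };
struct model_page { int zone; };

/* Move one block of `nr` pages from its current zone to the CMA zone. */
static void steal_block(struct model_zone *zones, struct model_page *block,
			size_t nr)
{
	int src;
	size_t i;

	assert(nr > 0);
	/* As in the patch, all pages are assumed to come from one zone. */
	src = block[0].zone;

	/* Charge the old zone first, while the pages still point at it. */
	zones[src].present_pages -= (long)nr;

	for (i = 0; i < nr; i++) {
		assert(block[i].zone == src);
		block[i].zone = MODEL_ZONE_CMA;	/* relink the page */
	}

	/* Credit the new zone once every page has been relinked. */
	zones[MODEL_ZONE_CMA].present_pages += (long)nr;
}
```

The order matters because the zone to charge is found through the page
itself; once the page is relinked, the old zone can no longer be looked
up that way.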

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/memory_hotplug.h |  3 ---
 include/linux/mm.h             |  1 +
 mm/cma.c                       | 56 ++++++++++++++++++++++++++++++++++++++----
 mm/internal.h                  |  3 +++
 mm/page_alloc.c                | 29 +++++++++++++++++++---
 5 files changed, 80 insertions(+), 12 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 01033fa..ea5af47 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone *zone);
-
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9d85402..f45e0e4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1933,6 +1933,7 @@ extern void setup_per_cpu_pageset(void);
 
 extern void zone_pcp_update(struct zone *zone);
 extern void zone_pcp_reset(struct zone *zone);
+extern void setup_zone_pageset(struct zone *zone);
 
 /* page_alloc.c */
 extern int min_free_kbytes;
diff --git a/mm/cma.c b/mm/cma.c
index 384c2cb..d69bdf7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -38,6 +38,7 @@
 #include <trace/events/cma.h>
 
 #include "cma.h"
+#include "internal.h"
 
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
@@ -116,10 +117,9 @@ static int __init cma_activate_area(struct cma *cma)
 		for (j = pageblock_nr_pages; j; --j, pfn++) {
 			WARN_ON_ONCE(!pfn_valid(pfn));
 			/*
-			 * alloc_contig_range requires the pfn range
-			 * specified to be in the same zone. Make this
-			 * simple by forcing the entire CMA resv range
-			 * to be in the same zone.
+			 * In init_cma_reserved_pageblock(), present_pages is
+			 * adjusted with assumption that all pages come from
+			 * a single zone. It could be fixed but not yet done.
 			 */
 			if (page_zone(pfn_to_page(pfn)) != zone)
 				goto err;
@@ -145,6 +145,28 @@ err:
 static int __init cma_init_reserved_areas(void)
 {
 	int i;
+	struct zone *zone;
+	unsigned long start_pfn = ULONG_MAX, end_pfn = 0;
+
+	if (!cma_area_count)
+		return 0;
+
+	for (i = 0; i < cma_area_count; i++) {
+		if (start_pfn > cma_areas[i].base_pfn)
+			start_pfn = cma_areas[i].base_pfn;
+		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
+			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
+	}
+
+	for_each_zone(zone) {
+		if (!is_zone_cma(zone))
+			continue;
+
+		/* ZONE_CMA doesn't need to exceed CMA region */
+		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
+		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
+					zone->zone_start_pfn;
+	}
 
 	for (i = 0; i < cma_area_count; i++) {
 		int ret = cma_activate_area(&cma_areas[i]);
@@ -153,9 +175,33 @@ static int __init cma_init_reserved_areas(void)
 			return ret;
 	}
 
+	/*
+	 * Reserved pages for ZONE_CMA are now activated and this would change
+	 * ZONE_CMA's managed page counter and other zone's present counter.
+	 * We need to re-calculate various zone information that depends on
+	 * this initialization.
+	 */
+	build_all_zonelists(NULL, NULL);
+	for_each_populated_zone(zone) {
+		zone_pcp_update(zone);
+		set_zone_contiguous(zone);
+	}
+
+	/*
+	 * The per-zone watermarks also need to be re-initialized, but
+	 * init_per_zone_wmark_min() is not called here: it is registered
+	 * as a core_initcall and therefore runs after this function.
+	 */
+	for_each_populated_zone(zone) {
+		if (!is_zone_cma(zone))
+			continue;
+
+		setup_zone_pageset(zone);
+	}
+
 	return 0;
 }
-core_initcall(cma_init_reserved_areas);
+pure_initcall(cma_init_reserved_areas);
 
 /**
  * cma_init_reserved_mem() - create custom contiguous area from reserved memory
diff --git a/mm/internal.h b/mm/internal.h
index 5214bf8..3d3f052 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -156,6 +156,9 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
 					gfp_t gfp_flags);
 extern int user_min_free_kbytes;
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 34db275..91fb172 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1610,16 +1610,38 @@ void __init page_alloc_init_late(void)
 }
 
 #ifdef CONFIG_CMA
+static void __init adjust_present_page_count(struct page *page, long count)
+{
+	struct zone *zone = page_zone(page);
+
+	/* We don't need to hold a lock since it is boot-up process */
+	zone->present_pages += count;
+}
+
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
 	unsigned i = pageblock_nr_pages;
+	unsigned long pfn = page_to_pfn(page);
 	struct page *p = page;
+	int nid = page_to_nid(page);
+
+	/*
+	 * ZONE_CMA will steal present pages from other zones by changing
+	 * page links so page_zone() is changed. Before that,
+	 * we need to adjust previous zone's page count first.
+	 */
+	adjust_present_page_count(page, -pageblock_nr_pages);
 
 	do {
 		__ClearPageReserved(p);
 		set_page_count(p, 0);
-	} while (++p, --i);
+
+		/* Steal pages from other zones */
+		set_page_links(p, ZONE_CMA, nid, pfn);
+	} while (++p, ++pfn, --i);
+
+	adjust_present_page_count(page, pageblock_nr_pages);
 
 	set_pageblock_migratetype(page, MIGRATE_CMA);
 
@@ -4824,7 +4846,6 @@ static void build_zonelists(pg_data_t *pgdat)
  */
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
-static void setup_zone_pageset(struct zone *zone);
 
 /*
  * Global mutex to protect against size modification of zonelists
@@ -5254,7 +5275,7 @@ static void __meminit zone_pageset_init(struct zone *zone, int cpu)
 	pageset_set_high_and_batch(zone, pcp);
 }
 
-static void __meminit setup_zone_pageset(struct zone *zone)
+void __meminit setup_zone_pageset(struct zone *zone)
 {
 	int cpu;
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
@@ -7433,7 +7454,7 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 }
 #endif
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined CONFIG_MEMORY_HOTPLUG || defined CONFIG_CMA
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
  * page high values need to be recalulated.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v5 4/6] mm/cma: remove ALLOC_CMA
  2016-08-29  5:07 ` js1304
@ 2016-08-29  5:07   ` js1304
  -1 siblings, 0 replies; 54+ messages in thread
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now all pages reserved for the CMA region belong to ZONE_CMA, and that
zone only serves GFP_HIGHUSER_MOVABLE requests. Therefore, we don't need
to consider ALLOC_CMA at all.
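
As a rough illustration of what goes away, the order-0 fast watermark
check can be modeled in userspace like this. It is a sketch, not the
kernel code: wmark_ok_old() and wmark_ok_new() are made-up names, and
MODEL_ALLOC_CMA merely mirrors the 0x80 value of the removed flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ALLOC_CMA 0x80u	/* mirrors the removed ALLOC_CMA flag */

/* Before: free CMA pages are discounted unless the request may use them. */
static bool wmark_ok_old(long free_pages, long free_cma_pages, long mark,
			 long lowmem_reserve, unsigned alloc_flags)
{
	assert(free_cma_pages >= 0 && free_cma_pages <= free_pages);
	if (!(alloc_flags & MODEL_ALLOC_CMA))
		free_pages -= free_cma_pages;
	return free_pages > mark + lowmem_reserve;
}

/* After: ordinary zones hold no CMA pages, so no discount is needed. */
static bool wmark_ok_new(long free_pages, long mark, long lowmem_reserve)
{
	assert(free_pages >= 0);
	return free_pages > mark + lowmem_reserve;
}
```

For a zone that used to report 1000 free pages of which 600 were CMA
pages, the old check had to subtract the 600 for requests without the
flag; after the pages move to ZONE_CMA the zone simply reports 400 free
pages and the plain comparison gives the same answer.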

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c |  4 +---
 mm/internal.h   |  1 -
 mm/page_alloc.c | 28 +++-------------------------
 3 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 29f6c49..4532905 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1401,14 +1401,12 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 	 * if compaction succeeds.
 	 * For costly orders, we require low watermark instead of min for
 	 * compaction to proceed to increase its chances.
-	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
-	 * suitable migration targets
 	 */
 	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
 				low_wmark_pages(zone) : min_wmark_pages(zone);
 	watermark += compact_gap(order);
 	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
-						ALLOC_CMA, wmark_target))
+						0, wmark_target))
 		return COMPACT_SKIPPED;
 
 	/*
diff --git a/mm/internal.h b/mm/internal.h
index 3d3f052..01d06bb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -466,7 +466,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
-#define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 91fb172..16ba1fe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2565,7 +2565,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		 * exists.
 		 */
 		watermark = min_wmark_pages(zone) + (1UL << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
+		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
 			return 0;
 
 		__mod_zone_freepage_state(zone, -(1UL << order), mt);
@@ -2808,12 +2808,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	else
 		min -= min / 4;
 
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
-
 	/*
 	 * Check watermarks for an order-0 allocation request. If these
 	 * are not met, then a high-order request also cannot go ahead
@@ -2843,10 +2837,8 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		}
 
 #ifdef CONFIG_CMA
-		if ((alloc_flags & ALLOC_CMA) &&
-		    !list_empty(&area->free_list[MIGRATE_CMA])) {
+		if (!list_empty(&area->free_list[MIGRATE_CMA]))
 			return true;
-		}
 #endif
 	}
 	return false;
@@ -2863,13 +2855,6 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 		unsigned long mark, int classzone_idx, unsigned int alloc_flags)
 {
 	long free_pages = zone_page_state(z, NR_FREE_PAGES);
-	long cma_pages = 0;
-
-#ifdef CONFIG_CMA
-	/* If allocation can't use CMA areas don't use free CMA pages */
-	if (!(alloc_flags & ALLOC_CMA))
-		cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
 
 	/*
 	 * Fast check for order-0 only. If this fails then the reserves
@@ -2878,7 +2863,7 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 	 * the caller is !atomic then it'll uselessly search the free
 	 * list. That corner case is then slower but it is harmless.
 	 */
-	if (!order && (free_pages - cma_pages) > mark + z->lowmem_reserve[classzone_idx])
+	if (!order && free_pages > mark + z->lowmem_reserve[classzone_idx])
 		return true;
 
 	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
@@ -3355,10 +3340,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	} else if (unlikely(rt_task(current)) && !in_interrupt())
 		alloc_flags |= ALLOC_HARDER;
 
-#ifdef CONFIG_CMA
-	if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	return alloc_flags;
 }
 
@@ -3727,9 +3708,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (unlikely(!zonelist->_zonerefs->zone))
 		return NULL;
 
-	if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v5 5/6] mm/cma: remove MIGRATE_CMA
  2016-08-29  5:07 ` js1304
@ 2016-08-29  5:07   ` js1304
  -1 siblings, 0 replies; 54+ messages in thread
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now, all reserved pages for the CMA region belong to ZONE_CMA, and
that zone holds no other type of page. Therefore, we no longer need
MIGRATE_CMA to distinguish CMA pages from ordinary pages and handle
them differently. Remove MIGRATE_CMA.

Unfortunately, this patch makes the free CMA counter incorrect, because
free CMA pages are counted only while they are on the MIGRATE_CMA free
list. The next patch fixes this. I could squash the next patch into
this one, but that would make the change complicated and hard to
review, so I keep them separate.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
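Note for reviewers (not part of the commit log): the counter problem
described above can be illustrated with a small userspace model. This
is only a sketch under stated assumptions, not kernel code: struct
zone_model and both helper names are hypothetical, and the enum values
are arbitrary. The old helper mirrors the removed
__mod_zone_freepage_state(); the new one mirrors the plain
NR_FREE_PAGES update that replaces it in this patch.

```c
#include <assert.h>

/* Counter indices; names follow the kernel, values are arbitrary here. */
enum { NR_FREE_PAGES, NR_FREE_CMA_PAGES, NR_STAT_ITEMS };

/* Only the two migratetypes this sketch needs. */
enum { MIGRATE_MOVABLE, MIGRATE_CMA };

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	long vm_stat[NR_STAT_ITEMS];
};

/* Old accounting: the CMA counter follows pages of type MIGRATE_CMA. */
static void mod_freepage_state_old(struct zone_model *z, long nr_pages,
				   int migratetype)
{
	/* A free-page counter must never drop below zero. */
	assert(z->vm_stat[NR_FREE_PAGES] + nr_pages >= 0);

	z->vm_stat[NR_FREE_PAGES] += nr_pages;
	if (migratetype == MIGRATE_CMA)
		z->vm_stat[NR_FREE_CMA_PAGES] += nr_pages;
}

/* New accounting in this patch: only NR_FREE_PAGES is updated. */
static void mod_free_pages_new(struct zone_model *z, long nr_pages)
{
	assert(z->vm_stat[NR_FREE_PAGES] + nr_pages >= 0);

	z->vm_stat[NR_FREE_PAGES] += nr_pages;
}
```

In this model, freeing pages through mod_free_pages_new() leaves
NR_FREE_CMA_PAGES untouched, so the counter stays stale until
something else updates it. That is the gap the next patch is said to
close.
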
 include/linux/gfp.h            |  3 +-
 include/linux/mmzone.h         | 24 ------------
 include/linux/page-isolation.h |  5 +--
 include/linux/vmstat.h         |  8 ----
 mm/cma.c                       |  2 +-
 mm/compaction.c                | 10 +----
 mm/hugetlb.c                   |  2 +-
 mm/memory_hotplug.c            |  7 ++--
 mm/page_alloc.c                | 89 ++++++++++++------------------------------
 mm/page_isolation.c            | 15 +++----
 mm/page_owner.c                |  6 +--
 mm/usercopy.c                  |  4 +-
 12 files changed, 43 insertions(+), 132 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b86e0c2..815d756 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -553,8 +553,7 @@ static inline bool pm_suspended_storage(void)
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      unsigned migratetype);
+extern int alloc_contig_range(unsigned long start, unsigned long end);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 #endif
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 87b344e..24e46ca 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,22 +41,6 @@ enum {
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
-#ifdef CONFIG_CMA
-	/*
-	 * MIGRATE_CMA migration type is designed to mimic the way
-	 * ZONE_MOVABLE works.  Only movable pages can be allocated
-	 * from MIGRATE_CMA pageblocks and page allocator never
-	 * implicitly change migration type of MIGRATE_CMA pageblock.
-	 *
-	 * The way to use it is to change migratetype of a range of
-	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
-	 * a single pageblock.
-	 */
-	MIGRATE_CMA,
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	MIGRATE_ISOLATE,	/* can't allocate from here */
 #endif
@@ -66,14 +50,6 @@ enum {
 /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
 extern char * const migratetype_names[MIGRATE_TYPES];
 
-#ifdef CONFIG_CMA
-#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-#  define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_CMA)
-#else
-#  define is_migrate_cma(migratetype) false
-#  define is_migrate_cma_page(_page) false
-#endif
-
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d647..1db9759 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -49,15 +49,14 @@ int move_freepages(struct zone *zone,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype, bool skip_hwpoisoned_pages);
+				bool skip_hwpoisoned_pages);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
 int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			unsigned migratetype);
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 6137719..ac6db88 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -341,14 +341,6 @@ static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
 #endif		/* CONFIG_SMP */
 
-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
-					     int migratetype)
-{
-	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
-	if (is_migrate_cma(migratetype))
-		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
-}
-
 extern const char * const vmstat_text[];
 
 #endif /* _LINUX_VMSTAT_H */
diff --git a/mm/cma.c b/mm/cma.c
index d69bdf7..c1bae7f 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -450,7 +450,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align)
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+		ret = alloc_contig_range(pfn, pfn + count);
 		mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
diff --git a/mm/compaction.c b/mm/compaction.c
index 4532905..9c5da79 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -90,7 +90,7 @@ static void map_pages(struct list_head *list)
 
 static inline bool migrate_async_suitable(int migratetype)
 {
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+	return migratetype == MIGRATE_MOVABLE;
 }
 
 #ifdef CONFIG_COMPACTION
@@ -1010,7 +1010,7 @@ static bool suitable_migration_target(struct page *page)
 			return false;
 	}
 
-	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
+	/* If the block is MIGRATE_MOVABLE, allow migration */
 	if (migrate_async_suitable(get_pageblock_migratetype(page)))
 		return true;
 
@@ -1331,12 +1331,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 		if (!list_empty(&area->free_list[migratetype]))
 			return COMPACT_SUCCESS;
 
-#ifdef CONFIG_CMA
-		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
-		if (migratetype == MIGRATE_MOVABLE &&
-			!list_empty(&area->free_list[MIGRATE_CMA]))
-			return COMPACT_SUCCESS;
-#endif
 		/*
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..6735ad5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1051,7 +1051,7 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
 				unsigned long nr_pages)
 {
 	unsigned long end_pfn = start_pfn + nr_pages;
-	return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	return alloc_contig_range(start_pfn, end_pfn);
 }
 
 static bool pfn_range_valid_gigantic(struct zone *z,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6747dfe..0e438b1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1889,8 +1889,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		return -EINVAL;
 
 	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn,
-				       MIGRATE_MOVABLE, true);
+	ret = start_isolate_page_range(start_pfn, end_pfn, true);
 	if (ret)
 		return ret;
 
@@ -1958,7 +1957,7 @@ repeat:
 	   We cannot do rollback at this point. */
 	offline_isolated_pages(start_pfn, end_pfn);
 	/* reset pagetype flags and makes migrate type to be MOVABLE */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn);
 	/* removal success */
 	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
 	zone->present_pages -= offlined_pages;
@@ -1995,7 +1994,7 @@ failed_removal:
 		 ((unsigned long long) end_pfn << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn);
 	return ret;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 16ba1fe..ca17de9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -131,8 +131,8 @@ gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
  * put on a pcplist. Used to avoid the pageblock migratetype lookup when
  * freeing from pcplists in most cases, at the cost of possibly becoming stale.
  * Also the migratetype set in the page does not necessarily match the pcplist
- * index, e.g. page might have MIGRATE_CMA set but be on a pcplist with any
- * other index - this ensures that it will be put on the correct CMA freelist.
+ * index, e.g. page might have MIGRATE_MOVABLE set but be on a pcplist with any
+ * other index - this ensures that it will be put on the correct freelist.
  */
 static inline int get_pcppage_migratetype(struct page *page)
 {
@@ -242,9 +242,6 @@ char * const migratetype_names[MIGRATE_TYPES] = {
 	"Movable",
 	"Reclaimable",
 	"HighAtomic",
-#ifdef CONFIG_CMA
-	"CMA",
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	"Isolate",
 #endif
@@ -676,7 +673,7 @@ static inline bool set_page_guard(struct zone *zone, struct page *page,
 	INIT_LIST_HEAD(&page->lru);
 	set_page_private(page, order);
 	/* Guard pages are not available for any usage */
-	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
+	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
 
 	return true;
 }
@@ -697,7 +694,7 @@ static inline void clear_page_guard(struct zone *zone, struct page *page,
 
 	set_page_private(page, 0);
 	if (!is_migrate_isolate(migratetype))
-		__mod_zone_freepage_state(zone, (1 << order), migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, (1 << order));
 }
 #else
 struct page_ext_operations debug_guardpage_ops;
@@ -808,7 +805,7 @@ static inline void __free_one_page(struct page *page,
 
 	VM_BUG_ON(migratetype == -1);
 	if (likely(!is_migrate_isolate(migratetype)))
-		__mod_zone_freepage_state(zone, 1 << order, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
 
 	page_idx = pfn & ((1 << MAX_ORDER) - 1);
 
@@ -1618,7 +1615,7 @@ static void __init adjust_present_page_count(struct page *page, long count)
 	zone->present_pages += count;
 }
 
-/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
+/* Free whole pageblock and set its migration type to MIGRATE_MOVABLE. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
 	unsigned i = pageblock_nr_pages;
@@ -1643,7 +1640,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 
 	adjust_present_page_count(page, pageblock_nr_pages);
 
-	set_pageblock_migratetype(page, MIGRATE_CMA);
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 
 	if (pageblock_order >= MAX_ORDER) {
 		i = pageblock_nr_pages;
@@ -1870,25 +1867,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
-#ifdef CONFIG_CMA
-	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	[MIGRATE_ISOLATE]     = { MIGRATE_TYPES }, /* Never used */
 #endif
 };
 
-#ifdef CONFIG_CMA
-static struct page *__rmqueue_cma_fallback(struct zone *zone,
-					unsigned int order)
-{
-	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
-}
-#else
-static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
-					unsigned int order) { return NULL; }
-#endif
-
 /*
  * Move the free pages in a range to the free lists of the requested type.
  * Note that start_page and end_pages are not aligned on a pageblock
@@ -2093,7 +2076,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	/* Yoink! */
 	mt = get_pageblock_migratetype(page);
 	if (mt != MIGRATE_HIGHATOMIC &&
-			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
+			!is_migrate_isolate(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
 		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
@@ -2196,9 +2179,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 		/*
 		 * The pcppage_migratetype may differ from pageblock's
 		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
+		 * find_suitable_fallback(). This is OK.
 		 */
 		set_pcppage_migratetype(page, start_migratetype);
 
@@ -2221,13 +2202,8 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 	struct page *page;
 
 	page = __rmqueue_smallest(zone, order, migratetype);
-	if (unlikely(!page)) {
-		if (migratetype == MIGRATE_MOVABLE)
-			page = __rmqueue_cma_fallback(zone, order);
-
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype);
-	}
+	if (unlikely(!page))
+		page = __rmqueue_fallback(zone, order, migratetype);
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
 	return page;
@@ -2267,9 +2243,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		else
 			list_add_tail(&page->lru, list);
 		list = &page->lru;
-		if (is_migrate_cma(get_pcppage_migratetype(page)))
-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-					      -(1 << order));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock(&zone->lock);
@@ -2568,7 +2541,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
 			return 0;
 
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
 	}
 
 	/* Remove page from free list */
@@ -2584,7 +2557,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
+			if (!is_migrate_isolate(mt))
 				set_pageblock_migratetype(page,
 							  MIGRATE_MOVABLE);
 		}
@@ -2684,8 +2657,8 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
-		__mod_zone_freepage_state(zone, -(1 << order),
-					  get_pcppage_migratetype(page));
+
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
 	}
 
 	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -2835,11 +2808,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			if (!list_empty(&area->free_list[mt]))
 				return true;
 		}
-
-#ifdef CONFIG_CMA
-		if (!list_empty(&area->free_list[MIGRATE_CMA]))
-			return true;
-#endif
 	}
 	return false;
 }
@@ -4173,9 +4141,6 @@ static void show_migration_types(unsigned char type)
 		[MIGRATE_MOVABLE]	= 'M',
 		[MIGRATE_RECLAIMABLE]	= 'E',
 		[MIGRATE_HIGHATOMIC]	= 'H',
-#ifdef CONFIG_CMA
-		[MIGRATE_CMA]		= 'C',
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 		[MIGRATE_ISOLATE]	= 'I',
 #endif
@@ -7130,7 +7095,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		return false;
 
 	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
+	if (mt == MIGRATE_MOVABLE)
 		return false;
 
 	pfn = page_to_pfn(page);
@@ -7278,15 +7243,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
- * @migratetype:	migratetype of the underlaying pageblocks (either
- *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
- *			in range must have the same migratetype and it must
- *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, however it's the caller's responsibility to guarantee that
  * we are the only thread that changes migrate type of pageblocks the
- * pages fall in.
+ * pages fall in and it should be MIGRATE_MOVABLE.
  *
  * The PFN range must belong to a single zone.
  *
@@ -7294,8 +7255,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  * pages which PFN is in [start, end) are allocated for the caller and
  * need to be freed with free_contig_range().
  */
-int alloc_contig_range(unsigned long start, unsigned long end,
-		       unsigned migratetype)
+int alloc_contig_range(unsigned long start, unsigned long end)
 {
 	unsigned long outer_start, outer_end;
 	unsigned int order;
@@ -7328,15 +7288,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * allocator removing them from the buddy system.  This way
 	 * page allocator will never consider using them.
 	 *
-	 * This lets us mark the pageblocks back as
-	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
-	 * aligned range but not in the unaligned, original range are
-	 * put back to page allocator so that buddy can use them.
+	 * This lets us mark the pageblocks back as MIGRATE_MOVABLE
+	 * so that free pages in the aligned range but not in the
+	 * unaligned, original range are put back to page allocator
+	 * so that buddy can use them.
 	 */
 
 	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype,
-				       false);
+				       pfn_max_align_up(end), false);
 	if (ret)
 		return ret;
 
@@ -7414,7 +7373,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 done:
 	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+				pfn_max_align_up(end));
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 064b7fb..e7933ff 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -62,13 +62,12 @@ static int set_migratetype_isolate(struct page *page,
 out:
 	if (!ret) {
 		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -nr_pages);
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -121,7 +120,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 */
 	if (!isolated_page) {
 		nr_pages = move_freepages_block(zone, page, migratetype);
-		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
 	}
 	set_pageblock_migratetype(page, migratetype);
 	zone->nr_isolate_pageblock--;
@@ -150,7 +149,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
- * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -160,7 +158,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     unsigned migratetype, bool skip_hwpoisoned_pages)
+				bool skip_hwpoisoned_pages)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -184,7 +182,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
+		unset_migratetype_isolate(pfn_to_page(pfn), MIGRATE_MOVABLE);
 
 	return -EBUSY;
 }
@@ -192,8 +190,7 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			    unsigned migratetype)
+int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -207,7 +204,7 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page, migratetype);
+		unset_migratetype_isolate(page, MIGRATE_MOVABLE);
 	}
 	return 0;
 }
diff --git a/mm/page_owner.c b/mm/page_owner.c
index c3cee24..4016815 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -299,11 +299,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 			page_mt = gfpflags_to_migratetype(
 					page_owner->gfp_mask);
 			if (pageblock_mt != page_mt) {
-				if (is_migrate_cma(pageblock_mt))
-					count[MIGRATE_MOVABLE]++;
-				else
-					count[pageblock_mt]++;
-
+				count[pageblock_mt]++;
 				pfn = block_end_pfn;
 				break;
 			}
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a3cc305..f7682d9 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -197,7 +197,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 	 * several independently allocated pages.
 	 */
 	is_reserved = PageReserved(page);
-	is_cma = is_migrate_cma_page(page);
+	is_cma = is_zone_cma(page_zone(page));
 	if (!is_reserved && !is_cma)
 		goto reject;
 
@@ -205,7 +205,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 		page = virt_to_head_page(ptr);
 		if (is_reserved && !PageReserved(page))
 			goto reject;
-		if (is_cma && !is_migrate_cma_page(page))
+		if (is_cma && !is_zone_cma(page_zone(page)))
 			goto reject;
 	}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v5 5/6] mm/cma: remove MIGRATE_CMA
@ 2016-08-29  5:07   ` js1304
  0 siblings, 0 replies; 54+ messages in thread
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now, all reserved pages for the CMA region belong to ZONE_CMA, and
that zone holds no other type of page. Therefore, we no longer need
MIGRATE_CMA to distinguish CMA pages from ordinary pages and handle
them differently. Remove MIGRATE_CMA.

Unfortunately, this patch makes the free CMA counter incorrect, because
free CMA pages are counted only while they are on the MIGRATE_CMA free
list. The next patch fixes this. I could squash the next patch into
this one, but that would make the change complicated and hard to
review, so I keep them separate.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/gfp.h            |  3 +-
 include/linux/mmzone.h         | 24 ------------
 include/linux/page-isolation.h |  5 +--
 include/linux/vmstat.h         |  8 ----
 mm/cma.c                       |  2 +-
 mm/compaction.c                | 10 +----
 mm/hugetlb.c                   |  2 +-
 mm/memory_hotplug.c            |  7 ++--
 mm/page_alloc.c                | 89 ++++++++++++------------------------------
 mm/page_isolation.c            | 15 +++----
 mm/page_owner.c                |  6 +--
 mm/usercopy.c                  |  4 +-
 12 files changed, 43 insertions(+), 132 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b86e0c2..815d756 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -553,8 +553,7 @@ static inline bool pm_suspended_storage(void)
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      unsigned migratetype);
+extern int alloc_contig_range(unsigned long start, unsigned long end);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 #endif
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 87b344e..24e46ca 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,22 +41,6 @@ enum {
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
-#ifdef CONFIG_CMA
-	/*
-	 * MIGRATE_CMA migration type is designed to mimic the way
-	 * ZONE_MOVABLE works.  Only movable pages can be allocated
-	 * from MIGRATE_CMA pageblocks and page allocator never
-	 * implicitly change migration type of MIGRATE_CMA pageblock.
-	 *
-	 * The way to use it is to change migratetype of a range of
-	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function.  What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
-	 * a single pageblock.
-	 */
-	MIGRATE_CMA,
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	MIGRATE_ISOLATE,	/* can't allocate from here */
 #endif
@@ -66,14 +50,6 @@ enum {
 /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
 extern char * const migratetype_names[MIGRATE_TYPES];
 
-#ifdef CONFIG_CMA
-#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-#  define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_CMA)
-#else
-#  define is_migrate_cma(migratetype) false
-#  define is_migrate_cma_page(_page) false
-#endif
-
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d647..1db9759 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -49,15 +49,14 @@ int move_freepages(struct zone *zone,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype, bool skip_hwpoisoned_pages);
+				bool skip_hwpoisoned_pages);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
 int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			unsigned migratetype);
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 6137719..ac6db88 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -341,14 +341,6 @@ static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
 #endif		/* CONFIG_SMP */
 
-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
-					     int migratetype)
-{
-	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
-	if (is_migrate_cma(migratetype))
-		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
-}
-
 extern const char * const vmstat_text[];
 
 #endif /* _LINUX_VMSTAT_H */
diff --git a/mm/cma.c b/mm/cma.c
index d69bdf7..c1bae7f 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -450,7 +450,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align)
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+		ret = alloc_contig_range(pfn, pfn + count);
 		mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
diff --git a/mm/compaction.c b/mm/compaction.c
index 4532905..9c5da79 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -90,7 +90,7 @@ static void map_pages(struct list_head *list)
 
 static inline bool migrate_async_suitable(int migratetype)
 {
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+	return migratetype == MIGRATE_MOVABLE;
 }
 
 #ifdef CONFIG_COMPACTION
@@ -1010,7 +1010,7 @@ static bool suitable_migration_target(struct page *page)
 			return false;
 	}
 
-	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
+	/* If the block is MIGRATE_MOVABLE, allow migration */
 	if (migrate_async_suitable(get_pageblock_migratetype(page)))
 		return true;
 
@@ -1331,12 +1331,6 @@ static enum compact_result __compact_finished(struct zone *zone, struct compact_
 		if (!list_empty(&area->free_list[migratetype]))
 			return COMPACT_SUCCESS;
 
-#ifdef CONFIG_CMA
-		/* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
-		if (migratetype == MIGRATE_MOVABLE &&
-			!list_empty(&area->free_list[MIGRATE_CMA]))
-			return COMPACT_SUCCESS;
-#endif
 		/*
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..6735ad5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1051,7 +1051,7 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
 				unsigned long nr_pages)
 {
 	unsigned long end_pfn = start_pfn + nr_pages;
-	return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	return alloc_contig_range(start_pfn, end_pfn);
 }
 
 static bool pfn_range_valid_gigantic(struct zone *z,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6747dfe..0e438b1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1889,8 +1889,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		return -EINVAL;
 
 	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn,
-				       MIGRATE_MOVABLE, true);
+	ret = start_isolate_page_range(start_pfn, end_pfn, true);
 	if (ret)
 		return ret;
 
@@ -1958,7 +1957,7 @@ repeat:
 	   We cannot do rollback at this point. */
 	offline_isolated_pages(start_pfn, end_pfn);
 	/* reset pagetype flags and makes migrate type to be MOVABLE */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn);
 	/* removal success */
 	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
 	zone->present_pages -= offlined_pages;
@@ -1995,7 +1994,7 @@ failed_removal:
 		 ((unsigned long long) end_pfn << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn);
 	return ret;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 16ba1fe..ca17de9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -131,8 +131,8 @@ gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
  * put on a pcplist. Used to avoid the pageblock migratetype lookup when
  * freeing from pcplists in most cases, at the cost of possibly becoming stale.
  * Also the migratetype set in the page does not necessarily match the pcplist
- * index, e.g. page might have MIGRATE_CMA set but be on a pcplist with any
- * other index - this ensures that it will be put on the correct CMA freelist.
+ * index, e.g. page might have MIGRATE_MOVABLE set but be on a pcplist with any
+ * other index - this ensures that it will be put on the correct freelist.
  */
 static inline int get_pcppage_migratetype(struct page *page)
 {
@@ -242,9 +242,6 @@ char * const migratetype_names[MIGRATE_TYPES] = {
 	"Movable",
 	"Reclaimable",
 	"HighAtomic",
-#ifdef CONFIG_CMA
-	"CMA",
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	"Isolate",
 #endif
@@ -676,7 +673,7 @@ static inline bool set_page_guard(struct zone *zone, struct page *page,
 	INIT_LIST_HEAD(&page->lru);
 	set_page_private(page, order);
 	/* Guard pages are not available for any usage */
-	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
+	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
 
 	return true;
 }
@@ -697,7 +694,7 @@ static inline void clear_page_guard(struct zone *zone, struct page *page,
 
 	set_page_private(page, 0);
 	if (!is_migrate_isolate(migratetype))
-		__mod_zone_freepage_state(zone, (1 << order), migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, (1 << order));
 }
 #else
 struct page_ext_operations debug_guardpage_ops;
@@ -808,7 +805,7 @@ static inline void __free_one_page(struct page *page,
 
 	VM_BUG_ON(migratetype == -1);
 	if (likely(!is_migrate_isolate(migratetype)))
-		__mod_zone_freepage_state(zone, 1 << order, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
 
 	page_idx = pfn & ((1 << MAX_ORDER) - 1);
 
@@ -1618,7 +1615,7 @@ static void __init adjust_present_page_count(struct page *page, long count)
 	zone->present_pages += count;
 }
 
-/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
+/* Free whole pageblock and set its migration type to MIGRATE_MOVABLE. */
 void __init init_cma_reserved_pageblock(struct page *page)
 {
 	unsigned i = pageblock_nr_pages;
@@ -1643,7 +1640,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 
 	adjust_present_page_count(page, pageblock_nr_pages);
 
-	set_pageblock_migratetype(page, MIGRATE_CMA);
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 
 	if (pageblock_order >= MAX_ORDER) {
 		i = pageblock_nr_pages;
@@ -1870,25 +1867,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
-#ifdef CONFIG_CMA
-	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	[MIGRATE_ISOLATE]     = { MIGRATE_TYPES }, /* Never used */
 #endif
 };
 
-#ifdef CONFIG_CMA
-static struct page *__rmqueue_cma_fallback(struct zone *zone,
-					unsigned int order)
-{
-	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
-}
-#else
-static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
-					unsigned int order) { return NULL; }
-#endif
-
 /*
  * Move the free pages in a range to the free lists of the requested type.
  * Note that start_page and end_pages are not aligned on a pageblock
@@ -2093,7 +2076,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	/* Yoink! */
 	mt = get_pageblock_migratetype(page);
 	if (mt != MIGRATE_HIGHATOMIC &&
-			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
+			!is_migrate_isolate(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
 		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
@@ -2196,9 +2179,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
 		/*
 		 * The pcppage_migratetype may differ from pageblock's
 		 * migratetype depending on the decisions in
-		 * find_suitable_fallback(). This is OK as long as it does not
-		 * differ for MIGRATE_CMA pageblocks. Those can be used as
-		 * fallback only via special __rmqueue_cma_fallback() function
+		 * find_suitable_fallback(). This is OK.
 		 */
 		set_pcppage_migratetype(page, start_migratetype);
 
@@ -2221,13 +2202,8 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 	struct page *page;
 
 	page = __rmqueue_smallest(zone, order, migratetype);
-	if (unlikely(!page)) {
-		if (migratetype == MIGRATE_MOVABLE)
-			page = __rmqueue_cma_fallback(zone, order);
-
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype);
-	}
+	if (unlikely(!page))
+		page = __rmqueue_fallback(zone, order, migratetype);
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
 	return page;
@@ -2267,9 +2243,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		else
 			list_add_tail(&page->lru, list);
 		list = &page->lru;
-		if (is_migrate_cma(get_pcppage_migratetype(page)))
-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-					      -(1 << order));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock(&zone->lock);
@@ -2568,7 +2541,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
 			return 0;
 
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
 	}
 
 	/* Remove page from free list */
@@ -2584,7 +2557,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt))
+			if (!is_migrate_isolate(mt))
 				set_pageblock_migratetype(page,
 							  MIGRATE_MOVABLE);
 		}
@@ -2684,8 +2657,8 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
-		__mod_zone_freepage_state(zone, -(1 << order),
-					  get_pcppage_migratetype(page));
+
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
 	}
 
 	__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
@@ -2835,11 +2808,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 			if (!list_empty(&area->free_list[mt]))
 				return true;
 		}
-
-#ifdef CONFIG_CMA
-		if (!list_empty(&area->free_list[MIGRATE_CMA]))
-			return true;
-#endif
 	}
 	return false;
 }
@@ -4173,9 +4141,6 @@ static void show_migration_types(unsigned char type)
 		[MIGRATE_MOVABLE]	= 'M',
 		[MIGRATE_RECLAIMABLE]	= 'E',
 		[MIGRATE_HIGHATOMIC]	= 'H',
-#ifdef CONFIG_CMA
-		[MIGRATE_CMA]		= 'C',
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 		[MIGRATE_ISOLATE]	= 'I',
 #endif
@@ -7130,7 +7095,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		return false;
 
 	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
+	if (mt == MIGRATE_MOVABLE)
 		return false;
 
 	pfn = page_to_pfn(page);
@@ -7278,15 +7243,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
- * @migratetype:	migratetype of the underlaying pageblocks (either
- *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
- *			in range must have the same migratetype and it must
- *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, however it's the caller's responsibility to guarantee that
  * we are the only thread that changes migrate type of pageblocks the
- * pages fall in.
+ * pages fall in and it should be MIGRATE_MOVABLE.
  *
  * The PFN range must belong to a single zone.
  *
@@ -7294,8 +7255,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  * pages which PFN is in [start, end) are allocated for the caller and
  * need to be freed with free_contig_range().
  */
-int alloc_contig_range(unsigned long start, unsigned long end,
-		       unsigned migratetype)
+int alloc_contig_range(unsigned long start, unsigned long end)
 {
 	unsigned long outer_start, outer_end;
 	unsigned int order;
@@ -7328,15 +7288,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	 * allocator removing them from the buddy system.  This way
 	 * page allocator will never consider using them.
 	 *
-	 * This lets us mark the pageblocks back as
-	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
-	 * aligned range but not in the unaligned, original range are
-	 * put back to page allocator so that buddy can use them.
+	 * This lets us mark the pageblocks back as MIGRATE_MOVABLE
+	 * so that free pages in the aligned range but not in the
+	 * unaligned, original range are put back to page allocator
+	 * so that buddy can use them.
 	 */
 
 	ret = start_isolate_page_range(pfn_max_align_down(start),
-				       pfn_max_align_up(end), migratetype,
-				       false);
+				       pfn_max_align_up(end), false);
 	if (ret)
 		return ret;
 
@@ -7414,7 +7373,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 done:
 	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+				pfn_max_align_up(end));
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 064b7fb..e7933ff 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -62,13 +62,12 @@ static int set_migratetype_isolate(struct page *page,
 out:
 	if (!ret) {
 		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, -nr_pages);
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -121,7 +120,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	 */
 	if (!isolated_page) {
 		nr_pages = move_freepages_block(zone, page, migratetype);
-		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+		__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
 	}
 	set_pageblock_migratetype(page, migratetype);
 	zone->nr_isolate_pageblock--;
@@ -150,7 +149,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
- * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -160,7 +158,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     unsigned migratetype, bool skip_hwpoisoned_pages)
+				bool skip_hwpoisoned_pages)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -184,7 +182,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
+		unset_migratetype_isolate(pfn_to_page(pfn), MIGRATE_MOVABLE);
 
 	return -EBUSY;
 }
@@ -192,8 +190,7 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			    unsigned migratetype)
+int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -207,7 +204,7 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page, migratetype);
+		unset_migratetype_isolate(page, MIGRATE_MOVABLE);
 	}
 	return 0;
 }
diff --git a/mm/page_owner.c b/mm/page_owner.c
index c3cee24..4016815 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -299,11 +299,7 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 			page_mt = gfpflags_to_migratetype(
 					page_owner->gfp_mask);
 			if (pageblock_mt != page_mt) {
-				if (is_migrate_cma(pageblock_mt))
-					count[MIGRATE_MOVABLE]++;
-				else
-					count[pageblock_mt]++;
-
+				count[pageblock_mt]++;
 				pfn = block_end_pfn;
 				break;
 			}
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a3cc305..f7682d9 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -197,7 +197,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 	 * several independently allocated pages.
 	 */
 	is_reserved = PageReserved(page);
-	is_cma = is_migrate_cma_page(page);
+	is_cma = is_zone_cma(page_zone(page));
 	if (!is_reserved && !is_cma)
 		goto reject;
 
@@ -205,7 +205,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
 		page = virt_to_head_page(ptr);
 		if (is_reserved && !PageReserved(page))
 			goto reject;
-		if (is_cma && !is_migrate_cma_page(page))
+		if (is_cma && !is_zone_cma(page_zone(page)))
 			goto reject;
 	}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v5 6/6] mm/cma: remove per zone CMA stat
  2016-08-29  5:07 ` js1304
@ 2016-08-29  5:07   ` js1304
  -1 siblings, 0 replies; 54+ messages in thread
From: js1304 @ 2016-08-29  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Vlastimil Babka, linux-mm, linux-kernel,
	Joonsoo Kim

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Now that all reserved pages for the CMA region belong to ZONE_CMA,
we don't need to maintain a CMA stat in the other zones. Remove it.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
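The idea behind cma_get_free() can be tried out in userspace with the
sketch below. It assumes a hypothetical struct zone_model whose fields
stand in for populated_zone(), is_zone_cma() and
zone_page_state(zone, NR_FREE_PAGES); none of these names are kernel API.
It shows only the summation and leaves out how the kernel walks its zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few zone properties the sum depends on. */
struct zone_model {
	bool populated;              /* models populated_zone(zone) */
	bool is_cma;                 /* models is_zone_cma(zone) */
	unsigned long nr_free_pages; /* models the zone's NR_FREE_PAGES */
};

/* Sum the free pages of every populated CMA zone, skipping all others. */
static unsigned long cma_get_free_model(const struct zone_model *zones,
					size_t nr_zones)
{
	unsigned long freecma = 0;

	assert(zones != NULL || nr_zones == 0);
	for (size_t i = 0; i < nr_zones; i++) {
		if (!zones[i].populated || !zones[i].is_cma)
			continue;
		freecma += zones[i].nr_free_pages;
	}
	return freecma;
}
```

Because the free CMA total is derived from the ordinary per-zone free
counter, no dedicated NR_FREE_CMA_PAGES item has to be kept in sync on the
allocation and free paths.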
 fs/proc/meminfo.c      |  2 +-
 include/linux/cma.h    |  6 ++++++
 include/linux/mmzone.h |  1 -
 mm/cma.c               | 15 +++++++++++++++
 mm/page_alloc.c        |  7 +++----
 mm/vmstat.c            |  1 -
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8a42849..0ca6f38 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -151,7 +151,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #ifdef CONFIG_CMA
 	show_val_kb(m, "CmaTotal:       ", totalcma_pages);
 	show_val_kb(m, "CmaFree:        ",
-		    global_page_state(NR_FREE_CMA_PAGES));
+		    cma_get_free());
 #endif
 
 	hugetlb_report_meminfo(m);
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..816290c 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,10 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
+
+#ifdef CONFIG_CMA
+extern unsigned long cma_get_free(void);
+#else
+static inline unsigned long cma_get_free(void) { return 0; }
+#endif
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 24e46ca..8bc2611 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -113,7 +113,6 @@ enum zone_stat_item {
 	NUMA_LOCAL,		/* allocation from local node */
 	NUMA_OTHER,		/* allocation from other node */
 #endif
-	NR_FREE_CMA_PAGES,
 	NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
diff --git a/mm/cma.c b/mm/cma.c
index c1bae7f..981633b 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,6 +54,21 @@ unsigned long cma_get_size(const struct cma *cma)
 	return cma->count << PAGE_SHIFT;
 }
 
+unsigned long cma_get_free(void)
+{
+	struct zone *zone;
+	unsigned long freecma = 0;
+
+	for_each_populated_zone(zone) {
+		if (!is_zone_cma(zone))
+			continue;
+
+		freecma += zone_page_state(zone, NR_FREE_PAGES);
+	}
+
+	return freecma;
+}
+
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
 					     int align_order)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca17de9..587d542 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -65,6 +65,7 @@
 #include <linux/kthread.h>
 #include <linux/memcontrol.h>
 #include <linux/random.h>
+#include <linux/cma.h>
 
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
@@ -4206,7 +4207,7 @@ void show_free_areas(unsigned int filter)
 		global_page_state(NR_BOUNCE),
 		global_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_page_state(NR_FREE_CMA_PAGES));
+		cma_get_free());
 
 	for_each_online_pgdat(pgdat) {
 		printk("Node %d"
@@ -4287,7 +4288,6 @@ void show_free_areas(unsigned int filter)
 			" bounce:%lukB"
 			" free_pcp:%lukB"
 			" local_pcp:%ukB"
-			" free_cma:%lukB"
 			"\n",
 			zone->name,
 			K(zone_page_state(zone, NR_FREE_PAGES)),
@@ -4309,8 +4309,7 @@ void show_free_areas(unsigned int filter)
 			K(zone_page_state(zone, NR_PAGETABLE)),
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
-			K(this_cpu_read(zone->pageset->pcp.count)),
-			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
+			K(this_cpu_read(zone->pageset->pcp.count)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
 			printk(" %ld", zone->lowmem_reserve[i]);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index ce5838b..93dfd9d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -951,7 +951,6 @@ const char * const vmstat_text[] = {
 	"numa_local",
 	"numa_other",
 #endif
-	"nr_free_cma",
 
 	/* Node-based counters */
 	"nr_inactive_anon",
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-08-29  5:07 ` js1304
@ 2016-08-29  9:27   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 54+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-29  9:27 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, linux-mm, linux-kernel, Joonsoo Kim

js1304@gmail.com writes:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Hello,
>
> Changes from v4
> o Rebase on next-20160825
> o Add general fix patch for lowmem reserve
> o Fix lowmem reserve ratio
> o Fix zone span optimizaion per Vlastimil
> o Fix pageset initialization
> o Change invocation timing on cma_init_reserved_areas()

I don't see much information about how we interleave between
ZONE_CMA and other zones for movable allocations. Is that explained in
any of the patches? The fair zone allocator was removed by commit
e6cbd7f2efb433d717af72aa8510a9db6f7a7e05.

-aneesh

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-08-29  9:27   ` Aneesh Kumar K.V
@ 2016-08-30  8:21     ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-08-30  8:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, Linux Memory Management List, LKML, Joonsoo Kim

2016-08-29 18:27 GMT+09:00 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>:
> js1304@gmail.com writes:
>
>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>
>> Hello,
>>
>> Changes from v4
>> o Rebase on next-20160825
>> o Add general fix patch for lowmem reserve
>> o Fix lowmem reserve ratio
>> o Fix zone span optimizaion per Vlastimil
>> o Fix pageset initialization
>> o Change invocation timing on cma_init_reserved_areas()
>
> I don't see much information regarding how we interleave between
> ZONE_CMA and other zones for movable allocation. Is that explained in
> any of the patch ? The fair zone allocator got removed by
> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05

Interleaving would not work since the fair zone allocator policy has been
removed. I don't think that it's a big problem because it is just a matter
of timing in filling up the memory. Eventually, memory on ZONE_CMA will be
fully used in any case.

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
  2016-08-29  5:07   ` js1304
@ 2016-08-30 10:35     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 54+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-30 10:35 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, linux-mm, linux-kernel, Joonsoo Kim


....

>  static inline void check_highest_zone(enum zone_type k)
>  {
> -	if (k > policy_zone && k != ZONE_MOVABLE)
> +	if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
>  		policy_zone = k;
>  }
>


Should we apply policy to allocations from ZONE_CMA? The CMA reserve
happens early and may mostly come from one node. Do we want the
CMA allocation to fail if we use mbind(MPOL_BIND) with a node mask that
does not include the node on which CMA is reserved, considering that CMA
memory is going to be used for a special purpose?

-aneesh

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-08-30  8:21     ` Joonsoo Kim
@ 2016-08-30 10:39       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 54+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-30 10:39 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, Linux Memory Management List, LKML, Joonsoo Kim

Joonsoo Kim <js1304@gmail.com> writes:

> 2016-08-29 18:27 GMT+09:00 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>:
>> js1304@gmail.com writes:
>>
>>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>
>>> Hello,
>>>
>>> Changes from v4
>>> o Rebase on next-20160825
>>> o Add general fix patch for lowmem reserve
>>> o Fix lowmem reserve ratio
>>> o Fix zone span optimizaion per Vlastimil
>>> o Fix pageset initialization
>>> o Change invocation timing on cma_init_reserved_areas()
>>
>> I don't see much information regarding how we interleave between
>> ZONE_CMA and other zones for movable allocation. Is that explained in
>> any of the patch ? The fair zone allocator got removed by
>> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
>
> Interleaving would not work since the fair zone allocator policy is removed.
> I don't think that it's a big problem because it is just matter of
> timing to fill
> up the memory. Eventually, memory on ZONE_CMA will be fully used in
> any case.

Does that mean a CMA allocation will now be slower because in most cases we
will need to reclaim? The zonelist will now have ZONE_CMA at the
beginning, right?

-aneesh

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
  2016-08-30 10:35     ` Aneesh Kumar K.V
@ 2016-08-30 12:40       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 54+ messages in thread
From: Aneesh Kumar K.V @ 2016-08-30 12:40 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, linux-mm, linux-kernel, Joonsoo Kim

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> ....
>
>>  static inline void check_highest_zone(enum zone_type k)
>>  {
>> -	if (k > policy_zone && k != ZONE_MOVABLE)
>> +	if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
>>  		policy_zone = k;
>>  }
>>
>
>
> Should we apply policy to allocation from ZONE CMA ?. CMA reserve
> happens early and may mostly come from one node. Do we want the
> CMA allocation to fail if we use mbind(MPOL_BIND) with a node mask not
> including that node on which CMA is reserved, considering CMA memory is
> going to be used for special purpose.

Looking at this again, I guess CMA allocation is not going to depend on
memory policy, but this is for other movable allocations?

-aneesh

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
  2016-08-30 12:40       ` Aneesh Kumar K.V
@ 2016-08-31  7:58         ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-08-31  7:58 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, linux-mm, linux-kernel

On Tue, Aug 30, 2016 at 06:10:46PM +0530, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
> > ....
> >
> >>  static inline void check_highest_zone(enum zone_type k)
> >>  {
> >> -	if (k > policy_zone && k != ZONE_MOVABLE)
> >> +	if (k > policy_zone && k != ZONE_MOVABLE && !is_zone_cma_idx(k))
> >>  		policy_zone = k;
> >>  }
> >>
> >
> >
> > Should we apply policy to allocation from ZONE CMA ?. CMA reserve
> > happens early and may mostly come from one node. Do we want the
> > CMA allocation to fail if we use mbind(MPOL_BIND) with a node mask not
> > including that node on which CMA is reserved, considering CMA memory is
> > going to be used for special purpose.
> 
> Looking at this again, I guess CMA alloc is not going to depend on
> memory policy, but this is for other movable allocation ?

This is for usual file cache or anonymous page allocation. IIUC,
policy_zone is used to determine if mempolicy should be applied or not
and setting policy_zone to ZONE_CMA makes mempolicy less useful.
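
The effect described above can be sketched in user-space C. This is only
an illustration under stated assumptions: the zone enum, its ordering,
compute_policy_zone() and policy_applies() are hypothetical stand-ins,
and the ">= policy_zone" comparison is a simplification of how the
mempolicy code decides whether a request is subject to policy.

```c
#include <stdbool.h>

/* Hypothetical zone indices for this sketch; a real kernel derives the
 * set from its configuration. ZONE_CMA is assumed to be the highest. */
enum zone_type {
	ZONE_DMA,
	ZONE_NORMAL,
	ZONE_HIGHMEM,
	ZONE_MOVABLE,
	ZONE_CMA,
	MAX_NR_ZONES
};

/* Mirrors the loop effect of check_highest_zone() from the quoted hunk:
 * pick the highest zone that is neither ZONE_MOVABLE nor (when skip_cma
 * is set) ZONE_CMA. */
static enum zone_type compute_policy_zone(bool skip_cma)
{
	enum zone_type policy_zone = ZONE_DMA;

	for (int k = ZONE_DMA; k < MAX_NR_ZONES; k++) {
		if (k > (int)policy_zone && k != ZONE_MOVABLE &&
		    !(skip_cma && k == ZONE_CMA))
			policy_zone = (enum zone_type)k;
	}
	return policy_zone;
}

/* Simplified assumption: policy is honoured only for requests whose
 * highest usable zone is at or above policy_zone. */
static bool policy_applies(enum zone_type request_zone,
			   enum zone_type policy_zone)
{
	return request_zone >= policy_zone;
}
```

With the ZONE_CMA check in place, policy_zone stays at ZONE_HIGHMEM in
this sketch, so a highmem-capable request is still subject to policy.
Without it, policy_zone would become ZONE_CMA and the same request would
bypass the policy.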

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-08-30 10:39       ` Aneesh Kumar K.V
@ 2016-08-31  8:03         ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-08-31  8:03 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, Linux Memory Management List, LKML

On Tue, Aug 30, 2016 at 04:09:37PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <js1304@gmail.com> writes:
> 
> > 2016-08-29 18:27 GMT+09:00 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>:
> >> js1304@gmail.com writes:
> >>
> >>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >>>
> >>> Hello,
> >>>
> >>> Changes from v4
> >>> o Rebase on next-20160825
> >>> o Add general fix patch for lowmem reserve
> >>> o Fix lowmem reserve ratio
> >>> o Fix zone span optimizaion per Vlastimil
> >>> o Fix pageset initialization
> >>> o Change invocation timing on cma_init_reserved_areas()
> >>
> >> I don't see much information regarding how we interleave between
> >> ZONE_CMA and other zones for movable allocation. Is that explained in
> >> any of the patch ? The fair zone allocator got removed by
> >> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
> >
> > Interleaving would not work since the fair zone allocator policy is removed.
> > I don't think that it's a big problem because it is just matter of
> > timing to fill
> > up the memory. Eventually, memory on ZONE_CMA will be fully used in
> > any case.
> 
> Does that mean a CMA allocation will now be slower because in most case we
> will need to reclaim ? The zone list will now have ZONE_CMA in the
> beginning right ?

ZONE_CMA will be used first, but I don't think that CMA allocation will
be slower. In most cases, memory would be fully used (usually
by page cache), so we need reclaim or migration in any case.
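
A toy model of the "used first" behaviour, assuming a zonelist that is
walked from the first entry and ignoring watermarks, lowmem reserves and
per-cpu lists entirely. struct zone_sim and alloc_one() are hypothetical
names for this sketch and are not kernel interfaces.

```c
#include <stddef.h>

/* Minimal stand-in for a zone: a name and a free page counter. */
struct zone_sim {
	const char *name;
	unsigned long free_pages;
};

/* Take one page from the first zone in the list that still has free
 * pages. Returns the index of the zone used, or -1 if all are empty. */
static int alloc_one(struct zone_sim *zonelist, size_t nr_zones)
{
	for (size_t i = 0; i < nr_zones; i++) {
		if (zonelist[i].free_pages > 0) {
			zonelist[i].free_pages--;
			return (int)i;
		}
	}
	return -1;
}
```

With ZONE_CMA as the first entry, every movable allocation in this model
lands there until it is empty, and only then falls back to the next
zone. No interleaving takes place, which matches the point that the
zones simply fill up in order.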

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-08-31  8:03         ` Joonsoo Kim
@ 2016-09-01  5:47           ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 54+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-01  5:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, Linux Memory Management List, LKML

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> On Tue, Aug 30, 2016 at 04:09:37PM +0530, Aneesh Kumar K.V wrote:
>> Joonsoo Kim <js1304@gmail.com> writes:
>> 
>> > 2016-08-29 18:27 GMT+09:00 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>:
>> >> js1304@gmail.com writes:
>> >>
>> >>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >>>
>> >>> Hello,
>> >>>
>> >>> Changes from v4
>> >>> o Rebase on next-20160825
>> >>> o Add general fix patch for lowmem reserve
>> >>> o Fix lowmem reserve ratio
>> >>> o Fix zone span optimizaion per Vlastimil
>> >>> o Fix pageset initialization
>> >>> o Change invocation timing on cma_init_reserved_areas()
>> >>
>> >> I don't see much information regarding how we interleave between
>> >> ZONE_CMA and other zones for movable allocation. Is that explained in
>> >> any of the patch ? The fair zone allocator got removed by
>> >> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
>> >
>> > Interleaving would not work since the fair zone allocator policy is removed.
>> > I don't think that it's a big problem because it is just matter of
>> > timing to fill
>> > up the memory. Eventually, memory on ZONE_CMA will be fully used in
>> > any case.
>> 
>> Does that mean a CMA allocation will now be slower because in most case we
>> will need to reclaim ? The zone list will now have ZONE_CMA in the
>> beginning right ?
>
> ZONE_CMA will be used first but I don't think that CMA allocation will
> be slower. In most case, memory would be fully used (usually
> by page cache). So, we need reclaim or migration in any case.

Considering that the upstream kernel doesn't allow migration of THP
pages, this would mean that migration will fail in most cases if we have
THP enabled and the THP allocation request was satisfied via ZONE_CMA.
Isn't that going to be a problem?

-aneesh

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-09-01  5:47           ` Aneesh Kumar K.V
@ 2016-09-01  6:01             ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-09-01  6:01 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, Linux Memory Management List, LKML

On Thu, Sep 01, 2016 at 11:17:23AM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> 
> > On Tue, Aug 30, 2016 at 04:09:37PM +0530, Aneesh Kumar K.V wrote:
> >> Joonsoo Kim <js1304@gmail.com> writes:
> >> 
> >> > 2016-08-29 18:27 GMT+09:00 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>:
> >> >> js1304@gmail.com writes:
> >> >>
> >> >>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> >>>
> >> >>> Hello,
> >> >>>
> >> >>> Changes from v4
> >> >>> o Rebase on next-20160825
> >> >>> o Add general fix patch for lowmem reserve
> >> >>> o Fix lowmem reserve ratio
> >> >>> o Fix zone span optimizaion per Vlastimil
> >> >>> o Fix pageset initialization
> >> >>> o Change invocation timing on cma_init_reserved_areas()
> >> >>
> >> >> I don't see much information regarding how we interleave between
> >> >> ZONE_CMA and other zones for movable allocation. Is that explained in
> >> >> any of the patch ? The fair zone allocator got removed by
> >> >> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
> >> >
> >> > Interleaving would not work since the fair zone allocator policy is removed.
> >> > I don't think that it's a big problem because it is just matter of
> >> > timing to fill
> >> > up the memory. Eventually, memory on ZONE_CMA will be fully used in
> >> > any case.
> >> 
> >> Does that mean a CMA allocation will now be slower because in most case we
> >> will need to reclaim ? The zone list will now have ZONE_CMA in the
> >> beginning right ?
> >
> > ZONE_CMA will be used first but I don't think that CMA allocation will
> > be slower. In most case, memory would be fully used (usually
> > by page cache). So, we need reclaim or migration in any case.
> 
> Considering that the upstream kernel doesn't allow migration of THP
> pages, this would mean that migrate will fail in most case if we have
> THP enabled and the THP allocation request got satisfied via ZONE_CMA.
> Isn't that going to be a problem ?

I think that it is a separate problem. Once we restore utilization of
the CMA area, it would become a problem in any case. It is just hidden by
the utilization bug and should be handled separately. I guess that it's
not that hard to fix that problem.

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request
  2016-08-29  5:07   ` js1304
@ 2016-09-16  3:14     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 54+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-16  3:14 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, linux-mm, linux-kernel, Joonsoo Kim

js1304@gmail.com writes:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> important to reserve. When ZONE_MOVABLE is used, this problem would
> theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> allocation request which is mainly used for page cache and anon page
> allocation. So, fix it.
>
> And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> makes code complex. For example, if there is highmem system, following
> reserve ratio is activated for *NORMAL ZONE* which would be easyily
> misleading people.
>
>  #ifdef CONFIG_HIGHMEM
>  32
>  #endif
>
> This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> array by MAX_NR_ZONES and place "#ifdef" to right place.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/mmzone.h | 2 +-
>  mm/page_alloc.c        | 7 ++++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d572b78..e3f39af 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -877,7 +877,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
>  					void __user *, size_t *, loff_t *);
>  int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
>  					void __user *, size_t *, loff_t *);
> -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
>  int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
>  					void __user *, size_t *, loff_t *);
>  int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4f7d5d7..a8310de 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -198,17 +198,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
>   * TBD: should special case ZONE_DMA32 machines here - in those we normally
>   * don't need any ZONE_NORMAL reservation
>   */
> -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
>  #ifdef CONFIG_ZONE_DMA
>  	 256,
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
>  	 256,
>  #endif
> -#ifdef CONFIG_HIGHMEM
>  	 32,
> +#ifdef CONFIG_HIGHMEM
> +	 INT_MAX,
>  #endif
> -	 32,
> +	 INT_MAX,
>  };
>
>  EXPORT_SYMBOL(totalram_pages);
> -- 
> 1.9.1

Could we also do something like the below to make it more readable?

#ifdef CONFIG_ZONE_DMA
	[ZONE_DMA] = 256,
#endif
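
Spelled out for the whole array, the idea might look like the sketch
below. It is a stand-alone model, not a tested kernel change: the enum
only mimics the zone ordering, the two CONFIG_ defines are stand-ins for
Kconfig, and the _model names and ratio_for() are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Stand-ins for kernel config options; toggle them to experiment. */
#define CONFIG_ZONE_DMA 1
#define CONFIG_HIGHMEM 1

enum zone_type_model {
#ifdef CONFIG_ZONE_DMA
	ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
	ZONE_DMA32,
#endif
	ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
	ZONE_HIGHMEM,
#endif
	ZONE_MOVABLE,
	MAX_NR_ZONES
};

/* Each entry names the zone it belongs to, so the #ifdefs cannot shift it. */
static int lowmem_reserve_ratio_model[MAX_NR_ZONES] = {
#ifdef CONFIG_ZONE_DMA
	[ZONE_DMA] = 256,
#endif
#ifdef CONFIG_ZONE_DMA32
	[ZONE_DMA32] = 256,
#endif
	[ZONE_NORMAL] = 32,
#ifdef CONFIG_HIGHMEM
	[ZONE_HIGHMEM] = INT_MAX,
#endif
	[ZONE_MOVABLE] = INT_MAX,
};

/* Bounds-checked lookup of the ratio for a given zone index. */
static int ratio_for(int zone)
{
	assert(zone >= 0 && zone < MAX_NR_ZONES);
	return lowmem_reserve_ratio_model[zone];
}
```
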

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request
  2016-08-29  5:07   ` js1304
@ 2016-09-21  9:06     ` Vlastimil Babka
  -1 siblings, 0 replies; 54+ messages in thread
From: Vlastimil Babka @ 2016-09-21  9:06 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel, Joonsoo Kim

On 08/29/2016 07:07 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Free pages in ZONE_HIGHMEM can't be used for kernel memory, so reserving
> them is not that important. When ZONE_MOVABLE is used, this reserve would
> theoretically decrease the memory usable by GFP_HIGHUSER_MOVABLE
> allocation requests, which are mainly used for page cache and anonymous
> page allocation. So, fix it.
>
> Also, sizing the sysctl_lowmem_reserve_ratio array as MAX_NR_ZONES - 1
> makes the code confusing. For example, on a highmem system the following
> reserve ratio is applied to the *NORMAL ZONE*, which can easily
> mislead people.
>
>  #ifdef CONFIG_HIGHMEM
>  32
>  #endif
>
> This patch also fixes this by sizing the sysctl_lowmem_reserve_ratio
> array as MAX_NR_ZONES and putting the "#ifdef" in the right place.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
  2016-08-29  5:07   ` js1304
@ 2016-09-21  9:11     ` Vlastimil Babka
  -1 siblings, 0 replies; 54+ messages in thread
From: Vlastimil Babka @ 2016-09-21  9:11 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel, Joonsoo Kim

On 08/29/2016 07:07 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Attached cover-letter:
>
> This series tries to solve problems with the current CMA implementation.
>
> CMA was introduced to provide physically contiguous pages at runtime
> without an exclusively reserved memory area. But the current implementation
> works much like the previous reserved-memory approach, because free pages
> in the CMA region are used only if there is no movable free page. In other
> words, free pages in the CMA region are used only as a fallback. When
> they are used as a fallback, kswapd is woken up easily, since there are
> no unmovable or reclaimable free pages either. Once kswapd starts to
> reclaim memory, fallback allocation to MIGRATE_CMA no longer occurs,
> because movable free pages have already been refilled by kswapd, and most
> of the free pages in the CMA region are left free. This situation looks
> like the exclusively reserved memory case.
>
> In my experiment, I found that on a system with 1024 MB of memory, of which
> 512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
> of free memory is left. The reason is that, to keep enough free memory
> for unmovable and reclaimable allocations, kswapd uses the equation below
> when calculating free memory, and the result easily goes under the watermark.
>
> Free memory for unmovable and reclaimable = Free total - Free CMA pages
>
> This is derived from the property that CMA free pages can't be used
> for unmovable and reclaimable allocations.
>
> Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA)
> is lower than the low watermark and tries to free memory until
> (FreeTotal - FreeCMA) is higher than the high watermark. As a result,
> FreeTotal consistently hovers around the 512 MB boundary, which
> means that we can't utilize the full memory capacity.
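
The wake-up condition described above can be sketched in a few lines. This
is only a model of the arithmetic in the cover letter, not the kernel's
watermark code; the function names and the 16 MB low watermark used in the
note below are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Free memory that unmovable/reclaimable allocations can actually use. */
static unsigned long usable_free(unsigned long free_total,
				 unsigned long free_cma)
{
	/* Free CMA pages are part of the total, so this should not underflow. */
	return free_total > free_cma ? free_total - free_cma : 0;
}

/* kswapd is woken once the usable part drops below the low watermark. */
static bool kswapd_should_wake(unsigned long free_total,
			       unsigned long free_cma,
			       unsigned long low_wmark)
{
	return usable_free(free_total, free_cma) < low_wmark;
}
```

Taking all values in MB: with 520 MB free in total, 512 MB of it in the
CMA region and a low watermark of 16 MB, only 8 MB counts as usable and
kswapd_should_wake(520, 512, 16) is true, even though half of the
machine's memory is free. Without a CMA reserve the same 520 MB stays far
above the watermark.
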
>
> To fix this problem, I submitted some patches [1] about 10 months ago,
> but found some more problems that need to be fixed before solving this
> one. That approach requires many hooks in the allocator hotpath, so some
> developers don't like it. Instead, some of them suggested a different
> approach [2] to fix all the problems related to CMA, that is, introducing
> a new zone to deal with free CMA pages. I agree that it is the best way
> to go, so I implement it here. Although the properties of ZONE_MOVABLE and
> ZONE_CMA are similar, I decided to add a new zone rather than piggyback on
> ZONE_MOVABLE, since they have some differences. First, reserved CMA pages
> should not be offlined. If free pages for CMA were managed by ZONE_MOVABLE,
> we would need to keep the MIGRATE_CMA migratetype and insert many hooks
> into the memory hotplug code to distinguish hotpluggable memory from
> memory reserved for CMA in the same zone. That would make the memory
> hotplug code, which is already complicated, even more complicated. Second,
> cma_alloc() can be called more frequently than memory hotplug operations,
> and we may need to control the allocation rate of ZONE_CMA to optimize
> latency in the future. In that case, the separate-zone approach is easy
> to modify. Third, I'd like to see statistics for CMA separately. Sometimes
> we need to debug why cma_alloc() failed, and separate statistics would be
> more helpful in this situation.
>
> Anyway, this patchset solves four problems related to the CMA implementation.
>
> 1) Utilization problem
> As mentioned above, we can't utilize the full memory capacity due to the
> limitations of CMA free pages and the fallback policy. This patchset
> implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
> requests. This type of allocation is used for page cache and anonymous
> pages, which account for most memory usage in the normal case, so we can
> utilize the full memory capacity. Below are the experiment results for
> this problem.
>
> 8 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
>
> <Before this series>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           92.4		186.5
> pswpin:                 82		18647
> pswpout:                160		69839
>
> <After this series>
> CMA reserve:            0 MB            512 MB
> Elapsed-time:           93.1		93.4
> pswpin:                 84		46
> pswpout:                183		92
>
> FYI, there is another attempt [3] on lkml to solve this problem.
> And, as far as I know, Qualcomm also has an out-of-tree solution for this
> problem.
>
> 2) Reclaim problem
> Currently, there is no logic to distinguish CMA pages in the reclaim path.
> If reclaim is initiated for an unmovable or reclaimable allocation,
> reclaiming CMA pages doesn't help to satisfy the request and is just
> wasted work. By managing CMA pages in the new zone, we can skip
> reclaiming ZONE_CMA completely when it is unnecessary.
>
> 3) Atomic allocation failure problem
> Kswapd isn't woken up to reclaim pages when the allocation request is of
> movable type and there are enough free pages in the CMA region. After a
> bunch of consecutive movable allocation requests, free pages in the
> ordinary region (not the CMA region) would be exhausted without waking up
> kswapd. If an atomic unmovable allocation comes in at that time, it can't
> succeed since there are not enough pages in the ordinary region. This
> problem was reported by Aneesh [4] and can be solved by this patchset.
>
> 4) Inefficient compaction
> The usual high-order allocation request is of unmovable type and cannot
> be serviced from the CMA area. In compaction, the migration scanner doesn't
> distinguish migratable pages in the CMA area and migrates them anyway.
> In this case, even if we create a high-order page in that region, it
> cannot be used due to the type mismatch. This patch solves this problem
> by separating CMA pages from the ordinary zones.
>
> [1] https://lkml.org/lkml/2014/5/28/64
> [2] https://lkml.org/lkml/2014/11/4/55
> [3] https://lkml.org/lkml/2014/10/15/623
> [4] http://www.spinics.net/lists/linux-mm/msg100562.html
> [5] https://lkml.org/lkml/2014/5/30/320
>
> For this patch:
>
> Currently, reserved pages for CMA are managed together with normal pages.
> To distinguish them, we use a migratetype, MIGRATE_CMA, and
> handle this migratetype specially. But it turns out that
> there are too many problems with this approach, and fixing all of them
> needs many more hooks in the page allocation and reclaim paths, so
> some developers have expressed their discomfort and the problems with CMA
> haven't been fixed for a long time.
>
> To end this situation and fix the CMA problems, this patch implements
> ZONE_CMA. Reserved pages for CMA will be managed in this new zone. This
> approach will remove all existing hooks for MIGRATE_CMA, and many
> problems related to the CMA implementation will be solved.
>
> This patch only adds the basic infrastructure of ZONE_CMA. In the following
> patch, ZONE_CMA is actually populated and used.
>
> Adding a new zone could cause two possible problems. One is overflow
> of the page flags and the other is the GFP_ZONES_TABLE issue.
>
> The following is the page-flags layout described in page-flags-layout.h.
>
> 1. No sparsemem or sparsemem vmemmap: |       NODE     | ZONE |             ... | FLAGS |
> 2.      " plus space for last_cpupid: |       NODE     | ZONE | LAST_CPUPID ... | FLAGS |
> 3. classic sparse with space for node:| SECTION | NODE | ZONE |             ... | FLAGS |
> 4.      " plus space for last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS |
> 5. classic sparse no space for node:  | SECTION |     ZONE    | ... | FLAGS |
>
> There is no problem in configurations #1 and #2 on 64-bit systems. There is
> enough room even for extremely large x86_64 systems. 32-bit systems would
> not have many nodes, so they would have no problem either.
> Systems with configurations #3, #4 and #5 could be affected by this zone
> addition, but thanks to the recent THP rework, which freed one page flag,
> the problem surface would be small. In some configurations a problem is
> still possible, but it highly depends on the individual configuration,
> so the impact cannot be easily estimated. I guess that the usual system
> with CONFIG_CMA would not be affected. If there is a problem,
> we can adjust the section width or node width for that architecture.
>
> Currently, GFP_ZONES_TABLE is a 32-bit value so that 32-bit bit operations
> can be used on 32-bit systems. If we add one more zone, it will be 48 bits
> wide and 32-bit bit operations are no longer possible. Although this causes
> slight overhead, there is no other way, so this patch relaxes
> GFP_ZONES_TABLE's 32-bit limitation. 32-bit systems with CONFIG_CMA will be
> affected by this change, but the impact should be marginal.
>
> Note that there are many checkpatch warnings, but I think that the current
> code is more readable than it would be after fixing them up.
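
The 32-bit versus 48-bit figures follow from simple arithmetic, sketched
below. The sketch assumes the table holds one zone index for each of the
16 combinations of the four zone-modifier bits, and that each index needs
just enough bits to encode the number of zones; the helper names are
hypothetical.

```c
#include <assert.h>

/* Four zone-modifier bits (DMA, HIGHMEM, DMA32, MOVABLE): 16 combinations. */
#define GFP_ZONE_COMBINATIONS 16

/* Smallest number of bits that can encode zone indices 0..nr_zones-1. */
static unsigned int zones_shift(unsigned int nr_zones)
{
	unsigned int bits = 0;

	assert(nr_zones >= 1);
	while ((1u << bits) < nr_zones)
		bits++;
	return bits;
}

/* Total width of a table with one zone index per flag combination. */
static unsigned int gfp_zone_table_bits(unsigned int nr_zones)
{
	return GFP_ZONE_COMBINATIONS * zones_shift(nr_zones);
}
```

With four zones each entry needs 2 bits and the table fits in 16 * 2 = 32
bits. A fifth zone pushes each entry to 3 bits and the table to
16 * 3 = 48 bits, which no longer fits in a 32-bit word.
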
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

The special hooks in all the initialization/hotplug functions are tricky 
and I wouldn't be surprised if we find some subtle bugs. But better than 
the current hooks in the alloc fastpaths...

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 3/6] mm/cma: populate ZONE_CMA
  2016-08-29  5:07   ` js1304
@ 2016-09-21  9:20     ` Vlastimil Babka
  -1 siblings, 0 replies; 54+ messages in thread
From: Vlastimil Babka @ 2016-09-21  9:20 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel, Joonsoo Kim

On 08/29/2016 07:07 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Until now, reserved pages for CMA have been managed in the ordinary zones
> that their pfns belong to. This approach has numerous problems,
> and fixing them isn't easy. (They are described in the previous patch.)
> To fix this situation, ZONE_CMA was introduced in the previous patch but
> not yet populated. This patch implements population of ZONE_CMA
> by stealing reserved pages from the ordinary zones.
>
> Unlike the previous implementation, where kernel allocation requests with
> __GFP_MOVABLE could be serviced from the CMA region, only allocation
> requests with GFP_HIGHUSER_MOVABLE can be serviced from the CMA region in
> the new approach. This is an inevitable consequence of using the zone
> implementation, because ZONE_CMA could contain highmem. Due to this
> decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.
>
> I don't think this will be a problem, because most file cache pages
> and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
> supported by the fact that there are many systems with ZONE_HIGHMEM and
> they work fine. A notable disadvantage is that we cannot use these pages
> for blockdev file cache pages, because those usually have __GFP_MOVABLE
> but not __GFP_HIGHMEM and __GFP_USER. But in this case there are pros and
> cons. In my experience, blockdev file cache pages are one of the top
> reasons that cma_alloc() fails temporarily. So, we get a stronger
> guarantee of cma_alloc() success by excluding that case.
>
> The implementation itself is very easy to understand: steal pages when the
> CMA area is initialized and recalculate the various per-zone
> stats/thresholds.
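
A rough model of the eligibility rule described above, assuming that what
matters is whether a request carries both the highmem and the movable
zone modifiers. The M_ bit values and names are invented for this sketch
and are not the real definitions from include/linux/gfp.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only, not the kernel's. */
#define M_GFP_HIGHMEM	0x1u
#define M_GFP_MOVABLE	0x2u
#define M_GFP_USER_BITS	0x4u	/* stands in for the GFP_USER part */

/* Page cache and anonymous pages: user, highmem and movable. */
#define M_GFP_HIGHUSER_MOVABLE \
	(M_GFP_USER_BITS | M_GFP_HIGHMEM | M_GFP_MOVABLE)

/* Blockdev page cache as described above: movable, but not highmem. */
#define M_GFP_BLOCKDEV_LIKE	M_GFP_MOVABLE

/* A request may be served from ZONE_CMA only if both modifiers are set. */
static bool may_use_zone_cma(unsigned int gfp)
{
	const unsigned int need = M_GFP_HIGHMEM | M_GFP_MOVABLE;

	return (gfp & need) == need;
}
```

Under this model may_use_zone_cma(M_GFP_HIGHUSER_MOVABLE) is true, while
a request that is only movable, like the blockdev case, is kept out of
the CMA zone.
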
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

...

> @@ -145,6 +145,28 @@ err:
>  static int __init cma_init_reserved_areas(void)
>  {
>  	int i;
> +	struct zone *zone;
> +	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> +
> +	if (!cma_area_count)
> +		return 0;
> +
> +	for (i = 0; i < cma_area_count; i++) {
> +		if (start_pfn > cma_areas[i].base_pfn)
> +			start_pfn = cma_areas[i].base_pfn;
> +		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> +			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> +	}
> +
> +	for_each_zone(zone) {
> +		if (!is_zone_cma(zone))
> +			continue;
> +
> +		/* ZONE_CMA doesn't need to exceed CMA region */
> +		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> +		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> +					zone->zone_start_pfn;
> +	}

Hmm, so what happens on a system with multiple nodes? Each will have its 
own ZONE_CMA, and all will have the same start pfn and spanned pages?
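
To make the concern concrete, here is a small stand-alone sketch of what
tracking the span per node could look like, as opposed to the single
global min/max in the hunk above. It is not a proposed patch: struct
cma_area_model, its nid field, struct pfn_span and MODEL_MAX_NODES are
all hypothetical, and the sketch assumes the node of each area is known.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_MAX_NODES 4

/* Hypothetical description of one reserved CMA area. */
struct cma_area_model {
	unsigned long base_pfn;
	unsigned long count;
	int nid;
};

/* Half-open pfn range [start_pfn, end_pfn); empty if start >= end. */
struct pfn_span {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Compute, for every node, the smallest range covering its CMA areas. */
static void cma_spans_per_node(const struct cma_area_model *areas,
			       int nr_areas,
			       struct pfn_span spans[MODEL_MAX_NODES])
{
	for (int n = 0; n < MODEL_MAX_NODES; n++) {
		spans[n].start_pfn = ULONG_MAX;
		spans[n].end_pfn = 0;
	}

	for (int i = 0; i < nr_areas; i++) {
		unsigned long end = areas[i].base_pfn + areas[i].count;
		struct pfn_span *s;

		assert(areas[i].nid >= 0 && areas[i].nid < MODEL_MAX_NODES);
		s = &spans[areas[i].nid];
		if (areas[i].base_pfn < s->start_pfn)
			s->start_pfn = areas[i].base_pfn;
		if (end > s->end_pfn)
			s->end_pfn = end;
	}
}

/* Number of pfns spanned on a node, 0 if the node has no CMA area. */
static unsigned long span_pages(const struct pfn_span *s)
{
	return s->end_pfn > s->start_pfn ? s->end_pfn - s->start_pfn : 0;
}
```

With one area on node 0 and another far away on node 1, each node's span
covers only its own area, whereas a global min/max would give both nodes
the same range including the gap between the two areas.
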

>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
>  	unsigned i = pageblock_nr_pages;
> +	unsigned long pfn = page_to_pfn(page);
>  	struct page *p = page;
> +	int nid = page_to_nid(page);
> +
> +	/*
> +	 * ZONE_CMA will steal present pages from other zones by changing
> +	 * page links so page_zone() is changed. Before that,
> +	 * we need to adjust previous zone's page count first.
> +	 */
> +	adjust_present_page_count(page, -pageblock_nr_pages);
>
>  	do {
>  		__ClearPageReserved(p);
>  		set_page_count(p, 0);
> -	} while (++p, --i);
> +
> +		/* Steal pages from other zones */
> +		set_page_links(p, ZONE_CMA, nid, pfn);
> +	} while (++p, ++pfn, --i);
> +
> +	adjust_present_page_count(page, pageblock_nr_pages);

This seems to assign pages to ZONE_CMA on the proper node, which is 
good. But then ZONE_CMA on multiple nodes will have unnecessary holes in 
the spanned pages, as each will contain only a subset.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 3/6] mm/cma: populate ZONE_CMA
@ 2016-09-21  9:20     ` Vlastimil Babka
  0 siblings, 0 replies; 54+ messages in thread
From: Vlastimil Babka @ 2016-09-21  9:20 UTC (permalink / raw)
  To: js1304, Andrew Morton
  Cc: Rik van Riel, Johannes Weiner, mgorman, Laura Abbott,
	Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel, Joonsoo Kim

On 08/29/2016 07:07 AM, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> Until now, reserved pages for CMA have been managed in the ordinary zones
> that their pfns belong to. This approach has numerous problems,
> and fixing them isn't easy. (They are described in the previous patch.)
> To fix this situation, ZONE_CMA was introduced in the previous patch but
> not yet populated. This patch implements population of ZONE_CMA
> by stealing reserved pages from the ordinary zones.
>
> Unlike the previous implementation, where kernel allocation requests with
> __GFP_MOVABLE could be serviced from the CMA region, only allocation
> requests with GFP_HIGHUSER_MOVABLE can be serviced from the CMA region in
> the new approach. This is an inevitable consequence of using the zone
> implementation, because ZONE_CMA could contain highmem. Due to this
> decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.
>
> I don't think this will be a problem, because most file cache pages
> and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
> supported by the fact that there are many systems with ZONE_HIGHMEM and
> they work fine. A notable disadvantage is that we cannot use these pages
> for blockdev file cache pages, because those usually have __GFP_MOVABLE
> but not __GFP_HIGHMEM and __GFP_USER. But in this case there are pros and
> cons. In my experience, blockdev file cache pages are one of the top
> reasons that cma_alloc() fails temporarily. So, we get a stronger
> guarantee of cma_alloc() success by excluding that case.
>
> The implementation itself is very easy to understand: steal pages when the
> CMA area is initialized and recalculate the various per-zone
> stats/thresholds.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

...

> @@ -145,6 +145,28 @@ err:
>  static int __init cma_init_reserved_areas(void)
>  {
>  	int i;
> +	struct zone *zone;
> +	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> +
> +	if (!cma_area_count)
> +		return 0;
> +
> +	for (i = 0; i < cma_area_count; i++) {
> +		if (start_pfn > cma_areas[i].base_pfn)
> +			start_pfn = cma_areas[i].base_pfn;
> +		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> +			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> +	}
> +
> +	for_each_zone(zone) {
> +		if (!is_zone_cma(zone))
> +			continue;
> +
> +		/* ZONE_CMA doesn't need to exceed CMA region */
> +		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> +		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> +					zone->zone_start_pfn;
> +	}

Hmm, so what happens on a system with multiple nodes? Each will have its 
own ZONE_CMA, and all will have the same start pfn and spanned pages?

>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void __init init_cma_reserved_pageblock(struct page *page)
>  {
>  	unsigned i = pageblock_nr_pages;
> +	unsigned long pfn = page_to_pfn(page);
>  	struct page *p = page;
> +	int nid = page_to_nid(page);
> +
> +	/*
> +	 * ZONE_CMA will steal present pages from other zones by changing
> +	 * page links so page_zone() is changed. Before that,
> +	 * we need to adjust previous zone's page count first.
> +	 */
> +	adjust_present_page_count(page, -pageblock_nr_pages);
>
>  	do {
>  		__ClearPageReserved(p);
>  		set_page_count(p, 0);
> -	} while (++p, --i);
> +
> +		/* Steal pages from other zones */
> +		set_page_links(p, ZONE_CMA, nid, pfn);
> +	} while (++p, ++pfn, --i);
> +
> +	adjust_present_page_count(page, pageblock_nr_pages);

This seems to assign pages to ZONE_CMA on the proper node, which is 
good. But then ZONE_CMA on multiple nodes will have unnecessary holes in 
the spanned pages, as each will contain only a subset.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-09-01  5:47           ` Aneesh Kumar K.V
@ 2016-09-21 14:47             ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 54+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-21 14:47 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, Linux Memory Management List, LKML

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
>
>> On Tue, Aug 30, 2016 at 04:09:37PM +0530, Aneesh Kumar K.V wrote:
>>> Joonsoo Kim <js1304@gmail.com> writes:
>>> 
>>> > 2016-08-29 18:27 GMT+09:00 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>:
>>> >> js1304@gmail.com writes:
>>> >>
>>> >>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> Changes from v4
>>> >>> o Rebase on next-20160825
>>> >>> o Add general fix patch for lowmem reserve
>>> >>> o Fix lowmem reserve ratio
>>> >>> o Fix zone span optimizaion per Vlastimil
>>> >>> o Fix pageset initialization
>>> >>> o Change invocation timing on cma_init_reserved_areas()
>>> >>
>>> >> I don't see much information regarding how we interleave between
>>> >> ZONE_CMA and other zones for movable allocation. Is that explained in
>>> >> any of the patch ? The fair zone allocator got removed by
>>> >> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
>>> >
>>> > Interleaving would not work since the fair zone allocator policy is removed.
>>> > I don't think that it's a big problem because it is just matter of
>>> > timing to fill
>>> > up the memory. Eventually, memory on ZONE_CMA will be fully used in
>>> > any case.
>>> 
>>> Does that mean a CMA allocation will now be slower because in most case we
>>> will need to reclaim ? The zone list will now have ZONE_CMA in the
>>> beginning right ?
>>
>> ZONE_CMA will be used first but I don't think that CMA allocation will
>> be slower. In most case, memory would be fully used (usually
>> by page cache). So, we need reclaim or migration in any case.
>
> Considering that the upstream kernel doesn't allow migration of THP
> pages, this would mean that migrate will fail in most case if we have
> THP enabled and the THP allocation request got satisfied via ZONE_CMA.
> Isn't that going to be a problem ?
>

Even though we have the issues of migration failures due to pinned and
THP pages in ZONE_CMA, overall the code is simpler. IMHO we should get
this upstream now and work on solving those issues later.

You can add the following for the complete series:

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

-aneesh

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request
  2016-09-16  3:14     ` Aneesh Kumar K.V
@ 2016-09-22  5:30       ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-09-22  5:30 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, linux-mm, linux-kernel

On Fri, Sep 16, 2016 at 08:44:17AM +0530, Aneesh Kumar K.V wrote:
> js1304@gmail.com writes:
> 
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
> > important to reserve. When ZONE_MOVABLE is used, this problem would
> > theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
> > allocation request which is mainly used for page cache and anon page
> > allocation. So, fix it.
> >
> > And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
> > makes code complex. For example, if there is highmem system, following
> > reserve ratio is activated for *NORMAL ZONE* which would be easyily
> > misleading people.
> >
> >  #ifdef CONFIG_HIGHMEM
> >  32
> >  #endif
> >
> > This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
> > array by MAX_NR_ZONES and place "#ifdef" to right place.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  include/linux/mmzone.h | 2 +-
> >  mm/page_alloc.c        | 7 ++++---
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index d572b78..e3f39af 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -877,7 +877,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
> >  					void __user *, size_t *, loff_t *);
> >  int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
> >  					void __user *, size_t *, loff_t *);
> > -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
> > +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
> >  int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
> >  					void __user *, size_t *, loff_t *);
> >  int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 4f7d5d7..a8310de 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -198,17 +198,18 @@ static void __free_pages_ok(struct page *page, unsigned int order);
> >   * TBD: should special case ZONE_DMA32 machines here - in those we normally
> >   * don't need any ZONE_NORMAL reservation
> >   */
> > -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
> > +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
> >  #ifdef CONFIG_ZONE_DMA
> >  	 256,
> >  #endif
> >  #ifdef CONFIG_ZONE_DMA32
> >  	 256,
> >  #endif
> > -#ifdef CONFIG_HIGHMEM
> >  	 32,
> > +#ifdef CONFIG_HIGHMEM
> > +	 INT_MAX,
> >  #endif
> > -	 32,
> > +	 INT_MAX,
> >  };
> >
> >  EXPORT_SYMBOL(totalram_pages);
> > -- 
> > 1.9.1
> 
> We can also do things like below to make it readable ?
> 
> #ifdef CONFIG_ZONE_DMA
> 	[ZONE_DMA] = 256,
> #endif

It looks more readable! I will change it.
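
For illustration, here is a minimal userspace sketch of what the
designated-initializer form could look like once it covers every zone.
The enum is only a stand-in for the kernel's zone_type with all zones
configured, the array name is shortened, and lowmem_reserve_pages() is a
hypothetical helper that mimics the managed_pages / ratio division. None
of these names are taken from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's enum zone_type; assumes every zone is configured. */
enum zone_type {
	ZONE_DMA,
	ZONE_DMA32,
	ZONE_NORMAL,
	ZONE_HIGHMEM,
	ZONE_MOVABLE,
	MAX_NR_ZONES
};

/* Each ratio is tied to its zone by name instead of by position. */
static const int lowmem_reserve_ratio[MAX_NR_ZONES] = {
	[ZONE_DMA]     = 256,
	[ZONE_DMA32]   = 256,
	[ZONE_NORMAL]  = 32,
	[ZONE_HIGHMEM] = INT_MAX,
	[ZONE_MOVABLE] = INT_MAX,
};

/*
 * Hypothetical helper: number of pages of lower_zone that are held back
 * from requests that could also be served by higher zones containing
 * managed_pages pages in total.
 */
unsigned long lowmem_reserve_pages(unsigned long managed_pages,
				   enum zone_type lower_zone)
{
	assert(lower_zone < MAX_NR_ZONES);
	return managed_pages / (unsigned long)lowmem_reserve_ratio[lower_zone];
}
```

With a ratio of INT_MAX the quotient is 0 for any realistic zone size,
so under these assumptions nothing would be reserved in ZONE_HIGHMEM
for ZONE_MOVABLE requests.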

> 
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Thanks!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 0/6] Introduce ZONE_CMA
  2016-09-21 14:47             ` Aneesh Kumar K.V
@ 2016-09-22  5:32               ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-09-22  5:32 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, Mel Gorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Vlastimil Babka, Linux Memory Management List, LKML

On Wed, Sep 21, 2016 at 08:17:27PM +0530, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
> > Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> >
> >> On Tue, Aug 30, 2016 at 04:09:37PM +0530, Aneesh Kumar K.V wrote:
> >>> Joonsoo Kim <js1304@gmail.com> writes:
> >>> 
> >>> > 2016-08-29 18:27 GMT+09:00 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>:
> >>> >> js1304@gmail.com writes:
> >>> >>
> >>> >>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >>> >>>
> >>> >>> Hello,
> >>> >>>
> >>> >>> Changes from v4
> >>> >>> o Rebase on next-20160825
> >>> >>> o Add general fix patch for lowmem reserve
> >>> >>> o Fix lowmem reserve ratio
> >>> >>> o Fix zone span optimizaion per Vlastimil
> >>> >>> o Fix pageset initialization
> >>> >>> o Change invocation timing on cma_init_reserved_areas()
> >>> >>
> >>> >> I don't see much information regarding how we interleave between
> >>> >> ZONE_CMA and other zones for movable allocation. Is that explained in
> >>> >> any of the patch ? The fair zone allocator got removed by
> >>> >> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
> >>> >
> >>> > Interleaving would not work since the fair zone allocator policy is removed.
> >>> > I don't think that it's a big problem because it is just matter of
> >>> > timing to fill
> >>> > up the memory. Eventually, memory on ZONE_CMA will be fully used in
> >>> > any case.
> >>> 
> >>> Does that mean a CMA allocation will now be slower because in most case we
> >>> will need to reclaim ? The zone list will now have ZONE_CMA in the
> >>> beginning right ?
> >>
> >> ZONE_CMA will be used first but I don't think that CMA allocation will
> >> be slower. In most case, memory would be fully used (usually
> >> by page cache). So, we need reclaim or migration in any case.
> >
> > Considering that the upstream kernel doesn't allow migration of THP
> > pages, this would mean that migrate will fail in most case if we have
> > THP enabled and the THP allocation request got satisfied via ZONE_CMA.
> > Isn't that going to be a problem ?
> >
> 
> Even though we have the issues of migration failures due to pinned and
> THP pages in ZONE_CMA, overall the code is simpler. IMHO we should get
> this upstream now and work on solving those issues later.

Yep! I will look into those problems after this patchset is merged.

> 
> You can add for the complete series.
> 
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Thanks!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 3/6] mm/cma: populate ZONE_CMA
  2016-09-21  9:20     ` Vlastimil Babka
@ 2016-09-22  5:45       ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-09-22  5:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel

On Wed, Sep 21, 2016 at 11:20:11AM +0200, Vlastimil Babka wrote:
> On 08/29/2016 07:07 AM, js1304@gmail.com wrote:
> >From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> >Until now, reserved pages for CMA are managed in the ordinary zones
> >where page's pfn are belong to. This approach has numorous problems
> >and fixing them isn't easy. (It is mentioned on previous patch.)
> >To fix this situation, ZONE_CMA is introduced in previous patch, but,
> >not yet populated. This patch implement population of ZONE_CMA
> >by stealing reserved pages from the ordinary zones.
> >
> >Unlike previous implementation that kernel allocation request with
> >__GFP_MOVABLE could be serviced from CMA region, allocation request only
> >with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
> >approach. This is an inevitable design decision to use the zone
> >implementation because ZONE_CMA could contain highmem. Due to this
> >decision, ZONE_CMA will work like as ZONE_HIGHMEM or ZONE_MOVABLE.
> >
> >I don't think it would be a problem because most of file cache pages
> >and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
> >be proved by the fact that there are many systems with ZONE_HIGHMEM and
> >they work fine. Notable disadvantage is that we cannot use these pages
> >for blockdev file cache page, because it usually has __GFP_MOVABLE but
> >not __GFP_HIGHMEM and __GFP_USER. But, in this case, there is pros and
> >cons. In my experience, blockdev file cache pages are one of the top
> >reason that causes cma_alloc() to fail temporarily. So, we can get more
> >guarantee of cma_alloc() success by discarding that case.
> >
> >Implementation itself is very easy to understand. Steal when cma area is
> >initialized and recalculate various per zone stat/threshold.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> ...
> 
> >@@ -145,6 +145,28 @@ err:
> > static int __init cma_init_reserved_areas(void)
> > {
> > 	int i;
> >+	struct zone *zone;
> >+	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> >+
> >+	if (!cma_area_count)
> >+		return 0;
> >+
> >+	for (i = 0; i < cma_area_count; i++) {
> >+		if (start_pfn > cma_areas[i].base_pfn)
> >+			start_pfn = cma_areas[i].base_pfn;
> >+		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> >+			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> >+	}
> >+
> >+	for_each_zone(zone) {
> >+		if (!is_zone_cma(zone))
> >+			continue;
> >+
> >+		/* ZONE_CMA doesn't need to exceed CMA region */
> >+		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> >+		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> >+					zone->zone_start_pfn;
> >+	}
> 
> Hmm, so what happens on a system with multiple nodes? Each will have
> its own ZONE_CMA, and all will have the same start pfn and spanned
> pages?

zone_start_pfn and spanned_pages are each initialized in
calculate_node_totalpages(), which takes node boundaries into account.
So the ZONE_CMA of each node will not have the same start pfn and
spanned pages. However, each would still contain unnecessary holes.
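
One possible way to tighten the span, sketched in userspace C: compute
the covering pfn range separately for each node from the CMA areas that
live on it. This is only an illustration and not necessarily what the
respin will do. struct cma_area, its nid field, struct pfn_span,
MAX_NODES and cma_span_per_node() are all hypothetical names made up for
this sketch. ULONG_MAX is the initial minimum because pfns are unsigned
long.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 4	/* hypothetical limit for this sketch */

/* Simplified stand-in for struct cma: only the fields used here. */
struct cma_area {
	unsigned long base_pfn;
	unsigned long count;	/* number of pages */
	int nid;		/* node the area lives on (assumed known) */
};

struct pfn_span {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive; start_pfn == end_pfn means empty */
};

/*
 * For every node, compute the smallest pfn range that covers the CMA
 * areas on that node. Nodes without any CMA area get an empty span.
 */
void cma_span_per_node(const struct cma_area *areas, int nr_areas,
		       struct pfn_span spans[MAX_NODES])
{
	int i;

	for (i = 0; i < MAX_NODES; i++) {
		spans[i].start_pfn = ULONG_MAX;
		spans[i].end_pfn = 0;
	}

	for (i = 0; i < nr_areas; i++) {
		unsigned long end = areas[i].base_pfn + areas[i].count;
		struct pfn_span *s;

		assert(areas[i].nid >= 0 && areas[i].nid < MAX_NODES);
		s = &spans[areas[i].nid];
		if (areas[i].base_pfn < s->start_pfn)
			s->start_pfn = areas[i].base_pfn;
		if (end > s->end_pfn)
			s->end_pfn = end;
	}

	for (i = 0; i < MAX_NODES; i++) {
		if (spans[i].start_pfn == ULONG_MAX)
			spans[i].start_pfn = 0;	/* no CMA area: empty span */
	}
}
```

Even with per-node spans, gaps between two CMA areas on the same node
remain inside that node's span, so this narrows the holes rather than
removing them.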

> 
> > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> > void __init init_cma_reserved_pageblock(struct page *page)
> > {
> > 	unsigned i = pageblock_nr_pages;
> >+	unsigned long pfn = page_to_pfn(page);
> > 	struct page *p = page;
> >+	int nid = page_to_nid(page);
> >+
> >+	/*
> >+	 * ZONE_CMA will steal present pages from other zones by changing
> >+	 * page links so page_zone() is changed. Before that,
> >+	 * we need to adjust previous zone's page count first.
> >+	 */
> >+	adjust_present_page_count(page, -pageblock_nr_pages);
> >
> > 	do {
> > 		__ClearPageReserved(p);
> > 		set_page_count(p, 0);
> >-	} while (++p, --i);
> >+
> >+		/* Steal pages from other zones */
> >+		set_page_links(p, ZONE_CMA, nid, pfn);
> >+	} while (++p, ++pfn, --i);
> >+
> >+	adjust_present_page_count(page, pageblock_nr_pages);
> 
> This seems to assign pages to ZONE_CMA on the proper node, which is
> good. But then ZONE_CMA on multiple nodes will have unnecessary
> holes in the spanned pages, as each will contain only a subset.

True, I will fix it and respin the series.

Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v5 3/6] mm/cma: populate ZONE_CMA
  2016-09-22  5:45       ` Joonsoo Kim
@ 2016-09-22  6:50         ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-09-22  6:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel

On Thu, Sep 22, 2016 at 02:45:46PM +0900, Joonsoo Kim wrote:
> On Wed, Sep 21, 2016 at 11:20:11AM +0200, Vlastimil Babka wrote:
> > On 08/29/2016 07:07 AM, js1304@gmail.com wrote:
> > >From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > >
> > >Until now, reserved pages for CMA are managed in the ordinary zones
> > >that their pfns belong to. This approach has numerous problems
> > >and fixing them isn't easy. (They are mentioned in the previous patch.)
> > >To fix this situation, ZONE_CMA was introduced in the previous patch but
> > >not yet populated. This patch implements population of ZONE_CMA
> > >by stealing reserved pages from the ordinary zones.
> > >
> > >Unlike the previous implementation, where a kernel allocation request with
> > >__GFP_MOVABLE could be serviced from the CMA region, only allocation requests
> > >with GFP_HIGHUSER_MOVABLE can be serviced from the CMA region in the new
> > >approach. This is an inevitable design decision when using the zone
> > >implementation because ZONE_CMA could contain highmem. Due to this
> > >decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.
> > >
> > >I don't think it will be a problem because most file cache pages
> > >and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
> > >supported by the fact that there are many systems with ZONE_HIGHMEM and
> > >they work fine. A notable disadvantage is that we cannot use these pages
> > >for blockdev file cache pages, because they usually have __GFP_MOVABLE but
> > >not __GFP_HIGHMEM and __GFP_USER. But in this case there are pros and
> > >cons. In my experience, blockdev file cache pages are one of the top
> > >reasons that cma_alloc() fails temporarily. So, we can get a stronger
> > >guarantee of cma_alloc() success by excluding that case.
> > >
> > >The implementation itself is very easy to understand: steal pages when the
> > >CMA area is initialized and recalculate the various per-zone stats/thresholds.
> > >
> > >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > ...
> > 
> > >@@ -145,6 +145,28 @@ err:
> > > static int __init cma_init_reserved_areas(void)
> > > {
> > > 	int i;
> > >+	struct zone *zone;
> > >+	unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> > >+
> > >+	if (!cma_area_count)
> > >+		return 0;
> > >+
> > >+	for (i = 0; i < cma_area_count; i++) {
> > >+		if (start_pfn > cma_areas[i].base_pfn)
> > >+			start_pfn = cma_areas[i].base_pfn;
> > >+		if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> > >+			end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> > >+	}
> > >+
> > >+	for_each_zone(zone) {
> > >+		if (!is_zone_cma(zone))
> > >+			continue;
> > >+
> > >+		/* ZONE_CMA doesn't need to exceed CMA region */
> > >+		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> > >+		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> > >+					zone->zone_start_pfn;
> > >+	}
> > 
> > Hmm, so what happens on a system with multiple nodes? Each will have
> > its own ZONE_CMA, and all will have the same start pfn and spanned
> > pages?
> 
> Both zone_start_pfn and spanned_pages are initialized in
> calculate_node_totalpages(), which considers node boundaries. So, they will
> not have the same start pfn and spanned pages. However, each would
> contain unnecessary holes.
> 
> > 
> > > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> > > void __init init_cma_reserved_pageblock(struct page *page)
> > > {
> > > 	unsigned i = pageblock_nr_pages;
> > >+	unsigned long pfn = page_to_pfn(page);
> > > 	struct page *p = page;
> > >+	int nid = page_to_nid(page);
> > >+
> > >+	/*
> > >+	 * ZONE_CMA will steal present pages from other zones by changing
> > >+	 * page links so page_zone() is changed. Before that,
> > >+	 * we need to adjust previous zone's page count first.
> > >+	 */
> > >+	adjust_present_page_count(page, -pageblock_nr_pages);
> > >
> > > 	do {
> > > 		__ClearPageReserved(p);
> > > 		set_page_count(p, 0);
> > >-	} while (++p, --i);
> > >+
> > >+		/* Steal pages from other zones */
> > >+		set_page_links(p, ZONE_CMA, nid, pfn);
> > >+	} while (++p, ++pfn, --i);
> > >+
> > >+	adjust_present_page_count(page, pageblock_nr_pages);
> > 
> > This seems to assign pages to ZONE_CMA on the proper node, which is
> > good. But then ZONE_CMA on multiple nodes will have unnecessary
> > holes in the spanned pages, as each will contain only a subset.
> 
> True, I will fix it and respin the series.

I now realize that it's too late to send the full series for the next
merge window. I will send the full series after the next merge window closes.

Anyway, I'd like to confirm that the following incremental patch will solve
your concern.

Thanks.


------>8--------------
 mm/cma.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index d69bdf7..8375554 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -146,22 +146,29 @@ static int __init cma_init_reserved_areas(void)
 {
        int i;
        struct zone *zone;
-       unsigned long start_pfn = UINT_MAX, end_pfn = 0;
+       pg_data_t *pgdat;
 
        if (!cma_area_count)
                return 0;
 
-       for (i = 0; i < cma_area_count; i++) {
-               if (start_pfn > cma_areas[i].base_pfn)
-                       start_pfn = cma_areas[i].base_pfn;
-               if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
-                       end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
-       }
+       for_each_online_pgdat(pgdat) {
+               unsigned long start_pfn = UINT_MAX, end_pfn = 0;
 
-       for_each_zone(zone) {
-               if (!is_zone_cma(zone))
+               for (i = 0; i < cma_area_count; i++) {
+                       if (page_to_nid(pfn_to_page(cma_areas[i].base_pfn)) !=
+                               pgdat->node_id)
+                               continue;
+
+                       start_pfn = min(start_pfn, cma_areas[i].base_pfn);
+                       end_pfn = max(end_pfn, cma_areas[i].base_pfn +
+                                               cma_areas[i].count);
+               }
+
+               if (!end_pfn)
                        continue;
 
+               zone = &pgdat->node_zones[ZONE_CMA];
+
                /* ZONE_CMA doesn't need to exceed CMA region */
                zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
                zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
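The per-node calculation in the patch above can be modeled outside the kernel roughly as follows. This is only a sketch under stated assumptions: `cma_area_model`, `pfn_span`, and `node_cma_span()` are hypothetical names, the node id is stored directly in the model instead of being looked up from the base pfn, and each area is assumed to sit on a single node. The sketch also uses `ULONG_MAX` as the "no start yet" sentinel because the variables are `unsigned long`; that is a choice made here, not something taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a CMA area; only the fields used here. */
struct cma_area_model {
	unsigned long base_pfn;	/* first page frame number of the area */
	unsigned long count;	/* number of pages in the area */
	int nid;		/* node the area lives on (assumed single-node) */
};

struct pfn_span {
	unsigned long start_pfn;	/* ULONG_MAX if no area matched */
	unsigned long end_pfn;		/* exclusive; 0 if no area matched */
};

/* Smallest PFN span covering every area that belongs to node @nid. */
static struct pfn_span node_cma_span(const struct cma_area_model *areas,
				     size_t nr, int nid)
{
	struct pfn_span span = { ULONG_MAX, 0 };
	size_t i;

	assert(areas != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		unsigned long end = areas[i].base_pfn + areas[i].count;

		/* Skip areas that live on other nodes. */
		if (areas[i].nid != nid)
			continue;

		if (areas[i].base_pfn < span.start_pfn)
			span.start_pfn = areas[i].base_pfn;
		if (end > span.end_pfn)
			span.end_pfn = end;
	}

	return span;
}
```

A caller would treat `end_pfn == 0` as "this node has no CMA area" and skip the zone adjustment, mirroring the `if (!end_pfn) continue;` test in the patch.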

* Re: [PATCH v5 3/6] mm/cma: populate ZONE_CMA
  2016-09-22  6:50         ` Joonsoo Kim
@ 2016-09-22 15:59           ` Vlastimil Babka
  -1 siblings, 0 replies; 54+ messages in thread
From: Vlastimil Babka @ 2016-09-22 15:59 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel

On 09/22/2016 08:50 AM, Joonsoo Kim wrote:
> On Thu, Sep 22, 2016 at 02:45:46PM +0900, Joonsoo Kim wrote:
>> >
>> > > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>> > > void __init init_cma_reserved_pageblock(struct page *page)
>> > > {
>> > > 	unsigned i = pageblock_nr_pages;
>> > >+	unsigned long pfn = page_to_pfn(page);
>> > > 	struct page *p = page;
>> > >+	int nid = page_to_nid(page);
>> > >+
>> > >+	/*
>> > >+	 * ZONE_CMA will steal present pages from other zones by changing
>> > >+	 * page links so page_zone() is changed. Before that,
>> > >+	 * we need to adjust previous zone's page count first.
>> > >+	 */
>> > >+	adjust_present_page_count(page, -pageblock_nr_pages);
>> > >
>> > > 	do {
>> > > 		__ClearPageReserved(p);
>> > > 		set_page_count(p, 0);
>> > >-	} while (++p, --i);
>> > >+
>> > >+		/* Steal pages from other zones */
>> > >+		set_page_links(p, ZONE_CMA, nid, pfn);
>> > >+	} while (++p, ++pfn, --i);
>> > >+
>> > >+	adjust_present_page_count(page, pageblock_nr_pages);
>> >
>> > This seems to assign pages to ZONE_CMA on the proper node, which is
>> > good. But then ZONE_CMA on multiple nodes will have unnecessary
>> > holes in the spanned pages, as each will contain only a subset.
>>
>> True, I will fix it and respin the series.
>
> I now realize that it's too late to send the full series for the next
> merge window. I will send the full series after the next merge window closes.

I think there might still be an rc8, and thus another week.

> Anyway, I'd like to confirm that the following incremental patch will solve
> your concern.

Yeah, that should work, as long as a single CMA area doesn't span multiple nodes?

> Thanks.
>
>
> ------>8--------------
>  mm/cma.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/mm/cma.c b/mm/cma.c
> index d69bdf7..8375554 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -146,22 +146,29 @@ static int __init cma_init_reserved_areas(void)
>  {
>         int i;
>         struct zone *zone;
> -       unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> +       pg_data_t *pgdat;
>
>         if (!cma_area_count)
>                 return 0;
>
> -       for (i = 0; i < cma_area_count; i++) {
> -               if (start_pfn > cma_areas[i].base_pfn)
> -                       start_pfn = cma_areas[i].base_pfn;
> -               if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> -                       end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> -       }
> +       for_each_online_pgdat(pgdat) {
> +               unsigned long start_pfn = UINT_MAX, end_pfn = 0;
>
> -       for_each_zone(zone) {
> -               if (!is_zone_cma(zone))
> +               for (i = 0; i < cma_area_count; i++) {
> +                       if (page_to_nid(pfn_to_page(cma_areas[i].base_pfn)) !=

We have pfn_to_nid() (although the implementation is just like this).

> +                               pgdat->node_id)
> +                               continue;
> +
> +                       start_pfn = min(start_pfn, cma_areas[i].base_pfn);
> +                       end_pfn = max(end_pfn, cma_areas[i].base_pfn +
> +                                               cma_areas[i].count);
> +               }
> +
> +               if (!end_pfn)
>                         continue;
>
> +               zone = &pgdat->node_zones[ZONE_CMA];
> +
>                 /* ZONE_CMA doesn't need to exceed CMA region */
>                 zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
>                 zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
>

* Re: [PATCH v5 3/6] mm/cma: populate ZONE_CMA
  2016-09-22 15:59           ` Vlastimil Babka
@ 2016-09-28  5:34             ` Joonsoo Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Joonsoo Kim @ 2016-09-28  5:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Rik van Riel, Johannes Weiner, mgorman,
	Laura Abbott, Minchan Kim, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, linux-mm, linux-kernel

On Thu, Sep 22, 2016 at 05:59:46PM +0200, Vlastimil Babka wrote:
> On 09/22/2016 08:50 AM, Joonsoo Kim wrote:
> >On Thu, Sep 22, 2016 at 02:45:46PM +0900, Joonsoo Kim wrote:
> >>>
> >>> > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> >>> > void __init init_cma_reserved_pageblock(struct page *page)
> >>> > {
> >>> > 	unsigned i = pageblock_nr_pages;
> >>> >+	unsigned long pfn = page_to_pfn(page);
> >>> > 	struct page *p = page;
> >>> >+	int nid = page_to_nid(page);
> >>> >+
> >>> >+	/*
> >>> >+	 * ZONE_CMA will steal present pages from other zones by changing
> >>> >+	 * page links so page_zone() is changed. Before that,
> >>> >+	 * we need to adjust previous zone's page count first.
> >>> >+	 */
> >>> >+	adjust_present_page_count(page, -pageblock_nr_pages);
> >>> >
> >>> > 	do {
> >>> > 		__ClearPageReserved(p);
> >>> > 		set_page_count(p, 0);
> >>> >-	} while (++p, --i);
> >>> >+
> >>> >+		/* Steal pages from other zones */
> >>> >+		set_page_links(p, ZONE_CMA, nid, pfn);
> >>> >+	} while (++p, ++pfn, --i);
> >>> >+
> >>> >+	adjust_present_page_count(page, pageblock_nr_pages);
> >>>
> >>> This seems to assign pages to ZONE_CMA on the proper node, which is
> >>> good. But then ZONE_CMA on multiple nodes will have unnecessary
> >>> holes in the spanned pages, as each will contain only a subset.
> >>
> >>True, I will fix it and respin the series.
> >
> >I now realize that it's too late to send the full series for the next
> >merge window. I will send the full series after the next merge window closes.
> 
> I think there might still be an rc8, and thus another week.

Indeed. I will send the full series soon.

> 
> >Anyway, I'd like to confirm that the following incremental patch will solve
> >your concern.
> 
> Yeah, that should work, as long as a single CMA area doesn't span multiple nodes?

A single CMA area cannot span multiple nodes, at least for now.
There is a check that each CMA area lies within a single zone.
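The check itself is not quoted in this thread, so the following is only a rough userspace model of the idea: walk an area in fixed steps and reject it if any sampled pfn resolves to a different node than the first one. Everything here is hypothetical (`node_range_model`, `model_pfn_to_nid()`, `area_on_single_node()`); `step` stands in for a pageblock-sized stride, and the node table replaces the kernel's real pfn-to-node lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout entry: PFNs [start_pfn, end_pfn) belong to nid. */
struct node_range_model {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
	int nid;
};

/* Look up the node of @pfn in the model table; -1 if no node owns it. */
static int model_pfn_to_nid(const struct node_range_model *nodes, size_t nr,
			    unsigned long pfn)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (pfn >= nodes[i].start_pfn && pfn < nodes[i].end_pfn)
			return nodes[i].nid;
	return -1;
}

/* True if every sampled PFN of the area maps to the same valid node. */
static bool area_on_single_node(const struct node_range_model *nodes,
				size_t nr, unsigned long base_pfn,
				unsigned long count, unsigned long step)
{
	unsigned long pfn, end_pfn = base_pfn + count;
	int nid;

	assert(step > 0);
	if (count == 0)
		return false;

	nid = model_pfn_to_nid(nodes, nr, base_pfn);
	if (nid < 0)
		return false;

	/* Sample one PFN per step, as a per-pageblock walk would. */
	for (pfn = base_pfn; pfn < end_pfn; pfn += step)
		if (model_pfn_to_nid(nodes, nr, pfn) != nid)
			return false;

	/* Also check the last PFN, which the stride may have skipped. */
	return model_pfn_to_nid(nodes, nr, end_pfn - 1) == nid;
}
```

If such a check holds for every area, looking only at an area's base pfn is enough to decide which node it belongs to, which is what the incremental patch relies on.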

Thanks.

> 
> >Thanks.
> >
> >
> >------>8--------------
> > mm/cma.c | 25 ++++++++++++++++---------
> > 1 file changed, 16 insertions(+), 9 deletions(-)
> >
> >diff --git a/mm/cma.c b/mm/cma.c
> >index d69bdf7..8375554 100644
> >--- a/mm/cma.c
> >+++ b/mm/cma.c
> >@@ -146,22 +146,29 @@ static int __init cma_init_reserved_areas(void)
> > {
> >        int i;
> >        struct zone *zone;
> >-       unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> >+       pg_data_t *pgdat;
> >
> >        if (!cma_area_count)
> >                return 0;
> >
> >-       for (i = 0; i < cma_area_count; i++) {
> >-               if (start_pfn > cma_areas[i].base_pfn)
> >-                       start_pfn = cma_areas[i].base_pfn;
> >-               if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count)
> >-                       end_pfn = cma_areas[i].base_pfn + cma_areas[i].count;
> >-       }
> >+       for_each_online_pgdat(pgdat) {
> >+               unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> >
> >-       for_each_zone(zone) {
> >-               if (!is_zone_cma(zone))
> >+               for (i = 0; i < cma_area_count; i++) {
> >+                       if (page_to_nid(pfn_to_page(cma_areas[i].base_pfn)) !=
> 
> We have pfn_to_nid() (although the implementation is just like this).

Will fix.

Thanks.

Thread overview: 54+ messages
2016-08-29  5:07 [PATCH v5 0/6] Introduce ZONE_CMA js1304
2016-08-29  5:07 ` js1304
2016-08-29  5:07 ` [PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request js1304
2016-08-29  5:07   ` js1304
2016-09-16  3:14   ` Aneesh Kumar K.V
2016-09-16  3:14     ` Aneesh Kumar K.V
2016-09-22  5:30     ` Joonsoo Kim
2016-09-22  5:30       ` Joonsoo Kim
2016-09-21  9:06   ` Vlastimil Babka
2016-09-21  9:06     ` Vlastimil Babka
2016-08-29  5:07 ` [PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA js1304
2016-08-29  5:07   ` js1304
2016-08-30 10:35   ` Aneesh Kumar K.V
2016-08-30 10:35     ` Aneesh Kumar K.V
2016-08-30 12:40     ` Aneesh Kumar K.V
2016-08-30 12:40       ` Aneesh Kumar K.V
2016-08-31  7:58       ` Joonsoo Kim
2016-08-31  7:58         ` Joonsoo Kim
2016-09-21  9:11   ` Vlastimil Babka
2016-09-21  9:11     ` Vlastimil Babka
2016-08-29  5:07 ` [PATCH v5 3/6] mm/cma: populate ZONE_CMA js1304
2016-08-29  5:07   ` js1304
2016-09-21  9:20   ` Vlastimil Babka
2016-09-21  9:20     ` Vlastimil Babka
2016-09-22  5:45     ` Joonsoo Kim
2016-09-22  5:45       ` Joonsoo Kim
2016-09-22  6:50       ` Joonsoo Kim
2016-09-22  6:50         ` Joonsoo Kim
2016-09-22 15:59         ` Vlastimil Babka
2016-09-22 15:59           ` Vlastimil Babka
2016-09-28  5:34           ` Joonsoo Kim
2016-09-28  5:34             ` Joonsoo Kim
2016-08-29  5:07 ` [PATCH v5 4/6] mm/cma: remove ALLOC_CMA js1304
2016-08-29  5:07   ` js1304
2016-08-29  5:07 ` [PATCH v5 5/6] mm/cma: remove MIGRATE_CMA js1304
2016-08-29  5:07   ` js1304
2016-08-29  5:07 ` [PATCH v5 6/6] mm/cma: remove per zone CMA stat js1304
2016-08-29  5:07   ` js1304
2016-08-29  9:27 ` [PATCH v5 0/6] Introduce ZONE_CMA Aneesh Kumar K.V
2016-08-29  9:27   ` Aneesh Kumar K.V
2016-08-30  8:21   ` Joonsoo Kim
2016-08-30  8:21     ` Joonsoo Kim
2016-08-30 10:39     ` Aneesh Kumar K.V
2016-08-30 10:39       ` Aneesh Kumar K.V
2016-08-31  8:03       ` Joonsoo Kim
2016-08-31  8:03         ` Joonsoo Kim
2016-09-01  5:47         ` Aneesh Kumar K.V
2016-09-01  5:47           ` Aneesh Kumar K.V
2016-09-01  6:01           ` Joonsoo Kim
2016-09-01  6:01             ` Joonsoo Kim
2016-09-21 14:47           ` Aneesh Kumar K.V
2016-09-21 14:47             ` Aneesh Kumar K.V
2016-09-22  5:32             ` Joonsoo Kim
2016-09-22  5:32               ` Joonsoo Kim
