* [PATCHv19 00/15] Contiguous Memory Allocator
@ 2012-01-26  9:00 Marek Szyprowski
  2012-01-26  9:00 ` [PATCH 01/15] mm: page_alloc: remove trailing whitespace Marek Szyprowski
                   ` (15 more replies)
  0 siblings, 16 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

Welcome everyone!

Yes, that's true. This is yet another release of the Contiguous Memory
Allocator patches. This version mainly includes code cleanups requested
by Mel Gorman and a few minor bug fixes.

ARM integration code has not been changed since v16. It provides an
implementation of the ideas that were discussed during the Linaro Sprint
meeting in Cambourne in August 2011. Here are the details:

  This version provides a solution for complete integration of CMA with
  the DMA-mapping subsystem on the ARM architecture. The issue caused by
  double mapping of DMA pages and possible aliasing in the coherent memory
  mapping has finally been resolved, both for the GFP_ATOMIC case
  (allocations come from the coherent memory pool) and the non-GFP_ATOMIC
  case (allocations come from CMA-managed areas).

  For coherent, nommu, ARMv4 and ARMv5 systems the current DMA-mapping
  implementation has been kept.

  For ARMv6+ systems, CMA has been enabled and a special pool of coherent
  memory for atomic allocations has been created. The size of this pool
  defaults to DEFAULT_CONSISTENT_DMA_SIZE/8, but it can be changed with the
  coherent_pool kernel parameter (if really required).
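
  As an illustration only (the 2M value below is made up, not a
  recommendation), the pool size would be overridden on the kernel
  command line like this:

      coherent_pool=2M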

  All atomic allocations are served from this pool. I made a small
  simplification here, because there is no separate pool for writecombine
  memory - such requests are also served from the coherent pool. I don't
  think this simplification is a problem - I found no driver that uses
  dma_alloc_writecombine with the GFP_ATOMIC flag.

  All non-atomic allocations are served from the CMA area. Kernel mappings
  are updated to reflect the required memory attribute changes. This is
  possible because, during early boot, all CMA areas are remapped with 4KiB
  pages in kernel low memory.
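
  As a rough sketch of the two paths (this is not code from the patchset;
  'dev' stands for any valid struct device and the sizes are arbitrary),
  a driver just picks the gfp flags and the DMA-mapping code decides
  where the buffer comes from:

    #include <linux/dma-mapping.h>

    dma_addr_t dma_handle;
    void *buf, *irq_buf;

    /* Process context: the buffer comes from a CMA-managed area and the
     * kernel mapping of those pages is updated to the required
     * attributes. */
    buf = dma_alloc_coherent(dev, 1 << 20, &dma_handle, GFP_KERNEL);

    /* Atomic context (e.g. an interrupt handler): the buffer is carved
     * out of the small coherent pool reserved at boot. */
    irq_buf = dma_alloc_coherent(dev, 4096, &dma_handle, GFP_ATOMIC);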

  This version has been tested on the Samsung S5PC110-based Goni machine
  and the Exynos4-based UniversalC210 board with various V4L2 multimedia
  drivers.

  Coherent atomic allocations have been tested by manually enabling DMA
  bounce for the s3c-sdhci device.

All patches are prepared for Linux Kernel v3.3-rc1.

A few words for those who are seeing CMA for the first time:

   The Contiguous Memory Allocator (CMA) makes it possible for device
   drivers to allocate big contiguous chunks of memory after the system
   has booted. 

   The main difference from similar frameworks is that CMA allows the
   memory region reserved for big chunk allocations to be transparently
   reused as system memory, so no memory is wasted when no big chunk is
   allocated. Once an allocation request is issued, the framework
   migrates system pages to create the required big chunk of physically
   contiguous memory.
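
   A rough sketch of the intended usage (the function names follow the
   dma-contiguous interface added by this patchset, while the device and
   the sizes are made up for illustration):

     /* Board/platform code, early at boot: reserve a 16 MiB CMA area
      * for the device.  Until the device allocates from it, the page
      * allocator uses the area for movable pages. */
     dma_declare_contiguous(&foo_dev.dev, 16 << 20, 0, 0);

     /* Driver code, at run time: request 256 contiguous pages (1 MiB
      * with 4 KiB pages), aligned to an order-8 boundary.  Movable
      * pages currently occupying the area are migrated away before the
      * range is handed over. */
     struct page *pages = dma_alloc_from_contiguous(&foo_dev.dev, 256, 8);
     /* ... use the buffer ... */
     dma_release_from_contiguous(&foo_dev.dev, pages, 256);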

   For more information you can refer to the LWN articles
   http://lwn.net/Articles/447405/ and http://lwn.net/Articles/450286/,
   as well as the links to previous versions of the CMA framework below.

   The CMA framework was initially developed by Michal Nazarewicz at
   Samsung Poland R&D Center. Since version 9 I have taken over the
   development, because Michal left the company. Since version v17
   Michal has been working on the CMA patches again, and the current
   version is the result of our joint open-source effort.

TODO (optional):
- implement support for contiguous memory areas placed in the HIGHMEM zone
- resolve the issue with movable pages that have pending IO operations

Best regards
Marek Szyprowski
Samsung Poland R&D Center


Links to previous versions of the patchset:
v18: <http://www.spinics.net/lists/linux-mm/msg28125.html>
v17: <http://www.spinics.net/lists/arm-kernel/msg148499.html>
v16: <http://www.spinics.net/lists/linux-mm/msg25066.html>
v15: <http://www.spinics.net/lists/linux-mm/msg23365.html>
v14: <http://www.spinics.net/lists/linux-media/msg36536.html>
v13: (internal, intentionally not released)
v12: <http://www.spinics.net/lists/linux-media/msg35674.html>
v11: <http://www.spinics.net/lists/linux-mm/msg21868.html>
v10: <http://www.spinics.net/lists/linux-mm/msg20761.html>
 v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
 v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
 v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
 v5: (intentionally left out as CMA v5 was identical to CMA v4)
 v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
 v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
 v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
 v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>


Changelog:

v19:
    1. Addressed another set of comments and suggestions from Mel Gorman, mainly
       related to breaking patches into smaller, single-feature related chunks
       and rewriting already existing functions in memory compaction code.

    2. Completely reworked the page reclaim code: removed it from
       split_free_page() and introduced a direct call from
       alloc_contig_range().

    3. Merged a fix from Mans Rullgard for correct CMA area limit alignment.

    4. Replaced broken "mm: page_alloc: set_migratetype_isolate: drain PCP prior
       to isolating" patch with "mm: page_alloc: update migrate type of pages on
       pcp when isolating" which is another attempt to solve this issue without
       touching free_pcppages_bulk().

    5. Rebased onto v3.3-rc1

v18:
    1. Addressed comments and suggestions from Mel Gorman related to changes
       in memory compaction code, most important points:
	- removed "mm: page_alloc: handle MIGRATE_ISOLATE in free_pcppages_bulk()"
	  and moved all the logic to set_migratetype_isolate - see
	  "mm: page_alloc: set_migratetype_isolate: drain PCP prior to isolating"
	  patch
	- code in "mm: compaction: introduce isolate_{free,migrate}pages_range()"
	  patch has been simplified and improved
	- removed "mm: mmzone: introduce zone_pfn_same_memmap()" patch

    2. Fixed crash on initialization if HIGHMEM is available on ARM platforms

    3. Fixed problems with allocation of contiguous memory if all free pages
       are occupied by page cache and reclaim is required.

    4. Added a workaround for temporary migration failures (now CMA tries
       to allocate a different memory block in such a case), which greatly
       increased the reliability of CMA.

    5. Minor cleanup here and there.

    6. Rebased onto v3.2-rc7 kernel tree.

v17:
    1. Replaced the whole CMA core memory migration code with new code
       kindly provided by Michal Nazarewicz. The new code is based on the
       memory compaction framework rather than memory hotplug, as it was
       before. This change was suggested by Mel Gorman.

    2. Addressed most of the comments from Andrew Morton and Mel Gorman in
       the rest of the CMA code.

    3. Fixed broken initialization on ARM systems with DMA zone enabled.

    4. Rebased onto v3.2-rc2 kernel.

v16:
    1. merged a fixup from Michal Nazarewicz to address comments from Dave
       Hansen about checking if pfns belong to the same memory zone

    2. merged a fix from Michal Nazarewicz for incorrect handling of pages
       which belong to a page block that is in the MIGRATE_ISOLATE state;
       in very rare cases the migrate type of a page block might have been
       changed from MIGRATE_CMA to MIGRATE_MOVABLE because of this bug

    3. moved some common code to include/asm-generic

    4. added support for the x86 DMA-mapping framework for pci-dma hardware;
       CMA can now be tested even more widely on KVM/QEMU and on a lot of
       common x86 boxes

    5. rebased onto next-20111005 kernel tree, which includes changes in ARM
       DMA-mapping subsystem (CONSISTENT_DMA_SIZE removal)

    6. removed the patch for CMA s5p-fimc device private regions (it served
       only as an example) and provided one that matches a real-life case -
       the s5p-mfc device

v15:
    1. fixed calculation of the total memory after activating a CMA area
       (it had been broken since v12)

    2. more code cleanup in drivers/base/dma-contiguous.c

    3. added address limit for default CMA area

    4. rewrote ARM DMA integration:
	- removed "ARM: DMA: steal memory for DMA coherent mappings" patch
	- kept current DMA mapping implementation for coherent, nommu and
	  ARMv4/ARMv5 systems
	- enabled CMA for all ARMv6+ systems
	- added separate, small pool for coherent atomic allocations, defaults
	  to CONSISTENT_DMA_SIZE/8, but can be changed with kernel parameter
	  coherent_pool=[size]

v14:
    1. Merged with "ARM: DMA: steal memory for DMA coherent mappings" 
       patch, added support for GFP_ATOMIC allocations.

    2. Added checks for NULL device pointer

v13: (internal, intentionally not released)

v12:
    1. Fixed 2 nasty bugs in dma-contiguous allocator:
       - alignment argument was not passed correctly
       - range for dma_release_from_contiguous was not checked correctly

    2. Added support for an architecture-specific dma_contiguous_early_fixup()
       function

    3. CMA and DMA-mapping integration for the ARM architecture has been
       rewritten to take care of the memory aliasing issue that might
       happen on newer ARM CPUs (mapping the same pages with different
       cache attributes is forbidden). TODO: add support for GFP_ATOMIC
       allocations based on the "ARM: DMA: steal memory for DMA coherent
       mappings" patch and implement support for contiguous memory areas
       that are placed in the HIGHMEM zone

v11:
    1. Removed genalloc usage and replaced it with direct calls to
       bitmap_* functions, dropped patches that are not needed
       anymore (genalloc extensions)

    2. Moved all contiguous area management code from mm/cma.c
       to drivers/base/dma-contiguous.c

    3. Renamed cm_alloc/free to dma_alloc/release_from_contiguous

    4. Introduced global, system wide (default) contiguous area
       configured with kernel config and kernel cmdline parameters

    5. Simplified initialization to just one function:
       dma_declare_contiguous()

    6. Added example of device private memory contiguous area

v10:
    1. Rebased onto 3.0-rc2 and resolved all conflicts

    2. Simplified CMA to be just a pure memory allocator, for use
       with platform/bus-specific subsystems, like dma-mapping.
       Removed all device-specific functions and calls.

    3. Integrated with ARM DMA-mapping subsystem.

    4. Code cleanup here and there.

    5. Removed private context support.

v9: 1. Rebased onto 2.6.39-rc1 and resolved all conflicts

    2. Fixed a bunch of nasty bugs that happened when the allocation
       failed (mainly kernel oops due to NULL ptr dereference).

    3. Introduced testing code: cma-regions compatibility layer and
       videobuf2-cma memory allocator module.

v8: 1. The alloc_contig_range() function has now been separated from
       CMA and put in page_alloc.c.  This function tries to
       migrate all LRU pages in the specified range and then allocate the
       range using alloc_contig_freed_pages().

    2. Support for MIGRATE_CMA has been separated from the CMA code.
       I have not tested if CMA works with ZONE_MOVABLE but I see no
       reasons why it shouldn't.

    3. I have added a @private argument when creating CMA contexts so
       that one can reserve memory and not share it with the rest of
       the system.  This way, CMA acts only as an allocation algorithm.

v7: 1. A lot of functionality that handled driver->allocator_context
       mapping has been removed from the patchset.  This is not to say
       that this code is not needed, it's just not worth posting
       everything in one patchset.

       Currently, CMA is "just" an allocator.  It uses it's own
       migratetype (MIGRATE_CMA) for defining ranges of pageblokcs
       which behave just like ZONE_MOVABLE but dispite the latter can
       be put in arbitrary places.

    2. The migration code that was introduced in the previous version
       actually started working.


v6: 1. Most importantly, v6 introduces support for memory migration.
       The implementation is not yet complete though.

       Migration support means that when CMA is not using memory
       reserved for it, the page allocator can allocate pages from it.
       When CMA wants to use the memory, the pages have to be moved
       and/or evicted so as to make room for CMA.

       To make this possible, it must be guaranteed that only movable
       and reclaimable pages are allocated in CMA-controlled regions.
       This is done by introducing a MIGRATE_CMA migrate type that
       guarantees exactly that.

       Some of the migration code is "borrowed" from Kamezawa
       Hiroyuki's alloc_contig_pages() implementation.  The main
       difference is that, thanks to the MIGRATE_CMA migrate type, CMA
       assumes that memory controlled by CMA is always movable or
       reclaimable, so it makes allocation decisions regardless of
       whether some pages are actually allocated and migrates them
       if needed.

       The most interesting patches from the patchset that implement
       the functionality are:

         09/13: mm: alloc_contig_free_pages() added
         10/13: mm: MIGRATE_CMA migration type added
         11/13: mm: MIGRATE_CMA isolation functions added
         12/13: mm: cma: Migration support added [wip]

       Currently, the kernel panics in some situations, which I am
       trying to investigate.

    2. cma_pin() and cma_unpin() functions have been added (after
       a conversation with Johan Mossberg).  The idea is that whenever
       the hardware is not using the memory (no transaction is in
       progress), the chunk can be moved around.  This would allow
       defragmentation to be implemented if desired.  No defragmentation
       algorithm is provided at this time.

    3. Sysfs support has been replaced with debugfs.  I always felt
       unsure about the sysfs interface, and when Greg KH pointed it
       out I finally got around to rewriting it to use debugfs.


v5: (intentionally left out as CMA v5 was identical to CMA v4)


v4: 1. The "asterisk" flag has been removed in favour of requiring
       that platform will provide a "*=<regions>" rule in the map
       attribute.

    2. The terminology has been changed slightly, renaming "kind" of
       memory to "type" of memory.  In the previous revisions the
       documentation indicated that device drivers define memory kinds;
       now they define memory types.

v3: 1. The command line parameters have been removed (and moved to
       a separate patch, the fourth one).  As a consequence, the
       cma_set_defaults() function has been changed -- it no longer
       accepts a string with list of regions but an array of regions.

    2. The "asterisk" attribute has been removed.  Now, each region
       has an "asterisk" flag which lets one specify whether this
       region should by considered "asterisk" region.

    3. SysFS support has been moved to a separate patch (the third one
       in the series) and now also includes list of regions.

v2: 1. The "cma_map" command line have been removed.  In exchange,
       a SysFS entry has been created under kernel/mm/contiguous.

       The intended way of specifying the attributes is
       the cma_set_defaults() function called by platform initialisation
       code.  The "regions" attribute (the string specified by the "cma"
       command line parameter) can be overridden with a command line
       parameter; the other attributes can be changed at run-time
       using the SysFS entries.

    2. The behaviour of the "map" attribute has been modified
       slightly.  Currently, if no rule matches given device it is
       assigned regions specified by the "asterisk" attribute.  It is
       by default built from the region names given in "regions"
       attribute.

    3. Devices can register private regions as well as regions that
       can be shared but are not reserved using standard CMA
       mechanisms.  A private region has no name and can be accessed
       only by devices that have the pointer to it.

    4. The way allocators are registered has changed.  Currently,
       a cma_allocator_register() function is used for that purpose.
       Moreover, allocators are attached to regions the first time
       memory is registered from the region or when the allocator is
       registered, which means that allocators can be dynamic modules
       that are loaded after the kernel has booted (of course, it won't
       be possible to allocate a chunk of memory from a region if its
       allocator is not loaded).

    5. Index of new functions:

    +static inline dma_addr_t __must_check
    +cma_alloc_from(const char *regions, size_t size,
    +               dma_addr_t alignment)

    +static inline int
    +cma_info_about(struct cma_info *info, const char *regions)

    +int __must_check cma_region_register(struct cma_region *reg);

    +dma_addr_t __must_check
    +cma_alloc_from_region(struct cma_region *reg,
    +                      size_t size, dma_addr_t alignment);

    +static inline dma_addr_t __must_check
    +cma_alloc_from(const char *regions,
    +               size_t size, dma_addr_t alignment);

    +int cma_allocator_register(struct cma_allocator *alloc);


Patches in this patchset:

Marek Szyprowski (6):
  mm: extract reclaim code from __alloc_pages_direct_reclaim()
  mm: trigger page reclaim in alloc_contig_range() to stabilize
    watermarks
  drivers: add Contiguous Memory Allocator
  X86: integrate CMA with DMA-mapping subsystem
  ARM: integrate CMA with DMA-mapping subsystem
  ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device

Michal Nazarewicz (9):
  mm: page_alloc: remove trailing whitespace
  mm: page_alloc: update migrate type of pages on pcp when isolating
  mm: compaction: introduce isolate_migratepages_range().
  mm: compaction: introduce isolate_freepages_range()
  mm: compaction: export some of the functions
  mm: page_alloc: introduce alloc_contig_range()
  mm: page_alloc: change fallbacks array handling
  mm: mmzone: MIGRATE_CMA migration type added
  mm: page_isolation: MIGRATE_CMA isolation functions added

 Documentation/kernel-parameters.txt   |    9 +
 arch/Kconfig                          |    3 +
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   16 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/kernel/setup.c               |    9 +-
 arch/arm/mm/dma-mapping.c             |  368 ++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |   22 ++-
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   31 ++-
 arch/arm/plat-s5p/dev-mfc.c           |   51 +----
 arch/x86/Kconfig                      |    1 +
 arch/x86/include/asm/dma-contiguous.h |   13 +
 arch/x86/include/asm/dma-mapping.h    |    4 +
 arch/x86/kernel/pci-dma.c             |   18 ++-
 arch/x86/kernel/pci-nommu.c           |    8 +-
 arch/x86/kernel/setup.c               |    2 +
 drivers/base/Kconfig                  |   89 +++++++
 drivers/base/Makefile                 |    1 +
 drivers/base/dma-contiguous.c         |  404 ++++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h  |   27 +++
 include/linux/device.h                |    4 +
 include/linux/dma-contiguous.h        |  110 +++++++++
 include/linux/mmzone.h                |   43 +++-
 include/linux/page-isolation.h        |   35 ++-
 mm/Kconfig                            |    2 +-
 mm/Makefile                           |    3 +-
 mm/compaction.c                       |  414 +++++++++++++++++++++------------
 mm/internal.h                         |   33 +++
 mm/memory-failure.c                   |    2 +-
 mm/memory_hotplug.c                   |    6 +-
 mm/page_alloc.c                       |  355 +++++++++++++++++++++++++---
 mm/page_isolation.c                   |   39 +++-
 mm/vmstat.c                           |    3 +
 34 files changed, 1770 insertions(+), 361 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h
 create mode 100644 arch/x86/include/asm/dma-contiguous.h
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

-- 
1.7.1.569.g6f426



* [PATCH 01/15] mm: page_alloc: remove trailing whitespace
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 10:59   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating Marek Szyprowski
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 mm/page_alloc.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0027d8f..e1c5656 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -513,10 +513,10 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
  * free pages of length of (1 << order) and marked with _mapcount -2. Page's
  * order is recorded in page_private(page) field.
  * So when we are allocating or freeing one, we can derive the state of the
- * other.  That is, if we allocate a small block, and both were   
- * free, the remainder of the region must be split into blocks.   
+ * other.  That is, if we allocate a small block, and both were
+ * free, the remainder of the region must be split into blocks.
  * If a block is freed, and its buddy is also free, then this
- * triggers coalescing into a block of larger size.            
+ * triggers coalescing into a block of larger size.
  *
  * -- wli
  */
@@ -1061,17 +1061,17 @@ retry_reserve:
 	return page;
 }
 
-/* 
+/*
  * Obtain a specified number of elements from the buddy allocator, all under
  * a single hold of the lock, for efficiency.  Add them to the supplied list.
  * Returns the number of new pages which were placed at *list.
  */
-static int rmqueue_bulk(struct zone *zone, unsigned int order, 
+static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
 			int migratetype, int cold)
 {
 	int i;
-	
+
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype);
@@ -4258,7 +4258,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 	init_waitqueue_head(&pgdat->kswapd_wait);
 	pgdat->kswapd_max_order = 0;
 	pgdat_page_cgroup_init(pgdat);
-	
+
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize, memmap_pages;
@@ -5081,11 +5081,11 @@ int __meminit init_per_zone_wmark_min(void)
 module_init(init_per_zone_wmark_min)
 
 /*
- * min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so 
+ * min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so
  *	that we can call two helper functions whenever min_free_kbytes
  *	changes.
  */
-int min_free_kbytes_sysctl_handler(ctl_table *table, int write, 
+int min_free_kbytes_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
 	proc_dointvec(table, write, buffer, length, ppos);
-- 
1.7.1.569.g6f426



* [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
  2012-01-26  9:00 ` [PATCH 01/15] mm: page_alloc: remove trailing whitespace Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 11:15   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range() Marek Szyprowski
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

This commit changes set_migratetype_isolate() so that it updates the
migrate type of pages on the pcp list, which is saved in their
page_private.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 include/linux/page-isolation.h |    6 ++++++
 mm/page_alloc.c                |    1 +
 mm/page_isolation.c            |   24 ++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 051c1b1..8c02c2b 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -27,6 +27,12 @@ extern int
 test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
+ * Check all pages in pageblock, find the ones on pcp list, and set
+ * their page_private to MIGRATE_ISOLATE.
+ */
+extern void update_pcp_isolate_block(unsigned long pfn);
+
+/*
  * Internal funcs.Changes pageblock's migrate type.
  * Please use make_pagetype_isolated()/make_pagetype_movable().
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e1c5656..70709e7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5465,6 +5465,7 @@ out:
 	if (!ret) {
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 		move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		update_pcp_isolate_block(pfn);
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 4ae42bb..9ea2f6e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
+
+/* must hold zone->lock */
+void update_pcp_isolate_block(unsigned long pfn)
+{
+	unsigned long end_pfn = pfn + pageblock_nr_pages;
+	struct page *page;
+
+	while (pfn < end_pfn) {
+		if (!pfn_valid_within(pfn)) {
+			++pfn;
+			continue;
+		}
+
+		page = pfn_to_page(pfn);
+		if (PageBuddy(page)) {
+			pfn += 1 << page_order(page);
+		} else if (page_count(page) == 0) {
+			set_page_private(page, MIGRATE_ISOLATE);
+			++pfn;
+		} else {
+			++pfn;
+		}
+	}
+}
-- 
1.7.1.569.g6f426



* [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
  2012-01-26  9:00 ` [PATCH 01/15] mm: page_alloc: remove trailing whitespace Marek Szyprowski
  2012-01-26  9:00 ` [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 11:24   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 04/15] mm: compaction: introduce isolate_freepages_range() Marek Szyprowski
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

This commit introduces the isolate_migratepages_range() function, which
extracts functionality from isolate_migratepages() so that it can be
used on arbitrary PFN ranges.

The isolate_migratepages() function is implemented as a simple wrapper
around isolate_migratepages_range().

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 mm/compaction.c |   77 +++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 71a58f6..a42bbdd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -250,31 +250,34 @@ typedef enum {
 	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
 } isolate_migrate_t;
 
-/*
- * Isolate all pages that can be migrated from the block pointed to by
- * the migrate scanner within compact_control.
+/**
+ * isolate_migratepages_range() - isolate all migrate-able pages in range.
+ * @zone:	Zone pages are in.
+ * @cc:		Compaction control structure.
+ * @low_pfn:	The first PFN of the range.
+ * @end_pfn:	The one-past-the-last PFN of the range.
+ *
+ * Isolate all pages that can be migrated from the range specified by
+ * [low_pfn, end_pfn).  Returns zero if there is a fatal signal
+ * pending, otherwise PFN of the first page that was not scanned
+ * (which may be less than, equal to or greater than end_pfn).
+ *
+ * Assumes that cc->migratepages is empty and cc->nr_migratepages is
+ * zero.
+ *
+ * Apart from cc->migratepages and cc->nr_migratepages this function
+ * does not modify any cc's fields, in particular it does not modify
+ * (or read for that matter) cc->migrate_pfn.
  */
-static isolate_migrate_t isolate_migratepages(struct zone *zone,
-					struct compact_control *cc)
+static unsigned long
+isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
+			   unsigned long low_pfn, unsigned long end_pfn)
 {
-	unsigned long low_pfn, end_pfn;
 	unsigned long last_pageblock_nr = 0, pageblock_nr;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct list_head *migratelist = &cc->migratepages;
 	isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE;
 
-	/* Do not scan outside zone boundaries */
-	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
-
-	/* Only scan within a pageblock boundary */
-	end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
-
-	/* Do not cross the free scanner or scan within a memory hole */
-	if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
-		cc->migrate_pfn = end_pfn;
-		return ISOLATE_NONE;
-	}
-
 	/*
 	 * Ensure that there are not too many pages isolated from the LRU
 	 * list by either parallel reclaimers or compaction. If there are,
@@ -283,12 +286,12 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
 		if (!cc->sync)
-			return ISOLATE_ABORT;
+			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		if (fatal_signal_pending(current))
-			return ISOLATE_ABORT;
+			return 0;
 	}
 
 	/* Time to isolate some pages for migration */
@@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		} else if (!locked)
 			spin_lock_irq(&zone->lru_lock);
 
-		if (!pfn_valid_within(low_pfn))
+		if (!pfn_valid(low_pfn))
 			continue;
 		nr_scanned++;
 
@@ -374,10 +377,40 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	acct_isolated(zone, cc);
 
 	spin_unlock_irq(&zone->lru_lock);
-	cc->migrate_pfn = low_pfn;
 
 	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
 
+	return low_pfn;
+}
+
+/*
+ * Isolate all pages that can be migrated from the block pointed to by
+ * the migrate scanner within compact_control.
+ */
+static isolate_migrate_t isolate_migratepages(struct zone *zone,
+					struct compact_control *cc)
+{
+	unsigned long low_pfn, end_pfn;
+
+	/* Do not scan outside zone boundaries */
+	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
+
+	/* Only scan within a pageblock boundary */
+	end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
+
+	/* Do not cross the free scanner or scan within a memory hole */
+	if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
+		cc->migrate_pfn = end_pfn;
+		return ISOLATE_NONE;
+	}
+
+	/* Perform the isolation */
+	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
+	if (!low_pfn)
+		return ISOLATE_ABORT;
+
+	cc->migrate_pfn = low_pfn;
+
 	return ISOLATE_SUCCESS;
 }
 
-- 
1.7.1.569.g6f426



* [PATCH 04/15] mm: compaction: introduce isolate_freepages_range()
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (2 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range() Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 11:48   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 05/15] mm: compaction: export some of the functions Marek Szyprowski
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

This commit introduces the isolate_freepages_range() function, which
generalises isolate_freepages_block() so that it can be used on
arbitrary PFN ranges.

isolate_freepages_block() is left with only minor changes.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 mm/compaction.c |  118 ++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 100 insertions(+), 18 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a42bbdd..63f82be 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -54,24 +54,20 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return count;
 }
 
-/* Isolate free pages onto a private freelist. Must hold zone->lock */
-static unsigned long isolate_freepages_block(struct zone *zone,
-				unsigned long blockpfn,
-				struct list_head *freelist)
+/*
+ * Isolate free pages onto a private freelist. Caller must hold zone->lock.
+ * If @strict is true, will abort returning 0 on any invalid PFNs or non-free
+ * pages inside of the pageblock (even though it may still end up isolating
+ * some pages).
+ */
+static unsigned long isolate_freepages_block(unsigned long blockpfn,
+				unsigned long end_pfn,
+				struct list_head *freelist,
+				bool strict)
 {
-	unsigned long zone_end_pfn, end_pfn;
 	int nr_scanned = 0, total_isolated = 0;
 	struct page *cursor;
 
-	/* Get the last PFN we should scan for free pages at */
-	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
-
-	/* Find the first usable PFN in the block to initialse page cursor */
-	for (; blockpfn < end_pfn; blockpfn++) {
-		if (pfn_valid_within(blockpfn))
-			break;
-	}
 	cursor = pfn_to_page(blockpfn);
 
 	/* Isolate free pages. This assumes the block is valid */
@@ -79,15 +75,23 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 		int isolated, i;
 		struct page *page = cursor;
 
-		if (!pfn_valid_within(blockpfn))
+		if (!pfn_valid_within(blockpfn)) {
+			if (strict)
+				return 0;
 			continue;
+		}
 		nr_scanned++;
 
-		if (!PageBuddy(page))
+		if (!PageBuddy(page)) {
+			if (strict)
+				return 0;
 			continue;
+		}
 
 		/* Found a free page, break it into order-0 pages */
 		isolated = split_free_page(page);
+		if (!isolated && strict)
+			return 0;
 		total_isolated += isolated;
 		for (i = 0; i < isolated; i++) {
 			list_add(&page->lru, freelist);
@@ -105,6 +109,80 @@ static unsigned long isolate_freepages_block(struct zone *zone,
 	return total_isolated;
 }
 
+/**
+ * isolate_freepages_range() - isolate free pages.
+ * @start_pfn: The first PFN to start isolating.
+ * @end_pfn:   The one-past-last PFN.
+ *
+ * Non-free pages, invalid PFNs, or zone boundaries within the
+ * [start_pfn, end_pfn) range are considered errors, cause the function to
+ * undo its actions and return zero.
+ *
+ * Otherwise, the function returns the one-past-the-last PFN of the isolated
+ * pages (which may be greater than end_pfn if the end fell in the middle of
+ * a free page).
+ */
+static unsigned long
+isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long isolated, pfn, block_end_pfn, flags;
+	struct zone *zone = NULL;
+	LIST_HEAD(freelist);
+	struct page *page;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
+		if (!pfn_valid(pfn))
+			break;
+
+		if (!zone)
+			zone = page_zone(pfn_to_page(pfn));
+		else if (zone != page_zone(pfn_to_page(pfn)))
+			break;
+
+		/*
+		 * On subsequent iterations round_down() is actually not
+		 * needed, but we keep it so as not to complicate the code.
+		 */
+		block_end_pfn = round_down(pfn, pageblock_nr_pages)
+			+ pageblock_nr_pages;
+		block_end_pfn = min(block_end_pfn, end_pfn);
+
+		spin_lock_irqsave(&zone->lock, flags);
+		isolated = isolate_freepages_block(pfn, block_end_pfn,
+						   &freelist, true);
+		spin_unlock_irqrestore(&zone->lock, flags);
+
+		/*
+		 * In strict mode, isolate_freepages_block() returns 0 if
+		 * there are any holes in the block (ie. invalid PFNs or
+		 * non-free pages).
+		 */
+		if (!isolated)
+			break;
+
+		/*
+		 * If we managed to isolate pages, it is always (1 << n) *
+		 * pageblock_nr_pages for some non-negative n.  (Max order
+		 * page may span two pageblocks).
+		 */
+	}
+
+	/* split_free_page does not map the pages */
+	list_for_each_entry(page, &freelist, lru) {
+		arch_alloc_page(page, 0);
+		kernel_map_pages(page, 1, 1);
+	}
+
+	if (pfn < end_pfn) {
+		/* Loop terminated early, cleanup. */
+		release_freepages(&freelist);
+		return 0;
+	}
+
+	/* We don't use freelists for anything. */
+	return pfn;
+}
+
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct page *page)
 {
@@ -135,7 +213,7 @@ static void isolate_freepages(struct zone *zone,
 				struct compact_control *cc)
 {
 	struct page *page;
-	unsigned long high_pfn, low_pfn, pfn;
+	unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
 	unsigned long flags;
 	int nr_freepages = cc->nr_freepages;
 	struct list_head *freelist = &cc->freepages;
@@ -155,6 +233,8 @@ static void isolate_freepages(struct zone *zone,
 	 */
 	high_pfn = min(low_pfn, pfn);
 
+	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+
 	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
@@ -191,7 +271,9 @@ static void isolate_freepages(struct zone *zone,
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
 		if (suitable_migration_target(page)) {
-			isolated = isolate_freepages_block(zone, pfn, freelist);
+			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+			isolated = isolate_freepages_block(pfn, end_pfn,
+							   freelist, false);
 			nr_freepages += isolated;
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
-- 
1.7.1.569.g6f426



* [PATCH 05/15] mm: compaction: export some of the functions
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (3 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 04/15] mm: compaction: introduce isolate_freepages_range() Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 11:57   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 06/15] mm: page_alloc: introduce alloc_contig_range() Marek Szyprowski
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

This commit exports some of the functions from the compaction.c file,
adding their declarations to the internal.h header file so that other
mm-related code can use them.

This forces compaction.c to always be compiled (as opposed to being
compiled only if CONFIG_COMPACTION is defined) but, to avoid
introducing code that the user did not ask for, part of compaction.c
is now wrapped in an #ifdef.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 mm/Makefile     |    3 +-
 mm/compaction.c |  314 ++++++++++++++++++++++++++-----------------------------
 mm/internal.h   |   33 ++++++
 3 files changed, 184 insertions(+), 166 deletions(-)

diff --git a/mm/Makefile b/mm/Makefile
index 50ec00e..8aada89 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -13,7 +13,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
 			   page_isolation.o mm_init.o mmu_context.o percpu.o \
-			   $(mmu-y)
+			   compaction.o $(mmu-y)
 obj-y += init-mm.o
 
 ifdef CONFIG_NO_BOOTMEM
@@ -32,7 +32,6 @@ obj-$(CONFIG_NUMA) 	+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
-obj-$(CONFIG_COMPACTION) += compaction.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
diff --git a/mm/compaction.c b/mm/compaction.c
index 63f82be..3e21d28 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -16,30 +16,11 @@
 #include <linux/sysfs.h>
 #include "internal.h"
 
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/compaction.h>
 
-/*
- * compact_control is used to track pages being migrated and the free pages
- * they are being migrated to during memory compaction. The free_pfn starts
- * at the end of a zone and migrate_pfn begins at the start. Movable pages
- * are moved to the end of a zone during a compaction run and the run
- * completes when free_pfn <= migrate_pfn
- */
-struct compact_control {
-	struct list_head freepages;	/* List of free pages to migrate to */
-	struct list_head migratepages;	/* List of pages being migrated */
-	unsigned long nr_freepages;	/* Number of isolated free pages */
-	unsigned long nr_migratepages;	/* Number of pages to migrate */
-	unsigned long free_pfn;		/* isolate_freepages search base */
-	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
-
-	unsigned int order;		/* order a direct compactor needs */
-	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
-	struct zone *zone;
-};
-
 static unsigned long release_freepages(struct list_head *freelist)
 {
 	struct page *page, *next;
@@ -122,7 +103,7 @@ static unsigned long isolate_freepages_block(unsigned long blockpfn,
  * pages (which may be greater than end_pfn if the end fell in the middle of
  * a free page).
  */
-static unsigned long
+unsigned long
 isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long isolated, pfn, block_end_pfn, flags;
@@ -183,120 +164,6 @@ isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
 	return pfn;
 }
 
-/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
-{
-
-	int migratetype = get_pageblock_migratetype(page);
-
-	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
-	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
-		return false;
-
-	/* If the page is a large free page, then allow migration */
-	if (PageBuddy(page) && page_order(page) >= pageblock_order)
-		return true;
-
-	/* If the block is MIGRATE_MOVABLE, allow migration */
-	if (migratetype == MIGRATE_MOVABLE)
-		return true;
-
-	/* Otherwise skip the block */
-	return false;
-}
-
-/*
- * Based on information in the current compact_control, find blocks
- * suitable for isolating free pages from and then isolate them.
- */
-static void isolate_freepages(struct zone *zone,
-				struct compact_control *cc)
-{
-	struct page *page;
-	unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
-	unsigned long flags;
-	int nr_freepages = cc->nr_freepages;
-	struct list_head *freelist = &cc->freepages;
-
-	/*
-	 * Initialise the free scanner. The starting point is where we last
-	 * scanned from (or the end of the zone if starting). The low point
-	 * is the end of the pageblock the migration scanner is using.
-	 */
-	pfn = cc->free_pfn;
-	low_pfn = cc->migrate_pfn + pageblock_nr_pages;
-
-	/*
-	 * Take care that if the migration scanner is at the end of the zone
-	 * that the free scanner does not accidentally move to the next zone
-	 * in the next isolation cycle.
-	 */
-	high_pfn = min(low_pfn, pfn);
-
-	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-
-	/*
-	 * Isolate free pages until enough are available to migrate the
-	 * pages on cc->migratepages. We stop searching if the migrate
-	 * and free page scanners meet or enough free pages are isolated.
-	 */
-	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
-					pfn -= pageblock_nr_pages) {
-		unsigned long isolated;
-
-		if (!pfn_valid(pfn))
-			continue;
-
-		/*
-		 * Check for overlapping nodes/zones. It's possible on some
-		 * configurations to have a setup like
-		 * node0 node1 node0
-		 * i.e. it's possible that all pages within a zones range of
-		 * pages do not belong to a single zone.
-		 */
-		page = pfn_to_page(pfn);
-		if (page_zone(page) != zone)
-			continue;
-
-		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
-			continue;
-
-		/*
-		 * Found a block suitable for isolating free pages from. Now
-		 * we disabled interrupts, double check things are ok and
-		 * isolate the pages. This is to minimise the time IRQs
-		 * are disabled
-		 */
-		isolated = 0;
-		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
-			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
-			isolated = isolate_freepages_block(pfn, end_pfn,
-							   freelist, false);
-			nr_freepages += isolated;
-		}
-		spin_unlock_irqrestore(&zone->lock, flags);
-
-		/*
-		 * Record the highest PFN we isolated pages from. When next
-		 * looking for free pages, the search will restart here as
-		 * page migration may have returned some pages to the allocator
-		 */
-		if (isolated)
-			high_pfn = max(high_pfn, pfn);
-	}
-
-	/* split_free_page does not map the pages */
-	list_for_each_entry(page, freelist, lru) {
-		arch_alloc_page(page, 0);
-		kernel_map_pages(page, 1, 1);
-	}
-
-	cc->free_pfn = high_pfn;
-	cc->nr_freepages = nr_freepages;
-}
-
 /* Update the number of anon and file isolated pages in the zone */
 static void acct_isolated(struct zone *zone, struct compact_control *cc)
 {
@@ -325,13 +192,6 @@ static bool too_many_isolated(struct zone *zone)
 	return isolated > (inactive + active) / 2;
 }
 
-/* possible outcome of isolate_migratepages */
-typedef enum {
-	ISOLATE_ABORT,		/* Abort compaction now */
-	ISOLATE_NONE,		/* No pages isolated, continue scanning */
-	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
-} isolate_migrate_t;
-
 /**
  * isolate_migratepages_range() - isolate all migrate-able pages in range.
  * @zone:	Zone pages are in.
@@ -351,7 +211,7 @@ typedef enum {
  * does not modify any cc's fields, in particular it does not modify
  * (or read for that matter) cc->migrate_pfn.
  */
-static unsigned long
+unsigned long
 isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			   unsigned long low_pfn, unsigned long end_pfn)
 {
@@ -465,35 +325,121 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 	return low_pfn;
 }
 
+#endif /* CONFIG_COMPACTION || CONFIG_CMA */
+#ifdef CONFIG_COMPACTION
+
+/* Returns true if the page is within a block suitable for migration to */
+static bool suitable_migration_target(struct page *page)
+{
+
+	int migratetype = get_pageblock_migratetype(page);
+
+	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
+	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
+		return false;
+
+	/* If the page is a large free page, then allow migration */
+	if (PageBuddy(page) && page_order(page) >= pageblock_order)
+		return true;
+
+	/* If the block is MIGRATE_MOVABLE, allow migration */
+	if (migratetype == MIGRATE_MOVABLE)
+		return true;
+
+	/* Otherwise skip the block */
+	return false;
+}
+
 /*
- * Isolate all pages that can be migrated from the block pointed to by
- * the migrate scanner within compact_control.
+ * Based on information in the current compact_control, find blocks
+ * suitable for isolating free pages from and then isolate them.
  */
-static isolate_migrate_t isolate_migratepages(struct zone *zone,
-					struct compact_control *cc)
+static void isolate_freepages(struct zone *zone,
+				struct compact_control *cc)
 {
-	unsigned long low_pfn, end_pfn;
+	struct page *page;
+	unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
+	unsigned long flags;
+	int nr_freepages = cc->nr_freepages;
+	struct list_head *freelist = &cc->freepages;
 
-	/* Do not scan outside zone boundaries */
-	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
+	/*
+	 * Initialise the free scanner. The starting point is where we last
+	 * scanned from (or the end of the zone if starting). The low point
+	 * is the end of the pageblock the migration scanner is using.
+	 */
+	pfn = cc->free_pfn;
+	low_pfn = cc->migrate_pfn + pageblock_nr_pages;
 
-	/* Only scan within a pageblock boundary */
-	end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
+	/*
+	 * Take care that if the migration scanner is at the end of the zone
+	 * that the free scanner does not accidentally move to the next zone
+	 * in the next isolation cycle.
+	 */
+	high_pfn = min(low_pfn, pfn);
 
-	/* Do not cross the free scanner or scan within a memory hole */
-	if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
-		cc->migrate_pfn = end_pfn;
-		return ISOLATE_NONE;
-	}
+	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 
-	/* Perform the isolation */
-	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
-	if (!low_pfn)
-		return ISOLATE_ABORT;
+	/*
+	 * Isolate free pages until enough are available to migrate the
+	 * pages on cc->migratepages. We stop searching if the migrate
+	 * and free page scanners meet or enough free pages are isolated.
+	 */
+	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
+					pfn -= pageblock_nr_pages) {
+		unsigned long isolated;
 
-	cc->migrate_pfn = low_pfn;
+		if (!pfn_valid(pfn))
+			continue;
 
-	return ISOLATE_SUCCESS;
+		/*
+		 * Check for overlapping nodes/zones. It's possible on some
+		 * configurations to have a setup like
+		 * node0 node1 node0
+		 * i.e. it's possible that all pages within a zones range of
+		 * pages do not belong to a single zone.
+		 */
+		page = pfn_to_page(pfn);
+		if (page_zone(page) != zone)
+			continue;
+
+		/* Check the block is suitable for migration */
+		if (!suitable_migration_target(page))
+			continue;
+
+		/*
+		 * Found a block suitable for isolating free pages from. Now
+		 * we disabled interrupts, double check things are ok and
+		 * isolate the pages. This is to minimise the time IRQs
+		 * are disabled
+		 */
+		isolated = 0;
+		spin_lock_irqsave(&zone->lock, flags);
+		if (suitable_migration_target(page)) {
+			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+			isolated = isolate_freepages_block(pfn, end_pfn,
+							   freelist, false);
+			nr_freepages += isolated;
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+
+		/*
+		 * Record the highest PFN we isolated pages from. When next
+		 * looking for free pages, the search will restart here as
+		 * page migration may have returned some pages to the allocator
+		 */
+		if (isolated)
+			high_pfn = max(high_pfn, pfn);
+	}
+
+	/* split_free_page does not map the pages */
+	list_for_each_entry(page, freelist, lru) {
+		arch_alloc_page(page, 0);
+		kernel_map_pages(page, 1, 1);
+	}
+
+	cc->free_pfn = high_pfn;
+	cc->nr_freepages = nr_freepages;
 }
 
 /*
@@ -542,6 +488,44 @@ static void update_nr_listpages(struct compact_control *cc)
 	cc->nr_freepages = nr_freepages;
 }
 
+/* possible outcome of isolate_migratepages */
+typedef enum {
+	ISOLATE_ABORT,		/* Abort compaction now */
+	ISOLATE_NONE,		/* No pages isolated, continue scanning */
+	ISOLATE_SUCCESS,	/* Pages isolated, migrate */
+} isolate_migrate_t;
+
+/*
+ * Isolate all pages that can be migrated from the block pointed to by
+ * the migrate scanner within compact_control.
+ */
+static isolate_migrate_t isolate_migratepages(struct zone *zone,
+					struct compact_control *cc)
+{
+	unsigned long low_pfn, end_pfn;
+
+	/* Do not scan outside zone boundaries */
+	low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
+
+	/* Only scan within a pageblock boundary */
+	end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
+
+	/* Do not cross the free scanner or scan within a memory hole */
+	if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
+		cc->migrate_pfn = end_pfn;
+		return ISOLATE_NONE;
+	}
+
+	/* Perform the isolation */
+	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
+	if (!low_pfn)
+		return ISOLATE_ABORT;
+
+	cc->migrate_pfn = low_pfn;
+
+	return ISOLATE_SUCCESS;
+}
+
 static int compact_finished(struct zone *zone,
 			    struct compact_control *cc)
 {
@@ -859,3 +843,5 @@ void compaction_unregister_node(struct node *node)
 	return device_remove_file(&node->dev, &dev_attr_compact);
 }
 #endif /* CONFIG_SYSFS && CONFIG_NUMA */
+
+#endif /* CONFIG_COMPACTION */
diff --git a/mm/internal.h b/mm/internal.h
index 2189af4..55e7eed 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -100,6 +100,39 @@ extern void prep_compound_page(struct page *page, unsigned long order);
 extern bool is_free_buddy_page(struct page *page);
 #endif
 
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+
+/*
+ * in mm/compaction.c
+ */
+/*
+ * compact_control is used to track pages being migrated and the free pages
+ * they are being migrated to during memory compaction. The free_pfn starts
+ * at the end of a zone and migrate_pfn begins at the start. Movable pages
+ * are moved to the end of a zone during a compaction run and the run
+ * completes when free_pfn <= migrate_pfn
+ */
+struct compact_control {
+	struct list_head freepages;	/* List of free pages to migrate to */
+	struct list_head migratepages;	/* List of pages being migrated */
+	unsigned long nr_freepages;	/* Number of isolated free pages */
+	unsigned long nr_migratepages;	/* Number of pages to migrate */
+	unsigned long free_pfn;		/* isolate_freepages search base */
+	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	bool sync;			/* Synchronous migration */
+
+	unsigned int order;		/* order a direct compactor needs */
+	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
+	struct zone *zone;
+};
+
+unsigned long
+isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn);
+unsigned long
+isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
+			   unsigned long low_pfn, unsigned long end_pfn);
+
+#endif
 
 /*
  * function for dealing with page's order in buddy system.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 06/15] mm: page_alloc: introduce alloc_contig_range()
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (4 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 05/15] mm: compaction: export some of the functions Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 12:11   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 07/15] mm: page_alloc: change fallbacks array handling Marek Szyprowski
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

This commit adds the alloc_contig_range() function, which tries
to allocate a given range of pages.  It tries to migrate all
already-allocated pages that fall within the range, thus freeing them.
Once all pages in the range are free, they are removed from the
buddy system and thus allocated for the caller to use.
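
A minimal usage sketch (not part of this patch; grab_contig_block() and
the PFN values are hypothetical) of the two-argument interface added
here; a later patch in the series adds a migratetype argument:

	static int grab_contig_block(unsigned long start_pfn, unsigned long nr_pages)
	{
		int err;

		/* e.g. start_pfn = 0x40000, nr_pages = 1024 (4 MiB of 4 KiB pages) */
		err = alloc_contig_range(start_pfn, start_pfn + nr_pages);
		if (err)
			return err;	/* range busy or migration failed */

		/* use pfn_to_page(start_pfn) .. pfn_to_page(start_pfn + nr_pages - 1) */

		free_contig_range(start_pfn, nr_pages);
		return 0;
	}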

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 include/linux/page-isolation.h |    7 ++
 mm/page_alloc.c                |  183 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 190 insertions(+), 0 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 8c02c2b..430cf61 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -39,5 +39,12 @@ extern void update_pcp_isolate_block(unsigned long pfn);
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 
+#ifdef CONFIG_CMA
+
+/* The below functions must be run on a range from a single zone. */
+extern int alloc_contig_range(unsigned long start, unsigned long end);
+extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
+
+#endif
 
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 70709e7..b4f50532 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -57,6 +57,7 @@
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
 #include <linux/prefetch.h>
+#include <linux/migrate.h>
 #include <linux/page-debug-flags.h>
 
 #include <asm/tlbflush.h>
@@ -5488,6 +5489,188 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+#ifdef CONFIG_CMA
+
+static unsigned long pfn_align_to_maxpage_down(unsigned long pfn)
+{
+	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
+}
+
+static unsigned long pfn_align_to_maxpage_up(unsigned long pfn)
+{
+	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
+}
+
+static struct page *
+__alloc_contig_migrate_alloc(struct page *page, unsigned long private,
+			     int **resultp)
+{
+	return alloc_page(GFP_HIGHUSER_MOVABLE);
+}
+
+/* [start, end) must belong to a single zone. */
+static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
+{
+	/* This function is based on compact_zone() from compaction.c. */
+
+	unsigned long pfn = start;
+	unsigned int tries = 0;
+	int ret = 0;
+
+	struct compact_control cc = {
+		.nr_migratepages = 0,
+		.order = -1,
+		.zone = page_zone(pfn_to_page(start)),
+		.sync = true,
+	};
+	INIT_LIST_HEAD(&cc.migratepages);
+
+	migrate_prep_local();
+
+	while (pfn < end || !list_empty(&cc.migratepages)) {
+		if (fatal_signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		if (list_empty(&cc.migratepages)) {
+			cc.nr_migratepages = 0;
+			pfn = isolate_migratepages_range(cc.zone, &cc,
+							 pfn, end);
+			if (!pfn) {
+				ret = -EINTR;
+				break;
+			}
+			tries = 0;
+		} else if (++tries == 5) {
+			ret = ret < 0 ? ret : -EBUSY;
+			break;
+		}
+
+		ret = migrate_pages(&cc.migratepages,
+				    __alloc_contig_migrate_alloc,
+				    0, false, true);
+	}
+
+	putback_lru_pages(&cc.migratepages);
+	return ret;
+}
+
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start:	start PFN to allocate
+ * @end:	one-past-the-last PFN to allocate
+ *
+ * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
+ * aligned, however it's the caller's responsibility to guarantee that
+ * we are the only thread that changes migrate type of pageblocks the
+ * pages fall in.
+ *
+ * The PFN range must belong to a single zone.
+ *
+ * Returns zero on success or negative error code.  On success all
+ * pages which PFN is in [start, end) are allocated for the caller and
+ * need to be freed with free_contig_range().
+ */
+int alloc_contig_range(unsigned long start, unsigned long end)
+{
+	unsigned long outer_start, outer_end;
+	int ret = 0, order;
+
+	/*
+	 * What we do here is we mark all pageblocks in range as
+	 * MIGRATE_ISOLATE.  Because of the way page allocator work, we
+	 * align the range to MAX_ORDER pages so that page allocator
+	 * won't try to merge buddies from different pageblocks and
+	 * change MIGRATE_ISOLATE to some other migration type.
+	 *
+	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
+	 * migrate the pages from an unaligned range (ie. pages that
+	 * we are interested in).  This will put all the pages in
+	 * range back to page allocator as MIGRATE_ISOLATE.
+	 *
+	 * When this is done, we take the pages in range from page
+	 * allocator removing them from the buddy system.  This way
+	 * page allocator will never consider using them.
+	 *
+	 * This lets us mark the pageblocks back as
+	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
+	 * MAX_ORDER aligned range but not in the unaligned, original
+	 * range are put back to page allocator so that buddy can use
+	 * them.
+	 */
+
+	ret = start_isolate_page_range(pfn_align_to_maxpage_down(start),
+				       pfn_align_to_maxpage_up(end));
+	if (ret)
+		goto done;
+
+	ret = __alloc_contig_migrate_range(start, end);
+	if (ret)
+		goto done;
+
+	/*
+	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
+	 * more, all pages in [start, end) are free in page allocator.
+	 * What we are going to do is to allocate all pages from
+	 * [start, end) (that is remove them from page allocater).
+	 *
+	 * The only problem is that pages at the beginning and at the
+	 * end of interesting range may be not aligned with pages that
+	 * page allocator holds, ie. they can be part of higher order
+	 * pages.  Because of this, we reserve the bigger range and
+	 * once this is done free the pages we are not interested in.
+	 */
+
+	lru_add_drain_all();
+	drain_all_pages();
+
+	order = 0;
+	outer_start = start;
+	while (!PageBuddy(pfn_to_page(outer_start))) {
+		if (WARN_ON(++order >= MAX_ORDER)) {
+			ret = -EINVAL;
+			goto done;
+		}
+		outer_start &= ~0UL << order;
+	}
+
+	/* Make sure the range is really isolated. */
+	if (test_pages_isolated(outer_start, end)) {
+		pr_warn("__alloc_contig_migrate_range: test_pages_isolated(%lx, %lx) failed\n",
+		       outer_start, end);
+		ret = -EBUSY;
+		goto done;
+	}
+
+	outer_end = isolate_freepages_range(outer_start, end);
+	if (!outer_end) {
+		ret = -EBUSY;
+		goto done;
+	}
+
+	/* Free head and tail (if any) */
+	if (start != outer_start)
+		free_contig_range(outer_start, start - outer_start);
+	if (end != outer_end)
+		free_contig_range(end, outer_end - end);
+
+done:
+	undo_isolate_page_range(pfn_align_to_maxpage_down(start),
+				pfn_align_to_maxpage_up(end));
+	return ret;
+}
+
+void free_contig_range(unsigned long pfn, unsigned nr_pages)
+{
+	for (; nr_pages--; ++pfn)
+		__free_page(pfn_to_page(pfn));
+}
+
+#endif
+
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be isolated before calling this.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 07/15] mm: page_alloc: change fallbacks array handling
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (5 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 06/15] mm: page_alloc: introduce alloc_contig_range() Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 12:12   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added Marek Szyprowski
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

This commit adds a row for the MIGRATE_ISOLATE type to the fallbacks
array, which was missing from it.  It also changes the array traversal
logic a little, making MIGRATE_RESERVE an end marker.  The latter change
removes the implicit MIGRATE_UNMOVABLE from the end of each row, which
the __rmqueue_fallback() function used to read.
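
The new traversal is effectively a sentinel-terminated scan.  A small
self-contained model of the idea (simplified types and names, not kernel
code):

	enum { UNMOVABLE, RECLAIMABLE, MOVABLE, RESERVE, ISOLATE, TYPES };

	/* RESERVE terminates each row, just like in the patched array. */
	static const int fallbacks[TYPES][3] = {
		[UNMOVABLE]   = { RECLAIMABLE, MOVABLE,   RESERVE },
		[RECLAIMABLE] = { UNMOVABLE,   MOVABLE,   RESERVE },
		[MOVABLE]     = { RECLAIMABLE, UNMOVABLE, RESERVE },
		[RESERVE]     = { RESERVE },	/* never used */
		[ISOLATE]     = { RESERVE },	/* never used */
	};

	static void try_fallbacks(int start_migratetype)
	{
		int i;

		for (i = 0; ; i++) {
			int mt = fallbacks[start_migratetype][i];

			if (mt == RESERVE)	/* sentinel: handled later */
				break;
			/* try the free lists of migrate type 'mt' here */
		}
	}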

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 mm/page_alloc.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b4f50532..0a9cc8e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -875,11 +875,12 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
+static int fallbacks[MIGRATE_TYPES][3] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -974,12 +975,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0; i < MIGRATE_TYPES - 1; i++) {
+		for (i = 0;; i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
-				continue;
+				break;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (6 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 07/15] mm: page_alloc: change fallbacks array handling Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 12:35   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 09/15] mm: page_isolation: MIGRATE_CMA isolation functions added Marek Szyprowski
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

The MIGRATE_CMA migration type has two main characteristics:
(i) only movable pages can be allocated from MIGRATE_CMA
pageblocks and (ii) the page allocator will never change the
migration type of MIGRATE_CMA pageblocks.

This guarantees (to some degree) that a page in a MIGRATE_CMA
pageblock can always be migrated somewhere else (unless there's no
memory left in the system).

It is designed to be used for allocating big chunks (e.g. 10 MiB)
of physically contiguous memory.  Once a driver requests
contiguous memory, pages from MIGRATE_CMA pageblocks may be
migrated away to create a contiguous block.

To minimise the number of migrations, the MIGRATE_CMA migration type
is the last type tried when the page allocator falls back to
migration types other than the requested one.
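
As a rough sketch of how pageblocks get this type in the first place
(hypothetical early-boot caller; the CMA driver added later in the series
does essentially this, with extra zone sanity checks):

	static void activate_cma_region(unsigned long base_pfn, unsigned long count)
	{
		unsigned long pfn;

		/* base_pfn and count are assumed to be pageblock aligned */
		for (pfn = base_pfn; pfn < base_pfn + count; pfn += pageblock_nr_pages)
			init_cma_reserved_pageblock(pfn_to_page(pfn));
	}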

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 include/linux/mmzone.h         |   43 +++++++++++++++++++++----
 include/linux/page-isolation.h |    3 ++
 mm/Kconfig                     |    2 +-
 mm/compaction.c                |   11 +++++--
 mm/page_alloc.c                |   68 +++++++++++++++++++++++++++++++++-------
 mm/vmstat.c                    |    3 ++
 6 files changed, 107 insertions(+), 23 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 650ba2f..fcd4a14 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,37 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
-#define MIGRATE_UNMOVABLE     0
-#define MIGRATE_RECLAIMABLE   1
-#define MIGRATE_MOVABLE       2
-#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
-#define MIGRATE_RESERVE       3
-#define MIGRATE_ISOLATE       4 /* can't allocate from here */
-#define MIGRATE_TYPES         5
+enum {
+	MIGRATE_UNMOVABLE,
+	MIGRATE_RECLAIMABLE,
+	MIGRATE_MOVABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
+#ifdef CONFIG_CMA
+	/*
+	 * MIGRATE_CMA migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_CMA pageblocks and page allocator never
+	 * implicitly change migration type of MIGRATE_CMA pageblock.
+	 *
+	 * The way to use it is to change migratetype of a range of
+	 * pageblocks to MIGRATE_CMA which can be done by
+	 * __free_pageblock_cma() function.  What is important though
+	 * is that a range of pageblocks must be aligned to
+	 * MAX_ORDER_NR_PAGES should the biggest page be bigger than
+	 * a single pageblock.
+	 */
+	MIGRATE_CMA,
+#endif
+	MIGRATE_ISOLATE,	/* can't allocate from here */
+	MIGRATE_TYPES
+};
+
+#ifdef CONFIG_CMA
+#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
+#else
+#  define is_migrate_cma(migratetype) false
+#endif
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 430cf61..454dd29 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -45,6 +45,9 @@ extern void unset_migratetype_isolate(struct page *page);
 extern int alloc_contig_range(unsigned long start, unsigned long end);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 
+/* CMA stuff */
+extern void init_cma_reserved_pageblock(struct page *page);
+
 #endif
 
 #endif
diff --git a/mm/Kconfig b/mm/Kconfig
index e338407..3922002 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -198,7 +198,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
diff --git a/mm/compaction.c b/mm/compaction.c
index 3e21d28..a075b43 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -35,6 +35,11 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return count;
 }
 
+static inline bool migrate_async_suitable(int migratetype)
+{
+	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+}
+
 /*
  * Isolate free pages onto a private freelist. Caller must hold zone->lock.
  * If @strict is true, will abort returning 0 on any invalid PFNs or non-free
@@ -274,7 +279,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
 		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
-				get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
+		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
 			last_pageblock_nr = pageblock_nr;
@@ -342,8 +347,8 @@ static bool suitable_migration_target(struct page *page)
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
 
-	/* If the block is MIGRATE_MOVABLE, allow migration */
-	if (migratetype == MIGRATE_MOVABLE)
+	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
+	if (migrate_async_suitable(migratetype))
 		return true;
 
 	/* Otherwise skip the block */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a9cc8e..0fcde78 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -750,6 +750,26 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
 	__free_pages(page, order);
 }
 
+#ifdef CONFIG_CMA
+/*
+ * Free the whole pageblock and set its migration type to MIGRATE_CMA.
+ */
+void __init init_cma_reserved_pageblock(struct page *page)
+{
+	unsigned i = pageblock_nr_pages;
+	struct page *p = page;
+
+	do {
+		__ClearPageReserved(p);
+		set_page_count(p, 0);
+	} while (++p, --i);
+
+	set_page_refcounted(page);
+	set_pageblock_migratetype(page, MIGRATE_CMA);
+	__free_pages(page, pageblock_order);
+	totalram_pages += pageblock_nr_pages;
+}
+#endif
 
 /*
  * The order of subdivision here is critical for the IO subsystem.
@@ -875,10 +895,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][3] = {
+static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
+#ifdef CONFIG_CMA
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },
+	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
+#else
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
+#endif
 	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
 };
@@ -995,11 +1020,18 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
+			 *
+			 * On the other hand, never change migration
+			 * type of MIGRATE_CMA pageblocks nor move CMA
+			 * pages on different free lists. We don't
+			 * want unmovable pages to be allocated from
+			 * MIGRATE_CMA areas.
 			 */
-			if (unlikely(current_order >= (pageblock_order >> 1)) ||
-					start_migratetype == MIGRATE_RECLAIMABLE ||
-					page_group_by_mobility_disabled) {
-				unsigned long pages;
+			if (!is_pageblock_cma(page) &&
+			    (unlikely(current_order >= pageblock_order / 2) ||
+			     start_migratetype == MIGRATE_RECLAIMABLE ||
+			     page_group_by_mobility_disabled)) {
+				int pages;
 				pages = move_freepages_block(zone, page,
 								start_migratetype);
 
@@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order)
+			if (current_order >= pageblock_order &&
+			    !is_pageblock_cma(page))
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
-			expand(zone, page, order, current_order, area, migratetype);
+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(start_migratetype)
+			     ? start_migratetype : migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1093,7 +1128,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		set_page_private(page, migratetype);
+#ifdef CONFIG_CMA
+		if (is_pageblock_cma(page))
+			set_page_private(page, MIGRATE_CMA);
+		else
+#endif
+			set_page_private(page, migratetype);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1337,8 +1377,12 @@ int split_free_page(struct page *page)
 
 	if (order >= pageblock_order - 1) {
 		struct page *endpage = page + (1 << order) - 1;
-		for (; page < endpage; page += pageblock_nr_pages)
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+		for (; page < endpage; page += pageblock_nr_pages) {
+			int mt = get_pageblock_migratetype(page);
+			if (mt != MIGRATE_ISOLATE && !is_migrate_cma(mt))
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
+		}
 	}
 
 	return 1 << order;
@@ -5375,8 +5419,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
-
-	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
+	    is_pageblock_cma(page))
 		return true;
 
 	pfn = page_to_pfn(page);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f600557..ace5383 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -613,6 +613,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
 	"Reclaimable",
 	"Movable",
 	"Reserve",
+#ifdef CONFIG_CMA
+	"CMA",
+#endif
 	"Isolate",
 };
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 09/15] mm: page_isolation: MIGRATE_CMA isolation functions added
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (7 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-26  9:00 ` [PATCH 10/15] mm: extract reclaim code from __alloc_pages_direct_reclaim() Marek Szyprowski
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

From: Michal Nazarewicz <mina86@mina86.com>

This commit changes the various functions that switch the migrate
type of pages and pageblocks between MIGRATE_ISOLATE and
MIGRATE_MOVABLE in such a way that they can also work with the
MIGRATE_CMA migrate type.
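
A hypothetical caller of the changed interface (claim_cma_range() is made
up), showing why the extra argument matters: CMA undoes isolation back to
MIGRATE_CMA, while memory hotplug keeps passing MIGRATE_MOVABLE:

	static int claim_cma_range(unsigned long start_pfn, unsigned long end_pfn)
	{
		int ret;

		ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
		if (ret)
			return ret;

		/* ... migrate the pages away and claim the range ... */

		undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA);
		return 0;
	}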

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 include/linux/page-isolation.h |   21 +++++++++++----------
 mm/memory-failure.c            |    2 +-
 mm/memory_hotplug.c            |    6 +++---
 mm/page_alloc.c                |   18 ++++++++++++------
 mm/page_isolation.c            |   15 ++++++++-------
 5 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 454dd29..0659713 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,7 +3,7 @@
 
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE,
+ * If specified range includes migrate types other than MOVABLE or CMA,
  * this will fail with -EBUSY.
  *
  * For isolating all pages in the range finally, the caller have to
@@ -11,20 +11,21 @@
  * test it.
  */
 extern int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			 unsigned migratetype);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
 extern int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			unsigned migratetype);
 
 /*
- * test all pages in [start_pfn, end_pfn)are isolated or not.
+ * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
-extern int
-test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 
 /*
  * Check all pages in pageblock, find the ones on pcp list, and set
@@ -33,16 +34,16 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
 extern void update_pcp_isolate_block(unsigned long pfn);
 
 /*
- * Internal funcs.Changes pageblock's migrate type.
- * Please use make_pagetype_isolated()/make_pagetype_movable().
+ * Internal functions. Changes pageblock's migrate type.
  */
 extern int set_migratetype_isolate(struct page *page);
-extern void unset_migratetype_isolate(struct page *page);
+extern void unset_migratetype_isolate(struct page *page, unsigned migratetype);
 
 #ifdef CONFIG_CMA
 
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      unsigned migratetype);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 
 /* CMA stuff */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 56080ea..76b01bf 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1400,7 +1400,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 		/* Not a free page */
 		ret = 1;
 	}
-	unset_migratetype_isolate(p);
+	unset_migratetype_isolate(p, MIGRATE_MOVABLE);
 	unlock_memory_hotplug();
 	return ret;
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6629faf..fc898cb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -891,7 +891,7 @@ static int __ref offline_pages(unsigned long start_pfn,
 	nr_pages = end_pfn - start_pfn;
 
 	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn);
+	ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
 	if (ret)
 		goto out;
 
@@ -956,7 +956,7 @@ repeat:
 	   We cannot do rollback at this point. */
 	offline_isolated_pages(start_pfn, end_pfn);
 	/* reset pagetype flags and makes migrate type to be MOVABLE */
-	undo_isolate_page_range(start_pfn, end_pfn);
+	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
 	/* removal success */
 	zone->present_pages -= offlined_pages;
 	zone->zone_pgdat->node_present_pages -= offlined_pages;
@@ -981,7 +981,7 @@ failed_removal:
 		start_pfn, end_pfn);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
-	undo_isolate_page_range(start_pfn, end_pfn);
+	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
 
 out:
 	unlock_memory_hotplug();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fcde78..4e60c0b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5520,7 +5520,7 @@ out:
 	return ret;
 }
 
-void unset_migratetype_isolate(struct page *page)
+void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags;
@@ -5528,8 +5528,8 @@ void unset_migratetype_isolate(struct page *page)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
-	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	set_pageblock_migratetype(page, migratetype);
+	move_freepages_block(zone, page, migratetype);
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -5605,6 +5605,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
  * @end:	one-past-the-last PFN to allocate
+ * @migratetype:	migratetype of the underlying pageblocks (either
+ *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
+ *			in range must have the same migratetype and it must
+ *			be either of the two.
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, however it's the caller's responsibility to guarantee that
@@ -5617,7 +5621,8 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
  * pages which PFN is in [start, end) are allocated for the caller and
  * need to be freed with free_contig_range().
  */
-int alloc_contig_range(unsigned long start, unsigned long end)
+int alloc_contig_range(unsigned long start, unsigned long end,
+		       unsigned migratetype)
 {
 	unsigned long outer_start, outer_end;
 	int ret = 0, order;
@@ -5646,7 +5651,8 @@ int alloc_contig_range(unsigned long start, unsigned long end)
 	 */
 
 	ret = start_isolate_page_range(pfn_align_to_maxpage_down(start),
-				       pfn_align_to_maxpage_up(end));
+				       pfn_align_to_maxpage_up(end),
+				       migratetype);
 	if (ret)
 		goto done;
 
@@ -5703,7 +5709,7 @@ int alloc_contig_range(unsigned long start, unsigned long end)
 
 done:
 	undo_isolate_page_range(pfn_align_to_maxpage_down(start),
-				pfn_align_to_maxpage_up(end));
+				pfn_align_to_maxpage_up(end), migratetype);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 9ea2f6e..c80daa9 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -24,6 +24,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * to be MIGRATE_ISOLATE.
  * @start_pfn: The lower PFN of the range to be isolated.
  * @end_pfn: The upper PFN of the range to be isolated.
+ * @migratetype: migrate type to set in error recovery.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -32,8 +33,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * start_pfn/end_pfn must be aligned to pageblock_order.
  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     unsigned migratetype)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -56,7 +57,7 @@ undo:
 	for (pfn = start_pfn;
 	     pfn < undo_pfn;
 	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn));
+		unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
 	return -EBUSY;
 }
@@ -64,8 +65,8 @@ undo:
 /*
  * Make isolated pages available again.
  */
-int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
+int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			    unsigned migratetype)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -77,7 +78,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 			continue;
-		unset_migratetype_isolate(page);
+		unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
@@ -86,7 +87,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn)
  * all pages in [start_pfn...end_pfn) must be in the same zone.
  * zone->lock must be held before call this.
  *
- * Returns 1 if all pages in the range is isolated.
+ * Returns 1 if all pages in the range are isolated.
  */
 static int
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 10/15] mm: extract reclaim code from __alloc_pages_direct_reclaim()
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (8 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 09/15] mm: page_isolation: MIGRATE_CMA isolation functions added Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 12:42   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks Marek Szyprowski
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

This patch extracts the common reclaim code from the
__alloc_pages_direct_reclaim() function into a separate function,
__perform_reclaim(), which can later be used by alloc_contig_range().
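
The extracted helper keeps the same calling convention as before, so the
next patch in the series can reuse it roughly like this (fragment for
illustration only, kernel context assumed):

	struct zonelist *zonelist = node_zonelist(0, GFP_HIGHUSER_MOVABLE);
	int order = 1;		/* reclaim in small chunks */
	int progress;

	progress = __perform_reclaim(GFP_HIGHUSER_MOVABLE, order, zonelist, NULL);
	if (!progress)
		out_of_memory(zonelist, GFP_HIGHUSER_MOVABLE, order, NULL);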

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 mm/page_alloc.c |   30 +++++++++++++++++++++---------
 1 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e60c0b..e35d06b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2094,16 +2094,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 }
 #endif /* CONFIG_COMPACTION */
 
-/* The really slow allocator path where we enter direct reclaim */
-static inline struct page *
-__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
-	struct zonelist *zonelist, enum zone_type high_zoneidx,
-	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, unsigned long *did_some_progress)
+/* Perform direct synchronous page reclaim */
+static inline int
+__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist,
+		  nodemask_t *nodemask)
 {
-	struct page *page = NULL;
 	struct reclaim_state reclaim_state;
-	bool drained = false;
+	int progress;
 
 	cond_resched();
 
@@ -2114,7 +2111,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
-	*did_some_progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
+	progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
 
 	current->reclaim_state = NULL;
 	lockdep_clear_current_reclaim_state();
@@ -2122,6 +2119,21 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 
 	cond_resched();
 
+	return progress;
+}
+
+/* The really slow allocator path where we enter direct reclaim */
+static inline struct page *
+__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
+	struct zonelist *zonelist, enum zone_type high_zoneidx,
+	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
+	int migratetype, unsigned long *did_some_progress)
+{
+	struct page *page = NULL;
+	bool drained = false;
+
+	*did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
+					       nodemask);
 	if (unlikely(!(*did_some_progress)))
 		return NULL;
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (9 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 10/15] mm: extract reclaim code from __alloc_pages_direct_reclaim() Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-30 13:05   ` Mel Gorman
  2012-01-26  9:00 ` [PATCH 12/15] drivers: add Contiguous Memory Allocator Marek Szyprowski
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

alloc_contig_range() performs memory allocation, so it should also keep
the memory watermarks at the correct level.  This commit adds a call to
*_slowpath-style reclaim to grab enough pages to make sure that the
final collection of contiguous pages from the freelists will not starve
the system.
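
To make the reclaim target concrete (hypothetical numbers): allocating a
16 MiB buffer with 4 KiB pages pulls 4096 pages from the freelists, so
with a low watermark of 8192 pages the added loop keeps reclaiming until
at least 8192 + 4096 = 12288 pages are free in the zone before the
contiguous pages are grabbed.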

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 mm/page_alloc.c |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e35d06b..05eaa82 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5613,6 +5613,34 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	return ret;
 }
 
+/*
+ * Trigger memory pressure bump to reclaim some pages in order to be able to
+ * allocate 'count' pages in single page units. Does similar work as
+ * __alloc_pages_slowpath() function.
+ */
+static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
+{
+	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
+	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
+	int did_some_progress = 0;
+	int order = 1;
+	unsigned long watermark;
+
+	/* Obey watermarks as if the page was being allocated */
+	watermark = low_wmark_pages(zone) + count;
+	while (!zone_watermark_ok(zone, 0, watermark, 0, 0)) {
+		wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone));
+
+		did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
+						      NULL);
+		if (!did_some_progress) {
+			/* Exhausted what can be done so it's blamo time */
+			out_of_memory(zonelist, gfp_mask, order, NULL);
+		}
+	}
+	return count;
+}
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -5707,6 +5735,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		goto done;
 	}
 
+	/*
+	 * Reclaim enough pages to make sure that contiguous allocation
+	 * will not starve the system.
+	 */
+	__reclaim_pages(page_zone(pfn_to_page(outer_start)),
+		        GFP_HIGHUSER_MOVABLE, end-start);
+
+	/* Grab isolated pages from freelists. */
 	outer_end = isolate_freepages_range(outer_start, end);
 	if (!outer_end) {
 		ret = -EBUSY;
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (10 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-27  9:44   ` [Linaro-mm-sig] " Ohad Ben-Cohen
  2012-01-26  9:00 ` [PATCH 13/15] X86: integrate CMA with DMA-mapping subsystem Marek Szyprowski
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

The Contiguous Memory Allocator is a set of helper functions for the
DMA mapping framework that improves allocation of contiguous memory
chunks.

CMA grabs memory on system boot, marks it with the MIGRATE_CMA migrate
type and gives it back to the system. The kernel is allowed to allocate
movable pages within CMA's managed memory, so that it can be used, for
example, for page cache when the DMA mapping framework does not use it.
On a dma_alloc_from_contiguous() request, such pages are migrated out of
the CMA area to free the required contiguous block and fulfill the
request. This makes it possible to allocate large contiguous chunks of
memory at any time, assuming that there is enough free memory available
in the system.

This code is heavily based on earlier work by Michal Nazarewicz.
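
A hypothetical driver-side sketch of this flow (cma_buffer_demo(), the
buffer size and the alignment are made up; 'dev' is the driver's device):

	static int cma_buffer_demo(struct device *dev)
	{
		int nr_pages = (4 * 1024 * 1024) >> PAGE_SHIFT;	/* 4 MiB */
		struct page *pages;

		pages = dma_alloc_from_contiguous(dev, nr_pages, get_order(SZ_1M));
		if (!pages)
			return -ENOMEM;

		/* ... set up DMA to/from page_to_phys(pages) ... */

		if (!dma_release_from_contiguous(dev, pages, nr_pages))
			pr_warn("pages were not allocated from the CMA area\n");
		return 0;
	}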

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 Documentation/kernel-parameters.txt  |    5 +
 arch/Kconfig                         |    3 +
 drivers/base/Kconfig                 |   89 ++++++++
 drivers/base/Makefile                |    1 +
 drivers/base/dma-contiguous.c        |  404 ++++++++++++++++++++++++++++++++++
 include/asm-generic/dma-contiguous.h |   27 +++
 include/linux/device.h               |    4 +
 include/linux/dma-contiguous.h       |  110 +++++++++
 8 files changed, 643 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-contiguous.c
 create mode 100644 include/asm-generic/dma-contiguous.h
 create mode 100644 include/linux/dma-contiguous.h

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 033d4e6..84982e2 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -508,6 +508,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Also note the kernel might malfunction if you disable
 			some critical bits.
 
+	cma=nn[MG]	[ARM,KNL]
+			Sets the size of the kernel's global memory area for contiguous
+			memory allocations. For more information, see
+			include/linux/dma-contiguous.h
+
 	cmo_free_hint=	[PPC] Format: { yes | no }
 			Specify whether pages are marked as being inactive
 			when they are freed.  This is used in CMO environments
diff --git a/arch/Kconfig b/arch/Kconfig
index 4f55c73..8ec200c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -128,6 +128,9 @@ config HAVE_ARCH_TRACEHOOK
 config HAVE_DMA_ATTRS
 	bool
 
+config HAVE_DMA_CONTIGUOUS
+	bool
+
 config USE_GENERIC_SMP_HELPERS
 	bool
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 7be9f79..f56cb20 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -189,4 +189,93 @@ config DMA_SHARED_BUFFER
 	  APIs extension; the file's descriptor can then be passed on to other
 	  driver.
 
+config CMA
+	bool "Contiguous Memory Allocator (EXPERIMENTAL)"
+	depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL
+	select MIGRATION
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that support neither I/O mapping nor scatter-gather.
+
+	  For more information see <include/linux/dma-contiguous.h>.
+	  If unsure, say "n".
+
+if CMA
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	depends on DEBUG_KERNEL
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as dma_alloc_from_contiguous().
+	  This option does not affect warning and error messages.
+
+comment "Default contiguous memory area size:"
+
+config CMA_SIZE_MBYTES
+	int "Size in Mega Bytes"
+	depends on !CMA_SIZE_SEL_PERCENTAGE
+	default 16
+	help
+	  Defines the size (in MiB) of the default memory area for Contiguous
+	  Memory Allocator.
+
+config CMA_SIZE_PERCENTAGE
+	int "Percentage of total memory"
+	depends on !CMA_SIZE_SEL_MBYTES
+	default 10
+	help
+	  Defines the size of the default memory area for Contiguous Memory
+	  Allocator as a percentage of the total memory in the system.
+
+choice
+	prompt "Selected region size"
+	default CMA_SIZE_SEL_ABSOLUTE
+
+config CMA_SIZE_SEL_MBYTES
+	bool "Use mega bytes value only"
+
+config CMA_SIZE_SEL_PERCENTAGE
+	bool "Use percentage value only"
+
+config CMA_SIZE_SEL_MIN
+	bool "Use lower value (minimum)"
+
+config CMA_SIZE_SEL_MAX
+	bool "Use higher value (maximum)"
+
+endchoice
+
+config CMA_ALIGNMENT
+	int "Maximum PAGE_SIZE order of alignment for contiguous buffers"
+	range 4 9
+	default 8
+	help
+	  DMA mapping framework by default aligns all buffers to the smallest
+	  PAGE_SIZE order which is greater than or equal to the requested buffer
+	  size. This works well for buffers up to a few hundred kilobytes, but
+	  for larger buffers it is just a waste of memory. With this parameter you can
+	  specify the maximum PAGE_SIZE order for contiguous buffers. Larger
+	  buffers will be aligned only to this specified order. The order is
+	  expressed as a power of two multiplied by the PAGE_SIZE.
+
+	  For example, if your system defaults to 4KiB pages, the order value
+	  of 8 means that the buffers will be aligned up to 1MiB only.
+
+	  If unsure, leave the default value "8".
+
+config CMA_AREAS
+	int "Maximum count of the CMA device-private areas"
+	default 7
+	help
+	  CMA allows creating CMA areas for particular devices. This parameter
+	  sets the maximum number of such device private CMA areas in the
+	  system.
+
+	  If unsure, leave the default value "7".
+
+endif
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 2c8272d..23ac863 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -6,6 +6,7 @@ obj-y			:= core.o sys.o bus.o dd.o syscore.o \
 			   attribute_container.o transport_class.o \
 			   topology.o
 obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
+obj-$(CONFIG_CMA) += dma-contiguous.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
new file mode 100644
index 0000000..f41e699
--- /dev/null
+++ b/drivers/base/dma-contiguous.c
@@ -0,0 +1,404 @@
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your option) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#ifndef DEBUG
+#  define DEBUG
+#endif
+#endif
+
+#include <asm/page.h>
+#include <asm/dma-contiguous.h>
+
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+#include <linux/dma-contiguous.h>
+
+#ifndef SZ_1M
+#define SZ_1M (1 << 20)
+#endif
+
+struct cma {
+	unsigned long	base_pfn;
+	unsigned long	count;
+	unsigned long	*bitmap;
+};
+
+struct cma *dma_contiguous_default_area;
+
+#ifdef CONFIG_CMA_SIZE_MBYTES
+#define CMA_SIZE_MBYTES CONFIG_CMA_SIZE_MBYTES
+#else
+#define CMA_SIZE_MBYTES 0
+#endif
+
+#ifdef CONFIG_CMA_SIZE_PERCENTAGE
+#define CMA_SIZE_PERCENTAGE CONFIG_CMA_SIZE_PERCENTAGE
+#else
+#define CMA_SIZE_PERCENTAGE 0
+#endif
+
+/*
+ * Default global CMA area size can be defined in kernel's .config.
+ * This is useful mainly for distro maintainers to create a kernel
+ * that works correctly for most supported systems.
+ * The size can be set in bytes or as a percentage of the total memory
+ * in the system.
+ *
+ * Users, who want to set the size of global CMA area for their system
+ * should use cma= kernel parameter.
+ */
+static unsigned long size_bytes = CMA_SIZE_MBYTES * SZ_1M;
+static unsigned long size_percent = CMA_SIZE_PERCENTAGE;
+static long size_cmdline = -1;
+
+static int __init early_cma(char *p)
+{
+	pr_debug("%s(%s)\n", __func__, p);
+	size_cmdline = memparse(p, &p);
+	return 0;
+}
+early_param("cma", early_cma);
+
+static unsigned long __init cma_early_get_total_pages(void)
+{
+	struct memblock_region *reg;
+	unsigned long total_pages = 0;
+
+	/*
+	 * We cannot use memblock_phys_mem_size() here, because
+	 * memblock_analyze() has not been called yet.
+	 */
+	for_each_memblock(memory, reg)
+		total_pages += memblock_region_memory_end_pfn(reg) -
+			       memblock_region_memory_base_pfn(reg);
+	return total_pages;
+}
+
+/**
+ * dma_contiguous_reserve() - reserve area for contiguous memory handling
+ * @limit: End address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory from the early allocator. It should be
+ * called by arch specific code once the early allocator (memblock or bootmem)
+ * has been activated and all other subsystems have already allocated/reserved
+ * memory.
+ */
+void __init dma_contiguous_reserve(phys_addr_t limit)
+{
+	unsigned long selected_size = 0;
+	unsigned long total_pages;
+
+	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
+
+	total_pages = cma_early_get_total_pages();
+	size_percent *= (total_pages << PAGE_SHIFT) / 100;
+
+	pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n",
+		 __func__, (total_pages << PAGE_SHIFT) / SZ_1M,
+		size_bytes / SZ_1M, size_percent / SZ_1M);
+
+#ifdef CONFIG_CMA_SIZE_SEL_MBYTES
+	selected_size = size_bytes;
+#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE)
+	selected_size = size_percent;
+#elif defined(CONFIG_CMA_SIZE_SEL_MIN)
+	selected_size = min(size_bytes, size_percent);
+#elif defined(CONFIG_CMA_SIZE_SEL_MAX)
+	selected_size = max(size_bytes, size_percent);
+#endif
+
+	if (size_cmdline != -1)
+		selected_size = size_cmdline;
+
+	if (!selected_size)
+		return;
+
+	pr_debug("%s: reserving %ld MiB for global area\n", __func__,
+		 selected_size / SZ_1M);
+
+	dma_declare_contiguous(NULL, selected_size, 0, limit);
+};
+
+static DEFINE_MUTEX(cma_mutex);
+
+static int cma_activate_area(unsigned long base_pfn, unsigned long count)
+{
+	unsigned long pfn = base_pfn;
+	unsigned i = count >> pageblock_order;
+	struct zone *zone;
+
+	WARN_ON_ONCE(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		unsigned j;
+		base_pfn = pfn;
+		for (j = pageblock_nr_pages; j; --j, pfn++) {
+			WARN_ON_ONCE(!pfn_valid(pfn));
+			if (page_zone(pfn_to_page(pfn)) != zone)
+				return -EINVAL;
+		}
+		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
+	} while (--i);
+	return 0;
+}
+
+static struct cma *cma_create_area(unsigned long base_pfn,
+				     unsigned long count)
+{
+	int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
+	struct cma *cma;
+	int ret = -ENOMEM;
+
+	pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	cma->base_pfn = base_pfn;
+	cma->count = count;
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		goto no_mem;
+
+	ret = cma_activate_area(base_pfn, count);
+	if (ret)
+		goto error;
+
+	pr_debug("%s: returned %p\n", __func__, (void *)cma);
+	return cma;
+
+error:
+	kfree(cma->bitmap);
+no_mem:
+	kfree(cma);
+	return ERR_PTR(ret);
+}
+
+static struct cma_reserved {
+	phys_addr_t start;
+	unsigned long size;
+	struct device *dev;
+} cma_reserved[MAX_CMA_AREAS] __initdata;
+static unsigned cma_reserved_count __initdata;
+
+static int __init cma_init_reserved_areas(void)
+{
+	struct cma_reserved *r = cma_reserved;
+	unsigned i = cma_reserved_count;
+
+	pr_debug("%s()\n", __func__);
+
+	for (; i; --i, ++r) {
+		struct cma *cma;
+		cma = cma_create_area(PFN_DOWN(r->start),
+				      r->size >> PAGE_SHIFT);
+		if (!IS_ERR(cma))
+			dev_set_cma_area(r->dev, cma);
+	}
+	return 0;
+}
+core_initcall(cma_init_reserved_areas);
+
+/**
+ * dma_declare_contiguous() - reserve area for contiguous memory handling
+ *			      for particular device
+ * @dev:   Pointer to device structure.
+ * @size:  Size of the reserved memory.
+ * @start: Start address of the reserved memory (optional, 0 for any).
+ * @limit: End address of the reserved memory (optional, 0 for any).
+ *
+ * This function reserves memory for the specified device. It should be
+ * called by board specific code when the early allocator (memblock or bootmem)
+ * is still active.
+ */
+int __init dma_declare_contiguous(struct device *dev, unsigned long size,
+				  phys_addr_t base, phys_addr_t limit)
+{
+	struct cma_reserved *r = &cma_reserved[cma_reserved_count];
+	unsigned long alignment;
+
+	pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__,
+		 (unsigned long)size, (unsigned long)base,
+		 (unsigned long)limit);
+
+	/* Sanity checks */
+	if (cma_reserved_count == ARRAY_SIZE(cma_reserved)) {
+		pr_err("Not enough slots for CMA reserved regions!\n");
+		return -ENOSPC;
+	}
+
+	if (!size)
+		return -EINVAL;
+
+	/* Sanitise input arguments */
+	alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order);
+	base = ALIGN(base, alignment);
+	size = ALIGN(size, alignment);
+	limit &= ~(alignment - 1);
+
+	/* Reserve memory */
+	if (base) {
+		if (memblock_is_region_reserved(base, size) ||
+		    memblock_reserve(base, size) < 0) {
+			base = -EBUSY;
+			goto err;
+		}
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		phys_addr_t addr = __memblock_alloc_base(size, alignment, limit);
+		if (!addr) {
+			base = -ENOMEM;
+			goto err;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			base = -EINVAL;
+			goto err;
+		} else {
+			base = addr;
+		}
+	}
+
+	/*
+	 * Each reserved area must be initialised later, when more kernel
+	 * subsystems (like slab allocator) are available.
+	 */
+	r->start = base;
+	r->size = size;
+	r->dev = dev;
+	cma_reserved_count++;
+	pr_info("CMA: reserved %ld MiB at %08lx\n", size / SZ_1M,
+		(unsigned long)base);
+
+	/*
+	 * Architecture specific contiguous memory fixup.
+	 */
+	dma_contiguous_early_fixup(base, size);
+	return 0;
+err:
+	pr_err("CMA: failed to reserve %ld MiB\n", size / SZ_1M);
+	return base;
+}
+
+/**
+ * dma_alloc_from_contiguous() - allocate pages from contiguous area
+ * @dev:   Pointer to device for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ *
+ * This function allocates a memory buffer for the specified device. It uses
+ * the device-specific contiguous memory area if available, or the default
+ * global one. Requires the architecture specific dev_get_cma_area() helper
+ * function.
+ */
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int align)
+{
+	struct cma *cma = dev_get_cma_area(dev);
+	unsigned long pfn, pageno, start = 0;
+	unsigned long mask = (1 << align) - 1;
+	int ret;
+
+	if (!cma || !cma->count)
+		return NULL;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
+
+	pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
+		 count, align);
+
+	if (!count)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	for (;;) {
+		pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
+						    start, count, mask);
+		if (pageno >= cma->count) {
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		pfn = cma->base_pfn + pageno;
+		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+		if (ret == 0) {
+			bitmap_set(cma->bitmap, pageno, count);
+			break;
+		} else if (ret != -EBUSY) {
+			goto error;
+		}
+		pr_debug("%s(): memory range at %p is busy, retrying\n",
+			 __func__, pfn_to_page(pfn));
+		/* try again with a bit different memory target */
+		start = pageno + mask + 1;
+	}
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn));
+	return pfn_to_page(pfn);
+error:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+
+/**
+ * dma_release_from_contiguous() - release allocated pages
+ * @dev:   Pointer to device for which the pages were allocated.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function releases memory allocated by dma_alloc_from_contiguous().
+ * It returns 0 when the provided pages do not belong to the contiguous
+ * area and 1 on success.
+ */
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	struct cma *cma = dev_get_cma_area(dev);
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return 0;
+
+	pr_debug("%s(page %p)\n", __func__, (void *)pages);
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return 0;
+
+	mutex_lock(&cma_mutex);
+
+	bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
+	free_contig_range(pfn, count);
+
+	mutex_unlock(&cma_mutex);
+	return 1;
+}
diff --git a/include/asm-generic/dma-contiguous.h b/include/asm-generic/dma-contiguous.h
new file mode 100644
index 0000000..bf2bccc
--- /dev/null
+++ b/include/asm-generic/dma-contiguous.h
@@ -0,0 +1,27 @@
+#ifndef ASM_DMA_CONTIGUOUS_H
+#define ASM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+static inline struct cma *dev_get_cma_area(struct device *dev)
+{
+	if (dev && dev->cma_area)
+		return dev->cma_area;
+	return dma_contiguous_default_area;
+}
+
+static inline void dev_set_cma_area(struct device *dev, struct cma *cma)
+{
+	if (dev)
+		dev->cma_area = cma;
+	dma_contiguous_default_area = cma;
+}
+
+#endif
+#endif
+#endif
diff --git a/include/linux/device.h b/include/linux/device.h
index 5b3adb8..020c095 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -666,6 +666,10 @@ struct device {
 
 	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
 					     override */
+#ifdef CONFIG_CMA
+	struct cma *cma_area;		/* contiguous memory area for dma
+					   allocations */
+#endif
 	/* arch specific additions */
 	struct dev_archdata	archdata;
 
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
new file mode 100644
index 0000000..ffb4b40
--- /dev/null
+++ b/include/linux/dma-contiguous.h
@@ -0,0 +1,110 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator for DMA mapping framework
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your option) any later version of the license.
+ */
+
+/*
+ * Contiguous Memory Allocator
+ *
+ *   The Contiguous Memory Allocator (CMA) makes it possible to
+ *   allocate big contiguous chunks of memory after the system has
+ *   booted.
+ *
+ * Why is it needed?
+ *
+ *   Various devices on embedded systems have no scatter-gather and/or
+ *   IO map support and require contiguous blocks of memory to
+ *   operate.  They include devices such as cameras, hardware video
+ *   coders, etc.
+ *
+ *   Such devices often require big memory buffers (a full HD frame
+ *   is, for instance, more than 2 megapixels, i.e. more than 6
+ *   MB of memory), which makes mechanisms such as kmalloc() or
+ *   alloc_page() ineffective.
+ *
+ *   At the same time, a solution where a big memory region is
+ *   reserved for a device is suboptimal since often more memory is
+ *   reserved than strictly required and, moreover, the memory is
+ *   inaccessible to the page allocator even if device drivers don't use it.
+ *
+ *   CMA tries to solve this issue by operating on memory regions
+ *   where only movable pages can be allocated from.  This way, the kernel
+ *   can use the memory for pagecache and, when a device driver requests
+ *   it, the allocated pages can be migrated.
+ *
+ * Driver usage
+ *
+ *   CMA should not be used by device drivers directly. It is
+ *   only a helper framework for the dma-mapping subsystem.
+ *
+ *   For more information, see kernel-docs in drivers/base/dma-contiguous.c
+ */
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+struct device;
+
+#ifdef CONFIG_CMA
+
+/*
+ * There is always at least the global CMA area and a few optional device
+ * private areas configured in the kernel .config.
+ */
+#define MAX_CMA_AREAS	(1 + CONFIG_CMA_AREAS)
+
+extern struct cma *dma_contiguous_default_area;
+
+void dma_contiguous_reserve(phys_addr_t addr_limit);
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit);
+
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order);
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count);
+
+#else
+
+#define MAX_CMA_AREAS	(0)
+
+static inline void dma_contiguous_reserve(phys_addr_t limit) { }
+
+static inline
+int dma_declare_contiguous(struct device *dev, unsigned long size,
+			   phys_addr_t base, phys_addr_t limit)
+{
+	return -ENOSYS;
+}
+
+static inline
+struct page *dma_alloc_from_contiguous(struct device *dev, int count,
+				       unsigned int order)
+{
+	return NULL;
+}
+
+static inline
+int dma_release_from_contiguous(struct device *dev, struct page *pages,
+				int count)
+{
+	return 0;
+}
+
+#endif
+
+#endif
+
+#endif
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 13/15] X86: integrate CMA with DMA-mapping subsystem
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (11 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 12/15] drivers: add Contiguous Memory Allocator Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-26  9:00 ` [PATCH 14/15] ARM: " Marek Szyprowski
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

This patch adds CMA support to the DMA-mapping subsystem for the x86
architecture, which uses the common pci-dma/pci-nommu implementation. This
allows CMA to be tested on KVM/QEMU and a lot of common x86 boxes.
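
As an illustration only (not part of this patch, and the device pointer and
4 MiB size below are hypothetical): an ordinary dma_alloc_coherent() call
from non-atomic context now goes through dma_generic_alloc_coherent() and
may be backed by CMA pages.

#include <linux/dma-mapping.h>

static int my_probe(struct device *dev)
{
	dma_addr_t dma_handle;
	void *cpu_addr;

	/* GFP_KERNEL is non-atomic, so dma_alloc_from_contiguous() is tried */
	cpu_addr = dma_alloc_coherent(dev, 4 << 20, &dma_handle, GFP_KERNEL);
	if (!cpu_addr)
		return -ENOMEM;

	/* ... use the buffer, then release it ... */
	dma_free_coherent(dev, 4 << 20, cpu_addr, dma_handle);
	return 0;
}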

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 arch/x86/Kconfig                      |    1 +
 arch/x86/include/asm/dma-contiguous.h |   13 +++++++++++++
 arch/x86/include/asm/dma-mapping.h    |    4 ++++
 arch/x86/kernel/pci-dma.c             |   18 ++++++++++++++++--
 arch/x86/kernel/pci-nommu.c           |    8 +-------
 arch/x86/kernel/setup.c               |    2 ++
 6 files changed, 37 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/include/asm/dma-contiguous.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 864cc6e..1e00736 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -31,6 +31,7 @@ config X86
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARCH_WANT_FRAME_POINTERS
 	select HAVE_DMA_ATTRS
+	select HAVE_DMA_CONTIGUOUS if !SWIOTLB
 	select HAVE_KRETPROBES
 	select HAVE_OPTPROBES
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/x86/include/asm/dma-contiguous.h b/arch/x86/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..8fb117d
--- /dev/null
+++ b/arch/x86/include/asm/dma-contiguous.h
@@ -0,0 +1,13 @@
+#ifndef ASMX86_DMA_CONTIGUOUS_H
+#define ASMX86_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+#include <asm-generic/dma-contiguous.h>
+
+static inline void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size) { }
+
+#endif
+#endif
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index ed3065f..90ac6f0 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -13,6 +13,7 @@
 #include <asm/io.h>
 #include <asm/swiotlb.h>
 #include <asm-generic/dma-coherent.h>
+#include <linux/dma-contiguous.h>
 
 #ifdef CONFIG_ISA
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -61,6 +62,9 @@ extern int dma_set_mask(struct device *dev, u64 mask);
 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 					dma_addr_t *dma_addr, gfp_t flag);
 
+extern void dma_generic_free_coherent(struct device *dev, size_t size,
+				      void *vaddr, dma_addr_t dma_addr);
+
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 {
 	if (!dev->dma_mask)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 1c4d769..d3c3723 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -99,14 +99,18 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 				 dma_addr_t *dma_addr, gfp_t flag)
 {
 	unsigned long dma_mask;
-	struct page *page;
+	struct page *page = NULL;
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 	dma_addr_t addr;
 
 	dma_mask = dma_alloc_coherent_mask(dev, flag);
 
 	flag |= __GFP_ZERO;
 again:
-	page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
+	if (!(flag & GFP_ATOMIC))
+		page = dma_alloc_from_contiguous(dev, count, get_order(size));
+	if (!page)
+		page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
 	if (!page)
 		return NULL;
 
@@ -126,6 +130,16 @@ again:
 	return page_address(page);
 }
 
+void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
+			       dma_addr_t dma_addr)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct page *page = virt_to_page(vaddr);
+
+	if (!dma_release_from_contiguous(dev, page, count))
+		free_pages((unsigned long)vaddr, get_order(size));
+}
+
 /*
  * See <Documentation/x86/x86_64/boot-options.txt> for the iommu kernel
  * parameter documentation.
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 3af4af8..656566f 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -74,12 +74,6 @@ static int nommu_map_sg(struct device *hwdev, struct scatterlist *sg,
 	return nents;
 }
 
-static void nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
-				dma_addr_t dma_addr)
-{
-	free_pages((unsigned long)vaddr, get_order(size));
-}
-
 static void nommu_sync_single_for_device(struct device *dev,
 			dma_addr_t addr, size_t size,
 			enum dma_data_direction dir)
@@ -97,7 +91,7 @@ static void nommu_sync_sg_for_device(struct device *dev,
 
 struct dma_map_ops nommu_dma_ops = {
 	.alloc_coherent		= dma_generic_alloc_coherent,
-	.free_coherent		= nommu_free_coherent,
+	.free_coherent		= dma_generic_free_coherent,
 	.map_sg			= nommu_map_sg,
 	.map_page		= nommu_map_page,
 	.sync_single_for_device = nommu_sync_single_for_device,
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7d5099..be6795f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -50,6 +50,7 @@
 #include <asm/pci-direct.h>
 #include <linux/init_ohci1394_dma.h>
 #include <linux/kvm_para.h>
+#include <linux/dma-contiguous.h>
 
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -938,6 +939,7 @@ void __init setup_arch(char **cmdline_p)
 	}
 #endif
 	memblock.current_limit = get_max_mapped();
+	dma_contiguous_reserve(0);
 
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 14/15] ARM: integrate CMA with DMA-mapping subsystem
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (12 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 13/15] X86: integrate CMA with DMA-mapping subsystem Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-26  9:00 ` [PATCH 15/15] ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device Marek Szyprowski
  2012-01-26 15:31 ` [PATCHv19 00/15] Contiguous Memory Allocator Arnd Bergmann
  15 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

This patch adds CMA support to the DMA-mapping subsystem for the ARM
architecture. By default a global CMA area is used, but specific devices
are allowed to have their own private memory areas if required (these can
be created with the dma_declare_contiguous() function during board
initialization).

Contiguous memory areas reserved for DMA are remapped with 2-level page
tables on boot. Once a buffer is requested, the low memory kernel mapping
is updated to match the requested memory access type.

GFP_ATOMIC allocations are performed from a special pool which is created
early during boot. This way, remapping page attributes is not needed at
allocation time.

CMA has been enabled unconditionally for ARMv6+ systems.
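
For illustration only (not part of this patch), a board file could declare
such a private area roughly as follows; the platform device and the 16 MiB
size are hypothetical, and the call must run from early board code (e.g.
the machine's ->reserve() callback) while memblock is still active:

#include <linux/dma-contiguous.h>

static void __init my_board_reserve(void)
{
	/* 16 MiB private CMA area; base = 0 and limit = 0 let memblock choose */
	if (dma_declare_contiguous(&my_platform_device.dev, 16 << 20, 0, 0))
		pr_err("my_board: failed to reserve CMA area\n");
}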

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 Documentation/kernel-parameters.txt   |    4 +
 arch/arm/Kconfig                      |    2 +
 arch/arm/include/asm/dma-contiguous.h |   16 ++
 arch/arm/include/asm/mach/map.h       |    1 +
 arch/arm/kernel/setup.c               |    9 +-
 arch/arm/mm/dma-mapping.c             |  368 +++++++++++++++++++++++++++------
 arch/arm/mm/init.c                    |   22 ++-
 arch/arm/mm/mm.h                      |    3 +
 arch/arm/mm/mmu.c                     |   31 ++-
 9 files changed, 368 insertions(+), 88 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-contiguous.h

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 84982e2..ff97085 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -520,6 +520,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			a hypervisor.
 			Default: yes
 
+	coherent_pool=nn[KMG]	[ARM,KNL]
+			Sets the size of the memory pool for coherent, atomic DMA
+			allocations when the Contiguous Memory Allocator (CMA) is used.
+
 	code_bytes	[X86] How many bytes of object code to print
 			in an oops report.
 			Range: 0 - 8192
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 24626b0..8179981 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -4,6 +4,8 @@ config ARM
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
 	select HAVE_IDE if PCI || ISA || PCMCIA
+	select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7)
+	select CMA if (CPU_V6 || CPU_V6K || CPU_V7)
 	select HAVE_MEMBLOCK
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h
new file mode 100644
index 0000000..c7ba05e
--- /dev/null
+++ b/arch/arm/include/asm/dma-contiguous.h
@@ -0,0 +1,16 @@
+#ifndef ASMARM_DMA_CONTIGUOUS_H
+#define ASMARM_DMA_CONTIGUOUS_H
+
+#ifdef __KERNEL__
+
+#include <linux/device.h>
+#include <linux/dma-contiguous.h>
+#include <asm-generic/dma-contiguous.h>
+
+#ifdef CONFIG_CMA
+
+void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
+
+#endif
+#endif
+#endif
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index b36f365..a6efcdd 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -30,6 +30,7 @@ struct map_desc {
 #define MT_MEMORY_DTCM		12
 #define MT_MEMORY_ITCM		13
 #define MT_MEMORY_SO		14
+#define MT_MEMORY_DMA_READY	15
 
 #ifdef CONFIG_MMU
 extern void iotable_init(struct map_desc *, int);
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 129fbd5..ae9e86d 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -80,6 +80,7 @@ __setup("fpe=", fpe_setup);
 extern void paging_init(struct machine_desc *desc);
 extern void sanity_check_meminfo(void);
 extern void reboot_setup(char *str);
+extern void setup_dma_zone(struct machine_desc *desc);
 
 unsigned int processor_id;
 EXPORT_SYMBOL(processor_id);
@@ -910,12 +911,8 @@ void __init setup_arch(char **cmdline_p)
 	machine_desc = mdesc;
 	machine_name = mdesc->name;
 
-#ifdef CONFIG_ZONE_DMA
-	if (mdesc->dma_zone_size) {
-		extern unsigned long arm_dma_zone_size;
-		arm_dma_zone_size = mdesc->dma_zone_size;
-	}
-#endif
+	setup_dma_zone(mdesc);
+
 	if (mdesc->restart_mode)
 		reboot_setup(&mdesc->restart_mode);
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 1aa664a..77e7755 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -17,7 +17,9 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/highmem.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #include <asm/memory.h>
@@ -26,6 +28,8 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/arch.h>
+#include <asm/mach/map.h>
+#include <asm/dma-contiguous.h>
 
 #include "mm.h"
 
@@ -56,6 +60,19 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }
 
+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+	void *ptr;
+	/*
+	 * Ensure that the allocated pages are zeroed, and that any data
+	 * lurking in the kernel direct-mapped region is invalidated.
+	 */
+	ptr = page_address(page);
+	memset(ptr, 0, size);
+	dmac_flush_range(ptr, ptr + size);
+	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.
@@ -64,23 +81,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
-	u64 mask = get_coherent_dma_mask(dev);
-
-#ifdef CONFIG_DMA_API_DEBUG
-	u64 limit = (mask + 1) & ~mask;
-	if (limit && size >= limit) {
-		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
-			size, mask);
-		return NULL;
-	}
-#endif
-
-	if (!mask)
-		return NULL;
-
-	if (mask < 0xffffffffULL)
-		gfp |= GFP_DMA;
 
 	page = alloc_pages(gfp, order);
 	if (!page)
@@ -93,14 +93,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);
 
-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);
 
 	return page;
 }
@@ -170,6 +163,9 @@ static int __init consistent_init(void)
 	unsigned long base = consistent_base;
 	unsigned long num_ptes = (CONSISTENT_END - base) >> PMD_SHIFT;
 
+	if (cpu_architecture() >= CPU_ARCH_ARMv6)
+		return 0;
+
 	consistent_pte = kmalloc(num_ptes * sizeof(pte_t), GFP_KERNEL);
 	if (!consistent_pte) {
 		pr_err("%s: no memory\n", __func__);
@@ -210,9 +206,101 @@ static int __init consistent_init(void)
 
 	return ret;
 }
-
 core_initcall(consistent_init);
 
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page);
+
+static struct arm_vmregion_head coherent_head = {
+	.vm_lock	= __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock),
+	.vm_list	= LIST_HEAD_INIT(coherent_head.vm_list),
+};
+
+size_t coherent_pool_size = DEFAULT_CONSISTENT_DMA_SIZE / 8;
+
+static int __init early_coherent_pool(char *p)
+{
+	coherent_pool_size = memparse(p, &p);
+	return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+/*
+ * Initialise the coherent pool for atomic allocations.
+ */
+static int __init coherent_init(void)
+{
+	pgprot_t prot = pgprot_dmacoherent(pgprot_kernel);
+	size_t size = coherent_pool_size;
+	struct page *page;
+	void *ptr;
+
+	if (cpu_architecture() < CPU_ARCH_ARMv6)
+		return 0;
+
+	ptr = __alloc_from_contiguous(NULL, size, prot, &page);
+	if (ptr) {
+		coherent_head.vm_start = (unsigned long) ptr;
+		coherent_head.vm_end = (unsigned long) ptr + size;
+		printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n",
+		       (unsigned)size / 1024);
+		return 0;
+	}
+	printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n",
+	       (unsigned)size / 1024);
+	return -ENOMEM;
+}
+/*
+ * CMA is activated by core_initcall, so we must be called after it
+ */
+postcore_initcall(coherent_init);
+
+struct dma_contig_early_reserve {
+	phys_addr_t base;
+	unsigned long size;
+};
+
+static struct dma_contig_early_reserve dma_mmu_remap[MAX_CMA_AREAS] __initdata;
+
+static int dma_mmu_remap_num __initdata;
+
+void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
+{
+	dma_mmu_remap[dma_mmu_remap_num].base = base;
+	dma_mmu_remap[dma_mmu_remap_num].size = size;
+	dma_mmu_remap_num++;
+}
+
+void __init dma_contiguous_remap(void)
+{
+	int i;
+	for (i = 0; i < dma_mmu_remap_num; i++) {
+		phys_addr_t start = dma_mmu_remap[i].base;
+		phys_addr_t end = start + dma_mmu_remap[i].size;
+		struct map_desc map;
+		unsigned long addr;
+
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
+		if (start >= end)
+			return;
+
+		map.pfn = __phys_to_pfn(start);
+		map.virtual = __phys_to_virt(start);
+		map.length = end - start;
+		map.type = MT_MEMORY_DMA_READY;
+
+		/*
+		 * Clear previous low-memory mapping
+		 */
+		for (addr = __phys_to_virt(start); addr < __phys_to_virt(end);
+		     addr += PGDIR_SIZE)
+			pmd_clear(pmd_off_k(addr));
+
+		iotable_init(&map, 1);
+	}
+}
+
 static void *
 __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot)
 {
@@ -318,20 +406,172 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 	arm_vmregion_free(&consistent_head, c);
 }
 
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	struct page *page = virt_to_page(addr);
+	pgprot_t prot = *(pgprot_t *)data;
+
+	set_pte_ext(pte, mk_pte(page, prot), 0);
+	return 0;
+}
+
+static void __dma_remap(struct page *page, size_t size, pgprot_t prot)
+{
+	unsigned long start = (unsigned long) page_address(page);
+	unsigned end = start + size;
+
+	apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot);
+	dsb();
+	flush_tlb_kernel_range(start, end);
+}
+
+static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
+				 pgprot_t prot, struct page **ret_page)
+{
+	struct page *page;
+	void *ptr;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	ptr = __dma_alloc_remap(page, size, gfp, prot);
+	if (!ptr) {
+		__dma_free_buffer(page, size);
+		return NULL;
+	}
+
+	*ret_page = page;
+	return ptr;
+}
+
+static void *__alloc_from_pool(struct device *dev, size_t size,
+			       struct page **ret_page)
+{
+	struct arm_vmregion *c;
+	size_t align;
+
+	if (!coherent_head.vm_start) {
+		printk(KERN_ERR "%s: coherent pool not initialised!\n",
+		       __func__);
+		dump_stack();
+		return NULL;
+	}
+
+	/*
+	 * Align the region allocation - allocations from pool are rather
+	 * small, so align them to their order in pages, minimum is a page
+	 * size. This helps reduce fragmentation of the DMA space.
+	 */
+	align = PAGE_SIZE << get_order(size);
+	c = arm_vmregion_alloc(&coherent_head, align, size, 0);
+	if (c) {
+		void *ptr = (void *)c->vm_start;
+		struct page *page = virt_to_page(ptr);
+		*ret_page = page;
+		return ptr;
+	}
+	return NULL;
+}
+
+static int __free_from_pool(void *cpu_addr, size_t size)
+{
+	unsigned long start = (unsigned long)cpu_addr;
+	unsigned long end = start + size;
+	struct arm_vmregion *c;
+
+	if (start < coherent_head.vm_start || end > coherent_head.vm_end)
+		return 0;
+
+	c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start);
+
+	if ((c->vm_end - c->vm_start) != size) {
+		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+		       __func__, c->vm_end - c->vm_start, size);
+		dump_stack();
+		size = c->vm_end - c->vm_start;
+	}
+
+	arm_vmregion_free(&coherent_head, c);
+	return 1;
+}
+
+static void *__alloc_from_contiguous(struct device *dev, size_t size,
+				     pgprot_t prot, struct page **ret_page)
+{
+	unsigned long order = get_order(size);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *page;
+
+	page = dma_alloc_from_contiguous(dev, count, order);
+	if (!page)
+		return NULL;
+
+	__dma_clear_buffer(page, size);
+	__dma_remap(page, size, prot);
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+static void __free_from_contiguous(struct device *dev, struct page *page,
+				   size_t size)
+{
+	__dma_remap(page, size, pgprot_kernel);
+	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
+}
+
+#define nommu() 0
+
 #else	/* !CONFIG_MMU */
 
-#define __dma_alloc_remap(page, size, gfp, prot)	page_address(page)
-#define __dma_free_remap(addr, size)			do { } while (0)
+#define nommu() 1
+
+#define __alloc_remap_buffer(dev, size, gfp, prot, ret)	NULL
+#define __alloc_from_pool(dev, size, ret_page)		NULL
+#define __alloc_from_contiguous(dev, size, prot, ret)	NULL
+#define __free_from_pool(cpu_addr, size)		0
+#define __free_from_contiguous(dev, page, size)		do { } while (0)
+#define __dma_free_remap(cpu_addr, size)		do { } while (0)
 
 #endif	/* CONFIG_MMU */
 
-static void *
-__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
-	    pgprot_t prot)
+static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp,
+				   struct page **ret_page)
 {
 	struct page *page;
+	page = __dma_alloc_buffer(dev, size, gfp);
+	if (!page)
+		return NULL;
+
+	*ret_page = page;
+	return page_address(page);
+}
+
+
+
+static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp, pgprot_t prot)
+{
+	u64 mask = get_coherent_dma_mask(dev);
+	struct page *page;
 	void *addr;
 
+#ifdef CONFIG_DMA_API_DEBUG
+	u64 limit = (mask + 1) & ~mask;
+	if (limit && size >= limit) {
+		dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n",
+			size, mask);
+		return NULL;
+	}
+#endif
+
+	if (!mask)
+		return NULL;
+
+	if (mask < 0xffffffffULL)
+		gfp |= GFP_DMA;
+
 	/*
 	 * Following is a work-around (a.k.a. hack) to prevent pages
 	 * with __GFP_COMP being passed to split_page() which cannot
@@ -344,19 +584,17 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	*handle = ~0;
 	size = PAGE_ALIGN(size);
 
-	page = __dma_alloc_buffer(dev, size, gfp);
-	if (!page)
-		return NULL;
-
-	if (!arch_is_coherent())
-		addr = __dma_alloc_remap(page, size, gfp, prot);
+	if (arch_is_coherent() || nommu())
+		addr = __alloc_simple_buffer(dev, size, gfp, &page);
+	else if (cpu_architecture() < CPU_ARCH_ARMv6)
+		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page);
+	else if (gfp & GFP_ATOMIC)
+		addr = __alloc_from_pool(dev, size, &page);
 	else
-		addr = page_address(page);
+		addr = __alloc_from_contiguous(dev, size, prot, &page);
 
 	if (addr)
 		*handle = pfn_to_dma(dev, page_to_pfn(page));
-	else
-		__dma_free_buffer(page, size);
 
 	return addr;
 }
@@ -365,8 +603,8 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
  */
-void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp)
+void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle,
+			 gfp_t gfp)
 {
 	void *memory;
 
@@ -395,25 +633,11 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 {
 	int ret = -ENXIO;
 #ifdef CONFIG_MMU
-	unsigned long user_size, kern_size;
-	struct arm_vmregion *c;
-
-	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-
-	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
-	if (c) {
-		unsigned long off = vma->vm_pgoff;
-
-		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;
-
-		if (off < kern_size &&
-		    user_size <= (kern_size - off)) {
-			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
-					      user_size << PAGE_SHIFT,
-					      vma->vm_page_prot);
-		}
-	}
+	unsigned long pfn = dma_to_pfn(dev, dma_addr);
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn + vma->vm_pgoff,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
 #endif	/* CONFIG_MMU */
 
 	return ret;
@@ -435,23 +659,33 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(dma_mmap_writecombine);
 
+
 /*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
+ * Free a buffer as defined by the above mapping.
  */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
-	WARN_ON(irqs_disabled());
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
 
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
 
 	size = PAGE_ALIGN(size);
 
-	if (!arch_is_coherent())
+	if (arch_is_coherent() || nommu()) {
+		__dma_free_buffer(page, size);
+	} else if (cpu_architecture() < CPU_ARCH_ARMv6) {
 		__dma_free_remap(cpu_addr, size);
-
-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+		__dma_free_buffer(page, size);
+	} else {
+		if (__free_from_pool(cpu_addr, size))
+			return;
+		/*
+		 * Non-atomic allocations cannot be freed with IRQs disabled
+		 */
+		WARN_ON(irqs_disabled());
+		__free_from_contiguous(dev, page, size);
+	}
 }
 EXPORT_SYMBOL(dma_free_coherent);
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 6ec1226..eb8d662 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -20,6 +20,7 @@
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/dma-contiguous.h>
 
 #include <asm/mach-types.h>
 #include <asm/memblock.h>
@@ -227,6 +228,17 @@ static void __init arm_adjust_dma_zone(unsigned long *size, unsigned long *hole,
 }
 #endif
 
+void __init setup_dma_zone(struct machine_desc *mdesc)
+{
+#ifdef CONFIG_ZONE_DMA
+	if (mdesc->dma_zone_size) {
+		arm_dma_zone_size = mdesc->dma_zone_size;
+		arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
+	} else
+		arm_dma_limit = 0xffffffff;
+#endif
+}
+
 static void __init arm_bootmem_free(unsigned long min, unsigned long max_low,
 	unsigned long max_high)
 {
@@ -274,12 +286,9 @@ static void __init arm_bootmem_free(unsigned long min, unsigned long max_low,
 	 * Adjust the sizes according to any special requirements for
 	 * this machine type.
 	 */
-	if (arm_dma_zone_size) {
+	if (arm_dma_zone_size)
 		arm_adjust_dma_zone(zone_size, zhole_size,
 			arm_dma_zone_size >> PAGE_SHIFT);
-		arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
-	} else
-		arm_dma_limit = 0xffffffff;
 #endif
 
 	free_area_init_node(0, zone_size, min, zhole_size);
@@ -365,6 +374,11 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc)
 	if (mdesc->reserve)
 		mdesc->reserve();
 
+	/* reserve memory for DMA contiguous allocations,
+	   must come from the DMA area inside low memory */
+	dma_contiguous_reserve(arm_dma_limit < arm_lowmem_limit ?
+			       arm_dma_limit : arm_lowmem_limit);
+
 	arm_memblock_steal_permitted = false;
 	memblock_allow_resize();
 	memblock_dump_all();
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index 70f6d3ea..398c438 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -43,5 +43,8 @@ extern u32 arm_dma_limit;
 #define arm_dma_limit ((u32)~0)
 #endif
 
+extern phys_addr_t arm_lowmem_limit;
+
 void __init bootmem_init(void);
 void arm_mm_memblock_reserve(void);
+void dma_contiguous_remap(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 94c5a0c..b9fbec2 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -286,6 +286,11 @@ static struct mem_type mem_types[] = {
 				PMD_SECT_UNCACHED | PMD_SECT_XN,
 		.domain    = DOMAIN_KERNEL,
 	},
+	[MT_MEMORY_DMA_READY] = {
+		.prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,
+		.prot_l1   = PMD_TYPE_TABLE,
+		.domain    = DOMAIN_KERNEL,
+	},
 };
 
 const struct mem_type *get_mem_type(unsigned int type)
@@ -427,6 +432,7 @@ static void __init build_mem_type_table(void)
 	if (arch_is_coherent() && cpu_is_xsc3()) {
 		mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+		mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 		mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 		mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 	}
@@ -458,6 +464,7 @@ static void __init build_mem_type_table(void)
 			mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+			mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED;
 			mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
 			mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
 		}
@@ -509,6 +516,7 @@ static void __init build_mem_type_table(void)
 	mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
 	mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
 	mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+	mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot;
 	mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
 	mem_types[MT_ROM].prot_sect |= cp->pmd;
 
@@ -593,7 +601,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr,
 	 * L1 entries, whereas PGDs refer to a group of L1 entries making
 	 * up one logical pointer to an L2 table.
 	 */
-	if (((addr | end | phys) & ~SECTION_MASK) == 0) {
+	if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) {
 		pmd_t *p = pmd;
 
 #ifndef CONFIG_ARM_LPAE
@@ -811,7 +819,7 @@ static int __init early_vmalloc(char *arg)
 }
 early_param("vmalloc", early_vmalloc);
 
-static phys_addr_t lowmem_limit __initdata = 0;
+phys_addr_t arm_lowmem_limit __initdata = 0;
 
 void __init sanity_check_meminfo(void)
 {
@@ -894,8 +902,8 @@ void __init sanity_check_meminfo(void)
 			bank->size = newsize;
 		}
 #endif
-		if (!bank->highmem && bank->start + bank->size > lowmem_limit)
-			lowmem_limit = bank->start + bank->size;
+		if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit)
+			arm_lowmem_limit = bank->start + bank->size;
 
 		j++;
 	}
@@ -920,8 +928,8 @@ void __init sanity_check_meminfo(void)
 	}
 #endif
 	meminfo.nr_banks = j;
-	high_memory = __va(lowmem_limit - 1) + 1;
-	memblock_set_current_limit(lowmem_limit);
+	high_memory = __va(arm_lowmem_limit - 1) + 1;
+	memblock_set_current_limit(arm_lowmem_limit);
 }
 
 static inline void prepare_page_table(void)
@@ -946,8 +954,8 @@ static inline void prepare_page_table(void)
 	 * Find the end of the first block of lowmem.
 	 */
 	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
-	if (end >= lowmem_limit)
-		end = lowmem_limit;
+	if (end >= arm_lowmem_limit)
+		end = arm_lowmem_limit;
 
 	/*
 	 * Clear out all the kernel space mappings, except for the first
@@ -1087,8 +1095,8 @@ static void __init map_lowmem(void)
 		phys_addr_t end = start + reg->size;
 		struct map_desc map;
 
-		if (end > lowmem_limit)
-			end = lowmem_limit;
+		if (end > arm_lowmem_limit)
+			end = arm_lowmem_limit;
 		if (start >= end)
 			break;
 
@@ -1109,11 +1117,12 @@ void __init paging_init(struct machine_desc *mdesc)
 {
 	void *zero_page;
 
-	memblock_set_current_limit(lowmem_limit);
+	memblock_set_current_limit(arm_lowmem_limit);
 
 	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
+	dma_contiguous_remap();
 	devicemaps_init(mdesc);
 	kmap_init();
 
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 15/15] ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (13 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 14/15] ARM: " Marek Szyprowski
@ 2012-01-26  9:00 ` Marek Szyprowski
  2012-01-26 15:31 ` [PATCHv19 00/15] Contiguous Memory Allocator Arnd Bergmann
  15 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26  9:00 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig
  Cc: Michal Nazarewicz, Marek Szyprowski, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Arnd Bergmann, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

Replace the custom memory bank initialization using memblock_reserve and
dma_declare_coherent_memory() with a single call to CMA's
dma_declare_contiguous().
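
For reference, a caller of this helper might then look roughly like the
sketch below (illustration only; the sizes are hypothetical and base 0
simply lets CMA pick the placement):

static void __init my_machine_reserve(void)
{
	/* two MFC banks: 8 MiB for the "right" and 8 MiB for the "left" device */
	s5p_mfc_reserve_mem(0, 8 << 20, 0, 8 << 20);
}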

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/plat-s5p/dev-mfc.c |   51 ++++++-------------------------------------
 1 files changed, 7 insertions(+), 44 deletions(-)

diff --git a/arch/arm/plat-s5p/dev-mfc.c b/arch/arm/plat-s5p/dev-mfc.c
index a30d36b..fcb8400 100644
--- a/arch/arm/plat-s5p/dev-mfc.c
+++ b/arch/arm/plat-s5p/dev-mfc.c
@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/platform_device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma-contiguous.h>
 #include <linux/memblock.h>
 #include <linux/ioport.h>
 
@@ -22,52 +23,14 @@
 #include <plat/irqs.h>
 #include <plat/mfc.h>
 
-struct s5p_mfc_reserved_mem {
-	phys_addr_t	base;
-	unsigned long	size;
-	struct device	*dev;
-};
-
-static struct s5p_mfc_reserved_mem s5p_mfc_mem[2] __initdata;
-
 void __init s5p_mfc_reserve_mem(phys_addr_t rbase, unsigned int rsize,
 				phys_addr_t lbase, unsigned int lsize)
 {
-	int i;
-
-	s5p_mfc_mem[0].dev = &s5p_device_mfc_r.dev;
-	s5p_mfc_mem[0].base = rbase;
-	s5p_mfc_mem[0].size = rsize;
-
-	s5p_mfc_mem[1].dev = &s5p_device_mfc_l.dev;
-	s5p_mfc_mem[1].base = lbase;
-	s5p_mfc_mem[1].size = lsize;
-
-	for (i = 0; i < ARRAY_SIZE(s5p_mfc_mem); i++) {
-		struct s5p_mfc_reserved_mem *area = &s5p_mfc_mem[i];
-		if (memblock_remove(area->base, area->size)) {
-			printk(KERN_ERR "Failed to reserve memory for MFC device (%ld bytes at 0x%08lx)\n",
-			       area->size, (unsigned long) area->base);
-			area->base = 0;
-		}
-	}
-}
-
-static int __init s5p_mfc_memory_init(void)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(s5p_mfc_mem); i++) {
-		struct s5p_mfc_reserved_mem *area = &s5p_mfc_mem[i];
-		if (!area->base)
-			continue;
+	if (dma_declare_contiguous(&s5p_device_mfc_r.dev, rsize, rbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       rsize, (unsigned long) rbase);
 
-		if (dma_declare_coherent_memory(area->dev, area->base,
-				area->base, area->size,
-				DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE) == 0)
-			printk(KERN_ERR "Failed to declare coherent memory for MFC device (%ld bytes at 0x%08lx)\n",
-			       area->size, (unsigned long) area->base);
-	}
-	return 0;
+	if (dma_declare_contiguous(&s5p_device_mfc_l.dev, lsize, lbase, 0))
+		printk(KERN_ERR "Failed to reserve memory for MFC device (%u bytes at 0x%08lx)\n",
+		       lsize, (unsigned long) lbase);
 }
-device_initcall(s5p_mfc_memory_init);
-- 
1.7.1.569.g6f426


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
                   ` (14 preceding siblings ...)
  2012-01-26  9:00 ` [PATCH 15/15] ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device Marek Szyprowski
@ 2012-01-26 15:31 ` Arnd Bergmann
  2012-01-26 15:38   ` Michal Nazarewicz
                     ` (2 more replies)
  15 siblings, 3 replies; 61+ messages in thread
From: Arnd Bergmann @ 2012-01-26 15:31 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thursday 26 January 2012, Marek Szyprowski wrote:
> Welcome everyone!
> 
> Yes, that's true. This is yet another release of the Contiguous Memory
> Allocator patches. This version mainly includes code cleanups requested
> by Mel Gorman and a few minor bug fixes.

Hi Marek,

Thanks for keeping up this work! I really hope it works out for the
next merge window.

> TODO (optional):
> - implement support for contiguous memory areas placed in HIGHMEM zone
> - resolve issue with movable pages with pending io operations

Can you clarify these? I believe contiguous memory areas in highmem are
something that should really come after the existing code is merged into
the upstream kernel, and should probably not be listed as a TODO here.

I haven't followed the last two releases so closely. It seems that in v17
movable pages with pending I/O were still a major problem, but in v18 you
added a solution. Is that right? What is still left to be done here then?

	Arnd

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-26 15:31 ` [PATCHv19 00/15] Contiguous Memory Allocator Arnd Bergmann
@ 2012-01-26 15:38   ` Michal Nazarewicz
  2012-01-26 15:48   ` Marek Szyprowski
  2012-01-28  0:26   ` Andrew Morton
  2 siblings, 0 replies; 61+ messages in thread
From: Michal Nazarewicz @ 2012-01-26 15:38 UTC (permalink / raw)
  To: Marek Szyprowski, Arnd Bergmann
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman, Jesse Barker,
	Jonathan Corbet, Shariq Hasnain, Chunsang Jeong, Dave Hansen,
	Benjamin Gaignard

On Thu, 26 Jan 2012 16:31:40 +0100, Arnd Bergmann <arnd@arndb.de> wrote:
> I haven't followed the last two releases so closely. It seems that
> in v17 the movable pages with pending i/o was still a major problem
> but in v18 you added a solution. Is that right? What is still left
> to be done here then?

In the current version, when an allocation fails because of a page with
pending I/O, CMA automatically retries the allocation at a different spot
within the CMA region.
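
Simplified, the allocation loop from patch 12 boils down to (error paths
trimmed):

	for (;;) {
		pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
						    start, count, mask);
		if (pageno >= cma->count)
			break;				/* area exhausted */
		pfn = cma->base_pfn + pageno;
		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
		if (ret == 0) {
			bitmap_set(cma->bitmap, pageno, count);
			break;				/* success */
		}
		if (ret != -EBUSY)
			break;				/* hard failure */
		/* a page in the range is busy (e.g. pending I/O), skip past it */
		start = pageno + mask + 1;
	}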

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-26 15:31 ` [PATCHv19 00/15] Contiguous Memory Allocator Arnd Bergmann
  2012-01-26 15:38   ` Michal Nazarewicz
@ 2012-01-26 15:48   ` Marek Szyprowski
  2012-01-28  0:26   ` Andrew Morton
  2 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-26 15:48 UTC (permalink / raw)
  To: 'Arnd Bergmann'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Daniel Walker', 'Mel Gorman',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen', 'Benjamin Gaignard'

Hello,

On Thursday, January 26, 2012 4:32 PM Arnd Bergmann wrote:

> On Thursday 26 January 2012, Marek Szyprowski wrote:
> > Welcome everyone!
> >
> > Yes, that's true. This is yet another release of the Contiguous Memory
> > Allocator patches. This version mainly includes code cleanups requested
> > by Mel Gorman and a few minor bug fixes.
> 
> Hi Marek,
> 
> Thanks for keeping up this work! I really hope it works out for the
> next merge window.
> 
> > TODO (optional):
> > - implement support for contiguous memory areas placed in HIGHMEM zone
> > - resolve issue with movable pages with pending io operations
> 
> Can you clarify these? I believe the contiguous memory areas in highmem
> is something that should really be after the existing code is merged
> into the upstream kernel and should better not be listed as TODO here.

Ok, I will remove it from the TODO list. Core memory management has very
little dependence on HIGHMEM; it is more about making the DMA-mapping
framework aware that there might be no lowmem mappings for the allocated
pages. This can easily be added once the initial version gets merged.
 
> I haven't followed the last two releases so closely. It seems that
> in v17 the movable pages with pending i/o was still a major problem
> but in v18 you added a solution. Is that right? What is still left
> to be done here then?

Since v18 a failed allocation is retried at a slightly different place in
the contiguous memory area, which has heavily increased overall reliability.

This can be improved by making CMA a bit more aware of pending I/O
operations, but I want to leave that until after the initial merge.

I think that there are no major issues left to be resolved now.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-26  9:00 ` [PATCH 12/15] drivers: add Contiguous Memory Allocator Marek Szyprowski
@ 2012-01-27  9:44   ` Ohad Ben-Cohen
  2012-01-27 10:53     ` Marek Szyprowski
  0 siblings, 1 reply; 61+ messages in thread
From: Ohad Ben-Cohen @ 2012-01-27  9:44 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Michal Nazarewicz, Dave Hansen,
	Jesse Barker, Kyungmin Park, Andrew Morton, KAMEZAWA Hiroyuki

Hi Marek,

With v19, I can't seem to allocate big regions anymore (e.g. 101MiB).
In particular, this seems to fail:

On Thu, Jan 26, 2012 at 11:00 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> +static int cma_activate_area(unsigned long base_pfn, unsigned long count)
> +{
> +       unsigned long pfn = base_pfn;
> +       unsigned i = count >> pageblock_order;
> +       struct zone *zone;
> +
> +       WARN_ON_ONCE(!pfn_valid(pfn));
> +       zone = page_zone(pfn_to_page(pfn));
> +
> +       do {
> +               unsigned j;
> +               base_pfn = pfn;
> +               for (j = pageblock_nr_pages; j; --j, pfn++) {
> +                       WARN_ON_ONCE(!pfn_valid(pfn));
> +                       if (page_zone(pfn_to_page(pfn)) != zone)
> +                               return -EINVAL;

The above WARN_ON_ONCE is triggered, and then the conditional is
asserted (page_zone() returns a "Movable" zone, whereas zone is
"Normal") and the function fails.

This happens to me on OMAP4 with your 3.3-rc1-cma-v19 branch (and a
bunch of remoteproc/rpmsg patches).

Do big allocations work for you ?

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-27  9:44   ` [Linaro-mm-sig] " Ohad Ben-Cohen
@ 2012-01-27 10:53     ` Marek Szyprowski
  2012-01-27 14:27       ` Clark, Rob
  2012-01-27 14:56       ` Ohad Ben-Cohen
  0 siblings, 2 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-27 10:53 UTC (permalink / raw)
  To: 'Ohad Ben-Cohen'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Daniel Walker', 'Russell King',
	'Arnd Bergmann', 'Jonathan Corbet',
	'Mel Gorman', 'Michal Nazarewicz',
	'Dave Hansen', 'Jesse Barker',
	'Kyungmin Park', 'Andrew Morton',
	'KAMEZAWA Hiroyuki'

Hi Ohad,

On Friday, January 27, 2012 10:44 AM Ohad Ben-Cohen wrote:

> With v19, I can't seem to allocate big regions anymore (e.g. 101MiB).
> In particular, this seems to fail:
> 
> On Thu, Jan 26, 2012 at 11:00 AM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > +static int cma_activate_area(unsigned long base_pfn, unsigned long count)
> > +{
> > +       unsigned long pfn = base_pfn;
> > +       unsigned i = count >> pageblock_order;
> > +       struct zone *zone;
> > +
> > +       WARN_ON_ONCE(!pfn_valid(pfn));
> > +       zone = page_zone(pfn_to_page(pfn));
> > +
> > +       do {
> > +               unsigned j;
> > +               base_pfn = pfn;
> > +               for (j = pageblock_nr_pages; j; --j, pfn++) {
> > +                       WARN_ON_ONCE(!pfn_valid(pfn));
> > +                       if (page_zone(pfn_to_page(pfn)) != zone)
> > +                               return -EINVAL;
> 
> The above WARN_ON_ONCE is triggered, and then the conditional is
> asserted (page_zone() retuns a "Movable" zone, whereas zone is
> "Normal") and the function fails.
> 
> This happens to me on OMAP4 with your 3.3-rc1-cma-v19 branch (and a
> bunch of remoteproc/rpmsg patches).
> 
> Do big allocations work for you ?

I've tested it with 256MiB on Exynos4 platform. Could you check if the
problem also appears on 3.2-cma-v19 branch (I've uploaded it a few hours
ago) and 3.2-cma-v18? Both are available on our public repo:
git://git.infradead.org/users/kmpark/linux-samsung/

The above code has not been changed since v16, so I'm really surprised 
that it causes problems. Maybe the memory configuration or layout has 
been changed in 3.3-rc1 for OMAP4?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-27 10:53     ` Marek Szyprowski
@ 2012-01-27 14:27       ` Clark, Rob
  2012-01-27 14:51         ` Marek Szyprowski
  2012-01-27 14:56       ` Ohad Ben-Cohen
  1 sibling, 1 reply; 61+ messages in thread
From: Clark, Rob @ 2012-01-27 14:27 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Ohad Ben-Cohen, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Jesse Barker, linux-kernel,
	Michal Nazarewicz, Dave Hansen, linaro-mm-sig, linux-mm,
	Kyungmin Park, KAMEZAWA Hiroyuki, Andrew Morton,
	linux-arm-kernel, linux-media

2012/1/27 Marek Szyprowski <m.szyprowski@samsung.com>:
> Hi Ohad,
>
> On Friday, January 27, 2012 10:44 AM Ohad Ben-Cohen wrote:
>
>> With v19, I can't seem to allocate big regions anymore (e.g. 101MiB).
>> In particular, this seems to fail:
>>
>> On Thu, Jan 26, 2012 at 11:00 AM, Marek Szyprowski
>> <m.szyprowski@samsung.com> wrote:
>> > +static int cma_activate_area(unsigned long base_pfn, unsigned long count)
>> > +{
>> > +       unsigned long pfn = base_pfn;
>> > +       unsigned i = count >> pageblock_order;
>> > +       struct zone *zone;
>> > +
>> > +       WARN_ON_ONCE(!pfn_valid(pfn));
>> > +       zone = page_zone(pfn_to_page(pfn));
>> > +
>> > +       do {
>> > +               unsigned j;
>> > +               base_pfn = pfn;
>> > +               for (j = pageblock_nr_pages; j; --j, pfn++) {
>> > +                       WARN_ON_ONCE(!pfn_valid(pfn));
>> > +                       if (page_zone(pfn_to_page(pfn)) != zone)
>> > +                               return -EINVAL;
>>
>> The above WARN_ON_ONCE is triggered, and then the conditional is
>> asserted (page_zone() retuns a "Movable" zone, whereas zone is
>> "Normal") and the function fails.
>>
>> This happens to me on OMAP4 with your 3.3-rc1-cma-v19 branch (and a
>> bunch of remoteproc/rpmsg patches).
>>
>> Do big allocations work for you ?
>
> I've tested it with 256MiB on Exynos4 platform. Could you check if the
> problem also appears on 3.2-cma-v19 branch (I've uploaded it a few hours
> ago) and 3.2-cma-v18? Both are available on our public repo:
> git://git.infradead.org/users/kmpark/linux-samsung/
>
> The above code has not been changed since v16, so I'm really surprised
> that it causes problems. Maybe the memory configuration or layout has
> been changed in 3.3-rc1 for OMAP4?

Is highmem still an issue?  I remember hitting this WARN_ON_ONCE(), but it
went away after I switched to a 2g/2g vm split (which avoids highmem).

BR,
-R

> Best regards
> --
> Marek Szyprowski
> Samsung Poland R&D Center
>
>
>
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-27 14:27       ` Clark, Rob
@ 2012-01-27 14:51         ` Marek Szyprowski
  2012-01-27 14:59           ` Ohad Ben-Cohen
  0 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-27 14:51 UTC (permalink / raw)
  To: 'Clark, Rob'
  Cc: 'Ohad Ben-Cohen', 'Daniel Walker',
	'Russell King', 'Arnd Bergmann',
	'Jonathan Corbet', 'Mel Gorman',
	'Jesse Barker', linux-kernel, 'Michal Nazarewicz',
	'Dave Hansen',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'KAMEZAWA Hiroyuki', 'Andrew Morton',
	linux-arm-kernel, linux-media

Hello,

On Friday, January 27, 2012 3:28 PM Clark, Rob wrote:

> 2012/1/27 Marek Szyprowski <m.szyprowski@samsung.com>:
> > Hi Ohad,
> >
> > On Friday, January 27, 2012 10:44 AM Ohad Ben-Cohen wrote:
> >
> >> With v19, I can't seem to allocate big regions anymore (e.g. 101MiB).
> >> In particular, this seems to fail:
> >>
> >> On Thu, Jan 26, 2012 at 11:00 AM, Marek Szyprowski
> >> <m.szyprowski@samsung.com> wrote:
> >> > +static int cma_activate_area(unsigned long base_pfn, unsigned long count)
> >> > +{
> >> > +       unsigned long pfn = base_pfn;
> >> > +       unsigned i = count >> pageblock_order;
> >> > +       struct zone *zone;
> >> > +
> >> > +       WARN_ON_ONCE(!pfn_valid(pfn));
> >> > +       zone = page_zone(pfn_to_page(pfn));
> >> > +
> >> > +       do {
> >> > +               unsigned j;
> >> > +               base_pfn = pfn;
> >> > +               for (j = pageblock_nr_pages; j; --j, pfn++) {
> >> > +                       WARN_ON_ONCE(!pfn_valid(pfn));
> >> > +                       if (page_zone(pfn_to_page(pfn)) != zone)
> >> > +                               return -EINVAL;
> >>
> >> The above WARN_ON_ONCE is triggered, and then the conditional is
> >> asserted (page_zone() retuns a "Movable" zone, whereas zone is
> >> "Normal") and the function fails.
> >>
> >> This happens to me on OMAP4 with your 3.3-rc1-cma-v19 branch (and a
> >> bunch of remoteproc/rpmsg patches).
> >>
> >> Do big allocations work for you ?
> >
> > I've tested it with 256MiB on Exynos4 platform. Could you check if the
> > problem also appears on 3.2-cma-v19 branch (I've uploaded it a few hours
> > ago) and 3.2-cma-v18? Both are available on our public repo:
> > git://git.infradead.org/users/kmpark/linux-samsung/
> >
> > The above code has not been changed since v16, so I'm really surprised
> > that it causes problems. Maybe the memory configuration or layout has
> > been changed in 3.3-rc1 for OMAP4?
> 
> is highmem still an issue?  I remember hitting this WARN_ON_ONCE() but
> went away after I switched to a 2g/2g vm split (which avoids highmem)

No, it shouldn't be an issue. I've tested CMA v19 on a system with 1GiB of
memory and the general purpose (global) CMA region was allocated correctly
at the end of low memory. For device private regions you have to take care
of correct placement yourself, so maybe that is the issue in this case?

Ohad, could you tell a bit more about your issue? Is this 'large region'
a device private region (declared with dma_declare_contiguous()) or is it
a global one (defined in Kconfig or with the cma= kernel boot parameter)?

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-27 10:53     ` Marek Szyprowski
  2012-01-27 14:27       ` Clark, Rob
@ 2012-01-27 14:56       ` Ohad Ben-Cohen
  1 sibling, 0 replies; 61+ messages in thread
From: Ohad Ben-Cohen @ 2012-01-27 14:56 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Michal Nazarewicz, Dave Hansen,
	Jesse Barker, Kyungmin Park, Andrew Morton, KAMEZAWA Hiroyuki

2012/1/27 Marek Szyprowski <m.szyprowski@samsung.com>:
> I've tested it with 256MiB on Exynos4 platform. Could you check if the
> problem also appears on 3.2-cma-v19 branch (I've uploaded it a few hours
> ago)

Exactly what I needed, thanks :)

Both v18 and v19 seem to work fine with 3.2.

> The above code has not been changed since v16, so I'm really surprised
> that it causes problems. Maybe the memory configuration or layout has
> been changed in 3.3-rc1 for OMAP4?

Not sure what the culprit is, but it is only triggered with 3.3-rc1.

I'll tell you if I find anything.

Thanks!
Ohad.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-27 14:51         ` Marek Szyprowski
@ 2012-01-27 14:59           ` Ohad Ben-Cohen
  2012-01-27 15:17             ` Marek Szyprowski
  0 siblings, 1 reply; 61+ messages in thread
From: Ohad Ben-Cohen @ 2012-01-27 14:59 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Clark, Rob, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Jesse Barker, linux-kernel,
	Michal Nazarewicz, Dave Hansen, linaro-mm-sig, linux-mm,
	Kyungmin Park, KAMEZAWA Hiroyuki, Andrew Morton,
	linux-arm-kernel, linux-media

2012/1/27 Marek Szyprowski <m.szyprowski@samsung.com>:
> Ohad, could you tell a bit more about your issue?

Sure, feel free to ask.

> Is this 'large region'
> a device private region (declared with dma_declare_contiguous())

Yes, it is.

See omap_rproc_reserve_cma() in:

http://git.kernel.org/?p=linux/kernel/git/ohad/remoteproc.git;a=commitdiff;h=dab6a2584550a629746fa1dea2be8ffbe1910277

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-27 14:59           ` Ohad Ben-Cohen
@ 2012-01-27 15:17             ` Marek Szyprowski
  2012-01-28 18:57               ` Ohad Ben-Cohen
  0 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-27 15:17 UTC (permalink / raw)
  To: 'Ohad Ben-Cohen'
  Cc: 'Clark, Rob', 'Daniel Walker',
	'Russell King', 'Arnd Bergmann',
	'Jonathan Corbet', 'Mel Gorman',
	'Jesse Barker', linux-kernel, 'Michal Nazarewicz',
	'Dave Hansen',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'KAMEZAWA Hiroyuki', 'Andrew Morton',
	linux-arm-kernel, linux-media

Hello,

On Friday, January 27, 2012 3:59 PM Ohad Ben-Cohen wrote:

> 2012/1/27 Marek Szyprowski <m.szyprowski@samsung.com>:
> > Ohad, could you tell a bit more about your issue?
> 
> Sure, feel free to ask.
> 
> > Is this 'large region'
> > a device private region (declared with dma_declare_contiguous())
> 
> Yes, it is.
> 
> See omap_rproc_reserve_cma() in:
> 
> http://git.kernel.org/?p=linux/kernel/git/ohad/remoteproc.git;a=commitdiff;h=dab6a2584550a6297
> 46fa1dea2be8ffbe1910277

There have been some vmalloc layout changes merged to v3.3-rc1. Please check
if the hardcoded OMAP_RPROC_CMA_BASE+CONFIG_OMAP_DUCATI_CMA_SIZE fits into kernel
low-memory. You can find some hints in the "Virtual kernel memory layout:"
message printed during boot and in the output of "cat /proc/iomem".

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-26 15:31 ` [PATCHv19 00/15] Contiguous Memory Allocator Arnd Bergmann
  2012-01-26 15:38   ` Michal Nazarewicz
  2012-01-26 15:48   ` Marek Szyprowski
@ 2012-01-28  0:26   ` Andrew Morton
  2012-01-29 18:09     ` Rob Clark
                       ` (3 more replies)
  2 siblings, 4 replies; 61+ messages in thread
From: Andrew Morton @ 2012-01-28  0:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Russell King, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, 26 Jan 2012 15:31:40 +0000
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thursday 26 January 2012, Marek Szyprowski wrote:
> > Welcome everyone!
> > 
> > Yes, that's true. This is yet another release of the Contiguous Memory
> > Allocator patches. This version mainly includes code cleanups requested
> > by Mel Gorman and a few minor bug fixes.
> 
> Hi Marek,
> 
> Thanks for keeping up this work! I really hope it works out for the
> next merge window.

Someone please tell me when it's time to start paying attention
again ;)

These patches don't seem to have as many acked-bys and reviewed-bys as
I'd expect.  Given the scope and duration of this, it would be useful
to gather these up.  But please ensure they are real ones - people
sometimes like to ack things without showing much sign of having
actually read them.

Also there is the supreme tag: "Tested-by:.".  Ohad (at least) has been
testing the code.  Let's mention that.


The patches do seem to have been going round in ever-decreasing circles
lately and I think we have decided to merge them (yes?) so we may as well
get on and do that and sort out remaining issues in-tree.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-27 15:17             ` Marek Szyprowski
@ 2012-01-28 18:57               ` Ohad Ben-Cohen
  2012-01-30  7:43                 ` Marek Szyprowski
  0 siblings, 1 reply; 61+ messages in thread
From: Ohad Ben-Cohen @ 2012-01-28 18:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Clark, Rob, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Jesse Barker, linux-kernel,
	Michal Nazarewicz, Dave Hansen, linaro-mm-sig, linux-mm,
	Kyungmin Park, KAMEZAWA Hiroyuki, Andrew Morton,
	linux-arm-kernel, linux-media

Hi Marek,

On Fri, Jan 27, 2012 at 5:17 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> There have been some vmalloc layout changes merged to v3.3-rc1.

That was dead-on, thanks a lot!

I did then bump into a different allocation failure which happened
because dma_alloc_from_contiguous() computes 'mask' before capping the
'align' argument.

The early 'mask' computation was added in v18 (and therefore exists in
v19 too) and I was actually testing v17 previously, so I didn't notice
it before.

You may want to squash something like this:

diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
index f41e699..8455cb7 100644
--- a/drivers/base/dma-contiguous.c
+++ b/drivers/base/dma-contiguous.c
@@ -319,8 +319,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, i
                                       unsigned int align)
 {
        struct cma *cma = dev_get_cma_area(dev);
-       unsigned long pfn, pageno, start = 0;
-       unsigned long mask = (1 << align) - 1;
+       unsigned long mask, pfn, pageno, start = 0;
        int ret;

        if (!cma || !cma->count)
@@ -329,6 +328,8 @@ struct page *dma_alloc_from_contiguous(struct device *dev, i
        if (align > CONFIG_CMA_ALIGNMENT)
                align = CONFIG_CMA_ALIGNMENT;

+       mask = (1 << align) - 1;
+
        pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
                 count, align);

Thanks,
Ohad.

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-28  0:26   ` Andrew Morton
@ 2012-01-29 18:09     ` Rob Clark
  2012-01-29 20:32       ` Anca Emanuel
  2012-01-29 20:51     ` Arnd Bergmann
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 61+ messages in thread
From: Rob Clark @ 2012-01-29 18:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arnd Bergmann, Marek Szyprowski, linux-kernel, linux-arm-kernel,
	linux-media, linux-mm, linaro-mm-sig, Michal Nazarewicz,
	Kyungmin Park, Russell King, KAMEZAWA Hiroyuki, Daniel Walker,
	Mel Gorman, Jesse Barker, Jonathan Corbet, Shariq Hasnain,
	Chunsang Jeong, Dave Hansen, Benjamin Gaignard

On Fri, Jan 27, 2012 at 6:26 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 26 Jan 2012 15:31:40 +0000
> Arnd Bergmann <arnd@arndb.de> wrote:
>
>> On Thursday 26 January 2012, Marek Szyprowski wrote:
>> > Welcome everyone!
>> >
>> > Yes, that's true. This is yet another release of the Contiguous Memory
>> > Allocator patches. This version mainly includes code cleanups requested
>> > by Mel Gorman and a few minor bug fixes.
>>
>> Hi Marek,
>>
>> Thanks for keeping up this work! I really hope it works out for the
>> next merge window.
>
> Someone please tell me when it's time to start paying attention
> again ;)
>
> These patches don't seem to have as many acked-bys and reviewed-bys as
> I'd expect.  Given the scope and duration of this, it would be useful
> to gather these up.  But please ensure they are real ones - people
> sometimes like to ack things without showing much sign of having
> actually read them.
>
> Also there is the supreme tag: "Tested-by:.".  Ohad (at least) has been
> testing the code.  Let's mention that.
>

fyi Marek, I've been testing CMA as well, both in context of Ohad's
rpmsg driver and my omapdrm driver (and combination of the two)..  so
you can add:

Tested-by: Rob Clark <rob.clark@linaro.org>

And there are some others from linaro that have written a test driver,
and various stress test scripts using the test driver.  I guess that
could also count for some additional Tested-by's.

BR,
-R

> The patches do seem to have been going round in ever-decreasing circles
> lately and I think we have decided to merge them (yes?) so we may as well
> get on and do that and sort out remaining issues in-tree.
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-29 18:09     ` Rob Clark
@ 2012-01-29 20:32       ` Anca Emanuel
  0 siblings, 0 replies; 61+ messages in thread
From: Anca Emanuel @ 2012-01-29 20:32 UTC (permalink / raw)
  To: Rob Clark
  Cc: Andrew Morton, Arnd Bergmann, Marek Szyprowski, linux-kernel,
	linux-arm-kernel, linux-media, linux-mm, linaro-mm-sig,
	Michal Nazarewicz, Kyungmin Park, Russell King,
	KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman, Jesse Barker,
	Jonathan Corbet, Shariq Hasnain, Chunsang Jeong, Dave Hansen,
	Benjamin Gaignard

>> Also there is the supreme tag: "Tested-by:.".  Ohad (at least) has been
>> testing the code.  Let's mention that.
>>
>
> fyi Marek, I've been testing CMA as well, both in context of Ohad's
> rpmsg driver and my omapdrm driver (and combination of the two)..  so
> you can add:
>
> Tested-by: Rob Clark <rob.clark@linaro.org>
>
> And there are some others from linaro that have written a test driver,
> and various stress test scripts using the test driver.  I guess that
> could also count for some additional Tested-by's.

Convince them to report with a Tested-by tag.
It would be a good first step into open source for them.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-28  0:26   ` Andrew Morton
  2012-01-29 18:09     ` Rob Clark
@ 2012-01-29 20:51     ` Arnd Bergmann
  2012-01-30 13:25     ` Mel Gorman
  2012-02-10 18:10     ` Marek Szyprowski
  3 siblings, 0 replies; 61+ messages in thread
From: Arnd Bergmann @ 2012-01-29 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Michal Nazarewicz, Kyungmin Park,
	Russell King, KAMEZAWA Hiroyuki, Daniel Walker, Mel Gorman,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Saturday 28 January 2012, Andrew Morton wrote:
> These patches don't seem to have as many acked-bys and reviewed-bys as
> I'd expect.  Given the scope and duration of this, it would be useful
> to gather these up.  But please ensure they are real ones - people
> sometimes like to ack things without showing much sign of having
> actually read them.

I reviewed early versions of this patch set and had a lot of comments on the
interfaces that were exposed to device drivers and platform maintainers.

All of the comments were addressed back then and I gave an Acked-by.
I assume that it was dropped in subsequent versions because the
implementation changed significantly since, but I'm still happy with the
way this looks to the user, in particular that it is practically invisible
because all users just go through the dma mapping API instead of the
horrors that were used in the original patches.

From an ARM architecture perspective, we have come to the point (some
versions ago) where we actually require the CMA patchset for correctness,
even on IOMMU based systems because it avoids some nasty corner cases
with pages that are both in the linear kernel mapping and in an
uncached mapping for DMA: We know that the code we are using in mainline
is broken on ARMv6 and later and that CMA fixes that problem.

I'm not the right person to judge the memory management code changes,
others need to comment on that. Aside from that:

Acked-by: Arnd Bergmann <arnd@arndb.de>

	Arnd

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-28 18:57               ` Ohad Ben-Cohen
@ 2012-01-30  7:43                 ` Marek Szyprowski
  2012-01-30  9:16                   ` Ohad Ben-Cohen
  0 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-30  7:43 UTC (permalink / raw)
  To: 'Ohad Ben-Cohen'
  Cc: 'Clark, Rob', 'Daniel Walker',
	'Russell King', 'Arnd Bergmann',
	'Jonathan Corbet', 'Mel Gorman',
	'Jesse Barker', linux-kernel, 'Michal Nazarewicz',
	'Dave Hansen',
	linaro-mm-sig, linux-mm, 'Kyungmin Park',
	'KAMEZAWA Hiroyuki', 'Andrew Morton',
	linux-arm-kernel, linux-media

Hello,

On Saturday, January 28, 2012 7:57 PM Ohad Ben-Cohen wrote:

> On Fri, Jan 27, 2012 at 5:17 PM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
> > There have been some vmalloc layout changes merged to v3.3-rc1.
> 
> That was dead-on, thanks a lot!

Did you manage to fix this issue?

> 
> I did then bump into a different allocation failure which happened
> because dma_alloc_from_contiguous() computes 'mask' before capping the
> 'align' argument.
> 
> The early 'mask' computation was added in v18 (and therefore exists in
> v19 too) and I was actually testing v17 previously, so I didn't notice
> it before.

Right, thanks for spotting it, I will squash it into the next release.

> You may want to squash something like this:
> 
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index f41e699..8455cb7 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -319,8 +319,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, i
>                                        unsigned int align)
>  {
>         struct cma *cma = dev_get_cma_area(dev);
> -       unsigned long pfn, pageno, start = 0;
> -       unsigned long mask = (1 << align) - 1;
> +       unsigned long mask, pfn, pageno, start = 0;
>         int ret;
> 
>         if (!cma || !cma->count)
> @@ -329,6 +328,8 @@ struct page *dma_alloc_from_contiguous(struct device *dev, i
>         if (align > CONFIG_CMA_ALIGNMENT)
>                 align = CONFIG_CMA_ALIGNMENT;
> 
> +       mask = (1 << align) - 1;
> +
>         pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
>                  count, align);
> 

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Linaro-mm-sig] [PATCH 12/15] drivers: add Contiguous Memory Allocator
  2012-01-30  7:43                 ` Marek Szyprowski
@ 2012-01-30  9:16                   ` Ohad Ben-Cohen
  0 siblings, 0 replies; 61+ messages in thread
From: Ohad Ben-Cohen @ 2012-01-30  9:16 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Clark, Rob, Daniel Walker, Russell King, Arnd Bergmann,
	Jonathan Corbet, Mel Gorman, Jesse Barker, linux-kernel,
	Michal Nazarewicz, Dave Hansen, linaro-mm-sig, linux-mm,
	Kyungmin Park, KAMEZAWA Hiroyuki, Andrew Morton,
	linux-arm-kernel, linux-media

Hi Marek,

On Mon, Jan 30, 2012 at 9:43 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Did you manage to fix this issue?

Yes -- the recent increase in the vmalloc region triggered a bigger
truncation in the system RAM than we had before, and therefore
conflicted with the previous hardcoded region we were using.

Long term, our plan is to get rid of those hardcoded values, but for
the moment our remote RTOS still needs to know the physical address in
advance.

> Right, thanks for spotting it, I will squash it into the next release.

Thanks. With that hunk squashed in, feel free to add my Tested-by tag
to the patches.

Thanks!
Ohad.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/15] mm: page_alloc: remove trailing whitespace
  2012-01-26  9:00 ` [PATCH 01/15] mm: page_alloc: remove trailing whitespace Marek Szyprowski
@ 2012-01-30 10:59   ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 10:59 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:43AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Ordinarily, I do not like this sort of patch because it can
interfere with git blame, but as it is comments that are affected:

Acked-by: Mel Gorman <mel@csn.ul.ie>

Thanks

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-01-26  9:00 ` [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating Marek Szyprowski
@ 2012-01-30 11:15   ` Mel Gorman
  2012-01-30 15:41     ` Michal Nazarewicz
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 11:15 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> This commit changes set_migratetype_isolate() so that it updates
> migrate type of pages on pcp list which is saved in their
> page_private.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>  include/linux/page-isolation.h |    6 ++++++
>  mm/page_alloc.c                |    1 +
>  mm/page_isolation.c            |   24 ++++++++++++++++++++++++
>  3 files changed, 31 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 051c1b1..8c02c2b 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -27,6 +27,12 @@ extern int
>  test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
>  
>  /*
> + * Check all pages in pageblock, find the ones on pcp list, and set
> + * their page_private to MIGRATE_ISOLATE.
> + */
> +extern void update_pcp_isolate_block(unsigned long pfn);
> +
> +/*
>   * Internal funcs.Changes pageblock's migrate type.
>   * Please use make_pagetype_isolated()/make_pagetype_movable().
>   */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e1c5656..70709e7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5465,6 +5465,7 @@ out:
>  	if (!ret) {
>  		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>  		move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +		update_pcp_isolate_block(pfn);
>  	}
>  
>  	spin_unlock_irqrestore(&zone->lock, flags);
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 4ae42bb..9ea2f6e 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  	return ret ? 0 : -EBUSY;
>  }
> +
> +/* must hold zone->lock */
> +void update_pcp_isolate_block(unsigned long pfn)
> +{
> +	unsigned long end_pfn = pfn + pageblock_nr_pages;
> +	struct page *page;
> +
> +	while (pfn < end_pfn) {
> +		if (!pfn_valid_within(pfn)) {
> +			++pfn;
> +			continue;
> +		}
> +

There is a potential problem here that you need to be aware of.
set_pageblock_migratetype() is called from start_isolate_page_range().
I do not think there is a guarantee that pfn + pageblock_nr_pages is
not in a different block of MAX_ORDER_NR_PAGES. If that is right then
your options are to add a check like this:

if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
	break;

or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and in
the same block as pfn, and rely on the caller to have called
pfn_valid().
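
For illustration, a minimal sketch of where that first check would sit in the
loop quoted above (my reading of the suggestion, not code from the series):

	while (pfn < end_pfn) {
		/*
		 * The memmap is only guaranteed to exist within a
		 * MAX_ORDER_NR_PAGES block, so re-check pfn_valid() when
		 * crossing such a boundary before trusting
		 * pfn_valid_within()/pfn_to_page().
		 */
		if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
			break;

		if (!pfn_valid_within(pfn)) {
			++pfn;
			continue;
		}
		...
	}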

> +		page = pfn_to_page(pfn);
> +		if (PageBuddy(page)) {
> +			pfn += 1 << page_order(page);
> +		} else if (page_count(page) == 0) {
> +			set_page_private(page, MIGRATE_ISOLATE);
> +			++pfn;

This is dangerous for two reasons. If the page_count is 0, it could
be because the page is in the process of being freed and is not
necessarily on the per-cpu lists yet and you cannot be sure if the
contents of page->private are important. Second, there is nothing to
prevent another CPU allocating this page from its per-cpu list while
the private field is getting updated from here which might lead to
some interesting races.

I recognise that what you are trying to do is respond to Gilad's
request that you really check if an IPI here is necessary. I think what
you need to do is check if a page with a count of 0 is encountered
and if it is, then a draining of the per-cpu lists is necessary. To
address Gilad's concerns, be sure to do this only once per attempt at
CMA rather than for every page encountered with a count of 0 to avoid a
storm of IPIs.

> +		} else {
> +			++pfn;
> +		}
> +	}
> +}
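
To make the IPI suggestion above concrete, one hypothetical shape for it
(the function name and the caller wiring are mine, not something from the
series) would be to turn the walk into a predicate and let the CMA path
drain at most once per allocation attempt:

	/* must hold zone->lock; true if the block may have pages on pcp lists */
	static bool pageblock_has_pcp_pages(unsigned long pfn)
	{
		unsigned long end_pfn = pfn + pageblock_nr_pages;
		struct page *page;

		while (pfn < end_pfn) {
			if (!pfn_valid_within(pfn)) {
				++pfn;
				continue;
			}
			page = pfn_to_page(pfn);
			if (PageBuddy(page))
				pfn += 1 << page_order(page);
			else if (page_count(page) == 0)
				return true;	/* possibly sitting on a pcp list */
			else
				++pfn;
		}
		return false;
	}

	/* in the caller, accumulated across the range being isolated: */
	need_drain |= pageblock_has_pcp_pages(pfn);
	...
	if (need_drain)		/* at most one round of IPIs per CMA attempt */
		drain_all_pages();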

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().
  2012-01-26  9:00 ` [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range() Marek Szyprowski
@ 2012-01-30 11:24   ` Mel Gorman
  2012-01-30 12:42     ` Michal Nazarewicz
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 11:24 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:45AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> This commit introduces isolate_migratepages_range() function which
> extracts functionality from isolate_migratepages() so that it can be
> used on arbitrary PFN ranges.
> 
> isolate_migratepages() function is implemented as a simple wrapper
> around isolate_migratepages_range().
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Super, this is much easier to read. I have just one nit below but once
that is fixed;

Acked-by: Mel Gorman <mel@csn.ul.ie>

> @@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  		} else if (!locked)
>  			spin_lock_irq(&zone->lru_lock);
>  
> -		if (!pfn_valid_within(low_pfn))
> +		if (!pfn_valid(low_pfn))
>  			continue;
>  		nr_scanned++;
>  

This chunk looks unrelated to the rest of the patch.

I think what you are doing is patching around a bug that CMA exposed
which is very similar to the bug report at
http://www.spinics.net/lists/linux-mm/msg29260.html . Is this true?

If so, I posted a fix that only calls pfn_valid() when necessary. Can
you check if that works for you and if so, drop this hunk please? If
the patch does not work for you, then this hunk still needs to be
in a separate patch and handled separately as it would also be a fix
for -stable.
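
For reference, the fix being alluded to boils down to paying for the full
pfn_valid() only when crossing a MAX_ORDER boundary; a sketch of the idea
(not a quote of the actual patch):

		/*
		 * The memmap is only guaranteed to be contiguous within a
		 * MAX_ORDER_NR_PAGES block, so revalidate when crossing
		 * such a boundary and skip ahead over a hole.
		 */
		if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !pfn_valid(low_pfn)) {
			low_pfn += MAX_ORDER_NR_PAGES - 1;
			continue;
		}

		if (!pfn_valid_within(low_pfn))
			continue;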

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/15] mm: compaction: introduce isolate_freepages_range()
  2012-01-26  9:00 ` [PATCH 04/15] mm: compaction: introduce isolate_freepages_range() Marek Szyprowski
@ 2012-01-30 11:48   ` Mel Gorman
  2012-01-30 11:55     ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 11:48 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:46AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> This commit introduces isolate_freepages_range() function which
> generalises isolate_freepages_block() so that it can be used on
> arbitrary PFN ranges.
> 
> isolate_freepages_block() is left with only minor changes.
> 

The minor changes to isolate_freepages_block() look fine in
terms of how current compaction works. I have a minor comment on
isolate_freepages_range() but it is up to you whether to address it
or not. Whether or not you alter isolate_freepages_range():

Acked-by: Mel Gorman <mel@csn.ul.ie>

> <SNIP>
> @@ -105,6 +109,80 @@ static unsigned long isolate_freepages_block(struct zone *zone,
>  	return total_isolated;
>  }
>  
> +/**
> + * isolate_freepages_range() - isolate free pages.
> + * @start_pfn: The first PFN to start isolating.
> + * @end_pfn:   The one-past-last PFN.
> + *
> + * Non-free pages, invalid PFNs, or zone boundaries within the
> + * [start_pfn, end_pfn) range are considered errors, cause function to
> + * undo its actions and return zero.
> + *
> + * Otherwise, function returns one-past-the-last PFN of isolated page
> + * (which may be greater then end_pfn if end fell in a middle of
> + * a free page).
> + */
> +static unsigned long
> +isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long isolated, pfn, block_end_pfn, flags;
> +	struct zone *zone = NULL;
> +	LIST_HEAD(freelist);
> +	struct page *page;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
> +		if (!pfn_valid(pfn))
> +			break;
> +
> +		if (!zone)
> +			zone = page_zone(pfn_to_page(pfn));
> +		else if (zone != page_zone(pfn_to_page(pfn)))
> +			break;
> +

So what you are checking for here is if you straddle zones.
You could just initialise zone outside of the for loop. You can
then check outside the loop if end_pfn is in a different zone to
start_pfn. If it is, either adjust end_pfn accordingly or bail the
entire operation avoiding the need for release_freepages() later. This
will be a little cheaper.

> +		/*
> +		 * On subsequent iterations round_down() is actually not
> +		 * needed, but we keep it that we not to complicate the code.
> +		 */
> +		block_end_pfn = round_down(pfn, pageblock_nr_pages)
> +			+ pageblock_nr_pages;

Seems a little more involved than it needs to be. Something like
this might suit and be a bit nicer?

block_end_pfn = ALIGN(pfn+1, pageblock_nr_pages);

> +		block_end_pfn = min(block_end_pfn, end_pfn);
> +
> +		spin_lock_irqsave(&zone->lock, flags);
> +		isolated = isolate_freepages_block(pfn, block_end_pfn,
> +						   &freelist, true);
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +
> +		/*
> +		 * In strict mode, isolate_freepages_block() returns 0 if
> +		 * there are any holes in the block (ie. invalid PFNs or
> +		 * non-free pages).
> +		 */
> +		if (!isolated)
> +			break;
> +
> +		/*
> +		 * If we managed to isolate pages, it is always (1 << n) *
> +		 * pageblock_nr_pages for some non-negative n.  (Max order
> +		 * page may span two pageblocks).
> +		 */
> +	}
> +
> +	/* split_free_page does not map the pages */
> +	list_for_each_entry(page, &freelist, lru) {
> +		arch_alloc_page(page, 0);
> +		kernel_map_pages(page, 1, 1);
> +	}
> +

This block is copied in two places - isolate_freepages and
isolate_freepages_range() so sharing a common helper would be nice. I
suspect you didn't because it would interfere with existing code more
than was strictly necessary which I complained about previously as
it made review harder. If that was your thinking, then just create
this helper in a separate patch. It's not critical though.
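
For example, a shared helper along these lines could be called from both
sites (the name here is made up for illustration, not something the series
defines):

	/* split_free_page() does not map the pages, so do it for the caller */
	static void map_free_pages(struct list_head *freelist)
	{
		struct page *page;

		list_for_each_entry(page, freelist, lru) {
			arch_alloc_page(page, 0);
			kernel_map_pages(page, 1, 1);
		}
	}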

> +	if (pfn < end_pfn) {
> +		/* Loop terminated early, cleanup. */
> +		release_freepages(&freelist);
> +		return 0;
> +	}
> +
> +	/* We don't use freelists for anything. */
> +	return pfn;
> +}
> +
>  /* Returns true if the page is within a block suitable for migration to */
>  static bool suitable_migration_target(struct page *page)
>  {

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/15] mm: compaction: introduce isolate_freepages_range()
  2012-01-30 11:48   ` Mel Gorman
@ 2012-01-30 11:55     ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 11:55 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Mon, Jan 30, 2012 at 11:48:20AM +0000, Mel Gorman wrote:
> > +		if (!zone)
> > +			zone = page_zone(pfn_to_page(pfn));
> > +		else if (zone != page_zone(pfn_to_page(pfn)))
> > +			break;
> > +
> 
> So what you are checking for here is if you straddle zones.
> You could just initialise zone outside of the for loop. You can
> then check outside the loop if end_pfn is in a different zone to
> start_pfn. If it is, either adjust end_pfn accordingly or bail the
> entire operation avoiding the need for release_freepages() later. This
> will be a little cheaper.
> 

Whoops, silly me! You are watching for overlapping zones, which can
happen in some rare configurations, and for that a page_zone() check
like this is needed. You can still initialise zone outside the loop,
but the per-iteration page_zone() check has to stay.

My bad.
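
Putting the two mails together, a rough sketch of that shape (hypothetical,
not taken from any version of the patch) would be:

	if (!pfn_valid(start_pfn))
		return 0;
	zone = page_zone(pfn_to_page(start_pfn));

	for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
		if (!pfn_valid(pfn))
			break;

		/* zones can overlap in rare configs, so keep this check */
		if (page_zone(pfn_to_page(pfn)) != zone)
			break;
		...
	}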

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/15] mm: compaction: export some of the functions
  2012-01-26  9:00 ` [PATCH 05/15] mm: compaction: export some of the functions Marek Szyprowski
@ 2012-01-30 11:57   ` Mel Gorman
  2012-01-30 12:33     ` Michal Nazarewicz
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 11:57 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:47AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> This commit exports some of the functions from compaction.c file
> outside of it adding their declaration into internal.h header
> file so that other mm related code can use them.
> 
> This forced compaction.c to always be compiled (as opposed to being
> compiled only if CONFIG_COMPACTION is defined) but, to avoid
> introducing code that the user did not ask for, part of compaction.c
> is now wrapped in an #ifdef.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>  mm/Makefile     |    3 +-
>  mm/compaction.c |  314 ++++++++++++++++++++++++++-----------------------------
>  mm/internal.h   |   33 ++++++
>  3 files changed, 184 insertions(+), 166 deletions(-)
> 
> diff --git a/mm/Makefile b/mm/Makefile
> index 50ec00e..8aada89 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -13,7 +13,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
>  			   readahead.o swap.o truncate.o vmscan.o shmem.o \
>  			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
>  			   page_isolation.o mm_init.o mmu_context.o percpu.o \
> -			   $(mmu-y)
> +			   compaction.o $(mmu-y)
>  obj-y += init-mm.o
>  
>  ifdef CONFIG_NO_BOOTMEM
> @@ -32,7 +32,6 @@ obj-$(CONFIG_NUMA) 	+= mempolicy.o
>  obj-$(CONFIG_SPARSEMEM)	+= sparse.o
>  obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
>  obj-$(CONFIG_SLOB) += slob.o
> -obj-$(CONFIG_COMPACTION) += compaction.o
>  obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
>  obj-$(CONFIG_KSM) += ksm.o
>  obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 63f82be..3e21d28 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -16,30 +16,11 @@
>  #include <linux/sysfs.h>
>  #include "internal.h"
>  
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +

This is pedantic but you reference CONFIG_CMA before the patch that
declares it. The only time this really matters is when it breaks
bisection but I do not think that is the case here.

Whether you fix this or not by moving the CONFIG_CMA check to the same
patch that declares it in Kconfig

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/15] mm: page_alloc: introduce alloc_contig_range()
  2012-01-26  9:00 ` [PATCH 06/15] mm: page_alloc: introduce alloc_contig_range() Marek Szyprowski
@ 2012-01-30 12:11   ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 12:11 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:48AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> This commit adds the alloc_contig_range() function which tries
> to allocate given range of pages.  It tries to migrate all
> already allocated pages that fall in the range thus freeing them.
> Once all pages in the range are freed they are removed from the
> buddy system thus allocated for the caller to use.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>  include/linux/page-isolation.h |    7 ++
>  mm/page_alloc.c                |  183 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 190 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 8c02c2b..430cf61 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -39,5 +39,12 @@ extern void update_pcp_isolate_block(unsigned long pfn);
>  extern int set_migratetype_isolate(struct page *page);
>  extern void unset_migratetype_isolate(struct page *page);
>  
> +#ifdef CONFIG_CMA
> +
> +/* The below functions must be run on a range from a single zone. */
> +extern int alloc_contig_range(unsigned long start, unsigned long end);
> +extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
> +
> +#endif
>  

Did you really mean page-isolation.h? I would have thought gfp.h
would be a more suitable fit.

>  #endif
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 70709e7..b4f50532 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -57,6 +57,7 @@
>  #include <linux/ftrace_event.h>
>  #include <linux/memcontrol.h>
>  #include <linux/prefetch.h>
> +#include <linux/migrate.h>
>  #include <linux/page-debug-flags.h>
>  
>  #include <asm/tlbflush.h>
> @@ -5488,6 +5489,188 @@ out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  
> +#ifdef CONFIG_CMA
> +
> +static unsigned long pfn_align_to_maxpage_down(unsigned long pfn)
> +{
> +	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
> +}
> +
> +static unsigned long pfn_align_to_maxpage_up(unsigned long pfn)
> +{
> +	return ALIGN(pfn, MAX_ORDER_NR_PAGES);
> +}
> +
> +static struct page *
> +__alloc_contig_migrate_alloc(struct page *page, unsigned long private,
> +			     int **resultp)
> +{
> +	return alloc_page(GFP_HIGHUSER_MOVABLE);
> +}
> +
> +/* [start, end) must belong to a single zone. */
> +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
> +{
> +	/* This function is based on compact_zone() from compaction.c. */
> +
> +	unsigned long pfn = start;
> +	unsigned int tries = 0;
> +	int ret = 0;
> +
> +	struct compact_control cc = {
> +		.nr_migratepages = 0,
> +		.order = -1,
> +		.zone = page_zone(pfn_to_page(start)),
> +		.sync = true,
> +	};
> +	INIT_LIST_HEAD(&cc.migratepages);
> +
> +	migrate_prep_local();
> +
> +	while (pfn < end || !list_empty(&cc.migratepages)) {
> +		if (fatal_signal_pending(current)) {
> +			ret = -EINTR;
> +			break;
> +		}
> +
> +		if (list_empty(&cc.migratepages)) {
> +			cc.nr_migratepages = 0;
> +			pfn = isolate_migratepages_range(cc.zone, &cc,
> +							 pfn, end);
> +			if (!pfn) {
> +				ret = -EINTR;
> +				break;
> +			}
> +			tries = 0;
> +		} else if (++tries == 5) {
> +			ret = ret < 0 ? ret : -EBUSY;
> +			break;
> +		}
> +
> +		ret = migrate_pages(&cc.migratepages,
> +				    __alloc_contig_migrate_alloc,
> +				    0, false, true);
> +	}
> +
> +	putback_lru_pages(&cc.migratepages);
> +	return ret;
> +}
> +
> +/**
> + * alloc_contig_range() -- tries to allocate given range of pages
> + * @start:	start PFN to allocate
> + * @end:	one-past-the-last PFN to allocate
> + *
> + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> + * aligned, however it's the caller's responsibility to guarantee that
> + * we are the only thread that changes migrate type of pageblocks the
> + * pages fall in.
> + *
> + * The PFN range must belong to a single zone.
> + *
> + * Returns zero on success or negative error code.  On success all
> + * pages which PFN is in [start, end) are allocated for the caller and
> + * need to be freed with free_contig_range().
> + */
> +int alloc_contig_range(unsigned long start, unsigned long end)
> +{
> +	unsigned long outer_start, outer_end;
> +	int ret = 0, order;
> +
> +	/*
> +	 * What we do here is we mark all pageblocks in range as
> +	 * MIGRATE_ISOLATE.  Because of the way page allocator work, we
> +	 * align the range to MAX_ORDER pages so that page allocator
> +	 * won't try to merge buddies from different pageblocks and
> +	 * change MIGRATE_ISOLATE to some other migration type.
> +	 *
> +	 * Once the pageblocks are marked as MIGRATE_ISOLATE, we
> +	 * migrate the pages from an unaligned range (ie. pages that
> +	 * we are interested in).  This will put all the pages in
> +	 * range back to page allocator as MIGRATE_ISOLATE.
> +	 *
> +	 * When this is done, we take the pages in range from page
> +	 * allocator removing them from the buddy system.  This way
> +	 * page allocator will never consider using them.
> +	 *
> +	 * This lets us mark the pageblocks back as
> +	 * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
> +	 * MAX_ORDER aligned range but not in the unaligned, original
> +	 * range are put back to page allocator so that buddy can use
> +	 * them.
> +	 */
> +
> +	ret = start_isolate_page_range(pfn_align_to_maxpage_down(start),
> +				       pfn_align_to_maxpage_up(end));
> +	if (ret)
> +		goto done;
> +
> +	ret = __alloc_contig_migrate_range(start, end);
> +	if (ret)
> +		goto done;
> +
> +	/*
> +	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
> +	 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
> +	 * more, all pages in [start, end) are free in page allocator.
> +	 * What we are going to do is to allocate all pages from
> +	 * [start, end) (that is remove them from page allocater).
> +	 *
> +	 * The only problem is that pages at the beginning and at the
> +	 * end of interesting range may be not aligned with pages that
> +	 * page allocator holds, ie. they can be part of higher order
> +	 * pages.  Because of this, we reserve the bigger range and
> +	 * once this is done free the pages we are not interested in.
> +	 */
> +
> +	lru_add_drain_all();
> +	drain_all_pages();
> +

You unconditionally drain all pages here. It's up to you whether to
keep that or try to reduce IPIs by only sending one if a page with count
0 is found in the range. I think it is something that could be followed
up on later and is not necessary for initial merging and wider testing.

> +	order = 0;
> +	outer_start = start;
> +	while (!PageBuddy(pfn_to_page(outer_start))) {
> +		if (WARN_ON(++order >= MAX_ORDER)) {
> +			ret = -EINVAL;
> +			goto done;
> +		}
> +		outer_start &= ~0UL << order;
> +	}
> +

Just a small note here - you are checking PageBuddy without zone->lock.
As you have isolated the range, you have a reasonable expectation that
this is safe, but if you spin another version of the patch it might
justify a small comment.
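
If it helps, the comment could be as simple as something like this (the
wording is only a guess at what is being asked for):

	/*
	 * The range is isolated, so nobody else should be allocating from
	 * or merging these pages; checking PageBuddy() without zone->lock
	 * is therefore expected to be safe here.
	 */
	order = 0;
	outer_start = start;
	while (!PageBuddy(pfn_to_page(outer_start))) {
		...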

> +	/* Make sure the range is really isolated. */
> +	if (test_pages_isolated(outer_start, end)) {
> +		pr_warn("__alloc_contig_migrate_range: test_pages_isolated(%lx, %lx) failed\n",
> +		       outer_start, end);
> +		ret = -EBUSY;
> +		goto done;
> +	}
> +
> +	outer_end = isolate_freepages_range(outer_start, end);
> +	if (!outer_end) {
> +		ret = -EBUSY;
> +		goto done;
> +	}
> +
> +	/* Free head and tail (if any) */
> +	if (start != outer_start)
> +		free_contig_range(outer_start, start - outer_start);
> +	if (end != outer_end)
> +		free_contig_range(end, outer_end - end);
> +
> +done:
> +	undo_isolate_page_range(pfn_align_to_maxpage_down(start),
> +				pfn_align_to_maxpage_up(end));
> +	return ret;
> +}
> +
> +void free_contig_range(unsigned long pfn, unsigned nr_pages)
> +{
> +	for (; nr_pages--; ++pfn)
> +		__free_page(pfn_to_page(pfn));
> +}
> +
> +#endif
> +
> +

Bit of whitespace damage there.

I confess that I did not read this one quite as carefully because I
think I looked at a previous version that looked ok at the time. As it
affects CMA and only CMA I also expect others will be spending a lot
of effort and testing on this. Nothing obvious or horrible jumped out
at me other than the page_isolation.h thing, and that could be argued
either way, so:

Acked-by: Mel Gorman <mel@csn.ul.ie>

>  /*
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  /*
>   * All pages in the range must be isolated before calling this.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/15] mm: page_alloc: change fallbacks array handling
  2012-01-26  9:00 ` [PATCH 07/15] mm: page_alloc: change fallbacks array handling Marek Szyprowski
@ 2012-01-30 12:12   ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 12:12 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:49AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> This commit adds a row for MIGRATE_ISOLATE type to the fallbacks array
> which was missing from it.  It also changes the array traversal logic
> a little, making MIGRATE_RESERVE an end marker.  The latter change
> removes the implicit MIGRATE_UNMOVABLE from the end of each row, which
> was read by the __rmqueue_fallback() function.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/15] mm: compaction: export some of the functions
  2012-01-30 11:57   ` Mel Gorman
@ 2012-01-30 12:33     ` Michal Nazarewicz
  0 siblings, 0 replies; 61+ messages in thread
From: Michal Nazarewicz @ 2012-01-30 12:33 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann, Jesse Barker,
	Jonathan Corbet, Shariq Hasnain, Chunsang Jeong, Dave Hansen,
	Benjamin Gaignard

> On Thu, Jan 26, 2012 at 10:00:47AM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -16,30 +16,11 @@
>>  #include <linux/sysfs.h>
>>  #include "internal.h"
>>
>> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
>> +

On Mon, 30 Jan 2012 12:57:26 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> This is pedantic but you reference CONFIG_CMA before the patch that
> declares it. The only time this really matters is when it breaks
> bisection but I do not think that is the case here.

I think I'll choose to be lazy on this one. ;) I actually tried to move
some commits around to resolve this forward reference, but this resulted
in quite a few conflicts during rebase and after several minutes I decided
that it's not worth the effort.

> Whether you fix this or not by moving the CONFIG_CMA check to the same
> patch that declares it in Kconfig
>
> Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
  2012-01-26  9:00 ` [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added Marek Szyprowski
@ 2012-01-30 12:35   ` Mel Gorman
  2012-01-30 13:06     ` Michal Nazarewicz
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 12:35 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:50AM +0100, Marek Szyprowski wrote:
> From: Michal Nazarewicz <mina86@mina86.com>
> 
> The MIGRATE_CMA migration type has two main characteristics:
> (i) only movable pages can be allocated from MIGRATE_CMA
> pageblocks and (ii) page allocator will never change migration
> type of MIGRATE_CMA pageblocks.
> 
> This guarantees (to some degree) that page in a MIGRATE_CMA page
> block can always be migrated somewhere else (unless there's no
> memory left in the system).
> 
> It is designed to be used for allocating big chunks (eg. 10MiB)
> of physically contiguous memory.  Once driver requests
> contiguous memory, pages from MIGRATE_CMA pageblocks may be
> migrated away to create a contiguous block.
> 
> To minimise number of migrations, MIGRATE_CMA migration type
> is the last type tried when page allocator falls back to other
> migration types then requested.
> 
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
>  include/linux/mmzone.h         |   43 +++++++++++++++++++++----
>  include/linux/page-isolation.h |    3 ++
>  mm/Kconfig                     |    2 +-
>  mm/compaction.c                |   11 +++++--
>  mm/page_alloc.c                |   68 +++++++++++++++++++++++++++++++++-------
>  mm/vmstat.c                    |    3 ++
>  6 files changed, 107 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 650ba2f..fcd4a14 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -35,13 +35,37 @@
>   */
>  #define PAGE_ALLOC_COSTLY_ORDER 3
>  
> -#define MIGRATE_UNMOVABLE     0
> -#define MIGRATE_RECLAIMABLE   1
> -#define MIGRATE_MOVABLE       2
> -#define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
> -#define MIGRATE_RESERVE       3
> -#define MIGRATE_ISOLATE       4 /* can't allocate from here */
> -#define MIGRATE_TYPES         5
> +enum {
> +	MIGRATE_UNMOVABLE,
> +	MIGRATE_RECLAIMABLE,
> +	MIGRATE_MOVABLE,
> +	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
> +	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
> +#ifdef CONFIG_CMA
> +	/*
> +	 * MIGRATE_CMA migration type is designed to mimic the way
> +	 * ZONE_MOVABLE works.  Only movable pages can be allocated
> +	 * from MIGRATE_CMA pageblocks and page allocator never
> +	 * implicitly change migration type of MIGRATE_CMA pageblock.
> +	 *
> +	 * The way to use it is to change migratetype of a range of
> +	 * pageblocks to MIGRATE_CMA which can be done by
> +	 * __free_pageblock_cma() function.  What is important though
> +	 * is that a range of pageblocks must be aligned to
> +	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
> +	 * a single pageblock.
> +	 */
> +	MIGRATE_CMA,
> +#endif
> +	MIGRATE_ISOLATE,	/* can't allocate from here */
> +	MIGRATE_TYPES
> +};
> +
> +#ifdef CONFIG_CMA
> +#  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
> +#else
> +#  define is_migrate_cma(migratetype) false
> +#endif
>  
>  #define for_each_migratetype_order(order, type) \
>  	for (order = 0; order < MAX_ORDER; order++) \
> @@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
>  	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>  }
>  
> +static inline bool is_pageblock_cma(struct page *page)
> +{
> +	return is_migrate_cma(get_pageblock_migratetype(page));
> +}
> +
>  struct free_area {
>  	struct list_head	free_list[MIGRATE_TYPES];
>  	unsigned long		nr_free;
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 430cf61..454dd29 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -45,6 +45,9 @@ extern void unset_migratetype_isolate(struct page *page);
>  extern int alloc_contig_range(unsigned long start, unsigned long end);
>  extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
>  
> +/* CMA stuff */
> +extern void init_cma_reserved_pageblock(struct page *page);
> +
>  #endif
>  
>  #endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e338407..3922002 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -198,7 +198,7 @@ config COMPACTION
>  config MIGRATION
>  	bool "Page migration"
>  	def_bool y
> -	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
> +	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA
>  	help
>  	  Allows the migration of the physical location of pages of processes
>  	  while the virtual addresses are not changed. This is useful in
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 3e21d28..a075b43 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -35,6 +35,11 @@ static unsigned long release_freepages(struct list_head *freelist)
>  	return count;
>  }
>  
> +static inline bool migrate_async_suitable(int migratetype)
> +{
> +	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
> +}
> +
>  /*
>   * Isolate free pages onto a private freelist. Caller must hold zone->lock.
>   * If @strict is true, will abort returning 0 on any invalid PFNs or non-free
> @@ -274,7 +279,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  		 */
>  		pageblock_nr = low_pfn >> pageblock_order;
>  		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
> -				get_pageblock_migratetype(page) != MIGRATE_MOVABLE) {
> +		    migrate_async_suitable(get_pageblock_migratetype(page))) {
>  			low_pfn += pageblock_nr_pages;
>  			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
>  			last_pageblock_nr = pageblock_nr;
> @@ -342,8 +347,8 @@ static bool suitable_migration_target(struct page *page)
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
>  		return true;
>  
> -	/* If the block is MIGRATE_MOVABLE, allow migration */
> -	if (migratetype == MIGRATE_MOVABLE)
> +	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
> +	if (migrate_async_suitable(migratetype))
>  		return true;
>  
>  	/* Otherwise skip the block */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0a9cc8e..0fcde78 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -750,6 +750,26 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order)
>  	__free_pages(page, order);
>  }
>  
> +#ifdef CONFIG_CMA
> +/*
> + * Free whole pageblock and set it's migration type to MIGRATE_CMA.
> + */
> +void __init init_cma_reserved_pageblock(struct page *page)
> +{
> +	unsigned i = pageblock_nr_pages;
> +	struct page *p = page;
> +
> +	do {
> +		__ClearPageReserved(p);
> +		set_page_count(p, 0);
> +	} while (++p, --i);
> +
> +	set_page_refcounted(page);
> +	set_pageblock_migratetype(page, MIGRATE_CMA);
> +	__free_pages(page, pageblock_order);
> +	totalram_pages += pageblock_nr_pages;
> +}
> +#endif
>  
>  /*
>   * The order of subdivision here is critical for the IO subsystem.
> @@ -875,10 +895,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][3] = {
> +static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
> +#ifdef CONFIG_CMA
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },

This is a curious choice. MIGRATE_CMA is allowed to contain movable
pages. By using MIGRATE_RECLAIMABLE and MIGRATE_UNMOVABLE for movable
pages instead of MIGRATE_CMA, you increase the chances that unmovable
pages will need to use MIGRATE_MOVABLE in the future, which impacts
fragmentation avoidance. I would recommend that you change this to

{ MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }

> +	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
> +#else
>  	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> +#endif
>  	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>  	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
>  };

You should also be aware that you may have problems with zone
balancing. If MIGRATE_CMA is large and it is the only free memory
then UNMOVABLE and RECLAIMABLE allocations will fail. kswapd will
not necessarily help because it is checking the watermarks and the
watermarks may be fine. It's actually the reason ZONE_MOVABLE was
created originally.

> @@ -995,11 +1020,18 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			 * pages to the preferred allocation list. If falling
>  			 * back for a reclaimable kernel allocation, be more
>  			 * aggressive about taking ownership of free pages
> +			 *
> +			 * On the other hand, never change migration
> +			 * type of MIGRATE_CMA pageblocks nor move CMA
> +			 * pages on different free lists. We don't
> +			 * want unmovable pages to be allocated from
> +			 * MIGRATE_CMA areas.
>  			 */
> -			if (unlikely(current_order >= (pageblock_order >> 1)) ||
> -					start_migratetype == MIGRATE_RECLAIMABLE ||
> -					page_group_by_mobility_disabled) {
> -				unsigned long pages;
> +			if (!is_pageblock_cma(page) &&
> +			    (unlikely(current_order >= pageblock_order / 2) ||
> +			     start_migratetype == MIGRATE_RECLAIMABLE ||
> +			     page_group_by_mobility_disabled)) {
> +				int pages;

You call is_pageblock_cma(page) here which in turn calls
get_pageblock_migratetype(). get_pageblock_migratetype() should be
avoided where possible and it is unnecessary in this context because
we already know the migratetype of the page. Use that information
instead of calling get_pageblock_migratetype().
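
In other words, something along these lines (sketch only; migratetype
here is the type of the freelist the page has just been taken from):

	if (!is_migrate_cma(migratetype) &&
	    (unlikely(current_order >= pageblock_order / 2) ||
	     start_migratetype == MIGRATE_RECLAIMABLE ||
	     page_group_by_mobility_disabled)) {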

>  				pages = move_freepages_block(zone, page,
>  								start_migratetype);
>  
> @@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			rmv_page_order(page);
>  
>  			/* Take ownership for orders >= pageblock_order */
> -			if (current_order >= pageblock_order)
> +			if (current_order >= pageblock_order &&
> +			    !is_pageblock_cma(page))

Same, the get_pageblock_migratetype() call can be avoided.

>  				change_pageblock_range(page, current_order,
>  							start_migratetype);
>  
> -			expand(zone, page, order, current_order, area, migratetype);
> +			expand(zone, page, order, current_order, area,
> +			       is_migrate_cma(start_migratetype)
> +			     ? start_migratetype : migratetype);
>  

What is this check meant to be doing?

start_migratetype is determined by allocflags_to_migratetype() and
that never will be MIGRATE_CMA so is_migrate_cma(start_migratetype)
should always be false.

>  			trace_mm_page_alloc_extfrag(page, order, current_order,
>  				start_migratetype, migratetype);
> @@ -1093,7 +1128,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			list_add(&page->lru, list);
>  		else
>  			list_add_tail(&page->lru, list);
> -		set_page_private(page, migratetype);
> +#ifdef CONFIG_CMA
> +		if (is_pageblock_cma(page))
> +			set_page_private(page, MIGRATE_CMA);
> +		else
> +#endif
> +			set_page_private(page, migratetype);
>  		list = &page->lru;
>  	}
>  	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> @@ -1337,8 +1377,12 @@ int split_free_page(struct page *page)
>  
>  	if (order >= pageblock_order - 1) {
>  		struct page *endpage = page + (1 << order) - 1;
> -		for (; page < endpage; page += pageblock_nr_pages)
> -			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +		for (; page < endpage; page += pageblock_nr_pages) {
> +			int mt = get_pageblock_migratetype(page);
> +			if (mt != MIGRATE_ISOLATE && !is_migrate_cma(mt))
> +				set_pageblock_migratetype(page,
> +							  MIGRATE_MOVABLE);
> +		}
>  	}
>  
>  	return 1 << order;
> @@ -5375,8 +5419,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return true;
> -
> -	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
> +	if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
> +	    is_pageblock_cma(page))
>  		return true;
>  
>  	pfn = page_to_pfn(page);
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index f600557..ace5383 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -613,6 +613,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
>  	"Reclaimable",
>  	"Movable",
>  	"Reserve",
> +#ifdef CONFIG_CMA
> +	"CMA",
> +#endif
>  	"Isolate",
>  };
>  
> -- 
> 1.7.1.569.g6f426
> 

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/15] mm: extract reclaim code from __alloc_pages_direct_reclaim()
  2012-01-26  9:00 ` [PATCH 10/15] mm: extract reclaim code from __alloc_pages_direct_reclaim() Marek Szyprowski
@ 2012-01-30 12:42   ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 12:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:52AM +0100, Marek Szyprowski wrote:
> This patch extracts the common reclaim code from the __alloc_pages_direct_reclaim()
> function into a separate function, __perform_reclaim(), which can later be used
> by alloc_contig_range().
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  mm/page_alloc.c |   30 +++++++++++++++++++++---------
>  1 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4e60c0b..e35d06b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2094,16 +2094,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> -/* The really slow allocator path where we enter direct reclaim */
> -static inline struct page *
> -__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> -	struct zonelist *zonelist, enum zone_type high_zoneidx,
> -	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
> -	int migratetype, unsigned long *did_some_progress)
> +/* Perform direct synchronous page reclaim */
> +static inline int
> +__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist,
> +		  nodemask_t *nodemask)

This function is too large to be inlined. Make it a static int. Once
that is fixed, add a

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().
  2012-01-30 11:24   ` Mel Gorman
@ 2012-01-30 12:42     ` Michal Nazarewicz
  2012-01-30 13:25       ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Nazarewicz @ 2012-01-30 12:42 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann, Jesse Barker,
	Jonathan Corbet, Shariq Hasnain, Chunsang Jeong, Dave Hansen,
	Benjamin Gaignard

> On Thu, Jan 26, 2012 at 10:00:45AM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> @@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>  		} else if (!locked)
>>  			spin_lock_irq(&zone->lru_lock);
>>
>> -		if (!pfn_valid_within(low_pfn))
>> +		if (!pfn_valid(low_pfn))
>>  			continue;
>>  		nr_scanned++;
>>

On Mon, 30 Jan 2012 12:24:28 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> This chunk looks unrelated to the rest of the patch.
>
> I think what you are doing is patching around a bug that CMA exposed
> which is very similar to the bug report at
> http://www.spinics.net/lists/linux-mm/msg29260.html . Is this true?
>
> If so, I posted a fix that only calls pfn_valid() when necessary. Can
> you check if that works for you and if so, drop this hunk please? If
> the patch does not work for you, then this hunk still needs to be
> in a separate patch and handled separately as it would also be a fix
> for -stable.

I've actually never encountered this bug myself, and CMA is unlikely to
expose it, since it always operates on contiguous memory regions with
no holes.

I've made this change because looking at the code it seemed like this
may cause problems in some cases.  The crash that you linked to looks
like the kind of problem I was thinking about.

I'll drop this hunk and let you resolve this independently of CMA.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks
  2012-01-26  9:00 ` [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks Marek Szyprowski
@ 2012-01-30 13:05   ` Mel Gorman
  2012-01-31 17:15     ` Marek Szyprowski
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 13:05 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Michal Nazarewicz, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Thu, Jan 26, 2012 at 10:00:53AM +0100, Marek Szyprowski wrote:
> alloc_contig_range() performs memory allocation, so it should also maintain
> the correct level of memory watermarks. This commit adds
> a call to *_slowpath style reclaim to grab enough pages to make sure that
> the final collection of contiguous pages from freelists will not starve
> the system.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> CC: Michal Nazarewicz <mina86@mina86.com>
> ---
>  mm/page_alloc.c |   36 ++++++++++++++++++++++++++++++++++++
>  1 files changed, 36 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e35d06b..05eaa82 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5613,6 +5613,34 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
>  	return ret;
>  }
>  
> +/*
> + * Trigger memory pressure bump to reclaim some pages in order to be able to
> + * allocate 'count' pages in single page units. Does similar work as
> + *__alloc_pages_slowpath() function.
> + */
> +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
> +{
> +	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> +	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
> +	int did_some_progress = 0;
> +	int order = 1;
> +	unsigned long watermark;
> +
> +	/* Obey watermarks as if the page was being allocated */
> +	watermark = low_wmark_pages(zone) + count;
> +	while (!zone_watermark_ok(zone, 0, watermark, 0, 0)) {
> +		wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone));
> +
> +		did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
> +						      NULL);
> +		if (!did_some_progress) {
> +			/* Exhausted what can be done so it's blamo time */
> +			out_of_memory(zonelist, gfp_mask, order, NULL);
> +		}

There are three problems here

1. CMA can trigger the OOM killer.

That seems like overkill to me but as I do not know the consequences
of CMA failing, it's your call.

2. You cannot guarantee that try_to_free_pages will free pages from the
   zone you care about or that kswapd will do anything

You check the watermarks and take into account the size of the pending
CMA allocation. kswapd in vmscan.c on the other hand will simply check
the watermarks and probably go back to sleep. You should be aware of
this in case you ever get bug reports where CMA takes too long and
appears to be stuck in this loop with kswapd staying asleep.

3. You reclaim from zones other than your target zone

try_to_free_pages is not necessarily going to free pages in the
zone you are checking for. It'll work on ARM in many cases because
there will be only one zone but on other arches, this logic will
be problematic and will potentially livelock. You need to pass in
a zonelist that only contains the zone that CMA cares about. If it
cannot reclaim, did_some_progress == 0 and it'll exit. Otherwise
there is a possibility that this will loop forever reclaiming pages
from the wrong zones.

I won't ack this particular patch but I am not going to insist that
you fix these prior to merging either. If you leave problem 3 as it
is, I would really like to see a comment explaining the problem for
future users of CMA on other arches (if they exist).

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
  2012-01-30 12:35   ` Mel Gorman
@ 2012-01-30 13:06     ` Michal Nazarewicz
  2012-01-30 14:52       ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Nazarewicz @ 2012-01-30 13:06 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann, Jesse Barker,
	Jonathan Corbet, Shariq Hasnain, Chunsang Jeong, Dave Hansen,
	Benjamin Gaignard

> On Thu, Jan 26, 2012 at 10:00:50AM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> @@ -875,10 +895,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>>   * This array describes the order lists are fallen back to when
>>   * the free lists for the desirable migrate type are depleted
>>   */
>> -static int fallbacks[MIGRATE_TYPES][3] = {
>> +static int fallbacks[MIGRATE_TYPES][4] = {
>>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
>> +#ifdef CONFIG_CMA
>> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA    , MIGRATE_RESERVE },

On Mon, 30 Jan 2012 13:35:42 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> This is a curious choice. MIGRATE_CMA is allowed to contain movable
> pages. By using MIGRATE_RECLAIMABLE and MIGRATE_UNMOVABLE for movable
> pages instead of MIGRATE_CMA, you increase the chances that unmovable
> pages will need to use MIGRATE_MOVABLE in the future, which impacts
> fragmentation avoidance. I would recommend that you change this to
>
> { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }

Initially the idea was to try hard not to have pages allocated from
MIGRATE_CMA at all, which is why it was put at the end of the fallbacks
list, but on a busy system this probably won't help anyway, so I'll
change it per your suggestion.

>> @@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>>  			rmv_page_order(page);
>>
>>  			/* Take ownership for orders >= pageblock_order */
>> -			if (current_order >= pageblock_order)
>> +			if (current_order >= pageblock_order &&
>> +			    !is_pageblock_cma(page))
>>  				change_pageblock_range(page, current_order,
>>  							start_migratetype);
>>
>> -			expand(zone, page, order, current_order, area, migratetype);
>> +			expand(zone, page, order, current_order, area,
>> +			       is_migrate_cma(start_migratetype)
>> +			     ? start_migratetype : migratetype);
>>
>
> What is this check meant to be doing?
>
> start_migratetype is determined by allocflags_to_migratetype() and
> that never will be MIGRATE_CMA so is_migrate_cma(start_migratetype)
> should always be false.

Right, thanks!  This should be the other way around, ie.:

+			expand(zone, page, order, current_order, area,
+			       is_migrate_cma(migratetype)
+			     ? migratetype : start_migratetype);

I'll fix this and the calls to is_pageblock_cma().

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-28  0:26   ` Andrew Morton
  2012-01-29 18:09     ` Rob Clark
  2012-01-29 20:51     ` Arnd Bergmann
@ 2012-01-30 13:25     ` Mel Gorman
  2012-01-30 15:43       ` Michal Nazarewicz
  2012-02-10 18:10     ` Marek Szyprowski
  3 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 13:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arnd Bergmann, Marek Szyprowski, linux-kernel, linux-arm-kernel,
	linux-media, linux-mm, linaro-mm-sig, Michal Nazarewicz,
	Kyungmin Park, Russell King, KAMEZAWA Hiroyuki, Daniel Walker,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Fri, Jan 27, 2012 at 04:26:24PM -0800, Andrew Morton wrote:
> On Thu, 26 Jan 2012 15:31:40 +0000
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Thursday 26 January 2012, Marek Szyprowski wrote:
> > > Welcome everyone!
> > > 
> > > Yes, that's true. This is yet another release of the Contiguous Memory
> > > Allocator patches. This version mainly includes code cleanups requested
> > > by Mel Gorman and a few minor bug fixes.
> > 
> > Hi Marek,
> > 
> > Thanks for keeping up this work! I really hope it works out for the
> > next merge window.
> 
> Someone please tell me when it's time to start paying attention
> again ;)
> 
> These patches don't seem to have as many acked-bys and reviewed-bys as
> I'd expect. 

I reviewed the core MM changes and I've acked most of them so the
next release should have a few acks where you expect them. I did not
add a reviewed-by because I did not build and test the thing.

For me, Patch 2 is the only one that must be fixed prior to merging
as it can interfere with pages on a remote per-cpu list which is
dangerous. I know your suggestion will be to delete the per-cpu lists
and be done with it but I am a bit away from doing that just yet.

Patch 8 could do with a bit more care too but it is not a
potential hand grenade like patch 2 and could be fixed as part of
a follow-up. Even if you don't see an ack from me there, it should not
be treated as a show stopper.

I highlighted some issues on how CMA interacts with reclaim but I
think this is a problem specific to CMA and should not prevent it being
merged. I just wanted to be sure that the CMA people were aware of the
potential issues so they will recognise the class of bug if it occurs.

> Given the scope and duration of this, it would be useful
> to gather these up.  But please ensure they are real ones - people
> sometimes like to ack things without showing much sign of having
> actually read them.
> 

FWIW, the acks I put on the core MM changes are real acks :)

> The patches do seem to have been going round in ever-decreasing circles
> lately and I think we have decided to merge them (yes?) so we may as well
> get on and do that and sort out remaining issues in-tree.
> 

I'm a lot happier with the core MM patches than I was when I reviewed
this first around last September or October.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range().
  2012-01-30 12:42     ` Michal Nazarewicz
@ 2012-01-30 13:25       ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 13:25 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Mon, Jan 30, 2012 at 01:42:50PM +0100, Michal Nazarewicz wrote:
> >On Thu, Jan 26, 2012 at 10:00:45AM +0100, Marek Szyprowski wrote:
> >>From: Michal Nazarewicz <mina86@mina86.com>
> >>@@ -313,7 +316,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >> 		} else if (!locked)
> >> 			spin_lock_irq(&zone->lru_lock);
> >>
> >>-		if (!pfn_valid_within(low_pfn))
> >>+		if (!pfn_valid(low_pfn))
> >> 			continue;
> >> 		nr_scanned++;
> >>
> 
> On Mon, 30 Jan 2012 12:24:28 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> >This chunk looks unrelated to the rest of the patch.
> >
> >I think what you are doing is patching around a bug that CMA exposed
> >which is very similar to the bug report at
> >http://www.spinics.net/lists/linux-mm/msg29260.html . Is this true?
> >
> >If so, I posted a fix that only calls pfn_valid() when necessary. Can
> >you check if that works for you and if so, drop this hunk please? If
> >the patch does not work for you, then this hunk still needs to be
> >in a separate patch and handled separately as it would also be a fix
> >for -stable.
> 
> I'll actually never encountered this bug myself and CMA is unlikely to
> expose it, since it always operates on continuous memory regions with
> no holes.
> 
> I've made this change because looking at the code it seemed like this
> may cause problems in some cases.  The crash that you linked to looks
> like the kind of problem I was thinking about.
> 
> I'll drop this hunk and let you resolve this independently of CMA.
> 

Ok, thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added
  2012-01-30 13:06     ` Michal Nazarewicz
@ 2012-01-30 14:52       ` Mel Gorman
  0 siblings, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 14:52 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Mon, Jan 30, 2012 at 02:06:50PM +0100, Michal Nazarewicz wrote:
> >>@@ -1017,11 +1049,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >> 			rmv_page_order(page);
> >>
> >> 			/* Take ownership for orders >= pageblock_order */
> >>-			if (current_order >= pageblock_order)
> >>+			if (current_order >= pageblock_order &&
> >>+			    !is_pageblock_cma(page))
> >> 				change_pageblock_range(page, current_order,
> >> 							start_migratetype);
> >>
> >>-			expand(zone, page, order, current_order, area, migratetype);
> >>+			expand(zone, page, order, current_order, area,
> >>+			       is_migrate_cma(start_migratetype)
> >>+			     ? start_migratetype : migratetype);
> >>
> >
> >What is this check meant to be doing?
> >
> >start_migratetype is determined by allocflags_to_migratetype() and
> >that never will be MIGRATE_CMA so is_migrate_cma(start_migratetype)
> >should always be false.
> 
> Right, thanks!  This should be the other way around, ie.:
> 
> +			expand(zone, page, order, current_order, area,
> +			       is_migrate_cma(migratetype)
> +			     ? migratetype : start_migratetype);
> 
> I'll fix this and the calls to is_pageblock_cma().
> 

That makes a lot more sense. Thanks.

I have a vague recollection that there was a problem with finding
unmovable pages in MIGRATE_CMA regions. This might have been part of
the problem.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-01-30 11:15   ` Mel Gorman
@ 2012-01-30 15:41     ` Michal Nazarewicz
  2012-01-30 16:14       ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Nazarewicz @ 2012-01-30 15:41 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, Kyungmin Park, Russell King, Andrew Morton,
	KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann, Jesse Barker,
	Jonathan Corbet, Shariq Hasnain, Chunsang Jeong, Dave Hansen,
	Benjamin Gaignard

On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <mel@csn.ul.ie> wrote:

> On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> @@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  	return ret ? 0 : -EBUSY;
>>  }
>> +
>> +/* must hold zone->lock */
>> +void update_pcp_isolate_block(unsigned long pfn)
>> +{
>> +	unsigned long end_pfn = pfn + pageblock_nr_pages;
>> +	struct page *page;
>> +
>> +	while (pfn < end_pfn) {
>> +		if (!pfn_valid_within(pfn)) {
>> +			++pfn;
>> +			continue;
>> +		}
>> +

On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> There is a potential problem here that you need to be aware of.
> set_pageblock_migratetype() is called from start_isolate_page_range().
> I do not think there is a guarantee that pfn + pageblock_nr_pages is
> not in a different block of MAX_ORDER_NR_PAGES. If that is right then
> your options are to add a check like this;
>
> if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
> 	break;
>
> or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and in
> the same block as pfn and relying on the caller to have called
> pfn_valid.

	pfn = round_down(pfn, pageblock_nr_pages);
	end_pfn = pfn + pageblock_nr_pages;

should do the trick as well, right?  move_freepages_block() seems to be
doing the same thing.
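
So the function would start roughly like this (a sketch with the
round_down() variant applied):

	/* must hold zone->lock */
	void update_pcp_isolate_block(unsigned long pfn)
	{
		unsigned long end_pfn;
		struct page *page;

		/* stay within one pageblock, as move_freepages_block() does */
		pfn = round_down(pfn, pageblock_nr_pages);
		end_pfn = pfn + pageblock_nr_pages;

		while (pfn < end_pfn) {
			/* ... rest of the loop as in the patch ... */
		}
	}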

>> +		page = pfn_to_page(pfn);
>> +		if (PageBuddy(page)) {
>> +			pfn += 1 << page_order(page);
>> +		} else if (page_count(page) == 0) {
>> +			set_page_private(page, MIGRATE_ISOLATE);
>> +			++pfn;
>
> This is dangerous for two reasons. If the page_count is 0, it could
> be because the page is in the process of being freed and is not
> necessarily on the per-cpu lists yet and you cannot be sure if the
> contents of page->private are important. Second, there is nothing to
> prevent another CPU allocating this page from its per-cpu list while
> the private field is getting updated from here which might lead to
> some interesting races.
>
> I recognise that what you are trying to do is respond to Gilad's
> request that you really check if an IPI here is necessary. I think what
> you need to do is check if a page with a count of 0 is encountered
> and if it is, then a draining of the per-cpu lists is necessary. To
> address Gilad's concerns, be sure to only do this once per attempt at
> CMA rather than for every page encountered with a count of 0 to avoid a
> storm of IPIs.

It's actually more than that.

This is the same issue that I first fixed with a change to the
free_pcppages_bulk() function[1].  At the time of posting, you said
you'd like me to try and find a different solution which would not
involve paying the price of calling get_pageblock_migratetype().
Later I also realised that this solution is not enough.

[1] http://article.gmane.org/gmane.linux.kernel.mm/70314

My next attempt was to run drain PCP list while holding zone->lock[2], but that
quickly proven to be broken approach when Marek started testing it on an SMP
system.

[2] http://article.gmane.org/gmane.linux.kernel.mm/72016

This patch is yet another attempt at solving this old issue.  Even though it
has a potential race condition, we came to the conclusion that the actual
chances of it causing any problems are slim.  Various stress tests did not,
in fact, show the race to be an issue.

The problem is that if a page is on a PCP list and its underlying pageblock's
migrate type is changed to MIGRATE_ISOLATE, the page (i) will still remain on
the PCP list and thus someone can allocate it, and (ii) when removed from the
PCP list, the page will be put on the freelist of the migrate type it had
prior to the change.

(i) is actually not such a big issue since the next thing that happens after
isolation is migration, so all the pages will get freed.  (ii) is the actual
problem, and if [1] is not an acceptable solution I really don't have a good
fix for it.

One thing that comes to mind is calling drain_all_pages() prior to acquiring
zone->lock in set_migratetype_isolate().  This is however prone to races,
since after the drain and before the zone->lock is acquired, pages might get
moved back to the PCP list.

Draining the PCP list after acquiring zone->lock is not possible because
smp_call_function_many() cannot be called with interrupts disabled, and
changing spin_lock_irqsave() to spin_lock() followed by local_irq_save()
causes a deadlock (that's what [2] attempted to do).

Any suggestions are welcome!

>> +		} else {
>> +			++pfn;
>> +		}
>> +	}
>> +}

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-30 13:25     ` Mel Gorman
@ 2012-01-30 15:43       ` Michal Nazarewicz
       [not found]         ` <CA+M3ks7h1t6DbPSAhPN6LJ5Dw84hSukfWG16avh2eZL+o4caJg@mail.gmail.com>
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Nazarewicz @ 2012-01-30 15:43 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman
  Cc: Arnd Bergmann, Marek Szyprowski, linux-kernel, linux-arm-kernel,
	linux-media, linux-mm, linaro-mm-sig, Kyungmin Park,
	Russell King, KAMEZAWA Hiroyuki, Daniel Walker, Jesse Barker,
	Jonathan Corbet, Shariq Hasnain, Chunsang Jeong, Dave Hansen,
	Benjamin Gaignard

On Mon, 30 Jan 2012 14:25:12 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> I reviewed the core MM changes and I've acked most of them so the
> next release should have a few acks where you expect them. I did not
> add a reviewed-by because I did not build and test the thing.

Thanks!

I've either replied to your comments or applied suggested changes.
If anyone cares, not-tested changes are available at
	git://github.com/mina86/linux-2.6.git cma

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-01-30 15:41     ` Michal Nazarewicz
@ 2012-01-30 16:14       ` Mel Gorman
  2012-01-31 16:23         ` Marek Szyprowski
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-01-30 16:14 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, Kyungmin Park, Russell King,
	Andrew Morton, KAMEZAWA Hiroyuki, Daniel Walker, Arnd Bergmann,
	Jesse Barker, Jonathan Corbet, Shariq Hasnain, Chunsang Jeong,
	Dave Hansen, Benjamin Gaignard

On Mon, Jan 30, 2012 at 04:41:22PM +0100, Michal Nazarewicz wrote:
> On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> 
> >On Thu, Jan 26, 2012 at 10:00:44AM +0100, Marek Szyprowski wrote:
> >>From: Michal Nazarewicz <mina86@mina86.com>
> >>@@ -139,3 +139,27 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
> >> 	spin_unlock_irqrestore(&zone->lock, flags);
> >> 	return ret ? 0 : -EBUSY;
> >> }
> >>+
> >>+/* must hold zone->lock */
> >>+void update_pcp_isolate_block(unsigned long pfn)
> >>+{
> >>+	unsigned long end_pfn = pfn + pageblock_nr_pages;
> >>+	struct page *page;
> >>+
> >>+	while (pfn < end_pfn) {
> >>+		if (!pfn_valid_within(pfn)) {
> >>+			++pfn;
> >>+			continue;
> >>+		}
> >>+
> 
> On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> >There is a potential problem here that you need to be aware of.
> >set_pageblock_migratetype() is called from start_isolate_page_range().
> >I do not think there is a guarantee that pfn + pageblock_nr_pages is
> >not in a different block of MAX_ORDER_NR_PAGES. If that is right then
> >your options are to add a check like this;
> >
> >if ((pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 && !pfn_valid(pfn))
> >	break;
> >
> >or else ensure that end_pfn is always MAX_ORDER_NR_PAGES aligned and in
> >the same block as pfn and relying on the caller to have called
> >pfn_valid.
> 
> 	pfn = round_down(pfn, pageblock_nr_pages);
> 	end_pfn = pfn + pageblock_nr_pages;
> 
> should do the trick as well, right?  move_freepages_block() seem to be
> doing the same thing.
> 

That would also do it the trick.

> >>+		page = pfn_to_page(pfn);
> >>+		if (PageBuddy(page)) {
> >>+			pfn += 1 << page_order(page);
> >>+		} else if (page_count(page) == 0) {
> >>+			set_page_private(page, MIGRATE_ISOLATE);
> >>+			++pfn;
> >
> >This is dangerous for two reasons. If the page_count is 0, it could
> >be because the page is in the process of being freed and is not
> >necessarily on the per-cpu lists yet and you cannot be sure if the
> >contents of page->private are important. Second, there is nothing to
> >prevent another CPU allocating this page from its per-cpu list while
> >the private field is getting updated from here which might lead to
> >some interesting races.
> >
> >I recognise that what you are trying to do is respond to Gilad's
> >request that you really check if an IPI here is necessary. I think what
> >you need to do is check if a page with a count of 0 is encountered
> >and if it is, then a draining of the per-cpu lists is necessary. To
> >address Gilad's concerns, be sure to only do this once per attempt at
> >CMA rather than for every page encountered with a count of 0 to avoid a
> >storm of IPIs.
> 
> It's actually more then that.
> 
> This is the same issue that I first fixed with a change to free_pcppages_bulk()
> function[1].  At the time of positing, you said you'd like me to try and find
> a different solution which would not involve paying the price of calling
> get_pageblock_migratetype().  Later I also realised that this solution is
> not enough.
> 
> [1] http://article.gmane.org/gmane.linux.kernel.mm/70314
> 

Yes. I had forgotten the history but looking at that patch again,
I would reach the conclusion that this was adding a new call to
get_pageblock_migratetype() in the bulk free path. That would affect
everybody whether they were using CMA or not.

> My next attempt was to run drain PCP list while holding zone->lock[2], but that
> quickly proven to be broken approach when Marek started testing it on an SMP
> system.
> 
> [2] http://article.gmane.org/gmane.linux.kernel.mm/72016
> 
> This patch is yet another attempt of solving this old issue.  Even though it has
> a potential race condition we came to conclusion that the actual chances of
> causing any problems are slim.  Various stress tests did not, in fact, show
> the race to be an issue.
> 

It is a really small race. To cause a problem, CPU 1 must find a page
with count 0, then CPU 2 must allocate the page and set page->private
before CPU 1 overwrites that value, but the window is there.

> The problem is that if a page is on a PCP list, and it's underlaying pageblocks'
> migrate type is changed to MIGRATE_ISOLATE, the page (i) will still remain on PCP
> list and thus someone can allocate it, and (ii) when removed from PCP list, the
> page will be put on freelist of migrate type it had prior to change.
> 
> (i) is actually not such a big issue since the next thing that happens after
> isolation is migration so all the pages will get freed.  (ii) is actual problem
> and if [1] is not an acceptable solution I really don't have a good fix for that.
> 
> One things that comes to mind is calling drain_all_pages() prior to acquiring
> zone->lock in set_migratetype_isolate().  This is however prone to races since
> after the drain and before the zone->lock is acquired, pages might get moved
> back to PCP list.
> 
> Draining PCP list after acquiring zone->lock is not possible because
> smp_call_function_many() cannot be called with interrupts disabled, and changing
> spin_lock_irqsave() to spin_lock() followed by local_irq_save() causes a dead
> lock (that's what [2] attempted to do).
> 
> Any suggestions are welcome!
> 

[1] is still not preferred as I'd still like to keep the impact
of CMA to the normal paths to be as close to 0 as possible. In
update_pcp_isolate_block() how about something like this?

if (page_count(page) == 0) {
	spin_unlock_irqrestore(&zone->lock, flags);
	drain_all_pages();
	spin_lock_irqsave(&zone->lock, flags);
	/* re-check under the lock; the drain may have returned the
	 * page to the buddy allocator */
	if (PageBuddy(page)) {
		order = page_order(page);
		list_del(&page->lru);
		list_add_tail(&page->lru, &zone->free_area[order].free_list[MIGRATE_ISOLATE]);
		set_page_private(page, MIGRATE_ISOLATE);
	}
}

If the page is !PageBuddy, it does not matter as alloc_contig_range()
is just about to migrate it.


> >>+		} else {
> >>+			++pfn;
> >>+		}
> >>+	}
> >>+}
> 
> -- 
> Best regards,                                         _     _
> .o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
> ..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
> ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--
> 

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-01-30 16:14       ` Mel Gorman
@ 2012-01-31 16:23         ` Marek Szyprowski
  2012-02-02 12:47           ` Mel Gorman
  0 siblings, 1 reply; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-31 16:23 UTC (permalink / raw)
  To: 'Mel Gorman', 'Michal Nazarewicz'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Daniel Walker', 'Arnd Bergmann',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen', 'Benjamin Gaignard'

Hello,

On Monday, January 30, 2012 5:15 PM Mel Gorman wrote:

> On Mon, Jan 30, 2012 at 04:41:22PM +0100, Michal Nazarewicz wrote:
> > On Mon, 30 Jan 2012 12:15:22 +0100, Mel Gorman <mel@csn.ul.ie> wrote:

(snipped)

> > >>+		page = pfn_to_page(pfn);
> > >>+		if (PageBuddy(page)) {
> > >>+			pfn += 1 << page_order(page);
> > >>+		} else if (page_count(page) == 0) {
> > >>+			set_page_private(page, MIGRATE_ISOLATE);
> > >>+			++pfn;
> > >
> > >This is dangerous for two reasons. If the page_count is 0, it could
> > >be because the page is in the process of being freed and is not
> > >necessarily on the per-cpu lists yet and you cannot be sure if the
> > >contents of page->private are important. Second, there is nothing to
> > >prevent another CPU allocating this page from its per-cpu list while
> > >the private field is getting updated from here which might lead to
> > >some interesting races.
> > >
> > >I recognise that what you are trying to do is respond to Gilad's
> > >request that you really check if an IPI here is necessary. I think what
> > >you need to do is check if a page with a count of 0 is encountered
> > >and if it is, then a draining of the per-cpu lists is necessary. To
> > >address Gilad's concerns, be sure to only do this once per attempt at
> > >CMA rather than for every page encountered with a count of 0 to avoid a
> > >storm of IPIs.
> >
> > It's actually more then that.
> >
> > This is the same issue that I first fixed with a change to free_pcppages_bulk()
> > function[1].  At the time of positing, you said you'd like me to try and find
> > a different solution which would not involve paying the price of calling
> > get_pageblock_migratetype().  Later I also realised that this solution is
> > not enough.
> >
> > [1] http://article.gmane.org/gmane.linux.kernel.mm/70314
> >
> 
> Yes. I had forgotten the history but looking at that patch again,
> I would reach the conclusion that this was adding a new call to
> get_pageblock_migratetype() in the bulk free path. That would affect
> everybody whether they were using CMA or not.

This will be a bit ugly, but we can also use that code and compile it
conditionally, only when CMA is enabled. Pages which have an incorrect
migrate type on free eventually cause the pageblock migration type to change
from MIGRATE_CMA to MIGRATE_MOVABLE. This is not a problem for the non-CMA
case, where only pageblocks with the MIGRATE_MOVABLE migration type are being
isolated.
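
Something along these lines in free_pcppages_bulk() is what I have in mind
(a rough sketch only, not the exact patch from [1]):

#ifdef CONFIG_CMA
		/* re-read the pageblock type so pages freed from isolated
		 * or CMA pageblocks go back to the right freelist */
		__free_one_page(page, zone, 0,
				get_pageblock_migratetype(page));
#else
		__free_one_page(page, zone, 0, page_private(page));
#endif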

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks
  2012-01-30 13:05   ` Mel Gorman
@ 2012-01-31 17:15     ` Marek Szyprowski
  0 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-01-31 17:15 UTC (permalink / raw)
  To: 'Mel Gorman'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Daniel Walker', 'Arnd Bergmann',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen', 'Benjamin Gaignard'

Hello,

On Monday, January 30, 2012 2:06 PM Mel Gorman wrote:

> On Thu, Jan 26, 2012 at 10:00:53AM +0100, Marek Szyprowski wrote:
> > alloc_contig_range() performs memory allocation so it also should keep
> > track on keeping the correct level of memory watermarks. This commit adds
> > a call to *_slowpath style reclaim to grab enough pages to make sure that
> > the final collection of contiguous pages from freelists will not starve
> > the system.
> >
> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> > CC: Michal Nazarewicz <mina86@mina86.com>
> > ---
> >  mm/page_alloc.c |   36 ++++++++++++++++++++++++++++++++++++
> >  1 files changed, 36 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index e35d06b..05eaa82 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5613,6 +5613,34 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned
> long end)
> >  	return ret;
> >  }
> >
> > +/*
> > + * Trigger memory pressure bump to reclaim some pages in order to be able to
> > + * allocate 'count' pages in single page units. Does similar work as
> > + *__alloc_pages_slowpath() function.
> > + */
> > +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
> > +{
> > +	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> > +	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
> > +	int did_some_progress = 0;
> > +	int order = 1;
> > +	unsigned long watermark;
> > +
> > +	/* Obey watermarks as if the page was being allocated */
> > +	watermark = low_wmark_pages(zone) + count;
> > +	while (!zone_watermark_ok(zone, 0, watermark, 0, 0)) {
> > +		wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(zone));
> > +
> > +		did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
> > +						      NULL);
> > +		if (!did_some_progress) {
> > +			/* Exhausted what can be done so it's blamo time */
> > +			out_of_memory(zonelist, gfp_mask, order, NULL);
> > +		}
> 
> There are three problems here
> 
> 1. CMA can trigger the OOM killer.
> 
> That seems like overkill to me but as I do not know the consequences
> of CMA failing, it's your call.

This behavior is intended; we agreed that contiguous allocations should
have higher priority than others.

> 2. You cannot guarantee that try_to_free_pages will free pages from the
>    zone you care about or that kswapd will do anything
> 
> You check the watermarks and take into account the size of the pending
> CMA allocation. kswapd in vmscan.c on the other hand will simply check
> the watermarks and probably go back to sleep. You should be aware of
> this in case you ever get bugs that CMA takes too long and that it
> appears to be stuck in this loop with kswapd staying asleep.

Right, I experienced this problem today. The simplest workaround I've
found is to adjust the watermarks before calling kswapd, but I'm not sure
that increasing min_free_kbytes and calling setup_per_zone_wmarks() is
the nicest approach.
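
Roughly, the workaround looks like this ('nr_required_kb' is just a
placeholder for the amount of memory the pending CMA allocation needs):

	/* temporarily raise the watermarks so kswapd keeps reclaiming */
	min_free_kbytes += nr_required_kb;
	setup_per_zone_wmarks();

	/* ... reclaim and collect the contiguous range ... */

	min_free_kbytes -= nr_required_kb;
	setup_per_zone_wmarks();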

> 3. You reclaim from zones other than your target zone
> 
> try_to_free_pages is not necessarily going to free pages in the
> zone you are checking for. It'll work on ARM in many cases because
> there will be only one zone but on other arches, this logic will
> be problematic and will potentially livelock. You need to pass in
> a zonelist that only contains the zone that CMA cares about. If it
> cannot reclaim, did_some_progress == 0 and it'll exit. Otherwise
> there is a possibility that this will loop forever reclaiming pages
> from the wrong zones.

Right. I tested it on a system with only one zone, so I never experienced
such a problem. For the first version I think we might assume that the buffer
allocated by alloc_contig_range() must fit within a single zone. I will add
some comments about it. Later we can extend it for more advanced cases.
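
Something along these lines is what I plan to put above __reclaim_pages()
(just a draft):

	/*
	 * Reclaim is triggered against the whole zonelist, not only the
	 * zone the CMA area lives in.  On systems with a single zone
	 * (typical for ARM) this is fine, but on multi-zone configurations
	 * progress may be made in the wrong zones, so this loop could take
	 * a very long time or even livelock.  For now callers are expected
	 * to use CMA areas that fit within a single zone.
	 */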

> I won't ack this particular patch but I am not going to insist that
> you fix these prior to merging either. If you leave problem 3 as it
> is, I would really like to see a comment explaining the problem for
> future users of CMA on other arches (if they exist).

I will add more comments about the issues you have pointed out to make
life easier for other arch developers.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCHv19 00/15] Contiguous Memory Allocator
       [not found]         ` <CA+M3ks7h1t6DbPSAhPN6LJ5Dw84hSukfWG16avh2eZL+o4caJg@mail.gmail.com>
@ 2012-02-01  8:47           ` Marek Szyprowski
  0 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-02-01  8:47 UTC (permalink / raw)
  To: 'Benjamin Gaignard', 'Michal Nazarewicz'
  Cc: 'Andrew Morton', 'Mel Gorman',
	'Arnd Bergmann',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Kyungmin Park', 'Russell King',
	'KAMEZAWA Hiroyuki', 'Daniel Walker',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen'

Hello,

On Tuesday, January 31, 2012 6:17 PM Benjamin Gaignard wrote:

> I have rebased the Linaro CMA test driver to be compatible with CMA v19; it now
> uses the dma-mapping API instead of the v17 CMA API.
> A kernel for snowball with CMA v19 and test driver is available here: 
> http://git.linaro.org/gitweb?p=people/bgaignard/linux-snowball-test-cma-v19.git;a=summary
>
> From this kernel build, I have executed the CMA LAVA test (LAVA is the Linaro
> automatic test tool), the same one we have been running since v16, and the
> test is OK.
> With previous versions of CMA the test found issues when the memory was
> filled with reclaimable pages, but with v19 this issue is no longer present.
> Test logs are here:  https://validation.linaro.org/lava-server/scheduler/job/10841
>
> so you can add:
> Tested-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>

Thanks for your contribution!

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-01-31 16:23         ` Marek Szyprowski
@ 2012-02-02 12:47           ` Mel Gorman
  2012-02-02 19:53             ` Michal Nazarewicz
  0 siblings, 1 reply; 61+ messages in thread
From: Mel Gorman @ 2012-02-02 12:47 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: 'Michal Nazarewicz',
	linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Daniel Walker', 'Arnd Bergmann',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen', 'Benjamin Gaignard'

On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
> > > >>+		page = pfn_to_page(pfn);
> > > >>+		if (PageBuddy(page)) {
> > > >>+			pfn += 1 << page_order(page);
> > > >>+		} else if (page_count(page) == 0) {
> > > >>+			set_page_private(page, MIGRATE_ISOLATE);
> > > >>+			++pfn;
> > > >
> > > >This is dangerous for two reasons. If the page_count is 0, it could
> > > >be because the page is in the process of being freed and is not
> > > >necessarily on the per-cpu lists yet and you cannot be sure if the
> > > >contents of page->private are important. Second, there is nothing to
> > > >prevent another CPU allocating this page from its per-cpu list while
> > > >the private field is getting updated from here which might lead to
> > > >some interesting races.
> > > >
> > > >I recognise that what you are trying to do is respond to Gilad's
> > > >request that you really check if an IPI here is necessary. I think what
> > > >you need to do is check if a page with a count of 0 is encountered
> > > >and if it is, then a draining of the per-cpu lists is necessary. To
> > > >address Gilad's concerns, be sure to only do this once per attempt at
> > > >CMA rather than for every page encountered with a count of 0 to avoid a
> > > >storm of IPIs.
> > >
> > > It's actually more then that.
> > >
> > > This is the same issue that I first fixed with a change to free_pcppages_bulk()
> > > function[1].  At the time of positing, you said you'd like me to try and find
> > > a different solution which would not involve paying the price of calling
> > > get_pageblock_migratetype().  Later I also realised that this solution is
> > > not enough.
> > >
> > > [1] http://article.gmane.org/gmane.linux.kernel.mm/70314
> > >
> > 
> > Yes. I had forgotten the history but looking at that patch again,
> > I would reach the conclusion that this was adding a new call to
> > get_pageblock_migratetype() in the bulk free path. That would affect
> > everybody whether they were using CMA or not.
> 
> This will be a bit ugly, but we can also use that code and compile it conditionally
> when CMA has been enabled.

That would also be very unfortunate because it means enabling CMA incurs
a performance cost on everyone whether they use CMA or not. For ARM,
this may not be a problem, but it would be for other arches if they
wanted to use CMA or if it ever became part of a distro config.

> Pages, which have incorrect migrate type on free finally
> causes pageblock migration type change from MIGRATE_CMA to MIGRATE_MOVABLE.

I'm not quite seeing this. In free_hot_cold_page(), the pageblock
type is checked so the page private should be set to MIGRATE_CMA or
MIGRATE_ISOLATE for the CMA area. It's not clear how this can change a
pageblock to MIGRATE_MOVABLE in error. If it turns out that you
absolutely have to call get_pageblock_migratetype() from
free_pcppages_bulk() and my alternative suggestion did not work out then
document all these issues in a comment when putting the call under
CONFIG_CMA so that it is not forgotten.
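
For reference, the part of free_hot_cold_page() I mean is roughly this
(from memory, so it may differ slightly from your tree):

	/* the pcp "list type" is read from the pageblock at free time */
	migratetype = get_pageblock_migratetype(page);
	set_page_private(page, migratetype);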

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-02-02 12:47           ` Mel Gorman
@ 2012-02-02 19:53             ` Michal Nazarewicz
  2012-02-03  9:31               ` Marek Szyprowski
  2012-02-03 11:27               ` Mel Gorman
  0 siblings, 2 replies; 61+ messages in thread
From: Michal Nazarewicz @ 2012-02-02 19:53 UTC (permalink / raw)
  To: Marek Szyprowski, Mel Gorman
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Daniel Walker', 'Arnd Bergmann',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen', 'Benjamin Gaignard'

> On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
>> Pages, which have incorrect migrate type on free finally
>> causes pageblock migration type change from MIGRATE_CMA to MIGRATE_MOVABLE.

On Thu, 02 Feb 2012 13:47:29 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> I'm not quite seeing this. In free_hot_cold_page(), the pageblock
> type is checked so the page private should be set to MIGRATE_CMA or
> MIGRATE_ISOLATE for the CMA area. It's not clear how this can change a
> pageblock to MIGRATE_MOVABLE in error.

Here's what I think may happen:

When drain_all_pages() is called, __free_one_page() is called for each page
on the pcp list with the migrate type deduced from page_private(), which is
MIGRATE_CMA.  This results in the page being put on the MIGRATE_CMA freelist
even though its pageblock's migrate type is MIGRATE_ISOLATE.

When an allocation happens and the pcp list is empty, rmqueue_bulk() gets
executed with the migratetype argument set to MIGRATE_MOVABLE.  It calls
__rmqueue() to grab some pages, and because the page described above is on
the MIGRATE_CMA freelist it may be returned to rmqueue_bulk().

But the pageblock's migrate type is not MIGRATE_CMA but MIGRATE_ISOLATE, so
the following code:

#ifdef CONFIG_CMA
		if (is_pageblock_cma(page))
			set_page_private(page, MIGRATE_CMA);
		else
#endif
			set_page_private(page, migratetype);

will set its private field to MIGRATE_MOVABLE, and in the end the page lands
back on the MIGRATE_MOVABLE pcp list, but this time with page_private ==
MIGRATE_MOVABLE and not MIGRATE_CMA.

One more drain_all_pages() (which may happen since alloc_contig_range()
calls set_migratetype_isolate() for each block) and the next
__rmqueue_fallback() may convert the whole pageblock to MIGRATE_MOVABLE.

I know this sounds crazy and improbable, but I couldn't find an easier path
to destruction.  As you pointed out, once the page is allocated,
free_hot_cold_page() will do the right thing by reading the pageblock's
migrate type.

Marek is currently experimenting with various patches including the following
change:

#ifdef CONFIG_CMA
                 int mt = get_pageblock_migratetype(page);
                 if (is_migrate_cma(mt) || mt == MIGRATE_ISOLATE)
                         set_page_private(page, mt);
                 else
#endif
                         set_page_private(page, migratetype);

As a matter of fact, if __rmqueue() were changed to return the migrate type
of the freelist it took the page from, we could avoid this
get_pageblock_migratetype() call altogether.  For now, however, I'd rather
not go that way just yet -- I'll be happy to dig into it once CMA gets
merged.
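
Just to illustrate the idea (a hypothetical signature, nothing I'm
proposing right now):

	/* '*real_mt' would be set to the migratetype of the freelist the
	 * page was actually taken from, so callers would not need to
	 * re-read the pageblock */
	static struct page *__rmqueue(struct zone *zone, unsigned int order,
				      int migratetype, int *real_mt);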

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-02-02 19:53             ` Michal Nazarewicz
@ 2012-02-03  9:31               ` Marek Szyprowski
  2012-02-03 11:27               ` Mel Gorman
  1 sibling, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-02-03  9:31 UTC (permalink / raw)
  To: 'Michal Nazarewicz', 'Mel Gorman'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Kyungmin Park', 'Russell King',
	'Andrew Morton', 'KAMEZAWA Hiroyuki',
	'Daniel Walker', 'Arnd Bergmann',
	'Jesse Barker', 'Jonathan Corbet',
	'Shariq Hasnain', 'Chunsang Jeong',
	'Dave Hansen', 'Benjamin Gaignard'

Hello,

On Thursday, February 02, 2012 8:53 PM Michał Nazarewicz wrote:

> > On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
> >> Pages, which have incorrect migrate type on free finally
> >> causes pageblock migration type change from MIGRATE_CMA to MIGRATE_MOVABLE.
> 
> On Thu, 02 Feb 2012 13:47:29 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> > I'm not quite seeing this. In free_hot_cold_page(), the pageblock
> > type is checked so the page private should be set to MIGRATE_CMA or
> > MIGRATE_ISOLATE for the CMA area. It's not clear how this can change a
> > pageblock to MIGRATE_MOVABLE in error.
> 
> Here's what I think may happen:
> 
> When drain_all_pages() is called, __free_one_page() is called for each page on
> pcp list with migrate type deducted from page_private() which is MIGRATE_CMA.
> This result in the page being put on MIGRATE_CMA freelist even though its
> pageblock's migrate type is MIGRATE_ISOLATE.
> 
> When allocation happens and pcp list is empty, rmqueue_bulk() will get executed
> with migratetype argument set to MIGRATE_MOVABLE.  It calls __rmqueue() to grab
> some pages and because the page described above is on MIGRATE_CMA freelist it
> may be returned back to rmqueue_bulk().
> 
> But, pageblock's migrate type is not MIGRATE_CMA but MIGRATE_ISOLATE, so the
> following code:
> 
> #ifdef CONFIG_CMA
> 		if (is_pageblock_cma(page))
> 			set_page_private(page, MIGRATE_CMA);
> 		else
> #endif
> 			set_page_private(page, migratetype);
> 
> will set it's private to MIGRATE_MOVABLE and in the end the page lands back
> on MIGRATE_MOVABLE pcp list but this time with page_private == MIGRATE_MOVABLE
> and not MIGRATE_CMA.
> 
> One more drain_all_pages() (which may happen since alloc_contig_range() calls
> set_migratetype_isolate() for each block) and next __rmqueue_fallback() may
> convert the whole pageblock to MIGRATE_MOVABLE.
> 
> I know, this sounds crazy and improbable, but I couldn't find an easier path
> to destruction.  As you pointed, once the page is allocated, free_hot_cold_page()
> will do the right thing by reading pageblock's migrate type.
> 
> Marek is currently experimenting with various patches including the following
> change:
> 
> #ifdef CONFIG_CMA
>                  int mt = get_pageblock_migratetype(page);
>                  if (is_migrate_cma(mt) || mt == MIGRATE_ISOLATE)
>                          set_page_private(page, mt);
>                  else
> #endif
>                          set_page_private(page, migratetype);
> 
> As a matter of fact, if __rmqueue() were changed to return the migrate type
> of the freelist it took the page from, we could avoid this
> get_pageblock_migratetype() altogether.  For now, however, I'd rather not go
> that way just yet -- I'll be happy to dig into it once CMA gets merged.

After this and some other changes I'm no longer able to reproduce that issue.
I ran tests for a whole night and it still works fine, so it looks like it has
finally been solved. I will post the v20 patchset soon :)

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating
  2012-02-02 19:53             ` Michal Nazarewicz
  2012-02-03  9:31               ` Marek Szyprowski
@ 2012-02-03 11:27               ` Mel Gorman
  1 sibling, 0 replies; 61+ messages in thread
From: Mel Gorman @ 2012-02-03 11:27 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Marek Szyprowski, linux-kernel, linux-arm-kernel, linux-media,
	linux-mm, linaro-mm-sig, 'Kyungmin Park',
	'Russell King', 'Andrew Morton',
	'KAMEZAWA Hiroyuki', 'Daniel Walker',
	'Arnd Bergmann', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong', 'Dave Hansen',
	'Benjamin Gaignard'

On Thu, Feb 02, 2012 at 08:53:25PM +0100, Michal Nazarewicz wrote:
> >On Tue, Jan 31, 2012 at 05:23:59PM +0100, Marek Szyprowski wrote:
> >>Pages which have an incorrect migrate type on free eventually cause the
> >>pageblock migration type to change from MIGRATE_CMA to MIGRATE_MOVABLE.
> 
> On Thu, 02 Feb 2012 13:47:29 +0100, Mel Gorman <mel@csn.ul.ie> wrote:
> >I'm not quite seeing this. In free_hot_cold_page(), the pageblock
> >type is checked so the page private should be set to MIGRATE_CMA or
> >MIGRATE_ISOLATE for the CMA area. It's not clear how this can change a
> >pageblock to MIGRATE_MOVABLE in error.
> 
> Here's what I think may happen:
> 
> When drain_all_pages() is called, __free_one_page() is called for each page
> on the pcp list with the migrate type read from page_private(), which is
> MIGRATE_CMA.  This results in the page being put on the MIGRATE_CMA freelist
> even though its pageblock's migrate type is MIGRATE_ISOLATE.
> 

Ok, although it will only be allocated for MIGRATE_CMA-compatible
requests so it is not a disaster.

> When an allocation happens and the pcp list is empty, rmqueue_bulk() gets
> executed with the migratetype argument set to MIGRATE_MOVABLE.  It calls
> __rmqueue() to grab some pages, and because the page described above is on
> the MIGRATE_CMA freelist, it may be returned to rmqueue_bulk().
> 

This will allocate the page from a pageblock we are trying to isolate
pages from, but only for a movable page that can still be migrated. It
does mean that CMA is doing more work than it should, of course, and
the problem also impacts memory hot-remove. It's worse for memory
hot-remove because potentially an UNMOVABLE page was allocated from
a MIGRATE_ISOLATE pageblock.
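
For reference, the pcp refill path being discussed is roughly the following
(paraphrased from memory of the 3.3-rc1 buffered_rmqueue(), order-0 case --
an illustration rather than the exact code).  The refill uses the caller's
migratetype, which is why the stray page can be handed out for, and stamped
with, whatever type the current allocation asked for:

	/* when the pcp list for the requested migratetype is empty, it is
	 * refilled via rmqueue_bulk(), and that is where the stray page can
	 * be picked up and have its page_private set to the caller's type */
	if (list_empty(list)) {
		pcp->count += rmqueue_bulk(zone, 0,
				pcp->batch, list,
				migratetype, cold);
		if (unlikely(list_empty(list)))
			goto failed;
	}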

> But the pageblock's migrate type is not MIGRATE_CMA but MIGRATE_ISOLATE, so
> the following code:
> 
> #ifdef CONFIG_CMA
> 		if (is_pageblock_cma(page))
> 			set_page_private(page, MIGRATE_CMA);
> 		else
> #endif
> 			set_page_private(page, migratetype);
> 
> will set its page_private to MIGRATE_MOVABLE, and in the end the page lands
> back on the MIGRATE_MOVABLE pcp list, but this time with page_private ==
> MIGRATE_MOVABLE and not MIGRATE_CMA.
> 
> One more drain_all_pages() (which may happen, since alloc_contig_range()
> calls set_migratetype_isolate() for each block) and the next
> __rmqueue_fallback() may convert the whole pageblock to MIGRATE_MOVABLE.
> 
> I know this sounds crazy and improbable, but I couldn't find an easier path
> to destruction.  As you pointed out, once the page is allocated,
> free_hot_cold_page() will do the right thing by reading the pageblock's
> migrate type.
> 

Ok, it's crazy but the problem is there.

> Marek is currently experimenting with various patches including the following
> change:
> 
> #ifdef CONFIG_CMA
>                 int mt = get_pageblock_migratetype(page);
>                 if (is_migrate_cma(mt) || mt == MIGRATE_ISOLATE)
>                         set_page_private(page, mt);
>                 else
> #endif
>                         set_page_private(page, migratetype);
> 
> As a matter of fact, if __rmqueue() were changed to return the migrate type
> of the freelist it took the page from, we could avoid this
> get_pageblock_migratetype() altogether.  For now, however, I'd rather not go
> that way just yet -- I'll be happy to dig into it once CMA gets merged.
> 

Ok, thanks for persisting with this.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCHv19 00/15] Contiguous Memory Allocator
  2012-01-28  0:26   ` Andrew Morton
                       ` (2 preceding siblings ...)
  2012-01-30 13:25     ` Mel Gorman
@ 2012-02-10 18:10     ` Marek Szyprowski
  3 siblings, 0 replies; 61+ messages in thread
From: Marek Szyprowski @ 2012-02-10 18:10 UTC (permalink / raw)
  To: 'Andrew Morton', 'Arnd Bergmann'
  Cc: linux-kernel, linux-arm-kernel, linux-media, linux-mm,
	linaro-mm-sig, 'Michal Nazarewicz',
	'Kyungmin Park', 'Russell King',
	'KAMEZAWA Hiroyuki', 'Daniel Walker',
	'Mel Gorman', 'Jesse Barker',
	'Jonathan Corbet', 'Shariq Hasnain',
	'Chunsang Jeong', 'Dave Hansen',
	'Benjamin Gaignard'

Hi Andrew,

On Saturday, January 28, 2012 1:26 AM Andrew Morton wrote:

> These patches don't seem to have as many acked-bys and reviewed-bys as
> I'd expect.  Given the scope and duration of this, it would be useful
> to gather these up.  But please ensure they are real ones - people
> sometimes like to ack things without showing much sign of having
> actually read them.
> 
> Also there is the supreme tag: "Tested-by:".  Ohad (at least) has been
> testing the code.  Let's mention that.
> 
> 
> The patches do seem to have been going round in ever-decreasing circles
> lately and I think we have decided to merge them (yes?) so we may as well
> get on and do that and sort out remaining issues in-tree.

It looks like the CMA patch series has reached its final version - I've just
posted version 21 a few minutes ago. Most of the patches got acks from either
Mel or Arnd, and the remaining few need only minor tweaks that affect only CMA
users, which we hope to sort out once the series is merged. That's why I would
like to ask you to merge these patches into your tree and finally give them a
try in the linux-next kernel.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center




^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2012-02-10 18:22 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-26  9:00 [PATCHv19 00/15] Contiguous Memory Allocator Marek Szyprowski
2012-01-26  9:00 ` [PATCH 01/15] mm: page_alloc: remove trailing whitespace Marek Szyprowski
2012-01-30 10:59   ` Mel Gorman
2012-01-26  9:00 ` [PATCH 02/15] mm: page_alloc: update migrate type of pages on pcp when isolating Marek Szyprowski
2012-01-30 11:15   ` Mel Gorman
2012-01-30 15:41     ` Michal Nazarewicz
2012-01-30 16:14       ` Mel Gorman
2012-01-31 16:23         ` Marek Szyprowski
2012-02-02 12:47           ` Mel Gorman
2012-02-02 19:53             ` Michal Nazarewicz
2012-02-03  9:31               ` Marek Szyprowski
2012-02-03 11:27               ` Mel Gorman
2012-01-26  9:00 ` [PATCH 03/15] mm: compaction: introduce isolate_migratepages_range() Marek Szyprowski
2012-01-30 11:24   ` Mel Gorman
2012-01-30 12:42     ` Michal Nazarewicz
2012-01-30 13:25       ` Mel Gorman
2012-01-26  9:00 ` [PATCH 04/15] mm: compaction: introduce isolate_freepages_range() Marek Szyprowski
2012-01-30 11:48   ` Mel Gorman
2012-01-30 11:55     ` Mel Gorman
2012-01-26  9:00 ` [PATCH 05/15] mm: compaction: export some of the functions Marek Szyprowski
2012-01-30 11:57   ` Mel Gorman
2012-01-30 12:33     ` Michal Nazarewicz
2012-01-26  9:00 ` [PATCH 06/15] mm: page_alloc: introduce alloc_contig_range() Marek Szyprowski
2012-01-30 12:11   ` Mel Gorman
2012-01-26  9:00 ` [PATCH 07/15] mm: page_alloc: change fallbacks array handling Marek Szyprowski
2012-01-30 12:12   ` Mel Gorman
2012-01-26  9:00 ` [PATCH 08/15] mm: mmzone: MIGRATE_CMA migration type added Marek Szyprowski
2012-01-30 12:35   ` Mel Gorman
2012-01-30 13:06     ` Michal Nazarewicz
2012-01-30 14:52       ` Mel Gorman
2012-01-26  9:00 ` [PATCH 09/15] mm: page_isolation: MIGRATE_CMA isolation functions added Marek Szyprowski
2012-01-26  9:00 ` [PATCH 10/15] mm: extract reclaim code from __alloc_pages_direct_reclaim() Marek Szyprowski
2012-01-30 12:42   ` Mel Gorman
2012-01-26  9:00 ` [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range() to stabilize watermarks Marek Szyprowski
2012-01-30 13:05   ` Mel Gorman
2012-01-31 17:15     ` Marek Szyprowski
2012-01-26  9:00 ` [PATCH 12/15] drivers: add Contiguous Memory Allocator Marek Szyprowski
2012-01-27  9:44   ` [Linaro-mm-sig] " Ohad Ben-Cohen
2012-01-27 10:53     ` Marek Szyprowski
2012-01-27 14:27       ` Clark, Rob
2012-01-27 14:51         ` Marek Szyprowski
2012-01-27 14:59           ` Ohad Ben-Cohen
2012-01-27 15:17             ` Marek Szyprowski
2012-01-28 18:57               ` Ohad Ben-Cohen
2012-01-30  7:43                 ` Marek Szyprowski
2012-01-30  9:16                   ` Ohad Ben-Cohen
2012-01-27 14:56       ` Ohad Ben-Cohen
2012-01-26  9:00 ` [PATCH 13/15] X86: integrate CMA with DMA-mapping subsystem Marek Szyprowski
2012-01-26  9:00 ` [PATCH 14/15] ARM: " Marek Szyprowski
2012-01-26  9:00 ` [PATCH 15/15] ARM: Samsung: use CMA for 2 memory banks for s5p-mfc device Marek Szyprowski
2012-01-26 15:31 ` [PATCHv19 00/15] Contiguous Memory Allocator Arnd Bergmann
2012-01-26 15:38   ` Michal Nazarewicz
2012-01-26 15:48   ` Marek Szyprowski
2012-01-28  0:26   ` Andrew Morton
2012-01-29 18:09     ` Rob Clark
2012-01-29 20:32       ` Anca Emanuel
2012-01-29 20:51     ` Arnd Bergmann
2012-01-30 13:25     ` Mel Gorman
2012-01-30 15:43       ` Michal Nazarewicz
     [not found]         ` <CA+M3ks7h1t6DbPSAhPN6LJ5Dw84hSukfWG16avh2eZL+o4caJg@mail.gmail.com>
2012-02-01  8:47           ` Marek Szyprowski
2012-02-10 18:10     ` Marek Szyprowski
